NoSketch Engine

Welcome to NoSketch Engine, an open-source project combining Manatee, Bonito and Crystal into a powerful and free corpus management system. NoSketch Engine is a limited version of the software that powers the famous Sketch Engine service, a commercial variant offering word sketches, a thesaurus, keyword computation, user-friendly corpus creation and many other features.

Try a Sketch Engine trial account: it includes word sketches, a thesaurus, keywords, online corpus building with space for your corpora, online availability and technical support. See the overview of Sketch Engine versus NoSketch Engine.

News

To receive updates about new versions and features, please subscribe to the NoSketch Engine Google group.

Documentation

You are free to use the documentation available for the commercial Sketch Engine.

System requirements

NoSketch Engine packages are available for the 64-bit CentOS 7 Linux distribution. Requirements: 8 GB of RAM; 20 GB of disk space (SSD strongly preferred); CPU: a contemporary 64-bit Intel or AMD processor.

NoSketch Engine packages

manatee

Manatee is a corpus management and query system. License: GPLv2+.

bonito

Bonito is an API interface for the Manatee corpus management system. License: GPLv2+.

gdex

GDEX (Good Dictionary Examples) is a Bonito module for sorting concordances according to their suitability as dictionary examples. License: GPLv3.

crystal

Crystal is a web interface for Sketch Engine. License: GPLv3.

third party dependencies

  • You may install python-prctl; Bonito will use it to set up descriptive process titles for background jobs (visible in the output of the ps command and similar tools).

    The Bonito RPM package requires python3-prctl. You can either ignore the dependency (rpm -ivh ./bonito-open-*.el7.noarch.rpm --nodeps) or build a python3-prctl RPM package from the python-prctl sources (git clone https://github.com/seveas/python-prctl.git ; cd python-prctl/ ; sed -i 's|name = "python-prctl"|name = "python3-prctl"|' setup.py ; ./setup.py bdist_rpm).
  • You may install openpyxl; Bonito will use it to export into the Office Open XML format (xlsx).

    The Bonito RPM package requires python3-openpyxl. You can either ignore the dependency (rpm -ivh ./bonito-open-*.el7.noarch.rpm --nodeps) or build a python3-openpyxl RPM package from the openpyxl sources (hg clone https://foss.heptapod.net/openpyxl/openpyxl/ ; cd openpyxl ; sed -i "s|name='openpyxl'|name='python3-openpyxl'|" setup.py ; sed -i "1c#\!/usr/bin/python3" setup.py ; ./setup.py bdist_rpm).

Downloads

Latest stable release

You should always download the latest versions of all components.

               | manatee-open                | bonito-open               | gdex             | crystal-open                | sample corpus
tar.gz         | manatee-open-2.214.1.tar.gz | bonito-open-5.58.1.tar.gz | gdex-4.12.tar.gz | crystal-open-2.129.1.tar.gz | susanne-example-source.tar.bz2
rpm (CentOS 7) | 2.214.1                     | 5.58.1                    | 4.12             | 2.129.1                     | 2.214.1

Release notes

IMPORTANT: When updating from a version of Manatee before 2.207.2, existing corpora need to be recompiled or updated in-place by executing

corpus4fsa CORPUS_CONFIG_FILE

for every corpus. If you don't do this, the update starts automatically as a background job whenever new indices are needed, making users wait (up to several minutes for big corpora) until it finishes.
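
The per-corpus update can be scripted. The following is a minimal sketch, assuming that every regular file directly inside your registry directory is a corpus configuration file; the function name update_all_corpora and the path in the usage comment are hypothetical and not part of Manatee.

```shell
# Run corpus4fsa on every corpus configuration file in a registry directory.
# Assumption: each regular file directly in the directory is one corpus config.
# update_all_corpora is a hypothetical helper name, not a Manatee command.
update_all_corpora() {
    registry_dir=$1
    tool=${2:-corpus4fsa}   # overridable, e.g. with "echo" for a dry run
    for config in "$registry_dir"/*; do
        [ -f "$config" ] || continue          # skip subdirectories
        echo "Updating $config"
        "$tool" "$config" || echo "FAILED: $config" >&2
    done
}

# Usage (the path is a placeholder for your own registry directory):
# update_all_corpora /path/to/registry
```

Running the loop once after the upgrade avoids the on-demand background job, so users do not hit the wait described above.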

Older releases

Older releases can be downloaded from the archive.

Build and installation

manatee

tar xzvf manatee-open-<version>.tar.gz
cd manatee-open-<version>
./configure --with-pcre
make
sudo make install

bonito

tar xzvf bonito-open-<version>.tar.gz
cd bonito-open-<version>
./configure
make
sudo make install
sudo ./setupbonito <CGIPATH> <DATAPATH>
# where CGIPATH is your web server's CGI directory and DATAPATH is a data directory writable by the web server

gdex

tar xzvf gdex-<version>.tar.gz
cd gdex-<version>
VERSION=<version>
sed -i "s/<version>/$VERSION/g" setup.py
./setup.py build
sudo ./setup.py install

crystal

tar xzvf crystal-open-<version>.tar.gz
cd crystal-open-<version>
make
sudo make install VERSION=<version>

Installation from RPM packages

NoSketch Engine packages

rpm -ivh crystal-open-*.el7.noarch.rpm bonito-open-*.el7.noarch.rpm manatee-open-*.el7.x86_64.rpm manatee-open-python3-*.el7.x86_64.rpm

sample corpora

rpm -ivh manatee-open-susanne-*.el7.noarch.rpm

Configuration

Apache (httpd) configuration without authentication

    Alias /crystal /var/www/crystal

    Alias /bonito /var/www/bonito

    <Directory /var/www/bonito>
            AllowOverride All
            Options +ExecCGI -Indexes
            AddHandler cgi-script .cgi
    </Directory>

Apache (httpd) configuration with authentication

    Alias /crystal-auth /var/www/crystal

    Alias /bonito-auth /var/www/bonito

    <Directory /var/www/bonito>
            AllowOverride All
            Options +ExecCGI -Indexes
            AddHandler cgi-script .cgi
    </Directory>

    <Location "/bonito-auth">
        <LimitExcept OPTIONS>
            AuthType Basic
            AuthName "Secure Content"
            AuthUserFile /var/lib/bonito/htpasswd
            Require valid-user
        </LimitExcept>
    </Location>
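
The AuthUserFile referenced in this configuration has to exist before anyone can log in. The following is a minimal sketch, assuming Apache's htpasswd utility is installed; add_bonito_user is a hypothetical helper name. The -c flag creates the file and is used only on the first call, because it overwrites an existing file.

```shell
# Add a user to the password file used by the AuthUserFile directive.
# add_bonito_user is a hypothetical helper; htpasswd is Apache's own utility
# and prompts interactively for the password.
add_bonito_user() {
    passwd_file=$1
    login=$2
    if [ -f "$passwd_file" ]; then
        htpasswd "$passwd_file" "$login"      # file exists: append or update
    else
        htpasswd -c "$passwd_file" "$login"   # first user: create the file
    fi
}

# Usage (run as a user allowed to write the file):
# add_bonito_user /var/lib/bonito/htpasswd <login>
```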

Apache (httpd) configuration with authentication and self-registration

    Alias /crystal-registration /var/www/crystal

    Alias /bonito-registration /var/www/bonito

    <Directory /var/www/bonito>
            AllowOverride All
            Options +ExecCGI -Indexes
            AddHandler cgi-script .cgi
    </Directory>

    <Location "/bonito-registration">
        <LimitExcept OPTIONS>
            AuthType Basic
            AuthName "Secure Content"
            AuthUserFile /var/lib/bonito/htpasswd
            <RequireAny>
                <RequireAll>
                    Require expr %{REQUEST_URI} =~ m#^/bonito-registration/registration.cgi/register_user_new.*#
                </RequireAll>
                Require valid-user
            </RequireAny>
        </LimitExcept>
    </Location>

Bonito (run.cgi) configuration

The Bonito configuration file is run.cgi. You may run multiple instances just by copying this file and changing the configuration in each copy. To enable authentication, set _anonymous = False. If you run Crystal on a different hostname than Bonito, you may need to set up CORS headers appropriately; see the top of the run.cgi file.
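
As an illustration of the copy-and-edit approach, the sketch below creates a second, authenticated instance from an existing run.cgi. It assumes the setting appears in the file as an assignment of the form _anonymous = True; the function name and the file names in the usage comment are hypothetical.

```shell
# Copy a Bonito run.cgi to a new instance file with authentication enabled.
# Assumption: the source file contains an assignment "_anonymous = True".
# clone_bonito_instance is a hypothetical helper name.
clone_bonito_instance() {
    src=$1
    dst=$2
    # Write a copy of src to dst, switching the flag from True to False.
    sed 's/\(_anonymous[[:space:]]*=[[:space:]]*\)True/\1False/' "$src" > "$dst"
    chmod +x "$dst"   # CGI scripts must be executable
}

# Usage (run in the Bonito CGI directory; file names are placeholders):
# clone_bonito_instance run.cgi run-auth.cgi
```

The original file is left untouched, so the anonymous instance keeps working alongside the new one.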

Bonito provides a simple registration feature, which is configured in registration.cgi.

The registration module works in three different modes:

  • Disabled – no registration is allowed (default)
  • Self-registration – users can register and enter the system
    • add the URL_REGISTER_NEW_USER endpoint to Crystal (config.js)
    • set self._enable_registration = True inside registration.cgi
  • Registration with approval – after a user registers, an access request e-mail is sent to the admins, who must allow or deny the user's access
    • to turn this feature on, set self._enable_mail = True inside registration.cgi
    • set the variables self._smtp_server and self._from_mail in registration.cgi appropriately
    • you may also change the subject and content of the e-mails in registration.cgi

Files related to registration:

  • /var/lib/ske/htpasswd – standard .htpasswd file (see the htpasswd documentation)
  • /var/lib/ske/registration/admins – list of all admin users who can allow or deny new registrations – one <login> per line
  • /var/lib/ske/registration/users – list of all users (including admins, approved and denied users) – tab-separated values: <login>\t<full name>\t<e-mail>\t<address>\t<phone number>\t<password hash>
  • /var/lib/ske/registration/invalid_users – list of denied users – one <login> per line
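
Because these are plain text files, they can be inspected with standard tools. The following is a minimal sketch, assuming the field order given above for the users file; list_active_users is a hypothetical helper name. It prints the login and e-mail of every user who is not listed in invalid_users.

```shell
# Print "<login><TAB><e-mail>" for every user not listed as denied.
# Assumption: the users file has the tab-separated field order described
# in the list of registration files. list_active_users is a hypothetical name.
list_active_users() {
    users_file=$1
    invalid_file=$2
    awk -F '\t' '
        FILENAME == ARGV[1] { denied[$1] = 1; next }  # first file: denied logins
        !($1 in denied)     { print $1 "\t" $3 }      # second file: login, e-mail
    ' "$invalid_file" "$users_file"
}

# Usage:
# list_active_users /var/lib/ske/registration/users /var/lib/ske/registration/invalid_users
```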

Crystal (config.js) configuration

// URL of Bonito's run.cgi script
URL_BONITO: "https://no.sketchengine.eu/bonito/run.cgi/",

// URL of the endpoint for registering new users (e.g. bonito/registration.cgi). Leave empty to disable registration.
URL_REGISTER_NEW_USER: "https://no.sketchengine.eu/bonito-registration/registration.cgi/register_user_new",

Credits

Finlib, Manatee and Bonito have been crafted by Pavel Rychlý, starting with his PhD thesis. Sketch Engine is a product of Lexical Computing. When using NoSketch Engine for research purposes, please cite the following two publications:

RYCHLÝ, Pavel. Manatee/Bonito-A Modular Corpus Manager. In: RASLAN. 2007. p. 65-70.

@inproceedings{rychly2007manatee,
  title={Manatee/Bonito-A Modular Corpus Manager.},
  author={Rychl{\'y}, Pavel},
  booktitle={RASLAN},
  pages={65--70},
  year={2007}
}

KILGARRIFF, Adam, et al. The Sketch Engine: Ten Years on. Lexicography, 2014, 1.1: 7-36.

@article{kilgarriff2014sketch,
  title={The Sketch Engine: ten years on},
  author={Kilgarriff, Adam and Baisa, V{\'\i}t and Bu{\v{s}}ta, Jan and Jakub{\'\i}{\v{c}}ek, Milo{\v{s}} and Kov{\'a}{\v{r}}, Vojt{\v{e}}ch and Michelfeit, Jan and Rychl{\'y}, Pavel and Suchomel, V{\'\i}t},
  journal={Lexicography},
  volume={1},
  number={1},
  pages={7--36},
  year={2014},
  publisher={Springer}
}

When using the GDEX module, please cite:

KOSEM, Iztok, et al. Identification and automatic extraction of good dictionary examples: the case (s) of GDEX. International Journal of Lexicography, 2019, 32.2: 119-137.

@article{kosem2019identification,
  title={Identification and automatic extraction of good dictionary examples: the case (s) of GDEX},
  author={Kosem, Iztok and Koppel, Kristina and Zingano Kuhn, Tanara and Michelfeit, Jan and Tiberius, Carole},
  journal={International Journal of Lexicography},
  volume={32},
  number={2},
  pages={119--137},
  year={2019},
  publisher={Oxford University Press}
}

For a list of related publications, please refer to the Sketch Engine publications page.

Testing installation

NoSketch Engine plain installation:

NoSketch Engine with enabled authentication (test/t):

NoSketch Engine with enabled authentication and registration:

(No)Sketch Engine installations over the world

Last modified on Nov 8, 2022, 11:43:36 AM