NoSketch Engine
Welcome to NoSketch Engine, an open-source project combining Manatee, Bonito and Crystal into a powerful and free corpus management system. NoSketch Engine is a limited version of the software powering the well-known Sketch Engine service, a commercial variant offering word sketches, thesaurus, keyword computation, user-friendly corpus creation and many other features.
Try a Sketch Engine trial account to get word sketches, thesaurus, keywords, online corpus building, space for your corpora, online availability and technical support. See the overview of Sketch Engine versus NoSketch Engine.
News
To receive updates about new versions and features, please subscribe to the NoSketch Engine Google group.
Documentation
You are free to use the documentation available for the commercial Sketch Engine.
System requirements
NoSketch Engine packages are available for the 64-bit CentOS 7 Linux distribution. Requirements: 8 GB of RAM; 20 GB of disk space (SSD strongly preferred); CPU: a contemporary 64-bit Intel or AMD processor.
NoSketch Engine packages
manatee
Manatee is a corpus management and query system. License: GPLv2+.
bonito
Bonito is an API for the Manatee corpus management system. License: GPLv2+.
gdex
GDEX (Good Dictionary Examples) is a Bonito module for sorting concordances according to their suitability as dictionary examples. License: GPLv3.
crystal
Crystal is a web interface for Sketch Engine. License: GPLv3.
third party dependencies
- You may install python-prctl; Bonito will use it to set descriptive process titles for background jobs (visible in the output of the ps command and similar tools). The Bonito RPM package requires python3-prctl. You can either ignore the dependency (rpm -ivh ./bonito-open-*.el7.noarch.rpm --nodeps) or build a python3-prctl RPM package from the python-prctl sources (git clone https://github.com/seveas/python-prctl.git ; cd python-prctl/ ; sed -i 's|name = "python-prctl"|name = "python3-prctl"|' setup.py ; ./setup.py bdist_rpm).
- You may install openpyxl; Bonito will use it to export into the Office Open XML format (xlsx). The Bonito RPM package requires python3-openpyxl. You can either ignore the dependency (rpm -ivh ./bonito-open-*.el7.noarch.rpm --nodeps) or build a python3-openpyxl RPM package from the openpyxl sources (hg clone https://foss.heptapod.net/openpyxl/openpyxl/ ; cd openpyxl ; sed -i "s|name='openpyxl'|name='python3-openpyxl'|" setup.py ; sed -i "1c#\!/usr/bin/python3" setup.py ; ./setup.py bdist_rpm).
Downloads
Latest stable release
You should always download the latest versions of all components.
 | manatee-open | bonito-open | gdex | crystal-open | sample corpus |
---|---|---|---|---|---|
source | manatee-open-2.225.8.tar.gz | bonito-open-5.71.15.tar.gz | gdex-4.13.2.tar.gz | crystal-open-2.166.4.tar.gz or compiled | susanne-example-source.tar.bz2 |
RPM (CentOS 7) | 2.225.8 | 5.71.15 | 4.13.2 | 2.166.4 | 2.225.8 |
Release notes
IMPORTANT: When updating from a version of Manatee older than 2.207.2, existing corpora need to be recompiled or updated in place by executing
    corpus4fsa CORPUS_CONFIG_FILE
for every corpus. If you skip this step, the update starts automatically as a background job the first time new indices are needed, making users wait (up to several minutes for big corpora) until it finishes.
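The per-corpus update can be scripted. The following is a minimal sketch, assuming that all corpus configuration files sit directly in a single directory; the registry_dir argument and both function names are illustrative and not part of Manatee. It relies only on the corpus4fsa CORPUS_CONFIG_FILE invocation given above.

```python
import subprocess
from pathlib import Path


def build_commands(registry_dir):
    """Return one corpus4fsa command (as an argv list) per file in registry_dir."""
    # Assumption: every regular file directly inside registry_dir is a corpus
    # configuration file; subdirectories are skipped.
    configs = sorted(p for p in Path(registry_dir).iterdir() if p.is_file())
    return [["corpus4fsa", str(cfg)] for cfg in configs]


def run_all(registry_dir, dry_run=True):
    """Print each command; execute it as well unless dry_run is set."""
    for cmd in build_commands(registry_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first corpus that fails to update.
            subprocess.run(cmd, check=True)
```

Calling run_all with the default dry_run=True only lists the commands, which makes it easy to review them before running the real update with dry_run=False.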
Older releases
Older releases can be downloaded from the archive.
Build and installation
manatee
    tar xzvf manatee-open-<version>.tar.gz
    cd manatee-open-<version>
    ./configure --with-pcre2
    make
    sudo make install
bonito
    tar xzvf bonito-open-<version>.tar.gz
    cd bonito-open-<version>
    ./configure
    make
    sudo make install
    # CGIPATH is your web server's CGI directory; DATAPATH is a data directory writable by the web server
    sudo ./setupbonito <CGIPATH> <DATAPATH>
gdex
    tar xzvf gdex-<version>.tar.gz
    cd gdex-<version>
    VERSION=<version>
    sed -i "s/<version>/$VERSION/g" setup.py
    ./setup.py build
    sudo ./setup.py install
crystal
    tar xzvf crystal-open-<version>.tar.gz
    cd crystal-open-<version>
    make
    sudo make install VERSION=<version>
Installation from RPM packages
NoSketch Engine packages
    rpm -ivh crystal-open-*.el7.noarch.rpm bonito-open-*.el7.noarch.rpm manatee-open-*.el7.x86_64.rpm manatee-open-python3-*.el7.x86_64.rpm
sample corpora
    rpm -ivh manatee-open-susanne-*.el7.noarch.rpm
Configuration
Apache (httpd) configuration without authentication
    Alias /crystal /var/www/crystal
    Alias /bonito /var/www/bonito
    <Directory /var/www/bonito>
        AllowOverride All
        Options +ExecCGI -Indexes
        AddHandler cgi-script .cgi
    </Directory>
Apache (httpd) configuration with authentication
    Alias /crystal-auth /var/www/crystal
    Alias /bonito-auth /var/www/bonito
    <Directory /var/www/bonito>
        AllowOverride All
        Options +ExecCGI -Indexes
        AddHandler cgi-script .cgi
    </Directory>
    <Location "/bonito-auth">
        <LimitExcept OPTIONS>
            AuthType Basic
            AuthName "Secure Content"
            AuthUserFile /var/lib/bonito/htpasswd
            Require valid-user
        </LimitExcept>
    </Location>
Apache (httpd) configuration with authentication and self-registration
    Alias /crystal-registration /var/www/crystal
    Alias /bonito-registration /var/www/bonito
    <Directory /var/www/bonito>
        AllowOverride All
        Options +ExecCGI -Indexes
        AddHandler cgi-script .cgi
    </Directory>
    <Location "/bonito-registration">
        <LimitExcept OPTIONS>
            AuthType Basic
            AuthName "Secure Content"
            AuthUserFile /var/lib/bonito/htpasswd
            <RequireAny>
                <RequireAll>
                    Require expr %{REQUEST_URI} =~ m#^/bonito-registration/registration.cgi/register_user_new.*#
                </RequireAll>
                Require valid-user
            </RequireAny>
        </LimitExcept>
    </Location>
Bonito (run.cgi) configuration
The Bonito configuration file is run.cgi. You can run multiple instances simply by copying this file and changing the configuration in each copy.
To enable authentication, set _anonymous = False.
If you run Crystal on a different hostname than Bonito, you may need to set up CORS headers appropriately; see the top of the run.cgi file.
Bonito provides a simple registration feature, which is configured in registration.cgi.
The registration module works in three different modes:
- Disabled: no registration is allowed (default)
- Self-registration: a user can register and enter the system
  - add the URL_REGISTER_NEW_USER endpoint to Crystal (config.js)
  - set self._enable_registration = True inside registration.cgi
- Registration with approval: after a user registers, an access request e-mail is sent to the admins, who have to allow or deny the user access
  - to turn this feature on, set self._enable_mail = True inside registration.cgi
  - set the variables self._smtp_server and self._from_mail in registration.cgi appropriately
  - you may change the subject and content of the e-mails in registration.cgi as well
Files related to registration:
- /var/lib/ske/htpasswd: standard .htpasswd file (see its documentation)
- /var/lib/ske/registration/admins: list of all admin users that can allow or deny new registrations; one <login> per line
- /var/lib/ske/registration/users: list of all users (including admins, approved and denied users); tab-separated values in the form <login>\t<full name>\t<e-mail>\t<address>\t<phone number>\t<password hash>
- /var/lib/ske/registration/invalid_users: list of denied users; one <login> per line
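The file formats above are simple enough to inspect with a short script. The sketch below assumes that each line of the users file holds exactly six tab-separated fields with no escaping, and that the login files contain one login per line; the names RegisteredUser, parse_users, parse_logins and not_denied are hypothetical helpers, not part of Bonito.

```python
from typing import NamedTuple


class RegisteredUser(NamedTuple):
    login: str
    full_name: str
    email: str
    address: str
    phone: str
    password_hash: str


def parse_users(lines):
    """Parse lines of the users file into RegisteredUser records."""
    users = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"line {number}: expected 6 fields, got {len(fields)}")
        users.append(RegisteredUser(*fields))
    return users


def parse_logins(lines):
    """Parse a one-login-per-line file (admins or invalid_users) into a set."""
    return {line.strip() for line in lines if line.strip()}


def not_denied(users, denied_logins):
    """Return the users whose login is not listed among the denied logins."""
    return [user for user in users if user.login not in denied_logins]
```

For example, passing open("/var/lib/ske/registration/users", encoding="utf-8") to parse_users and the invalid_users file to parse_logins yields the inputs for not_denied; the encoding is an assumption as well.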
Crystal (config.js) configuration
    // URL of the run.cgi script of Bonito
    URL_BONITO: "https://no.sketchengine.eu/bonito/run.cgi/",
    // URL of the endpoint for registering new users (e.g. bonito/registration.cgi). Leave empty to disable registration.
    URL_REGISTER_NEW_USER: "https://no.sketchengine.eu/bonito-registration/registration.cgi/register_user_new",
Credits
Finlib, Manatee and Bonito have been crafted by Pavel Rychlý, starting with his PhD thesis. Sketch Engine is a product of Lexical Computing. When using NoSketch Engine for research purposes, please cite the following two publications:
RYCHLÝ, Pavel. Manatee/Bonito-A Modular Corpus Manager. In: RASLAN. 2007. p. 65-70.
@inproceedings{rychly2007manatee,
  title={Manatee/Bonito-A Modular Corpus Manager.},
  author={Rychl{\'y}, Pavel},
  booktitle={RASLAN},
  pages={65--70},
  year={2007}
}
KILGARRIFF, Adam, et al. The Sketch Engine: Ten Years on. Lexicography, 2014, 1.1: 7-36.
@article{kilgarriff2014sketch,
  title={The Sketch Engine: ten years on},
  author={Kilgarriff, Adam and Baisa, V{\'\i}t and Bu{\v{s}}ta, Jan and Jakub{\'\i}{\v{c}}ek, Milo{\v{s}} and Kov{\'a}{\v{r}}, Vojt{\v{e}}ch and Michelfeit, Jan and Rychl{\'y}, Pavel and Suchomel, V{\'\i}t},
  journal={Lexicography},
  volume={1},
  number={1},
  pages={7--36},
  year={2014},
  publisher={Springer}
}
When using the GDEX module, please cite:
KOSEM, Iztok, et al. Identification and automatic extraction of good dictionary examples: the case (s) of GDEX. International Journal of Lexicography, 2019, 32.2: 119-137.
@article{kosem2019identification,
  title={Identification and automatic extraction of good dictionary examples: the case (s) of GDEX},
  author={Kosem, Iztok and Koppel, Kristina and Zingano Kuhn, Tanara and Michelfeit, Jan and Tiberius, Carole},
  journal={International Journal of Lexicography},
  volume={32},
  number={2},
  pages={119--137},
  year={2019},
  publisher={Oxford University Press}
}
For a list of related publications, please refer to the Sketch Engine publications page.
Testing installation
NoSketch Engine plain installation:
NoSketch Engine with enabled authentication (test/t):
NoSketch Engine with enabled authentication and registration:
(No)Sketch Engine installations over the world