Open morphology for Finnish
The NLP analysers / language models are based on finite-state automata technology and require some special tools to be installed before compilation and use:

Choose one:

* the HFST command-line tools, for the full set of analysers and for compilation
* the omorfi python package, for a subset of the functionality without compilation

Required for compilation and use:

* HFST with its python bindings (e.g. the hfst, libhfst-dev and python3-hfst packages)

Optionally:

* VISL CG-3, for morphological disambiguation
* hfst-ospell, for spell-checking and correction
Choose only one of the methods of installing dependencies!
It is recommended to follow the Installing grammar libraries instructions provided by the Apertium project. In summary:
wget http://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash
sudo apt-get install hfst python3-hfst libhfst-dev cg3
Also check the Apertium wiki for updates, e.g. in case the package names have changed.
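After installing, it is worth checking that the tools ended up on your PATH. A minimal sanity check, assuming the packages provide the usual hfst-lookup and vislcg3 binaries:

# check that the HFST and VISL CG-3 command-line tools are available
command -v hfst-lookup vislcg3
# print the installed HFST version
hfst-lookup --version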
Omorfi has preliminary python packaging on PyPI; it can be used to install some of the relevant dependencies and run parts of omorfi without installing extra software. This installation lacks tools like spell-checking and correction or morphological disambiguation, which require non-python dependencies not found in the pip repositories.
pip install omorfi
This installs everything needed to run omorfi, but not the language models themselves. They need to be downloaded separately.
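As a quick sanity check that the package is importable (assuming a standard python3 on your PATH):

# verify that the omorfi package was installed into the current python
python3 -c "import omorfi; print(omorfi.__file__)"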
Omorfi has preliminary python packaging on Anaconda; it can be used to install some of the relevant dependencies and run parts of omorfi without installing extra software. This installation lacks tools like spell-checking and correction or morphological disambiguation, which require non-python dependencies not found in the Anaconda repositories.
This installs everything needed to run omorfi, but not the language models themselves. They need to be downloaded separately.
Follow the installation instructions of the HFST and VISL CG-3 projects.
Choose only one of the methods of installing omorfi language models!
The binary models are available in omorfi's GitHub releases; you can jump right to using them with the installed dependencies.
It is possible to download pre-compiled omorfi models and use them without going through the compilation process. If you have downloaded and installed all the dependencies, you can run the configuration script:
./configure
This checks that all the necessary tools are installed and usable; it also sets up some details of the convenience scripts. Then the script:
omorfi-download.bash
found in the src/bash folder will fetch the language models.
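Put together, the pre-compiled route is roughly the following, assuming you run it from the root of the omorfi source tree:

# verify dependencies and set up the convenience scripts
./configure
# fetch the pre-compiled language models
./src/bash/omorfi-download.bash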
Note that if you downloaded a python-only version, you can only use:
omorfi-download.py
instead of the bash version.
If the download worked, you can proceed to the usage examples. Be aware that some of the examples may not work depending on which dependencies you installed and which version you downloaded.
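As a first smoke test, something along the following lines should produce analyses. This is a sketch: omorfi-analyse-text.sh is one of the convenience scripts, but its name and invocation may vary between versions, so check the usage examples for details:

# analyse a small sample of Finnish text
echo "kissat juoksevat" > example.txt
omorfi-analyse-text.sh example.txt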
Installation uses the standard autotools system (see the contents of INSTALL from the GNU project if you are not familiar with it):
./configure && make && make install
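On a multi-core machine the build can usually be sped up with parallel make jobs, and a system-wide install typically needs root. A typical autotools invocation, assuming the build supports parallel jobs:

# build with as many parallel jobs as there are CPU cores
./configure && make -j"$(nproc)" && sudo make install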
Compiling may take from minutes to hours, depending on the hardware and settings used. You should have at least 4 gigabytes of RAM available; however, it is possible to compile a limited-vocabulary system with less. This is a limitation of the HFST system used to compile the language models, and it applies only at compile time; the final models use perhaps up to hundreds of megabytes of memory.
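To see how much memory is available before starting a long build (Linux-specific; free comes with procps):

# show total and available memory in gigabytes
free -g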
If you did not install HFST from a package manager, you may need to adjust the configuration:
./configure --with-hfst=${HFSTPATH}
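For example, if HFST was compiled from source and installed under /usr/local (an illustrative path; substitute your own install prefix):

# point configure at a self-compiled HFST installation
./configure --with-hfst=/usr/local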
The autotools system supports installation to e.g. your home directory:
./configure --prefix=${HOME}
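With a home-directory prefix the installed scripts land outside the default search path, so you may need something like the following in your shell profile (a common convention, assuming the default bin/ layout):

# make tools installed under $HOME/bin visible to the shell
export PATH="$HOME/bin:$PATH"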
With the git version you must create the necessary autotools files on the host system once, after the initial checkout:
./autogen.sh
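Starting from a fresh clone, the whole bootstrap looks roughly like this (the GitHub URL is assumed to be the project's current home; adjust if you use a fork or mirror):

# fetch the sources and generate the autotools files
git clone https://github.com/flammie/omorfi.git
cd omorfi
./autogen.sh
./configure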
There are a number of options that you can pass to the configure script. The default configuration leaves many features out to speed up the process; for a full listing:
./configure --help
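For instance, optional analysers and applications are switched on with the corresponding --enable or --with flags. The flag names below are hypothetical placeholders; check the --help output for the real ones:

# hypothetical flags: enable extra features left out of the default build
./configure --enable-segmenter --enable-hyphenators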
Some of the features that build more automata approximately double the time required to compile and the space used by the automata. Other options enable or disable the API bindings for Java or other languages. The configure script displays the current setup at the end:
* Analysers: yes
  * OMOR: yes (flags: --omor-props --omor-sem)
  * FTB3.1: no
  * apertium: no
  * giella: no
  * labeled segmenter: no
* Limits:
  * tiny lexicons:
  * big tests:
* Applications:
  * Voikko speller: yes
  * segmenter: no
  * lemmatiser: no
  * hyphenators: no
* Clusters:
  * run tests on PBS cluster: no
  * run tests on SLURM cluster: no
If you have run ./configure, make and make install, you can carry on to the usage examples.