omorfi 0.9.9
Open morphology of Finnish
An object holding automata for all functions of omorfi.
Public Member Functions
Omorfi ()
Construct an empty omorfi holder.
void | loadAnalyser (String path) throws java.io.FileNotFoundException, java.io.IOException, net.sf.hfst.FormatException
Load an omorfi analyser from the given file. The file should contain a single HFST automaton for omorfi style analyses.
Collection< String > | analyse (String wf) throws net.sf.hfst.NoTokenizationException
Perform a simple morphological analysis lookup.
List< String > | tokenise (String line)
Perform tokenisation with loaded tokeniser if any, or split.
Static Public Member Functions
static void | main (String[] args)
Example CLI analysis app.
An object holding automata for all functions of omorfi.
Currently supported automata functions are:
The Java code can perform minimal string munging: tokenisation and recasing.
Collection< String > com.github.flammie.omorfi.Omorfi.analyse | ( | String | wf | ) | throws net.sf.hfst.NoTokenizationException |
Perform a simple morphological analysis lookup.
Unless can_titlecase evaluates to False, the analysis is also performed with the first letter uppercased and the rest lowercased. Unless can_uppercase evaluates to False, the analysis is also performed on the all-uppercase variant. Unless can_lowercase evaluates to False, the analysis is also performed on the all-lowercase variant.
Analyses obtained through case mangling carry an additional element identifying the casing that was applied.
wf | the token to analyse as a string. |
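The recasing described above can be illustrated with a minimal, self-contained sketch. This is not omorfi's own code: the class `CaseVariants`, the method `caseVariants`, and the boolean parameters `canTitlecase`, `canUppercase` and `canLowercase` are hypothetical names that merely mirror the flags mentioned in the description. The sketch only generates the candidate word forms; looking each one up in the analyser and tagging the result with its casing is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the omorfi API.
class CaseVariants {
    /** Returns the original token followed by every enabled, distinct recased form. */
    static List<String> caseVariants(String wf, boolean canTitlecase,
                                     boolean canUppercase, boolean canLowercase) {
        // LinkedHashSet keeps insertion order and drops duplicate forms.
        Set<String> variants = new LinkedHashSet<>();
        variants.add(wf);
        if (wf.isEmpty()) {
            return new ArrayList<>(variants);
        }
        if (canTitlecase) {
            // First code point uppercased, the rest lowercased.
            int firstEnd = wf.offsetByCodePoints(0, 1);
            variants.add(wf.substring(0, firstEnd).toUpperCase(Locale.ROOT)
                    + wf.substring(firstEnd).toLowerCase(Locale.ROOT));
        }
        if (canUppercase) {
            variants.add(wf.toUpperCase(Locale.ROOT));
        }
        if (canLowercase) {
            variants.add(wf.toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(variants);
    }

    public static void main(String[] args) {
        // prints [helsinki, Helsinki, HELSINKI]
        System.out.println(caseVariants("helsinki", true, true, true));
    }
}
```

Each returned form would then be passed to the analyser in turn; the all-lowercase form of an already lowercase token is dropped as a duplicate, so it is not analysed twice.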
void com.github.flammie.omorfi.Omorfi.loadAnalyser | ( | String | path | ) | throws java.io.FileNotFoundException, java.io.IOException, net.sf.hfst.FormatException |
Load an omorfi analyser from the given file. The file should contain a single HFST automaton for omorfi style analyses.
path | the path to analyser encoded as string. |
static void com.github.flammie.omorfi.Omorfi.main | ( | String[] | args | ) |
example CLI analysis app.
args | Command-line arguments. |
List< String > com.github.flammie.omorfi.Omorfi.tokenise | ( | String | line | ) |
Perform tokenisation with loaded tokeniser if any, or split.
If a tokeniser is available, it is applied to the input line; if it produces a result, that result is split into tokens according to the tokenisation strategy and returned as a list.
If no tokeniser is present, or none gives a result, the line is tokenised using Java's basic string functions.
line | A string containing a line from corpus to split into tokens. |
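The fallback path can be sketched as follows. The description does not say which string functions are used, so splitting on runs of whitespace is an assumption here, and `FallbackTokeniser` and `splitOnWhitespace` are hypothetical names rather than part of omorfi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the omorfi API.
class FallbackTokeniser {
    /** Splits a line on runs of whitespace; assumed fallback behaviour. */
    static List<String> splitOnWhitespace(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // Avoid the single empty token that split() would return here.
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        // prints [Talo, on, punainen.]
        System.out.println(splitOnWhitespace("Talo on  punainen."));
    }
}
```

Note that a plain whitespace split leaves punctuation attached to the neighbouring word (`punainen.`), which is one reason a dedicated tokeniser automaton is preferable when one is loaded.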