omorfi 0.9.9
Open morphology of Finnish
Public Member Functions
def __init__(self)
def load_analyser(self, hfstfile: str)
def load_udpipe(self, filename: str)
def load_lexical_frequencies(self, lexfile)
def load_omortag_frequencies(self, omorfile)
def analyse(self, token: Token)
def analyse_sentence(self, tokens)
def accept(self, token)

Data Fields
analyser
udpiper
udpipeline
uderror
can_udpipe: udpipe is loaded
lexlogprobs
taglogprobs

Static Public Attributes
int PENALTY = 28021984

An object for omorfi’s morphological analysis.
def omorfi.analyser.Analyser.__init__(self)
Initialise an empty analyser.
def omorfi.analyser.Analyser.accept(self, token)
Check whether the token is in the dictionary. Returns: False for out-of-vocabulary (OOV) tokens, True otherwise. Note that this is not necessarily more efficient than bool(analyse(token)).
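The relationship between accept and analyse described above can be illustrated with a stand-in class. This is a minimal sketch and not omorfi's implementation: ToyAnalyser, its lexicon dictionary and the analysis strings are hypothetical, and tokens are plain strings rather than Token objects.

```python
class ToyAnalyser:
    """Hypothetical stand-in for illustration; not the omorfi Analyser."""

    def __init__(self, lexicon):
        # Assumed shape: surface form -> list of analysis strings
        self.lexicon = lexicon

    def analyse(self, token):
        # Mirror the documented contract: None when nothing matches
        analyses = self.lexicon.get(token)
        return analyses if analyses else None

    def accept(self, token):
        # False for OOVs, True otherwise
        return bool(self.analyse(token))


toy = ToyAnalyser({"talo": ["talo NOUN Sg Nom"]})  # toy data
known = toy.accept("talo")
unknown = toy.accept("xyzzy")
```

In this sketch accept does the full lookup and only discards the result, which is one way the two calls could end up costing the same.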
def omorfi.analyser.Analyser.analyse(self, token: Token)
Perform a simple morphological analysis lookup. The analysis is also performed for re-cased variants of the token, depending on the state of the member variables. Re-cased analyses receive a higher penalty weight and carry additional analyses indicating the changes.
Side effects: the analyses are stored in the token, and only the new analyses are returned.
Args: token: token to be analysed.
Returns: an HFST structure of raw analyses, or None if there are no matches in the dictionary.
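The idea of penalised re-cased lookups can be sketched with a toy dictionary. Everything here is an assumption made for illustration: the set of case variants tried, the use of PENALTY as the added weight, the "[recased from ...]" marker and the (analysis, weight) tuples are not taken from omorfi's implementation, which works on HFST automata and Token objects.

```python
PENALTY = 28021984  # value listed under Static Public Attributes


def recased_variants(surface):
    """Return case variants that differ from surface (assumed set of variants)."""
    variants = []
    for candidate in (surface.lower(), surface.capitalize(), surface.upper()):
        if candidate != surface and candidate not in variants:
            variants.append(candidate)
    return variants


def toy_analyse(surface, lexicon):
    """Look up surface, then its re-cased variants with an added penalty."""
    results = list(lexicon.get(surface, []))
    for variant in recased_variants(surface):
        for analysis, weight in lexicon.get(variant, []):
            # Mark the change and make the re-cased reading more expensive
            marked = analysis + " [recased from " + surface + "]"
            results.append((marked, weight + PENALTY))
    return results or None


lexicon = {"talo": [("talo NOUN Sg Nom", 0.0)]}  # hypothetical toy lexicon
exact = toy_analyse("talo", lexicon)
recased = toy_analyse("Talo", lexicon)
missing = toy_analyse("xyzzy", lexicon)
```

With weights read as costs, a reading that needed re-casing sorts after any exact match, and a form with no match under any variant yields None.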
def omorfi.analyser.Analyser.analyse_sentence(self, tokens)
Analyse a full tokenised sentence. For details of the analysis, see analyse(self, token). If further models such as UDPipe are loaded, they may be used to fill in gaps.
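The gap-filling behaviour can be pictured as a per-token fallback. This is a simplified sketch under stated assumptions: analyse_sentence_sketch, the callables passed to it and the toy data are hypothetical, and a real sentence-level model such as UDPipe would see the whole sentence at once rather than one token at a time.

```python
def analyse_sentence_sketch(tokens, analyse, fallback=None):
    """Analyse each token; use fallback for tokens without analyses."""
    results = []
    for token in tokens:
        analyses = analyse(token)
        if not analyses and fallback is not None:
            # Gap: no dictionary analysis, so ask the secondary model
            analyses = fallback(token)
        results.append((token, analyses))
    return results


def guess(token):
    """Hypothetical fallback that labels the token as a guess."""
    return [token + " X (guessed)"]


lexicon = {"talo": ["talo NOUN"]}  # hypothetical toy data
with_fallback = analyse_sentence_sketch(["talo", "xyzzy"], lexicon.get, guess)
without_fallback = analyse_sentence_sketch(["talo", "xyzzy"], lexicon.get)
```

Without a fallback the unknown token keeps None as its result, which matches the documented return value of analyse for unmatched tokens.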
def omorfi.analyser.Analyser.load_analyser(self, hfstfile: str)
Load an analyser model from a file. Args: hfstfile: file containing a single HFST automaton binary.
def omorfi.analyser.Analyser.load_lexical_frequencies(self, lexfile)
Load a frequency list for lemmas. Experimental. Currently in uniq -c format, subject to change. Args: lexfile: file with frequencies.
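The uniq -c format puts a count before each distinct line, usually with leading padding. Here is a minimal sketch of reading such a list and turning counts into log-probabilities. The helper names and sample data are hypothetical, and using the plain relative frequency is an assumption: how omorfi derives the values it stores in lexlogprobs is not stated here.

```python
import io
import math


def read_uniq_c(lines):
    """Parse `uniq -c` style lines ("  count item") into {item: count}."""
    counts = {}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        count_text, _, item = stripped.partition(" ")
        counts[item] = counts.get(item, 0) + int(count_text)
    return counts


def to_logprobs(counts):
    """Convert counts to natural-log relative frequencies (assumed scheme)."""
    total = sum(counts.values())
    return {item: math.log(count / total) for item, count in counts.items()}


sample = io.StringIO("      3 talo\n      1 kissa\n")  # toy input
counts = read_uniq_c(sample)
logprobs = to_logprobs(counts)
```

A list in this shape is what a pipeline such as sort | uniq -c produces when run over a file with one lemma per line.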
def omorfi.analyser.Analyser.load_omortag_frequencies(self, omorfile)
Load a frequency list for tags. Experimental. Currently in uniq -c format, subject to change. Args: omorfile: path to file with frequencies.
def omorfi.analyser.Analyser.load_udpipe(self, filename: str)
Load a UDPipe model for statistical parsing. UDPipe can be used as an extra information source for OOV tokens or for all tokens. It works best with sentence-based analysis; token-based analysis does not keep track of context. Args: filename: path to the UDPipe model.