omorfi 0.9.9
Open morphology of Finnish
Public Member Functions

| Type | Member |
|---|---|
| def | `__init__(self, surf=None)` |
| def | `__getitem__(self, key)` |
| def | `__str__(self)` |
| def | `is_oov(self)` |
| def | `printable_vislcg(self)` |
| def | `printable_conllu(self, hacks=None, which="1best")` |
| def | `printable_ftb3(self, which="1best")` |
| def | `get_nbest(self, int n)` |
| def | `get_best(self)` |
| def | `get_best_segments(self)` |

Static Public Member Functions

| Type | Member |
|---|---|
| def | `fromstr(str s)` |
| def | `fromdict(dict token)` |
| def | `fromsurf(str surf)` |
| def | `fromconllu(str conllu)` |
| def | `fromvislcg(str s)` |
Data Fields

| Field | Description |
|---|---|
| analyses | all analyses gathered so far, as Analysis objects |
| segmentations | all constructed morph segmentations |
| labelsegmentations | all constructed morph label segmentations |
| lemmatisations | all constructed lemmatisations |
| surf | original surface form |
| pos | word index in context, e.g. UD column 1 |
| nontoken | non-token marker, for tokens that are not word-forms |
| comment | comment (especially with non-tokens) |
| error | used when tokenisation or parsing breaks |
| spacebefore | whether the token is separated by a space on the left |
| spaceafter | whether the token is separated by a space on the right |
| gold | gold reference, stored in the token by a few applications |
Token holds a slice of text with its analyses and features. A token is typically a word-form, such as "pojat" or "juopottelevat", but it can also be a whitespace sequence or a placeholder for out-of-text metadata, a tag, a comment, or an I/O error. For reference, see [spaCy tokens](https://spacy.io/api/token); it's not exactly the same thing, and I don't agree with all of it, but it's quite cool and well documented.
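To make the field list concrete, here is a minimal sketch of a token-like record. `MiniToken`, its field types, its defaults, and the `is_word` helper are all hypothetical stand-ins modelled on the Data Fields table, not omorfi's actual class; in particular, using a string label such as "comment" for `nontoken` is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniToken:
    """Hypothetical stand-in mirroring the Token data fields; not omorfi's class."""
    surf: Optional[str] = None        # original surface form
    pos: Optional[int] = None         # word index in context (UD column 1)
    analyses: List[str] = field(default_factory=list)  # assumed: plain strings
    nontoken: Optional[str] = None    # assumed: a label like "comment" for non-words
    comment: Optional[str] = None
    error: Optional[str] = None
    spacebefore: bool = True          # defaults here are assumptions
    spaceafter: bool = True
    gold: Optional[str] = None

    def is_word(self) -> bool:
        # A word-form token has a surface string and no non-token label.
        return self.surf is not None and self.nontoken is None


word = MiniToken(surf="pojat", pos=1)
note = MiniToken(nontoken="comment", comment="# sent_id = 1")
```

The same record type covers both word-forms and placeholders, which is the point the class description makes.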
`def omorfi.token.Token.__init__(self, surf=None)`
Create a token, optionally with a surface string.
`def omorfi.token.Token.__getitem__(self, key)`
Tokens can still be accessed like dicts for compatibility. Some keys, like surf and pos, are obvious and map directly to attributes, while some old keys, like omor for the analysis, are mapped to one random analysis string if there are any.
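A minimal sketch of how such dict-style compatibility can be layered on an object with `__getitem__`. The class `DictCompatToken`, the storage of analyses as plain strings, the `None` result for a token without analyses, and the `KeyError` for unknown keys are assumptions for illustration, not omorfi's implementation.

```python
import random
from typing import List, Optional


class DictCompatToken:
    """Hypothetical sketch of dict-style access; not omorfi's actual code."""

    def __init__(self, surf: Optional[str] = None):
        self.surf = surf
        self.pos: Optional[int] = None
        self.analyses: List[str] = []   # assumed: plain analysis strings

    def __getitem__(self, key: str):
        if key in ("surf", "pos"):
            # Direct keys map straight to attributes.
            return getattr(self, key)
        if key == "omor":
            # Legacy key: one random analysis string, or None if there are none.
            return random.choice(self.analyses) if self.analyses else None
        raise KeyError(key)


t = DictCompatToken("pojat")
t.analyses.append("[WORD_ID=poika][UPOS=NOUN]")  # illustrative string
```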
`def omorfi.token.Token.fromconllu(str conllu)` (static)
Create a token from a CoNLL-U line.
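A CoNLL-U word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and `SpaceAfter=No` in MISC marks a missing space. The sketch below, built around a hypothetical helper `parse_conllu_line`, shows how such a line could map onto the token fields pos, surf, and spaceafter; it is not omorfi's fromconllu and it ignores multiword and empty-node IDs.

```python
from typing import Dict, Optional

CONLLU_COLUMNS = ["id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_line(line: str) -> Optional[Dict[str, object]]:
    """Map one CoNLL-U word line to token-like fields (hypothetical helper).

    Returns None for blank lines and comment lines.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    row = dict(zip(CONLLU_COLUMNS, fields))
    misc = row["misc"].split("|") if row["misc"] != "_" else []
    return {
        # UD column 1; int() assumes no multiword (1-2) or empty (1.1) IDs.
        "pos": int(row["id"]),
        "surf": row["form"],
        "lemma": row["lemma"],
        "upos": row["upos"],
        "spaceafter": "SpaceAfter=No" not in misc,
    }
```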
`def omorfi.token.Token.fromdict(dict token)` (static)
Create a token from a pre-2019 token dict.
`def omorfi.token.Token.fromstr(str s)` (static)
Create a token from a string. The string should be one produced by print(token).
`def omorfi.token.Token.fromsurf(str surf)` (static)
Create a token from a surface string.
`def omorfi.token.Token.fromvislcg(str s)` (static)
Create a token from a VISL CG-3 text block. The block should contain at most one surface form, with its set of analyses.
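In VISL CG-3 text, a cohort starts with the quoted wordform line `"<surface>"`, followed by indented readings of the form `"lemma" TAG TAG ...`. The sketch below, with a hypothetical helper `parse_vislcg_cohort`, splits such a block into a surface form and (lemma, tags) pairs and enforces the at-most-one-surface-form rule; the example tags are illustrative and this is not omorfi's fromvislcg.

```python
from typing import List, Optional, Tuple

Reading = Tuple[str, List[str]]


def parse_vislcg_cohort(block: str) -> Tuple[Optional[str], List[Reading]]:
    """Parse one VISL CG-3 cohort block (hypothetical helper)."""
    surf: Optional[str] = None
    readings: List[Reading] = []
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith('"<') and stripped.endswith('>"'):
            # Wordform line: "<surface>"
            if surf is not None:
                raise ValueError("block contains more than one surface form")
            surf = stripped[2:-2]
        elif line[0] in "\t " and stripped.startswith('"'):
            # Reading line: quoted lemma followed by whitespace-separated tags.
            if surf is None:
                raise ValueError("reading before any surface form")
            end = stripped.index('"', 1)
            readings.append((stripped[1:end], stripped[end + 1:].split()))
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return surf, readings
```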
`def omorfi.token.Token.get_best(self)`
Get the most likely analysis.
Returns:
the most probable analysis of the given type, or None if analyses have not
been made for the type.
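A minimal sketch of picking a single best analysis, assuming each analysis carries a weight where lower means more likely (as with tropical-semiring weights, which are typically negative log probabilities). `ScoredAnalysis` and the free function `get_best` are hypothetical; this page does not say how omorfi scores analyses.

```python
from typing import List, NamedTuple, Optional


class ScoredAnalysis(NamedTuple):
    """Hypothetical analysis record: a string plus a weight (lower = more likely)."""
    analysis: str
    weight: float


def get_best(analyses: List[ScoredAnalysis]) -> Optional[ScoredAnalysis]:
    # None signals that no analyses have been made yet.
    if not analyses:
        return None
    # min() keeps the first of several equally weighted analyses.
    return min(analyses, key=lambda a: a.weight)
```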
`def omorfi.token.Token.get_best_segments(self)`
Get the most likely segmentation.
Returns:
a list of strings, each one a morph or other sub-word segment.
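As an illustration of the return shape, a segmentation stored as one string with boundary markers can be turned into a list of segments by splitting. The helper `split_segments` and its `{MB}` default marker are assumptions for this sketch; this page does not specify how omorfi marks boundaries internally.

```python
from typing import List


def split_segments(segmented: str, boundary: str = "{MB}") -> List[str]:
    """Split a segmented string into sub-word segments at a boundary marker.

    The "{MB}" default is an assumption for illustration.
    """
    # Drop empty pieces produced by leading, trailing, or doubled markers.
    return [seg for seg in segmented.split(boundary) if seg]
```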
`def omorfi.token.Token.get_nbest(self, int n)`
Get the n most likely analyses.
Args:
n: number of analyses; use 0 to get all
Returns:
at most n analyses of the given type, or an empty list if there are none.
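The n-best contract above (0 means all, never more than n, empty list when there is nothing) can be sketched as a sort followed by a slice. As in the other sketches here, `ScoredAnalysis` is hypothetical and lower weight is assumed to mean more likely.

```python
from typing import List, NamedTuple


class ScoredAnalysis(NamedTuple):
    """Hypothetical analysis record: a string plus a weight (lower = more likely)."""
    analysis: str
    weight: float


def get_nbest(analyses: List[ScoredAnalysis], n: int) -> List[ScoredAnalysis]:
    if n < 0:
        raise ValueError("n must be non-negative")
    # sorted() is stable, so equally weighted analyses keep their input order.
    ranked = sorted(analyses, key=lambda a: a.weight)
    return ranked if n == 0 else ranked[:n]
```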
`def omorfi.token.Token.is_oov(self)`
Check whether all hypotheses are out-of-vocabulary (OOV) guesses.
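A minimal sketch of the "all hypotheses are guesses" check, assuming each hypothesis carries a boolean flag saying whether it came from an OOV guesser. `Hypothesis` and its `is_guess` field are hypothetical names.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    analysis: str
    is_guess: bool   # hypothetical flag: True if produced by an OOV guesser


def is_oov(hypotheses: List[Hypothesis]) -> bool:
    # Python's all() is True for an empty list, so a token with no
    # hypotheses counts as OOV here; that is a choice of this sketch.
    return all(h.is_guess for h in hypotheses)
```

One known analysis among the guesses is enough to make the token in-vocabulary under this definition.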
`def omorfi.token.Token.printable_conllu(self, hacks=None, which="1best")`
Create CoNLL-U output based on the token and the selected analysis.
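For orientation, this is roughly what one CoNLL-U output line looks like when built from token fields: ten tab-separated columns, `_` for unspecified values, FEATS as `Name=Value` pairs joined with `|` and sorted by name, and `SpaceAfter=No` in MISC. The helper `format_conllu_line` is hypothetical and leaves HEAD, DEPREL, and DEPS unspecified; it does not model the `hacks` or `which` parameters.

```python
from typing import Dict, Optional


def format_conllu_line(pos: int, surf: str, lemma: str, upos: str,
                       feats: Optional[Dict[str, str]] = None,
                       spaceafter: bool = True) -> str:
    """Format one CoNLL-U word line (hypothetical helper)."""
    if feats:
        # CoNLL-U lists features sorted by name, case-insensitively.
        items = sorted(feats.items(), key=lambda kv: kv[0].lower())
        feats_col = "|".join(f"{k}={v}" for k, v in items)
    else:
        feats_col = "_"
    misc_col = "_" if spaceafter else "SpaceAfter=No"
    # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    columns = [str(pos), surf, lemma, upos, "_", feats_col, "_", "_", "_", misc_col]
    return "\t".join(columns)
```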
`def omorfi.token.Token.printable_ftb3(self, which="1best")`
Create FTB-3 output based on the token and the selected analysis.
`def omorfi.token.Token.printable_vislcg(self)`
Create VISL CG-3 output based on the token and its analyses.
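The VISL CG-3 cohort layout is a `"<surface>"` line followed by one tab-indented `"lemma" TAG ...` line per reading. A minimal sketch with a hypothetical helper `format_vislcg_cohort`; the tag names in the usage are illustrative, not omorfi's actual CG tag set.

```python
from typing import List, Tuple


def format_vislcg_cohort(surf: str, readings: List[Tuple[str, List[str]]]) -> str:
    """Format a surface form and its readings as a VISL CG-3 cohort (hypothetical)."""
    lines = [f'"<{surf}>"']
    for lemma, tags in readings:
        # Each reading: tab, quoted lemma, then space-separated tags.
        lines.append("\t" + " ".join([f'"{lemma}"'] + tags))
    return "\n".join(lines)


cohort = format_vislcg_cohort("pojat", [("poika", ["N", "Pl", "Nom"])])
```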
`omorfi.token.Token.comment`
Comment (especially with non-tokens).
`omorfi.token.Token.pos`
Word index in context, e.g. UD column 1.