omorfi 0.9.9
Open morphology of Finnish
Public Member Functions
def __init__ (self, surf=None)
def __getitem__ (self, key)
def __str__ (self)
def is_oov (self)
def printable_vislcg (self)
def printable_conllu (self, hacks=None, which="1best")
def printable_ftb3 (self, which="1best")
def get_nbest (self, int n)
def get_best (self)
def get_best_segments (self)
Static Public Member Functions
def fromstr (str s)
def fromdict (dict token)
def fromsurf (str surf)
def fromconllu (str conllu)
def fromvislcg (str s)
Data Fields
analyses
all gathered analyses so far, as Analysis objects
segmentations
all constructed morph segmentations
labelsegmentations
all constructed labelled morph segmentations
lemmatisations
all constructed lemmatisations
surf
original surface form
pos
word index in context, e.g. UD column 1
nontoken
non-token marker
comment
comment (esp. with non-tokens)
error
used when tokenisation or parsing breaks
spacebefore
whether the token is separated by a space on the left
spaceafter
whether the token is separated by a space on the right
gold
gold reference; can be stored in the token for some applications
Token holds a slice of text with its analyses and features. A token is typically a word-form, such as "pojat" or "juopottelevat", but it can also be a white-space sequence or a placeholder for out-of-text metadata, a tag, a comment or an I/O error. For a reference, see [spaCy tokens](https://spacy.io/api/token); it is not exactly the same thing and I don't agree with everything there, but it's quite cool and well documented.
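A minimal usage sketch, assuming omorfi is installed and that Token can be imported from omorfi.token as the qualified names on this page suggest:

```python
from omorfi.token import Token

# build a token directly from a surface string
token = Token.fromsurf("pojat")
print(token.surf)   # original surface form: "pojat"
print(token)        # printable form, readable back with Token.fromstr()
```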
def omorfi.token.Token.__init__ (self, surf=None)
Create a token, optionally with a surface string.
def omorfi.token.Token.__getitem__ (self, key)
Tokens can still be accessed like dicts for backwards compatibility. Some keys, like surf and pos, map directly to the fields of the same name, while some legacy keys, such as omor for an analysis, are mapped to a single arbitrary analysis string if any analyses exist.
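For illustration, a hedged sketch of the dict-style access; surf is a documented field and omor is the legacy analysis key mentioned above:

```python
from omorfi.token import Token

token = Token.fromsurf("juopottelevat")
print(token["surf"])   # same value as token.surf
# token["omor"] would map to one analysis string, if any analyses exist
```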
def omorfi.token.Token.fromconllu (str conllu)
static
Create token from a CoNLL-U line.
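An illustrative sketch; the CoNLL-U line below is made-up sample data with the ten standard tab-separated columns, not material from omorfi itself:

```python
from omorfi.token import Token

conllu_line = "1\tpojat\tpoika\tNOUN\tN\tCase=Nom|Number=Plur\t0\troot\t_\t_"
token = Token.fromconllu(conllu_line)
print(token.surf)   # expected to hold the FORM column, i.e. "pojat"
```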
def omorfi.token.Token.fromdict (dict token)
static
Create token from a pre-2019 tokendict.
def omorfi.token.Token.fromstr (str s)
static
Create token from a string. Strings should be made with print(token).
def omorfi.token.Token.fromsurf (str surf)
static
Create token from a surface string.
def omorfi.token.Token.fromvislcg (str s)
static
Create a token from a VISL CG-3 text block. The content should contain at most one surface form with a set of analyses.
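A small round-trip sketch tying the string constructors together; it assumes only what the descriptions above state, namely that fromstr() reads the format that print(token) writes:

```python
from omorfi.token import Token

original = Token.fromsurf("juopottelevat")
restored = Token.fromstr(str(original))   # print(original) would emit this same string
print(restored.surf)
```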
def omorfi.token.Token.get_best (self)
Get the most likely analysis.
Returns: the most probable analysis of the given type, or None if no analyses have been made for that type.
def omorfi.token.Token.get_best_segments (self)
Get the most likely segmentation.
Returns: a list of strings, each one a morph or other sub-word segment.
def omorfi.token.Token.get_nbest (self, int n)
Get the n most likely analyses.
Args: n: number of analyses; use 0 to get all.
Returns: at most n analyses of the given type, or an empty list if there are none.
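A hedged sketch of the lookup methods; it assumes the token's analyses have already been populated by an omorfi analyser, a step this page does not document:

```python
from omorfi.token import Token

token = Token.fromsurf("pojat")
# ... analyses would be filled in here by an omorfi analyser (not shown) ...
best = token.get_best()               # most probable analysis, or None
top3 = token.get_nbest(3)             # at most three analyses, possibly an empty list
segments = token.get_best_segments()  # most likely morph segmentation
if best is not None:
    print(best)
```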
def omorfi.token.Token.is_oov (self)
Checks whether all hypotheses are OOV guesses.
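A sketch only, again assuming the token already carries analysis hypotheses:

```python
from omorfi.token import Token

token = Token.fromsurf("sdkfjsdkfj")
# assumption: an analyser has been run over the token before this check
if token.is_oov():
    print("every hypothesis is an OOV guess for", token.surf)
```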
def omorfi.token.Token.printable_conllu (self, hacks=None, which="1best")
Create CoNLL-U output based on the token and the selected analysis.
def omorfi.token.Token.printable_ftb3 (self, which="1best")
Create FTB-3 output based on the token and the selected analysis.
def omorfi.token.Token.printable_vislcg (self)
Create VISL CG-3 output based on the token and its analyses.
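A sketch of the printable_* helpers; like the lookup methods above, it assumes analyses are already in place, and the keyword values shown are just the documented defaults:

```python
from omorfi.token import Token

token = Token.fromsurf("pojat")
# ... analyses populated by an omorfi analyser (not shown) ...
print(token.printable_vislcg())                # VISL CG-3 cohort for this token
print(token.printable_conllu(which="1best"))   # one CoNLL-U line for the best analysis
print(token.printable_ftb3(which="1best"))     # FTB-3 style output line
```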
omorfi.token.Token.comment
comment (esp. with non-tokens)
omorfi.token.Token.pos
word index in context, e.g. UD column 1