omorfi 0.9.9
Open morphology of Finnish
omorfi.token.Token Class Reference

Public Member Functions

def __init__ (self, surf=None)
def __getitem__ (self, key)
def __str__ (self)
def is_oov (self)
def printable_vislcg (self)
def printable_conllu (self, hacks=None, which="1best")
def printable_ftb3 (self, which="1best")
def get_nbest (self, int n)
def get_best (self)
def get_best_segments (self)

Static Public Member Functions

def fromstr (str s)
def fromdict (dict token)
def fromsurf (str surf)
def fromconllu (str conllu)
def fromvislcg (str s)

Data Fields

 all gathered analyses so far as Analysis objects
 all constructed morph segmentations
 all constructed morph labelsegmentations
 all constructed lemmatisations
 original surface form
 word index in context, e.g. UD column 1
 comment (esp. with non-token)
 use when tokenisation or parsing breaks
 If token is separated by space from left.
 If token is separated by space from right.
 A gold reference can be stored in the token for a few apps.

Detailed Description

Token holds a slice of text with its analyses and features.

Token is typically a word-form, such as "pojat" or "juopottelevat", but
can also be a white-space sequence or a placeholder for out-of-text
metadata: a tag, a comment or an I/O error.

For reference, see spaCy tokens; it's
not exactly the same thing and I don't agree with everything there, but it's
quite cool and well-documented.

Constructor & Destructor Documentation

◆ __init__()

def omorfi.token.Token.__init__ (self, surf = None)
Create a token, optionally with a surface string.

Member Function Documentation

◆ __getitem__()

def omorfi.token.Token.__getitem__ (self, key)
Tokens can still be accessed like dicts for compatibility.

Some keys, like surf and pos, are obvious and direct, while some old keys,
like omor for analysis, are mapped to one random analysis string if there
are any.
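
The compatibility mapping can be pictured with a small stand-in class. This is a minimal sketch, not omorfi's implementation: SketchToken, its analyses list of plain strings, the placeholder analysis values and the KeyError for unknown keys are all assumptions made for illustration. Only the key names surf, pos and omor come from the description above.

```python
import random


class SketchToken:
    """Hypothetical stand-in for a token; not the omorfi class."""

    def __init__(self, surf=None):
        self.surf = surf      # original surface form
        self.pos = 0          # word index in context
        self.analyses = []    # assumed here: plain analysis strings

    def __getitem__(self, key):
        # Direct keys map straight to attributes.
        if key in ("surf", "pos"):
            return getattr(self, key)
        # Legacy key: hand back one randomly chosen analysis string, if any.
        if key == "omor":
            if self.analyses:
                return random.choice(self.analyses)
            return None
        raise KeyError(key)


token = SketchToken("pojat")
token.pos = 1
token.analyses = ["ANALYSIS_A", "ANALYSIS_B"]  # placeholder strings
print(token["surf"], token["pos"], token["omor"])
```

Because the legacy key picks an analysis at random in this sketch, code that needs a specific analysis should not rely on dict-style access.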

◆ fromconllu()

def omorfi.token.Token.fromconllu ( str  conllu)
Create a token from a CoNLL-U line.
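
For orientation, a CoNLL-U token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below only shows how such a line splits into named fields; parse_conllu_line is a hypothetical helper, the sample line is made up for illustration, and nothing here claims to mirror what fromconllu does internally.

```python
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def parse_conllu_line(line):
    """Split one CoNLL-U token line into a dict keyed by column name."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(
            "expected 10 tab-separated columns, got %d" % len(columns))
    return dict(zip(CONLLU_FIELDS, columns))


# Illustrative line only; not taken from a real treebank.
line = "1\tpojat\tpoika\tNOUN\tN\tCase=Nom|Number=Plur\t2\tnsubj\t_\t_"
fields = parse_conllu_line(line)
print(fields["form"], fields["lemma"], fields["upos"])
```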

◆ fromdict()

def omorfi.token.Token.fromdict ( dict  token)
Create a token from a pre-2019 tokendict.

◆ fromstr()

def omorfi.token.Token.fromstr ( str  s)
Creates token from string.

Strings should be in the format produced by print(token).

◆ fromsurf()

def omorfi.token.Token.fromsurf ( str  surf)
Create a token from a surface string.

◆ fromvislcg()

def omorfi.token.Token.fromvislcg ( str  s)
Create a token from VISL CG-3 text block.

The content should at most contain one surface form with a set of

◆ get_best()

def omorfi.token.Token.get_best (   self)
Get most likely analysis.

    Returns the most probable analysis of the given type, or None if
    analyses have not been made for the type.

◆ get_best_segments()

def omorfi.token.Token.get_best_segments (   self)
Get most likely segmentation.

    Returns a list of strings, each one being a morph or other sub-word
    segment.

◆ get_nbest()

def omorfi.token.Token.get_nbest (self, int n)
Get n most likely analyses.

    n: number of analyses, use 0 to get all

    Returns at most n analyses of the given type, or an empty list if
    there aren't any.
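
The n-best contract (at most n results, 0 meaning all, empty list when nothing is available) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: nbest is a hypothetical function, analyses are modelled as (weight, label) pairs, and a lower weight is assumed to mean a more likely analysis.

```python
def nbest(analyses, n):
    """Return up to n (weight, label) pairs, lowest weight first.

    n == 0 is treated as "return everything".
    """
    ranked = sorted(analyses, key=lambda pair: pair[0])
    if n == 0:
        return ranked
    return ranked[:n]


# Placeholder weights and labels for illustration.
candidates = [(12.5, "c"), (0.0, "a"), (3.0, "b")]
print(nbest(candidates, 2))
print(nbest(candidates, 0))
print(nbest([], 3))
```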

◆ is_oov()

def omorfi.token.Token.is_oov (   self)
Checks if all hypotheses are OOV guesses.
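
The "all hypotheses are guesses" check can be sketched in a few lines. The representation is an assumption: each hypothesis is a dict with a boolean guess flag, and all_guesses is a hypothetical name. Note that with Python's all(), an empty hypothesis list also counts as OOV in this sketch; whether the real method behaves that way is not stated here.

```python
def all_guesses(hypotheses):
    """True when every hypothesis is flagged as an OOV guess."""
    return all(h["guess"] for h in hypotheses)


known = [{"label": "a", "guess": False}, {"label": "b", "guess": True}]
unknown = [{"label": "x", "guess": True}, {"label": "y", "guess": True}]
print(all_guesses(known), all_guesses(unknown))
```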

◆ printable_conllu()

def omorfi.token.Token.printable_conllu (self, hacks = None, which = "1best")
Create CoNLL-U output based on token and selected analysis.
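
As a rough picture of the output shape, a CoNLL-U token line is ten tab-separated columns with an underscore for unspecified values. The helper below is a hypothetical sketch of that layout only; conllu_line and its parameters are assumptions, and it does not reproduce the hacks or which options of printable_conllu.

```python
def conllu_line(index, surf, lemma=None, upos=None, feats=None):
    """Format one token as a CoNLL-U line, using "_" for missing values."""
    columns = [
        str(index),      # ID
        surf,            # FORM
        lemma or "_",    # LEMMA
        upos or "_",     # UPOS
        "_",             # XPOS
        feats or "_",    # FEATS
        "_",             # HEAD
        "_",             # DEPREL
        "_",             # DEPS
        "_",             # MISC
    ]
    return "\t".join(columns)


out = conllu_line(1, "pojat", lemma="poika", upos="NOUN")
print(out)
```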

◆ printable_ftb3()

def omorfi.token.Token.printable_ftb3 (self, which = "1best")
Create FTB-3 output based on token and selected analysis.

◆ printable_vislcg()

def omorfi.token.Token.printable_vislcg (   self)
Create VISL CG-3 output based on token and its analyses.

Field Documentation

◆ comment


comment (esp. with non-token)

◆ pos


word index in context, e.g. UD column 1
