omorfi 0.9.9
Open morphology of Finnish
omorfi.token.Token Class Reference

Public Member Functions

def __init__ (self, surf=None)
 
def __getitem__ (self, key)
 
def __str__ (self)
 
def is_oov (self)
 
def printable_vislcg (self)
 
def printable_conllu (self, hacks=None, which="1best")
 
def printable_ftb3 (self, which="1best")
 
def get_nbest (self, int n)
 
def get_best (self)
 
def get_best_segments (self)
 

Static Public Member Functions

def fromstr (str s)
 
def fromdict (dict token)
 
def fromsurf (str surf)
 
def fromconllu (str conllu)
 
def fromvislcg (str s)
 

Data Fields

 analyses
 all gathered analyses so far as Analysis objects
 
 segmentations
 all constructed morph segmentations
 
 labelsegmentations
 all constructed morph labelsegmentations
 
 lemmatisations
 all constructed lemmatisations
 
 surf
 original surface form
 
 pos
 word index in context, e.g. UD column 1
 
 nontoken
 nontoken
 
 comment
 comment (esp. with non-token)
 
 error
 use when tokenisation or parsing breaks
 
 spacebefore
 Whether the token is separated by a space on the left.
 
 spaceafter
 Whether the token is separated by a space on the right.
 
 gold
 A gold reference can be stored in the token for a few applications.
 

Detailed Description

Token holds a slice of text with its analyses and features.

Token is typically a word-form, such as "pojat" or "juopottelevat", but
can also be a whitespace sequence or a placeholder for out-of-text
metadata: a tag, comment or I/O error.

For reference, see [spaCy tokens](https://spacy.io/api/token). It's
not exactly the same thing and I don't agree with everything there, but
it's quite cool and well-documented.

Constructor & Destructor Documentation

◆ __init__()

def omorfi.token.Token.__init__ (   self,
  surf = None 
)
Create a token, optionally with a surface string.

Member Function Documentation

◆ __getitem__()

def omorfi.token.Token.__getitem__ (   self,
  key 
)
Tokens can still be accessed like dicts for compatibility.

Some keys, like surf and pos, are obvious and direct, while some old
keys, like omor for analysis, are mapped to one random analysis string
if there are any.
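The dict-style access described above can be sketched with a small stand-in class. This is not omorfi's implementation: `CompatToken` is a hypothetical name, analyses are plain strings here rather than Analysis objects, the analysis string is only illustrative, and returning None when no analyses exist is an assumption.

```python
class CompatToken:
    """Hypothetical stand-in for illustration; not omorfi.token.Token."""

    def __init__(self, surf=None):
        self.surf = surf
        self.pos = 0
        self.analyses = []  # plain strings here, for simplicity

    def __getitem__(self, key):
        # Direct keys map straight to attributes of the same name.
        if key in ("surf", "pos"):
            return getattr(self, key)
        # Legacy key: hand back one analysis string if any exist.
        # Returning None otherwise is an assumption of this sketch.
        if key == "omor":
            return self.analyses[0] if self.analyses else None
        raise KeyError(key)


token = CompatToken("pojat")
token.analyses.append("[WORD_ID=poika][UPOS=NOUN][NUM=PL][CASE=NOM]")
print(token["surf"])
print(token["omor"])
```

Old code that indexes tokens like dicts keeps working, while new code can use the attributes directly.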

◆ fromconllu()

def omorfi.token.Token.fromconllu ( str  conllu)
static
Create token from a CoNLL-U line.
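For orientation, a CoNLL-U token line has ten tab-separated columns. The sketch below only splits such a line into named fields. It does not show how fromconllu fills the Token fields; `split_conllu_line` is a hypothetical helper and the sample line is made up.

```python
# Column names as defined by the CoNLL-U format.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def split_conllu_line(line):
    """Split one CoNLL-U token line into a dict keyed by column name."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(
            "expected 10 tab-separated columns, got %d" % len(columns))
    return dict(zip(CONLLU_FIELDS, columns))


sample = "1\tPojat\tpoika\tNOUN\tN\tCase=Nom|Number=Plur\t2\tnsubj\t_\t_"
fields = split_conllu_line(sample)
print(fields["id"], fields["form"], fields["lemma"])
```

Column 1 (`id`) is the word index that the pos field refers to as "UD column 1".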

◆ fromdict()

def omorfi.token.Token.fromdict ( dict  token)
static
Create token from pre-2019 tokendict.

◆ fromstr()

def omorfi.token.Token.fromstr ( str  s)
static
Creates token from string.

Strings should be made with print(token).

◆ fromsurf()

def omorfi.token.Token.fromsurf ( str  surf)
static
Create token from surface string.

◆ fromvislcg()

def omorfi.token.Token.fromvislcg ( str  s)
static
Create a token from VISL CG-3 text block.

The content should contain at most one surface form with a set of
analyses.
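A VISL CG-3 text block of this kind is a cohort: one `"<surface>"` line followed by indented reading lines. The sketch below reads such a block under that assumption. `parse_cohort` is a hypothetical helper, the tags in the sample are illustrative, and none of this is fromvislcg's actual parsing code.

```python
def parse_cohort(block):
    """Return (surface form, list of reading lines) for one CG-3 cohort."""
    surf = None
    readings = []
    for line in block.splitlines():
        stripped = line.rstrip()
        if stripped.startswith('"<') and stripped.endswith('>"'):
            # Enforce the "at most one surface form" constraint.
            if surf is not None:
                raise ValueError("more than one surface form in block")
            surf = stripped[2:-2]
        elif line[:1] in ("\t", " ") and stripped:
            # Indented non-empty lines are readings of the current cohort.
            readings.append(line.strip())
    return surf, readings


block = '"<pojat>"\n\t"poika" NOUN Pl Nom\n\t"poju" NOUN Pl Nom\n'
surf, readings = parse_cohort(block)
print(surf, len(readings))
```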

◆ get_best()

def omorfi.token.Token.get_best (   self)
Get most likely analysis.

Returns:
    most probable analysis of the given type, or None if no analyses
    have been made for the type.

◆ get_best_segments()

def omorfi.token.Token.get_best_segments (   self)
Get most likely segmentation.

Returns:
    list of strings each one being a morph or other sub-word segment.

◆ get_nbest()

def omorfi.token.Token.get_nbest (   self,
int  n 
)
Get n most likely analyses.

Args:
    n: number of analyses, use 0 to get all

Returns:
    At most n analyses of given type or empty list if there aren't any.
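The n-best contract above (at most n results, 0 meaning all, an empty list when nothing is available) can be sketched on plain data. The `(analysis, weight)` tuples, the `nbest` helper and the rule that a lower weight means a more likely analysis are assumptions of this sketch, not taken from omorfi.

```python
def nbest(analyses, n):
    """Return at most n analyses, most likely first; n == 0 returns all."""
    # Assumption: weights are penalties, so the lowest weight ranks first.
    ranked = sorted(analyses, key=lambda pair: pair[1])
    return ranked if n == 0 else ranked[:n]


candidates = [("reading-b", 2.5), ("reading-a", 0.0), ("reading-c", 7.0)]
print(nbest(candidates, 2))
print(nbest(candidates, 0))
print(nbest([], 3))
```

Slicing keeps the empty case simple: an empty input yields an empty list for any n.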

◆ is_oov()

def omorfi.token.Token.is_oov (   self)
Checks if all hypotheses are OOV guesses.
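An "all hypotheses are guesses" check can be sketched as follows. The dict-based hypotheses with an `oov` flag and the `all_oov` helper are hypothetical; how omorfi marks guessed analyses, and what is_oov returns for a token with no analyses at all, is not stated here.

```python
def all_oov(hypotheses):
    """True if every hypothesis is flagged as an out-of-vocabulary guess."""
    # Note: all() over an empty list is True in Python.
    return all(h["oov"] for h in hypotheses)


known = [{"analysis": "reading-a", "oov": False},
         {"analysis": "guess-b", "oov": True}]
guessed = [{"analysis": "guess-a", "oov": True}]
print(all_oov(known), all_oov(guessed))
```

A single in-vocabulary hypothesis is enough for the check to be False.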

◆ printable_conllu()

def omorfi.token.Token.printable_conllu (   self,
  hacks = None,
  which = "1best" 
)
Create CoNLL-U output based on token and selected analysis.

◆ printable_ftb3()

def omorfi.token.Token.printable_ftb3 (   self,
  which = "1best" 
)
Create FTB-3 output based on token and selected analysis.

◆ printable_vislcg()

def omorfi.token.Token.printable_vislcg (   self)
Create VISL CG-3 output based on token and its analyses.

Field Documentation

◆ comment

omorfi.token.Token.comment

comment (esp. with non-token)

◆ pos

omorfi.token.Token.pos

word index in context, e.g. UD column 1


The documentation for this class was generated from the following file: