omorfi 0.9.9
Open morphology of Finnish
omorfi.formats.fileformats Namespace Reference

Functions

def next_plaintext (f)
 
def next_conllu (f)
 
def next_finer (f)
 
def next_vislcg (f, isgold=True)
 
def next_omorfi (f)
 

Detailed Description

File format I/O handling.

Function Documentation

◆ next_conllu()

def omorfi.formats.fileformats.next_conllu(f)
Tokenise a CONLL-U sentence or comment.

Should be used on a file-like iterable whose next block is a CONLL-U
sentence, a comment, or an empty block.
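The block-reading contract can be illustrated with a minimal sketch. It is not omorfi's implementation: the function name `read_conllu_block` and its (comments, rows) return value are hypothetical. It relies only on the CONLL-U conventions that comment lines start with `#`, token lines have ten tab-separated fields, and an empty line ends a sentence.

```python
import io
from typing import Iterator, List, Optional, Tuple

# Hypothetical helper, not part of omorfi: reads one CONLL-U block
# (comment lines plus token rows) from an iterator over lines.
def read_conllu_block(lines: Iterator[str]) -> Optional[Tuple[List[str], List[List[str]]]]:
    comments: List[str] = []
    rows: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # An empty line closes the current block (possibly an empty one).
            return comments, rows
        if line.startswith("#"):
            comments.append(line)
        else:
            fields = line.split("\t")
            if len(fields) != 10:
                raise ValueError(f"expected 10 CONLL-U fields, got {len(fields)}")
            rows.append(fields)
    # EOF: return a partial block if anything was read, otherwise None.
    if comments or rows:
        return comments, rows
    return None

data = io.StringIO(
    "# text = Kissa istuu.\n"
    "1\tKissa\tkissa\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tistuu\tistua\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
comments, rows = read_conllu_block(data)
print(comments)               # ['# text = Kissa istuu.']
print([r[1] for r in rows])   # ['Kissa', 'istuu', '.']
print(read_conllu_block(data))  # None, the stream is exhausted
```

Because the helper consumes from an iterator, repeated calls on the same file object advance through the file one block at a time, which mirrors the "next block coming up" wording above.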

◆ next_finer()

def omorfi.formats.fileformats.next_finer(f)
Tokenise a FiNER sentence.

Should be used on a file-like iterable whose next block is a FiNER
sentence.

◆ next_omorfi()

def omorfi.formats.fileformats.next_omorfi(f)
Read the next block in omorfi's internal stream format.

◆ next_plaintext()

def omorfi.formats.fileformats.next_plaintext(f)
Tokenise a line of text.

This tokenisation only uses split() and treats only the tokens '?', '!',
and '.' as sentence ends.
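The split-based approach can be sketched as follows, assuming plain whitespace splitting and a sentence that closes after any token that is exactly '?', '!' or '.'. The name `split_sentences` is hypothetical, and the sketch returns lists of strings rather than omorfi token objects. In this sketch, punctuation attached to a word, as in `istuu.`, does not end a sentence, since only a stand-alone punctuation token counts.

```python
from typing import List

# Hypothetical illustration, not omorfi's code: tokens that close a sentence.
SENTENCE_ENDS = {"?", "!", "."}

def split_sentences(line: str) -> List[List[str]]:
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in line.split():  # whitespace splitting only
        current.append(token)
        if token in SENTENCE_ENDS:
            sentences.append(current)
            current = []
    if current:
        # Trailing tokens without a sentence-final mark form a last sentence.
        sentences.append(current)
    return sentences

print(split_sentences("Kissa istuu . Koira haukkuu !"))
# [['Kissa', 'istuu', '.'], ['Koira', 'haukkuu', '!']]
print(split_sentences("Kissa istuu."))
# [['Kissa', 'istuu.']]
```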

◆ next_vislcg()

def omorfi.formats.fileformats.next_vislcg(f, isgold=True)
Tokenises a sentence from VISL CG-3 format data.

Returns a list of tokens when it hits the first non-token block, including
a token representing that non-token block. If the block contains analyses
as well as surface forms, they are processed too.

Args:
    isgold: if True, the VISL CG-3 analyses are read into the token's gold
            analysis data; otherwise they are appended to the token's
            analyses list.

Returns:
    list of tokens found in f from its current read position, up to and
    including the next non-token found (which can be EOF).
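The reading loop described above can be sketched under a few assumptions. It relies on the usual VISL CG-3 stream layout, where a cohort starts with a `"<wordform>"` line and each reading is an indented `"lemma" TAG ...` line. The function `read_vislcg_sentence` and its dict-based tokens (`surf`, `gold`, `analyses`, `nontoken` keys) are hypothetical stand-ins for omorfi's token objects, not its actual implementation.

```python
import io
from typing import Dict, Iterator, List

# Hypothetical sketch, not omorfi's code: tokens are plain dicts.
def read_vislcg_sentence(lines: Iterator[str], isgold: bool = True) -> List[Dict]:
    tokens: List[Dict] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith('"<') and line.endswith('>"'):
            # Cohort line "<wordform>": start a new token.
            tokens.append({"surf": line[2:-2], "gold": [], "analyses": []})
        elif line[:1] in (" ", "\t") and line.strip().startswith('"') and tokens:
            # Indented reading line: route it according to isgold.
            target = "gold" if isgold else "analyses"
            tokens[-1][target].append(line.strip())
        else:
            # First non-token block: include it as a token and stop.
            tokens.append({"nontoken": line})
            return tokens
    # EOF acts as the terminating non-token.
    tokens.append({"nontoken": "EOF"})
    return tokens

data = io.StringIO(
    '"<Kissa>"\n'
    '\t"kissa" N Sg Nom\n'
    '"<istuu>"\n'
    '\t"istua" V Act Ind Prs Sg3\n'
    '\n'
)
tokens = read_vislcg_sentence(data, isgold=False)
print([t.get("surf") for t in tokens])  # ['Kissa', 'istuu', None]
print(tokens[0]["analyses"])            # ['"kissa" N Sg Nom']
```

With `isgold=False` the readings land in `analyses`; with the default `isgold=True` they would go to `gold` instead, matching the argument description above.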