omorfi 0.9.9
Open morphology of Finnish
Functions
def next_plaintext(f)
def next_conllu(f)
def next_finer(f)
def next_vislcg(f, isgold=True)
def next_omorfi(f)
File format I/O handling.
def omorfi.formats.fileformats.next_conllu(f)
Tokenises a CoNLL-U sentence or comment. Should be called on a file-like iterable whose next block is a CoNLL-U sentence, a comment, or an empty block.
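To illustrate what reading "a CoNLL-U sentence or comment or empty block" from a file-like iterable involves, here is a minimal sketch in plain Python. It is not omorfi's implementation: the function `read_conllu_block` and the dict-per-token representation are hypothetical, whereas `next_conllu` returns omorfi's own tokens. It assumes the standard CoNLL-U layout of ten tab-separated columns, `#` comment lines, and an empty line closing each sentence.

```python
import io
from typing import Dict, Iterator, List, Union

# Names of the ten CoNLL-U columns, in file order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu_block(f: Iterator[str]) -> List[Union[str, Dict[str, str]]]:
    """Hypothetical sketch: read one CoNLL-U block from an iterator of lines.

    Comment lines are kept as raw strings, token lines become dicts keyed by
    column name, and the block ends at the first empty line or at EOF.
    """
    block: List[Union[str, Dict[str, str]]] = []
    for line in f:
        line = line.rstrip("\n")
        if not line:
            # An empty line terminates the sentence block.
            break
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") are kept verbatim.
            block.append(line)
        else:
            # Token line: map the tab-separated columns to field names.
            block.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return block


# Usage: two sentences in one stream, read block by block.
data = io.StringIO(
    "# text = Hei!\n"
    "1\tHei\thei\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\t!\t!\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)
sentence = read_conllu_block(data)
```

Because the loop consumes lines from the same iterator, repeated calls on one file object walk through it block by block; pass a file object or `iter(lines)`, not a plain list, or every call would restart from the first line.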
def omorfi.formats.fileformats.next_finer(f)
Tokenises a finer sentence. Should be called on a file-like iterable whose next block is a finer sentence.
def omorfi.formats.fileformats.next_omorfi(f)
Reads the next block in omorfi's internal stream format.
def omorfi.formats.fileformats.next_plaintext(f)
Tokenises a line of text. The tokenisation uses only split() and treats only the tokens '?', '!', and '.' as sentence ends.
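The plain-text rule above (split on whitespace, end a sentence only at the tokens '?', '!' and '.') can be sketched as follows. `split_plaintext` is a hypothetical helper, not part of omorfi, and it assumes that a sentence end means a token that is exactly one of those three strings.

```python
from typing import List

# Tokens that close a sentence in this sketch (taken from the description above).
SENTENCE_ENDS = {"?", "!", "."}


def split_plaintext(line: str) -> List[List[str]]:
    """Hypothetical sketch: whitespace-tokenise one line and group the tokens
    into sentences, closing a sentence after each token that is exactly
    '?', '!' or '.'."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in line.split():
        current.append(token)
        if token in SENTENCE_ENDS:
            sentences.append(current)
            current = []
    if current:
        # Trailing tokens without a closing punctuation token still form a sentence.
        sentences.append(current)
    return sentences
```

For example, `split_plaintext("Hei maailma ! Mitä kuuluu ?")` gives two sentences. Since only `split()` is used, punctuation glued to a word stays attached: in "Hyvää päivää." the token `päivää.` is not recognised as a sentence end.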
def omorfi.formats.fileformats.next_vislcg(f, isgold=True)
Tokenises a sentence from VISL CG-3 format data. Reads tokens until it hits the first non-token block and returns them as a list, including a token representing that non-token block. If the block contains analyses as well as surface forms, the analyses are processed too.

Args:
    isgold: if True, the VISL CG-3 analyses are read into the token's gold analysis data; otherwise they are appended to the token's analyses list.

Returns:
    list of tokens found in f from its current read position up to and including the next non-token block (which can be EOF).
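The following sketch shows the reading loop described for `next_vislcg`, assuming the usual VISL CG-3 stream layout: a cohort line `"<surface form>"` followed by tab- or space-indented reading lines such as `"lemma" TAGS`. `read_vislcg_block` is hypothetical and uses plain dicts in place of omorfi's tokens; the `gold` and `analyses` lists and the `{"nontoken": ...}` marker (with `None` standing for EOF) are illustrative stand-ins, not omorfi's actual token fields.

```python
import io
from typing import Dict, Iterator, List, Optional


def read_vislcg_block(f: Iterator[str], isgold: bool = True) -> List[Dict]:
    """Hypothetical sketch: read cohorts from a VISL CG-3 stream until the
    first non-token line (or EOF), which is returned as the last element."""
    tokens: List[Dict] = []
    current: Optional[Dict] = None
    for line in f:
        line = line.rstrip("\n")
        if line.startswith('"<') and line.endswith('>"'):
            # Cohort line: "<surface form>" starts a new token.
            current = {"surf": line[2:-2], "gold": [], "analyses": []}
            tokens.append(current)
        elif line.startswith(("\t", " ")) and current is not None:
            # Indented reading line belonging to the current cohort;
            # isgold decides which list receives it.
            key = "gold" if isgold else "analyses"
            current[key].append(line.strip())
        else:
            # First non-token block: keep it as a marker token and stop.
            tokens.append({"nontoken": line})
            return tokens
    # EOF reached: represent it with a marker token as well.
    tokens.append({"nontoken": None})
    return tokens


# Usage: one sentence followed by an empty line, then a trailing cohort.
data = io.StringIO('"<kissa>"\n\t"kissa" N Nom Sg\n"<.>"\n\t"." Punct\n\n"<koira>"\n')
sentence = read_vislcg_block(data)
rest = read_vislcg_block(data, isgold=False)
```

Returning the terminating block as a final marker token mirrors the documented behaviour: the caller can tell whether reading stopped at an ordinary non-token line or at EOF without consuming extra input.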