omorfi 0.9.9
Open morphology of Finnish
Functions
def next_plaintext(f)
def next_conllu(f)
def next_finer(f)
def next_vislcg(f, isgold=True)
def next_omorfi(f)
File format I/O handling.
def omorfi.formats.fileformats.next_conllu(f)
Tokenise a CoNLL-U sentence or comment. Should be used on a file-like iterable whose next block is a CoNLL-U sentence, a comment, or an empty block.
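To illustrate what reading one block from a file-like iterable can look like, here is a minimal sketch. It is not omorfi's implementation: the function name `sketch_next_conllu_block`, the dict-based rows, and the choice to group comment lines with the sentence that follows them are all assumptions made for illustration. It relies only on the CoNLL-U conventions of `#` comment lines, ten tab-separated fields per token, and blank lines between sentences.

```python
import io

# The ten tab-separated columns of a CoNLL-U token line.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def sketch_next_conllu_block(f):
    """Hypothetical helper: read one comment-plus-sentence block from f.

    Returns (comments, rows). Both lists are empty once f is exhausted.
    """
    comments, rows = [], []
    for line in f:
        line = line.rstrip("\n")
        if not line.strip():
            if comments or rows:
                break      # a blank line closes the current block
            continue       # skip blank lines before the block starts
        if line.startswith("#"):
            comments.append(line)
        else:
            rows.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return comments, rows


if __name__ == "__main__":
    stream = io.StringIO(
        "# text = Hei\n"
        "1\tHei\thei\tINTJ\t_\t_\t0\troot\t_\t_\n"
        "\n"
    )
    print(sketch_next_conllu_block(stream))
```

Because the stream keeps its read position between calls, calling the helper repeatedly walks through the file one block at a time.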
def omorfi.formats.fileformats.next_finer(f)
Tokenise a FiNER sentence. Should be used on a file-like iterable whose next block is a FiNER sentence.
def omorfi.formats.fileformats.next_omorfi(f)
Read the next block in omorfi's internal stream format.
def omorfi.formats.fileformats.next_plaintext(f)
Tokenise a line of text. The tokenisation uses only split, and treats only the tokens '?', '!' and '.' as the end of a sentence.
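The behaviour described here has a practical consequence that a short sketch makes visible. The code below is an assumption-laden illustration, not omorfi's code: `sketch_split_sentences` is a hypothetical name, and it works on plain strings rather than omorfi's token objects. It splits on whitespace with `str.split` and closes a sentence only when a whole token equals '?', '!' or '.'.

```python
# Tokens that close a sentence in this sketch.
SENTENCE_END_TOKENS = {"?", "!", "."}


def sketch_split_sentences(line):
    """Hypothetical helper: whitespace-split a line and group tokens
    into sentences, ending a sentence only at a standalone '?', '!' or '.'.
    """
    sentences = []
    current = []
    for token in line.split():
        current.append(token)
        if token in SENTENCE_END_TOKENS:
            sentences.append(current)
            current = []
    if current:
        # Trailing tokens without a closing punctuation token.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    print(sketch_split_sentences("Talo on iso . Onko se ?"))
    print(sketch_split_sentences("Talo on iso."))
```

Under this reading, punctuation that is attached to a word (as in "iso.") stays part of that token and does not end the sentence, so input is best pre-tokenised with spaces around punctuation.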
def omorfi.formats.fileformats.next_vislcg(f, isgold=True)
Tokenise a sentence from VISL-CG format data.
Returns a list of tokens when it hits the first non-token block, including
a token representing this non-token block. If the block contains analyses
as well as surface forms, they are processed too.
Args:
    isgold: if True, the VISL CG-3 analyses are read into the token's gold
        analysis data; otherwise they are appended to the token's analyses
        list.
Returns:
    list of tokens found in f at its current read position, up to and
    including the next non-token found (which can be EOF).
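The effect of `isgold` and the "stop at the first non-token" contract can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `sketch_next_vislcg` is a hypothetical name, tokens are plain dicts with made-up keys (`surf`, `gold`, `analyses`, `nontoken`), and only the basic VISL CG-3 stream shape is handled, with a cohort line of the form `"<wordform>"` followed by indented reading lines.

```python
import io


def sketch_next_vislcg(f, isgold=True):
    """Hypothetical helper: collect tokens from f up to and including
    the first non-token line (or EOF)."""
    tokens = []
    for line in f:
        line = line.rstrip("\n")
        if line.startswith('"<') and line.endswith('>"'):
            # Cohort line: a new surface form.
            tokens.append({"surf": line[2:-2], "gold": [], "analyses": []})
        elif line[:1] in (" ", "\t") and line.strip().startswith('"') and tokens:
            # Indented reading line: attach it to the latest token.
            key = "gold" if isgold else "analyses"
            tokens[-1][key].append(line.strip())
        else:
            # Anything else ends the sentence and is kept as a marker.
            tokens.append({"nontoken": line})
            return tokens
    tokens.append({"nontoken": "eof"})  # input exhausted
    return tokens


if __name__ == "__main__":
    data = '"<Talo>"\n\t"talo" N Sg Nom\n"<on>"\n\t"olla" V Prs Sg3\n\n'
    for token in sketch_next_vislcg(io.StringIO(data)):
        print(token)
```

Passing `isgold=False` routes the same readings into the `analyses` list instead, which mirrors the distinction the docstring draws between gold data and ordinary analyses.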