Open morphology for Finnish
These are semi-automatically generated statistics from the omorfi database. The statistics are based on the actual data in the database tables and on the versions of the analysed corpora and tools as of this date.
Generation time was 2025-01-22T04+01:00.
It was created by omorfi configure 0.9.11, which was generated by GNU Autoconf 2.71. Invocation command line was:
$ ./configure --enable-big-tests 'CFLAGS=-O2 -march=native -ggdb -Wall -Wextra ' 'CXXFLAGS=-O2 -march=native -ggdb -Wall -Wextra ' PKG_CONFIG_PATH=/opt/local/lib/pkgconfig:/opt/local/share/pkgconfig:/usr/local/lib/pkgconfig:/usr/local/share/pkgconfig:/home/flammie/lib/pkgconfig:/home/flammie/share/pkgconfig:/usr/lib/pkgconfig:/usr/share/pkgconfig --no-create --no-recursion
This is a released version and can be downloaded from GitHub.
The numbers count unique lexical items in the database. Depending on your definitions there may be a ±1 % difference, e.g. with homonyms, defective and doubled paradigms, etc. There are a total of 562690 lexemes.
The universal parts-of-speech are described in the Universal Dependencies UPOS documentation and its Finnish UPOS definitions.
| Frequency | UPOS |
|---|---|
| 351083 | PROPN |
| 162910 | NOUN |
| 25184 | ADJ |
| 13327 | VERB |
| 7910 | ADV |
| 918 | NUM |
| 571 | INTJ |
| 376 | ADP |
| 85 | PRON |
| 80 | X |
| 75 | SCONJ |
| 74 | SYM |
| 54 | PUNCT |
| 17 | CCONJ |
| 13 | AUX |
| 10 | CCONJ, VERB |
| 2 | DET |
| 562690 | TOTAL |
Sources of origin are:
| Frequency | origin |
|---|---|
| 289916 | nimistö |
| 254098 | finer |
| 149181 | enwikt |
| 94444 | kotus |
| 60451 | fiwikt |
| 43275 | joukahainen |
| 42855 | dvvfi |
| 31349 | finnwordnet |
| 15121 | omorfi |
| 7703 | ftb3 |
…or, counted by the combination of origins recorded for each lexeme:
| Frequency | origin(s) |
|---|---|
| 211713 | finer, nimistö |
| 63756 | nimistö |
| 48713 | enwikt, kotus |
| 39006 | enwikt |
| 21220 | finnwordnet |
| 20963 | finer |
| 20110 | dvvfi |
| 12851 | enwikt, fiwikt, joukahainen, kotus |
| 12509 | enwikt, fiwikt, kotus |
| 11730 | fiwikt |
| 10503 | omorfi |
| 7634 | enwikt, joukahainen, kotus |
| 7250 | ftb3 |
| 6959 | joukahainen |
| 6573 | kotus |
| 6027 | unk |
| 5873 | enwikt, finnwordnet |
| 5165 | enwikt, fiwikt |
| 3933 | dvvfi, finer |
| 3601 | dvvfi, finer, nimistö |
| 3078 | dvvfi, finer, fiwikt, nimistö |
| 2825 | enwikt, omorfi |
| 2572 | enwikt, joukahainen |
| 2402 | fiwikt, kotus |
| 2118 | fiwikt, joukahainen, kotus |
| 1817 | enwikt, finnwordnet, fiwikt |
| 1781 | dvvfi, enwikt, finer, nimistö |
| 1684 | dvvfi, fiwikt |
| 1511 | joukahainen, kotus |
| 1325 | dvvfi, enwikt, finer, fiwikt, nimistö |
| 1186 | dvvfi, enwikt, finer, joukahainen, nimistö |
| 924 | enwikt, fiwikt, joukahainen |
| 895 | finnwordnet, fiwikt |
| 791 | finer, joukahainen, nimistö |
| 789 | dvvfi, omorfi |
| 769 | dvvfi, nimistö |
| 749 | dvvfi, finer, joukahainen |
| 716 | enwikt, finnwordnet, joukahainen |
| 591 | dvvfi, enwikt, finer, joukahainen |
| 589 | dvvfi, finer, joukahainen, nimistö |
| 544 | finer, joukahainen |
| 543 | finnwordnet, joukahainen |
| 524 | enwikt, fiwikt, omorfi |
| 479 | dvvfi, enwikt, finer, fiwikt, joukahainen, nimistö |
| 425 | enwikt, finer |
| 418 | fiwikt, joukahainen |
| 326 | dvvfi, finer, fiwikt |
| 326 | dvvfi, enwikt, finer |
| 310 | dvvfi, joukahainen |
| 236 | ftb3, joukahainen |
| 224 | enwikt, finnwordnet, fiwikt, joukahainen |
| 207 | enwikt, finer, fiwikt |
| 195 | dvvfi, enwikt, finer, fiwikt, joukahainen |
| 194 | finer, fiwikt |
| 160 | dvvfi, enwikt, finer, fiwikt |
| 158 | dvvfi, enwikt |
| 153 | dvvfi, enwikt, fiwikt |
| 139 | fiwikt, omorfi |
| 134 | enwikt, finer, nimistö |
| 127 | enwikt, finer, joukahainen |
| 123 | dvvfi, fiwikt, nimistö |
| 122 | finer, fiwikt, nimistö |
| 119 | dvvfi, finer, fiwikt, joukahainen, nimistö |
| 99 | finer, fiwikt, joukahainen |
| 98 | enwikt, ftb3 |
| 94 | enwikt, ftb3, joukahainen |
| 76 | joukahainen, omorfi |
| 65 | omorfi++ |
| 62 | enwikt, finer, joukahainen, nimistö |
| 60 | joukahainen, nimistö |
| 60 | finnwordnet, fiwikt, joukahainen |
| 60 | dvvfi, fiwikt, joukahainen |
| 60 | dvvfi, finer, fiwikt, joukahainen |
| 50 | dvvfi, enwikt, fiwikt, joukahainen |
| 48 | nimistö, omorfi |
| 43 | enwikt, finer, fiwikt, joukahainen |
| 39 | dvvfi, joukahainen, nimistö |
| 30 | dvvfi, enwikt, joukahainen |
| 28 | enwikt, finer, kotus |
| 27 | dvvfi, enwikt, fiwikt, nimistö |
| 26 | finer, kotus |
| 24 | enwikt, finer, fiwikt, kotus |
| 21 | finer, fiwikt, joukahainen, nimistö |
| 19 | finer, fiwikt, kotus |
| 19 | enwikt, fiwikt, joukahainen, omorfi |
| 18 | enwikt, joukahainen, omorfi |
| 15 | enwikt, finer, fiwikt, joukahainen, nimistö |
| 13 | enwikt, fiwikt, ftb3, joukahainen |
| 13 | dvvfi, nimistö, omorfi |
| 12 | dvvfi, enwikt, joukahainen, nimistö |
| 8 | enwikt, finer, fiwikt, nimistö |
| 6 | fiwikt, nimistö |
| 6 | fiwikt, joukahainen, omorfi |
| 6 | finer, omorfi++ |
| 6 | enwikt, nimistö |
| 5 | finer, joukahainen, omorfi++ |
| 5 | enwikt, fiwikt, ftb3 |
| 5 | dvvfi, enwikt, nimistö |
| 5 | dvvfi, enwikt, fiwikt, joukahainen, nimistö |
| 4 | joukahainen, omorfi++ |
| 4 | fiwikt, ftb3 |
| 4 | finer, omorfi |
| 4 | finer, joukahainen, kotus |
| 4 | enwikt, finer, fiwikt, joukahainen, kotus |
| 4 | dvvfi, joukahainen, omorfi |
| 4 | dvvfi, fiwikt, joukahainen, nimistö |
| 3 | fiwikt, ftb3, joukahainen |
| 3 | enwikt, finer, joukahainen, kotus |
| 3 | dvvfi, fiwikt, omorfi |
| 3 | dvvfi, enwikt, omorfi |
| 2 | omori |
| 2 | fiwikt, omorfi++ |
| 2 | fiwikt, joukahainen, omorfi++ |
| 2 | dvvfi, fiwikt, joukahainen, omorfi |
| 1 | omorfo |
| 1 | omorf |
| 1 | omofi |
| 1 | nimistö, unihu |
| 1 | kotus, omorfi++ |
| 1 | kotus, omorfi |
| 1 | kenwikt, otus |
| 1 | joukahainen, kotus, omorfi++ |
| 1 | finnwordnet, omorfi |
| 1 | finer, joukahainen, nimistö, omorfi++ |
| 1 | finer, joukahainen, nimistö, omorfi |
| 1 | finer, fiwikt, joukahainen, nimistö, omorfi++ |
| 1 | finer, fiwikt, joukahainen, kotus |
| 1 | enwikt, omorfi++ |
| 1 | enwikt, nimistö, omorfi |
| 1 | enwikt, joukahainen, kotus, omorfi |
| 1 | enwikt, fiwikt, joukahainen, omorfi++ |
| 1 | enwikt, finer, joukahainen, nimistö, omorfi++ |
| 1 | enwikt, finer, fiwikt, joukahainen, nimistö, omorfi++ |
| 1 | dvvfi, finer, joukahainen, nimistö, omorfi |
| 1 | dvvfi, finer, fiwikt, joukahainen, nimistö, omorfi++ |
| 1 | dvvfi, enwikt, fiwikt, omorfi |
| 1 | dvvfi, enwikt, finer, joukahainen, nimistö, omorfi++ |
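The two origin tables are related: a per-origin figure appears to be the sum of every combination row that mentions that origin. The following minimal sketch checks this for `ftb3`, using only the rows copied from the combination table above. The function name `per_origin_totals` is hypothetical and is not part of omorfi.

```python
from collections import Counter

# Rows copied from the per-combination table: only those that mention ftb3.
combination_rows = [
    (7250, "ftb3"),
    (236, "ftb3, joukahainen"),
    (98, "enwikt, ftb3"),
    (94, "enwikt, ftb3, joukahainen"),
    (13, "enwikt, fiwikt, ftb3, joukahainen"),
    (5, "enwikt, fiwikt, ftb3"),
    (4, "fiwikt, ftb3"),
    (3, "fiwikt, ftb3, joukahainen"),
]


def per_origin_totals(rows):
    """Credit each combination's lexeme count to every origin it lists."""
    totals = Counter()
    for count, origins in rows:
        for origin in origins.split(", "):
            totals[origin] += count
    return totals


totals = per_origin_totals(combination_rows)
print(totals["ftb3"])  # 7703, the ftb3 figure in the per-origin table
```

Because only the `ftb3` rows are included here, the totals this sketch produces for the other origins are partial and should not be compared with the per-origin table.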
Paradigms are the classes into which lexemes must be separated for inflection and for some lexical features, such as UPOS. See the generated Paradigms documentation for automatically gathered details about each paradigm.
| Paradigms | UPOS |
|---|---|
| 555 | PROPN |
| 541 | NOUN |
| 231 | VERB |
| 141 | ADJ |
| 52 | PRON |
| 26 | NUM |
| 14 | SYM |
| 14 | ADV |
| 12 | ADP |
| 11 | AUX |
| 5 | X |
| 4 | PUNCT |
| 2 | INTJ |
| 2 | DET |
| 1 | SCONJ |
| 1 | CCONJ |
Naïve coverage is the number of tokens (or types) that receive one or more non-heuristic readings divided by the total number of tokens (or types), i.e. the proportion of words that are part of the lexical database.
For a list of common tokens not covered by the lexicon, see the most frequent missing tokens per corpus.
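As an illustration of the definition above, here is a minimal sketch of how token and type coverage could be computed. It is not the script that produced the figures on this page: `naive_coverage` and the toy lexicon are hypothetical, and the `in_lexicon` predicate stands in for "receives at least one non-heuristic reading".

```python
from collections import Counter


def naive_coverage(tokens, in_lexicon):
    """Return (token_coverage, type_coverage) as fractions between 0 and 1."""
    counts = Counter(tokens)
    total_tokens = sum(counts.values())
    total_types = len(counts)
    if total_tokens == 0:
        raise ValueError("cannot compute coverage of an empty corpus")
    # A type is covered if the predicate accepts it; every occurrence
    # of a covered type counts as a covered token.
    covered_types = [word for word in counts if in_lexicon(word)]
    covered_tokens = sum(counts[word] for word in covered_types)
    return covered_tokens / total_tokens, len(covered_types) / total_types


# Hypothetical toy data: a three-word lexicon and a six-token corpus.
lexicon = {"talo", "on", "punainen"}
tokens = ["talo", "on", "punainen", "talo", "on", "xyzzy"]
token_cov, type_cov = naive_coverage(tokens, lambda word: word in lexicon)
print(f"tokens: {token_cov:.4%}, types: {type_cov:.4%}")
```

In this toy corpus five of the six tokens and three of the four types are covered. Token coverage comes out higher than type coverage because the covered words are also the frequent ones, which matches the pattern in the tables on this page.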
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 1603735177 | 99.6100 % | 1610031971 |
| Types | 100918 | 97.3200 % | 103702 |
The coverages were measured with the full lexicon; if you use the smaller lexicon, coverages are slightly worse.
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 1496082014 | 92.9300 % | 1610031971 |
| Types | 80548 | 77.6800 % | 103702 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 37347009 | 99.4000 % | 37572899 |
| Types | 702081 | 89.0200 % | 788709 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 141945 | 98.8500 % | 143599 |
| Types | 41810 | 96.6700 % | 43252 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 502481 | 96.4100 % | 521209 |
| Types | 61649 | 90.8600 % | 67851 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 178417 | 98.2700 % | 181571 |
| Types | 50622 | 95.2100 % | 53174 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 89266434 | 94.6600 % | 94309549 |
| Types | 2980440 | 60.6100 % | 4918107 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 160047 | 98.6100 % | 162312 |
| Types | 44815 | 95.9000 % | 46734 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 74521604 | 97.5900 % | 76369439 |
| Types | 1260704 | 76.4300 % | 1649644 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 44801860 | 95.7700 % | 46783240 |
| Types | 928149 | 64.1300 % | 1447515 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 282879324 | 98.6400 % | 286805178 |
| Types | 2590431 | 69.4600 % | 3729496 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 921276 | 99.4800 % | 926170 |
| Types | 81876 | 95.5500 % | 85697 |
| Feature | Coverage # | Coverage % | All |
|---|---|---|---|
| Tokens | 1974781 | 98.9300 % | 1996324 |
| Types | 1768188 | 98.8600 % | 1788650 |
These are the most common tokens still left unrecognised by the lexicon. Most of them should be foreign-language words, codes and rubbish. These lists are used from time to time to improve the lexical coverage.
| Frequency | Word-form |
|---|---|
| 263 | Lehnen |
| 258 | T?et?enian |
| 240 | Prestigen |
| 229 | Milosevi |
| 210 | Miloseviin |
| 209 | (KOM(2001) |
| 208 | Salafranca |
| 204 | Oomen-Ruijtenin |
| 203 | T?ekin |
| 202 | Lamfalussyn |
| 196 | Böschin |
| 193 | Sterckxin |
| 193 | Dalai |
| 188 | Lamy |
| 186 | Junilistan |
| 185 | YKP:n |
| 184 | UCLAFin |
| 179 | Randzio-Plathin |
| 176 | Mundus |
| 176 | Haugin |
| 173 | Vihreät/Euroopan |
| 171 | Lamassouren |
| 169 | Miguélez |
| 168 | Helms-Burtonin |
| 168 | Florenzin |
| 165 | Ribeiro |
| 165 | Coelhon |
| 164 | vuoden� |
| 164 | PO |
| 162 | Öcalanin |
| 162 | (KOM(2000) |
| 162 | EFD-ryhmän |
| 161 | Velzenin |
| 159 | Oostlanderin |
| 157 | Lannoyen |
| 157 | Garriga |
| 155 | Pirkerin |
| 154 | Savaryn |
| 153 | (KOM(1999) |
| 152 | Favan |
| 151 | (KOM(2002) |
| 151 | Goldstonen |
| 150 | Titleyn |
| 150 | NUTS |
| 149 | Lullingin |
| 149 | Buitenwegin |
| 146 | Kambodžan |
| 145 | Graça |
| 142 | Ludfordin |
| 141 | Oomen-Ruijten |
| 139 | Petersbergin |
| 139 | Baringdorfin |
| 138 | Dührkop |
| 137 | Fontainen |
| 136 | Ellesin |
| 135 | YMJ: |
| 133 | Sinn |
| 133 | Sellafieldin |
| 132 | Colom |
| 132 | Berèsin |
| 130 | Morillonin |
| 129 | Solbes |
| 128 | Roth-Behrendt |
| 127 | Isler |
| 127 | Fabra |
| 125 | Roth-Behrendtin |
| 125 | Gaddafin |
| 125 | Act |
| 124 | Howittin |
| 122 | Eurlingsin |
| 121 | Rocardin |
| 121 | Marset |
| 121 | Ferberin |
| 121 | Aznarin |
| 119 | Reimer |
| 119 | Lamyn |
| 119 | Gebhardtin |
| 118 | Tindemansin |
| 118 | Schreyer |
| 118 | hoc |
| 118 | Cappaton |
| 118 | Almunia |
| 117 | Bloklandin |
| 117 | Bertens |
| 116 | Whiteheadin |
| 116 | Hernández |
| 115 | Nassauerin |
| 115 | Galeote |
| 114 | Lanckerin |
| 114 | Broek |
| 112 | Kyi |
| 111 | Plooij-van |
| 111 | Pervenche |
| 111 | Oostlander |
| 111 | Alleanza |
| 110 | Titley |
| 110 | Monnet’n |
| 110 | Lehne |
| 108 | Randzio-Plath |
| 108 | Mont |
| Frequency | Word-form |
|---|---|
| 15 | joll |
| 7 | sillee |
| 6 | ol |
| 5 | tääl |
| 5 | pitäs |
| 4 | semmone |
| 4 | rupee |
| 4 | rauhotu |
| 4 | puol |
| 4 | johonki |
| 4 | esimerkiks |
| 4 | Emmä |
| 3 | yhtää |
| 3 | vähäm |
| 3 | viikkoo |
| 3 | upeeta |
| 3 | tiä |
| 3 | tietsä |
| 3 | siihe |
| 3 | semmost |
| 3 | sellaist |
| 3 | sd |
| 3 | pitäskö |
| 3 | Oottekste |
| 3 | onk |
| 3 | ollenkaa |
| 3 | näis |
| 3 | näi |
| 3 | nytte |
| 3 | ninku |
| 3 | mis |
| 3 | keng- |
| 3 | jotaki |
| 3 | jonnekkii |
| 3 | ens |
| 3 | Akiro |
| 3 | 31.8. |
| 3 | 30_000 |
| 3 | 3_000 |
| 3 | 10_000 |
| 2 | Yoeune |
| 2 | yksie |
| 2 | yhtäkkii |
| 2 | yhes |
| 2 | yheksän |
| 2 | vähäks |
| 2 | vuuen |
| 2 | vuoks |
| 2 | täälä |
| 2 | täst |
| 2 | tämmöst |
| 2 | tällasen |
| 2 | tuol |
| 2 | Tuleek |
| 2 | Troia |
| 2 | tommost |
| 2 | tiäks |
| 2 | tilloo |
| 2 | teil |
| 2 | TBK |
| 2 | Soiliki |
| 2 | siälä |
| 2 | siält |
| 2 | sielt |
| 2 | siell |
| 2 | seuraavaks |
| 2 | semssi |
| 2 | semmoseks |
| 2 | seitsemä |
| 2 | satayheksänkymmentä |
| 2 | saadas |
| 2 | Ride |
| 2 | pääs |
| 2 | puhutaal |
| 2 | Pretty |
| 2 | pittää |
| 2 | piikkainporaa |
| 2 | Ootsä |
| 2 | ook |
| 2 | Onk |
| 2 | ollum |
| 2 | näkönen |
| 2 | Näil |
| 2 | nimittäi |
| 2 | niim |
| 2 | Nakayama |
| 2 | muute |
| 2 | mum |
| 2 | mk/kg |
| 2 | minuuttii |
| 2 | Mikhailov |
| 2 | mihi |
| 2 | meritalaiset |
| 2 | mentäs |
| 2 | mennssä |
| 2 | Mencius |
| 2 | meill |
| 2 | Meil |
| 2 | meil |
| 2 | lähössä |
| Frequency | Word-form |
|---|---|
| 323 | |
| 323 | <BODY> |
| 240 | |
| 126 | Lizard |
| 106 | Snowdenin |
| 103 | Snowden |
| 82 | Glass |
| 77 | Oculus |
| 76 | Cnet |
| 76 | App |
| 66 | Angry |
| 62 | Glassin |
| 60 | Eich |
| 60 | Birds |
| 54 | Blackberryn |
| 52 | Squad |
| 51 | Verge |
| 46 | Blackberry |
| 45 | Silk |
| 45 | S5 |
| 44 | Squadin |
| 43 | SpaceX:n |
| 40 | Yamamoto |
| 40 | ei-IFRS |
| 40 | Dotcom |
| 39 | SpaceX |
| 39 | S6 |
| 38 | Zoo |
| 38 | Yotaphone |
| 38 | Cnetin |
| 36 | Digitoday |
| 35 | Play |
| 34 | Neowin |
| 34 | Model |
| 34 | Digitodaylle |
| 33 | Interview |
| 32 | Softpedia |
| 32 | Gmail |
| 31 | Trapattoni |
| 30 | Xperia |
| 30 | Snapdragon |
| 30 | MacBook |
| 30 | FierceWireless |
| 30 | Dotcomin |
| 28 | Xiaomin |
| 28 | Xiaomi |
| 28 | Cortanan |
| 27 | Treholt |
| 26 | Raspberry |
| 26 | loka–joulukuussa |
| 26 | Flappy |
| 26 | Chromecast |
| 26 | 10_000 |
| 25 | Payn |
| 25 | Cortana |
| 24 | TechCrunch |
| 24 | Syrian |
| 24 | Steiber |
| 24 | Rift |
| 24 | Marriottin |
| 24 | Anthemin |
| 24 | 9to5Mac |
| 24 | 50_000 |
| 23 | Vergen |
| 23 | Player |
| 22 | update |
| 22 | Server |
| 22 | Gear |
| 21 | S5:n |
| 21 | Nadella |
| 21 | Gmailin |
| 21 | 100_000 |
| 20 | Yota |
| 20 | Wear |
| 20 | PCMag |
| 20 | Nest |
| 20 | Engadget |
| 20 | Army |
| 20 | 5S |
| 19 | Zbořil |
| 19 | Z5 |
| 19 | Register |
| 19 | Lollipop |
| 18 | TechCrunchin |
| 18 | Synchronossin |
| 18 | Riftin |
| 18 | RadioShackin |
| 18 | N1 |
| 18 | Lollipopin |
| 18 | iOS:n |
| 18 | Hacker |
| 18 | Google+:n |
| 18 | Digitodayn |
| 18 | Cablen |
| 18 | 30_000 |
| 18 | 200_000 |
| 17 | Update |
| 17 | mAh |
| 17 | Hickersberger |
| 17 | Devices |
| Frequency | Word-form |
|---|---|
| 18 | 5(n) |
| 7 | türki |
| 7 | Trifolium |
| 7 | Ratcliffe |
| 7 | Binderup |
| 7 | 3.Rf3 |
| 6 | Tšerepanov |
| 6 | Tienshinhan |
| 6 | Nikomedes |
| 6 | N63 |
| 6 | Lupinus |
| 6 | Lolium |
| 6 | Laodiken |
| 6 | Judge |
| 6 | e5 |
| 6 | E21 |
| 6 | 2.f4 |
| 5 | Wars |
| 5 | Tracon |
| 5 | tajuu |
| 5 | Rodrigues |
| 5 | Qazibe |
| 5 | Origenes |
| 5 | Moolenaar |
| 5 | Molvania |
| 5 | Medicago |
| 5 | Laodike |
| 5 | Know |
| 5 | Iglesiaksen |
| 5 | Finnjet |
| 5 | Filen |
| 5 | death |
| 5 | Charger |
| 5 | Brassica |
| 5 | : |
| 4 | Åsbrink |
| 4 | TTW |
| 4 | Trunkenpolz |
| 4 | Thriller |
| 4 | Thrill |
| 4 | sativa |
| 4 | Routila |
| 4 | Risperidon |
| 4 | pratensis |
| 4 | Luminance |
| 4 | Libuše |
| 4 | LHC:n |
| 4 | Large |
| 4 | kakskytvaille |
| 4 | Immortal |
| 4 | IHN |
| 4 | Hoskins |
| 4 | Hodgkinson |
| 4 | Head |
| 4 | Hadron |
| 4 | Grisay |
| 4 | EY: |
| 4 | exf4 |
| 4 | EKP/1998/15 |
| 4 | DeMille |
| 4 | Costazza |
| 4 | Collection |
| 4 | Chávezin |
| 4 | Bithynian |
| 4 | Aktan-Collan |
| 4 | 1.e4 |
| 3 | Zawinul |
| 3 | Vakhtang |
| 3 | ugh |
| 3 | Trotzigin |
| 3 | Tremonti |
| 3 | sith-lordi |
| 3 | Sidious |
| 3 | sd |
| 3 | Rösslerin |
| 3 | Rumex |
| 3 | Realsoft |
| 3 | Radioheadin |
| 3 | Pong |
| 3 | Plotinoksen |
| 3 | pic |
| 3 | Philokrates |
| 3 | Origeneen |
| 3 | Obornen |
| 3 | Muammar |
| 3 | Mocumbi |
| 3 | Libušen |
| 3 | Kuypers |
| 3 | Kieseritzkyn |
| 3 | Khosroes |
| 3 | –katse |
| 3 | häne |
| 3 | Hypatian |
| 3 | Hypatialle |
| 3 | Hypatiaa |
| 3 | Gugi |
| 3 | guaramidihaara |
| 3 | Globate |
| 3 | Gaddafin |
| 3 | Gaddafi |
| Frequency | Word-form |
|---|---|
| 147644 | Lä |
| 6960 | xxxx |
| 4736 | 20px |
| 4091 | • |
| 3103 | end |
| 3089 | function |
| 3043 | Serie |
| 2528 | sign |
| 1792 | jä |
| 1778 | Dark |
| 1741 | This |
| 1662 | Series |
| 1637 | death |
| 1564 | vuosis |
| 1556 | vuosie |
| 1511 | Museum |
| 1499 | JR |
| 1442 | We |
| 1391 | this |
| 1387 | 1855−1856 |
| 1321 | Kuva:Finland |
| 1293 | Championship |
| 1279 | Death |
| 1245 | vapen.svg |
| 1240 | Jä |
| 1223 | Dead |
| 1220 | –Höyhens |
| 1217 | Last |
| 1208 | –Pxos |
| 1159 | Copa |
| 1079 | not |
| 1060 | Val |
| 1044 | Fameen |
| 1013 | Dimotikí |
| 1010 | Slam |
| 970 | 1.jpg |
| 966 | –Zache |
| 946 | Transformers: |
| 935 | Legend |
| 919 | Segunda |
| 909 | 2.jpg |
| 881 | Light |
| 858 | –Abc10 |
| 852 | Château |
| 850 | Way |
| 840 | maalaiskunta( |
| 833 | Boys |
| 824 | Blood |
| 822 | Award |
| 809 | Ενότητα |
| 809 | Enótita |
| 805 | Jääkä |
| 796 | Two |
| 768 | Book |
| 767 | Statistiska |
| 761 | 01.jpg |
| 755 | What |
| 755 | was |
| 743 | Not |
| 727 | Heart |
| 718 | (Δημοτική |
| 704 | Gear |
| 693 | École |
| 686 | Greatest |
| 682 | we |
| 681 | Fire |
| 680 | Girl |
| 668 | Special |
| 666 | Sweet |
| 664 | syn. |
| 660 | arms.svg |
| 659 | Dinamo |
| 653 | (eng. |
| 651 | maalit2 |
| 650 | maalit1 |
| 649 | joukkue2 |
| 649 | joukkue1 |
| 648 | coats |
| 646 | Evil |
| 645 | Kuva:No |
| 636 | Näsijärven–Ruoveden |
| 629 | “Luokka:Poistoäänestykset |
| 625 | Ball |
| 621 | Public |
| 611 | Scapa |
| 611 | Rally |
| 604 | Κοινότητα |
| 604 | Koinótita |
| 599 | Mighty |
| 599 | God |
| 597 | NCAP |
| 595 | which |
| 594 | End |
| 593 | Wars |
| 593 | Station |
| 591 | UKBot-botti. |
| 590 | Earth |
| 588 | Never |
| 587 | Please |
| 580 | States |
| Frequency | Word-form |
|---|---|
| 17 | joll |
| 9 | sillee |
| 8 | Tarja_Halonen |
| 8 | pitäs |
| 8 | ol |
| 6 | Helsingin_Sanomat |
| 5 | tääl |
| 5 | rupee |
| 5 | jotenki |
| 4 | viikkoo |
| 4 | vaanmutta |
| 4 | upeeta |
| 4 | siihe |
| 4 | semmone |
| 4 | rauhotu |
| 4 | puol |
| 4 | näis |
| 4 | näin_ollen |
| 4 | nytte |
| 4 | jotaki |
| 4 | johonki |
| 4 | Euroopan_unionin |
| 4 | esimerkiks |
| 4 | 4_600 |
| 3 | yhtää |
| 3 | vähäm |
| 3 | vähäks |
| 3 | tiä |
| 3 | tietsä |
| 3 | TBK |
| 3 | Tampereen_yliopistossa |
| 3 | sielt |
| 3 | semmost |
| 3 | sellaist |
| 3 | sd |
| 3 | Roman_Polanskin |
| 3 | pääs |
| 3 | Punaisen_Ristin |
| 3 | pitäskö |
| 3 | Paavo_Lipponen |
| 3 | Oottekste |
| 3 | onk |
| 3 | ollenkaa |
| 3 | näi |
| 3 | nipin_napin |
| 3 | ninku |
| 3 | New_Yorkissa |
| 3 | muute |
| 3 | mis |
| 3 | Martti_Ahtisaari |
| 3 | keng |
| 3 | Katotaas |
| 3 | jonnekkii |
| 3 | hirveesti |
| 3 | Helsingin_yliopiston |
| 3 | Helsingin_Sanomien |
| 3 | Esko_Aho |
| 3 | ens |
| 3 | Akiro |
| 3 | 300_000 |
| 3 | 30_000 |
| 3 | 3_000 |
| 3 | 10_000 |
| 2 | Yoeune |
| 2 | yksie |
| 2 | yhtäkkii |
| 2 | yhes |
| 2 | yheksän |
| 2 | vuuen |
| 2 | vuoks |
| 2 | Voi_hyvinkin |
| 2 | Voi_hyvin |
| 2 | viimeks |
| 2 | vieläkää |
| 2 | Veijo_Meri |
| 2 | varmmaam |
| 2 | Uuno_Turhapuroon |
| 2 | täälä |
| 2 | täst |
| 2 | tämmöst |
| 2 | tällasen |
| 2 | Turun_Sanomissa |
| 2 | tuol |
| 2 | Tuleekse |
| 2 | Troia |
| 2 | Tony_Blairin |
| 2 | tommost |
| 2 | tiäks |
| 2 | tilloo |
| 2 | teil |
| 2 | Soiliki |
| 2 | siälä |
| 2 | siält |
| 2 | siell |
| 2 | seuraavaks |
| 2 | semssi |
| 2 | semmoseks |
| 2 | seitsemä |
| 2 | satayheksänkymmentä |
| 2 | saadas |
| Frequency | Word-form |
|---|---|
| 6051 | amp |
| 4489 | p.m |
| 3382 | PIC |
| 2805 | � |
| 2705 | της |
| 2517 | και |
| 2246 | nr |
| 2186 | europa.eu.int |
| 2145 | pic |
| 2094 | comm |
| 2043 | un |
| 1950 | την |
| 1941 | του |
| 1751 | το |
| 1749 | dans |
| 1665 | των |
| 1639 | Fax |
| 1582 | που |
| 1547 | this |
| 1497 | Act |
| 1463 | S.A |
| 1430 | για |
| 1416 | να |
| 1399 | eG |
| 1379 | 1.1.2006– |
| 1346 | η |
| 1343 | από |
| 1305 | EG |
| 1293 | με |
| 1274 | voor |
| 1270 | LOOPU |
| 1268 | not |
| 1264 | which |
| 1250 | Nr |
| 1215 | PO |
| 1182 | Société |
| 1134 | under |
| 1130 | Classification |
| 1122 | including |
| 1104 | States |
| 1098 | Kingdom |
| 1088 | Klassificering |
| 1059 | C10 |
| 1024 | sgb |
| 1018 | NEWLINE |
| 1009 | secretariat |
| 1005 | Member |
| 946 | Verts |
| 928 | ec.europa.eu |
| 925 | NGL-ryhmän |
| 924 | its |
| 915 | Ministry |
| 903 | shall |
| 902 | Article |
| 899 | Extract |
| 896 | implementation |
| 890 | EXTRACT |
| 883 | aux |
| 882 | ANNEX |
| 878 | spp |
| 875 | Regulation |
| 864 | τις |
| 858 | Department |
| 843 | delle |
| 819 | between |
| 816 | og |
| 814 | est |
| 803 | net |
| 797 | σε |
| 792 | artikel |
| 790 | ότι |
| 782 | por |
| 777 | anonyme |
| 774 | mod. |
| 772 | nie |
| 764 | Brussel |
| 757 | such |
| 757 | -1 |
| 751 | EMOTR:n |
| 750 | B-1049 |
| 749 | BV |
| 746 | IE |
| 739 | EKT |
| 735 | public |
| 704 | measures |
| 688 | til |
| 688 | reg |
| 676 | αριθ |
| 663 | LU |
| 663 | aid |
| 659 | million |
| 658 | τα |
| 657 | je |
| 657 | Commissione |
| 656 | på |
| 653 | δεν |
| 653 | Postbus |
| 653 | č |
| 652 | GenmbH |
| 649 | οι |
| Frequency | Word-form |
|---|---|
| 16379 | EUR/100 |
| 6675 | +++++ |
| 4729 | *IT |
| 3920 | *FR |
| 3893 | FILE= |
| 3866 | >PIC |
| 3573 | EUR/t |
| 3110 | %amp% |
| 2698 | της |
| 2466 | // |
| 2463 | και |
| 2422 | 0,— |
| 2335 | *DE |
| 2229 | (2006/C |
| 2040 | την |
| 1989 | %:a |
| 1979 | [pic] |
| 1972 | KOM(2005) |
| 1952 | un |
| 1931 | του |
| 1925 | *HU |
| 1907 | *ES |
| 1793 | dans |
| 1755 | το |
| 1668 | των |
| 1611 | *CZ |
| 1595 | this |
| 1573 | που |
| 1507 | p/st |
| 1431 | *NL |
| 1408 | για |
| 1401 | eG |
| 1400 | να |
| 1382 | *SK |
| 1377 | *PL |
| 1352 | ………. |
| 1344 | από |
| 1333 | (2005/C |
| 1327 | η |
| 1325 | S.A. |
| 1305 | KOM(2006) |
| 1303 | LOOPU> |
| 1300 | voor |
| 1291 | Act |
| 1290 | *UK |
| 1284 | με |
| 1269 | not |
| 1263 | which |
| 1259 | lopull. |
| 1257 | /* |
| 1254 | */ |
| 1208 | nr. |
| 1189 | Nr. |
| 1150 | Société |
| 1149 | under |
| 1149 | KOM(2004) |
| 1095 | Classification, |
| 1091 | Kingdom |
| 1090 | lausunnon(2), |
| 1075 | %mdash% |
| 1068 | nr |
| 1067 | (EG) |
| 1045 | Member |
| 1040 | %lt% |
| 1038 | C10 |
| 994 | including |
| 988 | its |
| 986 | PO |
| 948 | ehdotuksen(1), |
| 943 | *AT |
| 932 | aux |
| 923 | http://europa.eu.int/comm/secretariat_general/sgb/state_aids/ |
| 908 | States |
| 908 | delle |
| 902 | Article |
| 898 | og |
| 898 | aine/ihoa |
| 897 | Extract |
| 890 | EXTRACT |
| 883 | τις |
| 844 | shall |
| 842 | ANNEX |
| 839 | between |
| 828 | Demarty |
| 821 | implementation |
| 804 | Fax: |
| 798 | K(2005) |
| 795 | Fax |
| 793 | σε |
| 786 | B-1049 |
| 779 | por |
| 779 | Department |
| 776 | artikel |
| 774 | Regulation |
| 766 | est |
| 765 | nie |
| 746 | EMOTR:n |
| 742 | such |
| 736 | anonyme |
| 730 | (EUR/100 |
| Frequency | Word-form |
|---|---|
| 3949 | oIi |
| 2793 | OIen |
| 2789 | Pinmontagne |
| 2588 | S.org |
| 2381 | Original |
| 2367 | …ja |
| 2059 | ÄIä |
| 1801 | oIIa |
| 1682 | :…. |
| 1496 | lsä |
| 1460 | Text |
| 1382 | lhmiset |
| 1372 | oIen |
| 1351 | llman |
| 1312 | oIisi |
| 1297 | Mitä…? |
| 1281 | oIet |
| 1276 | oIIut |
| 1267 | FBl: |
| 1253 | horge |
| 1226 | ltse |
| 1184 | juzkaaz, |
| 1157 | SubFinland.org…: |
| 1133 | OIet |
| 1119 | my |
| 1108 | Herr |
| 1106 | tääIIä |
| 980 | vieIä |
| 968 | Stevie |
| 957 | lstu |
| 902 | Führer |
| 892 | ltä. |
| 889 | Matti_, |
| 875 | Gossip |
| 865 | wraithien |
| 853 | -…ja |
| 846 | MinuIIa |
| 827 | SubLand.info |
| 821 | sinä…? |
| 802 | BTI |
| 783 | Siinäs |
| 770 | SDI |
| 739 | Broadcast |
| 734 | minuIIe |
| 728 | lsä, |
| 725 | tääIIä. |
| 717 | FBl:n |
| 711 | HaIuan |
| 709 | Foreman |
| 704 | ¡Ó |
| 699 | sinuIIe |
| 693 | 0len |
| 690 | haIua |
| 690 | FBI:sta. |
| 687 | Juuseri, |
| 686 | Heil |
| 678 | lta. |
| 658 | camel, |
| 651 | lsäni |
| 649 | lsäsi |
| 629 | DickJohnson, |
| 627 | OIemme |
| 623 | J.R. |
| 615 | BarFly83, |
| 602 | ..ja |
| 591 | I’m |
| 590 | Subtitles |
| 590 | Señor |
| 581 | wraithit |
| 577 | Fat |
| 573 | amigo. |
| 570 | Mayday! |
| 569 | Führerin |
| 566 | S01 |
| 566 | neohifk, |
| 565 | Mama |
| 563 | lkävä |
| 563 | BarFly83 |
| 555 | A_atoli, |
| 553 | oIette |
| 553 | 0nko |
| 551 | TuIe |
| 551 | Li’l |
| 549 | L.A. |
| 544 | Lhmiset |
| 540 | …mutta |
| 540 | 2O |
| 539 | sinuIIa |
| 533 | Darryl |
| 523 | Val |
| 522 | ltä |
| 520 | Iiian |
| 520 | [FINNISH] |
| 513 | Pope |
| 512 | Morty. |
| 511 | Ghost |
| 509 | Shelly |
| 504 | Maddie |
| 504 | Coulson |
| 498 | Girl |
| Frequency | Word-form |
|---|---|
| 44 | Muiriel |
| 42 | Yanni |
| 33 | Ziri |
| 25 | Ooksä |
| 20 | tähdistäennustamiseen. |
| 16 | Ootsä |
| 14 | Tykkääk |
| 13 | Tom’in |
| 13 | Mennad |
| 11 | pitäsi |
| 10 | ve’en |
| 10 | tarviin |
| 10 | Haluuk |
| 9 | Yumi |
| 8 | tähdistäennustamisesta? |
| 8 | Ook |
| 8 | Muirielin |
| 8 | Mis |
| 8 | Lipschitz-jatkuva! |
| 8 | Fréchet-avaruudessa |
| 7 | Tomil |
| 7 | Skura |
| 7 | Mayuko |
| 7 | Majk! |
| 7 | koskaa |
| 6 | π |
| 6 | vetrehet |
| 6 | törkeen |
| 6 | tähdistäennustamista. |
| 6 | tosissas? |
| 6 | Tom’ista. |
| 6 | Tom’ista |
| 6 | Tom’ia. |
| 6 | Skuran |
| 6 | Sagittarius |
| 6 | saatantahan, |
| 6 | kakskyt |
| 6 | Atuqtuaq |
| 5 | tähdistäennustamista |
| 5 | nään |
| 5 | kolmenkympin. |
| 5 | ”Emmä |
| 4 | Ælfred. |
| 4 | Zamenhof |
| 4 | ymmärteneen |
| 4 | väärinymmärteneen |
| 4 | tähdistäennustamisesta. |
| 4 | tähdistäennustamiseen? |
| 4 | tähdistäennustaminen |
| 4 | Tomr. |
| 4 | Tom’iin. |
| 4 | Titanic |
| 4 | tieän |
| 4 | telkkarii. |
| 4 | tarvin. |
| 4 | tarvin |
| 4 | Tarviik |
| 4 | Taroun |
| 4 | Taj |
| 4 | syvemmä. |
| 4 | ”Sommerfugl” |
| 4 | siitäki |
| 4 | Siirrää |
| 4 | sep’ |
| 4 | Seinfeldistä? |
| 4 | Sauronin |
| 4 | šahadaa. |
| 4 | Quebecissä. |
| 4 | parametrisoitte |
| 4 | parametrisoit |
| 4 | noloo. |
| 4 | naimaiässä. |
| 4 | Mist |
| 4 | Minogue. |
| 4 | Marguerit |
| 4 | Manjolle |
| 4 | L’Hôpitalin |
| 4 | kakskytvuotias. |
| 4 | kakskyt. |
| 4 | kabylia, |
| 4 | juuta |
| 4 | Johanolle |
| 4 | itselläniki |
| 4 | Ichiro |
| 4 | Höm! |
| 4 | hoisi |
| 4 | että… |
| 4 | Cookie |
| 4 | cones” |
| 4 | autosa. |
| 4 | asiosta, |
| 4 | 2²⁰¹³ |
| 3 | Yooko |
| 3 | ymmärtynyt |
| 3 | WC:seen. |
| 3 | Voisiksä |
| 3 | vibrofoniin |
| 3 | vaikeeta. |
| 3 | uuen |
| 3 | UFO:a? |
| Frequency | Word-form |
|---|---|
| 8 | Merisi |
| 4 | vadat |
| 4 | upeet |
| 4 | suskeptibiliteetit |
| 4 | signifikantit |
| 4 | ohet |
| 4 | kahkot |
| 4 | csardas |
| 4 | chuukilaiset |
| 4 | chuukilainen |
| 4 | amerikanbandoggi |
| 2 | äpit |
| 2 | äikät |
| 2 | åkermaniitit |
| 2 | zoofyytit |
| 2 | zeptosekunnit |
| 2 | yröt |
| 2 | yritteet |
| 2 | Ylä-Volta |
| 2 | yläpaarteet |
| 2 | yksöistunnit |
| 2 | yarkantinjänis |
| 2 | yarkantinjänikset |
| 2 | wurtziitit |
| 2 | wolfinmarakatti |
| 2 | wolfinmarakatit |
| 2 | Weddellinmeri |
| 2 | Vähä-Syrtti |
| 2 | väestötiheykset |
| 2 | vohlivat |
| 2 | vohlitte |
| 2 | vohlimista |
| 2 | vohliminen |
| 2 | vohlimaisillaan |
| 2 | vohliakseen |
| 2 | virvit |
| 2 | virtuaalilemmikkit |
| 2 | virtuaalilemmikki |
| 2 | virpit |
| 2 | virpisi |
| 2 | virpinsä |
| 2 | virpinne |
| 2 | virpini |
| 2 | virpimme |
| 2 | virpejä |
| 2 | virpeinä |
| 2 | virpeineen |
| 2 | virpeihin |
| 2 | villieringenetit |
| 2 | Vienanmeri |
| 2 | vetot |
| 2 | veloute-kastikkeet |
| 2 | veloute-kastike |
| 2 | vehjeet |
| 2 | vastenmielisimmat |
| 2 | Valtot |
| 2 | valkoisetpaimenkoirat |
| 2 | vaihdannaisuudet |
| 2 | vadoitta |
| 2 | vadoista |
| 2 | vadoissa |
| 2 | vadoin |
| 2 | vadoilta |
| 2 | vadoille |
| 2 | vadoilla |
| 2 | vadoiksi |
| 2 | vadatta |
| 2 | vadasta |
| 2 | vadassa |
| 2 | vadan |
| 2 | vadalta |
| 2 | vadalle |
| 2 | vadalla |
| 2 | vadaksi |
| 2 | vaakaamista |
| 2 | vaakaaminen |
| 2 | vaakaamaisillaan |
| 2 | Uudet-Seelannit |
| 2 | uudenseelanninlokit |
| 2 | upeetta |
| 2 | upeesta |
| 2 | upeessa |
| 2 | upeesi |
| 2 | upeeseen |
| 2 | upeensa |
| 2 | upeenne |
| 2 | upeeni |
| 2 | upeena |
| 2 | upeen |
| 2 | upeemme |
| 2 | upeelta |
| 2 | upeelle |
| 2 | upeella |
| 2 | upeeksi |
| 2 | Uotsit |
| 2 | Uotsi |
| 2 | Uotit |
| 2 | ulut |
| 2 | ubiikit |
| 2 | töyhtöt |
The underlying language models are mostly represented by finite-state automata (FSAs). The figures may give some indication of the speed and size of the models in practical applications.
| Feature | Measure |
|---|---|
| On-disk size | 9,1M |
| states | 112277 |
| arcs | 391855 |
| final states | 18057 |
| input/output epsilons | 0 |
| input epsilons | 791 |
| output epsilons | 9339 |
| Feature | Measure |
|---|---|
| On-disk size | 4,7M |
| states | 103416 |
| arcs | 227122 |
| final states | 10233 |
| input/output epsilons | 0 |
| input epsilons | 8952 |
| output epsilons | 7074 |
| Feature | Measure |
|---|---|
| On-disk size | 5,7M |
| states | 110329 |
| arcs | 239306 |
| final states | 42 |
| input/output epsilons | 0 |
| input epsilons | 34362 |
| output epsilons | 26408 |
| Feature | Measure |
|---|---|
| On-disk size | 29M |
| states | 536274 |
| arcs | 1237102 |
| final states | 121 |
| input/output epsilons | 0 |
| input epsilons | 158366 |
| output epsilons | 63232 |
| Feature | Measure |
|---|---|
| On-disk size | 29M |
| states | 550451 |
| arcs | 1225472 |
| final states | 125 |
| input/output epsilons | 21589 |
| input epsilons | 77549 |
| output epsilons | 174631 |
| Feature | Measure |
|---|---|
| On-disk size | 5,0M |
| states | 119491 |
| arcs | 237437 |
| final states | 47 |
| input/output epsilons | 0 |
| input epsilons | 31707 |
| output epsilons | 33440 |
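One rough figure that can be derived from these tables is the average number of arcs per state. The sketch below computes it for the six automata in document order. Treating this ratio as a useful indicator of model density is an assumption made here for illustration, and the helper name `arcs_per_state` is hypothetical.

```python
# (states, arcs) copied from the six tables above, in document order.
automata = [
    (112277, 391855),
    (103416, 227122),
    (110329, 239306),
    (536274, 1237102),
    (550451, 1225472),
    (119491, 237437),
]


def arcs_per_state(states: int, arcs: int) -> float:
    """Average number of outgoing arcs per state."""
    return arcs / states


ratios = [arcs_per_state(states, arcs) for states, arcs in automata]
for index, ratio in enumerate(ratios, start=1):
    print(f"automaton {index}: {ratio:.2f} arcs per state")
```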