A public git version of my research projects, i.e. articles and all that


Dr. Tommi A Pirinen, publications and other academic work

Universität Hamburg, Hamburger Zentrum für Sprachkorpora, CLARIN-D developer, etc. usw.


Yikes, now (2019) there’s a bit of a proliferation of academic profiles in services that everyone must have; I’ve listed a few here:

I have profiles on ResearchGate and Academia.edu, but they seem a bit iffy, I don’t use them, and I don’t really want to link to them either…

CV and bio

Here’s a sample CV that I usually update only when I apply for jobs, so it might be outdated. And here’s an academic bio.

Short academic history

  • Universität Hamburg, HZSK, CLARIN-D (2016–2020…)
  • Dublin City University (2014–2016): Abu-Matran
  • University of Helsinki (2007–2014): HFST project, finite-state spell-checking (PhD), open source morphology of Finnish (Master’s), TTS system simple4all
  • University of Joensuu (2003–2007): Computer Science (Bachelor’s), Finnish linguistics, etc.

Research interests

The things I’ve studied, am good at, and am interested in spending my time on:

  • Weighted finite-state automata in computational linguistics
  • Lesser-resourced and minority languages
  • Uralic languages
  • Software engineering practices in computational linguistics
  • Computer science—Computational linguistics—Linguistics interdisciplinary co-operation
  • End-user apps with computational linguistics: machine translation, writers’ tools, computer-aided language learning
  • Digital humanities—computational linguistics interactions
  • Neural models for very underresourced languages

The list is not exhaustive.


Following is a list of all my accepted publications, with links to the author’s post-print versions. The versions on this page may differ significantly from the official ones: they have been optimised for screen reading, reformatted, had hyperlinks added, and so forth.

Note that Google Scholar currently offers a good way to browse my publications and see their incoming citations.

Here is a bib file of all my publications; it may or may not be as accurate and up to date as Google Scholar.

Publications in conferences and journals

  1. Tommi Pirinen, Hanna Hedeland, Heidemarie Sambale (2019), User Support for Digital Humanities, in CLARIN Annual Conference 2019 (CAC 2019), Leipzig, Germany.
  2. Tommi A Pirinen (2019), Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking, in Universal Dependencies Workshop 2019 (UDW 2019) at Syntaxfest 2019, Paris, France. TeX version, HTML (LaTeXML) version,
  3. Tommi A Pirinen (2019), Workflows for kickstarting RBMT in virtually No-Resource Situation, in The 2nd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2019), at MTsummit 2019, Dublin. TeX version, HTML (LaTeXML) version, ACL Anthology version
  4. Tommi A Pirinen (2019), Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task in the Fourth conference on Machine Translation (wmt19) at ACL 2019, Firenze, Italy. TeX version, HTML (LaTeXML) version, ACL Anthology version
  5. Tommi A Pirinen (2019), Neural and rule-based Finnish NLP models–expectations, experiments and experiences in 5th International Workshop for Computational Linguistics of Uralic Languages. Tartu, Estonia. TeX version, HTML (LaTeXML) version, ACL Anthology version
  6. Tommi Pirinen (2018), Rule-based machine-translation between Finnish and German in 40. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, CL-Postersession TeX version, HTML (LaTeXML) version
  7. Tommi A Pirinen, Hanna Hedeland, Daniel Jettka (2017b), Developing a CLARIN compatible AAI solution for academic and restricted resources.
  8. Tommi Pirinen, Francis M. Tyers, Trond Trosterud, Ryan Johnson, Kevin Unhammer, Tiina Puolakainen (2017a, equal contribution) North-Sámi to Finnish rule-based machine translation system TeX version, HTML (LaTeXML) version
  9. Tommi A Pirinen, Eszter Simon, Francis M Tyers, Veronika Vincze (2016c), Report on the Second International Workshop on Computational Linguistics for Uralic languages, in Finno-Ugric languages and linguistics.
  10. Francis Tyers, Tommi Pirinen (2016b) Intermediate Representations in Rule-Based Machine Translation for Uralic languages in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL2016) TeX version, HTML (LaTeXML) version
  11. Tommi Pirinen, Antonio Toral, Raphael Rubino (2016a) Rule-Based and Statistical Morph Segments in English-to-Finnish SMT, in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL), Szeged, Hungary TeX version, HTML (LaTeXML) version
  12. Tommi A Pirinen (2015e) Development and Use of Computational Morphology of Finnish in the Open Source and Open Science Era: Notes on Experiences with Omorfi Development. SKY Journal of Linguistics. TeX version, HTML (LaTeXML) version
  13. Antonio Toral, Xiaofeng Wu, Tommi Pirinen, Zhengwei Qiu, Ergun Bicici and Jinhua Du (2015d) Dublin City University at the TweetMT 2015 Shared Task in Proceedings of TweetMT shared task at SEPLN 2015 TeX version, HTML (LaTeXML) version
  14. Raphael Rubino, Tommi Pirinen, Miquel Esplà-Gomis, Nikola Ljubešić, Sergio Ortiz Rojas, Vassilis Papavassiliou, Prokopis Prokopidis and Antonio Toral (2015c), Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling In proceedings of WMT shared task at EMNLP 2015 TeX version, HTML (LaTeXML) version
  15. Tommi A Pirinen (2015a), Omorfi—Free and open source morphological lexical database for Finnish, in Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015
  16. Tommi A Pirinen (2015b), Using weighted finite state morphology with VISL CG-3—Some experiments with free open source Finnish resources, in Proceedings of Constraint grammar - methods, tools and applications Workshop at NoDaLiDa TeX version, HTML (LaTeXML) version
  17. Senka Drobac, Krister Lindén, Tommi Pirinen, Miikka Silfverberg (2014e), Heuristic hyper-minimization of finite state lexicons, in LREC 2014
  18. Antonio Toral, Raphael Rubino, Miquel Esplà, Tommi Pirinen, Andy Way and Gema Ramírez-Sánchez (2014d). Extrinsic Evaluation of Web-Crawlers in Machine Translation: a Case Study on Croatian–English for the Tourism Domain in Proceedings of EAMT 2014
  19. Sjur Moshagen, Trond Trosterud, Jack Rueter, Francis Tyers and Tommi A Pirinen (2014c), Open-source infrastructures for collaborative work on under-resourced languages, in Proceedings of CCURL workshop 2014 in LREC
  20. Senka Drobac, Krister Lindén, Tommi A Pirinen and Miikka Silfverberg (2014b), Heuristic Hyperminimisation of Finite-State Lexicons, in Proceedings of LREC 2014
  21. Tommi A Pirinen, Krister Lindén (2014a) State-of-the-art in Weighted Finite-State Spell-Checking in Proceedings of CICLing 2014 TeX version, HTML (LaTeXML) version
  22. Sjur Moshagen, Tommi A Pirinen, Trond Trosterud (2013a) Building an open-source development infrastructure for language technology projects, in Proceedings of Nodalida 2013
  23. Tommi A Pirinen, Sam Hardwick (2012d) Effect of Language and Error Models on Efficiency of Finite-State Spell-Checking and Correction, in Proceedings of 10th International Workshop on Finite-State Methods and/in Natural Language Processing FSMNLP 2012 TeX version, HTML (LaTeXML) version
  24. Krister Lindén, Miikka Silfverberg, Erik Axelson, Senka Drobac, Sam Hardwick, Tommi A Pirinen (2012c) Using HFST for creating Computational Linguistic Applications in Computational Linguistics-Applications 2012
  25. Tommi A Pirinen, Francis M. Tyers (2012b) Compiling Apertium morphological dictionaries with HFST and using them in HFST applications in Proceedings of Workshops in Language Resources and Evaluation conference LREC 2012, in saltmil-aflat workshop on “language technology for normalisation of less-resourced languages” TeX version, HTML (LaTeXML) version
  26. Tommi A Pirinen, Miikka Silfverberg (2012a) Improving Finite-State Spell-Checker Suggestions with Part-of-Speech N-grams in Proceedings of International Conference on Intelligent Text Processing and Computational Linguistics CICLING 2012 TeX version, HTML (LaTeXML) version
  27. Krister Lindén, Miikka Silfverberg, Erik Axelson, Sam Hardwick, Tommi A Pirinen (2011c) HFST—Framework for Compiling and Applying Morphologies in Systems and Frameworks for Computational Morphology 2011, in Communications in Computer and Information Science (100), ISBN: 978-3-642-23138-4
  28. Miikka Silfverberg, Mirka Hyvärinen, Tommi A Pirinen (2011b), Improving Predictive Entry of Finnish Text Messages using IRC Logs in Proceedings of the Computational Linguistics-Applications Conference 2011, ISBN: 978-83-60810-47-7.
  29. Tommi A Pirinen (2011a), Modularisation of Finnish Finite-State Language Description—Towards Wide Collaboration in Open Source Development of Morphological Analyser in Proceedings of Nodalida 2011 (18). TeX version, HTML (LaTeXML) version
  30. Tommi A Pirinen, Krister Lindén (2010c), Creating and Weighting Hunspell Dictionaries as Finite-State Automata , in Investigationes Linguisticae (19). TeX version, HTML (LaTeXML) version
  31. Tommi A Pirinen, Krister Lindén (2010b), Building and Using Existing Hunspell Dictionaries and TEX Hyphenators as Finite-State Automata, in Proceedings of International Multiconference in Computer Science and Information Technology TeX version, HTML (LaTeXML) version
  32. Tommi A Pirinen, Krister Lindén (2010a), Finite-State Spell-Checking with Weighted Language and Error Models, in Proceedings of Workshops of Language Resources and Evaluation Conference 7 in Valletta, Malta.
  33. Krister Lindén, Tommi A Pirinen (2009a), Weighted Finite-State Morphological Analysis of Finnish Compounding with hfst-lexc, in Proceedings of Nodalida 2009. Presentation in PDF, TeX version, HTML (LaTeXML) version
  34. Krister Lindén, Tommi A Pirinen (2009b), Weighting Finite-State Morphological Analyzers using HFST tools, in Pre-proceedings of FSMNLP 2009 TeX version, HTML (LaTeXML) version
  35. Krister Lindén, Miikka Silfverberg, Tommi A Pirinen (2009c), HFST Tools for morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers in Proceedings of Workshop on Systems and Frameworks for Computational Morphology TeX version, HTML (LaTeXML) version


Theses

  1. Tommi A Pirinen (2014), Weighted Finite-State Methods in Spell-Checking and Correction, Doctoral dissertation, University of Helsinki. TeX version, HTML (LaTeXML) version
  2. Tommi Pirinen (2008), Suomen kielen äärellistilainen automaattinen morfologinen analyysi avoimen lähdekoodin menetelmin, Master’s Thesis, University of Helsinki (in Finnish). TeX version, HTML (LaTeXML) version

Edited volumes

  1. Tommi A Pirinen et al. (2017) Acta Linguistica Hungarica, special issue. Volume 64, Issue 3, September 2017. publisher’s version
  2. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2017), Proceedings of the third international workshop for computational linguistics of Uralic languages held in St. Petersburg, published by ACL anthology: SIG workshops ACL anthology version
  3. Tommi A Pirinen, Francis Tyers, Veronika Vincze (2016), Proceedings of the second international workshop for computational linguistics of Uralic languages held in Szeged, published in Szeged workshops
  4. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2016), NEJLT (Northern European Journal of Language Technology), special issue on Uralic Language Technology, published in NEJLT. Publisher’s version
  5. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2015), Proceedings of the first international workshop for computational linguistics of Uralic languages held in Tromsø, published by the university library. Library’s version

Presentations, tutorials, invited speeches

  1. Tommi A Pirinen, Antonio Toral (2015) Why linguistics in SMT? in Why Linguistics? workshop, Tartu, 2015
  2. Morphological segmentation for machine translation, in internal project meeting of abumatran, in Elx, 2014
  3. Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical nlp systems in NCLT seminar series, DCU, 2014
  4. Crowd-Sourcing morphology and lexicography, productising NLP research, in FSCONS 2013, Gothenburg.
  5. Weighted Finite-State Spell-Checking, in Research Seminar of Uni Helsinki. A ~final report on PhD thesis.
  6. Building finite-state spell-checkers with HFST tools, in FSMNLP 2012, Donostia-San Sebastian.
  7. Building and Using Apertium Dictionaries with HFST in LREC 2012, Istanbul.
  8. Using POS taggers to rerank spell-checking results in CICLING 2012, Delhi.
  9. Building and using Hunspell and TeX hyphenation descriptions with HFST in CLA 2010, Wisła.
  10. Using Wikipedia to Weight a Spelling-Checker in LREC 2010, Valletta.
  11. Weighting Finnish Compound Boundaries in Nodalida 2009, Odense.
  12. Weighted Finite-State analysis of Finnish Compounds, in CLARIN/D-SPIN meeting
  13. (Unigram-)Weighting Language Models with HFST in FSMNLP 2009, Pretoria.
  14. Avoimen lähdekoodin menetelmät äärellistilaista morfologiaa varten, in 2008, Helsinki.

Software projects and resources

The following projects I participate in are more or less related to my university work and to my science-related spare-time hobbies:


And some TA jobs (Uni. Helsinki):

  • Introduction to Speech Synthesis
  • Programming NLS
  • Speech Analysis
  • XML
  • Natural language parsing
  • Grammar engineering
  • Morphological language processing
  • Finite state parsing methods