Dr. Tommi A Pirinen, publications and other academic work

Universität Hamburg, Hamburger Zentrum für Sprachkorpora, CLARIN-D developer, etc.

Here’s some of the same info formatted as a CV.

Short academic history

Research interests

The things I’ve studied, am good at, and am interested in spending my time on:

The list is not exhaustive.


The following is a list of all my accepted publications, with links to the author’s post-print versions. The versions on this page may differ significantly from the official ones: they have been optimised for screen reading, reformatted, supplied with hyperlinks, and so forth.

Google Scholar currently offers a convenient way to browse my publications and see their incoming citations.

Here is a bib-file of all my publications; it may or may not be as accurate and up-to-date as Google Scholar.

Publications in conferences and journals

  1. Tommi A Pirinen, Hanna Hedeland, Daniel Jettka (2017b), Developing a CLARIN compatible AAI solution for academic and restricted resources.
  2. Tommi Pirinen, Francis M. Tyers, Trond Trosterud, Ryan Johnson, Kevin Unhammer, Tiina Puolakainen (2017a, equal contribution) North-Sámi to Finnish rule-based machine translation system TeX version, HTML (LaTeXML) version
  3. Tommi A Pirinen, Eszter Simon, Francis M Tyers, Veronika Vincze (2016c), Report on the Second International Workshop on Computational Linguistics for Uralic languages, in Finno-Ugric languages and linguistics
  4. Francis Tyers, Tommi Pirinen (2016b) Intermediate Representations in Rule-Based Machine Translation for Uralic languages in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL2016) TeX version, HTML (LaTeXML) version
  5. Tommi Pirinen, Antonio Toral, Raphael Rubino (2016a) Rule-Based and Statistical Morph Segments in English-to-Finnish SMT, in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL), Szeged, Hungary TeX version, HTML (LaTeXML) version
  6. Tommi A Pirinen (2015e) Development and Use of Computational Morphology of Finnish in the Open Source and Open Science Era: Notes on Experiences with Omorfi Development. SKY Journal of Linguistics. TeX version, HTML (LaTeXML) version
  7. Antonio Toral, Xiaofeng Wu, Tommi Pirinen, Zhengwei Qiu, Ergun Bicici and Jinhua Du (2015d) Dublin City University at the TweetMT 2015 Shared Task in Proceedings of TweetMT shared task at SEPLN 2015 TeX version, HTML (LaTeXML) version
  8. Raphael Rubino, Tommi Pirinen, Miquel Esplà-Gomis, Nikola Ljubešić, Sergio Ortiz Rojas, Vassilis Papavassiliou, Prokopis Prokopidis and Antonio Toral (2015c), Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling In proceedings of WMT shared task at EMNLP 2015 TeX version, HTML (LaTeXML) version
  9. Tommi A Pirinen (2015a), Omorfi—Free and open source morphological lexical database for Finnish, in Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015
  10. Tommi A Pirinen (2015b), Using weighted finite state morphology with VISL CG-3—Some experiments with free open source Finnish resources, in Proceedings of Constraint grammar - methods, tools and applications Workshop at NoDaLiDa TeX version, HTML (LaTeXML) version
  11. Senka Drobac, Krister Lindén, Tommi Pirinen, Miikka Silfverberg (2014e), Heuristic hyper-minimization of finite state lexicons, in LREC 2014
  12. Antonio Toral, Raphael Rubino, Miquel Esplà, Tommi Pirinen, Andy Way and Gema Ramírez-Sánchez (2014d). Extrinsic Evaluation of Web-Crawlers in Machine Translation: a Case Study on Croatian–English for the Tourism Domain in Proceedings of EAMT 2014
  13. Sjur Moshagen, Trond Trosterud, Jack Rueter, Francis Tyers and Tommi A Pirinen (2014c), Open-source infrastructures for collaborative work on under-resourced languages, in Proceedings of CCURL workshop 2014 in LREC
  14. Senka Drobac, Krister Lindén, Tommi A Pirinen and Miikka Silfverberg (2014b), Heuristic Hyperminimisation of Finite-State Lexicons, in Proceedings of LREC 2014
  15. Tommi A Pirinen, Krister Lindén (2014a) State-of-the-art in Weighted Finite-State Spell-Checking in Proceedings of CICLing 2014 TeX version, HTML (LaTeXML) version
  16. Sjur Moshagen, Tommi A Pirinen, Trond Trosterud (2013a) Building an open-source development infrastructure for language technology projects, in Proceedings of Nodalida 2013
  17. Tommi A Pirinen, Sam Hardwick (2012d) Effect of Language and Error Models on Efficiency of Finite-State Spell-Checking and Correction, in Proceedings of 10th International Workshop on Finite-State Methods and/in Natural Language Processing FSMNLP 2012 TeX version, HTML (LaTeXML) version
  18. Krister Lindén, Miikka Silfverberg, Erik Axelson, Senka Drobac, Sam Hardwick, Tommi A Pirinen (2012c) Using HFST for creating Computational Linguistic Applications in Computational Linguistics-Applications 2012
  19. Tommi A Pirinen, Francis M. Tyers (2012b) Compiling Apertium morphological dictionaries with HFST and using them in HFST applications in Proceedings of Workshops in Language Resources and Evaluation conference LREC 2012, in saltmil-aflat workshop on “language technology for normalisation of less-resourced languages” TeX version, HTML (LaTeXML) version
  20. Tommi A Pirinen, Miikka Silfverberg (2012a) Improving Finite-State Spell-Checker Suggestions with Part-of-Speech N-grams in Proceedings of International Conference on Intelligent Text Processing and Computational Linguistics CICLING 2012 TeX version, HTML (LaTeXML) version
  21. Krister Lindén, Miikka Silfverberg, Erik Axelson, Sam Hardwick, Tommi A Pirinen (2011c) HFST—Framework for Compiling and Applying Morphologies in Systems and Frameworks for Computational Morphology 2011, in Communications in Computer and Information Science (100), ISBN: 978-3-642-23138-4
  22. Miikka Silfverberg, Mirka Hyvärinen, Tommi A Pirinen (2011b), Improving Predictive Entry of Finnish Text Messages using IRC Logs in Proceedings of the Computational Linguistics-Applications Conference 2011, ISBN: 978-83-60810-47-7.
  23. Tommi A Pirinen (2011a), Modularisation of Finnish Finite-State Language Description—Towards Wide Collaboration in Open Source Development of Morphological Analyser in Proceedings of Nodalida 2011 (18). TeX version, HTML (LaTeXML) version
  24. Tommi A Pirinen, Krister Lindén (2010c), Creating and Weighting Hunspell Dictionaries as Finite-State Automata, in Investigationes Linguisticae (19). TeX version, HTML (LaTeXML) version
  25. Tommi A Pirinen, Krister Lindén (2010b), Building and Using Existing Hunspell Dictionaries and TeX Hyphenators as Finite-State Automata, in Proceedings of International Multiconference in Computer Science and Information Technology TeX version, HTML (LaTeXML) version
  26. Tommi A Pirinen, Krister Lindén (2010a), Finite-State Spell-Checking with Weighted Language and Error Models, in Proceedings of Workshops of Language Resources and Evaluation Conference 7 in Valletta, Malta.
  27. Krister Lindén, Tommi A Pirinen (2009a), Weighted Finite-State Morphological Analysis of Finnish Compounding with hfst-lexc, in Proceedings of Nodalida 2009 [presentation in PDF] TeX version, HTML (LaTeXML) version
  28. Krister Lindén, Tommi A Pirinen (2009b), Weighting Finite-State Morphological Analyzers using HFST tools, in Pre-proceedings of FSMNLP 2009 TeX version, HTML (LaTeXML) version
  29. Krister Lindén, Miikka Silfverberg, Tommi A Pirinen (2009c), HFST Tools for morphology—An Efficient Open-Source Package for Construction of Morphological Analyzers in Proceedings of Workshop on Systems and Frameworks for Computational Morphology TeX version, HTML (LaTeXML) version


Theses

  1. Tommi Pirinen (2008), Suomen kielen äärellistilainen automaattinen morfologinen analyysi avoimen lähdekoodin menetelmin, Master’s Thesis, University of Helsinki (in Finnish). TeX version, HTML (LaTeXML) version
  2. Tommi A Pirinen (2014), Weighted Finite-State Methods in Spell-Checking and Correction, Doctoral dissertation, University of Helsinki. TeX version, HTML (LaTeXML) version

Edited volumes

  1. Tommi A Pirinen et al. (2017) Acta Linguistica Hungarica, special issue. Volume 64, Issue 3, September 2017. publisher’s version
  2. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2017), Proceedings of the third international workshop for computational linguistics of Uralic languages held in St. Petersburg, published by ACL anthology: SIG workshops ACL anthology version
  3. Tommi A Pirinen, Francis Tyers, Veronika Vincze (2016), Proceedings of the second international workshop for computational linguistics of Uralic languages held in Szeged, published in Szeged workshops
  4. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2016), NEJLT (Northern European Journal of Language Technology), special issue on Uralic Language Technology, published in NEJLT publisher’s version
  5. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2015), Proceedings of the first international workshop for computational linguistics of Uralic languages held in Tromsø, published by university library library’s version

Presentations, tutorials, invited speeches

  1. Tommi A Pirinen, Antonio Toral (2015) Why linguistics in SMT? in Why Linguistics? workshop, Tartu, 2015
  2. Morphological segmentation for machine translation, in an internal project meeting of Abu-MaTran, in Elx, 2014
  3. Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical nlp systems in NCLT seminar series, DCU, 2014
  4. Crowd-Sourcing morphology and lexicography, productising NLP research, in FSCONS 2013, Gothenburg.
  5. Weighted Finite-State Spell-Checking, in Research Seminar of Uni Helsinki. A ~final report on PhD thesis.
  6. Building finite-state spell-checkers with HFST tools, in FSMNLP 2012, Donostia-San Sebastian.
  7. Building and Using Apertium Dictionaries with HFST in LREC 2012, Istanbul.
  8. Using POS taggers to rerank spell-checking results in CICLING 2012, Delhi.
  9. Building and using Hunspell and TeX hyphenation descriptions with HFST in CLA 2010, Wisła.
  10. Using Wikipedia to Weight a Spelling-Checker in LREC 2010, Valletta.
  11. Weighting Finnish Compound Boundaries in Nodalida 2009, Odense.
  12. Weighted Finite-State analysis of Finnish Compounds, in CLARIN/D-SPIN meeting
  13. (Unigram-)Weighting Language Models with HFST in FSMNLP 2009, Pretoria.
  14. Avoimen lähdekoodin menetelmät äärellistilaista morfologiaa varten, in 2008, Helsinki.

Software projects and resources

The following projects that I participate in are more or less related to my work at the university and to my science-related spare-time hobbies:

Courses I’ve taught or TA’d