purplemonkeydishwasher

A public git version of my research projects, i.e. articles and all that


Dr. Tommi A Pirinen, publications and other academic work

Currently working at/in/on:

This site contains a list of publications and other academic works, and a CV, for Dr Tommi A Pirinen. The publications are author’s versions automatically converted to HTML.

Academic profiles

I have linked the ones I have found useful or that are required by e.g. ACL. The ones I did not link I do not actively use and perhaps even discourage (e.g. ResearchGate, academia.edu; please avoid these if possible).

Bio and CV

  • Here’s an academic bio that you can copy-paste if one is needed for conference and journal applications, etc.
  • Here’s an old sample CV; I last updated it before applying for jobs, so it is outdated by now

Short academic history

  • UiT Norges arktiske universitet, Divvun.no, Giellatekno (2020–)
  • Universität Hamburg, HZSK, CLARIN-D (2016–)
  • Dublin City University (2014–2016): Abu-Matran
  • University of Helsinki (2007–2014): HFST project, finite-state spell-checking (PhD), Open source morphology of Finnish (Master’s), TTS system simple4all
  • University of Joensuu (2003–2007): Computer Science (Bachelor’s), Finnish linguistics, etc.

Research interests

Things I have studied, am good at, and am interested in spending my time on:

  • Weighted finite-state automata in computational linguistics
  • Lesser-resourced and minority languages
  • Uralic languages
  • Software engineering practices in computational linguistics
  • Computer science—Computational linguistics—Linguistics interdisciplinary co-operation
  • End-user apps with computational linguistics: Machine translation, writers tools, computer-aided language learning
  • Digital humanities—computational linguistics interactions
  • Neural models for very underresourced languages

The list is not exhaustive.

Publications

The following is a list of all my accepted publications, with links to the author’s post-print versions. I only provide HTML versions produced with latexml, with minimal extra styling by me. I consider PDF a rubbish format and printing wasteful; if something looks really bad in HTML, send me a message and I can fix it. If you really must, the TeX sources are available on my GitHub and can be used to generate PDFs.

Note that Google Scholar currently offers a good way to browse my publications and see their incoming citations.

Here is a bib-file of all my publications; it may or may not be as accurate and up-to-date as Google Scholar.

Publications in conferences and journals

  1. Tommi A Pirinen, Francis M. Tyers (2021) Building language technology infrastructures to support a collaborative approach to language resource building, in Multilingual Facilitation (Festschrift of Dr Jack Rueter)
  2. Yvo Meeres, Tommi A Pirinen (2021) Vowel Harmony viewed as Error-Correcting Code, in SCiL 2021, UMass (Online)
  3. Linda Wiechetek, Chiara Argese, Tommi A Pirinen, Trond Trosterud (2021) Suoidne-varra-bleahkka-mála-bihkka-senet-dielku ‘hay-blood-ink-paint-tar-mustard-stain’ – Should compounds be lexicalized in NLP?, in CLIC-IT 2021, Bologna (actually Online)
  4. Heidemarie Sambale, Hanna Hedeland, Tommi Pirinen (2020 to appear), User Support for Digital Humanities, in CLARIN Book of FIXME
  5. Amr Keleg, Nick Howell, Francis M. Tyers, Tommi A Pirinen (2020), An Unsupervised Method for Weighting Finite-state Morphological Analyzers, in LREC 2020, Marseille, France (postponed / cancelled)
  6. Tommi Pirinen, Hanna Hedeland, Heidemarie Sambale (2019), User Support for Digital Humanities, in CLARIN Annual Conference 2019 (CAC 2019), Leipzig, Germany
  7. Tommi A Pirinen (2019), Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking in Universal Dependencies Workshop 2019 (UDW 2019) at Syntaxfest 2019, Paris, France.
  8. Tommi A Pirinen (2019), Workflows for kickstarting RBMT in virtually No-Resource Situation in The 2nd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2019), at MTsummit 2019, Dublin. ACL Anthology version
  9. Tommi A Pirinen (2019), Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task in the Fourth conference on Machine Translation (wmt19) at ACL 2019, Firenze, Italy. ACL Anthology version
  10. Tommi A Pirinen (2019), Neural and rule-based Finnish NLP models–expectations, experiments and experiences in 5th International Workshop for Computational Linguistics of Uralic Languages. Tartu, Estonia. ACL Anthology version
  11. Tommi Pirinen (2018), Rule-based machine-translation between Finnish and German in Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, CL-Postersession
  12. Tommi A Pirinen, Hanna Hedeland, Daniel Jettka (2017b), Developing a CLARIN compatible AAI solution for academic and restricted resources.
  13. Tommi Pirinen, Francis M. Tyers, Trond Trosterud, Ryan Johnson, Kevin Unhammer, Tiina Puolakainen (2017a, equal contribution) North-Sámi to Finnish rule-based machine translation system at nodalida 2017
  14. Tommi A Pirinen, Eszter Simon, Francis M Tyers, Veronika Vincze (2016c), Report on the Second International Workshop on Computational Linguistics for Uralic languages, in Finno-Ugric languages and linguistics
  15. Francis Tyers, Tommi Pirinen (2016b) Intermediate Representations in Rule-Based Machine Translation for Uralic languages in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL2016)
  16. Tommi Pirinen, Antonio Toral, Raphael Rubino (2016a) Rule-Based and Statistical Morph Segments in English-to-Finnish SMT, in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL), Szeged, Hungary
  17. Tommi A Pirinen (2015e) Development and Use of Computational Morphology of Finnish in the Open Source and Open Science Era: Notes on Experiences with Omorfi Development. SKY Journal of Linguistics.
  18. Antonio Toral, Xiaofeng Wu, Tommi Pirinen, Zhengwei Qiu, Ergun Bicici and Jinhua Du (2015d) Dublin City University at the TweetMT 2015 Shared Task in Proceedings of TweetMT shared task at SEPLN 2015
  19. Raphael Rubino, Tommi Pirinen, Miquel Esplà-Gomis, Nikola Ljubešić, Sergio Ortiz Rojas, Vassilis Papavassiliou, Prokopis Prokopidis and Antonio Toral (2015c), Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling In proceedings of WMT shared task at EMNLP 2015
  20. Tommi A Pirinen (2015a), Omorfi—Free and open source morphological lexical database for Finnish, in Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015
  21. Tommi A Pirinen (2015b), Using weighted finite state morphology with VISL CG-3—Some experiments with free open source Finnish resources, in Proceedings of Constraint grammar - methods, tools and applications Workshop at NoDaLiDa 2015
  22. Senka Drobac, Krister Lindén, Tommi Pirinen, Miikka Silfverberg (2014e), Heuristic hyper-minimization of finite state lexicons, in LREC 2014
  23. Antonio Toral, Raphael Rubino, Miquel Esplà, Tommi Pirinen, Andy Way and Gema Ramírez-Sánchez (2014d). Extrinsic Evaluation of Web-Crawlers in Machine Translation: a Case Study on Croatian–English for the Tourism Domain in Proceedings of EAMT 2014
  24. Sjur Moshagen, Trond Trosterud, Jack Rueter, Francis Tyers and Tommi A Pirinen (2014c), Open-source infrastructures for collaborative work on under-resourced languages, in Proceedings of CCURL workshop 2014 in LREC
  25. Senka Drobac, Krister Lindén, Tommi A Pirinen and Miikka Silfverberg (2014b), Heuristic Hyperminimisation of Finite-State Lexicons, in Proceedings of LREC 2014
  26. Tommi A Pirinen, Krister Lindén (2014a) State-of-the-art in Weighted Finite-State Spell-Checking in Proceedings of CICLing 2014
  27. Sjur Moshagen, Tommi A Pirinen, Trond Trosterud (2013a) Building an open-source development infrastructure for language technology projects, in Proceedings of Nodalida 2013
  28. Tommi A Pirinen, Sam Hardwick (2012d) Effect of Language and Error Models on Efficiency of Finite-State Spell-Checking and Correction, in Proceedings of 10th International Workshop on Finite-State Methods and/in Natural Language Processing FSMNLP 2012
  29. Krister Lindén, Miikka Silfverberg, Erik Axelson, Senka Drobac, Sam Hardwick, Tommi A Pirinen (2012c) Using HFST for creating Computational Linguistic Applications in Computational Linguistics-Applications 2012
  30. Tommi A Pirinen, Francis M. Tyers (2012b) Compiling Apertium morphological dictionaries with HFST and using them in HFST applications in Proceedings of Workshops in Language Resources and Evaluation conference LREC 2012, in saltmil-aflat workshop on “language technology for normalisation of less-resourced languages”
  31. Tommi A Pirinen, Miikka Silfverberg (2012a) Improving Finite-State Spell-Checker Suggestions with Part-of-Speech N-grams in Proceedings of International Conference on Intelligent Text Processing and Computational Linguistics CICLING 2012
  32. Krister Lindén, Miikka Silfverberg, Erik Axelson, Sam Hardwick, Tommi A Pirinen (2011c) HFST—Framework for Compiling and Applying Morphologies in Systems and Frameworks for Computational Morphology 2011, in Communications in Computer and Information Science (100), ISBN: 978-3-642-23138-4
  33. Miikka Silfverberg, Mirka Hyvärinen, Tommi A Pirinen (2011b), Improving Predictive Entry of Finnish Text Messages using IRC Logs in Proceedings of the Computational Linguistics-Applications Conference 2011, ISBN: 978-83-60810-47-7.
  34. Tommi A Pirinen (2011a), Modularisation of Finnish Finite-State Language Description—Towards Wide Collaboration in Open Source Development of Morphological Analyser in Proceedings of Nodalida 2011.
  35. Tommi A Pirinen, Krister Lindén (2010c), Creating and Weighting Hunspell Dictionaries as Finite-State Automata, in Investigationes Linguisticae (19).
  36. Tommi A Pirinen, Krister Lindén (2010b), Building and Using Existing Hunspell Dictionaries and TₑX Hyphenators as Finite-State Automata, in Proceedings of International Multiconference in Computer Science and Information Technology
  37. Tommi A Pirinen, Krister Lindén (2010a), Finite-State Spell-Checking with Weighted Language and Error Models, in Proceedings of Workshops of Language Resources and Evaluation Conference LREC 2010 in Valletta, Malta.
  38. Krister Lindén, Tommi A Pirinen (2009a), Weighted Finite-State Morphological Analysis of Finnish Compounding with hfst-lexc, in Proceedings of Nodalida 2009 presentation
  39. Krister Lindén, Tommi A Pirinen (2009b), Weighting Finite-State Morphological Analyzers using HFST tools, in Pre-proceedings of FSMNLP 2009
  40. Krister Lindén, Miikka Silfverberg, Tommi A Pirinen (2009c), HFST Tools for morphology—An Efficient Open-Source Package for Construction of Morphological Analyzers in Proceedings of Workshop on Systems and Frameworks for Computational Morphology

Theses

  1. Tommi A Pirinen (2014), Weighted Finite-State Methods in Spell-Checking and Correction, Doctoral dissertation, University of Helsinki.
  2. Tommi Pirinen (2008), Suomen kielen äärellistilainen automaattinen morfologinen analyysi avoimen lähdekoodin menetelmin, Master’s Thesis, University of Helsinki (in Finnish).

Edited volumes

  1. Tommi A Pirinen et al. (2017) Acta Linguistica Hungarica, special issue. Volume 64, Issue 3, September 2017. publisher’s version
  2. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2017), Proceedings of the third international workshop for computational linguistics of Uralic languages held in St. Petersburg, published by ACL anthology: SIG workshops ACL anthology version
  3. Tommi A Pirinen, Francis Tyers, Veronika Vincze (2016), Proceedings of the second international workshop for computational linguistics of Uralic languages held in Szeged, published in Szeged workshops
  4. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2016), NEJLT (Northern European Journal of Language Technology), special issue on Uralic Language Technology, published in NEJLT publisher’s version
  5. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2015), Proceedings of the first international workshop for computational linguistics of Uralic languages held in Tromsø, published by university library library’s version

Presentations, tutorials, invited speeches

(A rather incomplete list of course…)

  1. Tommi A Pirinen, Antonio Toral (2015) Why linguistics in SMT? in Why Linguistics? workshop, Tartu, 2015
  2. Morphological segmentation for machine translation, in internal project meeting of abumatran, in Elx, 2014
  3. Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical nlp systems in NCLT seminar series, DCU, 2014
  4. Crowd-Sourcing morphology and lexicography, productising NLP research, in FSCONS 2013, Gothenburg.
  5. Weighted Finite-State Spell-Checking, in Research Seminar of Uni Helsinki. A ~final report on PhD thesis.
  6. Building finite-state spell-checkers with HFST tools, in FSMNLP 2012, Donostia-San Sebastian.
  7. Building and Using Apertium Dictionaries with HFST in LREC 2012, Istanbul.
  8. Using POS taggers to rerank spell-checking results in CICLING 2012, Delhi.
  9. Building and using Hunspell and TeX hyphenation descriptions with HFST in CLA 2010, Wisła.
  10. Using Wikipedia to Weight a Spelling-Checker in LREC 2010, Valletta.
  11. Weighting Finnish Compound Boundaries in Nodalida 2009, Odense.
  12. Weighted Finite-State analysis of Finnish Compounds, in CLARIN/D-SPIN meeting
  13. (Unigram-)Weighting Language Models with HFST in FSMNLP 2009, Pretoria.
  14. Avoimen lähdekoodin menetelmät äärellistilaista morfologiaa varten, in 2008, Helsinki.

Software projects and resources

The following projects that I participate in are more or less related to my university work and to my science-related spare-time hobbies:

Teaching

TA jobs (Uni. Helsinki):

  • Introduction to Speech Synthesis
  • Programming NLS
  • Speech Analysis
  • XML
  • Natural language parsing
  • Grammar engineering
  • Morphological language processing
  • Finite state parsing methods