purplemonkeydishwasher

A public git version of my research projects, i.e. articles and all that

Dr. Flammie A Pirinen, publications and other academic work

Currently working at/in/on:

This site contains a list of publications and other academic works, and a CV, for Dr Flammie A Pirinen. The publications are author’s versions automatically converted to HTML.

For details on HTML conversions and such see my latexml page.

Academic profiles

Most stuff prior to 2021 will be under the name Tommi A Pirinen.

I have linked the profiles that I have found useful or that are required by, e.g., ACL. The ones I did not link I do not actively use and perhaps even discourage (e.g. ResearchGate and academia.edu; please avoid these if possible).

Bio and CV

  • Here’s an academic bio that you can copy-paste if one is needed for conference and journal applications, etc.
  • Here’s an old sample CV; I last updated it before applying for jobs, so it is outdated by now.

Short academic history

  1. University of Joensuu (2003–2007): Computer Science (Bachelor’s), Finnish linguistics, etc.
  2. University of Helsinki (2007–2014): HFST project, finite-state spell-checking (PhD), open source morphology of Finnish (Master’s), TTS system simple4all
  3. Dublin City University (2014–2016): Abu-Matran
  4. Universität Hamburg, HZSK, CLARIN-D (2016–202X)
  5. UiT Norges arktiske universitet, Divvun (2020–)

Research interests

The things I’ve studied, am good at, and am interested in spending my time on:

  • Weighted finite-state automata in computational linguistics
  • Lesser-resourced and minority languages
  • Uralic languages
  • Software engineering practices in computational linguistics
  • Computer science—Computational linguistics—Linguistics interdisciplinary co-operation
  • End-user apps with computational linguistics: Machine translation, writers tools, computer-aided language learning
  • Digital humanities—computational linguistics interactions
  • Neural models for very underresourced languages

The list is not exhaustive.

Publications

The following is a list of all my accepted publications and links to author’s post-print versions. I only provide HTML versions produced with latexml, with minimal extra styling by me. I consider PDF a rubbish format and printing wasteful; if something looks really bad in HTML, send me a message and I can fix it. If you really must, the TeX sources are available on my GitHub and can be used to generate PDFs.

It may be worth noting that, at the moment, Google Scholar offers a great way to browse my publications and see their incoming citations.

I have curated some bibliographies and written scripts to convert them to other formats. Or just use pirinen.bib directly to find a suitable BibTeX snippet.

Edited volumes and conference proceedings

  1. Mika Hämäläinen, Flammie Pirinen, Melany Macias, Mario Crespo Avila (Editors) Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages, held in Helsinki, published in ACL anthology SIGUR workshops series under 2024
  2. Arvi Hurskainen, Kimmo Koskenniemi, Flammie A Pirinen, eds. (2023), Rule-based language technology. Published in NEJLT monographies publisher’s version
  3. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2021), Proceedings of the seventh international workshop for computational linguistics of Uralic languages held in Syktyvkar / Online, published by ACL anthology: SIG workshops ACL anthology version
  4. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2019), Proceedings of the sixth international workshop for computational linguistics of Uralic languages held in Wien, published by ACL anthology: SIG workshops ACL anthology version
  5. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2018), Proceedings of the fifth international workshop for computational linguistics of Uralic languages held in Tartu, published by ACL anthology: SIG workshops ACL anthology version
  6. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2017), Proceedings of the fourth international workshop for computational linguistics of Uralic languages held in Helsinki, published by ACL anthology: SIG workshops ACL anthology version
  7. Tommi A Pirinen et al. (2017) Acta Linguistica Hungarica, special issue. Volume 64, Issue 3, September 2017. publisher’s version
  8. Tommi A Pirinen, Francis Tyers, Trond Trosterud, Michael Rießler (2017), Proceedings of the third international workshop for computational linguistics of Uralic languages held in St. Petersburg, published by ACL anthology: SIG workshops ACL anthology version
  9. Tommi A Pirinen, Francis Tyers, Veronika Vincze (2016), Proceedings of the second international workshop for computational linguistics of Uralic languages held in Szeged, published in Szeged workshops
  10. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2016), NEJLT (Northern European Journal of Language Technology), special issue on Uralic Language Technology, published in NEJLT publisher’s version
  11. Tommi A Pirinen, Francis Tyers, Trond Trosterud (2015), Proceedings of the first international workshop for computational linguistics of Uralic languages held in Tromsø, published by university library library’s version

Publications in conferences and journals

Names are written as used on the paper; they will be updated if I get to it.

  1. Flammie A Pirinen (2024). Keeping Up Appearances—or how to get all Uralic languages included into bleeding edge research and software: generate, convert, and LLM your way into multilingual datasets. In Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages, Helsinki, Finland. ACL Anthology version.
  2. Linda Wiechetek, Flammie A Pirinen, Børre Gaup, Trond Trosterud, Maja Lisa Kappfjell, Sjur Moshagen (2024). The Ethical Question–Use of Indigenous Corpora for Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics and Language Resources Evaluation Conference (COLING-LREC). ACL anthology version
  3. Linda Wiechetek, Flammie A Pirinen, Per Egil Kummervold. (2023). A Manual Evaluation Method of Neural MT for Indigenous Languages. In Proceedings of The 3rd Workshop on Human Evaluation of NLP Systems (HumEval) at RANLP 2023, Varna, Bulgaria.
  4. Flammie A Pirinen, Sjur Nørstebø Moshagen, Katri Hiovain-Asikainen. (2023) GiellaLT–a stable infrastructure for Nordic minority languages and beyond. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 634–649, Tórshavn, Faroe Islands. ACL anthology version
  5. Flammie A Pirinen (2023) Can you make a VISL CG 3 with weights? In Proceedings of CG Workshop at Nodalida 2023, Tórshavn, Faroe Islands, to appear.
  6. Heiki-Jaan Kaalep, Flammie Pirinen, Sjur Nørstebø Moshagen (2022) You can’t suggest that?!–Comparisons and improvements of speller error models in Nordlyd Vol. 46 No. 1 (2022): Morfologi, målstrev og maskinar – Trond Trosterud {fyller | täyttää | deavdá | turns} 60!
  7. Linda Wiechetek, Flammie Pirinen, Børre Gaup, Chiara Argese, Thomas Omma (2022) Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp–En hybrid grammatikkontroll for å rette kongruensfeil in Nordlyd Vol. 46 No. 1 (2022): Morfologi, målstrev og maskinar – Trond Trosterud {fyller | täyttää | deavdá | turns} 60!
  8. Flammie Pirinen, Linda Wiechetek (2022) Building an Extremely Low Resource Language to High Resource Language Machine Translation System from Scratch
  9. Linda Wiechetek, Katri Hiovain-Asikainen, Inga Lill Sigga Mikkelsen, Sjur N. Moshagen, Flammie A. Pirinen, Trond Trosterud, Børre Gaup. (2022) Unmasking the Myth of Effortless Big Data – Making an Open Source Multilingual Infrastructure and Building Language Resources from Scratch. In Proceedings of Language Resources and Evaluation Conference 2022, pages tba, Marseille, France.
  10. Inga Lill Sigga Mikkelsen, Linda Wiechetek, and Flammie A Pirinen. (2022) Reusing a Multi-lingual Setup to Bootstrap a Grammar Checker for a Very Low Resource Language without Data. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 149–158, Dublin, Ireland. Association for Computational Linguistics. ACL anthology version
  11. Jack Rueter, Niko Partanen, Flammie A. Pirinen (2021) Numerals and what counts, in Universal Dependencies Workshop at SyntaxFest, Sofia (actually Online). ACL anthology version
  12. Tanmai Khanna, Jonathan N. Washington, Francis M. Tyers, Sevilay Bayatlı, Daniel G. Swanson, Flammie A Pirinen, Irene Tang, Hector Alòs i Font (2021) Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages, in Machine Translation 35.
  13. Linda Wiechetek, Flammie A Pirinen, Mika Hämäläinen, Chiara Argese, (2021) Rules Ruling Neural Networks – Neural vs. Rule-Based Grammar Checking for a Low Resource Language, in RANLP 2021, Bulgaria (actually Online)
  14. Tommi A Pirinen, Francis M. Tyers (2021) Building language technology infrastructures to support a collaborative approach to language resource building, in Multilingual Facilitation (Festschrift of Dr Jack Rueter)
  15. Yvo Meeres, Tommi A Pirinen (2021) Vowel Harmony viewed as Error-Correcting Code, in SCiL 2021, UMass (Online)
  16. Linda Wiechetek, Chiara Argese, Tommi A Pirinen, Trond Trosterud (2021) Suoidne-varra-bleahkka-mála-bihkka-senet-dielku ‘hay-blood-ink-paint-tar-mustard-stain’ – Should compounds be lexicalized in NLP?, in CLIC-IT 2021, Bologna (actually Online)
  17. Heidemarie Sambale, Hanna Hedeland, Tommi Pirinen (2020 to appear), User Support for Digital Humanities, in CLARIN Book of FIXME
  18. Amr Keleg, Nick Howell, Francis M. Tyers, Tommi A Pirinen (2020), An Unsupervised Method for Weighting Finite-state Morphological Analyzers, in LREC 2020, Marseille, France (postponed / cancelled)
  19. Tommi Pirinen, Hanna Hedeland, Heidemarie Sambale (2019), User Support for Digital Humanities, in CLARIN Annual Conference 2019 (CAC 2019), Leipzig, Germany
  20. Tommi A Pirinen (2019), Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking in Universal Dependencies Workshop 2019 (UDW 2019) at Syntaxfest 2019, Paris, France.
  21. Tommi A Pirinen (2019), Workflows for kickstarting RBMT in virtually No-Resource Situation in The 2nd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2019), at MTsummit 2019, Dublin. ACL Anthology version
  22. Tommi A Pirinen (2019), Apertium-fin-eng—Rule-based shallow machine translation for WMT 2019 shared task in the Fourth conference on Machine Translation (wmt19) at ACL 2019, Firenze, Italy. ACL Anthology version
  23. Tommi A Pirinen (2019), Neural and rule-based Finnish NLP models–expectations, experiments and experiences in 5th International Workshop for Computational Linguistics of Uralic Languages. Tartu, Estonia. ACL Anthology version
  24. Tommi Pirinen (2018), Rule-based machine-translation between Finnish and German, in Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, CL-Postersession
  25. Tommi A Pirinen, Hanna Hedeland, Daniel Jettka (2017b), Developing a CLARIN compatible AAI solution for academic and restricted resources.
  26. Tommi Pirinen, Francis M. Tyers, Trond Trosterud, Ryan Johnson, Kevin Unhammer, Tiina Puolakainen (2017a, equal contribution) North-Sámi to Finnish rule-based machine translation system at nodalida 2017
  27. Tommi A Pirinen, Eszter Simon, Francis M Tyers, Veronika Vincze (2016c), Report on the Second International Workshop on Computational Linguistics for Uralic languages, in Finno-Ugric languages and linguistics
  28. Francis Tyers, Tommi Pirinen (2016b) Intermediate Representations in Rule-Based Machine Translation for Uralic languages in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL2016)
  29. Tommi Pirinen, Antonio Toral, Raphael Rubino (2016a) Rule-Based and Statistical Morph Segments in English-to-Finnish SMT, in Proceedings of Second International Workshop on Computational Linguistics for Uralic Languages (IWCLUL), Szeged, Hungary
  30. Tommi A Pirinen (2015e) Development and Use of Computational Morphology of Finnish in the Open Source and Open Science Era: Notes on Experiences with Omorfi Development. SKY Journal of Linguistics.
  31. Antonio Toral, Xiaofeng Wu, Tommi Pirinen, Zhengwei Qiu, Ergun Bicici and Jinhua Du (2015d) Dublin City University at the TweetMT 2015 Shared Task in Proceedings of TweetMT shared task at SEPLN 2015
  32. Raphael Rubino, Tommi Pirinen, Miquel Esplà-Gomis, Nikola Ljubešić, Sergio Ortiz Rojas, Vassilis Papavassiliou, Prokopis Prokopidis and Antonio Toral (2015c), Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling, in Proceedings of WMT shared task at EMNLP 2015
  33. Tommi A Pirinen (2015a), Omorfi—Free and open source morphological lexical database for Finnish, in Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015
  34. Tommi A Pirinen (2015b), Using weighted finite state morphology with VISL CG-3—Some experiments with free open source Finnish resources, in Proceedings of Constraint grammar - methods, tools and applications Workshop at NoDaLiDa 2015
  35. Senka Drobac, Krister Lindén, Tommi Pirinen, Miikka Silfverberg (2014e), Heuristic hyper-minimization of finite state lexicons, in LREC 2014
  36. Antonio Toral, Raphael Rubino, Miquel Esplà, Tommi Pirinen, Andy Way and Gema Ramírez-Sánchez (2014d). Extrinsic Evaluation of Web-Crawlers in Machine Translation: a Case Study on Croatian–English for the Tourism Domain in Proceedings of EAMT 2014
  37. Sjur Moshagen, Trond Trosterud, Jack Rueter, Francis Tyers and Tommi A Pirinen (2014c), Open-source infrastructures for collaborative work on under-resourced languages, in Proceedings of CCURL workshop 2014 in LREC
  38. Senka Drobac, Krister Lindén, Tommi A Pirinen and Miikka Silfverberg (2014b), Heuristic Hyperminimisation of Finite-State Lexicons, in Proceedings of LREC 2014
  39. Tommi A Pirinen, Krister Lindén (2014a) State-of-the-art in Weighted Finite-State Spell-Checking in Proceedings of CICLing 2014
  40. Sjur Moshagen, Tommi A Pirinen, Trond Trosterud (2013a) Building an open-source development infrastructure for language technology projects, in Proceedings of Nodalida 2013
  41. Tommi A Pirinen, Sam Hardwick (2012d) Effect of Language and Error Models on Efficiency of Finite-State Spell-Checking and Correction, in Proceedings of 10th International Workshop on Finite-State Methods and/in Natural Language Processing FSMNLP 2012
  42. Krister Lindén, Miikka Silfverberg, Erik Axelson, Senka Drobac, Sam Hardwick, Tommi A Pirinen (2012c) Using HFST for creating Computational Linguistic Applications in Computational Linguistics-Applications 2012
  43. Tommi A Pirinen, Francis M. Tyers (2012b) Compiling Apertium morphological dictionaries with HFST and using them in HFST applications in Proceedings of Workshops in Language Resources and Evaluation conference LREC 2012, in saltmil-aflat workshop on “language technology for normalisation of less-resourced languages”
  44. Tommi A Pirinen, Miikka Silfverberg (2012a) Improving Finite-State Spell-Checker Suggestions with Part-of-Speech N-grams in Proceedings of International Conference on Intelligent Text Processing and Computational Linguistics CICLING 2012
  45. Krister Lindén, Miikka Silfverberg, Erik Axelson, Sam Hardwick, Tommi A Pirinen (2011c) HFST—Framework for Compiling and Applying Morphologies in Systems and Frameworks for Computational Morphology 2011, in Communications in Computer and Information Science (100), ISBN: 978-3-642-23138-4
  46. Miikka Silfverberg, Mirka Hyvärinen, Tommi A Pirinen (2011b), Improving Predictive Entry of Finnish Text Messages using IRC Logs in Proceedings of the Computational Linguistics-Applications Conference 2011, ISBN: 978-83-60810-47-7.
  47. Tommi A Pirinen (2011a), Modularisation of Finnish Finite-State Language Description—Towards Wide Collaboration in Open Source Development of Morphological Analyser in Proceedings of Nodalida 2011.
  48. Tommi A Pirinen, Krister Lindén (2010c), Creating and Weighting Hunspell Dictionaries as Finite-State Automata, in Investigationes Linguisticae (19).
  49. Tommi A Pirinen, Krister Lindén (2010b), Building and Using Existing Hunspell Dictionaries and TₑX Hyphenators as Finite-State Automata, in Proceedings of International Multiconference in Computer Science and Information Technology
  50. Tommi A Pirinen, Krister Lindén (2010a), Finite-State Spell-Checking with Weighted Language and Error Models, in Proceedings of Workshops of Language Resources and Evaluation Conference LREC 2010 in Valletta, Malta.
  51. Krister Lindén, Tommi A Pirinen (2009a), Weighted Finite-State Morphological Analysis of Finnish Compounding with hfst-lexc, in Proceedings of Nodalida 2009 presentation
  52. Krister Lindén, Tommi A Pirinen (2009b), Weighting Finite-State Morphological Analyzers using HFST tools, in Pre-proceedings of FSMNLP 2009
  53. Krister Lindén, Miikka Silfverberg, Tommi A Pirinen (2009c), HFST Tools for morphology—An Efficient Open-Source Package for Construction of Morphological Analyzers in Proceedings of Workshop on Systems and Frameworks for Computational Morphology

Theses

  1. Tommi A Pirinen (2014), Weighted Finite-State Methods in Spell-Checking and Correction, Doctoral dissertation, University of Helsinki.
  2. Tommi Pirinen (2008), Suomen kielen äärellistilainen automaattinen morfologinen analyysi avoimen lähdekoodin menetelmin, Master’s Thesis, University of Helsinki (in Finnish).

Presentations, tutorials, invited speeches

(A rather incomplete list of course…)

  1. Tommi A Pirinen, Antonio Toral (2015) Why linguistics in SMT? in Why Linguistics? workshop, Tartu, 2015
  2. Morphological segmentation for machine translation, in internal project meeting of abumatran, in Elx, 2014
  3. Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical nlp systems in NCLT seminar series, DCU, 2014
  4. Crowd-Sourcing morphology and lexicography, productising NLP research, in FSCONS 2013, Gothenburg.
  5. Weighted Finite-State Spell-Checking, in Research Seminar of Uni Helsinki. A ~final report on PhD thesis.
  6. Building finite-state spell-checkers with HFST tools, in FSMNLP 2012, Donostia-San Sebastian.
  7. Building and Using Apertium Dictionaries with HFST in LREC 2012, Istanbul.
  8. Using POS taggers to rerank spell-checking results in CICLING 2012, Delhi.
  9. Building and using Hunspell and TeX hyphenation descriptions with HFST in CLA 2010, Wisła.
  10. Using Wikipedia to Weight a Spelling-Checker in LREC 2010, Valletta.
  11. Weighting Finnish Compound Boundaries in Nodalida 2009, Odense.
  12. Weighted Finite-State analysis of Finnish Compounds, in CLARIN/D-SPIN meeting
  13. (Unigram-)Weighting Language Models with HFST in FSMNLP 2009, Pretoria.
  14. Avoimen lähdekoodin menetelmät äärellistilaista morfologiaa varten, in 2008, Helsinki.

Software projects and resources

The following projects that I participate in are more or less related to my work at the university and to my science-related spare-time hobbies:

Teaching

TA jobs (Uni. Helsinki):

  • Introduction to Speech Synthesis
  • Programming NLS
  • Speech Analysis
  • XML
  • Natural language parsing
  • Grammar engineering
  • Morphological language processing
  • Finite state parsing methods