"If I ask 'Why do you do that?'
If they say 'I think it's fun', that's, to me, a good answer."
-- Fran Lebowitz

My research interests lie in computational linguistics and statistical natural language processing (NLP) in a multilingual context. I am passionate about the statistical modeling of languages, and my long-term goal is to design robust NLP algorithms that can adapt to the large variety of linguistic phenomena observed around the world. More specifically, I am interested in modeling sequence prediction tasks (language modeling and generation) and sequence-to-sequence transduction (such as translation) with loosely structured approaches that require as little human supervision as possible.

Machine translation between disparate languages

Speakers of different languages experience language technologies very differently: for instance, the quality of modern machine translation systems can be near perfect for a minority of high-resource or closely related language pairs, but it lags far behind for the majority of pairs with significant structural (e.g. grammatical) differences and little training data. My past and present research has focused on various aspects of language that make it hard to design universal translation and language modeling algorithms, such as word order differences and rich morphological systems. As language use can also vary greatly within the same community, a line of my work also deals with domain and genre adaptation.

Analyzing neural models of language

The deep learning paradigm has recently transformed the NLP world in a radical way. However, understanding and interpreting the source of this success remains a challenge. To what extent do neural NLP models capture language structure? What is the effect of typological properties on the difficulty of language learning by machines? How does linguistic transfer occur in multilingually trained neural language models? My current research aims to answer these questions by developing novel neural architectures and by conducting targeted analyses of their outputs as well as their inner workings. Only by gaining a deeper linguistic understanding of neural language models will we ultimately understand whether they are meant to complement or replace the currently used probabilistic symbolic approaches to NLP.

Language evolution and (multilingual) language learning in humans

I am a cross-disciplinary research enthusiast. I am very interested in enhancing research on human language processing and language evolution with computational modeling tools, as well as bringing insights from those fields into NLP. If this resonates with you and you have a collaboration idea, please get in touch!