Amsterdam Data Science has awarded one of its best thesis awards to Amsterdam’s Mario Giulianelli for his thesis on Lexical Semantic Change Analysis with Contextualised Word Representations. See the announcement.

Supervisors: Dr. Raquel Fernandez and Marco del Tredici, University of Amsterdam

The jury’s motivation:

This thesis presents a novel approach that allows the detection and analysis of word meaning and how it changes over time. It is the first unsupervised approach for this task that obtains word representations from a Transformer-based neural language model. This approach is domain-independent, data-driven, automatic, and easily reproducible. The results of the empirical evaluation demonstrate that the proposed approach allows the recognition of cultural drifts driven by technological innovations, cultural transitions, and specific events, as well as more subtle linguistic shifts such as changes in the subcategorisation frames of nouns and verbs.

The reviewers found that tracking the evolution of words’ meanings is essential in our data-driven society for a variety of applications. For instance, information retrieval and conversational AI can benefit from this technique. Discrimination arises when community-specific semantics are not well recognised, if not entirely obliterated, by data-driven systems that can process only allegedly mainstream semantics. As a consequence, humans have to adapt to the ways words are interpreted by computers. Allowing semantics to evolve is crucial for letting human cultures and diversity develop according to human values, rather than according to the limitations imposed by current technologies. The work also has a clear impact on other scientific disciplines. This thesis is also likely to impact research in Digital Humanities, supporting further in-depth studies of language evolution across large historical corpora.

The work is outstanding due to the depth of technical detail it discusses and the informed choice of algorithms and metrics. The author provides an in-depth analysis of the different factors to consider, and of their expected and observed impact on the results. Limitations of the proposed approach, as well as of competing approaches, are discussed in great detail. The code and the dataset used to conduct the experiments are publicly available on GitHub.

The award selection committee has unanimously agreed that the presented thesis is high-quality work and a clear example of a very well-executed research project that could only have been made possible by a highly motivated and talented student.

Read Mario’s thesis in full.