📅 November 1, 2022 • 🕐 09:08 • 🏷 Papers and Research

Anna Lőrincz’s UvA MSc Data Science thesis “Transfer learning for multilingual vacancy text generation” (which was graded a 9/10 💫) was recently accepted at the Second Version of Generation, Evaluation & Metrics (GEM) Workshop 2022, which will be held as part of EMNLP, December 7-11, 2022!

Get the pre-print here:

[bibtex file=citations.bib key=lorincz2022transfer]

In her work, Anna explores transformer models for data-to-text generation. More specifically: given structured inputs that represent the benefits listed in vacancy texts, such as categorical features (e.g., location), real-valued features (e.g., salary or hours of work per week), or binary features (e.g., contract type), the task is to generate a natural language snippet that expresses the given feature.

Layout of benefit section from Randstad.nl
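
To make the task a bit more concrete, here is a minimal Python sketch of what a template-based baseline for it could look like. The feature names, the template wording, and the `render_benefit` helper are all illustrative assumptions, not taken from the thesis.

```python
from typing import Union

# A feature value can be categorical (str), real-valued (int/float), or binary (bool).
FeatureValue = Union[str, int, float, bool]

# Hypothetical templates, one fixed sentence per feature (not from the thesis).
TEMPLATES = {
    "LOCATION": "Located in {value}",
    "HOURS_PER_WEEK": "{value} hours of work per week",
    "SALARY": "A salary of {value} euros per hour",
    "PERMANENT_CONTRACT": "A permanent contract",
}


def render_benefit(feature: str, value: FeatureValue) -> str:
    """Fill the fixed template that belongs to one structured feature."""
    template = TEMPLATES[feature]
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        # Binary features: only mention the benefit when it applies.
        return template if value else ""
    return template.format(value=value)


print(render_benefit("LOCATION", "Zwaag"))
print(render_benefit("HOURS_PER_WEEK", 32))
print(render_benefit("PERMANENT_CONTRACT", True))
```

In a sketch like this, the same input always produces the same sentence, and every new feature or phrasing has to be written by hand.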

Anna finds that, compared to template-based models, transformers greatly increase (vocabulary) variation and need less human effort. The results were, to me, surprisingly good: further proof that transformers are taking over the world and making traditional NLP methods partly obsolete.

I was very much impressed with this work! But to show that even transformers are not perfect yet, I present you with my favorite error from the paper:

input: LOCATION = Zwaag
output: Pal gelegen achter het centraal station Zwaaijdijk! (“Located right behind Zwaaijdijk central station!”)

Hope to catch you sometime in Zwaaijdijk!