“Transfer learning for multilingual vacancy text generation” preprint available

📅 November 1, 2022 • 🕐 09:08 • 🏷 Papers and Research

Anna Lőrincz’s UvA MSc. data science thesis “Transfer learning for multilingual vacancy text generation”, which was graded a 9/10 💫, was recently accepted at the Second Version of Generation, Evaluation & Metrics (GEM) Workshop 2022, which will be held as part of EMNLP, December 7-11, 2022!

Get the pre-print here:

[bibtex file=citations.bib key=lorincz2022transfer]

In her work, Anna explores transformer models for data-to-text generation, or more specifically: given structured inputs such as categorical features (e.g., location), real-valued features (e.g., salary or hours of work per week), or binary features (e.g., contract type) that represent benefits of vacancy texts, the task is to generate a natural language snippet that expresses said feature.

Layout of benefit section from Randstad.nl

Anna finds that using transformers greatly increases (vocabulary) variation compared to template-based models, while requiring less human effort. The results were, to me, surprisingly good: more proof that transformers are taking over the world and making traditional NLP methods partly obsolete.
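To make the comparison concrete, here is a minimal sketch of what a template-based baseline could look like. The feature names, the English template sentences, and the `render_benefit` helper are all hypothetical illustrations, not the templates or code used in the paper.

```python
# Hypothetical templates keyed by feature name (assumption: not from the paper).
TEMPLATES = {
    "LOCATION": "Work close to home in {value}.",
    "SALARY": "Earn a salary of {value} euros per month.",
    "HOURS": "Work {value} hours per week.",
    "PERMANENT_CONTRACT": "A permanent contract is possible.",
}


def render_benefit(feature: str, value) -> str:
    """Fill the single fixed template that belongs to one structured feature."""
    template = TEMPLATES[feature]
    if isinstance(value, bool):
        # Binary features either produce their sentence or nothing at all.
        return template if value else ""
    # Categorical and real-valued features are substituted into the template.
    return template.format(value=value)


print(render_benefit("LOCATION", "Zwaag"))
print(render_benefit("HOURS", 32))
```

Every vacancy with the same feature gets the exact same sentence here, and every new feature needs a hand-written template. That is the lack of variation and the human effort a generative model is meant to reduce.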

I was very much impressed with this work! But to show that even transformers are not perfect yet, I present you with my favorite error from the paper:

input: LOCATION = Zwaag
output: Pal gelegen achter het centraal station Zwaaijdijk!

Hope to catch you sometime in Zwaaijdijk!

Two papers accepted at CompJobs ’22

📅 February 3, 2022 • 🕐 07:56 • 🏷 Blog and Papers

We have two papers accepted at “The First International Workshop on Computational Jobs Marketplace”, co-located with WSDM 2022. Both papers are based on work done by two of our former thesis interns at Randstad Groep Nederland!

[bibtex file=citations.bib key=vermeer2022using]

☝️ Ninande Vermeer worked under supervision of Sepideh Mesbah and Vera Provatorova (UvA) on: “Using RobBERT and eXtreme Multi-Label Classification to Extract Implicit and Explicit Skills From Dutch Job Descriptions” in which we study to what extent a RobBERT-XMLC model can be used to extract explicit and implicit skills from Dutch job descriptions.

[bibtex file=citations.bib key=vanels2022improving]

✌️ Sarah-Jane van Els worked under supervision of myself and Emma Beauxis-Aussalet (Civic AI Lab) on “Improving Fairness Assessments with Synthetic Data: a Practical Use Case with a Recommender System for Human Resources” in which we explore approaches and methods for assessing algorithmic bias by using synthetic data to improve the size and representativity of a test set used for training candidate recommender systems.

👏 Proud of our former interns for having published their work! And happy with the collaborations we have had with our co-authors 😁.

“Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation” accepted at UMAP2020!

📅 April 21, 2020 • 🕐 17:33 • 🏷 Papers and Research

The paper we wrote with former FD teammates Feng Lu and Anca Dumitrache has been accepted for publication as a long paper at UMAP 2020, the 28th Conference on User Modeling, Adaptation and Personalization! (I fondly remember my last time at UMAP, in 2016 😏)

We have published a preprint of this paper; get it here, or from arXiv.

[bibtex file=citations.bib key=lu2020beyond]

Update 08/05: Cool, @NickKivits mentioned our paper in his Villamedia column: Het idee van de filterbubbel kan in de prullenbak and newsletter (with over 11k subscribers!)

I am particularly happy with this work because:

1️⃣ In our paper we show how you can align algorithm design across stakeholders (in this case: data scientists and journalists), by effectively modeling an editorial value (“dynamicness”) in the news recommender of Het Financieele Dagblad without losing accuracy.

2️⃣ We present (more) empirical proof that #recsys (can) offer(s) users *more* diverse, serendipitous, and dynamic lists of articles, compared to editorially curated lists, and hence (can) help in *avoiding*, not creating filter bubbles!

3️⃣ It is the perfect wrap-up of our Google DNI-funded “SMART Journalism” project at FD Mediagroep (we wrote most of the paper in our spare time after the project ended).

See the video of the talk at UMAP 2020 below:


Improving automated segmentation of radio shows with audio embeddings published @ IEEE ICASSP2020

📅 February 12, 2020 • 🕐 11:54 • 🏷 Papers

Oberon Berlage’s MSc. thesis “Improving automated segmentation of radio shows with audio embeddings”, which he wrote under my supervision during his internship at FD Mediagroep, was awarded a 9/10, on the condition that the work was publishable.

Turns out it was, as it was recently accepted at IEEE ICASSP2020 (the 45th International Conference on Acoustics, Speech, and Signal Processing) without any additional work/experiments (just a bit of reduction). But you already knew this… Oberon will be presenting this work in Barcelona, thanks to the generous support of UvA’s Information Studies program.

We have now published a preprint, read it below:

[bibtex file=citations.bib key=berlage2020improving]

His work revolved around improving BNR SMART Radio’s text-based segmentation by incorporating audio signals in the form of audio embeddings. This turns out to improve over our text-based baseline by a whopping +32.3% F1-measure!

Even better: an audio-only approach, trained on a smallish openly available dataset, outperforms our text-only baseline by 9.4%. This means the segmentation method can be employed without need for audio transcription, which could be a money-saver.

Reading News with a Purpose: Explaining User Profiles for Self-Actualization

📅 April 10, 2019 • 🕐 11:15 • 🏷 Papers

Really excited to have co-authored “Reading News with a Purpose,” which was accepted at the International Workshop on Transparent Personalization Methods based on Heterogeneous Personal Data (ExHUM), at UMAP 2019!

With the largest list of authors (ranging from philosophers via polcomm researchers to computer scientists), from a wide array of institutions: Emily Sullivan, Dimitrios Bountouridis, Jaron Harambam, Shabnam Najafian, Felicia Loecherbach, Mykola Makhortykh, Domokos Kelen, Darcia Wilkinson, and Nava Tintarev!

This is work that came out of our ICT with Industry project “Opening the black box of user profiles in content-based recommender systems” where we (FD Mediagroep) collaborated with Nava Tintarev and our excellent team of academics in a week-long academic hackathon!

Read the pre-print, below:

[bibtex file=citations.bib key=sullivan2019reading]

Read the original idea that sparked the project, presented at the 2nd FATREC Workshop at RecSys 2018, here:

[bibtex file=citations.bib key=graus2018let]

Position paper ““Let Me Tell You Who You are” — Explaining Recommender Systems by Opening Black Box User Profiles”

📅 November 11, 2018 • 🕐 11:21 • 🏷 Papers

Our position paper ““Let Me Tell You Who You are” — Explaining Recommender Systems by Opening Black Box User Profiles” was accepted at the 2nd FATREC Workshop on Responsible Recommendation, held at RecSys ’18!

In this paper, we detail some of our ideas and approaches for providing transparency in recommendations by displaying the user profiles used ‘internally’ by our recommender system. Read the pre-print below!

[bibtex file=citations.bib key=graus2018let]

FATREC Position paper: Explaining recommender systems by opening black box user profiles

Pre-print of position paper “SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News”

📅 June 1, 2018 • 🕐 13:44 • 🏷 Papers

Our position paper “SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News” was accepted at the Algorithmic Personalization and News (APEN18) workshop, held at ICWSM ’18!

In this paper, we detail some of the ideas and opportunities of personalization in the domain of financial economic news. Read the pre-print below!

[bibtex file=citations.bib key=sappelli2018smart]

The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

📅 December 11, 2017 • 🕐 16:15 • 🏷 Papers

Our paper “The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams” was accepted for publication at JASIST (the Journal of the Association for Information Science and Technology)! Grab a pre-print here:

[bibtex file=citations.bib key=graus2018birth]

This paper is:
1. My first journal paper
2. Based on Chapter 3 of my PhD thesis “Entities of Interest — Discovery in Digital Traces”
3. The first collabo on a paper (on paper) between the FD Mediagroep, Blendle, and the UvA
4. The tombstone on my academic career! (?)

In this paper we study news and social media streams spanning over 18 months and comprising over 579 million documents, and analyze ‘emergence patterns’ of entities, i.e., how a real-world entity (such as a person, organization, or product) appears in these documents in the timespan between the entity’s first mention in online text streams and the moment an article devoted to the entity is added to Wikipedia.
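As a rough illustration of the timespan described above, the sketch below computes the number of days between an entity’s first mention and the day its Wikipedia article was added. The `emergence_days` helper and the dates are made up for the example; the paper’s actual pipeline is not shown here.

```python
from datetime import date


def emergence_days(mention_dates: list[date], wikipedia_added: date) -> int:
    """Days between the earliest mention and the Wikipedia article being added."""
    first_mention = min(mention_dates)
    return (wikipedia_added - first_mention).days


# Hypothetical mention dates for a single entity (not data from the paper).
mentions = [date(2012, 3, 14), date(2012, 1, 5), date(2012, 6, 30)]
print(emergence_days(mentions, date(2012, 9, 1)))
```

The mentions that fall inside this window are the documents whose patterns over time are analyzed.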