Joined the DDMA AI Committee

📅 October 30, 2020 • 🕐 10:33 • 🏷 Blog

I recently joined the Artificial Intelligence committee of the Data Driven Marketing Association (DDMA), which aims to promote the use of responsible AI for marketing and other forms of interaction with customers. Read the short introductory post published by the DDMA below (translated from Dutch).

David Graus, Randstad Groep Nederland, AI Committee: “AI offers enormous opportunities to break through filter bubbles and reduce bias”

David has a background in search engine technology, on which he has since built a career focused on developing personalization and recommender systems. In his current role, David leads the data scientists of Randstad Groep Nederland and is involved in every facet of building AI systems, from ideation to actually bringing systems into production and monitoring them. “From experience I know that, as an AI designer and builder, you have the means to set up personalization and recommender systems in an ethically responsible way from the ground up. AI therefore offers great opportunities to break through filter bubbles and reduce bias. As a member of the AI Committee, I hope to help the sector with this.”


Joined the #RecSys2021 organizing committee

📅 September 26, 2020 • 🕐 14:49 • 🏷 Blog

After attending the beautiful virtual 14th ACM Conference on Recommender Systems (RecSys2020), I am happy to start looking forward to RecSys2021, which will be held in Amsterdam!

I am super excited to share that I’ve joined the organizing committee of RecSys2021 as local outreach chair, which means I’ll assist the other chairs and link (local) industry and companies to the conference.

I’m looking forward to it! I have quite fond memories of co-organizing last year’s DIR 2019, and of helping out with the local organization of ECIR 2014 in Amsterdam.

ACM RecSys 2021: September 27 – October 1, in Amsterdam

Internships and MSc. projects at Randstad Groep Nederland

📅 July 6, 2020 • 🕐 13:28 • 🏷 Blog

Come join us in Diemen!

About Randstad

Work with impact. At Randstad Groep Nederland IT you keep the country moving, enabling people across sectors to do their work, getting pizza on your table and your suitcase on the plane. Your AI solutions mean tomorrow’s recruiter is smarter and faster but still embodies our human forward approach, combining tech with a personal touch and putting people first – including you. You will be constantly experimenting, working on new NLP use cases and matching systems, or expanding our self-service data platform. If you bring the idea, we will provide the freedom to explore, so you can help us shape the world of work.

Data Science @ RGN

Randstad IT is organized in a variation of the Spotify Engineering Model with squads, tribes, and chapters. Our data science chapter spans 12 data scientists, data engineers and machine learning engineers over 3 departments (IT, finance, and marketing), across 6 different teams. These teams work on recommender systems for algorithmic job matching, natural language processing and information extraction, forecasting, and more. We are further interested in AI fairness and auditing, explainability, and transparency.

Who are you?

We’re looking for students studying AI, data science, or related programs, for either graduation projects or regular internships. Fluency in Python is required, and we expect our interns to work autonomously. That said, as an intern you’ll be a fully fledged member of our chapter, which means you get to benefit from the knowledge that is shared within it.

Here’s the overview of our suggested projects:

  • (Deep) Reinforcement Learning-based Planning & Poolmanagement
  • Writing style transfer learning
  • Career pathing MVP
  • Pairwise learning to rank for SmartMatch
  • Revenue forecasting using time-series algorithms
  • Structured information extraction from resumes
  • Salary parsing from vacancies
  • Record linkage for company linking
  • Free text notes and comments for improved job matching

Joined the board of SETUP

📅 May 29, 2020 • 🕐 12:32 • 🏷 Blog

I have joined the board of SETUP, a Utrecht-based medialab established in 2010. SETUP’s mission is:

to educate a wide audience, providing them with the tools necessary to design this brave new world, and infuse it with human values and new-found creativity.


This mission perfectly fits my personal conviction that knowledge and understanding of technology through media and algorithmic literacy (not fear and repression) is vital in progressing into our technology-infused future! See, e.g., what I wrote on the neutrality of algorithms, or on “algorithmic literacy.”

Photo: Sebastiaan ter Burg for SETUP

Prior to joining their board, I had been following SETUP for a couple of years, joining some of their meetups, and giving a talk at one of their events in 2018, “leven met algoritmen” (“living with algorithms”). I am very excited to start as a board member and help set up SETUP’s future!

I have emerged…

📅 May 9, 2020 • 🕐 10:12 • 🏷 Blog

… as an entity in the Google Knowledge Graph!

Which is funny, because “emerging entities” were the main topic of my PhD thesis [1]. With my co-authors I’ve published research on:

  1. Learning how to recognize “out-of-knowledge base” entities emerging on social media [2]
  2. How our collective memory is formed through “emerging entities” on Wikipedia [3], and more generally
  3. Entity retrieval and ranking [4], where Google’s so-called “Knowledge Panels” often served as examples…

Google’s AI unleashes the long tail?

(FYI: I’m not sure how I ended up there; the metadata seems to be coming from Google Scholar)


[1] D. Graus, “Entities of interest – discovery in digital traces,” PhD thesis, Informatics Institute, University of Amsterdam, 2017.
[2] D. Graus, M. Tsagkias, L. Buitinck, and M. de Rijke, “Generating pseudo-ground truth for predicting new concepts in social streams,” in Advances in Information Retrieval (ECIR ’14), Cham: Springer International Publishing, 2014, pp. 286–298.
[3] D. Graus, D. Odijk, and M. de Rijke, “The birth of collective memories: analyzing emerging entities in text streams,” Journal of the Association for Information Science and Technology, vol. 69, no. 6, pp. 773–786, 2018. doi: 10.1002/asi.24004
[4] D. Graus, M. Tsagkias, W. Weerkamp, E. Meij, and M. de Rijke, “Dynamic collective entity representations for entity ranking,” in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16), San Francisco, California, USA. New York, NY, USA: Association for Computing Machinery, 2016, pp. 595–604. doi: 10.1145/2835776.2835819

“Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation” accepted at UMAP2020!

📅 April 21, 2020 • 🕐 17:33 • 🏷 Papers and Research

The paper we wrote with former FD teammates Feng Lu and Anca Dumitrache has been accepted for publication as a long paper at UMAP 2020, the 28th Conference on User Modeling, Adaptation and Personalization! (I fondly remember my last time at UMAP, in 2016.)

We have published a preprint of this paper; get it here, or from arXiv.

  • F. Lu, A. Dumitrache, and D. Graus, “Beyond optimizing for clicks: incorporating editorial values in news recommendation,” in Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (UMAP ’20), Genoa, Italy. New York, NY, USA: Association for Computing Machinery, 2020, pp. 145–153. doi: 10.1145/3340631.3394864

Update 08/05: Cool, @NickKivits mentioned our paper in his Villamedia column, “Het idee van de filterbubbel kan in de prullenbak” (“The idea of the filter bubble can go in the bin”), and in his newsletter (with over 11k subscribers!)

I am particularly happy with this work because:

1️⃣ In our paper we show how you can align algorithm design across stakeholders (in this case: data scientists and journalists), by effectively modeling an editorial value (“dynamicness”) in the news recommender of Het Financieele Dagblad without losing accuracy.

2️⃣ We present (more) empirical proof that #recsys (can) offer(s) users *more* diverse, serendipitous, and dynamic lists of articles, compared to editorially curated lists, and hence (can) help in *avoiding*, not creating filter bubbles!

3️⃣ It is the perfect wrap-up of our Google DNI-funded “SMART Journalism” project at FD Mediagroep (we wrote most of the paper in our spare time after the project ended).

See the video of the talk at UMAP 2020 below:


PodRecs: Workshop on Podcast Recommendations PC

📅 April 4, 2020 • 🕐 12:14 • 🏷 Research

I was invited to join the program committee of (the first) PodRecs: Workshop on Podcast Recommendations (to be held at RecSys’20).

Since our work on BNR SMART Radio, I am really interested in the space of audio, recommender systems, and information retrieval. Curious to see the submissions!

See the PodRecs call for papers, and check out the website, by clicking the image below.

Improving automated segmentation of radio shows with audio embeddings published @ IEEE ICASSP2020

📅 February 12, 2020 • 🕐 11:54 • 🏷 Papers

Oberon Berlage’s MSc. thesis, “Improving automated segmentation of radio shows with audio embeddings,” which he wrote under my supervision during his internship at FD Mediagroep, was awarded a 9/10, on the condition that the work was publishable.

Turns out it was, as it was recently accepted at IEEE ICASSP2020 (the 45th International Conference on Acoustics, Speech, and Signal Processing) without any additional work/experiments (just a bit of reduction). But you already knew this… Oberon will be presenting this work in Barcelona, thanks to the generous support of UvA’s Information Studies program.

We have now published a preprint; read it below:

  • O. Berlage, K. Lux, and D. Graus, “Improving automated segmentation of radio shows with audio embeddings,” in ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 751–755.

His work revolved around improving BNR SMART Radio’s text-based segmentation by incorporating audio signals in the form of audio embeddings. This turns out to improve over our text-based baseline by a whopping +32.3% in F1-measure!

Even better: an audio-only approach, trained on a smallish openly available dataset, outperforms our text-only baseline by 9.4%. This means the segmentation method can be employed without need for audio transcription, which could be a money-saver.

Panel @ CPDP2020: "Algorithms and AI-driven technologies in the information society"

📅 February 4, 2020 • 🕐 10:03 • 🏷 Blog

I was invited by UvA’s Information, Communication and the Data Society (ICDS) to participate in a panel at the Conference on Privacy and Data Protection, which was focused on AI.

The recording of the panel is now online; watch me tell a room full of (highly) privacy-aware (and cookie-averse) people that Cambridge Analytica nudging people to “politically activate them” with tailored information can be a “democratic good” 😅.

See the recording below:

For more information, see CPDP’s page of the panel.