The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

Our paper “The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams” was accepted for publication at JASIST (the Journal of the Association for Information Science and Technology)! Grab a pre-print here:

  • [PDF] D. Graus, D. Odijk, and M. de Rijke, “The birth of collective memories: analyzing emerging entities in text streams,” Journal of the Association for Information Science and Technology, 2018.

This paper is:
1. My first journal paper
2. Based on Chapter 3 of my PhD thesis “Entities of Interest — Discovery in Digital Traces”
3. The first collabo on a paper (on paper) between the FD Mediagroep, Blendle, and the UvA
4. The tombstone on my academic career! (?)

In this paper we study news and social media streams that span over 18 months and comprise over 579 million documents. We analyze ‘emergence patterns’ of entities, i.e., how a real-world entity (such as a person, organization, or product) appears in these documents in the timespan between the entity’s first mention in online text streams and the moment an article devoted to the entity is added to Wikipedia.


Financial News Mining @ FD Mediagroep/ Slidedeck

Here are the slides of a talk I gave at the Data Science Northeast Netherlands Meetup, where I detail the custom in-house entity linking framework, sentiment analysis, and entity salience scoring model we developed for (part of FD Mediagroep), in addition to showing some example applications of our corpus of news articles linked to organization profiles.

I’m sharing it here because I think it’s cool, since it’s one of the first projects I’ve done at! It gives you some idea of what we’re working on.

In “Denktank” on algorithms, behavioral analysis, and personalization

My debut on national TV ;-)! Denktank is a TV show where youngsters explore and think about how (current day) technology will affect them in the future. In this episode I explain some of the mechanisms behind algorithmic personalization.

Stream the episode at (the part with me starts at about 05:00), or see the website of Human for more information on the episode.

Hosted 8th Recsys Amsterdam Meetup

On Thursday 19 October, I had the pleasure of hosting the 8th Recommender Systems Amsterdam meetup at FDMG/ The meetup’s theme was media-content recsys, and we had three talks from industry, dealing with recommending TV programs, music videos, and text articles:

  1. Ghida Ibrahim (Senior Data Scientist, (formerly at) Liberty Global): “Recommender systems for video and TV products”
  2. Bouke Huurnink and Roman Ivanov (XITE): “Music Video Recommendation@XITE”
  3. Robbert van der Pluijm (Head of Bibblio Labs, Bibblio): “Scaling a recommendation service – a threefold story”

A small blog post was written about the meetup (in Dutch); check it out here: Meetup: het succes van algoritmen en systemen voor personalisatie en aanbevelingen

Featured in FD on the value of (personal) data

In today’s edition of Het Financieele Dagblad, I am quoted in an article on the value of (personal) data titled “Wanneer je gegevens geld waard zijn” (“When your data is worth money”):

The knowledge gathered with those cookies is then sold to dozens more companies, which can use it to target their advertising messages. ‘Wherever you go on the web, you always leave digital traces behind’, says David Graus, who obtained his PhD on this subject at the University of Amsterdam two weeks ago. ‘From all those traces, the companies predict your behavior, and based on that they place an advertisement.’ […]

The possibilities with data go further, Graus argues. Suppose that, based on the behavior of friends, family members, likes, posts, and search queries, it is concluded that you smoke, even though you never indicated that yourself. ‘That way you give away privacy’, says Graus.

Read the full article here.

Debate at De Balie: ‘The power of algorithms, how algorithms shape our lives’

Update: see the opening talk I gave on what algorithms are, here: Mini-college “Hoe algoritmen ons leven vormgeven”

On Sunday, June 18th I will participate in a debate at De Balie on the power of algorithms, along with (a very nice lineup:) Wouter van Noort, Naomi Jacobs, Marjolein Lanzing, Rutger Rienks, and Hans de Zwart.

For more information (and tickets), see: De macht van data, De Balie.

My PhD Thesis “Entities of Interest — Discovery in Digital Traces” is online!

My PhD thesis, “Entities of Interest — Discovery in Digital Traces”, is now available for download. Click on the cover below to head to and grab your electronic copy of the little booklet that took me 4+ years to write!

Panel discussion on Data & Democracy

On Tuesday May 9th I will participate in a panel discussion on Data & Democracy, which will revolve around the impact of (big) data (mining), profiling, and political micro-targeting on politics and campaigning of the future. Data & Democracy is organized by the Personalised Communication group (a joint effort between UvA’s Communication Science & Information Law groups). See this article (in Dutch) and the flyer (below) for more information!