Debate at De Balie: ‘The power of algorithms, how algorithms shape our lives’

On Sunday, June 18th I will participate in a debate at De Balie on the power of algorithms, along with a very nice lineup: Wouter van Noort, Naomi Jacobs, Marjolein Lanzing, Rutger Rienks, and Hans de Zwart.

For more information (and tickets), see: De macht van data, De Balie.

My PhD Thesis “Entities of Interest — Discovery in Digital Traces” is online!

My PhD thesis, Entities of Interest — Discovery in Digital Traces is now available for download. Click on the cover below to head to graus.nu/entities-of-interest and grab your electronic copy of the little booklet that took me 4+ years to write!

Panel discussion on Data & Democracy

On Tuesday May 9th I will participate in a panel discussion on Data & Democracy, which will revolve around the impact of (big) data (mining), profiling, and political micro-targeting on the future of politics and campaigning. Data & Democracy is organized by the Personalised Communication group (a joint effort between UvA’s Communication Science and Information Law groups). See this article (in Dutch) and the flyer (below) for more information!

Keynote on Big Data, Machine Learning, and Algorithmic Bias at the Royal Marechaussee


I was invited to give the opening keynote at the Intelligence Day of the Koninklijke Marechaussee (Military Police) on Big Data and Machine Learning, with the aim of explaining to the audience what machine learning and big data are.

I spent a disproportionate amount of time on Algorithmic Bias, because I think this is a hugely important topic — in particular for this audience! See the slides of my talk (in Dutch) below, or on slideshare:

Interview in Tekstblad: “Searching for truth in 11.5M documents”

Tekstblad (a magazine for text professionals) features an interview with Hans Henseler and me on the “Semantic Search for E-Discovery” project I have been involved with during my PhD. The title loosely translates to “Searching for truth in 11.5M documents.” (Click the image for the PDF.)

Click for the PDF

James Chen Best Student Paper Award at UMAP 2016

Our paper,

  • [PDF] D. Graus, P. N. Bennett, R. W. White, and E. Horvitz, “Analyzing and predicting task reminders,” in The 24th conference on user modeling, adaptation and personalization, 2016.
    [Bibtex]
    @inproceedings{graus2016analyzing,
    title={Analyzing and Predicting Task Reminders},
    author={Graus, David and Bennett, Paul N and White, Ryen W and Horvitz, Eric},
    booktitle={The 24th Conference on User Modeling, Adaptation and Personalization},
    year={2016},
    organization={ACM}
    }

was awarded the Best Student Paper Award at UMAP 2016!

Me receiving the award during the UMAP banquet dinner at the Marriott Harbourfront Hotel, Halifax. Photo by Denis Parra.

Algorithms aren’t neutral. And that’s a good thing.

Below is an article I wrote with Maarten de Rijke, which was published in nrc.next and NRC Handelsblad under a somewhat misleading title (which wasn’t ours). I cleaned up a Google Translate translation of this article. The translation is far from perfect, but I believe it gets the main point across. You can read the original article in Blendle (for €0.29) or on NRC.nl (for free).

The article in NRC

A Google image search for “three black teens” resulted in mugshot photos, while a search for “three white teens” yielded stock photos of happy, smiling youth. Commotion everywhere, and not for the first time. The alleged lack of neutrality of algorithms is a controversial topic. In this controversy, the voice of computer scientists is hardly ever heard. And to have a meaningful discussion on the topic, it is important to understand the underlying technologies.

Our contention, as computer scientists: the lack of neutrality is both necessary and desirable. It is what enables search and recommendation systems to provide us access to huge amounts of information, and let us discover new music or movies. With objective, neutral algorithms, we wouldn’t be able to find anything anymore.

There are two reasons for this. First, the “usefulness” of information is personal and context-dependent. The quality of a movie recommendation from Netflix, the interestingness of a Facebook post, even the usefulness of a Google search result, varies per person and context. Without contextual information, such as the user’s location, the time, or the task the user is performing, even experts do not reach agreement on the usefulness of a search result.

Second, search and recommendation systems have to give us access to enormous quantities of information. Deciding what (not) to display, the filtering of information, is a necessity. The alternative would be a “Facebook” which shows thousands of new messages every single day, making each new visit show a completely new deluge of posts. Or a Netflix which recommends only random movies, so that you can no longer find the movies you really care about.

In short, search and recommendation systems have to be subjective, context-dependent, and adapted to ourselves. They learn this subjectivity and lack of neutrality, from us, their users. The results of these systems are thus a reflection of ourselves, our preferences, attitudes, opinions and behavior. Never an absolute truth.

The idea of an algorithm as a static set of instructions carried out by a machine is misleading. In the context of, for example, Facebook’s news feed, Google’s search results, or Netflix’s recommendations, a machine is not told what to do, but told to learn what to do. The systems learn from subjective sources: ourselves, our preferences, our interaction behavior. Learning from subjective sources naturally yields subjective outcomes.

To choose what results to show, a search and recommendation system learns to predict the user’s preferences or taste. To do this, it does what computers do best: counting things. By keeping track of the likes a post receives, or the post’s reading time, the system is able to measure various characteristics of a post. Likes or reading-time are just two examples: in reality, hundreds of attributes are included.
To then learn what is useful for an individual user, the system must determine which features of posts the user considers important. Essential here is to determine the effectiveness of the information displayed. For this, the system gets a goal, such as making sure the user spends more time on the site.
By showing messages with different characteristics (more or fewer likes, longer or shorter reading times), and keeping track of how long or how often the user visits the site, the system can learn which message characteristics make people spend more time on the website. Things that are simple to measure (clicks, likes, or reading time) are used to bring about more profound changes in user behavior (long-term engagement). Furthermore, research has shown that following personalized recommendations eventually leads users to a wider range of choices, and a higher appreciation of the content they consume.
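To make this concrete, here is a minimal, illustrative sketch of the kind of learning described above: a model that learns, from simple post characteristics, which posts keep a user on the site. The features, the toy data, and the use of scikit-learn are my own assumptions for this blog post, not how Facebook or Netflix actually implement their systems.

    # Toy sketch: learn which post characteristics predict engagement.
    # Features, data, and model choice are illustrative assumptions only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row is one post shown to a user: [number of likes, reading time in seconds]
    shown_posts = np.array([
        [120, 45],
        [3, 10],
        [87, 60],
        [5, 8],
        [200, 30],
        [1, 5],
    ])
    # 1 = the user spent more time on the site after seeing this post, 0 = they left
    stayed_longer = np.array([1, 0, 1, 0, 1, 0])

    model = LogisticRegression().fit(shown_posts, stayed_longer)

    # The learned weights encode the system's (subjective) notion of which
    # characteristics matter; new posts are ranked by their predicted engagement.
    candidate_posts = np.array([[150, 40], [2, 12]])
    print(model.predict_proba(candidate_posts)[:, 1])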

The success of modern search and recommendation systems largely results from their lack of neutrality. We should consider these systems as “personalized information intermediaries.” Just like traditional intermediaries (journalists, doctors, opinion leaders), they provide a point of view by filtering and ranking information. And just like traditional intermediaries, it would be wise to seek a second or third opinion when it really matters.

Improving User Productivity with Automated Personal Assistants: Analyzing and Predicting Task Reminders

Update (16 Jul): This paper was awarded the James Chen Best Student Paper Award at UMAP!
Cortana on Windows Phone (source: The Verge)

Automated personal assistants such as Google Now, Microsoft Cortana, Siri, M, and Echo aid users in productivity-related tasks, e.g., planning, scheduling, and reminding them of tasks or activities. In this paper we study one such feature of Microsoft Cortana: user-created reminders. Reminders are particularly interesting as they represent the tasks that people are likely to forget. Analyzing and better understanding the nature of these tasks could prove useful for inferring the user’s availability, and could aid in developing systems that automatically terminate ongoing tasks, allocate time for task completion, or proactively suggest (follow-up) tasks.

  • [PDF] D. Graus, P. N. Bennett, R. W. White, and E. Horvitz, “Analyzing and predicting task reminders,” in The 24th conference on user modeling, adaptation and personalization, 2016.
    [Bibtex]
    @inproceedings{graus2016analyzing,
    title={Analyzing and Predicting Task Reminders},
    author={Graus, David and Bennett, Paul N and White, Ryen W and Horvitz, Eric},
    booktitle={The 24th Conference on User Modeling, Adaptation and Personalization},
    year={2016},
    organization={ACM}
    }

Prospective memory

Studying things that people tend to forget has a rich history in the field of social psychology. This type of memory is called “prospective memory” (or, more poetically, “Remembrance of Things Future”). One challenge in studying PM is that it is hard to simulate in a lab study (the hammer of choice for social psychologists). For this reason, most studies of PM have been restricted to “event-based” PM, i.e., memories triggered by an event, modeled in the lab by having a subject perform a mundane task and do a special thing when triggered by an event. Furthermore, the focus in these studies has largely been on retention and retrieval performance of “artificial” memories: subjects were typically given an artificial task to perform. Little is known about the type and nature of actual, real-world, “self-generated” tasks.

Enter Cortana. The user logs we study in this paper represent a rich collection of real-life, actual, self-generated, time-based PM instances, collected in the wild. Studying them in aggregate allows us to better understand the type of tasks that people remind themselves about.

Big data

(Yes, sorry, that heading really says big data…) 
As the loyal reader may have guessed, this paper is the result of my internship at Microsoft Research last summer, and one of the (many) advantages of working at Microsoft Research is access to big and beautiful (and otherwise restricted) data. In this paper we analyze 576,080 reminders, issued by 92,264 people over a period of two months (and we later run prediction experiments on 1.5M+ reminders over a six-month period). Note that this is a filtered set of reminders (among other things, we restrict to a smaller geographic area, and we removed all users that only issued a few reminders). Furthermore, when analyzing particular patterns, we filter the data to patterns commonly observed across multiple users, to study behavior in aggregate and further preserve user privacy: we are not looking at users’ behavior at the individual level, but across a large population, to uncover broad and more general patterns.

So what do we do with these reminders? The paper consists of three main parts:
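As an aside, the aggregation filter described above boils down to keeping only reminder texts that occur across many distinct users. A minimal sketch of that idea follows, with hypothetical column names and an arbitrary threshold; the paper’s actual filtering is more involved.

    # Keep only reminder texts issued by at least MIN_DISTINCT_USERS distinct users,
    # so the analysis reflects population-level patterns rather than individuals.
    import pandas as pd

    reminders = pd.DataFrame({
        "user_id": [1, 2, 3, 4, 1, 5],
        "text": ["call mom", "call mom", "call mom", "pay rent", "pay rent", "feed the cat"],
    })

    MIN_DISTINCT_USERS = 3  # hypothetical threshold
    users_per_text = reminders.groupby("text")["user_id"].nunique()
    common_texts = users_per_text[users_per_text >= MIN_DISTINCT_USERS].index

    common_reminders = reminders[reminders["text"].isin(common_texts)]
    print(common_reminders)  # only "call mom" survives in this toy example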


1. Task type taxonomy: First, we aim to identify common types of tasks that underlie reminder setting, by studying the most common reminders found in the logs. This analysis is partly data-driven and partly qualitative; as we are interested in ‘global usage patterns,’ we extract common reminders, defined as reminders that are seen across many users and that contain a common ‘action’ or verb. We do so by identifying the most common verb phrases (and find 52 verbs that cover ~61% of the reminders in our logs), and proceed by manually labeling them into categories.
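For a rough idea of the verb-extraction step, here is a toy sketch that pulls the main verb out of each reminder text and counts how often each verb occurs. The paper’s actual phrase-extraction pipeline differs in detail, and the use of spaCy here is my own assumption.

    # Toy sketch: extract the main verb of each reminder and count verb frequencies.
    from collections import Counter
    import spacy

    nlp = spacy.load("en_core_web_sm")

    reminders = ["call mom tomorrow", "pay the rent", "send the report to Alice", "call the dentist"]

    verb_counts = Counter()
    for text in reminders:
        doc = nlp(text)
        verbs = [tok.lemma_ for tok in doc if tok.pos_ == "VERB"]
        if verbs:
            verb_counts[verbs[0]] += 1  # take the first verb as the reminder's 'action'

    print(verb_counts.most_common())  # e.g., [('call', 2), ('pay', 1), ('send', 1)]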

2. Temporal patterns: Next, we study temporal patterns of reminders, by looking at correlations between reminder creation and notification times, and at temporal patterns for the terms in the reminder descriptions. We study two aspects of these temporal patterns: patterns in when we create and execute reminders (as a proxy for when people typically tend to think about and execute certain tasks), and the duration of the delay between a reminder’s creation and its notification (as a proxy for how far in advance we tend to plan different things).
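The two temporal views above amount to straightforward timestamp arithmetic. A minimal sketch on made-up data, with hypothetical column names (the log schema in the paper is of course different):

    # Hour-of-day patterns for creation vs. notification, and the creation-to-notification delay.
    import pandas as pd

    df = pd.DataFrame({
        "created": pd.to_datetime(["2016-03-01 22:10", "2016-03-01 21:45", "2016-03-02 09:00"]),
        "notified": pd.to_datetime(["2016-03-02 08:00", "2016-03-03 17:30", "2016-03-02 12:00"]),
    })

    # When do people create vs. execute reminders?
    print(df["created"].dt.hour.value_counts().sort_index())
    print(df["notified"].dt.hour.value_counts().sort_index())

    # How far in advance do people plan?
    df["delay_hours"] = (df["notified"] - df["created"]).dt.total_seconds() / 3600
    print(df["delay_hours"].describe())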


3. Predict! Finally, we show how the patterns we identify above generalize, by addressing the task of predicting the day on which a reminder is likely to trigger, given its creation time and its description (i.e., its terms). Understanding when people tend to perform certain tasks could be useful for better supporting users in the reminder process, including allocating time for task completion or proactively suggesting reminder notification times, but also for understanding behavior at scale by looking at patterns in reminder types.
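To illustrate the flavor of this prediction task, here is a toy sketch that trains a simple bag-of-words classifier to map a reminder’s terms to the day it triggers. The labels below are fabricated for illustration, and the paper’s features (which also include creation time) and models are considerably richer.

    # Toy sketch: predict the notification day from the reminder's terms.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["call mom", "pay rent", "go to church", "submit report", "go to the gym"]
    notification_day = ["Saturday", "Monday", "Sunday", "Friday", "Monday"]  # made-up toy labels

    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, notification_day)

    print(model.predict(["call the dentist"]))  # predicted trigger day for a new reminder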

Findings

As always, no exhaustive point-by-point summary of the paper here; straight into some of our findings (there’s much more in the paper):

  • We tend to plan for things (i.e., set reminders) at the end of the day, and execute them (i.e., reminders trigger) throughout the day, which suggests the end of the day is a natural moment for people to reflect on the tasks that need to be carried out.
  • The types of things we remind ourselves about are mostly short-term, immediate tasks, such as performing daily chores.
  • People are more likely to call their mom, and email their dad.

Want to know more? See the taxonomy? See more pretty plots? Look at some equations? Learn how this could improve intelligent assistants? Read the paper!

  • [PDF] D. Graus, P. N. Bennett, R. W. White, and E. Horvitz, “Analyzing and predicting task reminders,” in The 24th conference on user modeling, adaptation and personalization, 2016.
    [Bibtex]
    @inproceedings{graus2016analyzing,
    title={Analyzing and Predicting Task Reminders},
    author={Graus, David and Bennett, Paul N and White, Ryen W and Horvitz, Eric},
    booktitle={The 24th Conference on User Modeling, Adaptation and Personalization},
    year={2016},
    organization={ACM}
    }