Improving User Productivity with Automated Personal Assistants: Analyzing and Predicting Task Reminders

Update (16/07): This paper was awarded the James Chen Best Student Paper Award at UMAP!

Cortana on Windows Phone (source: The Verge)

Automated personal assistants such as Google Now, Microsoft Cortana, Siri, M and Echo aid users in productivity-related tasks, e.g., planning, scheduling and reminding tasks or activities. In this paper we study one such feature of Microsoft Cortana: user-created reminders. Reminders are particularly interesting as they represent the tasks that people are likely to forget. Analyzing and better understanding the nature of these tasks could prove useful in inferring the user’s availability, aid in developing systems to automatically terminate ongoing tasks, allocate time for task completion, or pro-actively suggest (follow-up) tasks.

  • [PDF] [DOI] D. Graus, P. N. Bennett, R. W. White, and E. Horvitz, “Analyzing and predicting task reminders,” in Proceedings of the 2016 conference on user modeling adaptation and personalization, New York, NY, USA, 2016, p. 7–15.
    [Bibtex]
    @inproceedings{graus2016analyzing,
    author = {Graus, David and Bennett, Paul N. and White, Ryen W. and Horvitz, Eric},
    title = {Analyzing and Predicting Task Reminders},
    year = {2016},
    isbn = {9781450343688},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/2930238.2930239},
    doi = {10.1145/2930238.2930239},
    booktitle = {Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization},
    pages = {7–15},
    numpages = {9},
    keywords = {prospective memory, reminders, log studies, intelligent assistant},
    location = {Halifax, Nova Scotia, Canada},
    series = {UMAP '16}
    }

Prospective memory

Studying the things people tend to forget has a rich history in the field of social psychology. This type of memory is called “prospective memory” (PM), or, more poetically, “Remembrance of Things Future.” One challenge in studying PM is that it’s hard to simulate in a lab study (the hammer of choice for social psychologists). For this reason, most studies of PM have been restricted to “event-based” PM, i.e., memories triggered by an event, modeled in the lab by having subjects perform a mundane task and carry out a special action upon being triggered by some event. Furthermore, the focus in these studies has largely been on retention and retrieval performance of “artificial” memories: subjects were typically given an artificial task to perform. Little is known about the type and nature of actual, real-world, “self-generated” tasks.

Enter Cortana. The user logs we study in this paper represent a rich collection of real-life, actual, self-generated, time-based PM instances, collected in the wild. Studying them in aggregate allows us to better understand the type of tasks that people remind themselves about.

Big data

(Yes, sorry, that heading really says big data…) 
As the loyal reader may have guessed, this paper is the result of my internship at Microsoft Research last summer, and one of the (many) advantages of working at Microsoft Research is the access to big and beautiful (restricted) data. In this paper we analyze 576,080 reminders, issued by 92,264 people over a period of two months (and we later run prediction experiments on 1.5M+ reminders over a six-month period). Note that this is a filtered set of reminders (among other things, we restrict to a smaller geographic area, and we removed all users who issued only a few reminders). Furthermore, when analyzing particular patterns, we filter the data to patterns commonly observed across multiple users, both to study behavior in aggregate and to further preserve user privacy: we do not look at behavior at the individual level, but across a large population, to uncover broad, more general patterns.

So what do we do with these reminders? The paper consists of three main parts:

[Figure: task type taxonomy]
[Figure: temporal patterns]

1. Task type taxonomy: First, we aim to identify common types of tasks that underlie reminder setting, by studying the most common reminders found in the logs. This analysis is partly data-driven and partly qualitative; as we are interested in ‘global usage patterns,’ we extract common reminders, defined as reminders that are seen across many users and that contain a common ‘action’ or verb. We do so by identifying the most common verb phrases (finding 52 verbs that cover ~61% of the reminders in our logs), and proceed by manually grouping them into categories.
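The extraction step above can be sketched roughly as follows. This is a simplified stand-in, not the paper's actual pipeline: the reminder texts are hypothetical (the real logs are private), and taking the first token as the action verb is a naive heuristic that ignores proper verb-phrase parsing.

```python
from collections import Counter

def leading_verb(reminder):
    """Naive heuristic: take the first token as the task's action verb.
    Reminders tend to start with an imperative ("call mom", "pay rent")."""
    tokens = reminder.lower().split()
    return tokens[0] if tokens else None

# Hypothetical reminder texts, standing in for the (private) Cortana logs.
reminders = [
    "call mom tonight",
    "call the dentist",
    "pay rent",
    "pay the electricity bill",
    "email dad",
]

# Count how often each action verb occurs across reminders; the most
# frequent verbs would then be manually labeled into task-type categories.
verb_counts = Counter(leading_verb(r) for r in reminders)
```

On the real logs, the analogous counts over verb phrases are what surface the 52 verbs covering ~61% of reminders.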

2. Temporal patterns: Next, we study temporal patterns of reminders, by looking at correlations between reminder creation and notification, and at temporal patterns in the terms of the reminder descriptions. We study two aspects of these patterns: when we create and execute reminders (as a proxy for when people typically tend to think about and execute certain tasks), and the duration of the delay between a reminder’s creation and notification (as a proxy for how “far in advance” we tend to plan different things).
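The two quantities above are straightforward to compute per reminder; a minimal sketch, using made-up creation/notification timestamps:

```python
from datetime import datetime

def reminder_delay_hours(created, notifies):
    """Delay between creating a reminder and its notification firing,
    a proxy for how far in advance the task was planned."""
    return (notifies - created).total_seconds() / 3600

# Hypothetical (creation, notification) pairs.
pairs = [
    (datetime(2016, 5, 2, 22, 30), datetime(2016, 5, 3, 9, 0)),   # evening -> next morning
    (datetime(2016, 5, 2, 14, 0),  datetime(2016, 5, 2, 15, 0)),  # same-day, short-term
]

delays = [reminder_delay_hours(c, n) for c, n in pairs]
# A histogram over creation hours (aggregated over many users) is what
# reveals the end-of-day planning peak described in the findings below.
creation_hours = [c.hour for c, _ in pairs]
```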

[Figure: prediction task]

3. Predict! Finally, we show how the patterns we identify above generalize, by addressing the task of predicting the day on which a reminder is likely to trigger, given its creation time and the reminder description (i.e., its terms). Understanding when people tend to perform certain tasks could be useful for better supporting users in the reminder process, including allocating time for task completion or proactively suggesting reminder notification times, but also for understanding behavior at scale by looking at patterns in reminder types.
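To make the prediction task concrete, here is a crude stand-in (not the model from the paper): a smoothed naive-Bayes-style scorer over reminder terms, trained on a handful of hypothetical (terms, notification-day) pairs.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (reminder terms, day-of-week the reminder fired).
train = [
    ("pay rent", "Fri"),
    ("pay bills", "Fri"),
    ("call mom", "Sun"),
    ("call grandma", "Sun"),
    ("take out trash", "Mon"),
]

day_counts = Counter(day for _, day in train)      # reminders per day (prior)
day_tokens = Counter()                             # term tokens per day
term_day = defaultdict(Counter)                    # term -> per-day counts
for text, day in train:
    terms = text.split()
    day_tokens[day] += len(terms)
    for term in terms:
        term_day[term][day] += 1

def predict_day(text, alpha=1.0):
    """Pick the day maximizing a Laplace-smoothed log-likelihood of the terms."""
    vocab = len(term_day)
    scores = {}
    for day in day_counts:
        score = math.log(day_counts[day] / len(train))  # log prior
        for term in text.split():
            tf = term_day[term][day]
            score += math.log((tf + alpha) / (day_tokens[day] + alpha * vocab))
        scores[day] = score
    return max(scores, key=scores.get)
```

The paper's actual model also uses the creation time and richer features; this sketch only illustrates how terms alone already carry a temporal signal.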

Findings

As always, no exhaustive summary of the paper point-by-point here, straight into some of our findings (there’s much more in the paper):

  • We tend to plan for things (i.e., set reminders) at the end of day, and execute them (i.e., reminders trigger) throughout the day, which suggests the end of day is a natural moment for people to reflect upon the tasks that need to be carried out.
  • The types of things we remind ourselves about are mostly short-term, immediate tasks, such as performing daily chores.
  • People are more likely to call their mom, and email their dad.

Want to know more? See the taxonomy? See more pretty plots? Look at some equations? Learn how this could improve intelligent assistants? Read the paper!


Apache Spark tutorial @ SURF

A few weeks ago, I participated in an Apache Spark workshop at SURF. As part of SURF’s annual report, I was interviewed to comment on the workshop and talk a bit about my own research. SURF recently published the article: read it here! A small excerpt below:

One of the participants was David Graus, who conducts research in the field of digital forensics. As a PhD student, he is involved with the ‘Semantic Search in E-Discovery’ project, which was set up in collaboration with the Netherlands Forensic Institute (NFI) and the Dutch Fraud Squad (FIOD).

‘We develop programs or algorithms that help analysts search for digital evidence in documents,’ says David. ‘Analysts are currently doing a lot of this manually, by getting hold of a number of computers, for example, and browsing through the files. They look at the email files to find out who was in contact with whom, and what they discussed in their emails. We are trying to automate this process.’

Dynamic Collective Entity Representations for Entity Ranking

Read a pre-print of our paper below:

  • [PDF] [DOI] D. Graus, M. Tsagkias, W. Weerkamp, E. Meij, and M. de Rijke, “Dynamic collective entity representations for entity ranking,” in Proceedings of the ninth acm international conference on web search and data mining, New York, NY, USA, 2016, p. 595–604.
    [Bibtex]
    @inproceedings{graus2016dynamic,
    author = {Graus, David and Tsagkias, Manos and Weerkamp, Wouter and Meij, Edgar and de Rijke, Maarten},
    title = {Dynamic Collective Entity Representations for Entity Ranking},
    year = {2016},
    isbn = {9781450337168},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/2835776.2835819},
    doi = {10.1145/2835776.2835819},
    booktitle = {Proceedings of the Ninth ACM International Conference on Web Search and Data Mining},
    pages = {595–604},
    numpages = {10},
    keywords = {fielded retrieval, entity retrieval, entity ranking, content representation},
    location = {San Francisco, California, USA},
    series = {WSDM '16}
    }

Entity search

In our latest paper we study the problem of entity ranking. In search engines, people often search for entities: real-life “things” (people, places, companies, movies, etc.). Google, Bing, Yahoo, DuckDuckGo: all big web search engines cater to this type of information need by displaying knowledge panels (they go by many names), little snippets that show a summary of information related to an entity. You’ve seen this before, but if you haven’t, see the picture below:

Searching for Kendrick Lamar using his former stage-name "k.dot" (knowledge panel on the right).

Vocabulary mismatch

One challenge in giving people the entities they search for is vocabulary mismatch: people use many different ways to search for entities. Well-formed queries like “Kendrick Lamar” may account for a large share, but just as easily you’ll find people searching for “k.dot,” or issuing more abstract, descriptive queries when they do not exactly remember the name of who they are looking for.

Another example is when events unfold in the real world, e.g., Michael Brown being killed by cops in Ferguson. As soon as this happens, and news media starts reporting it, people may start looking for relevant entities (Ferguson) by searching for previously unassociated words, e.g., “police shooting missouri.”

A final example (also in our paper) is shown below. The entity Anthropornis has a short and matter-of-fact description on Wikipedia (it is a stub):

[Screenshot: the Anthropornis Wikipedia stub]

But on Twitter, Brody Brooks refers to this particular species of penguin in the following way:

While putting profanity in research papers is not greatly appreciated, this tweet does illustrate our point: people do refer to entities in different (and rich!) ways. The underlying idea of our method is to leverage this for free, to close the gap between the vocabulary of people, and the (formal) language of the Knowledge Base. More specifically, the idea is to enable search engines to automagically incorporate changes in search behavior for entities (“police shooting + ferguson”), and different ways in how people refer to entities (bad penguins).

Main idea

So how? We propose to “expand” entity descriptions by mining content from the web; that is, we add words to entity documents to make the documents easier to retrieve. We collect these words from tweets, social tags, web anchors (the text of links on webpages), and search engine queries, all of which are somehow associated with entities. So in the case of our Anthropornis example, the next time someone searches for the “baddest penguin there ever was,” Anthropornis will be ranked higher.
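Conceptually, the expanded entity is just a fielded document that keeps the static KB description alongside per-source expansion fields that grow over time. A minimal sketch (the field names and example texts are illustrative, not the paper's exact schema):

```python
# Fielded entity representation: the static KB description plus expansion
# fields that collect externally observed text over time.
entity = {
    "id": "Anthropornis",
    "description": ["anthropornis", "genus", "giant", "penguin"],  # from the KB
    "tweets": [],
    "anchors": [],
    "queries": [],
    "tags": [],
}

def expand(entity, field, text):
    """Append externally mined terms to one of the entity's expansion fields."""
    entity[field].extend(text.lower().split())

# Hypothetical expansions arriving over time.
expand(entity, "tweets", "baddest penguin there ever was")
expand(entity, "queries", "giant prehistoric penguin")
```

A query like “baddest penguin” now matches terms in the tweets field even though the KB description never contained them.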

This type of method (document expansion) has been studied before, but two things set our setting apart from previous work:

  1. We study our method in a dynamic scenario, i.e., we want to see how external descriptions affect the rankings in (near) real time: what happens if people start tweeting away about an entity? How do you make sure the entity description doesn’t get swamped with additional content?
  2. We combine a large number of different description sources, which allows us to study differences between the signals (tags, tweets, queries, web anchors). Are different sources complementary? Is there redundancy across sources? Which type of source is most effective? Etc.
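Scoring against such a multi-source representation can be sketched as a weighted sum of per-field term matches. This is a bare-bones stand-in for the fielded retrieval model in the paper (no length normalization, no IDF), and the weights below are invented, standing in for what the learned ranker would assign:

```python
# Invented per-field weights, standing in for learned ranker weights.
weights = {"description": 1.0, "tweets": 0.3, "queries": 0.6, "anchors": 0.5, "tags": 0.4}

def score(query, entity, weights):
    """Weighted sum of per-field query-term frequencies."""
    q_terms = query.lower().split()
    total = 0.0
    for field, w in weights.items():
        total += w * sum(entity.get(field, []).count(t) for t in q_terms)
    return total

kb_only = {"description": ["anthropornis", "giant", "penguin"]}
expanded = {"description": ["anthropornis", "giant", "penguin"],
            "tweets": ["baddest", "penguin", "there", "ever", "was"]}

# The expanded representation matches more query terms, so it scores higher.
assert score("baddest penguin", expanded, weights) > score("baddest penguin", kb_only, weights)
```

Giving each source its own weight is what lets the ranker discount noisy sources (e.g., tweets) relative to the curated KB description, and periodically relearning those weights is what the retraining experiments below are about.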

Main findings

[Figure: results plot]

As usual, I won’t go into the nitty gritty details of our experimental setup, modeling and results in this post. Read the paper for that (actually, the experimental setup details are quite nitty and gritty in this case). Let’s cut to the chase: adding external descriptions to your entity representation improves entity ranking effectiveness (badum-tss)!

Furthermore, it is important to assign individual weights to the different sources, as the sources vary a lot in terms of content (tweets and queries differ in length, quality, etc.). The number of expansions also varies across entities (popular entities may receive many expansions, while less popular entities may receive none). To balance this, we inform the ranker of the number of expansions an entity has received. We address all of the above by proposing different features for our machine learning model. Finally, we show that in our dynamic scenario it is a good idea to (periodically) retrain the ranker to re-assess these weights.

What I find attractive about our method is that it’s relatively “cheap” and simple; you simply add content (= words) to your entity representation (= document) and retrieval improves! Even if you omit the fancy machine learning re-training (detailed in our paper). Anyway, for the full details, and more pretty plots like this one, do read our paper!


Additionally, you can check out the slides of a talk I gave on this paper at DIR 2015, or check out the poster I presented there.

“Dynamic Collective Entity Representations for Entity Ranking” paper accepted at WSDM2016

Download the poster (PDF, 3.4 MB)

Our paper “Dynamic Collective Entity Representations for Entity Ranking,” with Manos Tsagkias, Wouter Weerkamp, Edgar Meij and Maarten de Rijke was accepted at The 9th ACM International Conference on Web Search and Data Mining (WSDM2016). Read the extended one-page abstract (submitted to DIR 2015) here (PDF, 200kb).

Abstract: Entity ranking, i.e., successfully positioning a relevant entity at the top of the ranking for a given query, is inherently difficult due to the potential mismatch between the entity’s description in a knowledge base, and the way people refer to the entity when searching for it. To counter this issue we propose a method for constructing dynamic collective entity representations. We collect entity descriptions from a variety of sources and combine them into a single entity representation by learning to weight the content from different sources that is associated with an entity for optimal retrieval effectiveness. Our method is able to add new descriptions in real time, and learn the best representation at set time intervals as time evolves so as to capture the dynamics in how people search entities. Incorporating dynamic description sources into dynamic collective entity representations improves retrieval effectiveness by 7% over a state-of-the-art learning to rank baseline. Periodic retraining of the ranker enables higher ranking effectiveness for dynamic collective entity representations.

I will post a pre-print here soon.

Update: Cool! Our paper has been selected for presentation as a long talk at the conference.

Update 2: The extended abstract of this paper has been accepted for poster + oral presentation at the 14th Dutch-Belgian Information Retrieval Workshop (DIR 2015). I’ve uploaded the slides of my DIR talk here.