David Graus

In defense of algorithms

Pre-print of position paper “SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News”

Friday, June 1, 2018

Our position paper “SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News” was accepted at the Algorithmic Personalization and News (APEN18) workshop, held at ICWSM ’18!

In this paper, we detail some of the ideas and opportunities of personalization in the domain of financial economic news. Read the pre-print below!

  • [PDF] M. Sappelli, D. M. Chu, B. Cambel, D. Graus, and P. Bressers, “Smart journalism: personalizing, summarizing, and recommending financial economic news,” in The algorithmic personalization and news (apen18) workshop at icwsm ’18, 2018.
    [Bibtex]
    @inproceedings{sappelli2018smart,
    title={SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News},
    author={Sappelli, Maya and Chu, Dung Manh and Cambel, Bahadir and Graus, David and Bressers, Philippe},
    booktitle={The Algorithmic Personalization and News (APEN18) Workshop at ICWSM '18},
    year={2018}
    }

“The birth of collective memories” published in JASIST!

Monday, February 5, 2018

The journal paper “The birth of collective memories: Analyzing emerging entities in text streams” that I wrote with Daan Odijk and Maarten de Rijke is now (finally) published in JASIST! It is published open access under CC BY 4.0 and is available in “early view” (published before it’s published) in the Wiley Online Library.

The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

Monday, December 11, 2017

Our paper “The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams” was accepted for publication at JASIST (the Journal of the Association for Information Science and Technology)! Grab a pre-print here:

  • [PDF] D. Graus, D. Odijk, and M. de Rijke, “The birth of collective memories: analyzing emerging entities in text streams,” Journal of the association for information science and technology, 2018.
    [Bibtex]
    @article{graus2018birth,
    title={The birth of collective memories: Analyzing emerging entities in text streams},
    author={Graus, David and Odijk, Daan and de Rijke, Maarten},
    journal={Journal of the Association for Information Science and Technology},
    year={2018}
    }

This paper is:
1. My first journal paper
2. Based on Chapter 3 of my PhD thesis “Entities of Interest — Discovery in Digital Traces”
3. The first collabo on a paper (on paper) between the FD Mediagroep, Blendle, and the UvA
4. The tombstone on my academic career! (?)

In this paper we study news and social media streams spanning over 18 months and comprising over 579 million documents, and analyze ‘emergence patterns’ of entities, i.e., how a real-world entity (such as a person, organization, or product) appears in these documents in the timespan between the entity’s first mention in online text streams and the moment an article devoted to the entity is added to Wikipedia.
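To make the idea of an emergence window concrete, here is a minimal sketch that measures, for a single entity, the time between its first mention in a stream and the creation of its Wikipedia article, along with the number of mentions in that window. The `emergence_window` function, the timestamps, and the variable names are all made up for illustration; this is not the code or the data used in the paper.

```python
from datetime import datetime

def emergence_window(mention_times, wikipedia_created):
    """Summarize one entity's emergence (hypothetical helper).

    Returns (first_mention, lag_in_days, mentions_before_page), or None
    if the entity was never mentioned before its Wikipedia page existed.
    """
    # Keep only mentions that happened up to the page creation time.
    before = sorted(t for t in mention_times if t <= wikipedia_created)
    if not before:
        return None
    first = before[0]
    lag_days = (wikipedia_created - first).days
    return first, lag_days, len(before)

# Toy timestamps for one made-up entity.
mentions = [
    datetime(2012, 3, 1),
    datetime(2012, 3, 4),
    datetime(2012, 3, 20),
    datetime(2012, 5, 2),  # after the page was created, so not counted
]
page_created = datetime(2012, 4, 1)

first, lag_days, n = emergence_window(mentions, page_created)
print(first.date(), lag_days, n)
```

In this toy example the entity is first mentioned 31 days before its page is created and is mentioned three times in that window.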

Combining multiple signals for semanticizing tweets

Friday, May 22, 2015
Title: Combining multiple signals for semanticizing tweets: University of Amsterdam at #Microposts2015
Author: Cristina Gârbacea, Daan Odijk, David Graus, Isaac Sijaranamual, Maarten de Rijke
Publication type: Workshop paper
Workshop name: #Microposts2015 – 5th Workshop on Making Sense of Microposts
Conference name: WWW ’15
Conference location: Florence, Italy
Abstract: In this paper we present an approach for extracting and linking entities from short and noisy microblog posts. We describe a diverse set of approaches based on the Semanticizer, an open-source entity linking framework developed at the University of Amsterdam, adapted to the task of the #Microposts2015 challenge. We consider alternatives for dealing with ambiguity that can help in the named entity extraction and linking processes. We retrieve entity candidates from multiple sources and process them in a four-step pipeline. Results show that we correctly manage to identify entity mentions (our best run attains an F1 score of 0.809 in terms of the strong mention match metric), but subsequent steps prove to be more challenging for our approach.
Paper: PDF [92 KB]

Who is Involved? Semantic Search for E-Discovery

Thursday, May 21, 2015
Title: Who is Involved? Semantic Search for E-Discovery
Author: David van Dijk, David Graus, Zhaochun Ren, Hans Henseler, Maarten de Rijke
Publication type: Workshop paper
Workshop name: DESI VI: Workshop on Using Machine Learning and Other Advanced Techniques to Address Legal Problems in E-Discovery and Information Governance
Conference name: ICAIL ’15: The 15th International Conference on Artificial Intelligence & Law
Conference location: San Diego, United States of America
Abstract: E-discovery projects typically start with an assessment of the collected electronic data in order to estimate the risk to prosecute or defend a legal case. This is not a review task but is appropriately called early case assessment, which is better known as exploratory search in the information retrieval community. This paper first describes text mining methodologies that can be used for enhancing exploratory search. Based on these ideas we present a semantic search dashboard that includes entities that are relevant to investigators such as who knew who, what, where and when. We describe how this dashboard can be powered by results from our ongoing research in the “Semantic Search for E-Discovery” project on topic detection and clustering, semantic enrichment of user profiles, email recipient recommendation, expert finding and identity extraction from digital forensic evidence.
Paper: PDF [610 KB]

Semanticizing Search Engine Queries

Wednesday, July 2, 2014
Title: Semanticizing Search Engine Queries — The University of Amsterdam at the ERD 2014 Challenge
Author: David Graus, Daan Odijk, Manos Tsagkias, Wouter Weerkamp, Maarten de Rijke
Publication type: Workshop paper
Workshop name: ERD ’14: the First International Entity Recognition & Disambiguation Challenge
Conference name: SIGIR ’14: 37th international ACM SIGIR conference on Research and development in information retrieval
Conference location: Gold Coast, Australia
Abstract: This paper describes the University of Amsterdam’s participation in the short track of the Entity Recognition & Disambiguation Challenge 2014 (ERD 2014). We describe how we adapt the Semanticizer, an open-source entity linking framework developed primarily at the University of Amsterdam, to the task of the ERD challenge: linking named entities in search engine queries. We steer the Semanticizer’s linking towards named entities by adapting an existing training corpus, and extend the Semanticizer’s set of features with contextual features that aim to leverage the limited context provided by search queries. With an F1 score of 0.6062 our final system run achieves median performance, and better than mean performance (0.5329).
Paper: PDF [515 KB]

Recipient recommendation in enterprises using communication graphs and email content

Saturday, May 10, 2014
Title: Recipient recommendation in enterprises using communication graphs and email content
Author: David Graus, David van Dijk, Manos Tsagkias, Wouter Weerkamp, Maarten de Rijke
Publication type: Short paper
Conference name: SIGIR ’14: 37th international ACM SIGIR conference on Research and development in information retrieval
Conference location: Gold Coast, Australia
Abstract: We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.
Paper: PDF [747 KB]
Blog: Understanding email-traffic: Social network analysis meets language modeling
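
For readers who want a feel for how a communication graph and email content can be combined, here is a minimal sketch. It is not the model from the paper: as an assumption for illustration, it scores each candidate recipient by a smoothed graph probability (how often the sender emailed them before) multiplied by a smoothed unigram likelihood of the email under that recipient's past emails. The `rank_recipients` function, the smoothing choice, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def rank_recipients(sender, email_tokens, sent_counts, recipient_texts, alpha=1.0):
    """Rank candidates by log P(recipient | sender) + log P(email | recipient).

    sent_counts:     {sender: {recipient: number of past emails}}
    recipient_texts: {recipient: list of tokens from their past emails}
    alpha:           add-alpha smoothing constant (an assumption, not from the paper)
    """
    vocab = {t for toks in recipient_texts.values() for t in toks} | set(email_tokens)
    past = sent_counts.get(sender, {})
    total_sent = sum(past.values())
    n_candidates = len(recipient_texts)

    scores = {}
    for recipient, tokens in recipient_texts.items():
        # Communication-graph signal: smoothed share of the sender's past emails.
        graph_p = (past.get(recipient, 0) + alpha) / (total_sent + alpha * n_candidates)
        # Content signal: smoothed unigram language model of the recipient's emails.
        counts = Counter(tokens)
        content_lp = sum(
            math.log((counts[t] + alpha) / (len(tokens) + alpha * len(vocab)))
            for t in email_tokens
        )
        scores[recipient] = math.log(graph_p) + content_lp
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: alice has mostly emailed bob; dave writes about budgets too.
sent_counts = {"alice": {"bob": 5, "carol": 1}}
recipient_texts = {
    "bob": ["budget", "report", "budget", "meeting"],
    "carol": ["lunch", "party", "meeting"],
    "dave": ["budget", "forecast"],
}
ranking = rank_recipients("alice", ["budget", "report"], sent_counts, recipient_texts)
print(ranking)
```

In this toy setup, dave outranks carol on content alone even though alice has never emailed him, while bob comes first because both signals agree.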

yourHistory — Semantic linking for a personalized timeline of historic events

Sunday, January 19, 2014
Title: yourHistory — Semantic linking for a personalized timeline of historic events
Author: David Graus, Maria-Hendrike Peetz, Daan Odijk, Ork de Rooij, Maarten de Rijke
Publication type: Workshop Proceedings
Workshop name: LinkedUp Challenge at Open Knowledge Conference (OKCon) 2013
Conference location: Geneva, Switzerland
Abstract: In this paper we present yourHistory: a Facebook application that aims to generate a tailor-made, personalized timeline of historic events, by matching a semantically enriched Facebook profile to a pool of candidate historic events extracted from DBpedia. Two aspects are central to our application: (i) semantic linking technologies backed by rich open web knowledge bases for generating semantically enriched user profiles, and (ii) semantic relatedness metrics for ranking historic events to user profiles. This paper describes the development of a Facebook application that aims to be engaging for users, whilst at the same time being a source for data that can be applied to evaluating or improving the application. We describe our Wikipedia-based semantic relatedness metric for event ranking, but also the restrictions and constraints concerning privacy-sensitive and ethical matters, around data storage and user consent. Finally, we reflect on how this type of user data can be applied for evaluating or improving both the semantic linking and event ranking methods in future work.
Full paper: PDF [352.3 KB]

Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams

Friday, January 17, 2014
Title: Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams
Author: David Graus, Manos Tsagkias, Lars Buitinck, Maarten de Rijke
Publication type: Full paper
Conference name: 36th European Conference on Information Retrieval (ECIR ’14)
Conference location: Amsterdam, The Netherlands
Abstract: The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
Full paper: PDF [256 KB]

Semantic Linking and Contextualization for Social Forensic Text Analysis

Friday, June 21, 2013
Title: Semantic Linking and Contextualization for Social Forensic Text Analysis
Author: Zhaochun Ren, David van Dijk, David Graus, Nina van der Knaap, Hans Henseler, Maarten de Rijke
Publication type: Workshop Proceedings
Workshop name: Workshop on Forensic Text Analysis (FORTAN)
Conference name: European Intelligence and Security Informatics Conference (EISIC 2013)
Conference location: Uppsala, Sweden
Abstract: With the development of social media, forensic text analysis is becoming more and more challenging as forensic analysts have begun to include this information source in their practice. In this paper, we report on our recent work related to semantic search in e-discovery and propose the use of entity and topic extraction for social media text analysis. We first describe our approach for entity linking at the 2012 Text Analysis Conference Knowledge Base Population track and then introduce the personalized tweets summarization task, where entity linking is used for semantically enriching information in a social media context.
Full paper: PDF [204 KB]