David Graus • Page 7 of 24 • AI for OpenGov

BNR SMART Radio wins the Marconi Online Award!

Really happy and proud of our amazing AI team for winning the (first) Marconi Online Award at the Gouden RadioRing Gala, for BNR SMART Radio!

SMART Radio is the first product to come out of our FD Mediagroup AI team! SMART Radio is released in beta for iOS (anytime) and Android; give it a try by downloading it here: https://bnr.nl/smartradio.

Read more about SMART Radio in our demo paper, presented at DIR 2018:

Textpipe logo

I created another logo!

It is the textpipe logo, for textpipe:

a Python package for converting raw text into clean, readable text and extracting metadata from that text.
textpipe’s readme

A package we’re developing together with our friends at De Persgroep and RTL Nederland :-).
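
To give a feel for what "converting raw text into clean text and extracting metadata" can mean, here is a minimal standard-library sketch. It does not use textpipe and makes no claim about textpipe's actual API; the function names `clean_text` and `extract_metadata`, and the choice of metadata fields, are made up for illustration.

```python
import html
import re


def clean_text(raw):
    """Strip HTML tags, decode HTML entities, and collapse whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)   # replace tags by a space
    decoded = html.unescape(without_tags)          # e.g. &nbsp; -> non-breaking space
    return re.sub(r"\s+", " ", decoded).strip()    # \s also matches non-breaking spaces


def extract_metadata(raw):
    """Return the cleaned text plus a few simple statistics (hypothetical fields)."""
    cleaned = clean_text(raw)
    words = re.findall(r"\w+", cleaned)
    return {
        "clean": cleaned,
        "n_words": len(words),
        "n_chars": len(cleaned),
    }


if __name__ == "__main__":
    sample = "<p>Raw&nbsp;text,   with <b>markup</b> and extra spaces.</p>"
    print(extract_metadata(sample))
```

A real package would add things like language detection or entity extraction on top; this sketch only shows the basic clean-then-describe shape.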

Featured in BNR’s “Artificial Intelligence Podcast” on AI in Media Part I: Tech

I am featured in the latest AI Podcast, talking with Li’ao Wang about our SMART Radio and SMART Journalism projects. Stream the episode below!

“Media is smart, but needs to get smarter. That’s what we’re working on here at the FD Mediagroep. Come and see what we do with AI to make our creators and content better.”

Stream on BNR

Stream on Spotify

Demo Paper: “SMART Radio: Personalized News Radio”

We’re demo’ing SMART Radio at The 17th Dutch-Belgian Information Retrieval workshop (DIR 2018). We wrote a short paper titled “SMART Radio: Personalized News Radio” to accompany the demo, read it by clicking below!

Position paper ““Let Me Tell You Who You are” — Explaining Recommender Systems by Opening Black Box User Profiles”

Our position paper ““Let Me Tell You Who You are” — Explaining Recommender Systems by Opening Black Box User Profiles” was accepted at the 2nd FATREC Workshop on Responsible Recommendation, held at RecSys ’18!

In this paper, we detail some of our ideas and approaches for providing transparency in recommendations by displaying the user profiles used ‘internally’ by our recommender system. Read the pre-print below!

D. Graus, M. Sappelli, and D. M. Chu, “Let me tell you who you are,” in The 2nd FATREC Workshop on Responsible Recommendation, 2018.
[Bibtex]

@inproceedings{graus2018let,
title={Let me tell you who you are},
author={Graus, David and Sappelli, Maya and Chu, Dung Manh},
booktitle={The 2nd FATREC Workshop on Responsible Recommendation},
year={2018}
}

FATREC Position paper: Explaining recommender systems by opening black box user profiles

Interviewed in “IP – vakblad voor informatieprofessionals”

In the magazine IP (“journal for information professionals”) I am interviewed as one of three young professionals who show that ‘traditional categories and conceptual frames need to be readjusted.’

More specifically, it describes how my multidisciplinary background, which combines an academic background in media studies, professional experience in the media, and a PhD in computer science, is important in bridging the gap between ‘techies’ and ‘non-techies’, and of particular value in my current role, where I work on enabling AI in media.

Digital forensics in the real world: the Ennetcom data

In the context of a high-profile legal case (involving a bunch of data acquired from encrypted “Ennetcom” phones) I assisted lawyer Inez Weski in acquiring insights and trying to understand how digital forensic tools were used in the collection of digital evidence. I did this work in the context of my PhD research on semantic search for E-Discovery. In this post, I list some of the publications that followed from my work and the case.

De Volkskrant: “Met deze eigen zoekmachine spit de politie schatten aan digitaal bewijs door”

Hansken is the search engine developed by the Netherlands Forensic Institute, and used by the police and public prosecutors. In this article in De Volkskrant, titled “Met deze eigen zoekmachine spit de politie schatten aan digitaal bewijs door,” I answered a few questions and explained my view on the role of Hansken in the court of law and digital evidence acquisition.

NEMO Kennislink: “Het sleepnet van Justitie”

For more on the case, the following NEMO Kennislink article is a more in-depth piece on my work for Weski. It details my findings and concerns with respect to using a proprietary, continuously developed, and largely black-box tool for collecting digital forensic evidence:

Crimesite: “Hoe het pgp-sleepnet wel (en niet) werkt (#2)”

Finally, if you still haven’t had enough, there’s a blog post on Crimesite which explains a bit more about the (legal) case, along with some interpretations of my report and findings:

Click to read “Hoe het pgp-sleepnet wel (en niet) werkt (#2)”

Interviewed in RTL XL’s “How it’s done” on data science for news analysis.

In RTL XL’s “How it’s done”, Company.info’s CTO Henk Pijper and I explain why and how we apply AI and data science at Company.info to gain insights from online news.

In NRC on the state of robo-journalism in The Netherlands

NRC Handelsblad published an article on ‘robo-journalism’, titled “Waar blijft de Nederlandse robotjournalist?”, in which I briefly mention our SMART Journalism project.

Pre-print of position paper “SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News”

Our position paper “SMART Journalism: Personalizing, Summarizing, and Recommending Financial Economic News” was accepted at the Algorithmic Personalization and News (APEN18) workshop, held at ICWSM ’18!

In this paper, we detail some of the ideas and opportunities of personalization in the domain of financial economic news. Read the pre-print below!

M. Sappelli, D. M. Chu, B. Cambel, J. Nortier, and D. Graus, “SMART Radio: Personalized news radio,” in Proceedings of the 17th Dutch-Belgian Information Retrieval Workshop, 2018, p. 27.
[Bibtex]

@inproceedings{sappelli2018smart,
title={SMART Radio: Personalized News Radio},
author={Sappelli, Maya and Chu, Dung Manh and Cambel, Bahadir and Nortier, Joeri and Graus, David},
booktitle={Proceedings of the 17th Dutch-Belgian Information Retrieval Workshop},
pages={27},
year={2018}
}

Featured in article on ‘robo-journalism’ in the Netherlands

Stimuleringsfonds voor de Journalistiek published an article on ‘robo-journalism’, in which I talk about the SMART Journalism project we are doing at FDMG, which involves personalization and summarization of newspaper articles. Read it here! (pdf). Snippet (translated from Dutch):

‘By personalizing intro texts, you can serve more target audiences.’ Generating personalized intros from articles involves quite a lot of technology, says David Graus, lead data scientist of the project at the FD. ‘In robo-journalism, most of the work right now is on converting structured data into text. What we want is to create texts based on texts written by people. That is pretty cutting edge. It is also why we have hardly any examples to build on.’

The Filter Bubble doesn’t exist!

Yesterday I gave a (tongue-in-cheek) talk on algorithmic personalization at the VOGIN-IP Lezing 2018, and brought five pieces of evidence to prove the “filter bubble” doesn’t exist. Check out my slides (in Dutch) by clicking on the picture below!

“The birth of collective memories” published in JASIST!

The journal paper “The birth of collective memories: Analyzing emerging entities in text streams” I wrote with Daan Odijk and Maarten de Rijke is now (finally) published in JASIST! It is published as Open Access under CC BY 4.0 and available in “early view” (published before it’s published) in the Wiley Online Library. Click on the image below to access it:

Predictive insights from company information (blog post for company.info)

For Company.info I wrote a short blog post explaining the current state of the art, and our current and future projects that involve machine learning and company information. Read the post below! (Originally written in Dutch.)

Predictive insights from company information

Machine learning enables us to discover and unlock predictive insights, automatically and at scale, from large collections of data such as annual reports and financial statements.

Annual reports offer valuable insights into how companies function and what their future prospects are. For example, the FD discovered from Blendle’s annual report that the company will have to look for new capital in 2018 in order to survive. The FD also published a study, in collaboration with Company.info, which showed that a large number of companies do not have their internal controls in order. Such insights can follow from simple search queries in annual reports.

Want to do your own research?
With more than 1.7 million original annual reports, browsing through annual reports is easy with Company.info. For instance, a simple search for ‘toekomst onzeker’ (‘future uncertain’) or ‘oordeelonthouding’ (‘disclaimer of opinion’) in Company.info’s collection of annual reports yields, at the time of writing, 31,267 and 18,231 results respectively! Try it for free.

Predictive insights with machine learning

Netflix’s recommendations, Google’s autocorrections, or Tesla’s self-driving car: machine learning has become an indispensable part of our daily lives. Developments are moving at lightning speed in the domain of company information too. Machine learning enables us to discover and unlock predictive insights, automatically and at scale, from Company.info’s large collection of company information and annual reports.

For example, Marcia Fissette developed a method to predict whether a company is guilty of fraud, based on the text of an annual report. Fissette collected hundreds of annual reports from (convicted) fraudulent companies, and hundreds more from non-fraudulent companies. By having an algorithm discover the difference in language use between these two groups of annual reports, Fissette was able to recognize the fraudulent companies with an accuracy of 89%, based on their annual report and factors such as the sector in which the company operates and the size of the company.
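
The idea of letting an algorithm discover the difference in language use between two groups of documents can be sketched with a tiny multinomial Naive Bayes classifier. This is an illustration under assumptions, not Fissette's actual method (the post does not say which algorithm or features were used), and the four example "reports" and their labels are invented toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # label -> number of documents
        self.word_counts = {}               # label -> Counter of word frequencies
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy "annual report" snippets, for illustration only.
train_texts = [
    "restated revenue pending investigation irregularities",
    "unexplained adjustments restated earnings investigation",
    "steady growth stable revenue dividend",
    "stable margins growth dividend outlook",
]
train_labels = ["fraud", "fraud", "no_fraud", "no_fraud"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("earnings restated after investigation"))
print(model.predict("stable growth and dividend"))
```

With hundreds of real reports per group, you would also hold out part of the data to estimate accuracy, which is how a figure like the 89% above would be measured.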

A next step is predicting a company’s future financial situation. Researchers from Amazon and Euclidean Technologies showed that they can predict future balance-sheet figures and key figures (such as equity, assets, and various ratios) based on (historical) key figures taken from published financial statements. They also demonstrate a possible application of these predictions: stock portfolios composed on the basis of the predicted financial figures, rather than the already published figures, yield a substantially higher return in (simulated) stock trading.

Machine learning at Company.info

Company.info stays on top of developments in applying machine learning to obtain predictive insights.

Last March, for example, we launched our fully automated news monitor: using machine learning, Company.info automatically recognizes company names in news articles and automatically links the right company profiles to the articles. In addition, we use machine learning to automatically enrich company profiles with SBI codes.

And we are not sitting still; we follow all developments closely and are busy developing new predictive models. For instance, we are going to use text classification algorithms to assign a sentiment score to news articles, which will let us discover patterns and developments in the sentiment around sectors or companies. We are also looking into predicting financial key figures.

David Graus is a data scientist at Company.info with a PhD in Information Retrieval from ILPS (University of Amsterdam). He has a background in the media, and in the current era, in which technology touches many aspects of life, he feels a responsibility as a data scientist to explain his work and expertise.

The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

Our paper “The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams” was accepted for publication at JASIST (the Journal of the Association for Information Science and Technology)! Grab a pre-print here:

D. Graus, D. Odijk, and M. de Rijke, “The birth of collective memories: Analyzing emerging entities in text streams,” Journal of the Association for Information Science and Technology, vol. 69, iss. 6, pp. 773-786, 2018.
[Bibtex]

@article{graus2018birth,
author = {Graus, David and Odijk, Daan and de Rijke, Maarten},
title = {The birth of collective memories: Analyzing emerging entities in text streams},
journal = {Journal of the Association for Information Science and Technology},
year = {2018},
volume = {69},
number = {6},
pages = {773-786},
doi = {10.1002/asi.24004},
url = {https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/asi.24004},
eprint = {https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/asi.24004},
}

This paper is:
1. My first journal paper
2. Based on Chapter 3 of my PhD thesis “Entities of Interest — Discovery in Digital Traces”
3. The first collabo on a paper (on paper) between the FD Mediagroep, Blendle, and the UvA
4. The tombstone on my academic career! (?)

In this paper we study news and social media streams spanning over 18 months and comprising over 579 million documents, and analyze ‘emergence patterns’ of entities, i.e., how a real-world entity (such as a person, organization, product, etc.) appears in these documents in the timespan between the entity’s first mention in online text streams and the moment an article devoted to the entity is added to Wikipedia.
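
As a rough illustration of the timespan described above, the snippet below computes, per entity, the number of days between its first mention in a stream and the creation of its Wikipedia article. The entity names, dates, and the function name `emergence_durations` are invented for this example; the paper's actual pipeline and data are not reproduced here.

```python
from datetime import datetime


def emergence_durations(mentions, wikipedia_created):
    """Map each entity to the days between its first mention and its Wikipedia article.

    mentions: iterable of (entity, timestamp) pairs from the text streams.
    wikipedia_created: dict mapping entity -> article creation timestamp.
    Entities without a Wikipedia article, and mentions dated after the
    article was created, are ignored.
    """
    first_mention = {}
    for entity, ts in mentions:
        if entity not in wikipedia_created or ts > wikipedia_created[entity]:
            continue
        if entity not in first_mention or ts < first_mention[entity]:
            first_mention[entity] = ts
    return {
        entity: (wikipedia_created[entity] - ts).days
        for entity, ts in first_mention.items()
    }


# Invented toy data, for illustration only.
mentions = [
    ("Entity A", datetime(2012, 1, 10)),
    ("Entity A", datetime(2012, 1, 3)),
    ("Entity B", datetime(2012, 2, 1)),
    ("Entity A", datetime(2012, 3, 1)),   # after the article exists: ignored
    ("Entity C", datetime(2012, 2, 5)),   # never gets an article: ignored
]
wikipedia_created = {
    "Entity A": datetime(2012, 1, 20),
    "Entity B": datetime(2012, 2, 2),
}

print(emergence_durations(mentions, wikipedia_created))
# {'Entity A': 17, 'Entity B': 1}
```

An emergence pattern would then look at how the mentions are distributed inside that window, for example by counting mentions per day between the two timestamps.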