Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams

Title Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams
Author David Graus, Manos Tsagkias, Lars Buitinck, Maarten de Rijke
Publication type Full paper
Conference name 36th European Conference on Information Retrieval (ECIR ’14)
Conference location Amsterdam, The Netherlands
Abstract The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
Full paper PDF [256 KB]

Embodied Vision Turtle

Project by Peter Curet & David Graus for the ‘Embodied Vision’ course by Joost Rekveld for the Media Technology MSc. Programme at Leiden University.

We measure the movement in the webcam input by adding up all movement towards the left and right, and up and down. This results in two numbers which represent the total amount of horizontal and vertical movement since the start.

The turtle graphic system draws on the basis of character-input:
– ‘w’ makes it move forward
– ‘a’ makes it turn left (but doesn’t draw anything)
– ‘d’ makes it turn right (same)
– ‘s’ changes the thickness of the line
– ‘c’ changes the color
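
The character commands above can be sketched in Python as a small interpreter. This is only an illustration, not the project's actual code: the 90-degree turn, the unit step, the starting direction, the way thickness and color cycle, and the name `trace` are all assumptions, since the description does not specify them.

```python
import math


def trace(commands, step=1.0, turn_degrees=90.0):
    """Return the line segments a command string would draw.

    Each segment is (start, end, thickness, color). Step size, turn angle,
    start heading and the thickness/color cycles are assumed values.
    """
    x, y = 0.0, 0.0
    heading = 90.0  # assumed: the turtle starts facing "up"
    thickness, color = 1, 0
    segments = []
    for ch in commands:
        if ch == "w":  # move forward and draw
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (nx, ny), thickness, color))
            x, y = nx, ny
        elif ch == "a":  # turn left, nothing is drawn
            heading = (heading + turn_degrees) % 360
        elif ch == "d":  # turn right, nothing is drawn
            heading = (heading - turn_degrees) % 360
        elif ch == "s":  # change line thickness (assumed cycle 1..5)
            thickness = thickness % 5 + 1
        elif ch == "c":  # change color (assumed palette of 8 indices)
            color = (color + 1) % 8
        # any other character is ignored
    return segments
```

Calling `trace("wawdw")` would yield three segments: one up, one to the left, and one up again.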

The turtle receives a number of random strings from the genetic algorithm. It calculates the amount and direction of movement each string results in. Then it compares all these numbers to the numbers of the webcam movement. The more alike they are, the fitter we consider the string. We select the fittest string out of the strings the turtle received, and make the turtle draw it. This string is the basis for the ‘next generation’ of strings: it is fed to the genetic algorithm, which evolves it into multiple other strings. The process repeats indefinitely. Since the webcam input is dynamic and ever-changing, the fitness of the strings will not gradually rise, but is an ever-changing value.
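
One generation of this loop could look roughly like the Python sketch below. It is a guess at the mechanics rather than the project's implementation: the fitness measure (negative Euclidean distance between a string's net movement and the two webcam totals), the per-character mutation rate, the population size, the 90-degree turn, and all function names are assumptions made for illustration.

```python
import math
import random

ALPHABET = "wadsc"  # the five turtle commands


def net_movement(commands, turn_degrees=90.0):
    """Total (horizontal, vertical) movement a command string results in."""
    x = y = 0.0
    heading = 90.0  # assumed: start facing "up"
    for ch in commands:
        if ch == "w":
            x += math.cos(math.radians(heading))
            y += math.sin(math.radians(heading))
        elif ch == "a":
            heading += turn_degrees
        elif ch == "d":
            heading -= turn_degrees
    return x, y


def fitness(commands, webcam_totals):
    """Higher is fitter: the closer to the webcam movement, the better."""
    dx, dy = net_movement(commands)
    return -math.hypot(dx - webcam_totals[0], dy - webcam_totals[1])


def mutate(parent, rng, rate=0.1):
    """Replace each character with a random command with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def next_generation(population, webcam_totals, rng, size=20):
    """Pick the fittest string and evolve it into `size` new strings."""
    fittest = max(population, key=lambda s: fitness(s, webcam_totals))
    offspring = [mutate(fittest, rng) for _ in range(size)]
    return fittest, offspring
```

In use, the turtle would draw `fittest`, and `offspring` would be scored against the updated webcam totals in the next round. Because the target keeps moving, the best fitness can drop from one generation to the next.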

CS Column 4: Numbers

Small scales, huge numbers

I’ve recently been reading a bit about nanotechnology, and I realized the contradiction that thinking of such insanely small scales brings. You’ll always end up dealing with huge numbers.
In order to try to grasp something as tiny as a nanometer, we try to convert it to the next best thing – the smallest tangible, imaginable distance, the smallest division on your average ruler: a millimeter. This conversion forces us to use hard-to-imagine numbers: a million nanometers are supposed to fit between two of the tiny lines on your ruler that divide a centimeter into ten. One million, in such a tiny space? To me, it’s impossible to even imagine such a number, let alone to mentally chop up this millimeter on a ruler into a million bits. How do I know – other than ‘very tiny’ – how big a nanometer is?
One example I recently came across stuck with me. Supposedly, during the time it takes us to pronounce the word ‘nanometer’, our hair grows ten nanometers! This fact impressed me. But hair is no stranger when it comes to imagining small scales. There are a few often-used examples for making nanometers or other small scales imaginable, one of them being the width of a human hair to illustrate the nano-scale.
But how helpful is that? According to one source, the average width of a human hair can vary from around 17 to 181 µm (that’s micrometers: a micrometer is a millionth of a meter, huge compared to a nanometer). That means the width of a human hair can vary from 17,000 to 181,000 nanometers. Let’s take a look back at our hair growth example. While your hair grows ten nanometers in one direction (in the time it takes you to pronounce the word ‘nanometer’), it can measure up to 181,000 nanometers in the other direction. That puts this impressive fact into perspective.
In the end, a nanometer is an abstract unit of measure that we cannot use in everyday life. And why would we? We can’t use it for ‘real-life’ measurement. Either we use it when scaling down from bigger scales, and consequently end up with huge numbers, or we use it when we deal with the totally abstract world of molecules and atoms, which is even harder to imagine. Any attempt to make the scale tangible has to deal with intangible smallness. We’re always stuck with the contradiction of using huge numbers to imagine tiny scales.
Read my 3rd column for the Cool Science class:
» The case of my disappearing socks

Read my 2nd column for the Cool Science class:
» Mobb Deep’s Vision on Evolution Theory

Read my 1st column for the Cool Science class:
» Emerging Chaos – The Rules of Vietnamese Traffic

CS Column 2: Evolution

Mobb Deep’s Vision on Evolution Theory

“Yo, yo
We livin’ this till the day that we die
Survival of the fit, only the strong survive”

Mobb Deep, Survival of the Fittest (1995)

While I seriously doubt Mobb Deep’s ‘Survival of the Fittest’ song was intended to enlighten their audience with the ideas of evolution theory, I’d like to refer to this song to discuss the famous “survival of the fittest” slogan. Because besides being a Mobb Deep song (from the album ‘The Infamous’), it’s also a famous, popular and punchy ‘summary’ of Darwin’s evolution theory. It was coined by Herbert Spencer in 1864, after he had read Darwin’s revolutionary “On the Origin of Species” (1859); Darwin himself adopted the phrase in the book’s fifth edition, published in 1869.

In their song, Mobb Deep rap about living and surviving the harsh street life in Queens, New York City. Listening to this fine piece of East Coast rap made me wonder how scientifically valid this ‘street knowledge’ they provide us could be…

In the chorus Mobb Deep further elaborate on their title: ‘Survival of the fit, only the strong survive’. Shouldn’t that be ‘Survival of the fit, only the well adapted survive’? It might not sound as nice, but it would be more correct, at least from an evolution theory point of view. Darwin’s evolution theory does not imply that the strongest or most physically fit will survive. It implies that the individuals that fit best in their environment will! This misinterpretation of the word ‘fit’ in ‘survival of the fittest’ is (unfortunately) a very common one.

Darwin’s evolution theory is not about being strong, it is about adapting to the environment, surviving, and ultimately about reproducing to pass on genes. So, while Prodigy (one of two rappers in Mobb Deep) raps “I’m goin’ out blastin’, takin’ my enemies with me / And if not, they scarred, so they will never forget me” one could argue he’d be better off staying at home and reproducing (which, to be fair, is another recurring theme in Mobb Deep’s work).

But before we accuse Mobb Deep of misunderstanding the word ‘fit’, let’s consider a possible alternative explanation: the artists of Mobb Deep might completely disagree with evolution theory as Darwin formulated it. Rather, they might be strong advocates of Herbert Spencer’s ideas – the man who invented the slogan.

Spencer was a firm believer in Social Darwinism (before it was called Social Darwinism): the application of Darwin’s evolution theory to ideas about human society. It dictates that in society, the strong will survive at the cost of the weak, and that man should not offer a helping hand to the weak in society, as that would go against the natural order of things.

A controversial philosophy, especially today, but could it make sense if we put it in the context of Mobb Deep? The rappers came from a poor life in the ghetto, worked their way up, sold millions of albums and eventually became wealthy through it. One could argue that Prodigy and Havoc are in fact the fittest to survive in contemporary human society!

Whatever the case, misinterpretation of a word or strong Social Darwinism, the fact remains that ‘survival of the fittest’ is a pretty strong and powerful slogan – one I personally do not mind hearing, whether it’s applied in a scientifically correct way or not!

Win a pair of eco sandals

Tech the Future

… Because we’re simply giving them away at www.TechTheFuture.com!

Tech the Future is the latest project I’ve thrown myself into, together with partner in crime Augustus. A blog about technology, sustainability, eco, science, etc. TTF has been online since March 3 and everything seems to be running smoothly, with quite a bit of content in that short time (for 2 busy bloggers) and promising stats. In the spirit of web 2.0-ishness we have a Facebook group of 200 people, and we tweet a lot too! Because that’s simply how it works these days.

In any case, go for those Sanük sandals! Post a comment on this article.