|Title||Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams|
|Author||David Graus, Manos Tsagkias, Lars Buitinck, Maarten de Rijke|
|Publication type||Full paper|
|Conference name||36th European Conference on Information Retrieval (ECIR ’14)|
|Conference location||Amsterdam, The Netherlands|
|Abstract||The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.|
|Full paper||PDF [256 KB]|
Project by Peter Curet & David Graus for the ‘Embodied Vision’ course by Joost Rekveld for the Media Technology MSc. Programme at Leiden University.
We compare the movement of the webcam input (adding up all movement towards the left and right, and up and down). This results in two numbers which represent the total amount of movement since the start.
The turtle graphic system draws on the basis of character-input:
– ‘w’ makes it move forward
– ‘a’ makes it turn left (but doesn’t draw anything)
– ‘d’ makes it turn right (same)
– ‘s’ changes the thickness of the line
– ‘c’ the color
The turtle receives a number of random strings from the genetic algorithm. It calculates the amount and direction of movement each string results in. Then it compares all these numbers to the numbers of the webcam movement. The more alike, the fitter we consider the string. We select the fittest string out of the number of strings it received, and make the turtle draw it. This string is the basis for the ‘next generation’ of strings. It is fed to the genetic algorithm which evolves this string into multiple other strings. The process repeats to infinity. Since the webcam input is dynamic and ever-changing, the fitness of the strings will not gradually rise, but it an ever-changing value.
Small scales, huge numbers
» The case of my disappearing socks
Read my 2nd column for the Cool Science class:
» Mobb Deep’s Vision on Evolution Theory
Read my 1st column for the Cool Science class:
» Emerging Chaos – The Rules of Vietnamese Traffic
Mobb Deep’s Vision on Evolution Theory
We livin’ this till the day that we die
Survival of the fit, only the strong survive”
Mobb Deep, Survival of the Fittest (1995)
While I seriously doubt Mobb Deep’s ‘Survival of the Fittest’ song was intended to enlighten their audience with the ideas of evolution theory, I’d like to refer to this song to discuss the famous “survival of the fittest”-slogan. Because next to the Mobb Deep song (from the album ‘The Infamous’), it’s also a famous, popular and punchy ‘summary’ of Darwin’s evolution theory. It was introduced by Herbert Spencer in 1851 – seven years before Darwin re-used it in his revolutionary “The Origin of Species”.
In their song, Mobb Deep rap about living and surviving the harsh street life in Queens, New York City. Listening to this fine piece of East Coast rap made me wonder how scientifically valid this ‘street knowledge’ they provide us could be…
In the chorus Mobb Deep further elaborate on their title: ‘Survival of the fit, only the strong survive’. Shouldn’t that be ‘Survival of the fit, only the well adapted survive’? It might not sound as nice, but it would be more correct, at least from a evolution theory point of view. Darwin’s evolution theory does not imply the strongest or most physically fit will survive. It implies that individuals that fit best in their environment will! This misinterpretation of the word ‘fit’ in ‘survival of the fittest’ is (unfortunately) a very common one.
Darwin’s evolution theory is not about being strong, it is about adapting to the environment, surviving, and ultimately about reproducing to pass on genes. So, while Prodigy (one of two rappers in Mobb Deep) raps “I’m goin’ out blastin’, takin’ my enemies with me / And if not, they scarred, so they will never forget me” one could argue he’d be better off staying at home and reproducing (which, to be fair, is another recurring theme in Mobb Deep’s work).
But before we accuse Mobb Deep of misunderstanding the the word ‘fit’, let’s consider a possible alternative explanation: the artists of Mobb Deep might completely disagree to the evolution theory as Darwin formulated it. Rather, they might be strong advocates of Herbert Spencer’s ideas – the man who invented the slogan.
Spencer was a firm believer of Social Darwinism (before it was called Social Darwinism): the integration of Darwin’s evolution theory on ideas on human society. It dictates that in society, the strong will survive at cost of the weak, and that man should not offer a helping hand to the weak in society, as that would go against the natural order of things.
A controversial philosophy, especially today, but could it make sense if we put it in the context of Mobb Deep? The rappers came from poor life in the ghetto, worked their way up, sold millions of albums and eventually became wealthy through it. One could argue that Prodigy and Havoc are in fact the fittest to survive in contemporary human society!
Whatever the case, misinterpretation of a word or strong Social Darwinism, the fact remains that ‘survival of the fittest’ is a pretty strong and powerful slogan – one of which I personally do not mind if it’s applied in scientifically correct ways or not!
… Want die geven we zomaar weg op www.TechTheFuture.com!
Tech the Future is het laatste project waar ik me op heb gestort samen met partner in crime Augustus. Een blog over technologie, duurzaamheid, eco, wetenschap, etc. Sinds 3 maart is TTF nu online en het lijkt nu allemaal prima te lopen, met flink wat content in die korte tijd (voor 2 drukbezette bloggers) en veelbelovende stats. In het kader van web 2.0-erigheid hebben we een Facebook groep van 200 man, en we twitteren ook een hoop! Want zo werkt dat tegenwoordig nu eenmaal.