DBPedia Twitterbot: Introducing @grausPi!

12/12/12 update: since @sem_web moved into my Raspberry Pi, I’ve renamed him @grausPi.

I’ve spent the last couple of days working on my graduation project by working on a side project: @sem_web, a Twitter bot that queries DBPedia [Wikipedia’s ‘linked data’ equivalent] for knowledge.

@sem_web can recognize 249 concepts defined by the DBPedia ontology, and sends SPARQL queries to the DBPedia endpoint to retrieve more specific information about them. Currently, this means that @sem_web can check an incoming tweet (mention) for known concepts, and then return an instance (example) of the concept, along with a property of this instance and the value of that property. An example of @sem_web’s output:

[findConcept] findConcept('video game')
[findConcept] Looking for concept: video game
 [u'http://dbpedia.org/class/yago/ComputerGame100458890', 
'video game']

[findInst] Seed: [u'http://dbpedia.org/class/yago/ComputerGame100458890', 
'video game']
[findInst] Has 367 instances.
[findInst] Instance: Fight Night Round 3

[findProp] Has 11 properties.
[findProp] [u'http://dbpedia.org/property/platforms', u'platforms']

[findVal] Property: platforms (has 1 values)
[findVal] Value: Xbox 360, Xbox, PSP, PS2, PS3
[findVal] Domain: [u'Thing', u'work', u'software']
[findVal] We're talking about a thing...
Fight Night Round 3 is a video game. Its platforms is Xbox 360, Xbox, PSP, PS2, PS3.

This is how it works:

  1. Look for words occurring in the tweet that match a given concept’s label.
  2. If found (concept): send a SPARQL query to retrieve an instance of the concept (an object with rdf:type concept).
  3. If no instance is found: send a SPARQL query to retrieve a subClass of the concept. Go to step 1 with the subClass as concept.
  4. If found (instance): send SPARQL queries to retrieve a property, value and domain of the instance. The domain is used to determine whether @sem_web is talking about a human or a thing.
  5. If no property with a value is found after several tries: Go to step 2 to retrieve a new instance.
  6. Compose a sentence (currently @sem_web has 4 different sentences) with the information (concept, instance, property, value).
  7. Tweet!
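The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not @sem_web’s actual code: `run_query` is a hypothetical callable that sends a SPARQL string to an endpoint and returns rows as dictionaries, the query texts and the single sentence template are illustrative, step 3 simply loops back to the instance lookup with the subclass as the new concept, and the human-or-thing domain check and the tweeting itself are left out.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical interface: run_query takes a SPARQL string and returns result
# rows as dicts mapping variable names (without "?") to string values.
QueryFn = Callable[[str], List[Dict[str, str]]]

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def instances_query(concept: str) -> str:
    # Step 2: an instance is any resource with rdf:type <concept> ("a" = rdf:type).
    return f"SELECT ?inst WHERE {{ ?inst a <{concept}> }} LIMIT 500"


def subclasses_query(concept: str) -> str:
    # Step 3: more specific classes to fall back on.
    return f"SELECT ?sub WHERE {{ ?sub <{RDFS_SUBCLASS_OF}> <{concept}> }}"


def properties_query(instance: str) -> str:
    # Step 4a: which properties does this instance have?
    return f"SELECT DISTINCT ?prop WHERE {{ <{instance}> ?prop ?val }}"


def values_query(instance: str, prop: str) -> str:
    # Step 4b: the value(s) of one chosen property.
    return f"SELECT ?val WHERE {{ <{instance}> <{prop}> ?val }}"


def describe(concept: str, label: str, run_query: QueryFn,
             rng: random.Random, max_tries: int = 3,
             max_rounds: int = 10) -> Optional[str]:
    """Return one sentence about an instance of `concept`, or None."""
    for _ in range(max_rounds):
        instances = [r["inst"] for r in run_query(instances_query(concept))]
        if not instances:
            # Step 3: no instances, so retry with a random subclass.
            subs = [r["sub"] for r in run_query(subclasses_query(concept))]
            if not subs:
                return None
            concept = rng.choice(subs)
            continue
        instance = rng.choice(instances)
        props = [r["prop"] for r in run_query(properties_query(instance))]
        # Steps 4-5: try a few random properties until one has a value.
        for prop in rng.sample(props, min(max_tries, len(props))):
            values = [r["val"] for r in run_query(values_query(instance, prop))]
            if values:
                # Step 6: a single, hypothetical sentence template.
                return f"{instance} is a {label}. Its {prop} is {', '.join(values)}."
        # Step 5: no property with a value, so pick a new instance next round.
    return None
```

To point it at DBPedia, `run_query` could wrap the SPARQLWrapper library: create `SPARQLWrapper("https://dbpedia.org/sparql")`, call `setQuery` and `setReturnFormat(JSON)`, then flatten `query().convert()["results"]["bindings"]` into plain `{variable: binding["value"]}` dicts. Passing the function in also makes the loop testable offline with a fake.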

Besides that, @sem_web posts a random tweet once an hour, about a concept picked at random from the DBPedia ontology. Working on @sem_web lets me get to grips with both the SPARQL query language and programming in Python (which, still, is something I hadn’t done before in a larger-than-20-lines-of-code way).

Comparing concepts

What I’m working on next is a method to compare multiple concepts, when @sem_web detects more than one in a tweet. Currently, this works by taking each concept and querying for all the superClasses of the concept. I then store the path from the seed to the topClass (Entity) in a list, repeat the process for the next concept, and then compare both paths to the top, to identify a common parent-Class.
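A minimal sketch of the path comparison described above, assuming every class has exactly one superClass. The `PARENT` table is a hypothetical stand-in for the SPARQL superClass queries, filled with the YAGO class names from the example log in this post (URI prefixes and numeric IDs dropped); the function names are illustrative, not the ones in @sem_web.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the superClass queries: each class maps to its
# single rdfs:subClassOf parent (names from the example log in this post).
PARENT: Dict[str, str] = {
    "ChangeOfLocation": "Movement",
    "Movement": "Happening",
    "Happening": "Event",
    "Locomotion": "Motion",
    "Motion": "Change",
    "Change": "Action",
    "Action": "Act",
    "Act": "Event",
    "Event": "PsychologicalFeature",
    "PsychologicalFeature": "Abstraction",
    "Abstraction": "Entity",
}


def path_to_top(cls: str) -> List[str]:
    """Follow superClasses from `cls` until a class without a parent (the top)."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_parent(a: str, b: str) -> Optional[str]:
    """Return the first class on a's path to the top that is also on b's path."""
    on_b = set(path_to_top(b))
    for cls in path_to_top(a):
        if cls in on_b:
            return cls
    return None
```

With this table, `common_parent("ChangeOfLocation", "Locomotion")` returns `"Event"`, and the index of `"Event"` in each path (3 and 5) is the hop count.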

This is relevant for my graduation project as well, because a large part of determining the right subject for a text will be determining the ‘proximity’ or similarity of different concepts in the text. Still, determining the similarity or proximity of concepts is a much bigger task; finding common superClasses is just a tiny step towards it. There are other interesting relationships to explore, for example partOf/sameAs relations. I’m curious to see what kind of information I will gather with this from larger texts.

Here is an example of the concept comparison in action, starting from the following tweet:

>>> randomFriend()
Picked mendicot: @offbeattravel .. FYI, my Twitter bot 
@vagabot found you by parsing (and attempting to answer) 
travel questions off the Twitter firehose ..

I received the following concepts:

5 concepts found.
[u'http://dbpedia.org/class/yago/Bot102311879',
u'http://dbpedia.org/class/yago/ChangeOfLocation107311115',
u'http://dbpedia.org/class/yago/FYI(TVSeries)',
u'http://dbpedia.org/class/yago/Locomotion100283127',
u'http://dbpedia.org/class/yago/Travel100295701']

The findCommonParents function takes two URIs and processes them one after the other: for each hop up the hierarchy, it appends a new list holding the superClasses found at that hop. This way I can track the number of ‘hops’ made by the index of each list. As soon as the function has processed both URIs, it starts comparing the path lists to determine the first common parent.

>>> findCommonParents(found[1],found[3])

[findParents]	http://dbpedia.org/class/yago/ChangeOfLocation107311115
[findParents]	Hop | Path:
[findParents]	0   | [u'http://dbpedia.org/class/yago/ChangeOfLocation107311115']
[findParents]	1   | [u'http://dbpedia.org/class/yago/Movement107309781']
[findParents]	2   | [u'http://dbpedia.org/class/yago/Happening107283608']
[findParents]	3   | [u'http://dbpedia.org/class/yago/Event100029378']
[findParents]	4   | [u'http://dbpedia.org/class/yago/PsychologicalFeature100023100']
[findParents]	5   | [u'http://dbpedia.org/class/yago/Abstraction100002137']
[findParents]	6   | [u'http://dbpedia.org/class/yago/Entity100001740']
[findCommonP]	1st URI processed

[findParents]	http://dbpedia.org/class/yago/Locomotion100283127
[findParents]	Hop | Path:
[findParents]	0   | [u'http://dbpedia.org/class/yago/Locomotion100283127']
[findParents]	1   | [u'http://dbpedia.org/class/yago/Motion100279835']
[findParents]	2   | [u'http://dbpedia.org/class/yago/Change100191142']
[findParents]	3   | [u'http://dbpedia.org/class/yago/Action100037396']
[findParents]	4   | [u'http://dbpedia.org/class/yago/Act100030358']
[findParents]	5   | [u'http://dbpedia.org/class/yago/Event100029378']
[findParents]	6   | [u'http://dbpedia.org/class/yago/PsychologicalFeature100023100']
[findParents]	7   | [u'http://dbpedia.org/class/yago/Abstraction100002137']
[findParents]	8   | [u'http://dbpedia.org/class/yago/Entity100001740']
[findCommonP]	2nd URI processed

[findCommonP]	CommonParent found!
[findCommonP]	Result1[3][0]
[findCommonP]	matches with result2[5][0]
[findCommonP]	http://dbpedia.org/class/yago/Event100029378
[findCommonP]	http://dbpedia.org/class/yago/Event100029378

Here you can see the first common parentClass is ‘Event’: 3 hops away from ‘ChangeOfLocation’, and 5 hops away from ‘Locomotion’. If a class has multiple superClasses, the function processes all of them at the same hop (in one list). Anyway, this is just the basic stuff. There’s plenty more on my to-do list…
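When a class has several superClasses, the per-hop lists can be built breadth-first, one list per hop. The sketch below assumes a small made-up hierarchy (`SUPERS`, hypothetical letters rather than real YAGO classes) and picks the common parent with the fewest hops in total, which is one reasonable reading of ‘first common parent’, not necessarily the rule @sem_web uses.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hierarchy in which a class can have several superClasses.
SUPERS: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["Top"],
    "C": ["D"],
    "D": ["Top"],
    "X": ["D"],
}


def hop_levels(cls: str) -> List[List[str]]:
    """levels[k] holds every superClass first reached after k hops."""
    levels = [[cls]]
    seen = {cls}
    while True:
        nxt = []
        for c in levels[-1]:
            for parent in SUPERS.get(c, []):
                if parent not in seen:  # keep only the shortest hop count
                    seen.add(parent)
                    nxt.append(parent)
        if not nxt:
            return levels
        levels.append(nxt)


def first_common_parent(a: str, b: str) -> Optional[Tuple[str, int, int]]:
    """Return (class, hops from a, hops from b) with the fewest total hops."""
    hops_a = {c: i for i, level in enumerate(hop_levels(a)) for c in level}
    best = None
    for j, level in enumerate(hop_levels(b)):
        for c in level:
            if c in hops_a:
                candidate = (hops_a[c] + j, c, hops_a[c], j)
                if best is None or candidate < best:
                    best = candidate
    return None if best is None else best[1:]
```

Here `first_common_parent("A", "X")` gives `("D", 2, 1)`: D is two hops above A and one hop above X.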

While the major part of the functionality I’m building for @sem_web will be directly usable for my thesis project, I haven’t been sitting still on more directly thesis-related things either. I’ve set up a local RDF store (Sesame store) on my laptop with all the needed bio-ontologies. RDFLib’s in-memory stores were clearly not up to the large ontologies I had to load each time. This also means I have to structure my queries better, as not all information is available at any given time. I also, unfortunately, learned that one of my initial plans, finding the shortest path between two nodes in an RDF store to determine ‘proximity’, is actually quite a complicated task. Next I will focus on improving the concept comparison, taking more properties into account than only rdfs:subClassOf, and I’ll also work on extracting keywords (for which I should have arranged test data, but haven’t)… Till next time!
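For reference, on a small in-memory graph the textbook way to find a shortest path is a breadth-first search. The sketch below treats every triple as an undirected edge between subject and object, using made-up triples rather than the bio-ontologies; it sidesteps the practical questions that come up in a real RDF store, such as which predicates should count as edges, whether direction matters, and how many queries each search level would cost.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shortest_path(triples: Iterable[Tuple[str, str, str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over subject/object nodes, ignoring edge direction."""
    neighbours: Dict[str, Set[str]] = {}
    for s, _p, o in triples:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)

    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

For example, with the toy triples `("a", "p", "b")`, `("b", "p", "c")`, `("c", "r", "e")`, `("a", "q", "d")` and `("d", "q", "c")`, the path from `"a"` to `"e"` comes out as `["a", "b", "c", "e"]`.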

But mostly, over the last weeks I’ve been learning SPARQL, improving my Python skills, and getting a better, more concrete idea of possible approaches for my thesis project by working on @sem_web.


Embodied Vision Turtle

Project by Peter Curet & David Graus for the ‘Embodied Vision’ course by Joost Rekveld for the Media Technology MSc. Programme at Leiden University.

We track the movement in the webcam input, adding up all movement to the left and right, and up and down. This results in two numbers that represent the total amount of horizontal and vertical movement since the start.

The turtle graphics system draws based on character input:
– ‘w’ makes it move forward
– ‘a’ makes it turn left (but doesn’t draw anything)
– ‘d’ makes it turn right (same)
– ‘s’ changes the thickness of the line
– ‘c’ the color
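A minimal interpreter for these five commands might look like the sketch below. The turn angle (90 degrees), step length, line widths and colors are hypothetical choices, since the post doesn’t specify them; the turtle records each line it draws instead of rendering it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical parameters; the original project's values are not given.
TURN_DEGREES = 90                       # STEP below assumes 90-degree turns
STEP = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}
WIDTHS = [1, 2, 4]
COLORS = ["black", "red", "blue"]

Point = Tuple[int, int]
Segment = Tuple[Point, Point, int, str]   # start, end, width, color


@dataclass
class Turtle:
    x: int = 0
    y: int = 0
    heading: int = 90                    # degrees; 90 means "up"
    width_i: int = 0
    color_i: int = 0
    segments: List[Segment] = field(default_factory=list)

    def run(self, program: str) -> None:
        for ch in program:
            if ch == "w":                # forward one step, drawing a line
                dx, dy = STEP[self.heading]
                start = (self.x, self.y)
                self.x += dx
                self.y += dy
                self.segments.append((start, (self.x, self.y),
                                      WIDTHS[self.width_i], COLORS[self.color_i]))
            elif ch == "a":              # turn left, no drawing
                self.heading = (self.heading + TURN_DEGREES) % 360
            elif ch == "d":              # turn right, no drawing
                self.heading = (self.heading - TURN_DEGREES) % 360
            elif ch == "s":              # next line thickness
                self.width_i = (self.width_i + 1) % len(WIDTHS)
            elif ch == "c":              # next color
                self.color_i = (self.color_i + 1) % len(COLORS)
            # any other character is ignored
```

Running `"wwdwscw"` moves the turtle up two steps, right one, then (after switching to a thicker red line) right once more, ending at `(2, 2)` with four recorded segments.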

The turtle receives a number of random strings from the genetic algorithm. It calculates the amount and direction of movement each string results in, then compares these numbers to the numbers of the webcam movement. The more alike they are, the fitter we consider the string. We select the fittest of the strings it received and make the turtle draw it. This string is the basis for the ‘next generation’ of strings: it is fed to the genetic algorithm, which evolves it into multiple new strings. The process repeats indefinitely. Since the webcam input is dynamic and ever-changing, the fitness of the strings will not gradually rise, but is itself an ever-changing value.
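The selection loop described above can be sketched as a mutation-only genetic algorithm. Assumptions, all hypothetical: the webcam totals arrive as an `(x, y)` pair, a string’s movement is its net displacement with unit steps and 90-degree turns, fitness is minus the distance between the two, and new strings come from random character replacement at a 10% rate.

```python
import math
import random
from typing import Tuple

ALPHABET = "wadsc"
# Hypothetical simplification: unit steps and 90-degree turns.
DIRECTIONS = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # up, left, down, right


def displacement(program: str) -> Tuple[int, int]:
    """Net (dx, dy) a turtle string produces; 's' and 'c' only change style."""
    x = y = 0
    d = 0                                  # start facing up
    for ch in program:
        if ch == "w":
            x += DIRECTIONS[d][0]
            y += DIRECTIONS[d][1]
        elif ch == "a":
            d = (d + 1) % 4                # turn left
        elif ch == "d":
            d = (d - 1) % 4                # turn right
    return x, y


def fitness(program: str, webcam: Tuple[float, float]) -> float:
    """Higher is fitter: minus the distance to the webcam's total movement."""
    dx, dy = displacement(program)
    return -math.hypot(dx - webcam[0], dy - webcam[1])


def mutate(program: str, rng: random.Random, rate: float = 0.1) -> str:
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in program)


def next_generation(parent: str, webcam: Tuple[float, float],
                    rng: random.Random, size: int = 20) -> str:
    """Evolve `size` children from `parent` and return the fittest one."""
    children = [mutate(parent, rng) for _ in range(size)]
    return max(children, key=lambda p: fitness(p, webcam))


if __name__ == "__main__":
    rng = random.Random(1)
    program = "".join(rng.choice(ALPHABET) for _ in range(30))
    # Hypothetical running totals of webcam movement, one per generation.
    for webcam in [(3, 5), (4, 5), (-2, 6)]:
        program = next_generation(program, webcam, rng)
        print(program, displacement(program), round(fitness(program, webcam), 2))
```

Because the target pair changes every generation, the best fitness can go down as well as up, matching the ever-changing fitness described above.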

CS Column 4: Numbers

Small scales, huge numbers

I’ve recently been reading a bit about nanotechnology, and I realized the contradiction that comes with thinking about such insanely small scales: you’ll always end up dealing with huge numbers.

In order to grasp something as tiny as a nanometer, we try to convert it to the next best thing: the smallest tangible, imaginable distance, the smallest unit on your average ruler, a millimeter. This conversion forces us to use hard-to-imagine numbers: a million nanometers are supposed to fit between two of the ten tiny lines that divide a centimeter on your ruler. One million, in such a tiny space? To me, it’s impossible to even imagine such a number, let alone to mentally chop up that millimeter of ruler into a million bits. How do I know, other than ‘very tiny’, how big a nanometer is?
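In powers of ten (a nanometer being $10^{-9}$ m), the conversions work out as follows; a billion nanometers only adds up to a whole meter:

```latex
1\,\mathrm{m} = 10^{9}\,\mathrm{nm}, \qquad
1\,\mathrm{cm} = 10^{-2}\,\mathrm{m} = 10^{7}\,\mathrm{nm}, \qquad
1\,\mathrm{mm} = 10^{-3}\,\mathrm{m} = 10^{6}\,\mathrm{nm}
```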

One example I recently came across stuck with me. Supposedly, in the time it takes us to pronounce the word ‘nanometer’, our hair grows ten nanometers! This fact impressed me. But hair is no stranger when it comes to imagining small scales: one of the often-used ways of making nanometers or other small scales imaginable is the width of a human hair.

But how helpful is that? According to one source, the width of a human hair varies from around 17 to 181 µm (micrometers: millionths of a meter, huge compared to a nanometer). That means a human hair is 17,000 to 181,000 nanometers wide. Let’s look back at our hair growth example. While your hair grows ten nanometers in length (in the time it takes you to pronounce the word ‘nanometer’), its width can be up to 181,000 nanometers. That puts this impressive fact into perspective.
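Putting the two figures side by side (the 10 nm of growth from the example and the 17 to 181 µm width range from the source), a hair turns out to be between 1,700 and 18,100 times as wide as the length it grows while you say ‘nanometer’:

```latex
\frac{17\,\mu\mathrm{m}}{10\,\mathrm{nm}} = \frac{17\,000\,\mathrm{nm}}{10\,\mathrm{nm}} = 1\,700,
\qquad
\frac{181\,\mu\mathrm{m}}{10\,\mathrm{nm}} = \frac{181\,000\,\mathrm{nm}}{10\,\mathrm{nm}} = 18\,100
```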

In the end, a nanometer is an abstract unit of measure that we cannot use in everyday life. And why would we? We can’t use it for ‘real-life’ measurement. Either we use it when downscaling from bigger scales, and consequently end up with huge numbers, or we use it when dealing with the totally abstract world of molecules and atoms, and end up in an even harder-to-imagine world. Any attempt to make the scale tangible runs into its intangible smallness. We’re always stuck with the contradiction of using huge numbers to imagine tiny scales.

CS Column 3: Uncertainty

The case of my disappearing socks

I keep losing stuff. Even though I live in seven square meters, I manage to misplace and lose all kinds of things. More than once, pairs of my socks have been separated, leaving me to wear two different socks. This makes me wonder: did I lose these socks, or do they magically disappear by themselves? More often than not, the latter seems more likely to me.

Like a religious man clinging to old stories to explain the inexplicable, I arm myself with science. “It’s not my fault,” I tell my girlfriend, “it’s because my socks are wavy.” “… it has to do with quantum mechanics!” I bluff. This intimidating set of scientific principles can be my best friend when I’m blamed for losing stuff.

It works from inside the socks. Let’s take a closer look at my socks. Zoom in all the way, until the separate fibers that make up the sock’s fabric are exposed. Now keep on zooming, until eventually the structure of these fibers shows itself at the molecular scale. Keep zooming still, until you reach the atomic level, once thought to hold the smallest elements of our universe. Now we’re close: keep on zooming, until finally these elements break down into their subatomic parts: electrons, and atomic nuclei made up of protons and neutrons. This is where the magic happens. This is what makes my sock disappear.

The problem lies in the behavior of the tiny particles that make up the atoms. Take electrons for example: we imagine electrons as tiny balls that fly never-ending circles around the atomic nuclei. But they’re not. Electrons are not simply minuscule balls flying around; they don’t behave like particles on a fixed trajectory. At least, sometimes they do. But at other times, they behave like a wave.

Now this wavy behavior is interesting: since a wave is never in one location at any given time, but rather in multiple locations, ‘spread out through space’, it is impossible to know or measure the exact position of an electron at a specific moment in time. This means an electron has a multitude of possible locations at any moment.

So if the things in atoms behave like wavy things – wavy things with multiple possible positions, of which we can’t pinpoint the exact one – doesn’t that mean this also goes for the atoms they constitute, and for the molecules the atoms add up to, and consequently for the fibers of the fabric that make the sock? Wouldn’t it mean that if all atoms ‘wave’ their way to some other place, my sock would ride along in this atomic wave, and change its position?

So the key question is: are my socks really wavy!? Unfortunately, the answer is no. It’s not as simple as I’d like it to be: scaling the weirdness of the microscopic world up to the real world just doesn’t work. The reason a subatomic particle can show wavy behavior is not its scale, but its isolation. Only when a subatomic particle is completely isolated does it behave like a weird wavy thing. More surprisingly, this also implies that even to this day, science has failed to demystify the underlying mechanism of my disappearing socks. I can still bluff my way through, though. Quantum mechanics is to blame!

Read my 2nd column for the Cool Science class:
» Mobb Deep’s Vision on Evolution Theory

Read my 1st column for the Cool Science class:
» Emerging Chaos – The Rules of Vietnamese Traffic

CS Column 2: Evolution

Mobb Deep’s Vision on Evolution Theory

“Yo, yo
We livin’ this till the day that we die
Survival of the fit, only the strong survive”

Mobb Deep, Survival of the Fittest (1995)

While I seriously doubt Mobb Deep’s ‘Survival of the Fittest’ was intended to enlighten their audience about evolution theory, I’d like to use this song to discuss the famous “survival of the fittest” slogan. Besides being the title of the Mobb Deep song (from the album ‘The Infamous’), it’s also a famous, popular and punchy ‘summary’ of Darwin’s evolution theory. It was coined by Herbert Spencer in his Principles of Biology (1864), five years after Darwin published his revolutionary “On the Origin of Species”, and Darwin adopted it in the fifth edition of that book in 1869.

In their song, Mobb Deep rap about living and surviving the harsh street life in Queens, New York City. Listening to this fine piece of East Coast rap made me wonder how scientifically valid this ‘street knowledge’ they provide us could be…

In the chorus, Mobb Deep further elaborate on their title: ‘Survival of the fit, only the strong survive’. Shouldn’t that be ‘Survival of the fit, only the well adapted survive’? It might not sound as nice, but it would be more correct, at least from an evolution theory point of view. Darwin’s evolution theory does not imply that the strongest or most physically fit will survive; it implies that the individuals that best fit their environment will! This misinterpretation of the word ‘fit’ in ‘survival of the fittest’ is (unfortunately) a very common one.

Darwin’s evolution theory is not about being strong, it is about adapting to the environment, surviving, and ultimately about reproducing to pass on genes. So, while Prodigy (one of two rappers in Mobb Deep) raps “I’m goin’ out blastin’, takin’ my enemies with me / And if not, they scarred, so they will never forget me” one could argue he’d be better off staying at home and reproducing (which, to be fair, is another recurring theme in Mobb Deep’s work).

But before we accuse Mobb Deep of misunderstanding the word ‘fit’, let’s consider an alternative explanation: the members of Mobb Deep might completely disagree with evolution theory as Darwin formulated it. Rather, they might be strong advocates of the ideas of Herbert Spencer, the man who coined the slogan.

Spencer was a firm believer in Social Darwinism (before it was called Social Darwinism): the application of Darwin’s evolution theory to human society. It holds that in society, the strong will survive at the cost of the weak, and that man should not offer a helping hand to the weak, as that would go against the natural order of things.

A controversial philosophy, especially today, but could it make sense in the context of Mobb Deep? The rappers grew up poor in the ghetto, worked their way up, sold millions of albums and eventually became wealthy through it. One could argue that Prodigy and Havoc are in fact the fittest to survive in contemporary human society!

Whatever the case, a misinterpreted word or genuine Social Darwinism, the fact remains that ‘survival of the fittest’ is a strong and powerful slogan, and personally I don’t mind whether it’s applied in scientifically correct ways or not!

CS Column 1: Emergence

Emerging chaos: The rules of Vietnamese traffic

When I took this picture in Ho Chi Minh City, Vietnam, I was awe-struck by the chaotic traffic. Dozens of “motobikes” buzz down the streets, seemingly not paying any attention to traffic lanes and rules, oncoming traffic or anything in their vicinity. Cars move through the thick clouds of bikes, and some brave souls even pedal their bicycles straight through it.

For an outsider such as myself, it initially looked like a totally random and chaotic event. Did these people just hope for the best when they were driving through their city? It was obvious all of this chaos would have to work out one way or another. Eventually – I assumed – everyone got where they were going. But how?

Soon I learned there is in fact a system at play, with plenty of unwritten rules behind the apparent chaos. You learn this through the one confrontation you cannot avoid: crossing a road on foot (a very intimidating undertaking at first). The basic rule is simple: keep on moving. As long as you do, people manage to anticipate your path and will make sure not to crash into you. The next step is total immersion: hop on a bike and jump right into traffic.

Once you participate, you realize how simply it actually works. It felt like I was part of a flock: all neighboring motomen adjusted and maintained their speed based on mine and that of the other drivers directly around us. This was not at all obvious when I was observing the traffic from the sidewalk. Eventually I didn’t even worry about horrible fatal accidents anymore, something that had been very much on my mind when I was only watching the traffic…

Even if it looks simple once you’re in traffic, there is still speeding and overtaking, and not everyone is heading to the same destination, so people are constantly moving in and out of the flock. The same principle applies, however: when you take a turn, all is fine as long as your movement is fluid. It’s not turn signals that will save you here; clear and predictable movement will.

What at first seemed totally unnatural to me started feeling more natural, and eventually made sense. But it only started to make real sense once I was back home and started reading about swarm intelligence and flocking behavior. The three rules that govern flocking behavior seem to apply to Vietnamese traffic too: separation (avoiding neighbors), alignment (keeping roughly the same direction) and cohesion (sticking together). These simple rules are all you need to create a realistic computer model of a flock of birds, and indeed they’re also all you need to create what seems to be ordered chaos on the roads of a Vietnamese city: I followed the same rules when driving through Ho Chi Minh City on my rental motobike.
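These three rules are the ones from Craig Reynolds’ ‘boids’ flocking model. A minimal 2D sketch of one simulation tick follows; the radii and weights are hypothetical values chosen only to make each rule visible, and every boid is updated from the previous tick’s state, so the order of the list doesn’t matter.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical tuning constants.
NEIGHBOUR_RADIUS = 5.0     # who counts as a neighbour
SEPARATION_RADIUS = 1.0    # who counts as too close
W_SEPARATION, W_ALIGNMENT, W_COHESION = 0.05, 0.05, 0.01


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(flock: List[Boid]) -> List[Boid]:
    """Advance every boid one tick, using only its neighbours' old state."""
    new_flock = []
    for b in flock:
        near = [o for o in flock
                if o is not b and math.hypot(o.x - b.x, o.y - b.y) < NEIGHBOUR_RADIUS]
        vx, vy = b.vx, b.vy
        if near:
            n = len(near)
            # Cohesion: steer toward the centre of the neighbours.
            vx += W_COHESION * (sum(o.x for o in near) / n - b.x)
            vy += W_COHESION * (sum(o.y for o in near) / n - b.y)
            # Alignment: steer toward the neighbours' average velocity.
            vx += W_ALIGNMENT * (sum(o.vx for o in near) / n - b.vx)
            vy += W_ALIGNMENT * (sum(o.vy for o in near) / n - b.vy)
            # Separation: push away from anyone who is too close.
            for o in near:
                if math.hypot(o.x - b.x, o.y - b.y) < SEPARATION_RADIUS:
                    vx += W_SEPARATION * (b.x - o.x)
                    vy += W_SEPARATION * (b.y - o.y)
        new_flock.append(Boid(b.x + vx, b.y + vy, vx, vy))
    return new_flock
```

Calling `step` repeatedly on a list of boids is the whole simulation: boids out of each other’s range keep their course, nearby ones drift together and match direction, and any pair that gets too close is pushed apart.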

As a matter of fact, when I came back home I had to readjust to the way traffic works in Holland, where traffic lights, zebra crossings and the rules of the road decided for me where I was going. The Vietnamese traffic, which at first seemed unnatural, chaotic and, most of all, very scary, eventually felt natural, ordered and elegant in its simplicity.