Welcome to my blog! I am David Graus, currently finishing my Media Technology MSc at the University of Leiden. My MSc thesis involves the Semantic Web, Python, OWL, RDF & NLP. I work as an editor for Wetenschap24 at NTR, a Dutch broadcasting company.

New project: dataminen.nl
18/04/2012 (10:06)
5 views

I have started a Dutch blog on datamining, as I haven’t really come across one and figured the time is right, given the growing interest in datajournalism, dataviz, big data, etc. The idea is to provide a general, human-readable overview of the (academic) field of datamining and its innovations.

I will still use this blog to keep the world informed of my personal endeavours ;-).

text graphs
12/04/2012 (1:03)
9 views

Subgraph Similarity Example
10/04/2012 (13:45)
25 views

“(a) A sagittal reconstruction of a coronally acquired magnetic resonance imaging (MRI) scan, at the level on which the cingulate gyrus was measured. The area outlined represents the portion of the scan used to orient the operator to the landmarks of the cingulate. A box has been placed over the region of interest in one hemisphere. (b) A diagram of the cingulate gyrus divided into the rostral portion of the anterior cingulate (RAC), the caudal portion of the interior cingulate (CAC), and the posterior cingulate (PC). Adjoining landmarks include the corpus callosum (CC), the lateral ventricle (Lat. Vent.), and the thalamus (Thal.). (c) The region of the cingulate gyrus measured in the present study, as delineated on the MRI scan of a control subject. [...] ” (snippet)

» Check out the d3js demo here «

Two sets of annotations (Expert 1 & Expert 3)

Result in the following similarity graph:

It can thaw, it can freeze in Tutti Frutti dorp
24/03/2012 (0:14)
78 views


that’s how it goes here in Tuindorp

coat rack hack
04/12/2011 (17:49)
42 views

In the colander-hack category, but with a coat rack instead of a colander, and otherwise very different too. This is what Jysk had in mind for the Ascot coat rack:

And this is how it now hangs on my wall:

#OccupyAmsterdam wordle
16/10/2011 (19:44)
36 views

Wordle of the 200 most frequent words in tweets with the hashtag #OccupyAmsterdam. Made from 5,239 tweets posted between Saturday 8 October, 09:55, and 16 October, 15:50.
Manually filtered to remove nicknames and meaningless words. Here is the list of the 1000 most frequent words: OccupyAmsterdam-woorden.

More text-mining. Popularity contest: Drosophila melanogaster vs. C. elegans
09/10/2011 (17:17)
53 views

While waiting on several word-counting scripts to finish counting, I picked up my cancerCounter script to count something else. This time, I wanted to see which organism was more popular and more frequently mentioned in biomedical studies: the ever-present Drosophila melanogaster, aka the common fruit fly, or the aptly named Caenorhabditis elegans (one cannot deny that the 1mm-long worm has quite the elegant wiggle). Both are model organisms in biomedical research.

Both have a lot going for them:
- Elegans was the first multicellular organism to have its entire genome sequenced (go worm!)
- The worm reproduces and mutates quickly and easily

The fruit fly on the other hand is quite the suitable lab-rat as well:
- Drosophila breeds easily
- Does not need much space nor care
- Has to pay for invading my kitchen each year during summer

I started counting the occurrence of ‘drosophila melanogaster’ or ‘d. melanogaster’ AND ‘caenorhabditis elegans’ or ‘c. elegans’ in the lowercased article-body of my 99,000-and-something BioMedCentral articles-corpus, and took a looksy. First comes the total number of articles published per year, with the number of articles mentioning the fruit fly/worm:
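A minimal sketch of how such a count could look. It is not the original cancerCounter script: the `(year, body)` input format and all function names are assumptions made for illustration, and only the four search terms come from the post.

```python
from collections import Counter

# Search terms taken from the post; matching is done on lowercased text.
FLY_TERMS = ("drosophila melanogaster", "d. melanogaster")
WORM_TERMS = ("caenorhabditis elegans", "c. elegans")


def mentions(body, terms):
    """Return True if the lowercased body contains any of the terms."""
    lowered = body.lower()
    return any(term in lowered for term in terms)


def count_mentions_per_year(articles):
    """Count, per year, all articles and those mentioning each organism.

    `articles` is an iterable of (year, body) pairs (a hypothetical format).
    """
    totals, fly, worm = Counter(), Counter(), Counter()
    for year, body in articles:
        totals[year] += 1
        if mentions(body, FLY_TERMS):
            fly[year] += 1
        if mentions(body, WORM_TERMS):
            worm[year] += 1
    return totals, fly, worm
```

An article that mentions both organisms is counted once for each, which matches the idea of counting articles rather than individual word occurrences.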

As we can see, worryingly, scientists hardly spend enough time performing research with worms and fruit flies. Since 2003, they do consistently play more with the worms than with fruit flies, though. But it’s hard to see, let’s ditch the total articles:

When we subtract the drosophila articles from the elegans articles, we can see how much the worm has on the fruit fly. The red bars represent by how many articles Elegans wins over Drosophila, and blue bars indicate by how many articles Drosophila wins over Elegans.

But absolute numbers are not what we’re looking for. As we have seen in the first graph, the frequency of articles is far from evenly distributed. So let’s see what the ratio is, of the difference between both organisms:

This evens out some of the bigger differences in the previous graph; Drosophila had ‘only’ a +5 win over Elegans in 2001, but relatively this is a bigger victory than Elegans’ +34 win in 2006, and even its +79 victory in 2009.
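To make the absolute-versus-relative point concrete, here is a small sketch. The post does not say what the difference was divided by, so the denominator used here (articles mentioning either organism) is an assumption, and the counts in the usage note are made up for illustration.

```python
def relative_difference(worm_count, fly_count):
    """Difference between worm and fly articles, relative to their sum.

    Positive values mean Elegans wins, negative values mean Drosophila wins.
    The choice of denominator is an assumption, not taken from the post.
    """
    mentioned = worm_count + fly_count
    if mentioned == 0:
        return 0.0
    return (worm_count - fly_count) / mentioned
```

With made-up counts, a year with 10 worm and 15 fly articles gives a relative difference of -0.2, while a year with 334 worm and 300 fly articles gives only about +0.05, even though the absolute gap (+34) is much larger than the -5 of the first year.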

Conclusion: Elegans wins.

Direction flip counter
01/10/2011 (17:33)
15 views

I think I just created a functional direction-flips counter for the directed graph that my SPARQL-powered ontology-pathFinder produces :)).


>>> path = [['drie','>','vijf'],['vijf','>','negen'],['zeven','>','negen'],['zeven','>','acht'],['acht','>','twaalf'],['negentien','>','twaalf']]
>>> findFlips(path,'drie','negentien')
drie vijf
vijf negen
up
zeven negen
down
zeven acht
acht twaalf
up
negentien twaalf
down
3 flips

It seems to work correctly on this incredibly tricky test-path I gave it ;).
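For readers wondering how such a counter might work, here is a minimal sketch. It is not the original `findFlips`: the name `count_flips` and its behaviour are assumptions, and it returns the number of flips instead of printing each step. It assumes every edge is a `[source, '>', target]` triple and that the edges are listed in the order the path is walked.

```python
def count_flips(path, start, end):
    """Count how often the walking direction flips along a path.

    Walking an edge from source to target counts as 'up', walking it
    from target to source counts as 'down'.
    """
    current = start
    previous_direction = None
    flips = 0
    for source, _, target in path:
        if source == current:
            direction, current = "up", target
        elif target == current:
            direction, current = "down", source
        else:
            raise ValueError(f"edge {source} > {target} does not continue the path")
        if previous_direction is not None and direction != previous_direction:
            flips += 1
        previous_direction = direction
    if current != end:
        raise ValueError("path does not finish at the expected end node")
    return flips
```

On the test path above, walked from 'drie' to 'negentien', this sketch also arrives at 3 flips.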

Textmining BioMedCentral: Cancer – a trending topic?
28/09/2011 (8:44)
36 views

*Update*
I added a graph which shows the ratio of articles containing the word ‘Cancer’ to total articles per year. It sadly still suffers from the incomplete data of earlier years:

*Original post*

This is my first attempt to get some data out of the BioMedCentral dataset, the freely available, Open Access archive of over 40 years of biomedical research articles. I’ll use this set as a training corpus for my thesis, to extract domain-specific features to use when comparing the similarity between two documents. The dataset consists of 103,782 articles from 1969 to today.

My text-mining experiment was a very simple one: count the occurrence of the word ‘cancer’ in every article of the journal. My expectation was that the term would occur more frequently as time progresses: as a science journalist I frequently came across (obscure) biomedical research which concluded its findings by in some way linking to (promising a potential way to discover a potential cure for:) cancer. I always figured it had to do with funding. But I’m no expert.

Anyway, to test this I threw together a simple Python script to parse each (XML-formatted) article, extract its date and the frequency of the word cancer, and output this data to a CSV file. I then averaged the counts per article for each year, resulting in the following graph:
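A rough sketch of the counting and averaging step, not the original script. The XML parsing is left out because the post does not describe the article format. The sketch assumes the articles have already been reduced to `(year, body)` pairs, and it counts substring matches, so ‘cancers’ and ‘anticancer’ are counted too. All names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def count_word(body, word="cancer"):
    """Count case-insensitive substring occurrences of `word` in `body`."""
    return body.lower().count(word)


def average_count_per_year(articles, word="cancer"):
    """Return {year: average number of occurrences per article}."""
    counts = defaultdict(list)
    for year, body in articles:
        counts[year].append(count_word(body, word))
    return {year: sum(values) / len(values)
            for year, values in sorted(counts.items())}


def write_csv(averages, handle):
    """Write one row per year to an open file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["year", "average_count"])
    for year, average in averages.items():
        writer.writerow([year, average])
```

Usage could look like `write_csv(average_count_per_year(articles), open("cancer.csv", "w", newline=""))`, where the file name is just an example.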

I hoped to be able to provide an overview of the frequency of the word in ~40 years of BMC. I wasn’t. The first couple of years seem very incomplete: there aren’t many articles (in the hundreds instead of in the thousands as in later years), and lots of “(To access the full article, please see PDF)”-references (yay to Open Access). Anyway, I figured the last 10 years WERE okay, so I graphed the average occurrence of the word cancer of those last couple of years.

Some initial thoughts:

  • The average (word count per article) might be the wrong metric here. Articles dedicated to cancer-related topics skew the average too much. I am actually looking for the papers which do not contain the word frequently.
  • A better metric could be the ratio of articles that DO contain the word (at least once). I’ll give that a shot later and update this post.
  • There does seem to be some increase in occurrence, however I wouldn’t say it’s enough to support my observation.
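The alternative metric from the second bullet could be sketched like this. Again the `(year, body)` input format and the function name are assumptions, and the match is a plain case-insensitive substring test.

```python
from collections import Counter


def ratio_containing_word(articles, word="cancer"):
    """Return {year: share of articles containing `word` at least once}."""
    totals, containing = Counter(), Counter()
    for year, body in articles:
        totals[year] += 1
        if word in body.lower():
            containing[year] += 1
    return {year: containing[year] / totals[year] for year in sorted(totals)}
```

Unlike the average word count, this ratio is not skewed by a handful of articles that repeat the word hundreds of times: each article contributes either 0 or 1.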

colander hack
02/03/2011 (15:37)
28 views

I hacked my colander into a lamp.

Morocco 2010 Slideshow
31/08/2010 (14:22)
0 views

Or check out the set on Flickr

The Ultimate Bday BBQ Breakdown
22/06/2010 (14:03)
2 views

Join my birthday party on the 1st of July in the Oosterpark, Amsterdam! From 3pm till late. See you there!

Embodied Vision Turtle
04/06/2010 (13:33)
3 views

Project by Peter Curet & David Graus for the ‘Embodied Vision’ course by Joost Rekveld for the Media Technology MSc. Programme at Leiden University.

We compare the movement of the webcam input (adding up all movement towards the left and right, and up and down). This results in two numbers which represent the total amount of movement since the start.

The turtle graphic system draws on the basis of character-input:
- ‘w’ makes it move forward
- ‘a’ makes it turn left (but doesn’t draw anything)
- ‘d’ makes it turn right (same)
- ‘s’ changes the thickness of the line
- ‘c’ the color

The turtle receives a number of random strings from the genetic algorithm. It calculates the amount and direction of movement each string results in. Then it compares all these numbers to the numbers of the webcam movement. The more alike, the fitter we consider the string. We select the fittest string out of the number of strings it received, and make the turtle draw it. This string is the basis for the ‘next generation’ of strings. It is fed to the genetic algorithm which evolves this string into multiple other strings. The process repeats to infinity. Since the webcam input is dynamic and ever-changing, the fitness of the strings will not gradually rise, but is an ever-changing value.
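The loop described above can be sketched roughly as follows. This is not the project’s actual code: the 90-degree turn angle, the starting direction, the mutation rate, the distance-based fitness and all function names are assumptions made for illustration.

```python
import random

COMMANDS = "wadsc"  # forward, turn left, turn right, thickness, color


def net_movement(commands):
    """Return the total (horizontal, vertical) movement of a command string.

    Assumes the turtle starts facing up and turns in 90-degree steps.
    """
    x = y = 0
    dx, dy = 0, 1
    for command in commands:
        if command == "w":
            x, y = x + dx, y + dy
        elif command == "a":
            dx, dy = -dy, dx  # turn left
        elif command == "d":
            dx, dy = dy, -dx  # turn right
        # 's' and 'c' only change the line style, not the position
    return x, y


def fitness(commands, webcam_movement):
    """Higher is fitter: negative distance between string and webcam movement."""
    x, y = net_movement(commands)
    return -(abs(x - webcam_movement[0]) + abs(y - webcam_movement[1]))


def fittest(candidates, webcam_movement):
    """Select the string whose movement is most like the webcam movement."""
    return max(candidates, key=lambda c: fitness(c, webcam_movement))


def mutate(commands, rate=0.2, rng=random):
    """Replace each character with a random command with probability `rate`."""
    return "".join(rng.choice(COMMANDS) if rng.random() < rate else c
                   for c in commands)


def next_generation(parent, size=10, rng=random):
    """Evolve the selected string into a new batch of candidate strings."""
    return [mutate(parent, rng=rng) for _ in range(size)]
```

One round of the loop would then be `best = fittest(next_generation(best), webcam_movement)`, with `webcam_movement` refreshed from the camera each time, which is why the fitness never settles.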

CS Column 4: Numbers
09/05/2010 (12:47)
4 views

Small scales, huge numbers

I’ve recently been reading a bit about nanotechnology, and I realized the contradiction that thinking of such insanely small scales brings. You’ll always end up dealing with huge numbers.

In order to try to grasp something as tiny as a nanometer, we try to convert it to the next best thing: the smallest tangible, imaginable distance, the smallest scale on your average ruler, a millimeter. This conversion forces us to use hard-to-imagine scales: a million nanometers are supposed to fit in between two of the tiny lines on your ruler which divide a centimeter. One million, in such a tiny space? To me, it’s impossible to even imagine such a number. Let alone to mentally chop up this millimeter-space on a ruler into a million bits. How do I know (other than ‘very tiny’) how big a nanometer is?

One example I recently came across stuck with me. Supposedly, during the time it takes us to pronounce the word ‘nanometer’, our hair grows ten of them! This fact impressed me. But hair is no stranger when it comes to imagining small scales. There are a few often-used examples for making nanometers or other small scales imaginable, one of them being the width of a human hair.

But how helpful is that? According to one source, the width of a human hair can vary from around 17 to 181 µm (that’s micrometer: a millionth of a meter, huge compared to a nanometer). That means a human hair can be anywhere from 17,000 to 181,000 nanometers wide. Let’s take a look back at our hair growth example. While your hair grows ten nanometers in one direction (in the time it takes you to pronounce the word ‘nanometer’), it can be up to 181,000 nanometers wide in the other direction. That puts this impressive fact into perspective.

In the end, a nanometer is an abstract unit of measure that we cannot use in everyday life. And why would we? We can’t use it for ‘real-life’ measurement. We either use it when downscaling from bigger scales, and consequently end up with huge numbers, or we use it when we deal with the world of molecules and atoms, and then we end up in an even harder to imagine abstract world. Any attempt to make the scale tangible deals with intangible smallness. We’re always stuck with the contradiction of using huge numbers to imagine tiny scales.

Read my 3rd column for the Cool Science class:
» The case of my disappearing socks

Read my 2nd column for the Cool Science class:
» Mobb Deep’s Vision on Evolution Theory

Read my 1st column for the Cool Science class:
» Emerging Chaos – The Rules of Vietnamese Traffic

CS Column 3: Uncertainty
20/04/2010 (17:49)
1 views

The case of my disappearing socks

I keep losing stuff. Even though I live on a surface of seven square meters I manage to misplace and lose all kinds of stuff. More than once pairs of my socks get separated, resulting in me having to wear two different socks. This leaves me wondering: did I lose these socks, or do they magically disappear by themselves? More often than not, the latter seems more likely to me.

Like a religious man clinging on to old stories to explain the inexplicable, I arm myself with science. “It’s not my fault” I tell my girlfriend, “it’s because my socks are wavy.” “… it has to do with quantum mechanics!” I bluff. This intimidating set of scientific principles can be my best friend when I’m blamed for losing stuff.

It works from inside the socks. Let’s take a closer look at my socks. Zoom in all the way, until the separate fibers that make up the sock’s fabric are exposed. Now keep on zooming, until eventually the structure of these fibers will show itself in the molecular scale. Keep on zooming still until you reach the atom-level, previously thought to be the smallest elements in our universe. Now we’re close: keep on zooming, until finally these elements break down into their subatomic parts – electrons and atomic nuclei, made up out of protons and neutrons. This is where the magic happens. This is what makes my sock disappear.

The problem lies in the behavior of the tiny particles that make up the atoms. Take electrons for example: we imagine electrons as tiny balls that fly never-ending circles around the atomic nuclei. But they’re not. Electrons are not simply minuscule balls flying around; they don’t behave like particles on a fixed trajectory. At least, sometimes they do. But at other times, they behave like a wave.

Now this wavy behavior is interesting: since a wave is never on one location at any given time, but rather on multiple locations ‘spread out through space’, it is impossible to know or measure the exact position of an electron at a specific moment in time. This means an electron has a multitude of possible locations at any moment.

So if the things in atoms behave like wavy things – wavy things with multiple possible positions, of which we can’t pinpoint the exact one – doesn’t that mean this also goes for the atoms they constitute, and for the molecules the atoms add up to, and consequently for the fibers of the fabric that make the sock? Wouldn’t it mean that if all atoms ‘wave’ their way to some other place, my sock would ride along in this atomic wave, and change its position?

So the key question is: are my socks really wavy!? Unfortunately, the answer is no. It’s not as simple as I’d like it to be: upscaling the weirdness of the microscopic world to the real world just doesn’t work. The reason a subatomic particle can show wavy behavior is not its scale, but its isolation. Only if a subatomic particle is completely isolated does it behave like a weird wavy thing. More surprisingly, this also implies that even to this day, science has failed to demystify the underlying mechanism of my disappearing socks. I can still bluff my way through, though. Quantum mechanics is to blame!


TechTheFuture logo
20/04/2010 (1:31)
1 views

CS Column 2: Evolution
27/03/2010 (20:45)
9 views

Mobb Deep’s Vision on Evolution Theory

“Yo, yo
We livin’ this till the day that we die
Survival of the fit, only the strong survive”

Mobb Deep, Survival of the Fittest (1995)

While I seriously doubt Mobb Deep’s ‘Survival of the Fittest’ song was intended to enlighten their audience with the ideas of evolution theory, I’d like to refer to this song to discuss the famous “survival of the fittest”-slogan. Because besides being a Mobb Deep song (from the album ‘The Infamous’), it’s also a famous, popular and punchy ‘summary’ of Darwin’s evolution theory. It was introduced by Herbert Spencer in 1864, five years before Darwin adopted it in the fifth edition of his revolutionary “On the Origin of Species”.

In their song, Mobb Deep rap about living and surviving the harsh street life in Queens, New York City. Listening to this fine piece of East Coast rap made me wonder how scientifically valid this ‘street knowledge’ they provide us could be…

In the chorus Mobb Deep further elaborate on their title: ‘Survival of the fit, only the strong survive’. Shouldn’t that be ‘Survival of the fit, only the well adapted survive’? It might not sound as nice, but it would be more correct, at least from an evolution theory point of view. Darwin’s evolution theory does not imply that the strongest or most physically fit will survive. It implies that the individuals that fit best in their environment will! This misinterpretation of the word ‘fit’ in ‘survival of the fittest’ is (unfortunately) a very common one.

Darwin’s evolution theory is not about being strong, it is about adapting to the environment, surviving, and ultimately about reproducing to pass on genes. So, while Prodigy (one of two rappers in Mobb Deep) raps “I’m goin’ out blastin’, takin’ my enemies with me / And if not, they scarred, so they will never forget me” one could argue he’d be better off staying at home and reproducing (which, to be fair, is another recurring theme in Mobb Deep’s work).

But before we accuse Mobb Deep of misunderstanding the word ‘fit’, let’s consider a possible alternative explanation: the artists of Mobb Deep might completely disagree with the evolution theory as Darwin formulated it. Rather, they might be strong advocates of the ideas of Herbert Spencer, the man who coined the slogan.

Spencer was a firm believer in Social Darwinism (before it was called Social Darwinism): the application of Darwin’s evolution theory to ideas about human society. It dictates that in society, the strong will survive at the cost of the weak, and that man should not offer a helping hand to the weak in society, as that would go against the natural order of things.

A controversial philosophy, especially today, but could it make sense if we put it in the context of Mobb Deep? The rappers came from a poor life in the ghetto, worked their way up, sold millions of albums and eventually became wealthy through it. One could argue that Prodigy and Havoc are in fact the fittest to survive in contemporary human society!

Whatever the case, misinterpretation of a word or strong Social Darwinism, the fact remains that ‘survival of the fittest’ is a pretty strong and powerful slogan, and I personally do not mind whether it’s applied in scientifically correct ways or not!

CS Column 1: Emergence
14/03/2010 (11:35)
1 views

buzzing bikes in ho chi minh city

Emerging chaos: The rules of Vietnamese traffic

When I took this picture in Ho Chi Minh City, Vietnam, I was awe-struck by the chaotic traffic. Dozens of “motobikes” buzz down the streets, seemingly not paying any attention to traffic lanes and rules, oncoming traffic or anything in their vicinity. Cars move through the thick clouds of bikes, and some brave souls even pedal their bicycles straight through it.

For an outsider such as myself, it initially looked like a totally random and chaotic event. Did these people just hope for the best when they were driving through their city? It was obvious all of this chaos would have to work out one way or another. Eventually – I assumed – everyone got where they were going. But how?

Soon I learned there is in fact a system at play, and there are plenty of unwritten rules involved in the apparent chaos. You learn this with the one confrontation you cannot avoid: crossing a road on foot (a very intimidating undertaking at first). The basic rule is simple: keep on moving. As long as you do, people manage to anticipate your path and will make sure not to crash into you. The next step is that of total immersion: hop on a bike and jump right into traffic.

Once you participate, you realize how simply it actually works. It felt like I was part of a flock: all neighboring motomen adjusted and maintained their speed based on mine and that of the other drivers directly around us. This was not at all obvious when I was observing the traffic from the sidewalk. Eventually I didn’t even worry about horrible fatal accidents anymore, a theme predominantly on my mind when I was only watching the traffic…

Even if it looks simple when you’re in traffic, there is still speeding and overtaking, not everyone is heading to the same destination, so people are constantly moving in and out of the flock. The same principle however applies: when you take a turn, all is fine as long as your movement is fluid. It’s not the turn signals that will save you here: clear and predictable movement will.

What at first seemed totally unnatural to me started feeling more natural, and eventually made sense to me. But it only started to make real sense once I was back home and started reading about swarm intelligence and flocking behavior. The same three rules flocking behavior dictates seem to apply in Vietnamese traffic: separation (avoiding neighbors), alignment (keeping roughly the same direction) and cohesion (sticking together). These simple rules are all you need to create a realistic computer model of a flock of birds, and indeed it’s also all you need to create what seems to be ordered chaos on the roads of a Vietnamese city – I followed the same rules when driving through Ho Chi Minh City on my rental motobike.
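For the curious, the three flocking rules can be sketched in a few lines of Python. This is a toy illustration, not a traffic model: the radii, the weights and the `(x, y, vx, vy)` representation are all assumptions chosen to keep the example short.

```python
import math


def step(boids, radius=5.0, separation_radius=1.0,
         separation_weight=0.05, alignment_weight=0.05, cohesion_weight=0.01):
    """Advance every boid by one time step.

    Each boid is a tuple (x, y, vx, vy). All radii and weights are
    illustrative values, not taken from any particular flocking model.
    """
    updated = []
    for i, (x, y, vx, vy) in enumerate(boids):
        neighbours = [b for j, b in enumerate(boids)
                      if j != i and math.hypot(b[0] - x, b[1] - y) < radius]
        if neighbours:
            n = len(neighbours)
            # Cohesion: steer towards the average position of the neighbours.
            cohesion_x = sum(b[0] for b in neighbours) / n - x
            cohesion_y = sum(b[1] for b in neighbours) / n - y
            # Alignment: steer towards the average velocity of the neighbours.
            alignment_x = sum(b[2] for b in neighbours) / n - vx
            alignment_y = sum(b[3] for b in neighbours) / n - vy
            # Separation: steer away from neighbours that are too close.
            close = [b for b in neighbours
                     if math.hypot(b[0] - x, b[1] - y) < separation_radius]
            separation_x = sum(x - b[0] for b in close)
            separation_y = sum(y - b[1] for b in close)
            vx += (separation_weight * separation_x
                   + alignment_weight * alignment_x
                   + cohesion_weight * cohesion_x)
            vy += (separation_weight * separation_y
                   + alignment_weight * alignment_y
                   + cohesion_weight * cohesion_y)
        updated.append((x + vx, y + vy, vx, vy))
    return updated
```

Calling `step` repeatedly on a list of boids moves the flock forward. No boid knows anything about the flock as a whole: each one only reacts to the neighbours within its radius, much like a driver who only watches the bikes directly around them.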

As a matter of fact, when I came back home I had to re-adjust to the way traffic works in Holland. Traffic lights, zebra crossings, and the rules of the road were deciding for me where I was going. The Vietnamese traffic which at first seemed unnatural, chaotic and most of all very scary, eventually felt natural, ordered and elegant in its simplicity.

Win a pair of eco sandals
07/03/2010 (21:16)
2 views

Tech the Future

… Because we are giving them away, just like that, at www.TechTheFuture.com!

Tech the Future is the latest project I have thrown myself into, together with partner in crime Augustus. A blog about technology, sustainability, eco, science, etc. TTF has been online since 3 March and everything seems to be running smoothly, with quite a lot of content in that short time (for 2 very busy bloggers) and promising stats. In the spirit of web 2.0-ness we have a Facebook group of 200 people, and we tweet a lot too! Because that is simply how it works these days.

In any case, go for those Sanük sandals! Post a comment on this article.

Isolde in Nanoland ep. 1
01/03/2010 (14:12)
3 views

I made the title sequence and animations for this programme.
For w24/Teleac/kennislink.

zeeburg, what else?
17/02/2010 (20:48)
1 views

Lovesow 2.0
22/01/2010 (15:02)
0 views

Warsow’s free community blog Lovesow.net got a small restyle:

Sneak preview of “storm in glas water”
08/01/2010 (18:34)
2 views

Source: www.youtube.com

in Zwolle…
28/12/2009 (14:38)
0 views

life thunders on 24 hours a day:

Future vision
22/12/2009 (14:51)
0 views

FPS, First Person Search. A view of the future