Two papers accepted at AIOG 2026

Two papers I co-authored have been accepted at the AI & Open Government Workshop (AIOG 2026), co-located with ICAIL 2026 in Singapore on June 8.

Both come out of student work at the ICAI OpenGov Lab, and tackle different sides of the same broad problem: automating parts of the FOIA process that are currently bottlenecked by manual review.


Paper accepted at ICAIL 2026: LLMs for Generating Decision Models from Legal Text

🎉 Full paper accepted at the 21st International Conference on Artificial Intelligence and Law (ICAIL 2026): “From Legal Text to Executable Decision Models: Evaluating Structured Representations for Legal Decision Model Generation”

Get the preprint here:

  • [PDF] D. Graus, “From Legal Text to Executable Decision Models: Evaluating Structured Representations for Legal Decision Model Generation,” in Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL), Singapore, 2026.
    [Bibtex]
    @inproceedings{graus2026legaltext,
    author = {Graus, David},
    title = {From Legal Text to Executable Decision Models: Evaluating Structured Representations for Legal Decision Model Generation},
    booktitle = {Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL)},
    year = {2026},
    month = jun,
    address = {Singapore},
    publisher = {ACM},
    note = {To appear},
    }

This paper studies LLMs and input representations for generating executable decision models from legal text, using the Dutch Environment and Planning Portal (Digitaal Stelsel Omgevingswet, DSO) as a case study, as it pairs complex hand-crafted decision models with legal articles. There are two main findings!

1️⃣ We* find that providing input and output specifications (i.e., what variables a citizen must provide, and what the legal outcomes may be) improves LLM performance over raw legal text or text enriched with semantic role labels.

2️⃣ Interestingly, when evaluating generated models’ structural similarity to the ground truth (do the models look similar?) and outcome similarity (do they yield the same legal outcomes?), we find complementarity: generated models can score well on structure while missing legal nuance, or vice versa. 🤏 More interestingly, we find that some generated models are structurally very dissimilar, and often much smaller/simpler, while yielding 100% identical legal outcomes! See the attached pic for an example: the generated decision model yields 100% identical outcomes while completely ignoring 2 out of 4 input variables!
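To make the outcome-similarity idea concrete, here is a minimal sketch (hypothetical names, not the paper’s actual evaluation code): treat each decision model as a function from citizen-provided inputs to a legal outcome, enumerate all input combinations, and measure the fraction on which two models agree.

```python
from itertools import product

def outcome_similarity(model_a, model_b, input_domains):
    """Fraction of all input combinations on which two decision models
    (callables mapping an input dict to a legal outcome) agree."""
    variables = list(input_domains)
    agree = total = 0
    for values in product(*(input_domains[v] for v in variables)):
        case = dict(zip(variables, values))
        total += 1
        agree += model_a(case) == model_b(case)
    return agree / total

# Toy illustration: a generated model that ignores two of four input
# variables can still reach 100% outcome similarity, when those
# variables never change the legal outcome.
def ground_truth(x):
    return "permit required" if x["a"] or x["b"] else "no permit"

def generated(x):  # ignores x["c"] and x["d"] entirely
    return "no permit" if not (x["a"] or x["b"]) else "permit required"

domains = {v: [0, 1] for v in "abcd"}
print(outcome_similarity(ground_truth, generated, domains))  # 1.0
```

This is why structural and outcome similarity can diverge: agreement is measured on behavior, not on the shape of the model.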

🎁 We release the full dataset of 95 production-grade decision models and their associated legal text, for reproducibility and extension of this work. It really is a pretty cool dataset!

Stay tuned for the preprint! See you in 🇸🇬 Singapore, where we’ll also host the 1st AI & Open Government (AIOG) Workshop!

* When I say “we”, I actually mean “I”, but that sounds weird; this is a single-author paper. I do have to thank Anne Schuth and Damiaan Reijnaers for feedback, help, and inspiration!

In Quest on “what AI knows about you”

Great fun to appear in Quest like this! Starting from the question “what does AI know about you?”, you (I?) quickly arrive at all sorts of interesting Information Retrieval questions that the current generation of AI chatbots brings with it.

Such as (of course) Retrieval-Augmented Generation (RAG), the difference between parametric and non-parametric memory, and the fragility of AI around the position of relevant information (“Lost in the Middle”-style challenges). And how do you explain all those themes without using a single technical term? Well, like this, I believe!

Read the piece here: https://www.quest.nl/tech/technologie/a70392166/wat-weet-chatgpt-over-jou-zo-werkt-ai-met-jouw-gegevens/

New feature on zwaailicht.nu: news!

I automatically fetch local and online news and link it to grouped and summarized reports on zwaailicht.nu. The sources range from national outlets such as AD.nl, NU.nl, RTL Nieuws, and NOS, to local titles such as Noordhollands Dagblad and de Stentor, and friendly competitors such as Alarmeringen.nl.

News items now appear on every page that describes report spikes, and also on an overview page where I show the report spikes linked to each news item, including the time difference between news publication and P2000 report!

Take a look at zwaailicht.nu/nieuws!

Call for Papers: AI & Open Government Workshop at ICAIL 2026

We’re excited to announce the AI & Open Government Workshop (AIOG), co-located with ICAIL 2026 in Singapore on June 8, 2026. The call for papers is now open!

Together with Graham McDonald (University of Glasgow) and Jason R. Baron (University of Maryland), I’m co-organizing this workshop to bring together researchers and practitioners working at the intersection of AI and government transparency. As the research we do at the OpenGov Lab sits exactly at this intersection, we felt there was a need for a dedicated venue where these communities can meet: spanning information retrieval, legal AI, NLP, e-discovery, and open government practice.

Why this workshop?

Governments worldwide are grappling with the challenge of making public information accessible and transparent at scale. Whether it’s processing freedom of information requests, reviewing documents for sensitive content before publication, or making government archives searchable: AI has a growing role to play. At the same time, this raises important questions about reliability, fairness, and accountability.

AIOG aims to be a space where we can discuss these challenges across disciplinary boundaries. We’re particularly interested in bringing together people who don’t always meet at the same conferences: IR researchers, legal tech practitioners, government professionals, and transparency advocates.

Topics of interest

We welcome submissions on topics including, but not limited to:

  • FOIA and public records automation
  • AI-assisted sensitivity review and redaction
  • Large language models for government applications
  • Information retrieval in government archives
  • Transparency technologies and open government policies
  • Legal and ethical considerations in AI-enabled government transparency

Submission details

We accept two types of submissions:

  • Research papers: 3–9 pages (excluding references)
  • Position papers: 2–4 pages (excluding references)

All submissions should follow the ICAIL 2026 formatting guidelines. Papers may be under review elsewhere, and previously published work is welcome for presentation. Submissions are handled through OpenReview.

Important dates

  • Submission deadline: April 9, 2026
  • Notification of acceptance: May 1, 2026
  • Camera-ready deadline: May 20, 2026
  • Workshop: June 8, 2026

Note that our submission deadline falls after ICAIL’s main conference notification date (March 23), so if you have relevant work that didn’t make it into the main conference, we’d love to see it at AIOG.

Get involved

For more information, visit aiog.net or submit your work via OpenReview. Feel free to reach out to me at d.p.graus@uva.nl if you have any questions.

We hope to see you in Singapore! 🇸🇬

Vibe-code project: Zwaailicht.nu

Over the past few weekends I’ve been tinkering on a little hobby project together with Claude Code: zwaailicht.nu. Zwaailicht.nu combines an antenna on a Raspberry Pi in my utility closet with my IR and NLP interests: I pull real-time P2000 emergency-services reports in the Netherlands out of the air, and show them on an interactive map, sorted by distance (from your current location).

Those (public) P2000 messages are rather cryptic, full of abbreviations, vehicle codes, and capcodes, so I built a small pipeline alongside it for semantic enrichment, geocoding, burst detection, clustering, and LLM-based summaries that turn clusters of individual messages into something understandable. Besides clustering those “spikes” in reports, I store all enriched messages and let you filter them (and sub-filter: house fires, vehicles in the water, railway accidents, or hazardous materials), I keep a landing page per city tracking how many and what kind of reports come in, and of course there is a ranking.
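The burst-detection step can be illustrated with a naive sliding-window sketch (an assumption for illustration, not the actual zwaailicht.nu code): count how many reports fall within a time window, and flag windows that exceed a threshold.

```python
from datetime import datetime, timedelta

def detect_bursts(timestamps, window=timedelta(minutes=10), threshold=5):
    """Naive sliding-window burst detection: for each report, count the
    reports in the `window` ending at it; flag counts >= threshold."""
    ts = sorted(timestamps)
    bursts, i = [], 0
    for j, t in enumerate(ts):
        while ts[i] < t - window:  # advance the window's left edge
            i += 1
        if j - i + 1 >= threshold:
            bursts.append((ts[i], j - i + 1))
    return bursts

# Five reports within five minutes trigger a burst; a lone report
# 40 minutes later does not.
base = datetime(2026, 1, 1, 12, 0)
times = [base + timedelta(minutes=m) for m in (0, 1, 2, 3, 4, 40)]
print(detect_bursts(times))  # one burst of 5 reports starting at 12:00
```

A real deployment would also want per-region windows and deduplication, but the core idea is this simple.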

Vibe-coding with Claude Code is very impressive and entertaining (you think something up, you build it), but above all it is simply fun to bring a number of IR and NLP concepts (information extraction, document clustering, text generation) together in something that actually runs somewhere. And to answer a question I have honestly always been curious about myself.

🚨 Seen a flashing light? zwaailicht.nu!

In iBestuur on WooPush

iBestuur published a small article about the Open Data Conferentie where I gave my keynote on WooPush; see the snippet below:

The first keynote came from David Graus (ICAI OpenGov Lab). He showed that open data only has real impact when it connects to what citizens need. It is not just publishing open data that counts, but above all that information is available for reuse in an accessible way.

Graus sketched the contrast between public data infrastructures and commercial platforms such as TikTok or Spotify, which know their users through and through and actively recommend content. ‘The question is not: how do we make more data public? But: how do we make sure people find the information that is meaningful to them, without explicitly asking for it?’

The challenge, according to Graus, lies in data quality (so not: scanned documents, blacked-out PDFs, missing metadata, et cetera), accessibility (think: document types, combined PDFs, or decentralized sources), and the question: who is actually being reached?

That last one is the biggest challenge, according to Graus. ‘The scientific literature shows: open data empowers the empowered,’ he warns. ‘If we want open data to do justice to public values, we have to do more than just publish.’

🔗 Read the full article here: Nieuwe energie op Open Data Conferentie 2026

WooPush: technology of the attention economy, for informed citizenship

TikTok knows what you want to see. Why doesn’t the government know what you need to know?

Yesterday I gave a keynote at the Dutch Ministry of the Interior’s Open Data Conference, where I shared my latest brainchild, working title: WooPush (an obvious combination of my current OpenGov and prior RecSys work).

The motivation: the Dutch government publishes enormous amounts of open data. In WooGLe alone we now have 8.8 million documents, and counting. That’s great, but we also know this doesn’t reach citizens very well: “open data empowers the empowered”; it’s mostly researchers, journalists, and data diggers who know how to work with it.

The idea behind WooPush is simple: what if we used the technology that Big Tech employs to keep you glued to your screen (notifications, recommendation systems, personalization) to proactively inform citizens about government decisions that affect them? Not to maximize engagement, but to inform and activate people. The technology of the attention economy, for informed citizenship.

The technology exists, we have the data, and we have the citizens. Now we just need to figure out how to build this responsibly. That’s what I’ll be working on the coming months.

Slides here: https://www.slideshare.net/slideshow/open-data-conferentie-2026-van-open-data-naar-een-beter-geinformeerde-maatschappij/285465110

LLMs for Axial Coding: ECIR 2026 preprint available

Social scientists have been manually coding (labeling text segments to capture their essence, and clustering these labels into groups) large textual corpora for decades. Time that could have been spent doing research, annotating fire hydrants for Google Street View, or spending time with loved ones.

For this reason and more, in our latest paper we turn to LLMs for automated axial coding of lengthy transcripts (political debates). We extend an ensemble-based open coding pipeline with two axial coding (grouping) strategies: “traditional” clustering with subsequent LLM labeling, and direct LLM-based grouping.

We find a clear trade-off: traditional clustering methods achieve high coverage and structural separation, while direct LLM grouping produces more concise, interpretable labels that are more similar to human-assigned group labels, but with much lower coverage. Traditional clustering ensures broad representation; LLMs supply the interpretive layer that makes categories human-readable.
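The two-step “traditional” strategy (cluster code embeddings, then label the clusters) can be sketched roughly as follows; the greedy centroid clustering stands in for the density-based/partitioning algorithms, and the word-frequency heuristic is only a placeholder for the LLM labeling call (both are illustrative assumptions, not our pipeline):

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def greedy_cluster(codes, embeddings, threshold=0.8):
    """Greedily assign each code to the first cluster whose seed embedding
    it resembles; otherwise start a new cluster. A stand-in for proper
    density-based or partitioning clustering."""
    clusters = []  # list of (seed_embedding, member_codes)
    for code, emb in zip(codes, embeddings):
        for seed, members in clusters:
            if cosine(emb, seed) >= threshold:
                members.append(code)
                break
        else:
            clusters.append((emb, [code]))
    return [members for _, members in clusters]

def label_cluster(members):
    """Placeholder for the LLM labeling step: here we simply pick the
    most frequent word across the member codes."""
    words = Counter(w for m in members for w in m.lower().split())
    return words.most_common(1)[0][0]

codes = ["housing policy", "housing costs", "climate policy"]
embeddings = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0]]  # toy 2-d embeddings
groups = greedy_cluster(codes, embeddings)
print([(label_cluster(g), g) for g in groups])
```

In the paper, the clustering runs over real sentence embeddings of (code + utterance) pairs, and an LLM, not a word counter, names each group.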

Get the preprint here:

  • [PDF] A. Parfenova, D. Graus, and J. Pfeffer, “From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs,” in Proceedings of the European Conference on Information Retrieval (ECIR), Delft, The Netherlands, 2026.
    [Bibtex]
    @inproceedings{parfenova2026quotes,
    title = {From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs},
    author = {Parfenova, Angelina and Graus, David and Pfeffer, Juergen},
    booktitle = {European Conference on Information Retrieval (ECIR)},
    year = {2026},
    address = {Delft, The Netherlands},
    note = {To appear}
    }

This was work led by Angelina Parfenova. Our full dataset of 5k Dutch parliamentary debate utterances with LLM-assigned codes and categories is publicly available here: https://github.com/Likich/axial_coding_dataset.

In NEMO Kennislink on Europe and (the) AI (bubble)

Great fun to appear in this piece by NEMO Kennislink, in which I, entirely against my nature, serve as a nuanced voice of reason. Who would have thought 😅.

Because: should we now invest in AI like crazy, so as not to miss the boat? Uh, well. We are hitting the scaling plateau, the first empirical research trickling in seems to somewhat refute AI’s promised productivity gains, and it remains a bit suspicious that all those big tech companies keep shoving AI down everyone’s throat, including their own employees’. But an AI Factory, and better facilities and attention for AI research, seems like an excellent plan to me (😇). And I can’t really imagine working without Claude anymore either… Anyway, you get the idea.

My favorite passage (besides “Graus looks outside from his desk chair”, taken straight from life):

“In the labor market, where discrimination occurs structurally, AI systems can reinforce this problem, but also help counter it, provided they are well designed.”

We would rather do AI well than big and expensive!

Curious about my doubts, and whether I believe in AI or in the AI bubble? Read: Europa moet eigen positie claimen in de AI-race.

Thanks to Daan Appels for accurately recording my mild doubts, nuance, and caution about firm statements on the future of AI, alongside Michiel Bakker’s somewhat more optimistic view!

Quoted in Trouw article on mitigating harmful content in GenAI (Grok)

This weekend I was asked a few questions about the mitigation of harmful content in AI, in response to the news around Grok’s use for generating sexualized content: Grok produceert massaal en ongewild seksuele afbeeldingen. Kan dat ook anders? I was quoted:

In theory, AI bots like Grok can be constrained just fine, says assistant professor David Graus (University of Amsterdam). That starts during development. “You can steer and train the bot up front: ‘if I ask this kind of question, I want to see this kind of answer.’” And once the bot is already in use, as with Grok? Then a company can always add filters after the fact.

“For example, a ban on the word combination ‘undress’ and ‘this photo’, to name something.” You can even do that filtering with artificial intelligence again, says Graus. “That generally works quite well.”

What didn’t make the cut due to space constraints, was my full answer, stating three layers of harmful content mitigation:

  1. Data: curate your data to reduce harmful content
  2. Training/fine-tuning: instruct your model to exhibit (un)desired behavior
  3. Output: filtering with heuristics (words) or using an LLM-as-judge to estimate the likelihood that a prompt will yield harmful content

Each of these methods works; they have different costs and are likely all applied to some extent, but none is an end-all solution.
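The third layer, heuristic output filtering, can be illustrated with a toy sketch; the blocked combination mirrors the example from the quote, and this is of course not any production system:

```python
# Illustrative layer-3 filter: block prompts containing harmful word
# combinations (a heuristic sketch for the blog, nothing more).
BLOCKED_COMBINATIONS = [
    ("undress", "this photo"),
]

def heuristic_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked."""
    text = prompt.lower()
    return any(all(term in text for term in combo)
               for combo in BLOCKED_COMBINATIONS)

print(heuristic_filter("please undress the person in this photo"))  # True
print(heuristic_filter("what is in this photo?"))                   # False
```

The LLM-as-judge variant would replace the keyword check with a classifier call scoring the prompt’s likelihood of yielding harmful output.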

AI & Open Government Workshop at ICAIL 2026

Great news: our workshop proposal has been accepted at ICAIL 2026 in Singapore (June 8-12)!

Together with Graham McDonald (University of Glasgow) and Jason R. Baron (University of Maryland), we’ll be bringing together researchers, practitioners, and policymakers to explore how AI can support government transparency and public access to information.

We’re planning a full day with keynotes, paper presentations, breakout sessions, and plenty of structured dialogue. We’re soliciting both research papers and position papers offering insights from practice. Details on submission deadlines and the program committee will follow soon.

Check out aiog.net for more info, and follow @aiog.net on Bluesky for updates!

LLMs for Axial Coding paper accepted at ECIR2026!

Super happy with the acceptance of our ECIR 2026 Findings paper: “From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs”

In this work we apply fine-tuned LLMs for qualitative data analysis (axial coding) of Dutch political debates. This is work led by Angelina Parfenova (Technical University of Munich), who I met at last year’s ECIR in Lucca 🇮🇹. Love it when conference chats turn into papers!

From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs

Abstract. We introduce the first method to perform axial coding using large language models (LLMs), transforming raw debate transcripts into concise, hierarchical categories. Axial coding is a qualitative analysis technique that organizes codes (labels) representing text into broader categories, enhancing document understanding and analysis. Starting with sentence-level labels (open codes) generated by an LLM ensemble with a moderator, we introduce an axial coding step that groups these codes into higher-order categories. We compare two strategies: (i) clustering (code + utterance) embeddings using density-based and partitioning algorithms followed by LLM labeling, and (ii) direct LLM-based grouping of codes and utterances into categories. We apply our method to Dutch parliamentary debates, converting lengthy transcripts into compact, hierarchically structured codes and categories. We evaluate our method using extrinsic metrics aligned with human-assigned topic labels (ROUGE-L, cosine, BERTScore), and intrinsic metrics describing code groups (coverage, brevity, coherence, novelty, JSD divergence). Our results reveal a trade-off: density-based clustering achieves high coverage and strong cluster alignment, while direct LLM grouping results in higher fine-grained alignment, but lower coverage (∼20%). Overall, clustering maximizes coverage and structural separation, whereas LLM grouping produces more concise, interpretable, and semantically aligned categories. To support future research, we publicly release the full dataset of utterances and codes, enabling reproducibility and comparative studies.

This is also my first full (and OpenGov-themed) co-authored paper since rejoining academia and running the OpenGov Lab!

Stay tuned for the preprint.

OpenGov Team in “Terminal Woo” hackathon

Together with my PhD students Damiaan Reijnaers and Maik Larooij, and Jos Zuijderwijk, our muscle-for-hire from Utrecht University, the OpenGov team participated in the Terminal Woo hackathon. This hackathon focused on the government’s side of the FOIA process, and aimed to produce tools that help civil servants respond to FOIA requests better and faster.

Maik Larooij, Damiaan Reijnaers, Jos Zuijderwijk (UU) and myself at work

We didn’t win, but we did manage to push 80s IR methods (pseudo-relevance feedback) packaged in shiny LLMs and presented through a fancy interface to the hackathon community. Read the blog by Maik on opengov.nl for all details and more pictures!
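For the curious: the ’80s IR method in question, pseudo-relevance feedback, boils down to assuming the top-ranked results are relevant and expanding the query with their salient terms. A minimal Rocchio-style sketch (hypothetical names and weights, not our hackathon code):

```python
from collections import Counter

def rocchio_expand(query_terms, top_docs, alpha=1.0, beta=0.75, k=3):
    """Pseudo-relevance feedback: treat `top_docs` (the initially
    top-ranked documents) as relevant, and boost the query with their
    k most frequent terms, weighted by beta."""
    weights = Counter({t: alpha for t in query_terms})
    feedback = Counter()
    for doc in top_docs:
        feedback.update(doc.lower().split())
    for term, freq in feedback.most_common(k):
        weights[term] += beta * freq / len(top_docs)
    return dict(weights)

# The expanded query now also weights "besluit", a term frequent in the
# (presumed-relevant) top documents.
print(rocchio_expand(["woo", "verzoek"],
                     ["woo besluit besluit", "besluit document woo"], k=1))
```

Wrap the expansion terms in an LLM prompt instead of a bag of words, and you get roughly what we demoed.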


opengov.nl is live

We have published our ICAI OpenGov Lab website at opengov.nl! It is a bit bare at the moment, containing some information on our team, projects, and some news items that were shared on socials. But as our research progresses, expect more, including papers and resources!

Website design (together with the logo and visual identity) by Rutger de Vries. The website is running on mkdocs-material, for easy posting by committing markdown files to our GitHub repository.