Research Archives • David Graus

“Fairness and Bias in Algorithmic Hiring: a Multidisciplinary Survey” preprint available

Together with some of the researchers in FINDHR we have authored and submitted an extensive survey on algorithmic hiring. The preprint is available here:

In this multidisciplinary work we bring together different perspectives from computer science, law, and practitioners to extensively survey literature and classify so-called “bias conducive factors,” i.e., factors that contribute to bias in the algorithmic hiring process. These factors span the complete hiring pipeline, and are classified into three main families: institutional biases, individual preferences, and technology blindspots.

In addition, our paper surveys bias measures (n=21) and bias mitigation strategies (n=12) that have been applied and studied specifically in the context of algorithmic hiring, which we present in unified notation.
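
To give a flavor of what a bias measure looks like in code, here is a minimal sketch of one classic example from hiring practice: the selection-rate ratio behind the US “four-fifths rule,” where a ratio below 0.8 is commonly read as a sign of adverse impact. This is an illustration only. It is not taken from the survey’s unified notation, and the function names and toy data are made up.

```python
from collections import Counter


def selection_rates(groups, selected):
    """Return the fraction of selected candidates per group."""
    totals = Counter(groups)
    hits = Counter(g for g, s in zip(groups, selected) if s)
    return {g: hits[g] / totals[g] for g in totals}


def impact_ratio(groups, selected):
    """Lowest group selection rate divided by the highest one."""
    rates = selection_rates(groups, selected)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no candidate was selected in any group")
    return min(rates.values()) / highest


# Toy data (hypothetical): four candidates in group "a", four in group "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
selected = [True, True, False, False, True, False, False, False]

ratio = impact_ratio(groups, selected)
print(f"impact ratio: {ratio:.2f}")
```

In this toy example group "a" is selected at a rate of 0.5 and group "b" at 0.25, so the ratio falls well below the 0.8 threshold.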

Finally, our survey lists datasets, summarizes the relevant legal landscape (w.r.t. regulations and non-discrimination provisions concerning algorithmic hiring in the EU and the US), and shows practical considerations and examples for bias mitigation in practice (which was my main contribution to this paper).

One of my main positive takeaways from this paper is the potential for algorithmic components to have a positive effect on an inherently biased and complex hiring process, i.e.:

One upshot of understanding bias as an inherently intersectional process is that it also offers a way to reduce discrimination. Since the factors that create bias are interrelated and mutually reinforcing, by halting or ameliorating one BCF, we may introduce positive feedback loops on other BCFs. By removing the discriminatory effect of any one factor, we can hope to reduce its influence on the other factors that reinforce each other in a discriminatory way.

All in all, I am very happy and proud to be listed in this monumental work, which surveys a highly complex field and offers both enough pointers to get started and useful recommendations for future work, grounded in (gaps in) extensive literature.

Participating in the AIMMES 2024 Workshop

On March 20 I am participating in the first Workshop on AI bias: Measurements, Mitigation, Explanation Strategies (AIMMES) in Amsterdam, as part of the AI Fairness Cluster Inaugural Conference (FINDHR is part of the AI Fairness Cluster). I am looking forward to this workshop and its strong program, which includes the following two accepted papers that I contributed to:

Alessandro Fabris, Nina Baranowska, Matthew J. Dennis, David Graus, Philipp Hacker, Jorge Saldivar, Frederik Zuiderveen Borgesius and Asia J. Biega: “Bias Conducive Factors in Algorithmic Hiring”
Adam Mehdi Arafan, David Graus, Fernando P. Santos and Emma Beauxis-Aussalet: “End-to-End Bias Mitigation in Candidate Recommender Systems with Fairness Gates” (Extended Abstract)

FINDHR CV Data Donation Campaign

🗣️ Please consider donating your (anonymized) CV to advance research into bias mitigation in algorithmic hiring!

With Randstad we are part of a consortium of research institutions (e.g., University of Amsterdam, Radboud Universiteit, Universitat Pompeu Fabra), civil society organizations (e.g., AlgorithmWatch), and companies (e.g., Adevinta) under the EU-funded FINDHR research project.

The FINDHR project aims to:
1️⃣ create new ways to measure algorithmic bias,
2️⃣ propose technical implementations for bias mitigation strategies, and
3️⃣ meaningfully incorporate human expertise
in algorithmic hiring systems (i.e., job/job seeker recommender systems).

To achieve these ambitious goals, the project requires real CVs and résumés. For that reason, FINDHR has initiated a CV donation campaign, where you’ll be able to donate your (anonymized) CV with just a few clicks. These donated CVs will be used to generate a dataset of realistic-but-fake synthetic CVs that will serve as the basis for studying bias and developing bias mitigation in job/job seeker recommender systems.

Your donated data will be safe: it is stored securely, can be deleted/withdrawn at any time upon request, and is only accessible to authorized persons in the FINDHR research project who are required to sign confidentiality agreements.

Please consider donating your CV to accelerate research into bias and bias mitigation strategies for algorithmic hiring systems! For more details, check the donation campaign’s FAQ (or ping me!).

Donate your CV with just a couple of clicks here: findhr.eu/datadonation!

“Transfer learning for multilingual vacancy text generation” preprint available

Anna Lőrincz’s UvA MSc data science thesis “Transfer learning for multilingual vacancy text generation” (which was graded a 9/10 💫) was recently accepted at the Second Version of Generation, Evaluation & Metrics (GEM) Workshop 2022, which will be held as part of EMNLP, December 7-11, 2022!

Get the pre-print here:

A. Lőrincz, D. Graus, D. Lavi, and J. L. M. Pereira, “Transfer learning for multilingual vacancy text generation,” in Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), Abu Dhabi, United Arab Emirates (Hybrid), 2022, pp. 207–222.
[Bibtex]

@inproceedings{lorincz2022transfer,
author = {L{\H{o}}rincz, Anna and Graus, David and Lavi, Dor and Pereira, Jo{\~a}o L. M.},
title = {Transfer learning for multilingual vacancy text generation},
booktitle = "Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates (Hybrid)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.gem-1.18",
doi = "10.18653/v1/2022.gem-1.18",
pages = "207--222",
abstract = "Writing job vacancies is a repetitive and expensive task for humans. This research focuses on automatically generating the benefit sections of vacancies at redacted from job attributes using mT5, the multilingual version of the state-of-the-art T5 transformer trained on general domains to generate texts in multiple languages. While transformers are accurate at generating coherent text, they are sometimes incorrect at including the structured data (the input) in the generated text. Including the input correctly is crucial for vacancy text generation; otherwise, the candidates may get misled. To evaluate how the model includes the input we developed our own domain-specific metrics (input generation accuracy). This was necessary, because Relation Generation, the pre-existing evaluation metric for data-to-text generation uses only string matching, which was not suitable for our dataset (due to the binary field). With the help of the new evaluation method we were able to measure how well the input is included in the generated text separately for different types of inputs (binary, categorical, numeric), offering another contribution to the field. Additionally, we also evaluated how accurate the mT5 model generates the text in the requested language. The results show that mT5 is very accurate at generating the text in the correct language, at including seen categorical inputs and binary values correctly in the generated text. However, mT5 performed worse when generating text from unseen city names or working with numeric inputs. Furthermore, we found that generating additional synthetic training data for the samples with numeric input can increase the input generation accuracy, however this only works when the numbers are integers and only cover a small range.",
}

In her work, Anna explores transformer models for data-to-text generation. More specifically: given structured inputs that represent the benefits listed in a vacancy, such as categorical features (e.g., location), real-valued features (e.g., salary or hours of work per week), or binary features (e.g., contract type), the task is to generate a natural language snippet that expresses said feature.

Anna finds that using transformers greatly increases (vocabulary) variation when compared to template-based models, and needs less human effort. The results were — to me — surprisingly good, another proof that transformers are taking over the world and making traditional NLP methods partly obsolete.

I was very much impressed with this work! But, to show that even transformers are not perfect yet, I present you with my favorite error from the paper:

input: LOCATION = Zwaag
output: Pal gelegen achter het centraal station Zwaaijdijk! (“Located right behind Zwaaijdijk central station!”)

Hope to catch you sometime in Zwaaijdijk!
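
To make the data-to-text setup a bit more tangible, here is a minimal sketch of how structured attributes could be flattened into a model input, plus a naive check of whether an input value shows up in the generated text. Both helpers and the `KEY = value | KEY = value` format are assumptions for illustration, not the paper’s implementation. The check is plain string matching, which the abstract notes is not sufficient for binary fields, so it is no substitute for the paper’s input generation accuracy metric.

```python
def linearize(attributes):
    """Flatten structured job attributes into one input string (assumed format)."""
    return " | ".join(f"{key} = {value}" for key, value in attributes.items())


def mentions_value(value, generated_text):
    """Case-insensitive check that the input value appears verbatim in the text."""
    return str(value).lower() in generated_text.lower()


attributes = {"LOCATION": "Zwaag"}
model_input = linearize(attributes)

# The generated snippet from the error example above.
output = "Pal gelegen achter het centraal station Zwaaijdijk!"
location_ok = mentions_value(attributes["LOCATION"], output)

print(model_input)
print("location included correctly:", location_ok)
```

For the example above the check fails, because the requested city name “Zwaag” never appears verbatim in the output.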

RecSys in HR 2022 Workshop Recording available

We have published the full recording of our RecSys in HR 2022 workshop, which we held September 22 in Seattle, WA, USA.

The video is 5h42m43s long, so to guide you, I provide you the following list of highlights (see the video description for timestamps that will allow you to instantly skip to the sections described below):

1️⃣ Our first keynote speaker, Robyn Rap, a data science leader at Indeed.com, talks in depth about the importance of collaboration between UX and data scientists in evaluating and developing search and recommendation systems. She provides a great (broad) overview of the challenges and differences of doing recsys in HR, compared to more common scenarios such as e-commerce or media. A great introduction to our deep field!

2️⃣ The panel, which includes Randstad’s Helen Hulsker, Carlos Castillo (ChaTo), Liangjie Hong (director of AI, engineering at LinkedIn) and the aforementioned Robyn Rap (still Indeed.com). The topics discussed by these experts: the role of HR Tech in the Global Labor Shortage, fair AI in Practice, multi-stakeholder development of HR Tech, and Regulation and Accountability.

3️⃣ Our second keynote speaker, Liangjie Hong, presents some of the foundational engineering work at LinkedIn that aims to serve many downstream AI applications. It revolves around a pipeline with (continuously updating) embedding representations for job seekers, jobs, and everything else, which are fused with LinkedIn’s (huge) Knowledge Graph.

4️⃣ There are also several interesting paper presentations, including a few from Indeed.com: “Model Threshold Optimization for Segmented Job-Jobseeker Recommendation System,” where the authors give a sneak peek into the overall recommendation setup at Indeed.com; “Flexible Job Classification with Zero-Shot Learning” by Thomas Lake, which shows how to use off-the-shelf transformer models for job classification; and “Beyond human-in-the-loop: scaling occupation taxonomy at Indeed,” where the authors show how they combine human intelligence with automation to scale taxonomies across languages and markets. Finally, there are some interesting and very pragmatic, hands-on papers on skill extraction, e.g., Mike Zhang’s “Skill Extraction from Job Postings using Weak Supervision” and Jens-Joris Decorte’s “Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction.”

Enjoy watching!

Three papers accepted at RecSys in HR 2022 Workshop

🎉 A little success to share: three of our former data science interns at the Data Science chapter at Randstad Groep Nederland have had papers based on their master’s theses accepted at our upcoming RecSys in HR Workshop, an academic workshop on AI in HR that is part of the ACM Conference on Recommender Systems (recommender systems being the AI systems used for matching, whether it is Netflix movies to users or, in our case, jobs to job seekers).

As always, the work of the students is pretty technical, but I will go ahead and try to provide little human-understandable summaries below.

Explainable Career Path Predictions using Neural Models

Roan Schellingerhout worked under supervision of Volodymyr Medentsiy on Explainable Career Path Prediction using Neural Networks, where he trained deep neural networks on our own talent work history data, to create a tool that can help consultants or talents to predict possible career switches, given as input a talent’s work history. The predictions are visually explained, in the sense that the underlying reasons for proposing a certain job are provided. Roan tested these visualizations on consultants, and found consultants generally like them.

End-to-End Bias Mitigation in Candidate Recommender Systems with Fairness Gates

A. M. Arafan, D. Graus, F. P. Santos, and E. Beauxis-Aussalet, “End-to-end bias mitigation in candidate recommender systems with fairness gates,” in RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, 2022.
[Bibtex]

@inproceedings{arafan2022end,
author = {Arafan, Adam Mehdi and Graus, David and Santos, Fernando P. and Beauxis-Aussalet, Emma},
title = {End-to-End Bias Mitigation in Candidate Recommender Systems with Fairness Gates},
year = {2022},
booktitle = {RecSys in HR’22: The 2\textsuperscript{nd} Workshop on Recommender Systems for Human Resources},
numpages = {8},
location = {Seattle, WA, USA and Online},
series = {CEUR Workshop Proceedings},
url = {https://ceur-ws.org/Vol-3218/RecSysHR2022-paper_6.pdf},
month={9}
}

Adam Arafan worked under my supervision on “End-to-End Bias Mitigation in Candidate Recommender Systems with Fairness Gates.” In his thesis he experimented with making the SmartMatch Talent Recommender fairer (at the level of gender), either by changing the “input” of the algorithm (for example, by balancing male and female candidates in the training data) or by changing its “output” (for example, for a given list of candidates, going through the list to make sure the top 10 has a 50/50 balance between male and female candidates). His work is novel because these types of “bias mitigation” strategies have been studied in isolation, but never together.
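
As a rough illustration of the “output” side, here is a minimal sketch of a greedy re-ranker that walks a ranked candidate list and fills the top k so that groups stay as balanced as possible, while preserving the original order within each group. This is not the method from the thesis. The function name, candidate IDs, and group labels are hypothetical.

```python
def balanced_top_k(ranked, k=10):
    """Greedily build a top-k list that keeps group counts as even as possible.

    `ranked` is a list of (candidate_id, group) pairs, best first.
    The original order within each group is preserved.
    """
    queues = {}
    for position, (candidate, group) in enumerate(ranked):
        queues.setdefault(group, []).append((position, candidate))

    counts = {group: 0 for group in queues}
    top = []
    while len(top) < k:
        available = [g for g in queues if queues[g]]
        if not available:
            break  # fewer than k candidates in total
        # The group with the fewest slots so far goes next;
        # ties go to the group whose next candidate ranked higher originally.
        group = min(available, key=lambda g: (counts[g], queues[g][0][0]))
        _, candidate = queues[group].pop(0)
        counts[group] += 1
        top.append((candidate, group))
    return top


# Toy ranking (hypothetical): male candidates dominate the top of the list.
ranked = [("c1", "m"), ("c2", "m"), ("c3", "m"),
          ("c4", "f"), ("c5", "m"), ("c6", "f")]
print(balanced_top_k(ranked, k=4))
```

If one group runs out of candidates, the remaining slots are simply filled from the other group, so a perfect 50/50 split is only reached when the list allows it.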

Automated Personnel Scheduling with Reinforcement Learning and Graph Neural Networks

B. Platten, M. Macfarlane, D. Graus, and S. Mesbah, “Automated personnel scheduling with reinforcement learning and graph neural networks,” in RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, 2022.
[Bibtex]

@inproceedings{platten2022automated,
author = {Platten, Benjamin and Macfarlane, Matthew and Graus, David and Mesbah, Sepideh},
title = {Automated Personnel Scheduling with Reinforcement Learning and Graph Neural Networks},
year = {2022},
booktitle = {RecSys in HR’22: The 2\textsuperscript{nd} Workshop on Recommender Systems for Human Resources},
numpages = {10},
location = {Seattle, WA, USA and Online},
url = {https://ceur-ws.org/Vol-3218/RecSysHR2022-paper_1.pdf},
series = {CEUR Workshop Proceedings},
month={9}
}

Ben Platten worked under supervision of Sepideh Mesbah on “Automated Personnel Scheduling with Reinforcement Learning and Graph Neural Networks,” in which he experimented with “reinforcement learning” (a specific machine learning paradigm), which in theory suits the challenging task of scheduling well. His experiments on a toy problem suggest that the method indeed works quite well.

See the full list of accepted papers here: https://recsyshr.aau.dk/accepted-papers/.

And stay tuned for the pre-prints, which I’ll share as soon as they’re available!

RecSys in HR at ACM RecSys 2022 in Seattle!

Fantastic news! We’ve received word that the 2nd edition of our “Recommender Systems for Human Resources” (RecSys in HR) Workshop has been accepted for inclusion in the ACM RecSys 2022 program, to be held in Seattle!

Last year’s (first) edition of our workshop was co-located with ACM RecSys 2021 in Amsterdam, and featured two keynotes, a panel, breakout sessions and 8 paper presentations. The recording, workshop proceedings, and a workshop report are available through our workshop’s website at: https://recsyshr2021.aau.dk/

Check back there soon for information on the 2022 edition we’re planning with Toine Bogers, Mesut Kaya, Francisco Gutiérrez, and newly joined co-organizers Sepideh Mesbah (Randstad Groep Nederland) and Chris Johnson (Indeed.com)!

Two papers accepted at the RecSys in HR Workshop!

Happy to have learned we have two papers accepted at the first Recommender Systems in Human Resources Workshop, co-located with ACM RecSys 2021! These papers are the first academic publications of the Data Science Chapter at Randstad Groep Nederland!

M. de Groot, J. Schutte, and D. Graus, “Job posting-enriched knowledge graph for skills-based matching,” in Workshop on Recommender Systems for Human Resources (RecSys in HR), Amsterdam, Netherlands, 2021.
[Bibtex]

@inproceedings{degroot2021job,
author = {de Groot, Maurits and Schutte, Jelle and Graus, David},
title = {Job Posting-Enriched Knowledge Graph for Skills-based Matching},
year = {2021},
booktitle = {Workshop on Recommender Systems for Human Resources (RecSys in HR)},
numpages = {9},
location = {Amsterdam, Netherlands},
address = {Amsterdam, Netherlands},
month={10},
series = {CEUR Workshop Proceedings},
url = {https://ceur-ws.org/Vol-2967/paper_3.pdf},
}

D. Lavi, V. Medentsiy, and D. Graus, “conSultantBERT: fine-tuned Siamese Sentence-BERT for matching jobs and job seekers,” in Workshop on Recommender Systems for Human Resources (RecSys in HR), Amsterdam, Netherlands, 2021.
[Bibtex]

@inproceedings{lavi2021consultantbert,
author = {Lavi, Dor and Medentsiy, Volodymyr and Graus, David},
title = {conSultantBERT: Fine-tuned Siamese Sentence-BERT for Matching Jobs and Job Seekers},
year = {2021},
numpages = {8},
booktitle = {Workshop on Recommender Systems for Human Resources (RecSys in HR)},
location = {Amsterdam, Netherlands},
address = {Amsterdam, Netherlands},
month={10},
series = {CEUR Workshop Proceedings},
url = {https://ceur-ws.org/Vol-2967/paper_8.pdf},
}

Curious to know what they’re about? I tweet better than I blog 👇

✌️ In "conSultantBERT: Fine-tuned Siamese Sentence-BERT for Matching Jobs and Job Seekers" we fine-tune a siamese SBERT model for matching resumes to vacancy texts using a high-quality dataset of over 270k resume-vacancy pairs labeled by our staffing consultants. It works ✅! pic.twitter.com/iZRzicZE9F
— David Graus (@dvdgrs) August 23, 2021
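
For readers unfamiliar with siamese models: the idea is that the same encoder embeds both texts, and the pair is then scored by comparing the two vectors, commonly with cosine similarity. The sketch below shows only that scoring step on made-up three-dimensional vectors. The encoder is omitted, and the vectors and vacancy names are hypothetical rather than taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; in practice these would come from the shared encoder.
resume_vec = [0.2, 0.8, 0.1]
vacancies = {
    "warehouse": [0.1, 0.9, 0.0],
    "accounting": [0.9, 0.1, 0.3],
}

# Rank vacancies for this resume, most similar first.
ranked_vacancies = sorted(
    vacancies,
    key=lambda name: cosine_similarity(resume_vec, vacancies[name]),
    reverse=True,
)
print(ranked_vacancies)
```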

Stay tuned for pre-prints! See the other accepted papers here.

Disclaimer: yes, I co-organize the workshop, but I was not involved in reviewing or decisions; we have a great (and independent) Program Committee for that!

Co-organizing “RecSys in HR” workshop at RecSys 2021!

We received news that our workshop proposal “RecSys in HR: Workshop on Recommender Systems for Human Resources” was accepted for inclusion in the 15th ACM Conference on Recommender Systems (RecSys 2021) program! That means we’ll be running a full-day workshop with (research and position) papers, keynotes, and a panel (all TBD) during the conference which will be held in Amsterdam, 27th September-1st October 2021.

We wrote this workshop proposal with Toine Bogers (Aalborg University), Mesut Kaya (Aalborg University), Katrien Verbert (KU Leuven) and Francisco Gutiérrez (KU Leuven), at the initiative/idea of Toine, who virtually approached me in RecSys 2020’s gather.town :-D. Toine and Mesut work on a large research project with Denmark’s largest online recruitment portal, JobIndex.

For now, check out our stunning stub page at https://recsyshr2021.aau.dk/ and stay tuned for updates!

“Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation” accepted at UMAP2020!

The paper we wrote with former FD teammates Feng Lu and Anca Dumitrache has been accepted for publication as a long paper at UMAP 2020, the 28th Conference on User Modeling, Adaptation and Personalization! (I fondly remember my last time at UMAP, in 2016 😏)

We have published a preprint of this paper; get it here, or from arXiv.

F. Lu, A. Dumitrache, and D. Graus, “Beyond optimizing for clicks: incorporating editorial values in news recommendation,” in Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, New York, NY, USA, 2020, pp. 145–153.
[Bibtex]

@inproceedings{lu2020beyond,
author = {Lu, Feng and Dumitrache, Anca and Graus, David},
title = {Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation},
year = {2020},
isbn = {9781450368612},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3340631.3394864},
doi = {10.1145/3340631.3394864},
booktitle = {Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization},
pages = {145--153},
numpages = {9},
keywords = {usefulness, news recommendation, editorial values},
location = {Genoa, Italy},
series = {UMAP ’20}
}

Update 08/05: Cool, @NickKivits mentioned our paper in his Villamedia column “Het idee van de filterbubbel kan in de prullenbak” (“The idea of the filter bubble can go in the bin”) and in his newsletter (with over 11k subscribers!).

I am particularly happy with this work because:

1️⃣ In our paper we show how you can align algorithm design across stakeholders (in this case: data scientists and journalists), by effectively modeling an editorial value (“dynamicness”) in the news recommender of Het Financieele Dagblad without losing accuracy.

2️⃣ We present (more) empirical evidence that recommender systems can offer users *more* diverse, serendipitous, and dynamic lists of articles than editorially curated lists, and hence can help in *avoiding*, not creating, filter bubbles!

3️⃣ It is the perfect wrap-up of our Google DNI-funded “SMART Journalism” project at FD Mediagroep (we wrote most of the paper in our spare time after the project ended).

See the video of the talk at UMAP 2020 below:

PodRecs: Workshop on Podcast Recommendations PC

I was invited to join the program committee of (the first) PodRecs: Workshop on Podcast Recommendations (to be held at RecSys’20).

Since our work on BNR SMART Radio, I am really interested in the space of audio, recommender systems, and information retrieval. Curious to see the submissions!

See the PodRecs call for papers, and check out the website, by clicking the image below.

“Improving automated segmentation of radio shows with audio embeddings”

Update (28/1/2020): Oberon’s thesis was accepted and will be published at the IEEE 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), to be held May 4-8 in Barcelona, Spain! The submission is co-authored with Klaus Lux and myself.

Oberon Berlage recently successfully defended his MSc. thesis (title above!) for the Data Science Master at University of Amsterdam, and graduated with a whopping 9!

He’s the first academic offspring of our AI Team @ FD Mediagroep, and worked on BNR SMART Radio’s segmenter. Oberon improved our text-based segmenter by adding audio embeddings, improving the F1 score by 32%!

His thesis is now online, check it out at: http://scriptiesonline.uba.uva.nl/document/673254

“The birth of collective memories” published in JASIST!

The journal paper “The birth of collective memories: Analyzing emerging entities in text streams” I wrote with Daan Odijk and Maarten de Rijke is now (finally) published at JASIST! It is published under OpenAccess/CC BY 4.0 and available in “early view” (published before it’s published) in the Wiley Online Library. Click on the image below to access it:

My PhD Thesis “Entities of Interest — Discovery in Digital Traces” is online!

My PhD thesis, Entities of Interest — Discovery in Digital Traces is now available for download. Click on the cover below to head to graus.nu/entities-of-interest and grab your electronic copy of the little booklet that took me 4+ years to write!

James Chen Best Student Paper Award at UMAP 2016

Our paper,

D. Graus, P. N. Bennett, R. W. White, and E. Horvitz, “Analyzing and predicting task reminders,” in Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, New York, NY, USA, 2016, pp. 7–15.
[Bibtex]

@inproceedings{graus2016analyzing,
author = {Graus, David and Bennett, Paul N. and White, Ryen W. and Horvitz, Eric},
title = {Analyzing and Predicting Task Reminders},
year = {2016},
isbn = {9781450343688},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2930238.2930239},
doi = {10.1145/2930238.2930239},
booktitle = {Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization},
pages = {7--15},
numpages = {9},
keywords = {prospective memory, reminders, log studies, intelligent assistant},
location = {Halifax, Nova Scotia, Canada},
series = {UMAP '16}
}

was awarded best student paper at UMAP 2016!