Another academic year, another (short)list of potential projects. Are you a final-year AI or data science student, interested in doing an internship with us? Reach out! First, read below why you would want to join us, and scroll further down for the following project descriptions:
- Job Description Generation (NLP)
- Conversational/QA approaches for resume information extraction (NLP)
- Segmentation of resumes (NLP, CV)
- Synthetic data for bias mitigation in recommender systems
- Career pathing
- Algorithmic planning
- Open project: IR, RecSys, NLP, or fair AI
Work with impact. At Randstad Groep Nederland IT you keep the country moving, enabling people across sectors to do their work, getting pizza on your table and your suitcase on the plane. Your AI solutions mean tomorrow’s recruiter is smarter and faster but still embodies our human forward approach, combining tech with a personal touch and putting people first, including you. You will be constantly experimenting, working on new NLP use cases and matching systems, or expanding our self-service data platform. If you bring the idea, we will provide the freedom to explore, so you can help us shape the world of work.
When you join us at Randstad, you’ll join the data science chapter with over a dozen data scientists who work on a variety of projects, from recommender systems to knowledge graphs and time series forecasting. We have a strong academic network: we publish papers and collaborate with researchers. Our chapter has PhD-level researchers and experienced industry data scientists.
We are headquartered in Diemen and will gladly welcome you to our office, but in the current times fully remote internships can be discussed.
We are looking for autonomous, creative, and independent students. We expect you to be proficient in Python and machine learning. Experience with AWS is a plus, but not a requirement (you’ll get the hang of it). We like publishing papers based on master’s theses and have experience doing so, so if you would like to publish your thesis, we’d be happy to support you!
1. Job Description Generation (NLP)
At Randstad, we have a rich dataset for mapping structured and unstructured data. To find the right candidate for a job, jobs are represented with structured “job request” descriptive metadata (e.g., company name, salary, company size, company sector, location, job title, etc.), and we sometimes (but not always) have a semi-structured textual job description (e.g., natural language, but sub-sectioned into parts such as job title, job description, candidate requirements, company description, perks).
To help our recruiters in writing job descriptions, we aim to apply deep encoder-decoder or transformer models to map from one to the other, e.g., given structured data, generate a vacancy text. This task is only a suggestion; the core asset in this project is our parallel corpus of structured data and unstructured text, which allows data-to-text generation. Additional directions of interest include personalized vacancy generation (by incorporating structured candidate data), or mapping the other way around: training structured information extractors for vacancy data.
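As a minimal sketch of how such a parallel corpus could be prepared for an encoder-decoder model, the snippet below linearizes a structured job request into a single source string and pairs it with its vacancy text. The field names, the `<field>` tag format, and the example record are assumptions made for illustration only; they do not describe Randstad's actual schema or pipeline.

```python
from typing import Mapping, Optional


def linearize_job_request(record: Mapping[str, Optional[object]]) -> str:
    """Flatten a structured job request into one source string.

    Empty fields are skipped, since structured records may be incomplete.
    The "<field> value" tag format is an illustrative choice, not a standard.
    """
    parts = []
    for field, value in record.items():
        if value is None or value == "":
            continue
        parts.append(f"<{field}> {value}")
    return " ".join(parts)


def make_training_pair(record: Mapping[str, Optional[object]], description: str) -> dict:
    """Pair the linearized record (source) with the vacancy text (target)."""
    return {
        "source": linearize_job_request(record),
        "target": description.strip(),
    }


# Hypothetical record: field names and values are invented for this example.
example_record = {
    "job_title": "Warehouse employee",
    "location": "Diemen",
    "company_sector": "Logistics",
    "salary": None,
}
pair = make_training_pair(example_record, " We are looking for a warehouse employee. ")
print(pair["source"])
```

Pairs in this form can be fed to any sequence-to-sequence model; swapping source and target gives training data for the reverse direction (extracting structured fields from vacancy text).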
- Preksha Nema, Shreyas Shetty, Parag Jain, Anirban Laha, Karthik Sankaranarayanan, & Mitesh M. Khapra. (2018). Generating Descriptions from Structured Data Using a Bifocal Attention Mechanism and Gated Orthogonalization.
- Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, & Zhifang Sui. (2017). Table-to-text Generation by Structure-aware Seq2seq Learning.
- Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, & Dipanjan Das. (2019). Text Generation with Exemplar-based Adaptive Decoding.
- Sam Wiseman, Stuart Shieber, & Alexander Rush. (2017). Challenges in Data-to-Document Generation.
- Sam Wiseman, Stuart Shieber, & Alexander Rush. (2018). Learning Neural Templates for Text Generation.
2. Conversational/QA approaches for resume information extraction (NLP)
At its core (and with a bit of creativity), the job matching process can be seen as a vacancy asking questions to a resume: I am looking for… Having access to a large number of unstructured (parsed) resumes and having worked with deep embedding models and transformers such as SBERT, we are curious about unleashing the power of Question Answering over resumes.
After some experiments with pre-trained transformer models, we are confident that fine-tuning can yield powerful conversational/QA applications for accessing information from resumes. On the one hand, we have structured candidate descriptive metadata (such as name, education history, work experience, location), which can be modeled as questions and answers. On the other hand, we have unstructured resumes, which can be modeled as contexts for answer retrieval. Together, these give us all the core ingredients for learning extractors.
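To make the framing concrete, here is a small sketch of how structured metadata and resume text might be turned into extractive QA training examples in a SQuAD-like layout (question, context, answer text, answer start offset). The question templates, field names, and sample resume are hypothetical, and exact string matching is a simplifying assumption: real resumes would likely need fuzzy alignment between metadata values and text.

```python
from typing import Dict, List

# Hypothetical question templates, one per structured metadata field.
QUESTION_TEMPLATES = {
    "name": "What is the candidate's name?",
    "location": "Where does the candidate live?",
    "education": "What education did the candidate complete?",
}


def build_qa_examples(resume_text: str, metadata: Dict[str, str]) -> List[dict]:
    """Create SQuAD-style examples from one resume and its metadata.

    A field is used only if its value occurs verbatim in the resume,
    because extractive QA needs a character span inside the context.
    """
    examples = []
    for field, answer in metadata.items():
        question = QUESTION_TEMPLATES.get(field)
        if question is None or not answer:
            continue
        start = resume_text.find(answer)
        if start == -1:
            continue  # value not found literally in the text: skip it
        examples.append(
            {
                "question": question,
                "context": resume_text,
                "answers": {"text": [answer], "answer_start": [start]},
            }
        )
    return examples


# Invented sample data for illustration.
resume = "Jane Doe\nAmsterdam\nMSc Artificial Intelligence, 2019"
metadata = {"name": "Jane Doe", "location": "Amsterdam", "education": "BSc Physics"}
examples = build_qa_examples(resume, metadata)
print(len(examples))
```

In this sample the `education` value does not appear in the resume text, so only two examples are produced; how often that happens in practice is itself a useful measure of how well the structured and unstructured data align.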
Additional directions of interest include automated anonymization of resumes, and other subtasks of structured information extraction (e.g., PII removal). This project will involve leveraging the mapping between the unstructured and structured data sets.
- Caiming Xiong, Stephen Merity, & Richard Socher. (2016). Dynamic Memory Networks for Visual and Textual Question Answering.
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, & Peter J. Liu. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
- Soni, K. (2020). Evaluation of Dataset Selection for Pre-Training and Fine-Tuning Transformer Language Models for Clinical Question Answering. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 5532–5538). European Language Resources Association.
3. Segmentation of resumes (NLP, CV)
To reliably extract machine-readable information from resumes, we need methods for segmenting resumes into different sub-parts (e.g., professional experience vs. education vs. general information). One way to approach this would be text segmentation, which is challenging because the parsing process is noisy: resumes are user-generated and follow no standardized format, so content appears all over the place. Another direction could be visual segmentation/block identification for resumes. We’ll give you as many resumes as you can chew on, and are looking for students interested in recent advances around deep learning (see, e.g., Google’s “Extracting Structured Data from Templatic Documents”) and computer vision, to develop clever algorithms for segmenting resumes into different parts.
- Bodhisattwa Majumder, Navneet Potti, Sandeep Tata, James B. Wendt, Qi Zhao, & Marc Najork (2020). Representation Learning for Information Extraction from Form-like Documents. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020) (pp. 6495-6504).
- Liu, H. (2019). Graph Convolution for Multimodal Information Extraction from Visually Rich Documents. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers) (pp. 32–39). Association for Computational Linguistics.
- Koshorek, J. (2018). Text Segmentation as a Supervised Learning Task. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (pp. 469–473). Association for Computational Linguistics.
4. Synthetic data for bias mitigation in recommender systems
Together with prof. Emma Beauxis-Aussalet at the Civic AI Lab/VU Amsterdam we experimented with synthetic data generation for bias mitigation in our recommender system. We focused on the applicability of off-the-shelf models for generating realistic but synthetic candidate profiles, by analyzing the utility and privacy-related aspects of the data generated by such models. We found current models to be insufficient for our complex (raw) data, which may be noisy and is high dimensional and of a heterogeneous nature, with many different feature types (e.g., embeddings, categorical features, binary features and real-valued features).
For this follow-up project, we would like to further explore the domain of synthetic candidate data, e.g., by learning under which constraints we can obtain a model for generating realistic but synthetic candidate data. This could mean simplifying or restricting the input data (number and types of features), or exploring additional or more complex synthetic data generation models (e.g., GANs). Other options are to study the applicability of these less-than-perfect models for generating new training data with different distributions, or to explore the feasibility of publishing said models (e.g., by further studying the privacy-related concerns).
Depending on the direction you choose, we can go more towards algorithmic challenges (more complex, deeper models), multi-stakeholder aspects (what are the constraints for publishing models), or application-oriented challenges (how can we leverage synthetic data for adjusting models).
- Hittmeir, M., Ekelhart, A., & Mayer, R. (2019). On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks. In Proceedings of the 14th International Conference on Availability, Reliability and Security. Association for Computing Machinery.
- Woo, M.J., Reiter, J., Oganian, A., & Karr, A. (2009). Global Measures of Data Utility for Microdata Masked for Disclosure Limitation. Journal of Privacy and Confidentiality, 1(1).
- Dankar, F., & Ibrahim, M. (2021). Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation. Applied Sciences, 11(5).
- Aman Gupta, Deepak Bhatt, & Anubha Pandey. (2021). Transitioning from Real to Synthetic data: Quantifying the bias in model.
5. Career pathing
Part of Randstad’s commitment is to help our talents grow. We have rich longitudinal data from job seekers and the labor market, i.e., talents who have been with us for months or years, with extensive work experience going back years and growing over time from one role to the next. By aggregating all these career paths, interesting patterns may emerge.
We have rich ways to represent both our candidates and jobs, through descriptive metadata such as skills (required or provided), sector information, and salary. This project revolves around unleashing the rich metadata and longitudinal data points of career paths to develop models and tools that help our candidates find the next step in their career. Do we need temporal graphs, or deep LSTMs to model career paths? You tell us! Potential projects include predicting the most common next role given a current role, finding skills gaps, or building tools or MVPs that help job seekers plan their careers in steps, e.g., by including transition probabilities between roles, or shortest path-finding.
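As a simple baseline for the "most common next role" idea, transition probabilities between roles can be estimated by counting consecutive role pairs across career paths (a first-order Markov assumption, which is our simplification and ignores tenure, skills, and timing). The role names and paths below are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def transition_probabilities(paths: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next role | current role) from ordered career paths."""
    counts = defaultdict(Counter)
    for path in paths:
        # Count each consecutive (current, next) pair in the path.
        for current_role, next_role in zip(path, path[1:]):
            counts[current_role][next_role] += 1

    probs = {}
    for role, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[role] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


def most_likely_next_role(probs: Dict[str, Dict[str, float]], role: str) -> Optional[str]:
    """Return the most probable next role, or None for unseen roles."""
    if role not in probs:
        return None
    return max(probs[role], key=probs[role].get)


# Invented career paths, ordered from earliest to latest role.
paths = [
    ["order picker", "forklift driver", "team lead"],
    ["order picker", "forklift driver", "planner"],
    ["order picker", "team lead"],
    ["forklift driver", "team lead"],
]
probs = transition_probabilities(paths)
print(most_likely_next_role(probs, "order picker"))
```

The resulting probabilities can also serve as edge weights in a role graph, for example using negative log probabilities as costs for shortest-path search between a current and a target role.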
- Kokkodis, Marios and Ipeirotis, Panagiotis G., Demand-Aware Career Path Recommendations: A Reinforcement Learning Approach (January 6, 2020). Management Science (forthcoming)
- Richard J. Oentaryo, Xavier Jayaraj Siddarth Ashok, Ee-Peng Lim, & Philips Kokoh Prasetyo. (2018). JobComposer: Career Path Optimization via Multicriteria Utility Learning.
- Meng, Q., Zhu, H., Xiao, K., Zhang, L., & Xiong, H. (2019). A Hierarchical Career-Path-Aware Neural Network for Job Mobility Prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 14–24). Association for Computing Machinery.
6. Algorithmic planning
Forget ancient board games, and solve real-world problems for real people: AlphaGo, but for automated planning! At Randstad we schedule over 63,000 employees into over 170,000 shifts every week. Our finalized schedules are the end-state of a complex planning game, where we map our pool of employees into shifts, subject to multiple complex constraints and rules at the employee, shift, and legal levels.
For this project, we would like to explore the potential for applying reinforcement learning methods to learn how to plan, leveraging our rich set of historic plannings and the attributes and data variables associated with our employees and shifts. We’re looking for students who are interested in (deep) reinforcement learning and want to explore how approaches that have proved successful in chess, Go, or other games can be applied to our planning challenge.
- Eysenbach, B., Salakhutdinov, R., & Levine, S. (2019). Search on the Replay Buffer: Bridging Planning and Reinforcement Learning. In Advances in Neural Information Processing Systems. Curran Associates, Inc.
- Václavík, R., Šůcha, P., & Hanzálek, Z. (2016). Roster evaluation based on classifiers for the nurse rostering problem. Journal of Heuristics, 22(5), 667–697.
- Ziyi Chen, Patrick De Causmaecker, & Yajie Dou. (2020). Neural Networked Assisted Tree Search for the Personnel Rostering Problem.
7. Open project: IR, RecSys, NLP, or fair AI
We are interested in hearing your ideas in the areas of information retrieval and recommender systems, natural language processing, knowledge graphs, and bias and fairness.
As the world’s largest HR service provider we have lots of rich data on job seekers and jobs. See some of our other project proposals to get an indication of the data we have and the projects we do. In summary:
- We have industry-scale recommender systems for matching candidates to jobs
- We have structured and unstructured data to represent candidates (structured database entries + unstructured resumes) and jobs (structured “job requests” metadata and unstructured job descriptions)
- We are working on skills and occupation knowledge graphs and deep representations of jobs and candidates
- We have rich labeled data such as interactions between job seekers and jobs, and between recruiters and candidates
- We have lots of historic placement data going back years
- We are interested in fair ranking and algorithmic bias in search and recommendations
In this open project, we look forward to receiving your ideas. We’d also be happy to have a dialog to come up with ideas together.