Improving automated segmentation of radio shows with audio embeddings published @ IEEE ICASSP2020

Oberon Berlage’s MSc thesis, “Improving automated segmentation of radio shows with audio embeddings”, which he wrote under my supervision during his internship at FD Mediagroep, was awarded a 9/10, on the condition that the work was publishable.

Turns out it was: the paper was recently accepted at IEEE ICASSP 2020 (the 45th International Conference on Acoustics, Speech, and Signal Processing) without any additional work or experiments (just a bit of shortening). But you already knew this… Oberon will be presenting this work in Barcelona, thanks to the generous support of UvA’s Information Studies program.

We have now published a preprint; read it below:

  • [PDF] [DOI] O. Berlage, K. Lux, and D. Graus, “Improving automated segmentation of radio shows with audio embeddings,” in ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 751-755.
    [Bibtex]
    @inproceedings{berlage2020improving,
    author={O. {Berlage} and K. {Lux} and D. {Graus}},
    booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    title={Improving Automated Segmentation of Radio Shows with Audio Embeddings},
    year={2020},
    pages={751-755},
    doi={10.1109/ICASSP40776.2020.9054315},
    url={https://doi.org/10.1109/ICASSP40776.2020.9054315}
    }

His work revolved around improving BNR SMART Radio’s text-based segmentation by incorporating audio signals in the form of audio embeddings. This turns out to improve on our text-based baseline by a whopping +32.3% in F1-measure!

Even better: an audio-only approach, trained on a smallish openly available dataset, outperforms our text-only baseline by 9.4%. This means the segmentation method can be used without the need for audio transcription, which could be a money-saver.

“Improving automated segmentation of radio shows with audio embeddings”

Update (28/1/2020): Oberon’s thesis was accepted and will be published at the IEEE 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), to be held May 4-8 in Barcelona, Spain! The submission is co-authored with Klaus Lux and myself.

Oberon Berlage recently successfully defended his MSc thesis (title above!) for the Data Science Master at the University of Amsterdam, and graduated with a whopping 9!

He’s the first academic offspring of our AI Team @ FD Mediagroep, and worked on BNR SMART Radio’s segmenter. Oberon improved our text-based segmenter by adding audio embeddings, improving the F1 score by 32%!

His thesis is now online; check it out at: http://scriptiesonline.uba.uva.nl/document/673254