Generating Emotionally Relevant Musical Scores for Audio Stories

Steve Rubin, Maneesh Agrawala


Highly-produced audio stories often include musical scores that reflect the emotions of the speech. Yet, creating effective musical scores requires deep expertise in sound production and is time-consuming even for experts. We present a system and algorithm for re-sequencing music tracks to generate emotionally relevant music scores for audio stories. The user provides a speech track and music tracks and our system gathers emotion labels on the speech through hand-labeling, crowdsourcing, and automatic methods. We develop a constraint-based dynamic programming algorithm that uses these emotion labels to generate emotionally relevant musical scores. We demonstrate the effectiveness of our algorithm by generating 20 musical scores for audio stories and showing that crowd workers rank their overall quality significantly higher than stories without music.

Our algorithm re-sequences the beats (circles) of the input music (bottom row) to match the emotions of the speech (top row). Our algorithm inserts pauses in speech and music, and makes musical transitions that were not in the original music in order to meet these constraints.

Research Paper

PDF (3.1M)


Musical scores generated by our system


MP4 (31.9M)

Generating Emotionally Relevant Musical Scores for Audio Stories
UIST 2014, October 2014. pp. 439-448.