SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data

Visit Github

Abstract:

We present SONIQUE, a model for generating background music tailored to video content. Unlike traditional video-to-music generation approaches, which rely heavily on paired audio-visual datasets, SONIQUE leverages unpaired data, combining royalty-free music and independent video sources. By utilizing large language models (LLMs) for video understanding and converting visual descriptions into musical tags, alongside a U-Net-based conditional diffusion model, SONIQUE enables customizable music generation. Users can control specific aspects of the music, such as instruments, genres, tempo, and melodies, ensuring the generated output fits their creative vision.

Generated by SONIQUE

Output from V2Meow

Video to Music Generation (automated without tuning):

Demo 1: Transition Scene from Breaking Bad

Demo 2: Car Chase Scene from Movie Infinite(2021)

Demo 3: Transition Scene from Friends

Demo 4: Landscape Scene from Vloggers Lei and Josh

Demo 5: Vlog Scene from Juan Marcel & Rhylan

Demo 6: Cartoon Scene from Zootopia

Video to Music Generation (with tuning)

Demo 1: Transition Scene from Breaking Bad

Original

Tuned

Original tags: Atmospheric, Bass, Moderate tempo, Nighttime, Orchestral, Strings, 80 BPM
Added Tags: Piano, Ambient
Negative Prompt: N/A

Demo 2: Campaign Scene from Balenciaga 2022

Original

Tuned

Original tags: Ambient, Introspective, Melancholic, Minor Key, Nocturne, Piano Ballad, Slow, Solo Piano, (60-80BPM)
Added Tag:140 BPM
Negative Prompt: Slow, (60-80BPM)