Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Control

Demo Page

Introduction

Pretraining on large-scale noisy data followed by fine-tuning on the target domain has proven highly successful in numerous tasks. In light of this, our objective is to construct a cutting-edge large language model for symbolic music by leveraging the extensive symbolic music dataset available from the MuseScore forum.

Content

  1. Model Summary
  2. Best Examples
  3. Examples in Unconditioned Generation
  4. Examples in Music Generation Conditioned on “Genre”
  5. Examples in Music Generation Conditioned on “Instrument”
  6. Examples in Music Generation Conditioned on “Genre” and “Instrument”

Model Summary

We set the temporal resolution to 12 and the maximum sequence length to 1024 tokens.

| Model | Tag Control | Instrument Control | Total Number of Parameters | Number of Training Samples |
|---|---|---|---|---|
| Unconditioned/Pretrained | | | 87.15K | 1.3M |
| Genre Conditioned | ✓ | | 87.18K | 158K |
| Instrument Conditioned | | ✓ | 87.27K | 739K |
| Genre-Instrument Conditioned | ✓ | ✓ | 87.28K | 158K |

Best Examples

Here we present some of the best examples generated by our model, with one single-track and one multi-track example for each condition.

Unconditioned Generation

1. Single Track:

2. MultiTrack:

Genre Conditioned Generation

1. Single Track: Genre: Classical

2. MultiTrack: Genre: Soundtrack

Instrument Conditioned Generation

1. Single Track: Instrument: Piano

2. MultiTrack: Instrument: Piano, Flute

Genre-Instrument Conditioned Generation

1. Single Track: Instrument: Violin, Genre: Folk

2. MultiTrack: Instrument: Harpsichord, Flute, Genre: Classical


More Examples

Unconditioned Generation

Settings: Only a ‘start-of-song’ event is provided to the model. The model then generates the instrument list, followed by the note sequence, and ends with an ‘end-of-song’ event.
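As a rough illustration of this prompt (the event names follow the description above, but the event-to-ID mapping is a hypothetical sketch, not the project's actual tokenizer):

```python
# Hypothetical sketch of the unconditioned generation prompt.
# The event-to-ID mapping is illustrative, not the project's tokenizer.
VOCAB = {"start-of-song": 0, "start-of-notes": 3, "end-of-song": 4}

def build_unconditioned_prompt():
    # Only 'start-of-song' is given; the model generates the instrument
    # list, the note sequence, and the final 'end-of-song' event itself.
    return [VOCAB["start-of-song"]]

print(build_unconditioned_prompt())  # [0]
```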


Genre Conditioned Generation

Settings: The model is given a prompt that begins with a ‘start-of-song’ event, followed by a ‘start-of-tags’ marker and the genre tag(s) of the piece. Finally, a ‘start-of-notes’ marker indicates the beginning of the sequence of note and instrument events.
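A minimal sketch of this genre-conditioned prompt layout (event and tag IDs are illustrative assumptions, not the project's actual vocabulary):

```python
# Hypothetical sketch of the genre-conditioned prompt described above.
# Event and tag IDs are illustrative, not the project's actual vocabulary.
EVENTS = {"start-of-song": 0, "start-of-tags": 2, "start-of-notes": 3}
GENRE_TAGS = {"classical": 10, "folk": 11, "soundtrack": 12}

def build_genre_prompt(genres):
    tokens = [EVENTS["start-of-song"], EVENTS["start-of-tags"]]
    tokens += [GENRE_TAGS[g] for g in genres]  # genre tag(s) of the piece
    tokens.append(EVENTS["start-of-notes"])    # note sequence starts here
    return tokens

print(build_genre_prompt(["classical"]))  # [0, 2, 10, 3]
```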

Genre: classical
Genre: religious music
Genre: soundtrack
Genre: folk
Genre: world music

Instrument Conditioned Generation

Settings: The model is given a prompt that begins with a ‘start-of-song’ event, followed by a ‘start-of-program’ marker and the instrument programs used in the piece. Finally, a ‘start-of-notes’ marker indicates the beginning of the sequence of note and instrument events.
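A minimal sketch of this instrument-conditioned prompt layout (event IDs and the program-token offset are illustrative assumptions; the program numbers themselves follow General MIDI):

```python
# Hypothetical sketch of the instrument-conditioned prompt described above.
# Event IDs and the program-token offset are illustrative assumptions.
EVENTS = {"start-of-song": 0, "start-of-program": 1, "start-of-notes": 3}
PROGRAM_OFFSET = 5  # program tokens placed after the event tokens

def build_instrument_prompt(programs):
    tokens = [EVENTS["start-of-song"], EVENTS["start-of-program"]]
    tokens += [PROGRAM_OFFSET + p for p in programs]  # program list
    tokens.append(EVENTS["start-of-notes"])           # notes start here
    return tokens

# General MIDI programs: 0 = piano, 73 = flute, 42 = cello
print(build_instrument_prompt([0, 73, 42]))  # [0, 1, 5, 78, 47, 3]
```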

Instrument: piano, flute, cello
Instrument: voices, piano
Instrument: piano, trumpet, harmonica, guitar, bass and violin
Instrument: recorder
Instrument: church-organ, trombone, tuba, horn

Genre-Instrument Conditioned Generation

Settings: In this setting, there are five event types: ‘start-of-song’, ‘start-of-program’, ‘start-of-tags’, ‘start-of-notes’, and ‘end-of-song’. The ‘start-of-program’ event marks the beginning of the program list, the ‘start-of-tags’ event marks the beginning of the tag list, and the ‘start-of-notes’ event marks the beginning of the sequence of music notes.
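The five event types above can be sketched in one combined prompt builder (all IDs and offsets are illustrative assumptions, not the project's actual vocabulary):

```python
# Hypothetical sketch combining both conditions, using the five event
# types described above. IDs and offsets are illustrative assumptions.
EVENTS = {"start-of-song": 0, "start-of-program": 1,
          "start-of-tags": 2, "start-of-notes": 3, "end-of-song": 4}
PROGRAM_OFFSET, TAG_OFFSET = 5, 200  # illustrative vocabulary layout

def build_prompt(programs, tags):
    tokens = [EVENTS["start-of-song"], EVENTS["start-of-program"]]
    tokens += [PROGRAM_OFFSET + p for p in programs]  # program list
    tokens.append(EVENTS["start-of-tags"])
    tokens += [TAG_OFFSET + t for t in tags]          # tag list
    tokens.append(EVENTS["start-of-notes"])           # notes start here
    return tokens

# Violin (General MIDI program 40) with an illustrative "folk" tag ID 0:
print(build_prompt([40], [0]))  # [0, 1, 45, 2, 200, 3]
```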

Instrument: Voice, Genre: Classical
Instrument: Classical String Guitar, Genre: Classical
Instrument: Piano, Genre: Classical
Instrument: Organ, Guitar, Lead, Pad, Strings, Drum, Genre: Hip Hop
Instrument: Vibraphone, Piano, Flute, String-guitar, Genre: Pop