Introduction

Pretraining on large-scale noisy data followed by fine-tuning on the target domain has proven highly successful in numerous tasks. Motivated by this, we aim to build a large language model for symbolic music by leveraging the extensive symbolic music dataset available from the MuseScore forum.

Content

Model Summary
Best Examples
Examples in Unconditioned Generation
Examples in Music Generation Conditioned on “Genre”
Examples in Music Generation Conditioned on “Instrument”
Examples in Music Generation Conditioned on “Genre” and “Instrument”

Model Summary:

We set the resolution to 12 and the maximum sequence length to 1024.

Model	Tag Control	Instrument Control	Total Number of Parameters	Number of Training Samples
Unconditioned/Pretrained	✕	✕	87.15K	1.3M
Genre Conditioned	✓	✕	87.18K	158K
Instrument Conditioned	✕	✓	87.27K	739K
Genre-Instrument Conditioned	✓	✓	87.28K	158K

Best Examples

Below are some of the best examples generated by our models. We give one single-track example and one multi-track example for each condition.

Unconditioned Generation

1. Single Track:

2. MultiTrack:

Genre Conditioned Generation

1. Single Track: Genre: Classical

2. MultiTrack: Genre: Soundtrack

Instrument Conditioned Generation

1. Single Track: Instrument: Piano

2. MultiTrack: Instrument: Piano, Flute

Genre-Instrument Conditioned Generation

1. Single Track: Instrument: Violin, Genre: Folk

2. MultiTrack: Instrument: Harpsichord, Flute, Genre: Classical

More Examples:

Unconditioned Generation

Settings: Only a ‘start-of-song’ event is provided to the model. The model generates the instrument list, then the note sequence, and ends with an ‘end-of-song’ event.
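A minimal sketch of this setting is given below. It is not the actual inference code: `next_event` is a hypothetical callable standing in for the model, events are represented as plain strings for illustration, and the cap of 1024 is the max sequence length from the Model Summary.

```python
from typing import Callable, List, Optional

MAX_SEQ_LEN = 1024  # max sequence length stated in the Model Summary


def generate(
    next_event: Callable[[List[str]], str],  # hypothetical stand-in for the model
    prompt: Optional[List[str]] = None,
    max_len: int = MAX_SEQ_LEN,
) -> List[str]:
    """Extend the prompt one event at a time until 'end-of-song' or max_len."""
    # Unconditioned generation starts from a lone 'start-of-song' event.
    events = list(prompt) if prompt else ["start-of-song"]
    while len(events) < max_len:
        event = next_event(events)
        events.append(event)
        if event == "end-of-song":
            break
    return events


def toy_model(history: List[str]) -> str:
    """Toy stand-in: emits placeholder 'note' events, then ends the song."""
    return "end-of-song" if len(history) >= 5 else "note"


song = generate(toy_model)
print(song)
```

With a real model, `next_event` would sample the next event from the predicted distribution given the history; the loop itself stays the same.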

Genre Conditioned Generation

Settings: The model is given a structured prompt that starts with a ‘start-of-song’ event, followed by a ‘start-of-tags’ marker. The ‘start-of-tags’ marker is followed by the genre of the music. Finally, a ‘start-of-notes’ marker indicates the start of the event sequence that encodes the notes and their instruments.

Genre: classical
Genre: religious music
Genre: soundtrack
Genre: folk
Genre: worldmusic

Instrument Conditioned Generation

Settings: The model is given a structured prompt that starts with a ‘start-of-song’ event, followed by a ‘start-of-program’ marker. The ‘start-of-program’ marker is followed by the programs (instruments) used in the music. Finally, a ‘start-of-notes’ marker indicates the start of the event sequence that encodes the notes and their instruments.

Instrument: piano, flute, cello
Instrument: voices, piano
Instrument: piano, trumpet, harmonica, guitar, bass and violin
Instrument: recorder
Instrument: church-organ, trombone, tuba, horn

Genre-Instrument Conditioned Generation

Settings: This setting uses five event types: ‘start-of-song’, ‘start-of-program’, ‘start-of-tags’, ‘start-of-notes’, and ‘end-of-song’. The ‘start-of-program’ event begins the program (instrument) list, the ‘start-of-tags’ event begins the tag list, and the ‘start-of-notes’ event begins the sequence of music notes.

Instrument: Voice, Genre: Classical
Instrument: Classical String Guitar, Genre: Classical
Instrument: Piano, Genre: Classical
Instrument: Organ, Guitar, Lead, Pad, Strings, Drum, Genre: Hip Hop
Instrument: Vibraphone, Piano, Flute, String-guitar, Genre: Pop
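
The prompt layouts described in the Settings paragraphs above can be sketched with one small helper. This is an illustration under assumptions, not the actual tokenizer: the `program:` and `tag:` prefixes are hypothetical event names, and placing the program list before the tag list in the combined setting is assumed from the order in which the event types are listed.

```python
from typing import List, Optional, Sequence


def build_prompt(
    programs: Optional[Sequence[str]] = None,
    genres: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the conditioning prefix as a list of event names."""
    events = ["start-of-song"]
    if programs:
        events.append("start-of-program")
        # 'program:<name>' is a hypothetical naming scheme for instrument events.
        events.extend(f"program:{p}" for p in programs)
    if genres:
        events.append("start-of-tags")
        # 'tag:<name>' is a hypothetical naming scheme for genre events.
        events.extend(f"tag:{g}" for g in genres)
    if programs or genres:
        # The conditioned settings close the prompt with 'start-of-notes'.
        events.append("start-of-notes")
    return events


print(build_prompt())                                   # unconditioned
print(build_prompt(genres=["classical"]))               # genre conditioned
print(build_prompt(programs=["piano", "flute"]))        # instrument conditioned
print(build_prompt(programs=["violin"], genres=["folk"]))  # genre-instrument conditioned
```

In each case the model would continue from the returned prefix and stop at ‘end-of-song’.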