HIERARCHICAL TIMBRE-PAINTING AND ARTICULATION GENERATION

Michael Michelashvili, Lior Wolf

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

We present a fast and high-fidelity method for music generation, based on specified f0 and loudness, such that the synthesized audio mimics the timbre and articulation of a target instrument. The generation process consists of learned source-filtering networks, which reconstruct the signal at increasing resolutions. The model optimizes a multi-resolution spectral loss as the reconstruction loss, an adversarial loss to make the audio sound more realistic, and a perceptual f0 loss to align the output to the desired input pitch contour. The proposed architecture enables high-quality fitting of an instrument, given a sample that can be as short as a few minutes, and the method demonstrates state-of-the-art timbre transfer capabilities. Code and audio samples are shared at.
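As an illustration of the multi-resolution spectral reconstruction loss mentioned in the abstract, the sketch below shows one common way such a loss is computed: L1 distance between log-magnitude STFTs of the generated and target waveforms, summed over several FFT resolutions. The specific FFT sizes, hop lengths, and log-magnitude formulation are assumptions for illustration, not the paper's exact configuration.

```python
import torch


def multi_resolution_spectral_loss(pred, target,
                                   resolutions=((2048, 512), (1024, 256), (512, 128))):
    """Sum of log-magnitude STFT L1 distances over several resolutions.

    `pred` and `target` are mono waveforms of shape (batch, samples).
    The resolution choices here are illustrative, not the paper's settings.
    """
    loss = 0.0
    for n_fft, hop in resolutions:
        window = torch.hann_window(n_fft, device=pred.device)
        spec_p = torch.stft(pred, n_fft, hop_length=hop, window=window,
                            return_complex=True).abs()
        spec_t = torch.stft(target, n_fft, hop_length=hop, window=window,
                            return_complex=True).abs()
        # L1 on log magnitudes; small eps avoids log(0)
        loss = loss + torch.mean(torch.abs(torch.log(spec_p + 1e-7)
                                           - torch.log(spec_t + 1e-7)))
    return loss


# Usage example with random waveforms standing in for generated and target audio
pred = torch.randn(2, 16000)
target = torch.randn(2, 16000)
print(multi_resolution_spectral_loss(pred, target))
```

In the paper this reconstruction term is combined with an adversarial loss and a perceptual f0 loss; only the spectral term is sketched here.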

Original language: English
Title of host publication: Proceedings of the International Society for Music Information Retrieval Conference
Publisher: International Society for Music Information Retrieval
Pages: 916-922
Number of pages: 7
State: Published - 2020

Publication series

Name: Proceedings of the International Society for Music Information Retrieval Conference
Volume: 2020

All Science Journal Classification (ASJC) codes

  • Music
  • Artificial Intelligence
  • Human-Computer Interaction
  • Signal Processing
