Supplemental materials

Adaptive thresholding for peak-picking

This dataset was used to estimate the components of the thresholding function used in the peak-picking procedure. It contains music, monologues, dialogues and field recordings. Every signal consists of nine 5-second segments, which gives eight change points per signal. The segment types are described in the table, where 'S' denotes a speech signal, 'M' music, and 'E' an environmental sound. The total duration of the audio material is 11 minutes and 15 seconds.

Segments table

Segment transitions:

  • S → M = 11
  • M → S = 9
  • S → E = 10
  • E → S = 8
  • M → E = 12
  • E → M = 10
  • S → S = 18
  • M → M = 21
  • E → E = 21
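
The transition counts above can be tallied from per-signal label sequences. The sketch below is only an illustration: the function name `count_transitions` and the two example label strings are hypothetical and are not taken from the dataset. It also checks that the reported counts are consistent with the stated duration, assuming every signal lasts exactly 9 × 5 s.

```python
from collections import Counter


def count_transitions(label_sequences):
    """Tally ordered pairs of adjacent segment labels across all signals."""
    counts = Counter()
    for labels in label_sequences:
        # Pair each segment with its successor: 9 segments give 8 change points.
        counts.update(zip(labels, labels[1:]))
    return counts


# Hypothetical label sequences (S = speech, M = music, E = environmental sound).
example_sequences = ["SMESSMMEE", "MMSEESMSE"]
example_counts = count_transitions(example_sequences)

# Counts reported in the list above.
reported = {
    ("S", "M"): 11, ("M", "S"): 9, ("S", "E"): 10,
    ("E", "S"): 8, ("M", "E"): 12, ("E", "M"): 10,
    ("S", "S"): 18, ("M", "M"): 21, ("E", "E"): 21,
}

# Consistency check: total duration / duration of one signal = number of signals.
total_seconds = 11 * 60 + 15          # 11 minutes and 15 seconds
n_signals = total_seconds // (9 * 5)  # nine 5-second segments per signal
reported_total = sum(reported.values())
is_consistent = reported_total == n_signals * 8
```

With the figures given above, the reported counts sum to 120, which matches eight change points in each of 15 signals.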


The dataset is available upon request. Please send an e-mail for more information.

Onset detection in adverse conditions

This study analyzed onset detection in adverse acoustic conditions. Two types of audio signals containing onsets were mixed with four types of noise in order to determine the robustness of five onset detection techniques. The following source signals were used in the experiments:

  1. Set of words (source signal - sw_CountDownFrom10.aiff).

    The pauses between words were zeroed, and the signal was trimmed and downsampled to 22.05 kHz (download).

  2. Sequence of tones (source signal - simplerand1.wav).

    The signal was downsampled to 22.05 kHz (download).
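
Mixing a source signal with noise at a prescribed SNR amounts to scaling the noise so that the ratio of signal power to noise power matches the target. The sketch below is a minimal illustration and not necessarily the procedure used in the study: the function name `mix_at_snr` is hypothetical, the power is assumed to be measured over the whole signal (including any zeroed pauses), and the tone and noise are synthetic stand-ins for the actual audio files.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Return signal plus noise, with the noise scaled to give the requested SNR in dB."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]

    # Mean power of each input, measured over the whole excerpt (an assumption).
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have non-zero power")

    # SNR_dB = 10 * log10(p_signal / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


# Synthetic stand-ins: a 440 Hz tone and white noise, one second at 22.05 kHz.
sample_rate = 22050
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

mixed = mix_at_snr(tone, noise, snr_db=10.0)
```

Calling the function once per SNR level and noise type would produce the full set of test mixtures. Whether the mixture should afterwards be normalized to avoid clipping is a separate design choice that is not covered here.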

Each source signal was mixed with the four noise sources at various SNR levels (click on the image to download the audio file):