URBAN-SED

Introduction

Welcome to the companion site for the URBAN-SED dataset. Here you will find information and download links for the dataset presented in:

Scaper: A Library for Soundscape Synthesis and Augmentation
J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello.
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.
[IEEE][PDF][BibTeX]

If you're looking for scaper, the Python library for soundscape synthesis described in the paper, please go to: https://github.com/justinsalamon/scaper

The URBAN-SED dataset

URBAN-SED is a dataset of 10,000 soundscapes with sound event annotations generated using scaper. Here's a summary:

The dataset contains 10,000 soundscapes totaling almost 30 hours of audio, with close to 50,000 annotated sound events
Complete annotations are provided in JAMS format, and simplified annotations are provided as tab-separated text files
Every soundscape is 10 seconds long and has a background of Brownian noise resembling the typical "hum" often heard in urban environments
Every soundscape contains between 1 and 9 sound events from the following classes:
- air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren and street_music
The source material for the sound events is the set of clips from the UrbanSound8K dataset
URBAN-SED comes pre-sorted into three sets (train, validate and test):
- There are 6000 soundscapes in the training set, generated using clips from folds 1-6 in UrbanSound8K
- There are 2000 soundscapes in the validation set, generated using clips from folds 7-8 in UrbanSound8K
- There are 2000 soundscapes in the test set, generated using clips from folds 9-10 in UrbanSound8K
Further details about how the soundscapes were generated, including the distribution of sound event start times, durations, signal-to-noise ratios, pitch shifting, time stretching, and the range of sound event polyphony (overlap), can be found in Section 3 of the scaper paper.
The scripts used to generate URBAN-SED using scaper can be found here.
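The split layout above can be written down as a small lookup, which is handy when checking which set a given UrbanSound8K clip may appear in. This is only an illustrative sketch: the names SPLIT_FOLDS, SPLIT_SIZES and split_for_fold are hypothetical and not part of scaper or the dataset. Because each UrbanSound8K fold feeds exactly one set, the source clips used in one set do not appear in the others.

```python
# Hypothetical helper names; the numbers come from the summary above.
SPLIT_FOLDS = {
    "train": range(1, 7),      # UrbanSound8K folds 1-6
    "validate": range(7, 9),   # folds 7-8
    "test": range(9, 11),      # folds 9-10
}
SPLIT_SIZES = {"train": 6000, "validate": 2000, "test": 2000}
SOUNDSCAPE_SECONDS = 10


def split_for_fold(fold):
    """Return the URBAN-SED set that draws on the given UrbanSound8K fold."""
    for split, folds in SPLIT_FOLDS.items():
        if fold in folds:
            return split
    raise ValueError(f"UrbanSound8K has folds 1-10, got {fold!r}")


# Sanity check on the headline numbers: 10,000 clips of 10 s each.
total_soundscapes = sum(SPLIT_SIZES.values())
total_hours = total_soundscapes * SOUNDSCAPE_SECONDS / 3600
```

Here total_hours comes out just under 28, consistent with the "almost 30 hours" figure.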

Audio Files

10,000 synthesized soundscapes in single-channel (mono), 44,100 Hz, 16-bit WAV format.
The files are split into a training set (6000), validation set (2000) and test set (2000).
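After downloading, the stated audio format can be verified with Python's standard wave module. The sketch below is one possible check, not an official tool: the function names wav_format and matches_urban_sed_format are hypothetical, and the expected values are simply the ones listed above (mono, 44,100 Hz, 16-bit, i.e. 2 bytes per sample).

```python
import wave

# Expected format, taken from the description above.
EXPECTED = {"channels": 1, "sample_rate": 44100, "sample_width_bytes": 2}


def wav_format(path):
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }


def matches_urban_sed_format(info):
    """True if the channel count, sample rate and bit depth match."""
    return all(info[key] == value for key, value in EXPECTED.items())
```

For a soundscape from the dataset, duration_s should also be close to 10 seconds.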

Annotation Files

The annotations list the sound events that occur in every soundscape. The annotations are "strong", meaning that for every sound event they include (at least) its start time, end time, and label. Each sound event carries one of the following 10 labels (categories):

air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music

There are two types of annotations: full annotations in JAMS format, and simplified annotations in tab-separated txt format.

JAMS Annotations

The full annotations are distributed in JAMS format (https://github.com/marl/jams).
There are 10,000 JAMS annotation files, each one corresponding to a single soundscape with the same filename (other than the extension).
Each JAMS file contains a single annotation in the scaper namespace. Loading the annotation into Python with jams requires jams >= v0.3.2: jam = jams.load('soundscape_train_bimodal0.jams')
The value of each observation (sound event) is a dictionary storing all scaper-related sound event parameters:
- label, source_file, source_time, event_time, event_duration, snr, role, pitch_shift, time_stretch.
Note: the event_duration stored in the value dictionary represents the specified duration prior to any time stretching. The actual event duration in the soundscape is stored in the duration field of the JAMS observation.
The observations (sound events) in the JAMS annotation include both foreground sound events and the background(s).
The probabilistic scaper foreground and background event specifications are stored in the annotation's sandbox, allowing a complete reconstruction of the soundscape audio from the JAMS annotation (assuming access to the original source material) using scaper.generate_from_jams('soundscape_train_bimodal0.jams').
The annotation sandbox also includes additional metadata such as the total number of foreground sound events, the maximum polyphony (sound event overlap) of the soundscape and its Gini coefficient (a measure of soundscape complexity).
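To illustrate how the fields described above fit together, here is a minimal sketch that pulls the foreground events out of a JAMS file using only the standard json module. It rests on assumptions not stated on this page: that the file is JSON with a top-level "annotations" list, that each annotation has "namespace" and "data" keys, that "data" is a list of observations with "time", "duration" and "value", and that the role field takes the values "foreground" and "background". The function names are hypothetical. In practice, jams.load is the supported way to read these files.

```python
import json


def foreground_events(jams_dict):
    """Return sorted (onset, offset, label) tuples for foreground events.

    Uses the observation's own time and duration, which reflect the
    actual event length in the soundscape (after any time stretching),
    rather than the event_duration stored in the value dictionary.
    """
    events = []
    for ann in jams_dict.get("annotations", []):
        if ann.get("namespace") != "scaper":
            continue  # assumed namespace name, as described above
        for obs in ann["data"]:
            value = obs["value"]
            if value.get("role") != "foreground":
                continue  # skip the background observation(s)
            onset = obs["time"]
            offset = onset + obs["duration"]
            events.append((onset, offset, value["label"]))
    return sorted(events)


def load_foreground_events(path):
    """Read a JAMS file from disk and extract its foreground events."""
    with open(path) as f:
        return foreground_events(json.load(f))
```

Called as load_foreground_events('soundscape_train_bimodal0.jams'), this should yield the same kind of information as the simplified annotations, under the layout assumptions above.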

Simplified Annotations

The simplified annotations are distributed as tab-separated text files.
There are 10,000 simplified annotation files, each one corresponding to a single soundscape with the same filename (other than the extension).
Each simplified annotation has a 3-column format (no header): start_time, end_time, label.
Background sounds are NOT included in the simplified annotations (only foreground sound events).
No additional information is stored in the simplified annotations (see the JAMS annotations for more details).
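The 3-column format described above is straightforward to read with Python's csv module. The following is a sketch under the stated format (tab-separated, no header, columns start_time, end_time, label); the function names are hypothetical. As an example of what the parsed events can be used for, it also counts the largest number of simultaneously active events, which may not match the maximum polyphony value stored in the JAMS sandbox if scaper computes that differently.

```python
import csv
import io


def parse_simplified_annotation(text):
    """Parse tab-separated lines into (start_time, end_time, label) tuples."""
    events = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        start, end, label = row
        events.append((float(start), float(end), label))
    return events


def max_overlap(events):
    """Largest number of events active at the same time.

    Events that merely touch (one ends exactly when another starts)
    are not counted as overlapping.
    """
    boundaries = []
    for start, end, _label in events:
        boundaries.append((start, 1))
        boundaries.append((end, -1))
    # Sorting puts -1 before +1 at equal times, so ends are processed first.
    boundaries.sort()
    active = best = 0
    for _time, delta in boundaries:
        active += delta
        best = max(best, active)
    return best
```

To read a file from disk, pass its contents in, for example parse_simplified_annotation(open(path).read()).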

Version 2.0.0

Audio files generated with scaper v0.1.0 (identical to audio in URBAN-SED 1.0)
JAMS annotation files generated with scaper v0.1.0 and updated to comply with scaper v1.0.0 (namespace changed from "sound_event" to "scaper"): requires jams >= v0.3.2
NOTE: due to updates to the scaper library, regenerating the audio from the JAMS annotations using scaper >= 1.0.0 will result in audio files that are very similar, but not identical, to the audio files provided. This is because the provided audio files were generated with scaper v0.1.0 and have been purposely kept the same as in URBAN-SED v1.0 to ensure comparability with previously published results.

Please acknowledge this dataset in academic research

We would highly appreciate it if scientific publications of work partly based on URBAN-SED and/or scaper cite the aforementioned publication.

Download

To download URBAN-SED, please complete the download form.

Header image by Billie Ward/Flickr
