created file structure, added dataset
- .gitignore +3 -0
- app.py +0 -0
- audio_utils.py +0 -0
- dataset/README.md +69 -0
- dataset/analysis.zip +3 -0
- dataset/licenses.txt +0 -0
- dataset/one_shot_percussive_sounds.zip +3 -0
- inference.py +0 -0
.gitignore
ADDED
/.env
._*
/dataset/unzipped

app.py
ADDED
File without changes

audio_utils.py
ADDED
File without changes

dataset/README.md
ADDED
# Freesound One-Shot Percussive Sounds Dataset

This dataset contains 10254 one-shot (single event) percussive sounds from Freesound.org and the corresponding timbral analysis. These were used to train the generative model for "Neural Percussive Synthesis Parameterised by High-Level Timbral Features".

## Dataset Construction

To collect this dataset, the following steps were performed:

* Freesound was queried with words associated with percussive instruments, such as "percussion", "kick", "wood" or "clave". Only sounds with less than one second of [effective duration](https://essentia.upf.edu/reference/std_EffectiveDuration.html) were selected.

* This stage also retrieved some audio clips that contained multiple sound events or were of low quality. Therefore, we listened to all the retrieved sounds and manually discarded those presenting either of these characteristics, using the [percussive-annotator](https://github.com/xavierfav/percussive-annotator).

* The sounds were then cut or padded to a length of 1 second, normalized and downsampled to 16 kHz.

* Finally, the sounds were analyzed with the [AudioCommons Extractor](https://github.com/AudioCommons/ac-audio-extractor) to obtain the AudioCommons timbral descriptors. This information is contained in the 'analysis' folder.
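
The selection and pre-processing steps above can be approximated in plain Python as follows. This is a minimal sketch under stated assumptions, not the code used to build the dataset: `effective_duration` follows our reading of Essentia's `EffectiveDuration` (samples at or above 0.4 times the envelope peak, floored at -90 dB), the absolute value of the signal stands in for a proper envelope, "normalized" is assumed to mean peak normalization, and the 16 kHz downsampling step is left out (a polyphase resampler such as SciPy's `resample_poly` would be one option). The clip used in the demo is synthetic.

```python
import math

TARGET_RATE = 16000  # sample rate of the released files, per the list above
NOISE_FLOOR = 10 ** (-90 / 20)  # -90 dB expressed as a linear amplitude


def effective_duration(envelope, sample_rate, threshold_ratio=0.4):
    """Seconds during which the envelope is at or above a threshold.

    The threshold is threshold_ratio times the envelope peak, but never
    lower than the -90 dB noise floor (assumed to mirror Essentia).
    """
    if not envelope:
        return 0.0
    threshold = max(threshold_ratio * max(envelope), NOISE_FLOOR)
    return sum(1 for v in envelope if v >= threshold) / sample_rate


def fit_to_length(signal, n_samples):
    """Cut, or zero-pad at the end, so the signal has exactly n_samples."""
    return signal[:n_samples] + [0.0] * max(0, n_samples - len(signal))


def peak_normalize(signal):
    """Scale so the largest absolute sample is 1.0 (assumed normalization)."""
    peak = max((abs(s) for s in signal), default=0.0)
    return list(signal) if peak == 0 else [s / peak for s in signal]


# Hypothetical 0.5 s clip: a decaying 200 Hz sine, assumed to be at 16 kHz already.
clip = [
    math.exp(-10 * n / TARGET_RATE) * math.sin(2 * math.pi * 200 * n / TARGET_RATE)
    for n in range(TARGET_RATE // 2)
]
envelope = [abs(s) for s in clip]  # crude stand-in for a real envelope follower

if effective_duration(envelope, TARGET_RATE) < 1.0:  # selection criterion
    processed = peak_normalize(fit_to_length(clip, TARGET_RATE))  # 1 s, normalized
else:
    processed = None  # such a sound would have been excluded by the query
```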

## Dataset Organisation

The dataset contains two folders and two files in the root directory:

* 'one_shot_percussive_sounds' contains the pre-processed audio files, named `<freesound_sound_id>.wav`.

* 'analysis' holds the AudioCommons analysis files for each of the sounds in the dataset. Each analysis is stored as a .json file named `<freesound_sound_id>_analysis.json`, with a key for each of the extracted features.

* Two more files are present in the root directory of the dataset: this 'README' and 'licenses.json'. The latter is a .json file containing the name, the uploader's username and the license of each sound in the dataset.
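
Given this layout, each sound can be paired with its analysis through the Freesound ID in the file names. The following is a minimal standard-library sketch; `list_sound_ids` and `load_example` are hypothetical helper names, and the WAV files are assumed to be uncompressed PCM readable by Python's `wave` module. Because the real dataset is not assumed to be present, the demo writes a synthetic one-file dataset (made-up ID 12345, placeholder `brightness` value) to a temporary directory.

```python
import json
import tempfile
import wave
from pathlib import Path


def list_sound_ids(root):
    """Freesound IDs of all audio files under root/one_shot_percussive_sounds."""
    audio_dir = Path(root) / "one_shot_percussive_sounds"
    return sorted(int(p.stem) for p in audio_dir.glob("*.wav"))


def load_example(root, sound_id):
    """Return (sample_rate, n_frames, analysis) for one Freesound ID."""
    root = Path(root)
    wav_path = root / "one_shot_percussive_sounds" / f"{sound_id}.wav"
    json_path = root / "analysis" / f"{sound_id}_analysis.json"
    with wave.open(str(wav_path), "rb") as w:  # str() for older Python versions
        sample_rate, n_frames = w.getframerate(), w.getnframes()
    analysis = json.loads(json_path.read_text(encoding="utf-8"))
    return sample_rate, n_frames, analysis


# Demo on a synthetic one-file dataset (made-up ID and feature value).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "one_shot_percussive_sounds").mkdir()
    (root / "analysis").mkdir()
    with wave.open(str(root / "one_shot_percussive_sounds" / "12345.wav"), "wb") as w:
        w.setnchannels(1)  # mono
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)  # one second of silence
    (root / "analysis" / "12345_analysis.json").write_text(json.dumps({"brightness": 0.0}))
    ids = list_sound_ids(root)
    sample_rate, n_frames, analysis = load_example(root, ids[0])
```

With the real data, `root` would point at wherever the two archives were extracted, and the analysis dictionary would hold the descriptor keys actually produced by the AudioCommons Extractor.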

## Authors and Contact

This dataset was developed by António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez and Xavier Serra.

For any questions related to this dataset, please contact:

António Ramires

## References

Please cite this paper if you use this dataset:

```
@inproceedings{ramires2020,
  author = "Antonio Ramires and Pritish Chandna and Xavier Favory and Emilia Gómez and Xavier Serra",
  title = "Neural Percussive Synthesis Parameterised by High-Level Timbral Features",
  booktitle = "Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)",
  year = "2020"
}
```

## Acknowledgements

This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers).

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 770376 (TROMPA).

<img src="https://upload.wikimedia.org/wikipedia/commons/b/b7/Flag_of_Europe.svg" height="64" hspace="20">

dataset/analysis.zip
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e27faf24d3650e9541fb2f76c0ec7bd2be79672583c45aa17bc2cb830cb50fd8
size 5610013
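
Both .zip entries in this commit are stored as Git LFS pointer files: `key value` lines giving the spec version, the SHA-256 object ID and the size in bytes. After fetching the real archives (for example with `git lfs pull`), a local copy could be checked against its pointer. A minimal sketch; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the local path in the last line is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer):
    """Check a downloaded file's size and SHA-256 against an LFS pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(path)
    if path.stat().st_size != int(pointer["size"]):
        return False  # cheap check first: sizes differ
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e27faf24d3650e9541fb2f76c0ec7bd2be79672583c45aa17bc2cb830cb50fd8
size 5610013
"""
pointer = parse_lfs_pointer(POINTER)
# matches_pointer("dataset/analysis.zip", pointer)  # hypothetical local path
```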

dataset/licenses.txt
ADDED
The diff for this file is too large to render.

dataset/one_shot_percussive_sounds.zip
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:c45401b3cbdd56606f0d9e5e494a18efbae1ca830f835504dccc316c1934720c
size 112614838

inference.py
ADDED
File without changes