The Process of Training a General-Purpose Audio Classification Model

Abstract:
Machine learning applications such as image classification, object detection, and speech recognition are more widely used in modern devices than ever before. Most smartphones released in the last five years have at least one function that depends on one of these fields. Google lets users submit a query by speech, which is converted into text; cameras on both iOS and Android devices have built-in object and face detection; and gallery apps can automatically sort photographs by content. Speech recognition falls under the broader category of audio classification, which also includes subfields such as music genre classification, song identification, automatic audio equalization, and voice-based identification. This paper describes the basic steps of training a general-purpose audio classification model that can predict a limited number of distinct sounds, and it outlines the techniques employed when training any sound classification model, regardless of its intended use.

CITATION:

IEEE format

P. Petrović, N. Ćoso, S. Maravić Čisar, and R. Pinter, “The Process of Training a General-Purpose Audio Classification Model,” in Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research, Belgrade, Singidunum University, Serbia, 2022, pp. 81-88. doi:10.15308/Sinteza-2022-81-88

APA format

Petrović, P., Ćoso, N., Maravić Čisar, S., & Pinter, R. (2022). The Process of Training a General-Purpose Audio Classification Model. Paper presented at Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research. doi:10.15308/Sinteza-2022-81-88
