[Academic journal]

Hyemin Han (Korea University), Yoonsuh Jung (Korea University)

DOI : 10.7465/jkdi.2021.32.2.439

Abstract

We compare the effects of several input representations on neural-network-based polyphonic piano music transcription, using Onsets and Frames, a state-of-the-art piano transcription model. We first provide detailed background on piano transcription and input representations for readers who are unfamiliar with this area. To compare their effects, we consider four spectrograms: the Mel-spectrogram, linear-spectrogram, log-spectrogram, and constant-Q transform, each with various hyperparameters. We also examine the effects of the number of frequency bins, the short-time Fourier transform (STFT) window size, and the hop length on the four spectrograms. Our results show that the Mel-spectrogram with an STFT window size of 2,048, 512 frequency bins, and a hop length of 256 yields the highest accuracy. In general, the Mel-spectrogram is one of the most satisfactory input representations: it outperforms the other spectrograms and maintains relatively high transcription accuracy even at the low resolutions in our experiments.
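To make the best-performing configuration concrete, the following is a minimal NumPy-only sketch of a Mel-spectrogram with an STFT window size of 2,048, a hop length of 256, and 512 Mel bands. It is not the authors' implementation. The 16 kHz sample rate, the Hann window, the HTK-style Mel formula, the absence of padding, and the reading of "512 frequency bins" as 512 Mel bands are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def stft_magnitude(signal, n_fft=2048, hop_length=256):
    """Magnitude STFT with a Hann window and no padding.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centers evenly spaced on the Mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sr=16000, n_fft=2048, hop_length=256, n_mels=512):
    """Mel-scaled power spectrogram of shape (n_mels, n_frames)."""
    power = stft_magnitude(signal, n_fft, hop_length) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    tone = np.sin(2.0 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s of A4
    print(mel_spectrogram(tone, sr=sr).shape)
```

With these settings each frame spans 2,048 samples and successive frames start 256 samples apart, so a larger window sharpens frequency resolution while a smaller hop length sharpens time resolution. With 512 Mel bands and only 1,025 STFT bins, some low-frequency triangular filters in this sketch are narrower than one STFT bin and may respond weakly or not at all.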

Table of contents

Abstract
1. Introduction
2. Background of piano music transcription
3. Background of audio input representations
4. Onsets and frames model
5. Application to real data
6. Concluding remarks
References
