Multimodal Archives, Monophonic Futures: A Transformer-Based Paradigm Shift in Kyrgyz Musical Documents
DOI: https://doi.org/10.63944/jhq.NHE

Keywords: Kyrgyzstan; musical archives; Transformer; multimodal learning; cultural economics; music education

Abstract
The digitisation of musical manuscripts has transformed them from static heritage assets into dynamic data capital. This study examines how digitisation enhances the cultural value of musical manuscripts in low-resource contexts, focusing on Kyrgyz instrumental traditions (küü). Grounded in the SCP-R (Structure, Culture, Performance, and Resources) model, we analyse digitisation's impact along each of these four dimensions. We propose a three-stage "embed–reconstruct–transform" framework, applied to 12,400 folios and 2,300 hours of audio from the Kyrgyz National Conservatory. A Kyrgyz-tuned Transformer (MusicKG-T), trained with nomadic-path contrastive learning (CMCL-Kyrgyz), shows that digitisation improves the accessibility and usability of the archive, significantly increasing its cultural and economic value. The findings offer a reproducible workflow for Silk-Road archives and carry implications for music education and cultural policy. Future research should validate the framework's applicability to vocal traditions and other regions.
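To make the cross-modal alignment idea concrete, the sketch below shows a generic symmetric InfoNCE-style contrastive objective of the kind commonly used to align notation embeddings with audio embeddings. It is a minimal illustration, not the paper's CMCL-Kyrgyz objective (whose nomadic-path sampling scheme is not specified here); all module and parameter names (ProjectionHead, embed_dim, temperature) are illustrative assumptions.

```python
# Minimal sketch: symmetric InfoNCE-style contrastive alignment between
# score-folio embeddings and audio embeddings. Illustrative only; this is
# NOT the MusicKG-T / CMCL-Kyrgyz implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Maps a modality-specific feature vector into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalise so cosine similarity reduces to a dot product.
        return F.normalize(self.net(x), dim=-1)


def contrastive_loss(score_emb: torch.Tensor,
                     audio_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched (folio, recording) pairs are positives;
    every other pairing in the batch serves as a negative."""
    logits = score_emb @ audio_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy in both directions: score -> audio and audio -> score.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Stand-in features, e.g. pooled outputs of a notation encoder and an
    # audio encoder for a batch of 8 aligned folio/recording pairs.
    batch, score_dim, audio_dim = 8, 512, 768
    score_head = ProjectionHead(score_dim)
    audio_head = ProjectionHead(audio_dim)
    score_feats = torch.randn(batch, score_dim)
    audio_feats = torch.randn(batch, audio_dim)
    loss = contrastive_loss(score_head(score_feats), audio_head(audio_feats))
    print(f"contrastive loss: {loss.item():.4f}")
```

Under this kind of objective, folio and recording embeddings of the same küü are pulled together while mismatched pairs are pushed apart, which is what enables cross-modal retrieval across the digitised archive.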