Sound

Authors and titles for recent submissions

Total of 48 entries; showing entries 1-10.

Fri, 5 Dec 2025

[1] arXiv:2512.04847

Title: Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding

Authors: Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[2] arXiv:2512.04827

Title: Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs

Authors: Wenzhang Du

Comments: 11 pages, 3 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[3] arXiv:2512.04814

Title: Shared Multi-modal Embedding Space for Face-Voice Association

Authors: Christopher Simic, Korbinian Riedhammer, Tobias Bocklet

Comments: Ranked 1st in Fame 2026 Challenge, ICASSP

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[4] arXiv:2512.04793

Title: YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

Authors: Gongyu Chen, Xiaoyu Zhang, Zhenqiang Weng, Junjie Zheng, Da Shen, Chaofan Ding, Wei-Qiang Zhang, Zihao Chen

Comments: 17 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[5] arXiv:2512.04779

Title: YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance

Authors: Junjie Zheng, Chunbo Hao, Guobin Ma, Xiaoyu Zhang, Gongyu Chen, Chaofan Ding, Zihao Chen, Lei Xie

Comments: 13 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[6] arXiv:2512.04720

Title: M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis

Authors: Xiaopeng Wang, Chunyu Qiang, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Yukun Liu, Yuzhe Liang, Kang Yin, Yuankun Xie, Heng Xie, Chenxing Li, Chen Zhang, Changsheng Li

Comments: Submitted to ICASSP 2026

Subjects: Sound (cs.SD)
[7] arXiv:2512.04711

Title: Large Speech Model Enabled Semantic Communication

Authors: Yun Tian, Zhijin Qin, Guocheng Lv, Ye Jin, Kaibin Huang, Zhu Han

Comments: 15 pages, 9 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[8] arXiv:2512.04616

Title: Standard audiogram classification from loudness scaling data using unsupervised, supervised, and explainable machine learning techniques

Authors: Chen Xu, Lena Schell-Majoor, Birger Kollmeier

Subjects: Sound (cs.SD); Medical Physics (physics.med-ph)
[9] arXiv:2512.04552

Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

Authors: Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[10] arXiv:2512.04551

Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Authors: Cong Wang, Yizhong Geng, Yuhua Wen, Qifei Li, Yingming Gao, Ruimin Wang, Chunfeng Wang, Hao Li, Ya Li, Wei Chen

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
