Sound

Authors and titles for recent submissions, skipping first 29

[ total of 51 entries, 10 shown per page ]

Fri, 5 Dec 2025 (continued, showing last 5 of 10 entries)

[30] arXiv:2512.04720 [pdf, ps, other]: Title: M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis

Authors: Xiaopeng Wang, Chunyu Qiang, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Yukun Liu, Yuzhe Liang, Kang Yin, Yuankun Xie, Heng Xie, Chenxing Li, Chen Zhang, Changsheng Li

Comments: Submitted to ICASSP 2026

Subjects: Sound (cs.SD)
[31] arXiv:2512.04711 [pdf, ps, other]: Title: Large Speech Model Enabled Semantic Communication

Authors: Yun Tian, Zhijin Qin, Guocheng Lv, Ye Jin, Kaibin Huang, Zhu Han

Comments: 15 pages, 9 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[32] arXiv:2512.04616 [pdf, ps, other]: Title: Standard audiogram classification from loudness scaling data using unsupervised, supervised, and explainable machine learning techniques

Authors: Chen Xu, Lena Schell-Majoor, Birger Kollmeier

Subjects: Sound (cs.SD); Medical Physics (physics.med-ph)
[33] arXiv:2512.04552 [pdf, ps, other]: Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

Authors: Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2512.04551 [pdf, ps, other]: Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Authors: Cong Wang, Yizhong Geng, Yuhua Wen, Qifei Li, Yingming Gao, Ruimin Wang, Chunfeng Wang, Hao Li, Ya Li, Wei Chen

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Thu, 4 Dec 2025

[35] arXiv:2512.03637 [pdf, ps, other]: Title: AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning

Authors: Kohei Yamamoto, Kosuke Okusa

Comments: 11 pages, 4 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[36] arXiv:2512.03563 [pdf, ps, other]: Title: State Space Models for Bioacoustics: A comparative Evaluation with Transformers

Authors: Chengyu Tang, Sanjeev Baskiyar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[37] arXiv:2512.03783 (cross-list from cs.AI) [pdf, ps, other]: Title: Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning

Authors: Dongchao Yang, Songxiang Liu, Disong Wang, Yuanyuan Wang, Guanglu Wan, Helen Meng

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
[38] arXiv:2512.03636 (cross-list from cs.HC) [pdf, ps, other]: Title: Head, posture, and full-body gestures in dyadic conversations

Authors: Ľuboš Hládek, Bernhard U. Seeber

Comments: 7 figures, 10 tables, 29 pages

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2512.03458 (cross-list from eess.SP) [pdf, ps, other]: Title: A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses

Authors: Maryam Maghsoudi, Mohsen Rezaeizadeh, Shihab Shamma

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)


