Sound

Authors and titles for recent submissions

[ total of 49 entries: 1-25 | 26-49 ]

Wed, 13 May 2026

[1] arXiv:2605.12387 [pdf, ps, other]: Title: A Semi-Supervised Framework for Speech Confidence Detection using Whisper

Authors: Adam Wynn, Jingyun Wang

Comments: 12 pages, 9 Figures, Submitted to IEEE Transactions on Audio, Speech and Language Processing

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[2] arXiv:2605.12310 [pdf, ps, other]: Title: Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling

Authors: Chen Geng, Meng Chen, Ruohua Zhou, Ruolan Liu, Weifeng Zhao

Comments: Accepted by ICASSP 2026

Subjects: Sound (cs.SD)
[3] arXiv:2605.12135 [pdf, ps, other]: Title: STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts

Authors: Joshua Opria

Comments: 9 pages, 4 figures, 3 tables. Code and models: this https URL<your-github-username>/autocharter

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2605.11866 [pdf, ps, other]: Title: AuDirector: A Self-Reflective Closed-Loop Framework for Immersive Audio Storytelling

Authors: Yiming Ren, Xuenan Xu, Ziyang Zhang, Wen Wu, Baoxiang Li, Chao Zhang

Subjects: Sound (cs.SD)
[5] arXiv:2605.11192 [pdf, ps, other]: Title: Exploring Token-Space Manipulation in Latent Audio Tokenizers

Authors: Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[6] arXiv:2605.11098 [pdf, ps, other]: Title: AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling

Authors: Jiacheng Shi, Hongfei Du, Xinyuan Song, Y. Alicia Hong, Yanfu Zhang, Ye Gao

Comments: Accepted to ACL Findings 2026

Subjects: Sound (cs.SD)
[7] arXiv:2605.12287 (cross-list from eess.AS) [pdf, ps, other]: Title: The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

Authors: Jaehoon Ahn, Tae Gum Hwang, Moon-Ryul Jung

Comments: 6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2605.11286 (cross-list from eess.SP) [pdf, ps, other]: Title: Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming

Authors: Manan Mittal, Ryan M. Corey, John R. Buck, Andrew C. Singer

Comments: 5 pages, 8 figures

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 12 May 2026 (showing first 17 of 18 entries)

[9] arXiv:2605.10494 [pdf, ps, other]: Title: Multi-layer attentive probing improves transfer of audio representations for bioacoustics

Authors: Marius Miron, David Robinson, Masato Hagiwara, Titouan Parcollet, Jules Cauzinille, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-McMahon, Sara Keen, Emmanuel Chemla, Benjamin Hoffman, Maddie Cusimano, Diane Kim, Felix Effenberger, Jane K. Lawton, Aza Raskin, Olivier Pietquin, Matthieu Geist

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[10] arXiv:2605.10281 [pdf, ps, other]: Title: Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs

Authors: Konstantinos Soiledis, Maximos Kaliakatsos-Papakostas, Dimos Makris, Konstantinos Tsamis

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[11] arXiv:2605.10256 [pdf, ps, other]: Title: A Cold Diffusion Approach for Percussive Dereverberation

Authors: Dimos Makris, András Barják, Maximos Kaliakatsos-Papakostas

Comments: Accepted for the 2026 IEEE World Congress on Computational Intelligence, IJCNN Track, 21-26 June 2026, Maastricht, the Netherlands

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[12] arXiv:2605.10203 [pdf, ps, other]: Title: Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

Authors: Haowen Li, Tianxiang Li, Yi Yang, Boyu Cao, Qi Liu

Comments: Accepted by ICML 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13] arXiv:2605.10153 [pdf, ps, other]: Title: APEX: Audio Prototype EXplanations for Classification Tasks

Authors: Piotr Kawa, Kornel Howil, Piotr Borycki, Miłosz Adamczyk, Przemysław Spurek, Piotr Syga

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[14] arXiv:2605.09846 [pdf, ps, other]: Title: ChladniSonify: A Visual-Acoustic Mapping Method for Chladni Patterns in New Media Art Creation

Authors: Yakun Liu, Hai Luan, Dong Liu, Zhiyu Jin

Comments: 9 pages, 5 figures, IEEE conference format

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[15] arXiv:2605.09259 [pdf, ps, other]: Title: Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems

Authors: Leduo Chen, Junchuan Zhao, Shengchen Li

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[16] arXiv:2605.09087 [pdf, ps, other]: Title: Towards Trustworthy Audio Deepfake Detection: A Systematic Framework for Diagnosing and Mitigating Gender Bias

Authors: Aishwarya Fursule, Shruti Kshirsagar, Anderson R. Avila

Comments: Submitted to SMC 2026 conference

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[17] arXiv:2605.08762 [pdf, ps, other]: Title: Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search

Authors: Tao Yu, yiming ding, Shenghua Chai, Minghui Zhang, Zhongtian Luo, Xinming Wang, Xinlong Chen, Zhaolu Kang, Junhao Gong, Yuxuan Zhou, Haopeng Jin, Zhiqing Cui, Jiabing Yang, YiFan Zhang, Hongzhu Yi, Zheqi He, Xi Yang, Yan Huang, Liang Wang

Comments: 43 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[18] arXiv:2605.08554 [pdf, ps, other]: Title: Online Segmented Beamforming via Dynamic Programming

Authors: Manan Mittal, Ryan M. Corey, Diego Cuji, John R. Buck, Andrew C. Singer

Comments: 4 pages, 2 figures

Subjects: Sound (cs.SD)
[19] arXiv:2605.08214 [pdf, ps, other]: Title: Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization

Authors: Mohammed Aman Bhuiyan, Md Sazzad Hossain Adib, Samiul Basir Bhuiyan, Amit Chakraborty, Aritra Islam Saswato, Ahmed Faizul Haque Dhrubo, Mohammad Ashrafuzzaman Khan

Comments: 3 figures and 5 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2605.08194 [pdf, ps, other]: Title: ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels

Authors: Mark Shipton, Valentino Denona, Đula Nađ, Roee Diamant

Comments: 34 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[21] arXiv:2605.10084 (cross-list from eess.AS) [pdf, ps, other]: Title: PoDAR: Power-Disentangled Audio Representation for Generative Modeling

Authors: Alejandro Luebs, Mithilesh Vaidya, Ishaan Kumar, Sumukh Badam, Stephen W. Bailey, Matthew Bendel, Jose Sotelo, Xingzhe He

Comments: 9 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[22] arXiv:2605.09908 (cross-list from cs.LG) [pdf, ps, other]: Title: Voice Biomarkers for Depression and Anxiety

Authors: Oleksii Abramenko, Noah D. Stein, Colin Vaz

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
[23] arXiv:2605.09906 (cross-list from cs.AI) [pdf, ps, other]: Title: Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought

Authors: Xuanchen Li, Yuheng Lu, Chenrui Cui, Tianrui Wang, Zikang Huang, Yu Jiang, Long Zhou, Longbiao Wang, Jianwu Dang

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
[24] arXiv:2605.09120 (cross-list from cs.IR) [pdf, ps, other]: Title: Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation

Authors: Haven Kim, Julian McAuley

Subjects: Information Retrieval (cs.IR); Sound (cs.SD)
[25] arXiv:2605.08729 (cross-list from cs.CV) [pdf, ps, other]: Title: Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation

Authors: Shihao Cheng, Jiaxu Zhang, Quanyue Song, Shansong Liu, Zhizhi Guo, Xiaolei Zhang, Chi Zhang, Xuelong Li, Zhigang Tu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM); Sound (cs.SD)


