Sound

Authors and titles for recent submissions, skipping first 39

[ total of 48 entries: 1-25 | 15-39 | 40-48 ]

Mon, 1 Dec 2025

[40] arXiv:2511.23178

Title: HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding

Authors: Chen Li, Peiji Yang, Yicheng Zhong, Jianxing Yu, Zhisheng Wang, Zihao Gou, Wenqing Chen, Jian Yin

Comments: Accepted by AAAI 2026

Subjects: Sound (cs.SD)

[41] arXiv:2511.22696

Title: Probabilistic Fusion and Calibration of Neural Speaker Diarization Models

Authors: Juan Ignacio Alvarez-Trejos, Sergio A. Balanya, Daniel Ramos, Alicia Lozano-Diez

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

[42] arXiv:2511.22687

Title: PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

Authors: Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, Shinji Watanabe

Comments: Accepted by ASRU2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[43] arXiv:2511.22293

Title: GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis

Authors: Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[44] arXiv:2511.21872

Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Authors: Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy

Comments: 16 pages, 6 Figures, 2 Tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[45] arXiv:2511.23142 (cross-list from cs.LG)

Title: Adapting Neural Audio Codecs to EEG

Authors: Ard Kastrati, Luca Lanzendörfer, Riccardo Rigoni, John Staib Matilla, Roger Wattenhofer

Comments: Foundation Models for the Brain and Body (BrainBodyFM@NeurIPS)

Subjects: Machine Learning (cs.LG); Sound (cs.SD)

[46] arXiv:2511.22503 (cross-list from cs.CL)

Title: Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking

Authors: Katia Vendrame, Bolaji Yusuf, Santosh Kesiraju, Šimon Sedláček, Oldřich Plchot, Jan Černocký

Comments: submitted to ICASSP 2026

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[47] arXiv:2511.21780 (cross-list from cs.MM)

Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation

Authors: Yaoru Li, Heyu Si, Federico Landi, Pilar Oplustil Gallegos, Ioannis Koutsoumpas, O. Ricardo Cortez Vazquez, Ruiju Fu, Qi Guo, Xin Jin, Shunyu Liu, Mingli Song

Subjects: Multimedia (cs.MM); Sound (cs.SD)

[48] arXiv:2511.21704 (cross-list from cs.CL)

Title: On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models

Authors: Jonatas Grosman, Cassio Almeida, Guilherme Schardong, Hélio Lopes

Subjects: Computation and Language (cs.CL); Sound (cs.SD)
