Sound

Authors and titles for recent submissions, skipping first 27

[ total of 48 entries; showing entries 28-48 ]

Tue, 2 Dec 2025

[28] arXiv:2512.01626

Title: Parallel Delayed Memory Units for Enhanced Temporal Modeling in Biomedical and Bioacoustic Signal Analysis

Authors: Pengfei Sun, Wenyu Jiang, Paul Devos, Dick Botteldooren

Comments: Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing, 2025

Journal-ref: IEEE Transactions on Audio, Speech and Language Processing, 2025

Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE)
[29] arXiv:2512.01559

Title: LLM2Fx-Tools: Tool Calling For Music Post-Production

Authors: Seungheon Doh, Junghyun Koo, Marco A. Martínez-Ramírez, Woosung Choi, Wei-Hsiang Liao, Qiyu Wu, Juhan Nam, Yuki Mitsufuji

Subjects: Sound (cs.SD)
[30] arXiv:2512.01537

Title: Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization

Authors: Tal Shuster, Eliya Nachmani

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[31] arXiv:2512.00621

Title: Melody or Machine: Detecting Synthetic Music with Dual-Stream Contrastive Learning

Authors: Arnesh Batra, Dev Sharma, Krish Thukral, Ruhani Bhatia, Naman Batra, Aditya Gautam

Comments: Accepted at Transactions on Machine Learning Research (TMLR)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[32] arXiv:2512.00563

Title: Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals

Authors: S M Asiful Islam Saky, Md Rashidul Islam, Md Saiful Arefin, Shahaba Alam

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[33] arXiv:2512.00451

Title: STCTS: Generative Semantic Compression for Ultra-Low Bitrate Speech via Explicit Text-Prosody-Timbre Decomposition

Authors: Siyu Wang, Haitao Li, Donglai Zhu

Comments: The complete source code and online speech reconstruction demo is publicly available at this https URL

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[34] arXiv:2512.00120

Title: Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment

Authors: Jiaying Hong, Ting Zhu, Thanet Markchom, Huizhi Liang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[35] arXiv:2512.00115

Title: MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning

Authors: Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung

Comments: 10 pages, 5 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[36] arXiv:2512.01443 (cross-list from cs.CL)

Title: MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

Authors: Xabier de Zuazo, Ibon Saratxaga, Eva Navas

Comments: 10 pages, 5 figures, 4 tables, LibriBrain Workshop, NeurIPS 2025

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[37] arXiv:2512.01428 (cross-list from eess.SP)

Title: Masked Symbol Modeling for Demodulation of Oversampled Baseband Communication Signals in Impulsive Noise-Dominated Channels

Authors: Oguz Bedir (1), Nurullah Sevim (1), Mostafa Ibrahim (2), Sabit Ekin (2 and 1) ((1) Electrical & Computer Engineering, Texas A&M University, USA, (2) Engineering Technology & Industrial Distribution, Texas A&M University, USA)

Comments: Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on AI and ML for Next-Generation Wireless Communications and Networking (AI4NextG), non-archival

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD)
[38] arXiv:2512.01267 (cross-list from cs.MM)

Title: ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation

Authors: Yuezhang Peng, Yuxin Liu, Yao Li, Sheng Wang, Fei Wen, Xie Chen

Comments: 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[39] arXiv:2512.00883 (cross-list from cs.MM)

Title: Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound

Authors: Jiahua Wang, Shannan Yan, Leqi Zheng, Jialong Wu, Yaoxin Mao

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

Mon, 1 Dec 2025

[40] arXiv:2511.23178

Title: HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding

Authors: Chen Li, Peiji Yang, Yicheng Zhong, Jianxing Yu, Zhisheng Wang, Zihao Gou, Wenqing Chen, Jian Yin

Comments: Accepted by AAAI 2026

Subjects: Sound (cs.SD)
[41] arXiv:2511.22696

Title: Probabilistic Fusion and Calibration of Neural Speaker Diarization Models

Authors: Juan Ignacio Alvarez-Trejos, Sergio A. Balanya, Daniel Ramos, Alicia Lozano-Diez

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[42] arXiv:2511.22687

Title: PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

Authors: Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, Shinji Watanabe

Comments: Accepted by ASRU2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2511.22293

Title: GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis

Authors: Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[44] arXiv:2511.21872

Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Authors: Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy

Comments: 16 pages, 6 Figures, 2 Tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45] arXiv:2511.23142 (cross-list from cs.LG)

Title: Adapting Neural Audio Codecs to EEG

Authors: Ard Kastrati, Luca Lanzendörfer, Riccardo Rigoni, John Staib Matilla, Roger Wattenhofer

Comments: Foundation Models for the Brain and Body (BrainBodyFM@NeurIPS)

Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[46] arXiv:2511.22503 (cross-list from cs.CL)

Title: Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking

Authors: Katia Vendrame, Bolaji Yusuf, Santosh Kesiraju, Šimon Sedláček, Oldřich Plchot, Jan Černocký

Comments: submitted to ICASSP 2026

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:2511.21780 (cross-list from cs.MM)

Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation

Authors: Yaoru Li, Heyu Si, Federico Landi, Pilar Oplustil Gallegos, Ioannis Koutsoumpas, O. Ricardo Cortez Vazquez, Ruiju Fu, Qi Guo, Xin Jin, Shunyu Liu, Mingli Song

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[48] arXiv:2511.21704 (cross-list from cs.CL)

Title: On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models

Authors: Jonatas Grosman, Cassio Almeida, Guilherme Schardong, Hélio Lopes

Subjects: Computation and Language (cs.CL); Sound (cs.SD)
