Sound

Authors and titles for recent submissions, skipping first 8

[ total of 47 entries: 1-50 | 9-47 ]

Thu, 11 Dec 2025

[9] arXiv:2512.09504

Title: DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance

Authors: Kang Yin, Chunyu Qiang, Sirui Zhao, Xiaopeng Wang, Yuzhe Liang, Pengfei Cai, Tong Xu, Chen Zhang, Enhong Chen

Subjects: Sound (cs.SD)

[10] arXiv:2512.09285

Title: Who Speaks What from Afar: Eavesdropping In-Person Conversations via mmWave Sensing

Authors: Shaoying Wang, Hansong Zhou, Yukun Yuan, Xiaonan Zhang

Subjects: Sound (cs.SD)

[11] arXiv:2512.09066

Title: ORCA: Open-ended Response Correctness Assessment for Audio Question Answering

Authors: Šimon Sedláček, Sara Barahona, Bolaji Yusuf, Laura Herrera-Alarcón, Santosh Kesiraju, Cecilia Bolaños, Alicia Lozano-Diez, Sathvik Udupa, Fernando López, Allison Ferner, Ramani Duraiswami, Jan Černocký

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

[12] arXiv:2512.08973

Title: Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture

Authors: Karamvir Singh

Comments: 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[13] arXiv:2512.09786 (cross-list from cs.LG)

Title: TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers

Authors: Zhaolan Huang, Emmanuel Baccelli

Subjects: Machine Learning (cs.LG); Performance (cs.PF); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[14] arXiv:2512.09327 (cross-list from cs.CV)

Title: UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking

Authors: Xuangeng Chu, Ruicong Liu, Yifei Huang, Yun Liu, Yichen Peng, Bo Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

[15] arXiv:2512.09299 (cross-list from cs.CV)

Title: VABench: A Comprehensive Benchmark for Audio-Video Generation

Authors: Daili Hua, Xizhi Wang, Bohan Zeng, Xinyi Huang, Hao Liang, Junbo Niu, Xinlong Chen, Quanqing Xu, Wentao Zhang

Comments: 24 pages, 25 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

Wed, 10 Dec 2025

[16] arXiv:2512.08812

Title: Emovectors: assessing emotional content in jazz improvisations for creativity evaluation

Authors: Anna Jordanous

Comments: Presented at IEEE Big Data 2025 3rd Workshop on AI Music Generation (AIMG 2025).

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

[17] arXiv:2512.08403

Title: DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components

Authors: Yupei Li, Li Wang, Yuxiang Wang, Lei Wang, Rizhao Cai, Jie Shi, Björn W. Schuller, Zhizheng Wu

Subjects: Sound (cs.SD)

[18] arXiv:2512.08238

Title: SpeechQualityLLM: LLM-Based Multimodal Assessment of Speech Quality

Authors: Mahathir Monjur, Shahriar Nirjon

Comments: 9 pages, 5 figures, 8 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

[19] arXiv:2512.08203

Title: Error-Resilient Semantic Communication for Speech Transmission over Packet-Loss Networks

Authors: Zhuohang Han, Jincheng Dai, Shengshi Yao, Junyi Wang, Yanlong Li, Kai Niu, Wenjun Xu, Ping Zhang

Comments: submitted to IEEE in Nov. 2025

Subjects: Sound (cs.SD)

[20] arXiv:2512.08006

Title: Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS

Authors: Mahta Fetrat, Donya Navabi, Zahra Dehghanian, Morteza Abolghasemi, Hamid R. Rabiee

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[21] arXiv:2512.07872

Title: LocaGen: Sub-Sample Time-Delay Learning for Beam Localization

Authors: Ishaan Kunwar, Henry Cantor, Tyler Rizzo, Ayaan Qayyum

Comments: 7 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[22] arXiv:2512.07845

Title: AudioScene: Integrating Object-Event Audio into 3D Scenes

Authors: Shuaihang Yuan, Congcong Wen, Muhammad Shafique, Anthony Tzes, Yi Fang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[23] arXiv:2512.08282 (cross-list from cs.CV)

Title: PAVAS: Physics-Aware Video-to-Audio Synthesis

Authors: Oh Hyun-Bin, Yuhta Takida, Toshimitsu Uesaka, Tae-Hyun Oh, Yuki Mitsufuji

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Tue, 9 Dec 2025

[24] arXiv:2512.07627

Title: Incorporating Structure and Chord Constraints in Symbolic Transformer-based Melodic Harmonization

Authors: Maximos Kaliakatsos-Papakostas, Konstantinos Soiledis, Theodoros Tsamis, Dimos Makris, Vassilis Katsouros, Emilios Cambouropoulos

Comments: Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10th-12th

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC)

[25] arXiv:2512.07352

Title: MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection

Authors: Xueping Zhang, Zhenshan Zhang, Yechen Wang, Linxi Li, Liwei Jin, Ming Li

Subjects: Sound (cs.SD)

[26] arXiv:2512.07168

Title: JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention

Authors: Georgios Ioannides, Christos Constantinou, Aman Chadha, Aaron Elkins, Linsey Pang, Ravid Shwartz-Ziv, Yann LeCun

Comments: UniReps: Unifying Representations in Neural Models (NeurIPS 2025 Workshop)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[27] arXiv:2512.07005

Title: Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition

Authors: Zihao Wang, Ruibin Yuan, Ziqi Geng, Hengjia Li, Xingwei Qu, Xinyi Li, Songye Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang

Comments: Accepted by ACMMM 2025

Journal-ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM 2025), Pages 12714-12721, October 27, 2025. Dublin, Ireland

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

[28] arXiv:2512.06999

Title: Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model

Authors: Zihao Wang, Ruibin Yuan, Ziqi Geng, Hengjia Li, Xingwei Qu, Xinyi Li, Songye Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang

Comments: Accepted to ACMMM 2025 oral

Journal-ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM 2025), Pages 12227-12236

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

[29] arXiv:2512.06890

Title: What Needs to be Known in Order to Perform a Meaningful Scientific Comparison Between Animal Communications and Human Spoken Language

Authors: Roger K. Moore

Comments: 5 pages, 1 figure, Proc. Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR-24), Kos, Greece, 6 Sept. 2024

Journal-ref: Proc. Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR-24), pp 22-26, Kos, Greece, 6 Sept. 2024

Subjects: Sound (cs.SD)

[30] arXiv:2512.06757

Title: XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association

Authors: Zhihua Fang, Shumei Tao, Junxu Wang, Liang He

Comments: FAME 2026 Technical Report

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)

[31] arXiv:2512.06380

Title: Protecting Bystander Privacy via Selective Hearing in LALMs

Authors: Xiao Zhan, Guangzhi Sun, Jose Such, Phil Woodland

Comments: Dataset: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

[32] arXiv:2512.06259

Title: Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling

Authors: Yash Choudhary, Preeti Rao, Pushpak Bhattacharyya

Comments: 8 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

[33] arXiv:2512.06041

Title: Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026

Authors: Candy Olivia Mawalim, Haotian Zhang, Shogo Okada

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[34] arXiv:2512.06040

Title: Physics-Guided Deepfake Detection for Voice Authentication Systems

Authors: Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, Asef Nazari

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[35] arXiv:2512.06022

Title: DreamFoley: Scalable VLMs for High-Fidelity Video-to-Audio Generation

Authors: Fu Li, Weichao Zhao, You Li, Zhichao Zhou, Dongliang He

Comments: 10 pages; Bytedance

Subjects: Sound (cs.SD); Multimedia (cs.MM)

[36] arXiv:2512.07741 (cross-list from cs.LG)

Title: A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data

Authors: Agnes Norbury, George Fairs, Alexandra L. Georgescu, Matthew M. Nour, Emilia Molimpakis, Stefano Goria

Subjects: Machine Learning (cs.LG); Sound (cs.SD)

[37] arXiv:2512.07351 (cross-list from cs.CV)

Title: DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection

Authors: Sayeem Been Zaman, Wasimul Karim, Arefin Ittesafun Abian, Reem E. Mohamed, Md Rafiqul Islam, Asif Karim, Sami Azam

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD)

[38] arXiv:2512.07226 (cross-list from eess.AS)

Title: Unsupervised Single-Channel Audio Separation with Diffusion Source Priors

Authors: Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai

Comments: 15 pages, 31 figures, accepted by The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

[39] arXiv:2512.07209 (cross-list from cs.MM)

Title: Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits

Authors: Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD)

[40] arXiv:2512.06417 (cross-list from cs.LG)

Title: Hankel-FNO: Fast Underwater Acoustic Charting Via Physics-Encoded Fourier Neural Operator

Authors: Yifan Sun (1), Lei Cheng (1), Jianlong Li (1), Peter Gerstoft (2) ((1) College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, (2) Scripps Institution of Oceanography, University of California San Diego, La Jolla, USA)

Subjects: Machine Learning (cs.LG); Sound (cs.SD)

[41] arXiv:2512.06304 (cross-list from eess.AS)

Title: Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation

Authors: Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Sound (cs.SD)

[42] arXiv:2512.05994 (cross-list from eess.AS)

Title: KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening

Authors: Rohan Sharma, Dancheng Liu, Jingchen Sun, Shijie Zhou, Jiayu Qin, Jinjun Xiong, Changyou Chen

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

Mon, 8 Dec 2025

[43] arXiv:2512.05592

Title: The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models

Authors: Katsuhiko Yamamoto, Koichi Miyazaki, Shogo Seki

Comments: Accepted by IEEE ASRU 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[44] arXiv:2512.05508

Title: Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction

Authors: Yash Choudhary, Preeti Rao, Pushpak Bhattacharyya

Comments: 8 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

[45] arXiv:2512.05528 (cross-list from q-bio.NC)

Title: Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening

Authors: Taketo Akama, Zhuohao Zhang, Tsukasa Nagashima, Takagi Yutaka, Shun Minamikawa, Natalia Polouliakh

Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[46] arXiv:2512.05201 (cross-list from cs.NI)

Title: MuMeNet: A Network Simulator for Musical Metaverse Communications

Authors: Ali Al Housseini, Jaime Llorca, Luca Turchet, Tiziano Leidi, Cristina Rottondi, Omran Ayoub

Comments: To appear in 2025 IEEE 6th International Symposium on the Internet of Sounds (IS2) proceedings

Subjects: Networking and Internet Architecture (cs.NI); Sound (cs.SD)

[47] arXiv:2512.05126 (cross-list from eess.AS)

Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model

Authors: Kaidi Wang, Yi He, Wenhao Guan, Weijie Wu, Hongwu Ding, Xiong Zhang, Di Wu, Meng Meng, Jian Luan, Lin Li, Qingyang Hong

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
