Sound

Authors and titles for recent submissions, skipping first 17

Showing entries 18-47 of 47.

Tue, 9 Dec 2025 (continued, showing last 10 of 19 entries)

[18] arXiv:2512.06041: Title: Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026

Authors: Candy Olivia Mawalim, Haotian Zhang, Shogo Okada

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2512.06040: Title: Physics-Guided Deepfake Detection for Voice Authentication Systems

Authors: Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, Asef Nazari

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2512.06022: Title: DreamFoley: Scalable VLMs for High-Fidelity Video-to-Audio Generation

Authors: Fu Li, Weichao Zhao, You Li, Zhichao Zhou, Dongliang He

Comments: 10 pages; Bytedance

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[21] arXiv:2512.07741 (cross-list from cs.LG): Title: A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data

Authors: Agnes Norbury, George Fairs, Alexandra L. Georgescu, Matthew M. Nour, Emilia Molimpakis, Stefano Goria

Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[22] arXiv:2512.07351 (cross-list from cs.CV): Title: DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection

Authors: Sayeem Been Zaman, Wasimul Karim, Arefin Ittesafun Abian, Reem E. Mohamed, Md Rafiqul Islam, Asif Karim, Sami Azam

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD)
[23] arXiv:2512.07226 (cross-list from eess.AS): Title: Unsupervised Single-Channel Audio Separation with Diffusion Source Priors

Authors: Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai

Comments: 15 pages, 31 figures, accepted by The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24] arXiv:2512.07209 (cross-list from cs.MM): Title: Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits

Authors: Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD)
[25] arXiv:2512.06417 (cross-list from cs.LG): Title: Hankel-FNO: Fast Underwater Acoustic Charting Via Physics-Encoded Fourier Neural Operator

Authors: Yifan Sun (1), Lei Cheng (1), Jianlong Li (1), Peter Gerstoft (2) ((1) College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, (2) Scripps Institution of Oceanography, University of California San Diego, La Jolla, USA)

Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[26] arXiv:2512.06304 (cross-list from eess.AS): Title: Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation

Authors: Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Sound (cs.SD)
[27] arXiv:2512.05994 (cross-list from eess.AS): Title: KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening

Authors: Rohan Sharma, Dancheng Liu, Jingchen Sun, Shijie Zhou, Jiayu Qin, Jinjun Xiong, Changyou Chen

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

Mon, 8 Dec 2025

[28] arXiv:2512.05592: Title: The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models

Authors: Katsuhiko Yamamoto, Koichi Miyazaki, Shogo Seki

Comments: Accepted by IEEE ASRU 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2512.05508: Title: Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction

Authors: Yash Choudhary, Preeti Rao, Pushpak Bhattacharyya

Comments: 8 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[30] arXiv:2512.05528 (cross-list from q-bio.NC): Title: Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening

Authors: Taketo Akama, Zhuohao Zhang, Tsukasa Nagashima, Takagi Yutaka, Shun Minamikawa, Natalia Polouliakh

Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[31] arXiv:2512.05201 (cross-list from cs.NI): Title: MuMeNet: A Network Simulator for Musical Metaverse Communications

Authors: Ali Al Housseini, Jaime Llorca, Luca Turchet, Tiziano Leidi, Cristina Rottondi, Omran Ayoub

Comments: To appear in 2025 IEEE 6th International Symposium on the Internet of Sounds (IS2) proceedings

Subjects: Networking and Internet Architecture (cs.NI); Sound (cs.SD)
[32] arXiv:2512.05126 (cross-list from eess.AS): Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model

Authors: Kaidi Wang, Yi He, Wenhao Guan, Weijie Wu, Hongwu Ding, Xiong Zhang, Di Wu, Meng Meng, Jian Luan, Lin Li, Qingyang Hong

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Fri, 5 Dec 2025

[33] arXiv:2512.04847: Title: Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding

Authors: Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[34] arXiv:2512.04827: Title: Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs

Authors: Wenzhang Du

Comments: 11 pages, 3 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[35] arXiv:2512.04814: Title: Shared Multi-modal Embedding Space for Face-Voice Association

Authors: Christopher Simic, Korbinian Riedhammer, Tobias Bocklet

Comments: Ranked 1st in Fame 2026 Challenge, ICASSP

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[36] arXiv:2512.04793: Title: YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

Authors: Gongyu Chen, Xiaoyu Zhang, Zhenqiang Weng, Junjie Zheng, Da Shen, Chaofan Ding, Wei-Qiang Zhang, Zihao Chen

Comments: 17 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[37] arXiv:2512.04779: Title: YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance

Authors: Junjie Zheng, Chunbo Hao, Guobin Ma, Xiaoyu Zhang, Gongyu Chen, Chaofan Ding, Zihao Chen, Lei Xie

Comments: 13 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[38] arXiv:2512.04720: Title: M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis

Authors: Xiaopeng Wang, Chunyu Qiang, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Yukun Liu, Yuzhe Liang, Kang Yin, Yuankun Xie, Heng Xie, Chenxing Li, Chen Zhang, Changsheng Li

Comments: Submitted to ICASSP 2026

Subjects: Sound (cs.SD)
[39] arXiv:2512.04711: Title: Large Speech Model Enabled Semantic Communication

Authors: Yun Tian, Zhijin Qin, Guocheng Lv, Ye Jin, Kaibin Huang, Zhu Han

Comments: 15 pages, 9 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[40] arXiv:2512.04616: Title: Standard audiogram classification from loudness scaling data using unsupervised, supervised, and explainable machine learning techniques

Authors: Chen Xu, Lena Schell-Majoor, Birger Kollmeier

Subjects: Sound (cs.SD); Medical Physics (physics.med-ph)
[41] arXiv:2512.04552: Title: RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

Authors: Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE.

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[42] arXiv:2512.04551: Title: Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Authors: Cong Wang, Yizhong Geng, Yuhua Wen, Qifei Li, Yingming Gao, Ruimin Wang, Chunfeng Wang, Hao Li, Ya Li, Wei Chen

Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE.

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Thu, 4 Dec 2025

[43] arXiv:2512.03637: Title: AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning

Authors: Kohei Yamamoto, Kosuke Okusa

Comments: 11 pages, 4 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[44] arXiv:2512.03563: Title: State Space Models for Bioacoustics: A comparative Evaluation with Transformers

Authors: Chengyu Tang, Sanjeev Baskiyar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[45] arXiv:2512.03783 (cross-list from cs.AI): Title: Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning

Authors: Dongchao Yang, Songxiang Liu, Disong Wang, Yuanyuan Wang, Guanglu Wan, Helen Meng

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
[46] arXiv:2512.03636 (cross-list from cs.HC): Title: Head, posture, and full-body gestures in dyadic conversations

Authors: Ľuboš Hládek, Bernhard U. Seeber

Comments: 7 figures, 10 tables, 29 pages

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:2512.03458 (cross-list from eess.SP): Title: A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses

Authors: Maryam Maghsoudi, Mohsen Rezaeizadeh, Shihab Shamma

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
