Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 386

Mon, 29 Dec 2025 (continued, showing last 54 of 96 entries)

[387] arXiv:2512.21734 [pdf, ps, other]: Title: Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Authors: Steven Xiao, Xindi Zhang, Dechao Meng, Qi Wang, Peng Zhang, Bang Zhang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[388] arXiv:2512.21714 [pdf, ps, other]: Title: AstraNav-World: World Model for Foresight Control and Consistency

Authors: Junjun Hu, Jintao Chen, Haochen Bai, Minghua Luo, Shichao Xie, Ziyi Chen, Fei Liu, Zedong Chu, Xinda Xue, Botao Ren, Xiaolong Wu, Mu Xu, Shanghang Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[389] arXiv:2512.21710 [pdf, ps, other]: Title: RAPTOR: Real-Time High-Resolution UAV Video Prediction with Efficient Video Attention

Authors: Zhan Chen, Zile Guo, Enze Zhu, Peirong Zhang, Xiaoxuan Liu, Lei Wang, Yidan Zhang

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[390] arXiv:2512.21707 [pdf, ps, other]: Title: Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction

Authors: Zheng Yin, Chengjian Li, Xiangbo Shu, Meiqi Cao, Rui Yan, Jinhui Tang

Comments: 12 pages, 7 figures, Accepted by AAAI 2026 (oral)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[391] arXiv:2512.21695 [pdf, ps, other]: Title: FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection

Authors: Md. Zahid Hossain, Most. Sharmin Sultana Samu, Md. Kamrozzaman Bhuiyan, Farhad Uz Zaman, Md. Rakibul Islam

Comments: Accepted for publication in 2025 28th International Conference on Computer and Information Technology (ICCIT)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392] arXiv:2512.21694 [pdf, ps, other]: Title: BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks

Authors: Md. Rakibul Islam, Md. Kamrozzaman Bhuiyan, Safwan Muntasir, Arifur Rahman Jawad, Most. Sharmin Sultana Samu

Comments: Accepted for publication in 2025 28th International Conference on Computer and Information Technology (ICCIT)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[393] arXiv:2512.21693 [pdf, ps, other]: Title: Prior-AttUNet: Retinal OCT Fluid Segmentation Based on Normal Anatomical Priors and Attention Gating

Authors: Li Yang, Yuting Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[394] arXiv:2512.21692 [pdf, ps, other]: Title: ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields

Authors: Albert Barreiro, Roger Marí, Rafael Redondo, Gloria Haro, Carles Bosch

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[395] arXiv:2512.21691 [pdf, ps, other]: Title: Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective

Authors: Huan Li, Longjun Luo, Yuling Shi, Xiaodong Gu

Comments: 8 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[396] arXiv:2512.21684 [pdf, ps, other]: Title: SlideChain: Semantic Provenance for Lecture Understanding via Blockchain Registration

Authors: Md Motaleb Hossen Manik, Md Zabirul Islam, Ge Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[397] arXiv:2512.21683 [pdf, ps, other]: Title: Contrastive Graph Modeling for Cross-Domain Few-Shot Medical Image Segmentation

Authors: Yuntian Bo, Tao Zhou, Zechao Li, Haofeng Zhang, Ling Shao

Comments: Accepted to IEEE Transactions on Medical Imaging (T-MI), 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[398] arXiv:2512.21675 [pdf, ps, other]: Title: UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Authors: Shuo Cao, Jiayang Li, Xiaohui Li, Yuandong Pu, Kaiwen Zhu, Yuanting Gao, Siqi Luo, Yi Xin, Qi Qin, Yu Zhou, Xiangyu Chen, Wenlong Zhang, Bin Fu, Yu Qiao, Yihao Liu

Comments: 27 pages, 14 figures, 17 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[399] arXiv:2512.21673 [pdf, ps, other]: Title: Comparative Analysis of Deep Learning Models for Perception in Autonomous Vehicles

Authors: Jalal Khan

Comments: 6 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[400] arXiv:2512.21670 [pdf, ps, other]: Title: The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds

Authors: Subramanyam Sahoo, Jared Junkin

Comments: 10 pages, 5 figures, Initial Work

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[401] arXiv:2512.21643 [pdf, ps, other]: Title: Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

Authors: Zhiwang Zhou, Yuandong Pu, Xuming He, Yidi Liu, Yixin Chen, Junchao Gong, Xiang Zhuang, Wanghan Xu, Qinglong Cao, Shixiang Tang, Yihao Liu, Wenlong Zhang, Lei Bai

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[402] arXiv:2512.21641 [pdf, ps, other]: Title: TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References

Authors: Jiahong Yu, Ziqi Wang, Hailiang Zhao, Wei Zhai, Xueqiang Yan, Shuiguang Deng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[403] arXiv:2512.21637 [pdf, ps, other]: Title: Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints

Authors: Mutiara Shabrina, Nova Kurnia Putri, Jefri Satria Ferdiansyah, Sabita Khansa Dewi, Novanto Yudistira

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[404] arXiv:2512.21618 [pdf, ps, other]: Title: SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration

Authors: Zhiyuan Liu, Daocheng Fu, Pinlong Cai, Lening Wang, Ying Liu, Yilong Ren, Botian Shi, Jianqiang Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[405] arXiv:2512.21617 [pdf, ps, other]: Title: CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective

Authors: Zhiwen Yang, Jinglin Xu, Yuxin Peng

Comments: 12 pages, 5 figures, accepted by IEEE TMM

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[406] arXiv:2512.21616 [pdf, ps, other]: Title: TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant

Authors: Rongpei Hong, Jian Lang, Ting Zhong, Yong Wang, Fan Zhou

Comments: Accepted by KDD 2026 research track. Code and data are available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[407] arXiv:2512.21599 [pdf, ps, other]: Title: GaussianEM: Model compositional and conformational heterogeneity using 3D Gaussians

Authors: Bintao He, Yiran Cheng, Hongjia Li, Xiang Gao, Xin Gao, Fa Zhang, Renmin Han

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[408] arXiv:2512.21598 [pdf, ps, other]: Title: From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement

Authors: Jian Lang, Rongpei Hong, Ting Zhong, Leiting Chen, Qiang Gao, Fan Zhou

Comments: 12 pages. Accepted by KDD 2026 research track. Codes are released at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[409] arXiv:2512.21584 [pdf, ps, other]: Title: UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation

Authors: Linxuan Fan (1), Juntao Jiang (2), Weixuan Liu (3), Zhucun Xue (2), Jiajun Lv (2), Jiangning Zhang (2), Yong Liu (2) ((1) Data Science Institute, Vanderbilt University, Nashville, USA (2) College of Control Science and Engineering, Zhejiang University, Hangzhou, China (3) School of Computer Science and Technology, East China Normal University, Shanghai, China)

Comments: 12 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[410] arXiv:2512.21582 [pdf, ps, other]: Title: LLM-Free Image Captioning Evaluation in Reference-Flexible Settings

Authors: Shinnosuke Hirano, Yuiga Wada, Kazuki Matsuda, Seitaro Otsuki, Komei Sugiura

Comments: Accepted for presentation at AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[411] arXiv:2512.21576 [pdf, ps, other]: Title: Towards Long-window Anchoring in Vision-Language Model Distillation

Authors: Haoyi Zhou, Shuo Li, Tianyu Chen, Qi Song, Chonghan Gao, Jianxin Li

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[412] arXiv:2512.21562 [pdf, ps, other]: Title: Exploration of Reproducible Generated Image Detection

Authors: Yihang Duan

Comments: Accepted at the AAAI workshop RAI

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[413] arXiv:2512.21560 [pdf, ps, other]: Title: Toward Intelligent Scene Augmentation for Context-Aware Object Placement and Sponsor-Logo Integration

Authors: Unnati Saraswat, Tarun Rao, Namah Gupta, Shweta Swami, Shikhar Sharma, Prateek Narang, Dhruv Kumar

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[414] arXiv:2512.21545 [pdf, ps, other]: Title: EraseLoRA: MLLM-Driven Foreground Exclusion and Background Subtype Aggregation for Dataset-Free Object Removal

Authors: Sanghyun Jo, Donghwan Lee, Eunji Jung, Seong Je Oh, Kyungsu Kim

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[415] arXiv:2512.21542 [pdf, ps, other]: Title: Vision Transformers are Circulant Attention Learners

Authors: Dongchen Han, Tianyu Li, Ziyi Wang, Gao Huang

Comments: AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[416] arXiv:2512.21529 [pdf, ps, other]: Title: Hierarchy-Aware Fine-Tuning of Vision-Language Models

Authors: Jiayu Li, Rajesh Gangireddy, Samet Akcay, Wei Cheng, Juhua Hu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[417] arXiv:2512.21514 [pdf, ps, other]: Title: DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO

Authors: Henglin Liu, Huijuan Huang, Jing Wang, Chang Liu, Xiu Li, Xiangyang Ji

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[418] arXiv:2512.21513 [pdf, ps, other]: Title: MuS-Polar3D: A Benchmark Dataset for Computational Polarimetric 3D Imaging under Multi-Scattering Conditions

Authors: Puyun Wang, Kaimin Yu, Huayang He, Xianyu Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419] arXiv:2512.21512 [pdf, ps, other]: Title: Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art

Authors: Md Ashik Khan, Arafat Alam Jion

Comments: Accepted at the 2025 28th International Conference on Computer and Information Technology (ICCIT). 6 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[420] arXiv:2512.21508 [pdf, ps, other]: Title: Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification

Authors: Md Ashik Khan, Md Nahid Siddique

Comments: Accepted at the 2025 28th International Conference on Computer and Information Technology (ICCIT). 6 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[421] arXiv:2512.21507 [pdf, ps, other]: Title: SVBench: Evaluation of Video Generation Models on Social Reasoning

Authors: Wenshuo Peng, Gongxuan Wang, Tianmeng Yang, Chuanhao Li, Xiaojie Xu, Hui He, Kaipeng Zhang

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422] arXiv:2512.21495 [pdf, ps, other]: Title: Generative Multi-Focus Image Fusion

Authors: Xinzhe Xie, Buyu Guo, Bolin Li, Shuangyan He, Yanzhen Gu, Qingyan Jiang, Peiliang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423] arXiv:2512.21476 [pdf, ps, other]: Title: GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification

Authors: Suncheng Xiang, Xiaoyang Wang, Junjie Jiang, Hejia Wang, Dahong Qian

Comments: Work in progress

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[424] arXiv:2512.21472 [pdf, ps, other]: Title: IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset

Authors: Kumar Abhishek, Jeremy Kawahara, Ghassan Hamarneh

Comments: 11 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425] arXiv:2512.21459 [pdf, ps, other]: Title: CCAD: Compressed Global Feature Conditioned Anomaly Detection

Authors: Xiao Jin, Liang Diao, Qixin Xiao, Yifan Hu, Ziqi Zhang, Yuchen Liu, Haisong Gu

Comments: 18 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[426] arXiv:2512.21452 [pdf, ps, other]: Title: Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism

Authors: Haotian Lv, Yuhui Zhang, Jiangbo Dai, Hanli Wu, Jiaji Wang, Dawei Wang

Comments: Accepted for publication in IEEE Transactions on Geoscience and Remote Sensing

Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2025, 63, 5213217

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[427] arXiv:2512.21434 [pdf, ps, other]: Title: Scalable Deep Subspace Clustering Network

Authors: Nairouz Mrabah, Mohamed Bouguessa, Sihem Sami

Comments: Published at the 2025 IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA)

Journal-ref: Proceedings of the IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA), 2025, pp. 1-10

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[428] arXiv:2512.21414 [pdf, ps, other]: Title: A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding

Authors: Christina Liu, Alan Q. Wang, Joy Hsu, Jiajun Wu, Ehsan Adeli

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[429] arXiv:2512.21402 [pdf, ps, other]: Title: Understanding Virality: A Rubric based Vision-Language Model Framework for Short-Form Edutainment Evaluation

Authors: Arnav Gupta, Gurekas Singh Sahney, Hardik Rathi, Abhishek Chandwani, Ishaan Gupta, Pratik Narang, Dhruv Kumar

Comments: Under Review

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430] arXiv:2512.22016 (cross-list from cs.HC) [pdf, ps, other]: Title: SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching

Authors: Xiangwen Zhang, Xiaowei Dai, Runnan Chen, Xiaoming Chen, Zeke Zexi Hu

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[431] arXiv:2512.21988 (cross-list from eess.IV) [pdf, ps, other]: Title: The Color-Clinical Decoupling: Why Perceptual Calibration Fails Clinical Biomarkers in Smartphone Dermatology

Authors: Sungwoo Kang

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[432] arXiv:2512.21975 (cross-list from eess.IV) [pdf, ps, other]: Title: RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring

Authors: Zhuoyu Wu, Wenhui Ou, Qiawei Zheng, Jiayan Yang, Quanjun Wang, Wenqi Fang, Zheng Wang, Yongkui Yang, Heshan Li

Comments: 2 pages, 2 figures. Accepted by IEEE ICTA 2025

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[433] arXiv:2512.21789 (cross-list from cs.CL) [pdf, ps, other]: Title: Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning

Authors: Ting-Hao K. Huang, Ryan A. Rossi, Sungchul Kim, Tong Yu, Ting-Yao E. Hsu, Ho Yin (Sam) Ng, C. Lee Giles

Comments: Accepted to the 5th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE 2026)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[434] arXiv:2512.21747 (cross-list from cs.HC) [pdf, ps, other]: Title: Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG

Authors: Gourav Siddhad, Anurag Singh, Rajkumar Saini, Partha Pratim Roy

Comments: 8 Pages, 3 Figures, 1 Table

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[435] arXiv:2512.21743 (cross-list from cs.LG) [pdf, ps, other]: Title: Dynamic Feedback Engines: Layer-Wise Control for Self-Regulating Continual Learning

Authors: Hengyi Wu, Zhenyi Wang, Heng Huang

Comments: 14 pages

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[436] arXiv:2512.21602 (cross-list from cs.LG) [pdf, ps, other]: Title: Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care

Authors: Yusuf Brima, Marcellin Atemkeng

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[437] arXiv:2512.21593 (cross-list from stat.ML) [pdf, ps, other]: Title: Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models

Authors: Takuro Kutsuna

Comments: 40 pages

Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[438] arXiv:2512.21516 (cross-list from cs.LG) [pdf, ps, other]: Title: Global-Graph Guided and Local-Graph Weighted Contrastive Learning for Unified Clustering on Incomplete and Noise Multi-View Data

Authors: Hongqing He, Jie Xu, Wenyuan Yang, Yonghua Zhu, Guoqiu Wen, Xiaofeng Zhu

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[439] arXiv:2512.21510 (cross-list from cs.LG) [pdf, ps, other]: Title: Missing Pattern Tree based Decision Grouping and Ensemble for Deep Incomplete Multi-View Clustering

Authors: Wenyuan Yang, Jie Xu, Hongqing He, Jiangzhang Gan, Xiaofeng Zhu

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[440] arXiv:2512.21372 (cross-list from eess.IV) [pdf, ps, other]: Title: A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI

Authors: Md Assaduzzaman, Nushrat Jahan Oyshi, Eram Mahamud

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Thu, 25 Dec 2025

[441] arXiv:2512.21338 [pdf, ps, other]: Title: HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Authors: Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua

Comments: Project Page: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442] arXiv:2512.21337 [pdf, ps, other]: Title: Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Authors: Li-Zhong Szu-Tu, Ting-Lin Wu, Chia-Jui Chang, He Syu, Yu-Lun Liu

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[443] arXiv:2512.21334 [pdf, ps, other]: Title: Streaming Video Instruction Tuning

Authors: Jiaer Xia, Peixian Chen, Mengdan Zhang, Xing Sun, Kaiyang Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444] arXiv:2512.21333 [pdf, ps, other]: Title: Fast SAM2 with Text-Driven Token Pruning

Authors: Avilasha Mandal, Chaoning Zhang, Fachrina Dewi Puspitasari, Xudong Wang, Jiaquan Zhang, Caiyan Qin, Guoqing Wang, Yang Yang, Heng Tao Shen

Comments: 28 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445] arXiv:2512.21331 [pdf, ps, other]: Title: TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning

Authors: Varun Belagali, Saarthak Kapse, Pierre Marza, Srijan Das, Zilinghan Li, Sofiène Boutaj, Pushpak Pati, Srikar Yellapragada, Tarak Nath Nandi, Ravi K Madduri, Joel Saltz, Prateek Prasanna, Stergios Christodoulidis, Maria Vakalopoulou, Dimitris Samaras

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446] arXiv:2512.21302 [pdf, ps, other]: Title: AndroidLens: Long-latency Evaluation with Nested Sub-targets for Android GUI Agents

Authors: Yue Cao, Yingyao Wang, Pi Bu, Jingxuan Xing, Wei Jiang, Zekun Zhu, Junpeng Ma, Sashuai Zhou, Tong Lu, Jun Song, Yu Cheng, Yuning Jiang, Bo Zheng

Comments: 23 pages, 13 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[447] arXiv:2512.21287 [pdf, ps, other]: Title: Post-Processing Mask-Based Table Segmentation for Structural Coordinate Extraction

Authors: Suren Bandara

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448] arXiv:2512.21284 [pdf, ps, other]: Title: Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential

Authors: Shihao Zou, Jingjing Li, Wei Ji, Jincai Huang, Kai Wang, Guo Dan, Weixin Si, Yi Pan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[449] arXiv:2512.21276 [pdf, ps, other]: Title: GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation

Authors: Snehal Singh Tomar, Alexandros Graikos, Arjun Krishna, Dimitris Samaras, Klaus Mueller

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[450] arXiv:2512.21268 [pdf, ps, other]: Title: ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision

Authors: Weiqi Li, Zehao Zhang, Liang Lin, Guangrun Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451] arXiv:2512.21264 [pdf, ps, other]: Title: AnyAD: Unified Any-Modality Anomaly Detection in Incomplete Multi-Sequence MRI

Authors: Changwei Wu, Yifei Chen, Yuxin Du, Mingxuan Liu, Jinying Zong, Beining Wu, Jie Dong, Feiwei Qin, Yunkang Cao, Qiyuan Tian

Comments: 15 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[452] arXiv:2512.21252 [pdf, ps, other]: Title: DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Authors: Jiawei Liu, Junqiao Li, Jiangfan Deng, Gen Li, Siyu Zhou, Zetao Fang, Shanshan Lao, Zengde Deng, Jianing Zhu, Tingting Ma, Jiayi Li, Yunqiu Wang, Qian He, Xinglong Wu

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[453] arXiv:2512.21237 [pdf, ps, other]: Title: SegMo: Segment-aligned Text to 3D Human Motion Generation

Authors: Bowen Dang, Lin Wu, Xiaohang Yang, Zheng Yuan, Zhixiang Chen

Comments: The IEEE/CVF Winter Conference on Applications of Computer Vision 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[454] arXiv:2512.21221 [pdf, ps, other]: Title: Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval

Authors: Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Phu-Hoa Pham, Tran Chi Nguyen

Comments: System description paper for EVENTA Grand Challenge Track 2 at ACM Multimedia 2025 (MM '25). Ranked 4th place. 6 pages, 1 figure, 2 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[455] arXiv:2512.21218 [pdf, ps, other]: Title: Latent Implicit Visual Reasoning

Authors: Kelvin Li, Chuyi Shang, Leonid Karlinsky, Rogerio Feris, Trevor Darrell, Roei Herzig

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456] arXiv:2512.21209 [pdf, ps, other]: Title: Human Motion Estimation with Everyday Wearables

Authors: Siqi Zhu, Yixuan Li, Junfu Li, Qi Wu, Zan Wang, Haozhe Ma, Wei Liang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457] arXiv:2512.21194 [pdf, ps, other]: Title: VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs

Authors: Brigitta Malagurski Törtei, Yasser Dahou, Ngoc Dung Huynh, Wamiq Reyaz Para, Phúc H. Lê Khac, Ankit Singh, Sofian Chaybouti, Sanath Narayan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[458] arXiv:2512.21185 [pdf, ps, other]: Title: UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Authors: Tanghui Jia, Dongyu Yan, Dehao Hao, Yang Li, Kaiyi Zhang, Xianyi He, Lanjiong Li, Yuhan Wang, Jinnan Chen, Lutao Jiang, Qishen Yin, Long Quan, Ying-Cong Chen, Li Yuan

Comments: 14 pages, 10 figures, Technical Report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[459] arXiv:2512.21183 [pdf, ps, other]: Title: Towards Arbitrary Motion Completing via Hierarchical Continuous Representation

Authors: Chenghao Xu, Guangtao Lyu, Qi Liu, Jiexi Yan, Muli Yang, Cheng Deng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[460] arXiv:2512.21174 [pdf, ps, other]: Title: A Turn Toward Better Alignment: Few-Shot Generative Adaptation with Equivariant Feature Rotation

Authors: Chenghao Xu, Qi Liu, Jiexi Yan, Muli Yang, Cheng Deng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461] arXiv:2512.21150 [pdf, ps, other]: Title: ORCA: Object Recognition and Comprehension for Archiving Marine Species

Authors: Yuk-Kwan Wong, Haixin Liang, Zeyu Ma, Yiwei Chen, Ziqiang Zheng, Rinaldi Gotama, Pascal Sebastian, Lauren D. Sparks, Sai-Kit Yeung

Comments: Accepted by The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[462] arXiv:2512.21135 [pdf, ps, other]: Title: TGC-Net: A Structure-Aware and Semantically-Aligned Framework for Text-Guided Medical Image Segmentation

Authors: Gaoren Lin, Huangxuan Zhao, Yuan Xiong, Lefei Zhang, Bo Du, Wentao Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[463] arXiv:2512.21126 [pdf, ps, other]: Title: MarineEval: Assessing the Marine Intelligence of Vision-Language Models

Authors: Yuk-Kwan Wong, Tuan-An To, Jipeng Zhang, Ziqiang Zheng, Sai-Kit Yeung

Comments: Accepted by The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
[464] arXiv:2512.21104 [pdf, ps, other]: Title: FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting

Authors: Chao Gong, Dong Li, Yingwei Pan, Jingjing Chen, Ting Yao, Tao Mei

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465] arXiv:2512.21095 [pdf, ps, other]: Title: UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters

Authors: Yongkun Du, Zhineng Chen, Yazhen Xie, Weikang Bai, Hao Feng, Wei Shi, Yuchen Su, Can Huang, Yu-Gang Jiang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[466] arXiv:2512.21094 [pdf, ps, other]: Title: T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Authors: Zhe Cao, Tao Wang, Jiaming Wang, Yanghai Wang, Yuanxing Zhang, Jialu Chen, Miao Deng, Jiahao Wang, Yubin Guo, Chenxi Liao, Yize Zhang, Zhaoxiang Zhang, Jiaheng Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[467] arXiv:2512.21083 [pdf, ps, other]: Title: Hierarchical Modeling Approach to Fast and Accurate Table Recognition

Authors: Takaya Kawakatsu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[468] arXiv:2512.21078 [pdf, ps, other]: Title: UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer

Authors: Tianchen Deng, Xun Chen, Ziming Li, Hongming Shen, Danwei Wang, Javier Civera, Hesheng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469] arXiv:2512.21064 [pdf, ps, other]: Title: Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition

Authors: Hongsong Wang, Heng Fei, Bingxuan Dai, Jie Gui

Comments: Accepted by Machine Intelligence Research (Journal Impact Factor 8.7, 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[470] arXiv:2512.21058 [pdf, ps, other]: Title: Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control

Authors: Minghao Han, YiChen Liu, Yizhou Liu, Zizhi Chen, Jingqun Tang, Xuecheng Wu, Dingkang Yang, Lihua Zhang

Comments: 32 pages, 17 figures, and 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471] arXiv:2512.21054 [pdf, ps, other]: Title: DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors

Authors: Kaustubh Kundu, Hrishav Bakul Barua, Lucy Robertson-Bell, Zhixi Cai, Kalin Stefanov

Comments: Accepted in WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[472] arXiv:2512.21053 [pdf, ps, other]: Title: Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera

Authors: Zibin Liu, Banglei Guan, Yang Shang, Shunkun Liang, Zhenbao Yu, Qifeng Yu

Comments: 9 pages, 5 figures. In Proceedings of the 32nd ACM International Conference on Multimedia (MM '24)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[473] arXiv:2512.21050 [pdf, ps, other]: Title: Matrix Completion Via Reweighted Logarithmic Norm Minimization

Authors: Zhijie Wang, Liangtian He, Qinghua Zhang, Jifei Miao, Liang-Jian Deng, Jun Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[474] arXiv:2512.21040 [pdf, ps, other]: Title: A Large-Depth-Range Layer-Based Hologram Dataset for Machine Learning-Based 3D Computer-Generated Holography

Authors: Jaehong Lee, You Chan No, YoungWoo Kim, Duksu Kim

Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
[475] arXiv:2512.21038 [pdf, ps, other]: Title: Next-Scale Prediction: A Self-Supervised Approach for Real-World Image Denoising

Authors: Yiwen Shan, Haiyu Zhao, Peng Hu, Xi Peng, Yuanbiao Gou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[476] arXiv:2512.21032 [pdf, ps, other]: Title: Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model

Authors: Mingshu Cai, Osamu Yoshie, Yuya Ieiri

Comments: Accepted by 2025 IEEE International Joint Conference on Biometrics (IJCB 2025)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[477] arXiv:2512.21019 [pdf, ps, other]: Title: Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face

Authors: Rui-qing Sun, Xingshan Yao, Tian Lan, Jia-Ling Shi, Chen-Hao Cui, Hui-Yang Zhao, Zhijing Wu, Chen Yang, Xian-Ling Mao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[478] arXiv:2512.21015 [pdf, ps, other]: Title: FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing

Authors: Mingshu Cai, Yixuan Li, Osamu Yoshie, Yuya Ieiri

Comments: Accepted by IEEE Transactions on Multimedia (TMM)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[479] arXiv:2512.21011 [pdf, ps, other]: Title: Granular-ball Guided Masking: Structure-aware Data Augmentation

Authors: Shuyin Xia, Fan Chen, Dawei Dai, Meng Yang, Junwei Han, Xinbo Gao, Guoyin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[480] arXiv:2512.21004 [pdf, ps, other]: Title: Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

Authors: Jinghan Li, Yang Jin, Hao Jiang, Yadong Mu, Yang Song, Kun Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481] arXiv:2512.21003 [pdf, ps, other]: Title: MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds

Authors: Xiangzuo Wu, Chengwei Ren, Jun Zhou, Xiu Li, Yuan Liu

Comments: 21 pages, 17 figures, 5 tables, project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[482] arXiv:2512.20988 [pdf, ps, other]: Title: PUFM++: Point Cloud Upsampling via Enhanced Flow Matching

Authors: Zhi-Song Liu, Chenhang He, Roland Maier, Andreas Rupp

Comments: 21 pages, 15 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[483] arXiv:2512.20980 [pdf, ps, other]: Title: X-ray Insights Unleashed: Pioneering the Enhancement of Multi-Label Long-Tail Data

Authors: Xinquan Yang, Jinheng Xie, Yawen Huang, Yuexiang Li, Huimin Huang, Hao Zheng, Xian Wu, Yefeng Zheng, Linlin Shen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[484] arXiv:2512.20976 [pdf, ps, other]: Title: XGrid-Mapping: Explicit Implicit Hybrid Grid Submaps for Efficient Incremental Neural LiDAR Mapping

Authors: Zeqing Song, Zhongmiao Yan, Junyuan Deng, Songpengcheng Xia, Xiang Mu, Jingyi Xu, Qi Wu, Ling Pei

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[485] arXiv:2512.20975 [pdf, ps, other]: Title: SPOT!: Map-Guided LLM Agent for Unsupervised Multi-CCTV Dynamic Object Tracking

Authors: Yujin Noh, Inho Jake Park, Chigon Hwang

Comments: 33 pages, 27 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[486] arXiv:2512.20937 [pdf, ps, other]: Title: Beyond Artifacts: Real-Centric Envelope Modeling for Reliable AI-Generated Image Detection

Authors: Ruiqi Liu, Yi Han, Zhengbo Zhang, Liwei Yao, Zhiyuan Yan, Jialiang Shen, ZhiJin Chen, Boyi Sun, Lubin Weng, Jing Dong, Yan Wang, Shu Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[487] arXiv:2512.20936 [pdf, ps, other]: Title: Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation

Authors: Hongxing Fan, Shuyu Zhao, Jiayang Ao, Lu Sheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[488] arXiv:2512.20934 [pdf, ps, other]: Title: Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

Authors: Shengguang Wu, Xiaohan Wang, Yuhui Zhang, Hao Zhu, Serena Yeung-Levy

Comments: Project Website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)
[489] arXiv:2512.20927 [pdf, ps, other]: Title: Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting

Authors: Yoonwoo Jeong, Cheng Sun, Frank Wang, Minsu Cho, Jaesung Choe

Comments: Will be updated

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[490] arXiv:2512.20921 [pdf, ps, other]: Title: Self-supervised Multiplex Consensus Mamba for General Image Fusion

Authors: Yingying Wang, Rongjin Zhuang, Hui Zheng, Xuanhua He, Ke Cao, Xiaotong Tu, Xinghao Ding

Comments: Accepted by AAAI 2026, 9 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[491] arXiv:2512.20907 [pdf, ps, other]: Title: PanoGrounder: Bridging 2D and 3D with Panoramic Scene Representations for VLM-based 3D Visual Grounding

Authors: Seongmin Jung, Seongho Choi, Gunwoo Jeon, Minsu Cho, Jongwoo Lim

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[492] arXiv:2512.20901 [pdf, ps, other]: Title: Benchmarking and Enhancing VLM for Compressed Image Understanding

Authors: Zifu Zhang, Tongda Xu, Siqi Li, Shengxi Li, Yue Zhang, Mai Xu, Yan Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[493] arXiv:2512.20898 [pdf, ps, other]: Title: DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction

Authors: Xiao Yu, Zhaojie Fang, Guanyu Zhou, Yin Shen, Huoling Luo, Ye Li, Ahmed Elazab, Xiang Wan, Ruiquan Ge, Changmiao Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[494] arXiv:2512.20892 [pdf, ps, other]: Title: Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification

Authors: Tingfeng Xian, Wenlve Zhou, Zhiheng Zhou, Zhelin Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[495] arXiv:2512.20871 [pdf, ps, other]: Title: NeRV360: Neural Representation for 360-Degree Videos with a Viewport Decoder

Authors: Daichi Arai, Kyohei Unno, Yasuko Sugito, Yuichi Kusakabe

Comments: 2026 IIEEJ International Conference on Image Electronics and Visual Computing (IEVC)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[496] arXiv:2512.20866 [pdf, ps, other]: Title: Lightweight framework for underground pipeline recognition and spatial localization based on multi-view 2D GPR images

Authors: Haotian Lv, Chao Li, Jiangbo Dai, Yuhui Zhang, Zepeng Fan, Yiqiu Tan, Dawei Wang, Binglei Xie

Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2025, 63, 5110115

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[497] arXiv:2512.20858 [pdf, ps, other]: Title: ALIVE: An Avatar-Lecture Interactive Video Engine with Content-Aware Retrieval for Real-Time Interaction

Authors: Md Zabirul Islam, Md Motaleb Hossen Manik, Ge Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[498] arXiv:2512.20839 [pdf, ps, other]: Title: Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference

Authors: Putu Indah Githa Cahyani, Komang David Dananjaya Suartana, Novanto Yudistira

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[499] arXiv:2512.20833 [pdf, ps, other]: Title: CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images

Authors: Vidit Agrawal (1,2), John Peters (1,2), Tyler N. Thompson (1,2), Mohammad Vali Sanian (3,4), Chau Pham (5), Nikita Moshkov (6), Arshad Kazi (1,2), Aditya Pillai (1,2), Jack Freeman (1), Byunguk Kang (7,8), Samouil L. Farhi (8), Ernest Fraenkel (7), Ron Stewart (1), Lassi Paavolainen (3,4), Bryan A. Plummer (5), Juan C. Caicedo (1,2) ((1) Morgridge Institute for Research, Madison, WI, USA, (2) University of Wisconsin-Madison, Madison, WI, USA, (3) Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland, (4) University of Helsinki, Helsinki, Finland, (5) Boston University, Boston, MA, USA, (6) Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany, (7) Massachusetts Institute of Technology, Cambridge, MA, USA, (8) Broad Institute of MIT and Harvard, Cambridge, MA, USA)

Comments: 47 Pages, 23 Figures, 26 Tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[500] arXiv:2512.20815 [pdf, ps, other]: Title: Learning to Sense for Driving: Joint Optics-Sensor-Model Co-Design for Semantic Segmentation

Authors: Reeshad Khan, John Gauch

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[501] arXiv:2512.20783 [pdf, ps, other]: Title: NULLBUS: Multimodal Mixed-Supervision for Breast Ultrasound Segmentation via Nullable Global-Local Prompts

Authors: Raja Mallina, Bryar Shareef

Comments: 5 pages, 2 figures, and 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[502] arXiv:2512.20770 [pdf, ps, other]: Title: OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective

Authors: Markus Gross, Sai B. Matha, Aya Fahmy, Rui Song, Daniel Cremers, Henri Meess

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[503] arXiv:2512.20746 [pdf, ps, other]: Title: TrashDet: Iterative Neural Architecture Search for Efficient Waste Detection

Authors: Tony Tran, Bin Hu

Comments: 10 pages. The paper has been accepted by the WACV 2026 workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[504] arXiv:2512.20735 [pdf, ps, other]: Title: VL4Gaze: Unleashing Vision-Language Models for Gaze Following

Authors: Shijing Wang, Chaoqun Cui, Yaping Huang, Hyung Jin Chang, Yihua Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[505] arXiv:2512.21315 (cross-list from cs.LG) [pdf, ps, other]: Title: Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks

Authors: Roy Turgeman, Tom Tirer

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[506] arXiv:2512.21241 (cross-list from cs.LG) [pdf, ps, other]: Title: Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks

Authors: Xinjie Xu, Shuyu Cheng, Dongwei Xu, Qi Xuan, Chen Ma

Comments: Published at AAAI 2026 (Oral). This version corresponds to the conference proceedings; v2 will include the appendix

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[507] arXiv:2512.21220 (cross-list from cs.AI) [pdf, ps, other]: Title: RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic

Authors: Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, Xianglong Liu

Comments: 11 pages, 6 figures

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[508] arXiv:2512.21201 (cross-list from cs.RO) [pdf, ps, other]: Title: Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation

Authors: Yu He, Da Huang, Zhenyang Liu, Zixiao Gu, Qiang Sun, Guangnan Ye, Yanwei Fu

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[509] arXiv:2512.21180 (cross-list from physics.med-ph) [pdf, ps, other]: Title: Equivariant Multiscale Learned Invertible Reconstruction for Cone Beam CT: From Simulated to Real Data

Authors: Nikita Moriakov, Efstratios Gavves, Jonathan H. Mason, Carmen Seller-Oria, Jonas Teuwen, Jan-Jakob Sonke

Comments: 29 pages. arXiv admin note: substantial text overlap with arXiv:2401.11256

Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)
[510] arXiv:2512.21118 (cross-list from cs.LG) [pdf, ps, other]: Title: STLDM: Spatio-Temporal Latent Diffusion Model for Precipitation Nowcasting

Authors: Shi Quan Foo, Chi-Ho Wong, Zhihan Gao, Dit-Yan Yeung, Ka-Hing Wong, Wai-Kin Wong

Comments: Accepted by TMLR. Camera-ready submission

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[511] arXiv:2512.21099 (cross-list from cs.GR) [pdf, ps, other]: Title: TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars

Authors: Jaeseong Lee, Junyeong Ahn, Taewoong Kang, Jaegul Choo

Comments: 3DV 2026, Project page with videos: this https URL

Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[512] arXiv:2512.21065 (cross-list from cs.RO) [pdf, ps, other]: Title: Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation

Authors: Zebin Jiang, Tianle Jin, Xiangtong Yao, Alois Knoll, Hu Cao

Comments: Submitted to IEEE Journal

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[513] arXiv:2512.20963 (cross-list from cs.LG) [pdf, ps, other]: Title: Generalization of Diffusion Models Arises with a Balanced Representation Space

Authors: Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, Qing Qu

Comments: 40 pages, 19 figures. The first two authors contributed equally

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[514] arXiv:2512.20674 (cross-list from cs.LG) [pdf, ps, other]: Title: HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model

Authors: Yuanhao Xi, Xiaohuan Bing, Ramin Yahyapour

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[515] arXiv:2512.20655 (cross-list from cs.LG) [pdf, ps, other]: Title: MaskOpt: A Large-Scale Mask Optimization Dataset to Advance AI in Integrated Circuit Manufacturing

Authors: Yuting Hu, Lei Zhuang, Hua Xiang, Jinjun Xiong, Gi-Joon Nam

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[516] arXiv:2512.20642 (cross-list from physics.flu-dyn) [pdf, ps, other]: Title: Flow Gym

Authors: Francesco Banelli, Antonio Terpin, Alan Bonomi, Raffaello D'Andrea

Comments: Code: this https URL

Subjects: Fluid Dynamics (physics.flu-dyn); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE); Computational Physics (physics.comp-ph)
[517] arXiv:2512.20626 (cross-list from cs.AI) [pdf, ps, other]: Title: MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation

Authors: Chi-Hsiang Hsiao, Yi-Cheng Wang, Tzung-Sheng Lin, Yi-Ren Yeh, Chu-Song Chen

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Wed, 24 Dec 2025

[518] arXiv:2512.20619 [pdf, ps, other]: Title: SemanticGen: Video Generation in Semantic Space

Authors: Jianhong Bai, Xiaoshi Wu, Xintao Wang, Xiao Fu, Yuanxing Zhang, Qinghe Wang, Xiaoyu Shi, Menghan Xia, Zuozhu Liu, Haoji Hu, Pengfei Wan, Kun Gai

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519] arXiv:2512.20617 [pdf, ps, other]: Title: SpatialTree: How Spatial Abilities Branch Out in MLLMs

Authors: Yuxi Xiao, Longfei Li, Shen Yan, Xinhang Liu, Sida Peng, Yunchao Wei, Xiaowei Zhou, Bingyi Kang

Comments: webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[520] arXiv:2512.20615 [pdf, ps, other]: Title: Active Intelligence in Video Avatars via Closed-loop World Modeling

Authors: Xuanhua He, Tianyu Yang, Ke Cao, Ruiqi Wu, Cheng Meng, Yong Zhang, Zhuoliang Kang, Xiaoming Wei, Qifeng Chen

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[521] arXiv:2512.20610 [pdf, ps, other]: Title: FedPOD: the deployable units of training for federated learning

Authors: Daewoon Kim, Si Young Yie, Jae Sung Lee

Comments: 12 pages, 12 figures, MICCAI

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[522] arXiv:2512.20606 [pdf, ps, other]: Title: Repurposing Video Diffusion Transformers for Robust Point Tracking

Authors: Soowon Son, Honggyu An, Chaehyun Kim, Hyunah Ko, Jisu Nam, Dahyun Chung, Siyoon Jin, Jung Yi, Jaewon Min, Junhwa Hur, Seungryong Kim

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[523] arXiv:2512.20563 [pdf, ps, other]: Title: LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

Authors: Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, Kashyap Chitta

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[524] arXiv:2512.20561 [pdf, ps, other]: Title: FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Authors: Kaitong Cai, Jusheng Zhang, Jing Yang, Yijia Fan, Pengtao Xie, Jian Wang, Keze Wang

Comments: Under submission

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525] arXiv:2512.20557 [pdf, ps, other]: Title: Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Authors: Shengchao Zhou, Yuxin Chen, Yuying Ge, Wei Huang, Jiehong Lin, Ying Shan, Xiaojuan Qi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526] arXiv:2512.20556 [pdf, ps, other]: Title: Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios

Authors: Mingwei Tang, Jiahao Nie, Guang Yang, Ziqing Cui, Jie Li

Comments: Accepted to WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[527] arXiv:2512.20538 [pdf, ps, other]: Title: AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment

Authors: Anna Šárová Mikeštíková, Médéric Fourmy, Martin Cífka, Josef Sivic, Vladimir Petrik

Comments: 18 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[528] arXiv:2512.20531 [pdf, ps, other]: Title: SirenPose: Dynamic Scene Reconstruction via Geometric Supervision

Authors: Kaitong Cai, Jensen Zhang, Jing Yang, Keze Wang

Comments: Under submission

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[529] arXiv:2512.20501 [pdf, ps, other]: Title: Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition

Authors: Gorjan Radevski

Comments: Ph.D. manuscript; Supervisors/Mentors: Marie-Francine Moens and Tinne Tuytelaars

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[530] arXiv:2512.20487 [pdf, ps, other]: Title: Multi-temporal Adaptive Red-Green-Blue and Long-Wave Infrared Fusion for You Only Look Once-Based Landmine Detection from Unmanned Aerial Systems

Authors: James E. Gallagher, Edward J. Oughton, Jana Kosecka

Comments: 21 pages with 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[531] arXiv:2512.20479 [pdf, ps, other]: Title: UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images

Authors: Yiming Zhao, Yuanpeng Gao, Yuxuan Luo, Jiwei Duan, Shisong Lin, Longfei Xiong, Zhouhui Lian

Comments: 22 pages, 25 figures, SIGGRAPH Asia 2025, Conference Paper

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[532] arXiv:2512.20451 [pdf, ps, other]: Title: Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding

Authors: Anh Dao, Manh Tran, Yufei Zhang, Xiaoming Liu, Zijun Cui

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[533] arXiv:2512.20432 [pdf, ps, other]: Title: High Dimensional Data Decomposition for Anomaly Detection of Textured Images

Authors: Ji Song, Xing Wang, Jianguo Wu, Xiaowei Yue

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[534] arXiv:2512.20431 [pdf, ps, other]: Title: Skin Lesion Classification Using a Soft Voting Ensemble of Convolutional Neural Networks

Authors: Abdullah Al Shafi, Abdul Muntakim, Pintu Chandra Shill, Rowzatul Zannat, Abdullah Al-Amin

Comments: Authors' version of the paper published in proceedings of ECCE, DOI: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[535] arXiv:2512.20417 [pdf, ps, other]: Title: Chain-of-Anomaly Thoughts with Large Vision-Language Models

Authors: Pedro Domingos, João Pereira, Vasco Lopes, João Neves, David Semedo

Comments: 2 pages, 3 figures, 1 table. Accepted for RECPAD 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[536] arXiv:2512.20409 [pdf, ps, other]: Title: DETACH: Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning

Authors: Junho Yoon, Jaemo Jung, Hyunju Kim, Dongman Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[537] arXiv:2512.20377 [pdf, ps, other]: Title: SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images

Authors: Linfei Li, Lin Zhang, Zhong Wang, Ying Shen

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[538] arXiv:2512.20376 [pdf, ps, other]: Title: Linking Faces and Voices Across Languages: Insights from the FAME 2026 Challenge

Authors: Marta Moscati, Ahmed Abdullah, Muhammad Saad Saeed, Shah Nawaz, Rohan Kumar Das, Muhammad Zaigham Zaheer, Junaid Mir, Muhammad Haroon Yousaf, Khalid Mahmood Malik, Markus Schedl

Comments: Accepted at ICASSP 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[539] arXiv:2512.20362 [pdf, ps, other]: Title: CRAFT: Continuous Reasoning and Agentic Feedback Tuning for Multimodal Text-to-Image Generation

Authors: V. Kovalev, A. Kuvshinov, A. Buzovkin, D. Pokidov, D. Timonin

Comments: 37 pages, 42 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540] arXiv:2512.20340 [pdf, ps, other]: Title: The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection

Authors: Qingdong He, Xueqin Chen, Yanjie Pan, Peng Tang, Pengcheng Xu, Zhenye Gan, Chengjie Wang, Xiaobin Hu, Jiangning Zhang, Yabiao Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[541] arXiv:2512.20296 [pdf, ps, other]: Title: TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation

Authors: Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Joon Son Chung, Shinji Watanabe

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[542] arXiv:2512.20288 [pdf, ps, other]: Title: UbiQVision: Quantifying Uncertainty in XAI for Image Recognition

Authors: Akshat Dubey, Aleksandar Anžel, Bahar İlgen, Georges Hattab

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[543] arXiv:2512.20260 [pdf, ps, other]: Title: ${D}^{3}${ETOR}: ${D}$ebate-Enhanced Pseudo Labeling and Frequency-Aware Progressive ${D}$ebiasing for Weakly-Supervised Camouflaged Object ${D}$etection with Scribble Annotations

Authors: Jiawei Ge, Jiuxin Cao, Xinyi Li, Xuelin Zhu, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[544] arXiv:2512.20257 [pdf, ps, other]: Title: LADLE-MM: Limited Annotation based Detector with Learned Ensembles for Multimodal Misinformation

Authors: Daniele Cardullo, Simone Teglia, Irene Amerini

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545] arXiv:2512.20255 [pdf, ps, other]: Title: BiCoR-Seg: Bidirectional Co-Refinement Framework for High-Resolution Remote Sensing Image Segmentation

Authors: Jinghao Shi, Jianing Song

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[546] arXiv:2512.20251 [pdf, ps, other]: Title: Degradation-Aware Metric Prompting for Hyperspectral Image Restoration

Authors: Binfeng Wang, Di Wang, Haonan Guo, Ying Fu, Jing Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[547] arXiv:2512.20236 [pdf, ps, other]: Title: IndicDLP: A Foundational Dataset for Multi-Lingual and Multi-Domain Document Layout Parsing

Authors: Oikantik Nath, Sahithi Kukkala, Mitesh Khapra, Ravi Kiran Sarvadevabhatla

Comments: Accepted in ICDAR 2025 (Oral Presentation) - Best Student Paper Runner-Up Award

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[548] arXiv:2512.20217 [pdf, ps, other]: Title: LiteFusion: Taming 3D Object Detectors from Vision-Based to Multi-Modal with Minimal Adaptation

Authors: Xiangxuan Ren, Zhongdao Wang, Pin Tang, Guoqing Wang, Jilai Zheng, Chao Ma

Comments: 13 pages, 9 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[549] arXiv:2512.20213 [pdf, ps, other]: Title: JDPNet: A Network Based on Joint Degradation Processing for Underwater Image Enhancement

Authors: Tao Ye, Hongbin Ren, Chongbing Zhang, Haoran Chen, Xiaosong Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[550] arXiv:2512.20194 [pdf, ps, other]: Title: Generative Latent Coding for Ultra-Low Bitrate Image Compression

Authors: Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

Comments: Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[551] arXiv:2512.20174 [pdf, ps, other]: Title: Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark

Authors: Hao Guo, Xugong Qin, Jun Jie Ou Yang, Peng Zhang, Gangyan Zeng, Yubo Li, Hailun Lin

Comments: CVPR 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[552] arXiv:2512.20157 [pdf, ps, other]: Title: AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model

Authors: Sofian Chaybouti, Sanath Narayan, Yasser Dahou, Phúc H. Lê Khac, Ankit Singh, Ngoc Dung Huynh, Wamiq Reyaz Para, Hilde Kuehne, Hakim Hacid

Comments: 17 pages, 8 figures, 11 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[553] arXiv:2512.20153 [pdf, ps, other]: Title: CoDi -- an exemplar-conditioned diffusion model for low-shot counting

Authors: Grega Šuštar, Jer Pelhan, Alan Lukežič, Matej Kristan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[554] arXiv:2512.20148 [pdf, ps, other]: Title: Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)

Authors: Robert van de Ven, Trim Bresilla, Bram Nelissen, Ard Nieuwenhuizen, Eldert J. van Henten, Gert Kootstra

Comments: 33 pages, excluding appendices. 17 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[555] arXiv:2512.20128 [pdf, ps, other]: Title: milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion

Authors: Niraj Prakash Kini, Shiau-Rung Tsai, Guan-Hsun Lin, Wen-Hsiao Peng, Ching-Wen Ma, Jenq-Neng Hwang

Comments: Accepted at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[556] arXiv:2512.20120 [pdf, ps, other]: Title: HEART-VIT: Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer

Authors: Mohammad Helal Uddin, Liam Seymour, Sabur Baidya

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[557] arXiv:2512.20117 [pdf, ps, other]: Title: DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation

Authors: Jingqi Tian, Yiheng Du, Haoji Zhang, Yuji Wang, Isaac Ning Lee, Xulong Bai, Tianrui Zhu, Jingxuan Niu, Yansong Tang

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[558] arXiv:2512.20113 [src]: Title: Multi Modal Attention Networks with Uncertainty Quantification for Automated Concrete Bridge Deck Delamination Detection

Authors: Alireza Moayedikia, Sattar Dorafshan

Comments: the authors are going to substantially edit the paper

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[559] arXiv:2512.20107 [pdf, ps, other]: Title: UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis

Authors: Thanh-Tung Le, Tuan Pham, Tung Nguyen, Deying Kong, Xiaohui Xie, Stephan Mandt

Comments: Accepted to NeurIPS 2025. The first two authors contributed equally

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[560] arXiv:2512.20105 [pdf, ps, other]: Title: LiDARDraft: Generating LiDAR Point Cloud from Versatile Inputs

Authors: Haiyun Wei, Fan Lu, Yunwei Zhu, Zehan Zheng, Weiyi Xue, Lin Shao, Xudong Zhang, Ya Wu, Rong Fu, Guang Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[561] arXiv:2512.20104 [pdf, ps, other]: Title: Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models

Authors: Subrata Kumer Paul, Dewan Nafiul Islam Noor, Rakhi Rani Paul, Md. Ekramul Hamid, Fahmid Al Farid, Hezerul Abdul Karim, Md. Maruf Al Hossain Prince, Abu Saleh Musa Miah

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[562] arXiv:2512.20088 [pdf, ps, other]: Title: Item Region-based Style Classification Network (IRSN): A Fashion Style Classifier Based on Domain Knowledge of Fashion Experts

Authors: Jinyoung Choi, Youngchae Kwon, Injung Kim

Comments: This is a pre-print of an article published in Applied Intelligence. The final authenticated version is available online at: this https URL

Journal-ref: Applied Intelligence, Vol. 54, pp. 6197-6209 (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[563] arXiv:2512.20070 [pdf, ps, other]: Title: Progressive Learned Image Compression for Machine Perception

Authors: Jungwoo Kim, Jun-Hyuk Kim, Jong-Seok Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[564] arXiv:2512.20042 [pdf, ps, other]: Title: Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval

Authors: Nguyen Lam Phu Quy, Pham Phu Hoa, Tran Chi Nguyen, Dao Sy Duy Minh, Nguyen Hoang Minh Ngoc, Huynh Trung Kiet

Comments: 7 pages, 5 figures. System description for the EVENTA Grand Challenge (Track 1) at ACM MM'25

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[565] arXiv:2512.20033 [pdf, ps, other]: Title: FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs

Authors: Andreas Zinonos, Michał Stypułkowski, Antoni Bigata, Stavros Petridis, Maja Pantic, Nikita Drobyshev

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[566] arXiv:2512.20032 [pdf, ps, other]: Title: VALLR-Pin: Uncertainty-Factorized Visual Speech Recognition for Mandarin with Pinyin Guidance

Authors: Chang Sun, Dongliang Xie, Wanpeng Xie, Bo Qin, Hong Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[567] arXiv:2512.20029 [pdf, ps, other]: Title: $\text{H}^2$em: Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning

Authors: Lin Li, Jiahui Li, Jiaming Lei, Jun Xiao, Feifei Shao, Long Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[568] arXiv:2512.20026 [pdf, ps, other]: Title: MAPI-GNN: Multi-Activation Plane Interaction Graph Neural Network for Multimodal Medical Diagnosis

Authors: Ziwei Qin, Xuhui Song, Deqing Huang, Na Qin, Jun Li

Comments: Accepted by Proceedings of the AAAI Conference on Artificial Intelligence 40 (AAAI-26)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569] arXiv:2512.20025 [pdf, ps, other]: Title: A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments

Authors: Anthony Dontoh, Stephanie Ivey, Armstrong Aboah

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570] arXiv:2512.20013 [pdf, ps, other]: Title: SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images

Authors: Zepeng Xin, Kaiyu Li, Luodi Chen, Wanchen Li, Yuchen Xiao, Hui Qiao, Weizhan Zhang, Deyu Meng, Xiangyong Cao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571] arXiv:2512.20011 [pdf, ps, other]: Title: PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification

Authors: Blessing Agyei Kyem, Joshua Kofi Asamoah, Anthony Dontoh, Andrews Danyo, Eugene Denteh, Armstrong Aboah

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572] arXiv:2512.20000 [pdf, ps, other]: Title: Few-Shot-Based Modular Image-to-Video Adapter for Diffusion Models

Authors: Zhenhao Li, Shaohan Yi, Zheng Liu, Leonartinus Gao, Minh Ngoc Le, Ambrose Ling, Zhuoran Wang, Md Amirul Islam, Zhixiang Chi, Yuanhao Yu

Comments: GitHub page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[573] arXiv:2512.19990 [pdf, ps, other]: Title: A Dual-Branch Local-Global Framework for Cross-Resolution Land Cover Mapping

Authors: Peng Gao, Ke Li, Di Wang, Yongshan Zhu, Yiming Zhang, Xuemei Luo, Yifeng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[574] arXiv:2512.19989 [pdf, ps, other]: Title: A Novel CNN Gradient Boosting Ensemble for Guava Disease Detection

Authors: Tamim Ahasan Rijon, Yeasin Arafath

Comments: Accepted at IEEE ICCIT 2025. This is the author accepted manuscript

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[575] arXiv:2512.19982 [pdf, ps, other]: Title: WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification

Authors: Le Feng, Li Xiao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[576] arXiv:2512.19954 [pdf, ps, other]: Title: HistoWAS: A Pathomics Framework for Large-Scale Feature-Wide Association Studies of Tissue Topology and Patient Outcomes

Authors: Yuechen Yang, Junlin Guo, Yanfan Zhu, Jialin Yue, Junchao Zhu, Yu Wang, Shilin Zhao, Haichun Yang, Xingyi Guo, Jovan Tanevski, Laura Barisoni, Avi Z. Rosenberg, Yuankai Huo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577] arXiv:2512.19949 [pdf, ps, other]: Title: How Much 3D Do Video Foundation Models Encode?

Authors: Zixuan Huang, Xiang Li, Zhaoyang Lv, James M. Rehg

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[578] arXiv:2512.19943 [pdf, ps, other]: Title: SE360: Semantic Edit in 360$^\circ$ Panoramas via Hierarchical Data Construction

Authors: Haoyi Zhong, Fang-Lue Zhang, Andrew Chalmers, Taehyun Rhee

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[579] arXiv:2512.19941 [pdf, ps, other]: Title: Block-Recurrent Dynamics in Vision Transformers

Authors: Mozes Jacobs, Thomas Fel, Richard Hakim, Alessandra Brondetta, Demba Ba, T. Andy Keller

Comments: 25 pages, 15 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[580] arXiv:2512.19934 [pdf, ps, other]: Title: Vehicle-centric Perception via Multimodal Structured Pre-training

Authors: Wentao Wu, Xiao Wang, Chenglong Li, Jin Tang, Bin Luo

Comments: Journal extension of VehicleMAE (AAAI 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[581] arXiv:2512.19928 [pdf, ps, other]: Title: Unified Brain Surface and Volume Registration

Authors: S. Mazdak Abulnaga, Andrew Hoopes, Malte Hoffmann, Robin Magnet, Maks Ovsjanikov, Lilla Zöllei, John Guttag, Bruce Fischl, Adrian Dalca

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[582] arXiv:2512.19918 [pdf, ps, other]: Title: Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs

Authors: Houston H. Zhang, Tao Zhang, Baoze Lin, Yuanqi Xue, Yincheng Zhu, Huan Liu, Li Gu, Linfeng Ye, Ziqiang Wang, Xinxin Zuo, Yang Wang, Yuanhao Yu, Zhixiang Chi

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[583] arXiv:2512.19871 [pdf, ps, other]: Title: HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction

Authors: Jong Wook Kim, Wonseok Roh, Ha Dam Baek, Pilhyeon Lee, Jonghyun Choi, Sangpil Kim

Comments: 11 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584] arXiv:2512.19850 [pdf, ps, other]: Title: RANSAC Scoring Functions: Analysis and Reality Check

Authors: A. Shekhovtsov

Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
[585] arXiv:2512.19823 [pdf, ps, other]: Title: Learning to Refocus with Video Diffusion Models

Authors: SaiKiran Tedla, Zhoutong Zhang, Xuaner Zhang, Shumian Xin

Comments: Code and data are available at this https URL. SIGGRAPH Asia 2025, Dec. 2025

Journal-ref: Proceedings of the SIGGRAPH Asia 2025, pp. 1-11, 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586] arXiv:2512.19817 [pdf, ps, other]: Title: Generating the Past, Present and Future from a Motion-Blurred Image

Authors: SaiKiran Tedla, Kelly Zhu, Trevor Canham, Felix Taubner, Michael S. Brown, Kiriakos N. Kutulakos, David B. Lindell

Comments: Code and data are available at this https URL

Journal-ref: ACM Trans. Graph. (SIGGRAPH Asia 2025), vol. 44, no. 6, pp. 1-15, Dec. 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[587] arXiv:2512.19711 [pdf, ps, other]: Title: PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility

Authors: Md Nahid Hasan Shuvo, Moinul Hossain

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[588] arXiv:2512.20618 (cross-list from cs.AI) [pdf, ps, other]: Title: LongVideoAgent: Multi-Agent Reasoning with Long Videos

Authors: Runtao Liu, Ziyi Liu, Jiaqi Tang, Yue Ma, Renjie Pi, Jipeng Zhang, Qifeng Chen

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[589] arXiv:2512.20595 (cross-list from cs.CL) [pdf, ps, other]: Title: Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs

Authors: Dhruv Anand, Ehsan Shareghi

Comments: 27 pages, 5 figures, 9 tables. Cube available at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[590] arXiv:2512.20464 (cross-list from physics.optics) [pdf, ps, other]: Title: Snapshot 3D image projection using a diffractive decoder

Authors: Cagatay Isil, Alexander Chen, Yuhang Li, F. Onuralp Ardic, Shiqi Chen, Che-Yung Shen, Aydogan Ozcan

Comments: 22 Pages, 8 Figures

Subjects: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Applied Physics (physics.app-ph)
[591] arXiv:2512.20436 (cross-list from eess.IV) [pdf, ps, other]: Title: Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI

Authors: Muhammad Usman, Azka Rehman, Muhammad Mutti Ur Rehman, Abd Ur Rehman, Muhammad Umar Farooq

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[592] arXiv:2512.20420 (cross-list from cs.LG) [pdf, ps, other]: Title: Simplifying Multi-Task Architectures Through Task-Specific Normalization

Authors: Mihai Suteu, Ovidiu Serban

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[593] arXiv:2512.20387 (cross-list from cs.AI) [pdf, ps, other]: Title: Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems

Authors: YuChe Hsu, AnJui Wang, TsaiChing Ni, YuanFu Yang

Comments: 10 pages, 9 figures

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[594] arXiv:2512.20374 (cross-list from eess.IV) [pdf, ps, other]: Title: CLIP Based Region-Aware Feature Fusion for Automated BBPS Scoring in Colonoscopy Images

Authors: Yujia Fu, Zhiyu Dong, Tianwen Qian, Chenye Zheng, Danian Ji, Linhai Zhuo

Comments: 12 pages, 9 figures, BMVC 2025 submission

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[595] arXiv:2512.20350 (cross-list from cs.LG) [pdf, ps, other]: Title: Field-Space Attention for Structure-Preserving Earth System Transformers

Authors: Maximilian Witte, Johannes Meuer, Étienne Plésiat, Christopher Kadow

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Mathematical Physics (math-ph)
[596] arXiv:2512.20299 (cross-list from cs.RO) [pdf, ps, other]: Title: KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System

Authors: Zhongyu Xia, Wenhao Chen, Yongtao Wang, Ming-Hsuan Yang

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[597] arXiv:2512.20249 (cross-list from cs.LG) [pdf, ps, other]: Title: Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion

Authors: Xuanyu Hu

Comments: 15 pages, 2 figures, 4 tables. Submitted to ICPR 2026

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[598] arXiv:2512.20233 (cross-list from cs.LG) [pdf, ps, other]: Title: How I Met Your Bias: Investigating Bias Amplification in Diffusion Models

Authors: Nathan Roos, Ekaterina Iakovleva, Ani Gjergji, Vito Paolo Pastore, Enzo Tartaglione

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[599] arXiv:2512.20145 (cross-list from cs.CL) [pdf, ps, other]: Title: Retrieval-augmented Prompt Learning for Pre-trained Foundation Models

Authors: Xiang Chen, Yixin Ou, Quan Feng, Lei Li, Piji Li, Haibo Ye, Sheng-Jun Huang, Shuofei Qiao, Shumin Deng, Huajun Chen, Ningyu Zhang

Comments: IEEE/ACM Transactions on Audio, Speech and Language Processing

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[600] arXiv:2512.20129 (cross-list from cs.HC) [pdf, ps, other]: Title: Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs

Authors: Cyrus Vachha, Yixiao Kang, Zach Dive, Ashwat Chidambaram, Anik Gupta, Eunice Jun, Bjoern Hartmann

Comments: CHI 2025, Project page: this https URL

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[601] arXiv:2512.20056 (cross-list from cs.AI) [pdf, ps, other]: Title: Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach

Authors: Hao Li, Fabian Deuser, Wenping Yin, Steffen Knoblauch, Wufan Zhao, Filip Biljecki, Yong Xue, Wei Huang

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[602] arXiv:2512.19731 (cross-list from cs.LG) [pdf, ps, other]: Title: Exploring Deep-to-Shallow Transformable Neural Networks for Intelligent Embedded Systems

Authors: Xiangzhong Luo, Weichen Liu

Comments: Accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[603] arXiv:2512.18099 (cross-list from eess.AS) [pdf, ps, other]: Title: SAM Audio: Segment Anything in Audio

Authors: Bowen Shi, Andros Tjandra, John Hoffman, Helin Wang, Yi-Chiao Wu, Luya Gao, Julius Richter, Matt Le, Apoorv Vyas, Sanyuan Chen, Christoph Feichtenhofer, Piotr Dollár, Wei-Ning Hsu, Ann Lee

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV)


