Computer Vision and Pattern Recognition

Authors and titles for recent submissions

[ total of 778 entries: 1-778 ]

Mon, 8 Dec 2025

[1] arXiv:2512.05965 [pdf, ps, other]: Title: EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Authors: Hongyu Li, Manyuan Zhang, Dian Zheng, Ziyu Guo, Yimeng Jia, Kaituo Feng, Hao Yu, Yexin Liu, Yan Feng, Peng Pei, Xunliang Cai, Linjiang Huang, Hongsheng Li, Si Liu

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2512.05960 [pdf, ps, other]: Title: AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement

Authors: Munsif Ali, Najmul Hassan, Lucia Ventura, Davide Di Bari, Simonepietro Canese

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[3] arXiv:2512.05941 [pdf, ps, other]: Title: Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding

Authors: Zhiyuan Jiang, Shenghao Xie, Wenyi Li, Wenqiang Zu, Peihang Li, Jiahao Qiu, Siqi Pei, Lei Ma, Tiejun Huang, Mengdi Wang, Shilong Liu

Comments: Code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[4] arXiv:2512.05937 [pdf, ps, other]: Title: Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception

Authors: Anne Sielemann, Valentin Barner, Stefan Wolf, Masoud Roschani, Jens Ziehn, Juergen Beyerer

Comments: 8 pages, 2 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[5] arXiv:2512.05936 [pdf, ps, other]: Title: Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition

Authors: Anne Sielemann, Lena Loercher, Max-Lion Schumacher, Stefan Wolf, Masoud Roschani, Jens Ziehn

Comments: 8 pages, 8 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[6] arXiv:2512.05928 [pdf, ps, other]: Title: A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition

Authors: Pedro Vidal, Bernardo Biesseck, Luiz E. L. Coelho, Roger Granada, David Menotti

Comments: 18 pages, 17 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[7] arXiv:2512.05927 [pdf, ps, other]: Title: World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

Authors: Zhiting Mei, Tenny Yin, Micah Baker, Ola Shorinwa, Anirudha Majumdar

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[8] arXiv:2512.05922 [pdf, ps, other]: Title: LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation

Authors: Khang Le, Anh Mai Vu, Thi Kim Trang Vo, Ha Thach, Ngoc Bui Lam Quang, Thanh-Huy Nguyen, Minh H. N. Le, Zhu Han, Chandra Mohan, Hien Van Nguyen

Comments: Note: Khang Le and Anh Mai Vu contributed equally

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[9] arXiv:2512.05920 [pdf, ps, other]: Title: NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction

Authors: Jiawen Yang, Yihui Cao, Xuanyu Tian, Yuyao Zhang, Hongjiang Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[10] arXiv:2512.05905 [pdf, ps, other]: Title: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

Authors: Wenhao Yan, Sheng Ye, Zhuoyi Yang, Jiayan Teng, ZhenHui Dong, Kairui Wen, Xiaotao Gu, Yong-Jin Liu, Jie Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[11] arXiv:2512.05866 [pdf, ps, other]: Title: Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN Discriminator

Authors: Md. Mahbub Hasan Akash, Aria Tasnim Mridula, Sheekar Banerjee, Ishtiak Al Mamoon

Comments: This paper has been accepted for presentation at the IEEE 28th International Conference on Computer and Information Technology (ICCIT), December 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[12] arXiv:2512.05859 [pdf, ps, other]: Title: Edit-aware RAW Reconstruction

Authors: Abhijith Punnappurath, Luxi Zhao, Ke Zhao, Hue Nguyen, Radek Grzeszczuk, Michael S. Brown

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2512.05853 [pdf, ps, other]: Title: VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack

Authors: Shiji Zhao, Shukun Xiong, Yao Huang, Yan Jin, Zhenyu Wu, Jiyang Guan, Ranjie Duan, Jialing Tao, Hui Xue, Xingxing Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[14] arXiv:2512.05830 [pdf, ps, other]: Title: Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning

Authors: Muhammet Cagri Yeke, Samil Sirin, Kivilcim Yuksel, Abdurrahman Gumus

Comments: 22 pages, 11 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[15] arXiv:2512.05814 [pdf, ps, other]: Title: UG-FedDA: Uncertainty-Guided Federated Domain Adaptation for Multi-Center Alzheimer's Disease Detection

Authors: Fubao Zhu, Zhanyuan Jia, Zhiguo Wang, Huan Huang, Danyang Sun, Chuang Han, Yanting Li, Jiaofen Nan, Chen Zhao, Weihua Zhou

Comments: The code is already available on GitHub: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[16] arXiv:2512.05809 [pdf, ps, other]: Title: Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling

Authors: Saurav Jha, M. Jehanzeb Mirza, Wei Lin, Shiqi Yang, Sarath Chandar

Comments: Extended abstract at World Modeling Workshop 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[17] arXiv:2512.05802 [pdf, ps, other]: Title: Bring Your Dreams to Life: Continual Text-to-Video Customization

Authors: Jiahua Dong, Xudong Wang, Wenqi Liang, Zongyan Han, Meng Cao, Duzhen Zhang, Hanbin Zhao, Zhi Han, Salman Khan, Fahad Shahbaz Khan

Comments: Accepted to AAAI2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[18] arXiv:2512.05783 [pdf, ps, other]: Title: Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse Depth

Authors: Maryam Yousefi, Soodeh Bakhshandeh

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[19] arXiv:2512.05774 [pdf, ps, other]: Title: Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Authors: Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S. Ryoo, Juan Carlos Niebles

Comments: Website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[20] arXiv:2512.05762 [pdf, ps, other]: Title: FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural Operators

Authors: Ruochen Chen, Thuy Tran, Shaifali Parashar

Comments: Accepted for WACV

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[21] arXiv:2512.05759 [pdf, ps, other]: Title: Label-Efficient Point Cloud Segmentation with Active Learning

Authors: Johannes Meyer, Jasper Hoffmann, Felix Schulz, Dominik Merkle, Daniel Buescher, Alexander Reiterer, Joschka Boedecker, Wolfram Burgard

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[22] arXiv:2512.05754 [pdf, ps, other]: Title: USV: Unified Sparsification for Accelerating Video Diffusion Models

Authors: Xinjian Wu, Hongmei Wang, Yuan Zhou, Qinglin Lu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[23] arXiv:2512.05746 [pdf, ps, other]: Title: HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models

Authors: Shizhuo Mao, Hongtao Zou, Qihu Xie, Song Chen, Yi Kang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[24] arXiv:2512.05740 [pdf, ps, other]: Title: Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision

Authors: Lennart Maack, Julia-Kristin Graß, Lisa-Marie Toscha, Nathaniel Melling, Alexander Schlaefer

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2512.05710 [pdf, ps, other]: Title: Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature Learning

Authors: Jianan Sun, Dongzhihan Wang, Mingyu Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[26] arXiv:2512.05698 [pdf, ps, other]: Title: OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning

Authors: Xusheng Guo, Wanfa Zhang, Shijia Zhao, Qiming Xia, Xiaolong Xie, Mingming Wang, Hai Wu, Chenglu Wen

Comments: The 40th Annual AAAI Conference on Artificial Intelligence

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[27] arXiv:2512.05683 [pdf, ps, other]: Title: Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration Correction

Authors: Yong En Kok, Bowen Deng, Alexander Bentley, Andrew J. Parkes, Michael G. Somekh, Amanda J. Wright, Michael P. Pound

Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
[28] arXiv:2512.05674 [pdf, ps, other]: Title: Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume Maximization

Authors: Gargi Panda, Soumitra Kundu, Saumik Bhattacharya, Aurobinda Routray

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[29] arXiv:2512.05672 [pdf, ps, other]: Title: InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem

Authors: Yeobin Hong, Suhyeon Lee, Hyungjin Chung, Jong Chul Ye

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[30] arXiv:2512.05669 [pdf, ps, other]: Title: Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric Features

Authors: Talha Enes Koksal, Abdurrahman Gumus

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[31] arXiv:2512.05663 [pdf, ps, other]: Title: LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection

Authors: Johannes Meier, Jonathan Michel, Oussema Dhaouadi, Yung-Hsu Yang, Christoph Reich, Zuria Bauer, Stefan Roth, Marc Pollefeys, Jacques Kaiser, Daniel Cremers

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[32] arXiv:2512.05651 [pdf, ps, other]: Title: Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective

Authors: Nan Zhong, Mian Zou, Yiran Xu, Zhenxing Qian, Xinpeng Zhang, Baoyuan Wu, Kede Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[33] arXiv:2512.05635 [pdf, ps, other]: Title: Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data

Authors: Georgy Perevozchikov, Nancy Mehta, Egor Ershov, Radu Timofte

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[34] arXiv:2512.05613 [pdf, ps, other]: Title: DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model

Authors: Pasquale De Marinis, Pieter M. Blok, Uzay Kaymak, Rogier Brussee, Gennaro Vessio, Giovanna Castellano

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[35] arXiv:2512.05610 [pdf, ps, other]: Title: NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projections

Authors: Juho Korkeala, Jesse Muhojoki, Josef Taher, Klaara Salolahti, Matti Hyyppä, Antero Kukko, Juha Hyyppä

Comments: 19 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[36] arXiv:2512.05597 [pdf, ps, other]: Title: Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction

Authors: Ruihong Yin, Xuepeng Shi, Oleksandr Bailo, Marco Manfredi, Theo Gevers

Comments: 10 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[37] arXiv:2512.05593 [pdf, ps, other]: Title: Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer

Authors: Rong Wang, Wei Mao, Changsheng Lu, Hongdong Li

Comments: Accepted to 3DV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[38] arXiv:2512.05571 [pdf, ps, other]: Title: MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging

Authors: Xingyu Zhang, Anna Reithmeir, Fryderyk Kögl, Rickmer Braren, Julia A. Schnabel, Daniel M. Lang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[39] arXiv:2512.05564 [pdf, ps, other]: Title: ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Authors: Zijun Wang, Panwen Hu, Jing Wang, Terry Jingchen Zhang, Yuhao Cheng, Long Chen, Yiqiang Yan, Zutao Jiang, Hanhui Li, Xiaodan Liang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[40] arXiv:2512.05557 [pdf, ps, other]: Title: 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency

Authors: Xingxi Yin, Yicheng Li, Gong Yan, Chenglin Li, Jian Zhao, Cong Huang, Yue Deng, Yin Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[41] arXiv:2512.05546 [pdf, ps, other]: Title: Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models

Authors: Weijue Bu, Guan Yuan, Guixian Zhang

Comments: 6 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[42] arXiv:2512.05539 [pdf, ps, other]: Title: Ideal Observer for Segmentation of Dead Leaves Images

Authors: Swantje Mahncke, Malte Ott

Comments: 41 pages, 16 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Methodology (stat.ME)
[43] arXiv:2512.05529 [pdf, ps, other]: Title: See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors

Authors: Kunyi Yang, Qingyu Wang, Cheng Yuan, Yutong Ban

Comments: The first two authors contributed equally

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[44] arXiv:2512.05524 [pdf, ps, other]: Title: VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation

Authors: Chinthani Sugandhika, Chen Li, Deepu Rajan, Basura Fernando

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[45] arXiv:2512.05515 [pdf, ps, other]: Title: DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis

Authors: Yuhua Wen, Qifei Li, Yingying Zhou, Yingming Gao, Zhengqi Wen, Jianhua Tao, Ya Li

Comments: Accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[46] arXiv:2512.05513 [pdf, ps, other]: Title: Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning

Authors: Chinthani Sugandhika, Chen Li, Deepu Rajan, Basura Fernando

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[47] arXiv:2512.05511 [pdf, ps, other]: Title: Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm

Authors: Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Yaokun Li, Xiujun Shu, Yuanhao Feng, Bo Wang, Yimian Dai, Xiangyu Yue

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[48] arXiv:2512.05494 [pdf, ps, other]: Title: Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation

Authors: Fan Zhang, Zhiwei Gu, Hua Wang

Comments: Accepted to AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[49] arXiv:2512.05492 [pdf, ps, other]: Title: WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field

Authors: Qi Zhu, Jingyi Zhang, Naishan Zheng, Wei Yu, Jinghao Zhang, Deyi Ji, Feng Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[50] arXiv:2512.05482 [pdf, ps, other]: Title: Concept-based Explainable Data Mining with VLM for 3D Detection

Authors: Mai Tsujimoto

Comments: 28 pages including appendix. Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[51] arXiv:2512.05481 [pdf, ps, other]: Title: UniFS: Unified Multi-Contrast MRI Reconstruction via Frequency-Spatial Fusion

Authors: Jialin Li, Yiwei Ren, Kai Pan, Dong Wei, Pujin Cheng, Xian Wu, Xiaoying Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[52] arXiv:2512.05478 [pdf, ps, other]: Title: EmoStyle: Emotion-Driven Image Stylization

Authors: Jingyuan Yang, Zihuan Bai, Hui Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[53] arXiv:2512.05468 [pdf, ps, other]: Title: University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor system

Authors: Takara Taniguchi, Yudai Ueda, Atsuya Muramatsu, Kohki Hashimoto, Ryo Yagi, Hideya Ochiai, Chaodit Aswakul

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[54] arXiv:2512.05446 [pdf, ps, other]: Title: TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression

Authors: Cheng-Yuan Ho, He-Bi Yang, Jui-Chiu Chiang, Yu-Lun Liu, Wen-Hsiao Peng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[55] arXiv:2512.05422 [pdf, ps, other]: Title: ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction

Authors: Jiangtong Tan, Lin Liu, Jie Huang, Xiaopeng Zhang, Qi Tian, Feng Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[56] arXiv:2512.05418 [pdf, ps, other]: Title: Performance Evaluation of Deep Learning for Tree Branch Segmentation in Autonomous Forestry Systems

Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[57] arXiv:2512.05415 [pdf, ps, other]: Title: Moving object detection from multi-depth images with an attention-enhanced CNN

Authors: Masato Shibukawa, Fumi Yoshida, Toshifumi Yanagisawa, Takashi Ito, Hirohisa Kurosaki, Makoto Yoshikawa, Kohki Kamiya, Ji-an Jiang, Wesley Fraser, JJ Kavelaars, Susan Benecchi, Anne Verbiscer, Akira Hatakeyama, Hosei O, Naoya Ozaki

Comments: 14 pages, 22 figures, submitted to PASJ

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[58] arXiv:2512.05412 [pdf, ps, other]: Title: YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications

Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[59] arXiv:2512.05410 [pdf, ps, other]: Title: Genetic Algorithms For Parameter Optimization for Disparity Map Generation of Radiata Pine Branch Images

Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[60] arXiv:2512.05398 [pdf, ps, other]: Title: The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos

Authors: Zhuoyuan Wu, Xurui Yang, Jiahui Huang, Yue Wang, Jun Gao

Comments: Code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[61] arXiv:2512.05394 [pdf, ps, other]: Title: Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability

Authors: Shizhan Liu, Xinran Deng, Zhuoyi Yang, Jiayan Teng, Xiaotao Gu, Jie Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[62] arXiv:2512.05391 [pdf, ps, other]: Title: LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models

Authors: Qingqiao Hu, Weimin Lyu, Meilong Xu, Kehan Qi, Xiaoling Hu, Saumya Gupta, Jiawei Zhou, Chao Chen

Comments: 20 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[63] arXiv:2512.05385 [pdf, ps, other]: Title: ShaRP: SHAllow-LayeR Pruning for Video Large Language Models Acceleration

Authors: Yingjie Xia, Tao Liu, Jinglei Shi, Qingsong Xie, Heng Guo, Jian Yang, Xi Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[64] arXiv:2512.05362 [pdf, ps, other]: Title: PoolNet: Deep Learning for 2D to 3D Video Process Validation

Authors: Sanchit Kaul, Joseph Luna, Shray Arora

Comments: All code related to this paper can be found at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[65] arXiv:2512.05359 [pdf, ps, other]: Title: Group Orthogonal Low-Rank Adaptation for RGB-T Tracking

Authors: Zekai Shao, Yufan Hu, Jingyuan Liu, Bin Fan, Hongmin Liu

Comments: 13 pages, 8 figures. Accepted by AAAI 2026. Extended version

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[66] arXiv:2512.05354 [pdf, ps, other]: Title: SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training

Authors: Yang Zheng, Hao Tan, Kai Zhang, Peng Wang, Leonidas Guibas, Gordon Wetzstein, Wang Yifan

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[67] arXiv:2512.05343 [pdf, ps, other]: Title: SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Authors: Elisabetta Fedele, Francis Engelmann, Ian Huang, Or Litany, Marc Pollefeys, Leonidas Guibas

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[68] arXiv:2512.05277 [pdf, ps, other]: Title: From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

Authors: Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain, Ahmad Rezaei, Mohsen Gholami, Alireza Heidarikhazaei, Zhou Weimin, Yong Zhang, Mohammad Akbari

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[69] arXiv:2512.05272 [pdf, ps, other]: Title: Inferring Compositional 4D Scenes without Ever Seeing One

Authors: Ahmet Berke Gokmen, Ajad Chhatkuli, Luc Van Gool, Danda Pani Paudel

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[70] arXiv:2512.05268 [pdf, ps, other]: Title: CARD: Correlation Aware Restoration with Diffusion

Authors: Niki Nezakati, Arnab Ghosh, Amit Roy-Chowdhury, Vishwanath Saragadam

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[71] arXiv:2512.05259 [pdf, ps, other]: Title: Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data Anonymization

Authors: Georgios Chatzichristodoulou, Niki Efthymiou, Panagiotis Filntisis, Georgios Pavlakos, Petros Maragos

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[72] arXiv:2512.05240 [pdf, ps, other]: Title: IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction

Authors: Dmitrii Torbunov, Onur Okuducu, Yi Huang, Odera Dim, Rebecca Coles, Yonggang Cui, Yihui Ren

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[73] arXiv:2512.05209 [pdf, ps, other]: Title: DEAR: Dataset for Evaluating the Aesthetics of Rendering

Authors: Vsevolod Plohotnuk, Artyom Panshin, Nikola Banić, Simone Bianco, Michael Freeman, Egor Ershov

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[74] arXiv:2512.05198 [pdf, ps, other]: Title: Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models

Authors: Rowan Bradbury, Dazhi Zhong

Comments: 16 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[75] arXiv:2512.05172 [pdf, ps, other]: Title: Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning

Authors: Wentao Wang, Chunyang Liu, Kehua Sheng, Bo Zhang, Yan Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[76] arXiv:2512.05152 [pdf, ps, other]: Title: EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models

Authors: Kun Wang, Donglin Di, Tonghua Su, Lei Fan

Comments: 6 pages, 5 figures, published at the 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[77] arXiv:2512.05150 [pdf, ps, other]: Title: TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Authors: Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin

Comments: arxiv v0

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[78] arXiv:2512.05145 [pdf, ps, other]: Title: Self-Improving VLM Judges Without Human Annotations

Authors: Inna Wanyin Lin, Yushi Hu, Shuyue Stella Li, Scott Geng, Pang Wei Koh, Luke Zettlemoyer, Tim Althoff, Marjan Ghazvininejad

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[79] arXiv:2512.05140 [pdf, other]: Title: FlowEO: Generative Unsupervised Domain Adaptation for Earth Observation

Authors: Georges Le Bellier (CEDRIC - VERTIGO, Cnam), Nicolas Audebert (LaSTIG, IGN, CEDRIC - VERTIGO)

Comments: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Mar 2026, Tucson (AZ), United States

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[80] arXiv:2512.05139 [pdf, ps, other]: Title: Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative Models

Authors: Yang Xiang, Jingwen Zhong, Yige Yan, Petros Koutrakis, Eric Garshick, Meredith Franklin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[81] arXiv:2512.05137 [pdf, ps, other]: Title: ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images

Authors: Yunfei Zhang, Yizhuo He, Yuanxun Shao, Zhengtao Yao, Haoyan Xu, Junhao Dong, Zhen Yao, Zhikang Dong

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[82] arXiv:2512.05136 [pdf, ps, other]: Title: Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes

Authors: Yujie Xiao, Gongzhen Tang, Deyun Zhang, Jun Li, Guangkun Nie, Haoyu Wang, Shun Huang, Tong Liu, Qinghao Zhao, Kangyin Chen, Shenda Hong

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[83] arXiv:2512.05134 [pdf, ps, other]: Title: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

Authors: Zihao Wu

Comments: 8 pages main, 8 pages appendix, 16 figures, 5 tables. Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[84] arXiv:2512.05132 [pdf, ps, other]: Title: Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training

Authors: Wenshuo Wang, Fan Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[85] arXiv:2512.05131 [pdf, ps, other]: Title: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance

Authors: Tianling Xu, Shengzhe Gan, Leslie Gu, Yuelei Li, Fangneng Zhan, Hanspeter Pfister

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[86] arXiv:2512.05959 (cross-list from cs.CL) [pdf, ps, other]: Title: M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

Authors: David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata

Comments: Preprint

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[87] arXiv:2512.05955 (cross-list from cs.RO) [pdf, ps, other]: Title: SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

Authors: Haowen Liu, Shaoxiong Yao, Haonan Chen, Jiawei Gao, Jiayuan Mao, Jia-Bin Huang, Yilun Du

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[88] arXiv:2512.05932 (cross-list from cs.RO) [pdf, ps, other]: Title: Physically-Based Simulation of Automotive LiDAR

Authors: L. Dudzik, M. Roschani, A. Sielemann, K. Trampert, J. Ziehn, J. Beyerer, C. Neumann

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[89] arXiv:2512.05824 (cross-list from cs.AI) [pdf, ps, other]: Title: Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade Glioma

Authors: Hafsa Akebli (1), Adam Shephard (2), Vincenzo Della Mea (1), Nasir Rajpoot (2 and 3) ((1) University of Udine, Udine, Italy, (2) University of Warwick, Coventry, UK, (3) Histofy Ltd, Coventry, UK)

Comments: 4 pages, 2 figures

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[90] arXiv:2512.05812 (cross-list from cs.RO) [pdf, ps, other]: Title: Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

Authors: Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann, Christoph Stiller

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[91] arXiv:2512.05665 (cross-list from cs.CL) [pdf, ps, other]: Title: Interleaved Latent Visual Reasoning with Selective Perceptual Modeling

Authors: Shuai Dong, Siyuan Wang, Xingyu Liu, Zhongyu Wei

Comments: 11 pages, 6 figures. Code available at this https URL

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[92] arXiv:2512.05438 (cross-list from cs.HC) [pdf, ps, other]: Title: EXR: An Interactive Immersive EHR Visualization in Extended Reality

Authors: Benoit Marteau, Shaun Q. Y. Tan, Jieru Li, Andrew Hornback, Yishan Zhong, Shaunna Wang, Christian Lowson, Jason Woloff, Joshua M. Pahys, Steven W. Hwang, Coleman Hilton, May D. Wang

Comments: 11 pages, 6 figures. Preprint version. This paper has been accepted to IEEE ICIR 2025. This is the author-prepared version and not the final published version. The final version will appear in IEEE Xplo

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[93] arXiv:2512.05299 (cross-list from eess.SY) [pdf, ps, other]: Title: ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety

Authors: Ahmad Yehia, Jiseop Byeon, Tianyi Wang, Huihai Wang, Yiming Xu, Junfeng Jiao, Christian Claudel

Comments: 8 pages, 3 figures, 1 table

Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Robotics (cs.RO); Image and Video Processing (eess.IV)
[94] arXiv:2512.05126 (cross-list from eess.AS) [pdf, ps, other]: Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model

Authors: Kaidi Wang, Yi He, Wenhao Guan, Weijie Wu, Hongwu Ding, Xiong Zhang, Di Wu, Meng Meng, Jian Luan, Lin Li, Qingyang Hong

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Fri, 5 Dec 2025

[95] arXiv:2512.05115 [pdf, ps, other]: Title: Light-X: Generative 4D Video Rendering with Camera and Illumination Control

Authors: Tianqi Liu, Zhaoxi Chen, Zihao Huang, Shaocong Xu, Saining Zhang, Chongjie Ye, Bohan Li, Zhiguo Cao, Wei Li, Hao Zhao, Ziwei Liu

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[96] arXiv:2512.05113 [pdf, ps, other]: Title: Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

Authors: Hao-Jen Chien, Yi-Chuan Huang, Chung-Ho Wu, Wei-Lun Chao, Yu-Lun Liu

Comments: WACV 2025. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[97] arXiv:2512.05112 [pdf, ps, other]: Title: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

Authors: Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng Li

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[98] arXiv:2512.05111 [pdf, ps, other]: Title: ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Authors: Shengyuan Ding, Xinyu Fang, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiangyu Zhao, Haodong Duan, Xiaoyi Dong, Jianze Liang, Bin Wang, Conghui He, Dahua Lin, Jiaqi Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[99] arXiv:2512.05110 [pdf, ps, other]: Title: ShadowDraw: From Any Object to Shadow-Drawing Compositional Art

Authors: Rundong Luo, Noah Snavely, Wei-Chiu Ma

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[100] arXiv:2512.05106 [pdf, ps, other]: Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Authors: Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
[101] arXiv:2512.05104 [pdf, ps, other]: Title: EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation

Authors: Jiaqi Ma, Shengkai Hu, Jun Wan, Jiaxing Huang, Lefei Zhang, Salman Khan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[102] arXiv:2512.05098 [pdf, ps, other]: Title: SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards

Authors: Yuan Gao, Jin Song

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[103] arXiv:2512.05091 [pdf, ps, other]: Title: Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark

Authors: Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang, Xueqing Deng, Henghui Ding, Lu Qi, Anran Wang, Xiangtai Li, Ming-Hsuan Yang

Comments: Technical Report; Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[104] arXiv:2512.05081 [pdf, ps, other]: Title: Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Authors: Jung Yi, Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, Seungryong Kim

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[105] arXiv:2512.05079 [pdf, ps, other]: Title: Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

Authors: Minghan Zhu, Zhiyi Wang, Qihang Sun, Maani Ghaffari, Michael Posa

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[106] arXiv:2512.05076 [pdf, ps, other]: Title: BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

Authors: Yiming Wang, Qihang Zhang, Shengqu Cai, Tong Wu, Jan Ackermann, Zhengfei Kuang, Yang Zheng, Frano Rajič, Siyu Tang, Gordon Wetzstein

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[107] arXiv:2512.05060 [pdf, ps, other]: Title: 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

Authors: Xianfeng Wu, Yajing Bai, Minghan Li, Xianzu Wu, Xueqi Zhao, Zhongyuan Lai, Wenyu Liu, Xinggang Wang

Comments: Code: this https URL, Webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[108] arXiv:2512.05044 [pdf, ps, other]: Title: Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Authors: Yanran Zhang, Ziyi Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Comments: 18 Pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[109] arXiv:2512.05039 [pdf, ps, other]: Title: Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding

Authors: Abhigyan Bhattacharya, Hiranmoy Roy

Comments: Submitted for review CVPR-2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[110] arXiv:2512.05025 [pdf, ps, other]: Title: RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation

Authors: Nicolas Houdré, Diego Marcos, Hugo Riffaud de Turckheim, Dino Ienco, Laurent Wendling, Camille Kurtz, Sylvain Lobry

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[111] arXiv:2512.05021 [pdf, ps, other]: Title: HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition

Authors: Pham Thach Thanh Truc, Dang Hoai Nam, Huynh Tong Dang Khoa, Vo Nguyen Le Duy

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[112] arXiv:2512.05016 [pdf, ps, other]: Title: Generative Neural Video Compression via Video Diffusion Prior

Authors: Qi Mao, Hao Cheng, Tinghan Yang, Libiao Jin, Siwei Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[113] arXiv:2512.05006 [pdf, ps, other]: Title: Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects

Authors: Xianghui Fan, Zhaoyu Chen, Mengyang Pan, Anping Deng, Hang Yang

Comments: conference

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[114] arXiv:2512.05000 [pdf, ps, other]: Title: Reflection Removal through Efficient Adaptation of Diffusion Transformers

Authors: Daniyar Zakarin, Thiemo Wandel, Anton Obukhov, Dengxin Dai

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[115] arXiv:2512.04996 [pdf, ps, other]: Title: A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs

Authors: Qiong Chang, Weimin Wang, Junpei Zhong, Jun Miyazaki

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[116] arXiv:2512.04981 [pdf, ps, other]: Title: Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

Authors: NaHyeon Park, Namin An, Kunhee Kim, Soyeon Yoon, Jiahao Huo, Hyunjung Shim

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[117] arXiv:2512.04970 [pdf, ps, other]: Title: Stable Single-Pixel Contrastive Learning for Semantic and Geometric Tasks

Authors: Leonid Pogorelyuk, Niels Bracher, Aaron Verkleeren, Lars Kühmichel, Stefan T. Radev

Comments: UniReps Workshop 2025, 12 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[118] arXiv:2512.04969 [pdf, ps, other]: Title: Rethinking the Use of Vision Transformers for AI-Generated Image Detection

Authors: NaHyeon Park, Kunhee Kim, Junsuk Choe, Hyunjung Shim

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[119] arXiv:2512.04967 [pdf, ps, other]: Title: Balanced Few-Shot Episodic Learning for Accurate Retinal Disease Diagnosis

Authors: Jasmaine Khale, Ravi Prakash Srivastava

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[120] arXiv:2512.04963 [pdf, ps, other]: Title: GeoPE: A Unified Geometric Positional Embedding for Structured Tensors

Authors: Yupu Yao, Bowen Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[121] arXiv:2512.04952 [pdf, ps, other]: Title: FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization

Authors: Yicheng Liu, Shiduo Zhang, Zibin Dong, Baijun Ye, Tianyuan Yuan, Xiaopeng Yu, Linqi Yin, Chenhao Lu, Junhao Shi, Luca Jiang-Tao Yu, Liangtao Zheng, Tao Jiang, Jingjing Gong, Xipeng Qiu, Hang Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[122] arXiv:2512.04943 [pdf, ps, other]: Title: Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition

Authors: Novanto Yudistira

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[123] arXiv:2512.04939 [pdf, ps, other]: Title: LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging

Authors: Zhijian Shu, Cheng Lin, Tao Xie, Wei Yin, Ben Li, Zhiyuan Pu, Weize Li, Yao Yao, Xun Cao, Xiaoyang Guo, Xiao-Xiao Long

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[124] arXiv:2512.04927 [pdf, ps, other]: Title: Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral Fitting

Authors: Paul Henderson

Comments: Accepted at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[125] arXiv:2512.04926 [pdf, ps, other]: Title: Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Authors: Yueming Pan, Ruoyu Feng, Qi Dai, Yuqi Wang, Wenfeng Lin, Mingyu Guo, Chong Luo, Nanning Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[126] arXiv:2512.04904 [pdf, ps, other]: Title: ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching

Authors: Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[127] arXiv:2512.04890 [pdf, ps, other]: Title: Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRI

Authors: Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Polina Golland, Benjamin Billot

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[128] arXiv:2512.04888 [pdf, ps, other]: Title: You Only Train Once (YOTO): A Retraining-Free Object Detection Framework

Authors: Priyanto Hidayatullah, Nurjannah Syakrani, Yudi Widhiyasana, Muhammad Rizqi Sholahuddin, Refdinal Tubagus, Zahri Al Adzani Hidayat, Hanri Fajar Ramadhan, Dafa Alfarizki Pratama, Farhan Muhammad Yasin

Comments: This manuscript was first submitted to Engineering (an Elsevier journal). The preprint was posted to arXiv afterwards to facilitate open access and community feedback

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[129] arXiv:2512.04883 [pdf, ps, other]: Title: SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded Platforms

Authors: Jiawen Wen, Yu Hu, Suixuan Qiu, Jinshan Huang, Xiaowen Chu

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[130] arXiv:2512.04875 [pdf, ps, other]: Title: SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection

Authors: Qing Xu, Yanqian Wang, Xiangjian Hea, Yue Li, Yixuan Zhang, Rong Qu, Wenting Duan, Zhen Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[131] arXiv:2512.04862 [pdf, ps, other]: Title: Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing

Authors: Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini, Jan Ulrich Bartels, Katherine J. Kuchenbecker, Michael J. Black

Comments: * Equal contribution. Minor figure corrections compared to the ICCV 2025 version

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[132] arXiv:2512.04857 [pdf, ps, other]: Title: Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens

Authors: Ziran Qin, Youru Lv, Mingbao Lin, Zeren Zhang, Chanfan Gan, Tieyuan Chen, Weiyao Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[133] arXiv:2512.04837 [pdf, ps, other]: Title: A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World

Authors: Jikang Cheng, Renye Yan, Zhiyuan Yan, Yaozhong Gan, Xueyi Zhang, Zhongyuan Wang, Wei Peng, Ling Liang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[134] arXiv:2512.04832 [pdf, ps, other]: Title: Tokenizing Buildings: A Transformer for Layout Synthesis

Authors: Manuel Ladron de Guevara, Jinmo Rhee, Ardavan Bidgoli, Vaidas Razgaitis, Michael Bergin

Comments: 8 pages, 1 page References, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[135] arXiv:2512.04830 [pdf, ps, other]: Title: FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis

Authors: Shijie Chen, Peixi Peng

Comments: Novel View Synthesis, Driving Scene, Free Trajectory, Image Generation

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[136] arXiv:2512.04821 [pdf, ps, other]: Title: LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation

Authors: Huynh Trinh Ngoc, Hoang Anh Nguyen Kim, Toan Nguyen Hai, Long Tran Quoc

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[137] arXiv:2512.04815 [pdf, ps, other]: Title: RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS

Authors: Chuanyu Fu, Guanying Chen, Yuqi Zhang, Kunbin Yao, Yuan Xiong, Chuan Huang, Shuguang Cui, Yasuyuki Matsushita, Xiaochun Cao

Comments: arXiv admin note: substantial text overlap with arXiv:2506.02751

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[138] arXiv:2512.04810 [pdf, ps, other]: Title: EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Authors: Xin He, Longhui Wei, Jianbo Ouyang, Lingxi Xie, Qi Tian

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[139] arXiv:2512.04786 [pdf, ps, other]: Title: LaFiTe: A Generative Latent Field for 3D Native Texturing

Authors: Chia-Hao Chen, Zi-Xin Zou, Yan-Pei Cao, Ze Yuan, Guan Luo, Xiaojuan Qi, Ding Liang, Song-Hai Zhang, Yuan-Chen Guo

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[140] arXiv:2512.04784 [pdf, ps, other]: Title: PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Authors: Bowen Ping, Chengyou Jia, Minnan Luo, Changliang Xia, Xin Shen, Zhuohang Dang, Hangwei Qian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[141] arXiv:2512.04761 [pdf, ps, other]: Title: Order Matters: 3D Shape Generation from Sequential VR Sketches

Authors: Yizi Chen, Sidi Wu, Tianyi Xiao, Nina Wiedemann, Loic Landrieu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[142] arXiv:2512.04734 [pdf, ps, other]: Title: MT-Depth: Multi-task Instance feature analysis for the Depth Completion

Authors: Abdul Haseeb Nizamani, Dandi Zhou, Xinhai Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[143] arXiv:2512.04733 [pdf, ps, other]: Title: E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving

Authors: Yihong Tang, Haicheng Liao, Tong Nie, Junlin He, Ao Qu, Kehua Chen, Wei Ma, Zhenning Li, Lijun Sun, Chengzhong Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[144] arXiv:2512.04728 [pdf, ps, other]: Title: Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild

Authors: Yigui Feng, Qinglin Wang, Haotian Mo, Yang Liu, Ke Liu, Gencheng Liu, Xinhai Chen, Siqi Shen, Songzhu Mei, Jie Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[145] arXiv:2512.04699 [pdf, ps, other]: Title: OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution

Authors: Xinning Chai, Zhengxue Cheng, Yuhong Zhang, Hengsheng Zhang, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song

Comments: Accepted by TCSVT, 15 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[146] arXiv:2512.04686 [pdf, ps, other]: Title: Towards Cross-View Point Correspondence in Vision-Language Models

Authors: Yipu Wang, Yuheng Ji, Yuyang Liu, Enshen Zhou, Ziqiang Yang, Yuxuan Tian, Ziheng Qin, Yue Liu, Huajie Tan, Cheng Chi, Zhiyuan Ma, Daniel Dajun Zeng, Xiaolong Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[147] arXiv:2512.04678 [pdf, ps, other]: Title: Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Authors: Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, Yujun Shen, Min Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[148] arXiv:2512.04677 [pdf, ps, other]: Title: Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Authors: Yubo Huang, Hailong Guo, Fangtai Wu, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, Steven Hoi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[149] arXiv:2512.04660 [pdf, ps, other]: Title: I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models

Authors: Juntong Wang, Jiarui Wang, Huiyu Duan, Jiaxiang Kang, Guangtao Zhai, Xiongkuo Min

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[150] arXiv:2512.04643 [pdf, ps, other]: Title: SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding

Authors: Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, Hung-Kai Chung, Kuei-Chun Wang, Yu-Chiang Frank Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[151] arXiv:2512.04619 [pdf, ps, other]: Title: Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence

Authors: Tianyu Yuan, Yuanbo Yang, Lin-Zhuo Chen, Yao Yao, Zhuzhong Qian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[152] arXiv:2512.04599 [pdf, ps, other]: Title: Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot

Authors: Sheng Hang, Chaoxiang He, Hongsheng Hu, Hanqing Hu, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[153] arXiv:2512.04597 [pdf, ps, other]: Title: When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering

Authors: Tao Wu, Chuhao Zhou, Guangyu Zhao, Haozhi Cao, Yewen Pu, Jianfei Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[154] arXiv:2512.04585 [pdf, ps, other]: Title: SAM3-I: Segment Anything with Instructions

Authors: Jingjing Li, Yue Feng, Yuchen Guo, Jincai Huang, Yongri Piao, Qi Bi, Miao Zhang, Xiaoqi Zhao, Qiang Chen, Shihao Zou, Wei Ji, Huchuan Lu, Li Cheng

Comments: Preliminary results; work in progress

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[155] arXiv:2512.04581 [pdf, ps, other]: Title: Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge Distillation

Authors: Houzhang Fang, Chenxing Wu, Kun Bai, Tianqi Chen, Xiaolin Wang, Xiyang Liu, Yi Chang, Luxin Yan

Comments: Accepted by IEEE TMM

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[156] arXiv:2512.04576 [pdf, ps, other]: Title: TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification

Authors: Zishuo Wan, Qinqin Kang, Yi Huang, Yun Bian, Dawei Ding, Ke Yan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[157] arXiv:2512.04568 [pdf, ps, other]: Title: Prompt2Craft: Generating Functional Craft Assemblies with LLMs

Authors: Vitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe, Yukiyasu Domae, Weiwei Wan, Kensuke Harada

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[158] arXiv:2512.04564 [pdf, ps, other]: Title: Dataset creation for supervised deep learning-based analysis of microscopic images -- review of important considerations and recommendations

Authors: Christof A. Bertram, Viktoria Weiss, Jonas Ammeling, F. Maria Schabel, Taryn A. Donovan, Frauke Wilm, Christian Marzahl, Katharina Breininger, Marc Aubreville

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[159] arXiv:2512.04563 [pdf, ps, other]: Title: COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Authors: Zefeng Zhang, Xiangzhao Hao, Hengzhu Tang, Zhenyu Zhang, Jiawei Sheng, Xiaodong Li, Zhenyang Li, Li Gao, Daiting Shi, Dawei Yin, Tingwen Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[160] arXiv:2512.04554 [pdf, ps, other]: Title: Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering

Authors: Marco Pintore, Maura Pintor, Dimosthenis Karatzas, Battista Biggio

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[161] arXiv:2512.04542 [pdf, ps, other]: Title: Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian Optimization

Authors: Hong Kuang, Jianchen Liu

Comments: 28 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[162] arXiv:2512.04540 [pdf, ps, other]: Title: VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management

Authors: Hongbo Jin, Qingyuan Wang, Wenhao Zhang, Yang Liu, Sijie Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[163] arXiv:2512.04537 [pdf, ps, other]: Title: X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale

Authors: Pei Yang, Hai Ci, Yiren Song, Mike Zheng Shou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[164] arXiv:2512.04536 [pdf, ps, other]: Title: Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model

Authors: Bita Baroutian, Atefe Aghaei, Mohsen Ebrahimi Moghaddam

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[165] arXiv:2512.04534 [pdf, ps, other]: Title: Refaçade: Editing Object with Given Reference Texture

Authors: Youze Huang (1), Penghui Ruan (2), Bojia Zi (3), Xianbiao Qi (4), Jianan Wang (5), Rong Xiao (4) ((1) University of Electronic Science and Technology of China, (2) The Hong Kong Polytechnic University, (3) The Chinese University of Hong Kong, (4) IntelliFusion Inc., (5) Astribot Inc.)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[166] arXiv:2512.04532 [pdf, ps, other]: Title: PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement

Authors: Yu-Wei Zhan, Xin Wang, Hong Chen, Tongtong Feng, Wei Feng, Ren Wang, Guangyao Li, Qing Li, Wenwu Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[167] arXiv:2512.04528 [pdf, ps, other]: Title: Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty Quantification

Authors: Chentao Shen, Sizhe Zheng, Bingqian Wu, Yaohua Feng, Yuanchen Fei, Mingyu Mei, Hanwen Jiang, Xiangru Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[168] arXiv:2512.04522 [pdf, ps, other]: Title: Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification

Authors: Guoqing Zhang, Zhun Wang, Hairui Wang, Zhonglin Ye, Yuhui Zheng

Comments: 14 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[169] arXiv:2512.04521 [pdf, ps, other]: Title: WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism

Authors: Ruijing Liu, Cunhua Pan, Jiaming Zeng, Hong Ren, Kezhi Wang, Lei Kong, Jiangzhou Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
[170] arXiv:2512.04520 [pdf, ps, other]: Title: Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation

Authors: Chenlin Xu, Lei Zhang, Lituan Wang, Xinyu Pu, Pengfei Ma, Guangwu Qian, Zizhou Wang, Yan Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[171] arXiv:2512.04519 [pdf, ps, other]: Title: VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

Authors: Yifei Yu, Xiaoshan Wu, Xinting Hu, Tao Hu, Yangtian Sun, Xiaoyang Lyu, Bo Wang, Lin Ma, Yuewen Ma, Zhongrui Wang, Xiaojuan Qi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[172] arXiv:2512.04515 [pdf, ps, other]: Title: EgoLCD: Egocentric Video Generation with Long Context Diffusion

Authors: Liuzhou Zhang, Jiarui Ye, Yuanlei Wang, Ming Zhong, Mingju Cao, Wanke Xia, Bowen Zeng, Zeyu Zhang, Hao Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[173] arXiv:2512.04511 [pdf, ps, other]: Title: DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance

Authors: Yinghui Xing, Xiaoting Su, Shizhou Zhang, Donghao Chu, Di Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[174] arXiv:2512.04504 [pdf, ps, other]: Title: UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers

Authors: Min Zhao, Bokai Yan, Xue Yang, Hongzhou Zhu, Jintao Zhang, Shilong Liu, Chongxuan Li, Jun Zhu

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[175] arXiv:2512.04499 [pdf, ps, other]: Title: Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model

Authors: Yuduo Jin, Brandon Haworth

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[176] arXiv:2512.04496 [pdf, ps, other]: Title: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal

Authors: Tianci Huo, Lingfeng Qi, Yuhan Chen, Qihong Xue, Jinyuan Shao, Hai Yu, Jie Li, Zhanhua Zhang, Guofa Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[177] arXiv:2512.04487 [pdf, ps, other]: Title: Controllable Long-term Motion Generation with Extended Joint Targets

Authors: Eunjong Lee, Eunhee Kim, Sanghoon Hong, Eunho Jung, Jihoon Kim

Comments: WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[178] arXiv:2512.04485 [pdf, ps, other]: Title: Not All Birds Look The Same: Identity-Preserving Generation For Birds

Authors: Aaron Sun, Oindrila Saha, Subhransu Maji

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[179] arXiv:2512.04483 [pdf, ps, other]: Title: DeRA: Decoupled Representation Alignment for Video Tokenization

Authors: Pengbo Guo, Junke Wang, Zhen Xing, Chengxu Liu, Daoguo Dong, Xueming Qian, Zuxuan Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[180] arXiv:2512.04461 [pdf, ps, other]: Title: UniTS: Unified Time Series Generative Model for Remote Sensing

Authors: Yuxiang Zhang, Shunlin Liang, Wenyuan Li, Han Ma, Jianglei Xu, Yichuan Ma, Jiangwei Xie, Wei Li, Mengmeng Zhang, Ran Tao, Xiang-Gen Xia

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[181] arXiv:2512.04459 [pdf, ps, other]: Title: dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning

Authors: Yingzi Ma, Yulong Cao, Wenhao Ding, Shuibai Zhang, Yan Wang, Boris Ivanovic, Ming Jiang, Marco Pavone, Chaowei Xiao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[182] arXiv:2512.04456 [pdf, ps, other]: Title: GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis

Authors: Changjin Kim, HyeokJun Lee, YoungJoon Yoo

Comments: AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[183] arXiv:2512.04451 [pdf, ps, other]: Title: StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios

Authors: Yifei Wang, Zhenkai Li, Tianwen Qian, Huanran Zheng, Zheng Wang, Yuqian Fu, Xiaoling Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[184] arXiv:2512.04441 [pdf, ps, other]: Title: MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving

Authors: Bin Sun, Yaoguang Cao, Yan Wang, Rui Wang, Jiachen Shang, Xiejie Feng, Jiayi Lu, Jia Shi, Shichun Yang, Xiaoyu Yane, Ziying Song

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[185] arXiv:2512.04426 [pdf, ps, other]: Title: Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation

Authors: Sidan Zhu, Hongteng Xu, Dixin Luo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[186] arXiv:2512.04425 [pdf, ps, other]: Title: Explainable Parkinson's Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models

Authors: Manar Alnaasan, Md Selim Sarowar, Sungho Kim

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[187] arXiv:2512.04421 [pdf, ps, other]: Title: UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D Scenes

Authors: Changhe Liu, Ehsan Javanmardi, Naren Bao, Alex Orsholits, Manabu Tsukada

Comments: 13 pages, 10 figures, submitted to CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[188] arXiv:2512.04413 [pdf, ps, other]: Title: Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection

Authors: Xiangyi Gao, Danpei Zhao, Bo Yuan, Wentao Li

Comments: 12 pages, 8 figures, 11 tables

Journal-ref: IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-11

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[189] arXiv:2512.04397 [pdf, ps, other]: Title: Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection

Authors: Zeeshan Ahmad, Shudi Bao, Meng Chen

Journal-ref: 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 2025, pp. 1-5

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[190] arXiv:2512.04395 [pdf, ps, other]: Title: Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models

Authors: Hieu Dinh Trung Pham, Huy Minh Nhat Nguyen, Cuong Tuan Nguyen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[191] arXiv:2512.04390 [pdf, ps, other]: Title: FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring

Authors: Geunhyuk Youk, Jihyong Oh, Munchurl Kim

Comments: 20 pages, 15 figures. Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[192] arXiv:2512.04358 [pdf, ps, other]: Title: MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching

Authors: Ao Xu, Rujin Zhao, Xiong Xu, Boceng Huang, Yujia Jia, Hongfeng Long, Fuxuan Chen, Zilong Cao, Fangyuan Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[193] arXiv:2512.04356 [pdf, ps, other]: Title: Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

Authors: Kai-Po Chang, Wei-Yuan Cheng, Chi-Pin Huang, Fu-En Yang, Yu-Chiang Frank Wang

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[194] arXiv:2512.04331 [pdf, ps, other]: Title: Open Set Face Forgery Detection via Dual-Level Evidence Collection

Authors: Zhongyi Cai, Bryce Gernon, Wentao Bao, Yifan Li, Matthew Wright, Yu Kong

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[195] arXiv:2512.04329 [pdf, ps, other]: Title: A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks

Authors: Waleed Khalid, Dmitry Ignatov, Radu Timofte

Subjects: Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
[196] arXiv:2512.04323 [pdf, ps, other]: Title: Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural Networks

Authors: Biao Chen, Zhenhua Lei, Yahui Zhang, Tongzhi Niu

Comments: 17 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[197] arXiv:2512.04315 [pdf, ps, other]: Title: SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting

Authors: Yonghan Lee, Tsung-Wei Huang, Shiv Gehlot, Jaehoon Choi, Guan-Ming Su, Dinesh Manocha

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[198] arXiv:2512.04314 [pdf, ps, other]: Title: DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel Vision

Authors: Jiashu Liao, Pietro Liò, Marc de Kamps, Duygu Sarikaya

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[199] arXiv:2512.04313 [pdf, ps, other]: Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding

Authors: Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao

Comments: 16 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[200] arXiv:2512.04311 [pdf, ps, other]: Title: Real-time Cricket Sorting By Sex

Authors: Juan Manuel Cantarero Angulo, Matthew Smith

Comments: 13 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[201] arXiv:2512.04309 [pdf, ps, other]: Title: Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction

Authors: Rui Fonseca, Bruno Martins, Gil Rocha

Comments: Submitted to CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[202] arXiv:2512.04305 [pdf, ps, other]: Title: How (Mis)calibrated is Your Federated CLIP and What To Do About It?

Authors: Mainak Singha, Masih Aminbeidokhti, Paolo Casari, Elisa Ricci, Subhankar Roy

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[203] arXiv:2512.04303 [pdf, ps, other]: Title: Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications

Authors: Gasser Elazab, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich

Comments: Accepted in 3DV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[204] arXiv:2512.04284 [pdf, ps, other]: Title: Learning Single-Image Super-Resolution in the JPEG Compressed Domain

Authors: Sruthi Srinivasan, Elham Shakibapour, Rajy Rawther, Mehdi Saeedi

Comments: 7 pages, 4 figures, 2 tables, SEEDS Workshop, ICIP 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[205] arXiv:2512.04283 [pdf, ps, other]: Title: Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint

Authors: Fan Jia, Yuhao Huang, Shih-Hsin Wang, Cristina Garcia-Cardona, Andrea L. Bertozzi, Bao Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[206] arXiv:2512.04282 [pdf, ps, other]: Title: Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer

Authors: Tasmiah Haque, Srinjoy Das

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[207] arXiv:2512.04267 [pdf, ps, other]: Title: UniLight: A Unified Representation for Lighting

Authors: Zitian Zhang, Iliyan Georgiev, Michael Fischer, Yannick Hold-Geoffroy, Jean-François Lalonde, Valentin Deschaintre

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[208] arXiv:2512.04248 [pdf, ps, other]: Title: MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

Authors: Shaoheng Fang, Chaohui Yu, Fan Wang, Qixing Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[209] arXiv:2512.04238 [pdf, ps, other]: Title: 6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language Models

Authors: Leon Mayer, Piotr Kalinowski, Caroline Ebersbach, Marcel Knopp, Tim Rädsch, Evangelia Christodoulou, Annika Reinke, Fiona R. Kolbinger, Lena Maier-Hein

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[210] arXiv:2512.04222 [pdf, ps, other]: Title: ReasonX: MLLM-Guided Intrinsic Image Decomposition

Authors: Alara Dirik, Tuanfeng Wang, Duygu Ceylan, Stefanos Zafeiriou, Anna Frühstück

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[211] arXiv:2512.04221 [pdf, ps, other]: Title: MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis

Authors: Xiangyu Bai, He Liang, Bishoy Galoaa, Utsav Nandi, Shayda Moezzi, Yuhang He, Sarah Ostadabbas

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[212] arXiv:2512.04219 [pdf, ps, other]: Title: Generalized Event Partonomy Inference with Structured Hierarchical Predictive Learning

Authors: Zhou Chen, Joe Lin, Sathyanarayanan N. Aakur

Comments: 16 pages, 7 figures, 3 tables. Under Review

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[213] arXiv:2512.04187 [pdf, ps, other]: Title: OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathology

Authors: Jinzhen Hu, Kevin Faust, Parsa Babaei Zadeh, Adrienn Bourkas, Shane Eaton, Andrew Young, Anzar Alvi, Dimitrios George Oreopoulos, Ameesha Paliwal, Assem Saleh Alrumeh, Evelyn Rose Kamski-Hennekam, Phedias Diamandis

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[214] arXiv:2512.04175 [pdf, ps, other]: Title: Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection

Authors: Alejandro Cobo, Roberto Valle, José Miguel Buenaposada, Luis Baumela

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[215] arXiv:2512.05117 (cross-list from cs.LG) [pdf, ps, other]: Title: The Universal Weight Subspace Hypothesis

Authors: Prakhar Kaushik, Shravan Chaudhari, Ankit Vaidya, Rama Chellappa, Alan Yuille

Comments: 37 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[216] arXiv:2512.05116 (cross-list from cs.LG) [pdf, ps, other]: Title: Value Gradient Guidance for Flow Matching Alignment

Authors: Zhen Liu, Tim Z. Xiao, Carles Domingo-Enrich, Weiyang Liu, Dinghuai Zhang

Comments: Accepted at NeurIPS 2025; 26 pages, 20 figures

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[217] arXiv:2512.05114 (cross-list from cs.LG) [pdf, ps, other]: Title: Deep infant brain segmentation from multi-contrast MRI

Authors: Malte Hoffmann, Lilla Zöllei, Adrian V. Dalca

Comments: 8 pages, 8 figures, 1 table, website at this https URL, presented at the 2025 IEEE Asilomar Conference on Signals, Systems, and Computers

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[218] arXiv:2512.05103 (cross-list from cs.LG) [pdf, ps, other]: Title: TV2TV: A Unified Framework for Interleaved Language and Video Generation

Authors: Xiaochuang Han, Youssef Emad, Melissa Hall, John Nguyen, Karthik Padthe, Liam Robbins, Amir Bar, Delong Chen, Michal Drozdzal, Maha Elbayad, Yushi Hu, Shang-Wen Li, Sreya Dutta Roy, Jakob Verbeek, XuDong Wang, Marjan Ghazvininejad, Luke Zettlemoyer, Emily Dinan

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[219] arXiv:2512.05094 (cross-list from cs.RO) [pdf, ps, other]: Title: From Generated Human Videos to Physically Plausible Robot Trajectories

Authors: James Ni, Zekai Wang, Wei Lin, Amir Bar, Yann LeCun, Trevor Darrell, Jitendra Malik, Roei Herzig

Comments: For project website, see this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[220] arXiv:2512.04814 (cross-list from cs.SD) [pdf, ps, other]: Title: Shared Multi-modal Embedding Space for Face-Voice Association

Authors: Christopher Simic, Korbinian Riedhammer, Tobias Bocklet

Comments: Ranked 1st in Fame 2026 Challenge, ICASSP

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[221] arXiv:2512.04763 (cross-list from cs.LG) [pdf, ps, other]: Title: MemLoRA: Distilling Expert Adapters for On-Device Memory Systems

Authors: Massimo Bini, Ondrej Bohdal, Umberto Michieli, Zeynep Akata, Mete Ozay, Taha Ceritli

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[222] arXiv:2512.04705 (cross-list from cs.CC) [pdf, ps, other]: Title: Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators

Authors: Alaa Zniber, Arne Symons, Ouassim Karrakchou, Marian Verhelst, Mounir Ghogho

Comments: Submitted to IEEE Transactions on Emerging Topics in Computing

Subjects: Computational Complexity (cs.CC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
[223] arXiv:2512.04625 (cross-list from cs.LG) [pdf, ps, other]: Title: Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective

Authors: Bowen Zheng, Ran Cheng

Comments: Accepted to IEEE TNNLS

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[224] arXiv:2512.04556 (cross-list from cs.GR) [pdf, ps, other]: Title: Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex

Authors: Zhizhen Wu, Zhe Cao, Yuchi Huo

Comments: 10 pages, 7 figures

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[225] arXiv:2512.04464 (cross-list from cs.LG) [pdf, ps, other]: Title: Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles

Authors: Tanmay Dogra, Eric Ngo, Mohammad Alam, Jean-Paul Talavera, Asim Dahal

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[226] arXiv:2512.04385 (cross-list from cs.LG) [pdf, ps, other]: Title: STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution Forecasting

Authors: Nan Zhou, Weijie Hong, Huandong Wang, Jianfeng Zheng, Qiuhua Wang, Yali Song, Xiao-Ping Zhang, Yong Li, Xinlei Chen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[227] arXiv:2512.04264 (cross-list from cs.LG) [pdf, ps, other]: Title: Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness

Authors: Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Jing Lin

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[228] arXiv:2512.04092 (cross-list from physics.soc-ph) [pdf, ps, other]: Title: The changing surface of the world's roads

Authors: Sukanya Randhawa, Guntaj Randhawa, Clemens Langer, Francis Andorful, Benjamin Herfort, Daniel Kwakye, Omer Olchik, Sven Lautenbach, Alexander Zipf

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[229] arXiv:2512.04087 (cross-list from q-bio.NC) [pdf, ps, other]: Title: Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o

Authors: Sui He, Shenbin Qian

Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Thu, 4 Dec 2025

[230] arXiv:2512.04085 [pdf, ps, other]: Title: Unique Lives, Shared World: Learning from Single-Life Videos

Authors: Tengda Han, Sayna Ebrahimi, Dilara Gokay, Li Yang Ku, Maks Ovsjanikov, Iva Babukova, Daniel Zoran, Viorica Patraucean, Joao Carreira, Andrew Zisserman, Dima Damen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[231] arXiv:2512.04084 [pdf, ps, other]: Title: SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows

Authors: Qinyu Zhao, Guangting Zheng, Tao Yang, Rui Zhu, Xingjian Leng, Stephen Gould, Liang Zheng

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[232] arXiv:2512.04082 [pdf, ps, other]: Title: PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

Authors: Jiazhe Wei, Ken Li, Tianyu Lao, Haofan Wang, Liang Wang, Caifeng Shan, Chenyang Si

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[233] arXiv:2512.04069 [pdf, ps, other]: Title: SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

Authors: Siyi Chen, Mikaela Angelina Uy, Chan Hee Song, Faisal Ladhak, Adithyavairavan Murali, Qing Qu, Stan Birchfield, Valts Blukis, Jonathan Tremblay

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[234] arXiv:2512.04048 [pdf, ps, other]: Title: Stable Signer: Hierarchical Sign Language Generative Model

Authors: Sen Fang, Yalin Feng, Hongbin Zhong, Yanxin Zhang, Dimitris N. Metaxas

Comments: 12 pages, 7 figures. More demos at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Computers and Society (cs.CY)
[235] arXiv:2512.04040 [pdf, ps, other]: Title: RELIC: Interactive Video World Model with Long-Horizon Memory

Authors: Yicong Hong, Yiqun Mei, Chongjian Ge, Yiran Xu, Yang Zhou, Sai Bi, Yannick Hold-Geoffroy, Mike Roberts, Matthew Fisher, Eli Shechtman, Kalyan Sunkavalli, Feng Liu, Zhengqi Li, Hao Tan

Comments: 22 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[236] arXiv:2512.04039 [pdf, ps, other]: Title: Fast & Efficient Normalizing Flows and Applications of Image Generative Models

Authors: Sandeep Nagar

Comments: PhD Thesis

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[237] arXiv:2512.04025 [pdf, ps, other]: Title: PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

Authors: Xiaolong Li, Youping Gu, Xi Lin, Weijie Wang, Bohan Zhuang

Comments: Tech report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[238] arXiv:2512.04021 [pdf, ps, other]: Title: C3G: Learning Compact 3D Representations with 2K Gaussians

Authors: Honggyu An, Jaewoo Jung, Mungyeom Kim, Sunghwan Hong, Chaehyun Kim, Kazumi Fukuda, Minkyeong Jeon, Jisang Han, Takuya Narihira, Hyuna Ko, Junsu Kim, Yuki Mitsufuji, Seungryong Kim

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[239] arXiv:2512.04019 [pdf, ps, other]: Title: Ultra-lightweight Neural Video Representation Compression

Authors: Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Mike Nilsson, Andrew Gower, David Bull

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[240] arXiv:2512.04015 [pdf, ps, other]: Title: Learning Group Actions In Disentangled Latent Image Representations

Authors: Farhana Hossain Swarnali, Miaomiao Zhang, Tonmoy Hossain

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[241] arXiv:2512.04012 [pdf, ps, other]: Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers

Authors: Jisang Han, Sunghwan Hong, Jaewoo Jung, Wooseok Jang, Honggyu An, Qianqian Wang, Seungryong Kim, Chen Feng

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[242] arXiv:2512.04007 [pdf, ps, other]: Title: On the Temporality for Sketch Representation Learning

Authors: Marcelo Isaias de Moraes Junior, Moacir Antonelli Ponti

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[243] arXiv:2512.04000 [pdf, ps, other]: Title: Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

Authors: Jialuo Li, Bin Li, Jiahao Li, Yan Lu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[244] arXiv:2512.03996 [pdf, ps, other]: Title: Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation

Authors: Hang Xu, Linjiang Huang, Feng Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[245] arXiv:2512.03992 [pdf, ps, other]: Title: DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation

Authors: Zexin Lin, Hawen Wan, Yebin Zhong, Xiaoqiang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[246] arXiv:2512.03981 [pdf, ps, other]: Title: DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment

Authors: Sheng-Hao Liao, Shang-Fu Chen, Tai-Ming Huang, Wen-Huang Cheng, Kai-Lung Hua

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[247] arXiv:2512.03979 [pdf, ps, other]: Title: BlurDM: A Blur Diffusion Model for Image Deblurring

Authors: Jin-Ting He, Fu-Jen Tsai, Yan-Tsung Peng, Min-Hung Chen, Chia-Wen Lin, Yen-Yu Lin

Comments: NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[248] arXiv:2512.03964 [pdf, ps, other]: Title: Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization

Authors: Lianyu Pang, Ji Zhou, Qiping Wang, Baoquan Zhao, Zhenguo Yang, Qing Li, Xudong Mao

Comments: 17 pages, 13 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[249] arXiv:2512.03963 [pdf, ps, other]: Title: TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning

Authors: Tao Wu, Li Yang, Gen Zhan, Yabin Zhang, Yiting Liao, Junlin Li, Deliang Fu, Li Zhang, Limin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[250] arXiv:2512.03939 [pdf, ps, other]: Title: MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction

Authors: Guole Shen, Tianchen Deng, Xingrui Qin, Nailin Wang, Jianyu Wang, Yanbo Wang, Yongtao Chen, Hesheng Wang, Jingchuan Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[251] arXiv:2512.03932 [pdf, ps, other]: Title: Beyond the Ground Truth: Enhanced Supervision for Image Restoration

Authors: Donghun Ryou, Inju Ha, Sanghyeok Chu, Bohyung Han

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[252] arXiv:2512.03918 [pdf, ps, other]: Title: UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework

Authors: Youxin Pang, Yong Zhang, Ruizhi Shao, Xiang Deng, Feng Gao, Xu Xiaoming, Xiaoming Wei, Yebin Liu

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[253] arXiv:2512.03905 [pdf, ps, other]: Title: Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence

Authors: Shuai Yang, Junxin Lin, Yifan Zhou, Ziwei Liu, Chen Change Loy

Comments: Code: this https URL, Project: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[254] arXiv:2512.03883 [pdf, ps, other]: Title: Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy

Authors: Jorge Tapias Gomez, Despoina Kanata, Aneesh Rangnekar, Christina Lee, Julio Garcia-Aguilar, Joshua Jesse Smith, Harini Veeraraghavan

Comments: 6 pages, 5 figures, 1 table, submitted to ISBI conference

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[255] arXiv:2512.03869 [pdf, ps, other]: Title: An Automated Framework for Large-Scale Graph-Based Cerebrovascular Analysis

Authors: Daniele Falcetta, Liane S. Canas, Lorenzo Suppa, Matteo Pentassuglia, Jon Cleary, Marc Modat, Sébastien Ourselin, Maria A. Zuluaga

Comments: Submitted to ISBI 2026. 6 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[256] arXiv:2512.03862 [pdf, ps, other]: Title: Diminishing Returns in Self-Supervised Learning

Authors: Oli Bridge, Huey Sun, Botond Branyicskai-Nagy, Charles D'Ornano, Shomit Basu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[257] arXiv:2512.03854 [pdf, ps, other]: Title: Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern population

Authors: Peshawa J. Muhammad Ali, Navin Vincent, Saman S. Abdulla, Han N. Mohammed Fadhl, Anders Blilie, Kelvin Szolnoky, Julia Anna Mielcarz, Xiaoyi Ji, Kimmo Kartasalo, Abdulbasit K. Al-Talabani, Nita Mulliqi

Comments: 13 pages, 2 figures and 1 table

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[258] arXiv:2512.03852 [pdf, ps, other]: Title: Traffic Image Restoration under Adverse Weather via Frequency-Aware Mamba

Authors: Liwen Pan, Longguang Wang, Guangwei Gao, Jun Wang, Jun Shi, Juncheng Li

Comments: 12 pages, 13 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[259] arXiv:2512.03848 [pdf, ps, other]: Title: PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation

Authors: Hania Ghouse, Maryam Alsharqi, Farhad R. Nezami, Muzammil Behzad

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[260] arXiv:2512.03844 [pdf, ps, other]: Title: CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation

Authors: Letian Zhou, Songhua Liu, Xinchao Wang

Comments: 34 pages, 24 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[261] arXiv:2512.03837 [pdf, ps, other]: Title: Heatmap Pooling Network for Action Recognition from RGB Videos

Authors: Mengyuan Liu, Jinfu Liu, Yongkang Jiang, Bin He

Comments: Final Version of IEEE Transactions on Pattern Analysis and Machine Intelligence

Journal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[262] arXiv:2512.03834 [pdf, ps, other]: Title: Lean Unet: A Compact Model for Image Segmentation

Authors: Ture Hassler, Ida Åkerholm, Marcus Nordström, Gabriele Balletti, Orcun Goksel

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[263] arXiv:2512.03827 [pdf, ps, other]: Title: A Robust Camera-based Method for Breath Rate Measurement

Authors: Alexey Protopopov

Comments: 9 pages, 4 figures, 2 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[264] arXiv:2512.03817 [pdf, ps, other]: Title: HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English

Authors: Ahmed Nasser, Marwan Mohamed, Alaa Sherif, Basmala Mahmoud, Shereen Yehia, Asmaa Saad, Mariam S. El-Rahmany, Ensaf H. Mohamed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[265] arXiv:2512.03796 [pdf, ps, other]: Title: LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling

Authors: Hong-Kai Zheng, Piji Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[266] arXiv:2512.03794 [pdf, ps, other]: Title: AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

Authors: Zichuan Lin, Yicheng Liu, Yang Yang, Lvfang Tao, Deheng Ye

Comments: 15 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[267] arXiv:2512.03751 [pdf, ps, other]: Title: Research on Brain Tumor Classification Method Based on Improved ResNet34 Network

Authors: Yufeng Li, Wenchao Zhao, Bo Dang, Weimin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[268] arXiv:2512.03749 [pdf, ps, other]: Title: Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models

Authors: Korada Sri Vardhana, Shrikrishna Lolla, Soma Biswas

Comments: Accepted at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[269] arXiv:2512.03746 [pdf, ps, other]: Title: Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Authors: Zirun Guo, Minjie Hong, Feng Zhang, Kai Jia, Tao Jin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[270] arXiv:2512.03745 [pdf, ps, other]: Title: Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification

Authors: Jiaze Li, Yan Lu, Bin Liu, Guojun Yin, Mang Ye

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[271] arXiv:2512.03730 [pdf, ps, other]: Title: Out-of-the-box: Black-box Causal Attacks on Object Detectors

Authors: Melane Navaratnarajah, David A. Kelly, Hana Chockler

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[272] arXiv:2512.03724 [pdf, ps, other]: Title: PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention

Authors: Ziwen Li, Xin Wang, Hanlue Zhang, Runnan Chen, Runqi Lin, Xiao He, Han Huang, Yandong Guo, Fakhri Karray, Tongliang Liu, Mingming Gong

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[273] arXiv:2512.03715 [pdf, ps, other]: Title: DINO-RotateMatch: A Rotation-Aware Deep Framework for Robust Image Matching in Large-Scale 3D Reconstruction

Authors: Kaichen Zhang, Tianxiang Sheng, Xuanming Shi

Comments: 9 pages, 5 figures, 1 table

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[274] arXiv:2512.03701 [pdf, ps, other]: Title: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images

Authors: Paula Seidler, Neill D. F. Campbell, Ivor J A Simpson

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[275] arXiv:2512.03687 [pdf, ps, other]: Title: Active Visual Perception: Opportunities and Challenges

Authors: Yian Li, Xiaoyu Guo, Hao Zhang, Shuiwang Li, Xiaowei Dai

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[276] arXiv:2512.03683 [pdf, ps, other]: Title: GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces

Authors: Melis Ocal, Xiaoyan Xing, Yue Li, Ngo Anh Vien, Sezer Karaoglu, Theo Gevers

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[277] arXiv:2512.03673 [pdf, ps, other]: Title: ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers

Authors: Feice Huang, Zuliang Han, Xing Zhou, Yihuang Chen, Lifei Zhu, Haoqian Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[278] arXiv:2512.03667 [pdf, ps, other]: Title: Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning

Authors: Ge-Peng Ji, Jingyi Liu, Deng-Ping Fan, Nick Barnes

Comments: Technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[279] arXiv:2512.03666 [pdf, ps, other]: Title: ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos

Authors: Qi'ao Xu, Tianwen Qian, Yuqian Fu, Kailing Li, Yang Jiao, Jiacheng Zhang, Xiaoling Wang, Liang He

Comments: 26 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[280] arXiv:2512.03663 [pdf, ps, other]: Title: Multi-Scale Visual Prompting for Lightweight Small-Image Classification

Authors: Salim Khazem

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[281] arXiv:2512.03643 [pdf, ps, other]: Title: Optical Context Compression Is Just (Bad) Autoencoding

Authors: Ivan Yee Lee, Cheng Yang, Taylor Berg-Kirkpatrick

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[282] arXiv:2512.03640 [pdf, ps, other]: Title: MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms

Authors: Jiahao Zhang, Xiao Zhao, Guangyu Gao

Journal-ref: MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15521. Springer, Singapore

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[283] arXiv:2512.03625 [pdf, ps, other]: Title: FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image Features

Authors: Zhigang Yang, Yuan Liu, Jiawei Zhang, Puning Zhang, Xinqiang Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[284] arXiv:2512.03621 [pdf, ps, other]: Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

Authors: Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang Tan

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[285] arXiv:2512.03619 [pdf, ps, other]: Title: LAMP: Language-Assisted Motion Planning for Controllable Video Generation

Authors: Muhammed Burak Kizil, Enes Sanli, Niloy J. Mitra, Erkut Erdem, Aykut Erdem, Duygu Ceylan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[286] arXiv:2512.03601 [pdf, ps, other]: Title: Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding

Authors: Haoran Zhou, Gim Hee Lee

Comments: Accepted to NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[287] arXiv:2512.03598 [pdf, ps, other]: Title: Memory-Guided Point Cloud Completion for Dental Reconstruction

Authors: Jianan Sun, Yukang Huang, Dongzhihan Wang, Mingyu Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[288] arXiv:2512.03597 [pdf, ps, other]: Title: HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ Segmentation

Authors: Fuchen Zheng, Xinyi Chen, Weixuan Li, Quanjun Li, Junhua Zhou, Xiaojiao Guo, Xuhang Chen, Chi-Man Pun, Shoujun Zhou

Comments: 6 pages, 4 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[289] arXiv:2512.03593 [pdf, ps, other]: Title: CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale Textures

Authors: David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[290] arXiv:2512.03592 [pdf, ps, other]: Title: Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse Folding

Authors: Guang Yang, Lei Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[291] arXiv:2512.03590 [pdf, ps, other]: Title: Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation

Authors: Yuchen Deng, Xiuyang Wu, Hai-Tao Zheng, Jie Wang, Feidiao Yang, Yuxing Han

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[292] arXiv:2512.03580 [pdf, ps, other]: Title: Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processes

Authors: Malte Bleeker, Mauro Gotsch

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[293] arXiv:2512.03577 [pdf, ps, other]: Title: Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation Learning

Authors: Yizhi Zhang, Lei Fan, Zhulin Tao, Donglin Di, Yang Song, Sidong Liu, Cong Cong

Comments: 6 pages, 2 figures. Camera-ready version accepted for IEEE BIBM 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[294] arXiv:2512.03575 [pdf, ps, other]: Title: UniComp: Rethinking Video Compression Through Informational Uniqueness

Authors: Chao Yuan, Shimin Chen, Minliang Lin, Limeng Qiao, Guanglu Wan, Lin Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[295] arXiv:2512.03574 [pdf, ps, other]: Title: Global-Local Aware Scene Text Editing

Authors: Fuxiang Yang, Tonghua Su, Donglin Di, Yin Chen, Xiangqian Wu, Zhongjie Wang, Lei Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[296] arXiv:2512.03566 [pdf, ps, other]: Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models

Authors: Hao Sun, Lei Fan, Donglin Di, Shaohui Liu

Comments: Accepted by ACM MM Asia2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[297] arXiv:2512.03558 [pdf, ps, other]: Title: CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding

Authors: Huy Quang Ung, Guillaume Habault, Yasutaka Nishimura, Hao Niu, Roberto Legaspi, Tomoki Oya, Ryoichi Kojima, Masato Taya, Chihiro Ono, Atsunori Minamikawa, Yan Liu

Comments: Accepted at SIGSPATIAL 2025 (Best paper candidates), 15 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[298] arXiv:2512.03553 [pdf, ps, other]: Title: Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

Authors: Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui Guan

Comments: Accepted at KDD 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[299] arXiv:2512.03542 [pdf, ps, other]: Title: V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention

Authors: Nan Sun, Zhenyu Zhang, Xixun Lin, Kun Wang, Yanmin Shang, Naibin Gu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, Yanan Cao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[300] arXiv:2512.03540 [pdf, ps, other]: Title: CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

Authors: Ruoxuan Zhang, Bin Wen, Hongxia Xie, Yi Yao, Songhan Zuo, Jian-Yu Jiang-Lin, Hong-Han Shuai, Wen-Huang Cheng

Comments: Accepted by ACM Multimedia 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[301] arXiv:2512.03534 [pdf, ps, other]: Title: Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

Authors: Subin Kim, Sangwoo Mo, Mamshad Nayeem Rizve, Yiran Xu, Difan Liu, Jinwoo Shin, Tobias Hinz

Comments: Visualizations are available at the website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[302] arXiv:2512.03532 [pdf, ps, other]: Title: OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation

Authors: Zhishan Zhou, Siyuan Wei, Zengran Wang, Chunjie Wang, Xiaosheng Yan, Xiao Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[303] arXiv:2512.03520 [pdf, ps, other]: Title: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation

Authors: Yiyi Cai, Yuhan Wu, Kunhang Li, You Zhou, Bo Zheng, Haiyang Liu

Comments: 15 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[304] arXiv:2512.03510 [pdf, ps, other]: Title: CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving

Authors: Zhijian Qiao, Zehuan Yu, Tong Li, Chih-Chung Chou, Wenchao Ding, Shaojie Shen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[305] arXiv:2512.03509 [pdf, ps, other]: Title: AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything Model

Authors: Kwaku Opoku-Ware, Gideon Opoku

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[306] arXiv:2512.03508 [pdf, ps, other]: Title: Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation

Authors: Seogkyu Jeon, Kibeom Hong, Hyeran Byun

Comments: ICCV 2025 (poster)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[307] arXiv:2512.03500 [pdf, ps, other]: Title: EEA: Exploration-Exploitation Agent for Long Video Understanding

Authors: Te Yang, Xiangyu Zhu, Bo Wang, Quan Chen, Peng Jiang, Zhen Lei

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[308] arXiv:2512.03499 [pdf, ps, other]: Title: NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation

Authors: Renqi Chen, Haoyang Su, Shixiang Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[309] arXiv:2512.03479 [pdf, ps, other]: Title: Towards Object-centric Understanding for Instructional Videos

Authors: Wenliang Guo, Yu Kong

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[310] arXiv:2512.03477 [pdf, ps, other]: Title: Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis

Authors: Zijian Gu, Yuxi Liu, Zhenhao Zhang, Song Wang

Comments: 10 pages, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[311] arXiv:2512.03474 [pdf, ps, other]: Title: Procedural Mistake Detection via Action Effect Modeling

Authors: Wenliang Guo, Yujiang Pu, Yu Kong

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[312] arXiv:2512.03470 [pdf, ps, other]: Title: Difference Decomposition Networks for Infrared Small Target Detection

Authors: Chen Hu, Mingyu Zhou, Shuai Yuan, Hongbo Hu, Xiangyu Qiu, Junhai Luo, Tian Pu, Xiyin Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[313] arXiv:2512.03463 [pdf, ps, other]: Title: Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models

Authors: Shojiro Yamabe, Futa Waseda, Daiki Shiono, Tsubasa Takahashi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[314] arXiv:2512.03454 [pdf, ps, other]: Title: Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles

Authors: Haicheng Liao, Huanming Shen, Bonan Wang, Yongkang Li, Yihong Tang, Chengyue Wang, Dingyi Zhuang, Kehua Chen, Hai Yang, Chengzhong Xu, Zhenning Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[315] arXiv:2512.03453 [pdf, ps, other]: Title: GeoVideo: Introducing Geometric Regularization into Video Generation Model

Authors: Yunpeng Bai, Shaoheng Fang, Chaohui Yu, Fan Wang, Qixing Huang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[316] arXiv:2512.03451 [pdf, ps, other]: Title: GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers

Authors: Zhiye Song, Steve Dai, Ben Keller, Brucek Khailany

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[317] arXiv:2512.03450 [pdf, ps, other]: Title: KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models

Authors: Rhys Newbury, Juyan Zhang, Tin Tran, Hanna Kurniawati, Dana Kulić

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[318] arXiv:2512.03449 [src]: Title: LM-CartSeg: Automated Segmentation of Lateral and Medial Cartilage and Subchondral Bone for Radiomics Analysis

Authors: Tongxu Zhang

Comments: The manuscript represents only a preliminary and substantially incomplete exploration. The author has decided not to stand by these results, and a thoroughly revised and significantly different version will be developed separately. Therefore this version is withdrawn and should not be cited

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[319] arXiv:2512.03445 [pdf, ps, other]: Title: Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation

Authors: Xieji Li, Siyuan Yan, Yingsheng Liu, H. Peter Soyer, Monika Janda, Victoria Mar, Zongyuan Ge

Comments: 10 pages. Under Review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[320] arXiv:2512.03430 [pdf, ps, other]: Title: Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features

Authors: Yuzhen Hu, Biplab Banerjee, Saurabh Prasad

Comments: Accepted to the ICML 2025 TerraBytes Workshop (June 9, 2025)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[321] arXiv:2512.03427 [pdf, ps, other]: Title: Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry Applications

Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[322] arXiv:2512.03424 [pdf, ps, other]: Title: DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud Understanding

Authors: Bin Liu, Chunyang Wang, Xuelian Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[323] arXiv:2512.03418 [pdf, ps, other]: Title: YOLOA: Real-Time Affordance Detection via LLM Adapter

Authors: Yuqi Ji, Junjie Ke, Lihuo He, Jun Liu, Kaifan Zhang, Yu-Kun Lai, Guiguang Ding, Xinbo Gao

Comments: 13 pages, 9 figures, conference

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[324] arXiv:2512.03405 [pdf, ps, other]: Title: ViDiC: Video Difference Captioning

Authors: Jiangtao Wu, Shihao Li, Zhaozhou Bian, Jialu Chen, Runzhe Wen, An Ping, Yiwen He, Jiakai Wang, Yuanxing Zhang, Jiaheng Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[325] arXiv:2512.03404 [pdf, ps, other]: Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification

Authors: Yujian Zhao, Hankun Liu, Guanglin Niu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[326] arXiv:2512.03370 [pdf, ps, other]: Title: ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

Authors: Lingjun Zhao, Yandong Luo, James Hay, Lu Gan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[327] arXiv:2512.03369 [pdf, ps, other]: Title: FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting

Authors: Nan Zhou, Huandong Wang, Jiahao Li, Han Li, Yali Song, Qiuhua Wang, Yong Li, Xinlei Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[328] arXiv:2512.03359 [pdf, ps, other]: Title: A Hybrid Deep Learning Framework with Explainable AI for Lung Cancer Classification with DenseNet169 and SVM

Authors: Md Rashidul Islam, Bakary Gibba, Altagi Abdallah Bakheit Abdelgadir

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[329] arXiv:2512.03350 [pdf, ps, other]: Title: SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation

Authors: Yu Yuan, Tharindu Wickremasinghe, Zeeshan Nadir, Xijun Wang, Yiheng Chi, Stanley H. Chan

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[330] arXiv:2512.03346 [pdf, ps, other]: Title: Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical Keratoconus

Authors: Lynn Kandakji, William Woof, Nikolas Pontikos

Comments: 16 pages, 7 figures, 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[331] arXiv:2512.03345 [pdf, ps, other]: Title: HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration

Authors: Seunghoi Kim, Henry F. J. Tregidgo, Chen Jin, Matteo Figini, Daniel C. Alexander

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[332] arXiv:2512.03339 [pdf, ps, other]: Title: ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in Echocardiography

Authors: Yeganeh Ghamary, Victoria Wu, Hooman Vaseli, Christina Luong, Teresa Tsang, Siavash Bigdeli, Purang Abolmaesumi

Comments: 11 pages, Accepted in IMIMIC Workshop at MICCAI 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[333] arXiv:2512.03335 [pdf, ps, other]: Title: Step-by-step Layered Design Generation

Authors: Faizan Farooq Khan, K J Joseph, Koustava Goswami, Mohamed Elhoseiny, Balaji Vasan Srinivasan

Journal-ref: AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[334] arXiv:2512.03317 [pdf, ps, other]: Title: NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction

Authors: Thomas Monninger, Zihan Zhang, Steffen Staab, Sihao Ding

Comments: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[335] arXiv:2512.03284 [pdf, ps, other]: Title: SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding

Authors: Hongpei Zheng, Shijie Li, Yanran Li, Hujun Yin

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[336] arXiv:2512.03257 [pdf, ps, other]: Title: PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery

Authors: Mark Moussa, Andre Williams, Seth Roffe, Douglas Morton

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[337] arXiv:2512.03247 [pdf, ps, other]: Title: PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement

Authors: Haitian Zheng, Yuan Yao, Yongsheng Yu, Yuqian Zhou, Jiebo Luo, Zhe Lin

Comments: Published in the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[338] arXiv:2512.03245 [pdf, ps, other]: Title: 2-Shots in the Dark: Low-Light Denoising with Minimal Data Acquisition

Authors: Liying Lu, Raphaël Achddou, Sabine Süsstrunk

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[339] arXiv:2512.03237 [pdf, ps, other]: Title: LLM-Guided Material Inference for 3D Point Clouds

Authors: Nafiseh Izadyar, Teseo Schneider

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[340] arXiv:2512.03233 [pdf, ps, other]: Title: Object Counting with GPT-4o and GPT-5: A Comparative Study

Authors: Richard Füzesséry, Kaziwa Saleh, Sándor Szénási, Zoltán Vámossy

Comments: 5 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[341] arXiv:2512.03210 [pdf, ps, other]: Title: Flux4D: Flow-based Unsupervised 4D Reconstruction

Authors: Jingkang Wang, Henry Che, Yun Chen, Ze Yang, Lily Goli, Sivabalan Manivasagam, Raquel Urtasun

Comments: NeurIPS 2025. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[342] arXiv:2512.03199 [pdf, ps, other]: Title: Does Head Pose Correction Improve Biometric Facial Recognition?

Authors: Justin Norman, Hany Farid

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[343] arXiv:2512.03182 [pdf, ps, other]: Title: Drainage: A Unifying Framework for Addressing Class Uncertainty

Authors: Yasser Taha, Grégoire Montavon, Nils Körber

Comments: 16 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[344] arXiv:2512.03126 [pdf, ps, other]: Title: Hierarchical Process Reward Models are Symbolic Vision Learners

Authors: Shan Zhang, Aotian Chen, Kai Zou, Jindong Gu, Yuan Xue, Anton van den Hengel

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[345] arXiv:2512.04076 (cross-list from cs.GR) [pdf, ps, other]: Title: Radiance Meshes for Volumetric Reconstruction

Authors: Alexander Mai, Trevor Hedstrom, George Kopanas, Janne Kontkanen, Falko Kuester, Jonathan T. Barron

Comments: Website: half-potato.gitlab.io/rm

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[346] arXiv:2512.04032 (cross-list from cs.CL) [pdf, ps, other]: Title: Jina-VLM: Small Multilingual Vision Language Model

Authors: Andreas Koukounas, Georgios Mastrapas, Florian Hönicke, Sedigheh Eslami, Guillaume Roncari, Scott Martens, Han Xiao

Comments: 18 pages, 1-7 main content, 13-18 appendix for tables and dataset

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[347] arXiv:2512.03995 (cross-list from cs.RO) [pdf, ps, other]: Title: Artificial Microsaccade Compensation: Stable Vision for an Ornithopter

Authors: Levi Burner, Guido de Croon, Yiannis Aloimonos

Comments: 29 pages, 5 figures, 2 tables, under review

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[348] arXiv:2512.03962 (cross-list from eess.IV) [pdf, ps, other]: Title: Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image Reconstruction

Authors: Evan Bell, Shijun Liang, Ismail Alkhouri, Saiprasad Ravishankar

Comments: 6 pages, 8 figures, 2025 Asilomar Conference on Signals, Systems, and Computers. Code is available at github.com/evanbell02/Tada-DIP/

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[349] arXiv:2512.03656 (cross-list from cs.LG) [pdf, ps, other]: Title: Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy Forecasting

Authors: Salim Khazem, Houssam Kanso

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[350] arXiv:2512.03556 (cross-list from cs.RO) [pdf, ps, other]: Title: RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL

Authors: Yinzhou Tang, Yu Shang, Yinuo Chen, Bingwen Wei, Xin Zhang, Shu'ang Yu, Liangzhi Shi, Chao Yu, Chen Gao, Wei Wu, Yong Li

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[351] arXiv:2512.03522 (cross-list from cs.RO) [pdf, ps, other]: Title: MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization

Authors: Gihyeon Lee, Jungwoo Lee, Juwon Kim, Young-Sik Shin, Younggun Cho

Comments: Accepted in IEEE Robotics and Automation Letters (2025)

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[352] arXiv:2512.03514 (cross-list from cs.IR) [pdf, ps, other]: Title: M3DR: Towards Universal Multilingual Multimodal Document Retrieval

Authors: Adithya S Kolavi, Vyoman Jain

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[353] arXiv:2512.03422 (cross-list from cs.RO) [pdf, ps, other]: Title: What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models

Authors: Tianchen Deng, Yue Pan, Shenghai Yuan, Dong Li, Chen Wang, Mingrui Li, Long Chen, Lihua Xie, Danwei Wang, Jingchuan Wang, Javier Civera, Hesheng Wang, Weidong Chen

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[354] arXiv:2512.03216 (cross-list from physics.ins-det) [pdf, ps, other]: Title: Kaleidoscopic Scintillation Event Imaging

Authors: Alex Bocchieri, John Mamish, David Appleyard, Andreas Velten

Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[355] arXiv:2512.03173 (cross-list from cs.CY) [pdf, ps, other]: Title: Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping

Authors: Joan Nwatu, Longju Bai, Oana Ignat, Rada Mihalcea

Journal-ref: AAAI 2026 Social Impact Track

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[356] arXiv:2512.03166 (cross-list from cs.RO) [pdf, ps, other]: Title: Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments

Authors: Aya Taourirte, Md Sohag Mia

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[357] arXiv:2512.03111 (cross-list from q-bio.GN) [pdf, ps, other]: Title: PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer

Authors: Xiaoshui Huang, Tianlin Zhu, Yifan Zuo, Xue Xia, Zonghan Wu, Jiebin Yan, Dingli Hua, Zongyi Xu, Yuming Fang, Jian Zhang

Comments: Accepted by AAAI 2026

Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[358] arXiv:2512.03054 (cross-list from cs.LG) [pdf, ps, other]: Title: Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided Research

Authors: Ciro Benito Raggio, Lucia Migliorelli, Nils Skupien, Mathias Krohmer Zabaleta, Oliver Blanck, Francesco Cicone, Giuseppe Lucio Cascini, Paolo Zaffino, Maria Francesca Spadea

Comments: 22 pages, 13 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Medical Physics (physics.med-ph)
[359] arXiv:2512.03052 (cross-list from cs.GR) [pdf, ps, other]: Title: LATTICE: Democratize High-Fidelity 3D Generation at Scale

Authors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, Xiangyu Yue

Comments: Technical Report

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Wed, 3 Dec 2025

[360] arXiv:2512.03046 [pdf, ps, other]: Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

Authors: Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Shuailei Ma, Ka Leong Cheng, Wen Wang, Qingyan Bai, Yuxuan Zhang, Yanhong Zeng, Yixuan Li, Xing Zhu, Yujun Shen, Qifeng Chen

Comments: Code and demo available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[361] arXiv:2512.03045 [pdf, ps, other]: Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models

Authors: Minkyung Kwon, Jinhyeok Choi, Jiho Park, Seonghu Jeon, Jinhyuk Jang, Junyoung Seo, Minseop Kwak, Jin-Hwa Kim, Seungryong Kim

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[362] arXiv:2512.03043 [pdf, ps, other]: Title: OneThinker: All-in-one Reasoning Model for Image and Video

Authors: Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[363] arXiv:2512.03042 [pdf, ps, other]: Title: PPTArena: A Benchmark for Agentic PowerPoint Editing

Authors: Michael Ofengenden, Yunze Man, Ziqi Pang, Yu-Xiong Wang

Comments: 25 pages, 26 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[364] arXiv:2512.03041 [pdf, ps, other]: Title: MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Authors: Qinghe Wang, Xiaoyu Shi, Baolu Li, Weikang Bian, Quande Liu, Huchuan Lu, Xintao Wang, Pengfei Wan, Kun Gai, Xu Jia

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[365] arXiv:2512.03040 [pdf, ps, other]: Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

Authors: Zeqi Xiao, Yiwei Zhao, Lingxiao Li, Yushi Lan, Yu Ning, Rahul Garg, Roshni Cooper, Mohammad H. Taghavi, Xingang Pan

Comments: Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[366] arXiv:2512.03036 [pdf, ps, other]: Title: ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

Authors: Mengchen Zhang, Qi Chen, Tong Wu, Zihan Liu, Dahua Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[367] arXiv:2512.03034 [pdf, ps, other]: Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

Authors: Youxin Pang, Jiajun Liu, Lingfeng Tan, Yong Zhang, Feng Gao, Xiang Deng, Zhuoliang Kang, Xiaoming Wei, Yebin Liu

Comments: Our project website is this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[368] arXiv:2512.03020 [pdf, ps, other]: Title: Unrolled Networks are Conditional Probability Flows in MRI Reconstruction

Authors: Kehan Qi, Saumya Gupta, Qingqiao Hu, Weimin Lyu, Chao Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[369] arXiv:2512.03018 [pdf, ps, other]: Title: AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry

Authors: Xiang Xu, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Yilin Liu, Durvesh Malpure, Pete Meltzer

Comments: Accepted to Siggraph Asia 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[370] arXiv:2512.03014 [pdf, ps, other]: Title: Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks

Authors: Matthew Dutson, Nathan Labiosa, Yin Li, Mohit Gupta

Comments: NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[371] arXiv:2512.03013 [pdf, ps, other]: Title: In-Context Sync-LoRA for Portrait Video Editing

Authors: Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[372] arXiv:2512.03010 [pdf, ps, other]: Title: SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting

Authors: Svenja Strobel, Matthias Innmann, Bernhard Egger, Marc Stamminger, Linus Franke

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
[373] arXiv:2512.03004 [pdf, ps, other]: Title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

Authors: Xiaoxue Chen, Ziyi Xiong, Yuantao Chen, Gen Li, Nan Wang, Hongcheng Luo, Long Chen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Hongyang Li, Ya-Qin Zhang, Hao Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[374] arXiv:2512.03000 [pdf, ps, other]: Title: DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Authors: Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[375] arXiv:2512.02993 [pdf, ps, other]: Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and Beyond

Authors: Yifei Zeng, Yajie Bao, Jiachen Qian, Shuang Wu, Youtian Lin, Hao Zhu, Buyu Li, Feihu Zhang, Xun Cao, Yao Yao

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[376] arXiv:2512.02991 [pdf, ps, other]: Title: GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection

Authors: Md Sohag Mia, Md Nahid Hasan, Tawhid Ahmed, Muhammad Abdullah Adnan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[377] arXiv:2512.02982 [pdf, ps, other]: Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Authors: Xiang Xu, Ao Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu

Comments: Preprint; 19 pages, 7 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[378] arXiv:2512.02981 [pdf, ps, other]: Title: InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration

Authors: Zhongyu Yang, Yingfang Yuan, Xuanming Jiang, Baoyi An, Wei Pang

Comments: Published in AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[379] arXiv:2512.02973 [pdf, ps, other]: Title: Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Authors: Yuan Xiong, Ziqi Miao, Lijun Li, Chen Qian, Jie Li, Jing Shao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[380] arXiv:2512.02972 [pdf, ps, other]: Title: BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

Authors: Guowen Zhang, Chenhang He, Liyi Chen, Lei Zhang

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[381] arXiv:2512.02965 [pdf, ps, other]: Title: A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision Systems

Authors: Yuhan Chen, Yicui Shi, Guofa Li, Guangrui Bai, Jinyuan Shao, Xiangfei Huang, Wenbo Chu, Keqiang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[382] arXiv:2512.02952 [pdf, ps, other]: Title: Layout Anything: One Transformer for Universal Room Layout Estimation

Authors: Md Sohag Mia, Muhammad Abdullah Adnan

Comments: Published at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[383] arXiv:2512.02942 [pdf, ps, other]: Title: Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

Authors: Lanxiang Hu, Abhilash Shankarampeta, Yixin Huang, Zilin Dai, Haoyang Yu, Yujie Zhao, Haoqiang Kang, Daniel Zhao, Tajana Rosing, Hao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[384] arXiv:2512.02933 [pdf, ps, other]: Title: LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization

Authors: Zhihan Xiao, Lin Liu, Yixin Gao, Xiaopeng Zhang, Haoxuan Che, Songping Mai, Qi Tian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[385] arXiv:2512.02932 [pdf, ps, other]: Title: EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis

Authors: Yancheng Zhang, Guangyu Sun, Chen Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[386] arXiv:2512.02931 [pdf, ps, other]: Title: DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation

Authors: Ying Yang, Zhengyao Lv, Tianlin Pan, Haofan Wang, Binxin Yang, Hubery Yin, Chen Li, Chenyang Si

Comments: 23 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[387] arXiv:2512.02906 [pdf, ps, other]: Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding

Authors: Fan Yang, Kaihao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[388] arXiv:2512.02899 [pdf, ps, other]: Title: Glance: Accelerating Diffusion Models with 1 Sample

Authors: Zhuobai Dong, Rui Zhao, Songjie Wu, Junchao Yi, Linjie Li, Zhengyuan Yang, Lijuan Wang, Alex Jinpeng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[389] arXiv:2512.02897 [pdf, ps, other]: Title: Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models

Authors: Pierpaolo Serio, Giulio Pisaneschi, Andrea Dan Ryals, Vincenzo Infantino, Lorenzo Gentilini, Valentina Donzella, Lorenzo Pollini

Comments: 13 pages, 5 figures, 2 tables. Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[390] arXiv:2512.02895 [pdf, ps, other]: Title: MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm

Authors: Wei Chen, Chaoqun Du, Feng Gu, Wei He, Qizhen Li, Zide Liu, Xuhao Pan, Chang Ren, Xudong Rao, Chenfeng Wang, Tao Wei, Chengjun Yu, Pengfei Yu, Yufei Zheng, Chunpeng Zhou, Pan Zhou, Xuhan Zhu

Comments: 33 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[391] arXiv:2512.02870 [pdf, ps, other]: Title: Taming Camera-Controlled Video Generation with Verifiable Geometry Reward

Authors: Zhaoqing Wang, Xiaobo Xia, Zhuolin Bie, Jinlin Liu, Dongdong Yu, Jia-Wang Bian, Changhu Wang

Comments: 11 pages, 4 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392] arXiv:2512.02867 [pdf, ps, other]: Title: MICCAI STSR 2025 Challenge: Semi-Supervised Teeth and Pulp Segmentation and CBCT-IOS Registration

Authors: Yaqi Wang, Zhi Li, Chengyu Wu, Jun Liu, Yifan Zhang, Jialuo Chen, Jiaxue Ni, Qian Luo, Jin Liu, Can Han, Changkai Ji, Zhi Qin Tan, Ajo Babu George, Liangyu Chen, Qianni Zhang, Dahong Qian, Shuai Wang, Huiyu Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[393] arXiv:2512.02860 [pdf, ps, other]: Title: RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association

Authors: Abdul Hannan, Furqan Malik, Hina Jabbar, Syed Suleman Sadiq, Mubashir Noman

Comments: Ranked 3rd in Fame 2026 Challenge, ICASSP

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[394] arXiv:2512.02850 [pdf, ps, other]: Title: Are Detectors Fair to Indian IP-AIGC? A Cross-Generator Study

Authors: Vishal Dubey, Pallavi Tyagi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[395] arXiv:2512.02846 [pdf, ps, other]: Title: Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?

Authors: Manuel Benavent-Lledo, Konstantinos Bacharidis, Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros, Jose Garcia-Rodriguez

Comments: Accepted in WACV 2026 - Applications Track

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[396] arXiv:2512.02835 [pdf, ps, other]: Title: ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Authors: Yifan Li, Yingda Yin, Lingting Zhu, Weikai Chen, Shengju Qian, Xin Wang, Yanwei Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[397] arXiv:2512.02830 [pdf, ps, other]: Title: Defense That Attacks: How Robust Models Become Better Attackers

Authors: Mohamed Awad, Mahmoud Akrm, Walid Gomaa

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[398] arXiv:2512.02794 [pdf, ps, other]: Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation

Authors: Fan Wu, Cheng Chen, Zhoujie Fu, Jiacheng Wei, Yi Xu, Deheng Ye, Guosheng Lin

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[399] arXiv:2512.02793 [pdf, ps, other]: Title: IC-World: In-Context Generation for Shared World Modeling

Authors: Fan Wu, Jiacheng Wei, Ruibo Li, Yi Xu, Junyou Li, Deheng Ye, Guosheng Lin

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[400] arXiv:2512.02792 [pdf, ps, other]: Title: HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

Authors: Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, Weili Guan

Comments: Accepted by ACM MM 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[401] arXiv:2512.02790 [pdf, ps, other]: Title: UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits

Authors: Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing Lyu, Zhou Zhao, Shengyu Zhang

Comments: 31 pages, 15 figures, 12 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[402] arXiv:2512.02789 [pdf, ps, other]: Title: TrackNetV5: Residual-Driven Spatio-Temporal Refinement and Motion Direction Decoupling for Fast Object Tracking

Authors: Tang Haonan, Chen Yanjun, Jiang Lezhi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[403] arXiv:2512.02781 [pdf, ps, other]: Title: LumiX: Structured and Coherent Text-to-Intrinsic Generation

Authors: Xu Han, Biao Zhang, Xiangjun Tang, Xianzhi Li, Peter Wonka

Comments: The code will be available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[404] arXiv:2512.02780 [pdf, ps, other]: Title: Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset

Authors: Qifan Liang, Junlin Li, Zhen Han, Xihao Wang, Zhongyuan Wang, Bin Mei

Comments: 12 pages, 15 figures. Accepted to AAAI-26 (Main Technical Track)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[405] arXiv:2512.02751 [pdf, ps, other]: Title: AttMetNet: Attention-Enhanced Deep Neural Network for Methane Plume Detection in Sentinel-2 Satellite Imagery

Authors: Rakib Ahsan, MD Sadik Hossain Shanto, Md Sultanul Arifin, Tanzima Hashem

Comments: 15 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[406] arXiv:2512.02743 [pdf, ps, other]: Title: Reasoning-Aware Multimodal Fusion for Hateful Video Detection

Authors: Shuonan Yang, Tailin Chen, Jiangbei Yue, Guangliang Cheng, Jianbo Jiao, Zeyu Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[407] arXiv:2512.02737 [pdf, ps, other]: Title: Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone

Authors: Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele Facciolo

Comments: Accepted at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[408] arXiv:2512.02727 [pdf, ps, other]: Title: DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions

Authors: Yifan Zhou, Takehiko Ohkawa, Guwenxiao Zhou, Kanoko Goto, Takumi Hirose, Yusuke Sekikawa, Nakamasa Inoue

Comments: Accepted to WACV 2026. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[409] arXiv:2512.02715 [pdf, ps, other]: Title: GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding

Authors: Peirong Zhang, Yidan Zhang, Luxiao Xu, Jinliang Lin, Zonghao Guo, Fengxiang Wang, Xue Yang, Kaiwen Wei, Lei Wang

Comments: 11 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[410] arXiv:2512.02702 [pdf, ps, other]: Title: Tissue-mask supported inter-subject whole-body image registration in the UK Biobank -- A method benchmarking study

Authors: Yasemin Utkueri, Elin Lundström, Håkan Ahlström, Johan Öfverstedt, Joel Kullberg

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[411] arXiv:2512.02700 [pdf, ps, other]: Title: VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm

Authors: Zhenkai Wu, Xiaowen Ma, Zhenliang Ni, Dengming Zhang, Han Shu, Xin Jiang, Xinghao Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[412] arXiv:2512.02697 [pdf, ps, other]: Title: GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization

Authors: Zixuan Song, Jing Zhang, Di Wang, Zidie Zhou, Wenbin Liu, Haonan Guo, En Wang, Bo Du

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[413] arXiv:2512.02696 [pdf, ps, other]: Title: ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection

Authors: Omid Reza Heidari, Yang Wang, Xinxin Zuo

Comments: Submitted to ICASSP 2026 Conference

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[414] arXiv:2512.02686 [pdf, ps, other]: Title: ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic Data

Authors: Yuxing Liu, Yong Liu

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[415] arXiv:2512.02685 [pdf, ps, other]: Title: Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance

Authors: Huankun Sheng, Ming Li, Yixiang Wei, Yeying Fan, Yu-Hui Wen, Tieliang Gong, Yong-Jin Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[416] arXiv:2512.02681 [pdf, ps, other]: Title: PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution

Authors: Zhongbao Yang, Jiangxin Dong, Yazhou Yao, Jinhui Tang, Jinshan Pan

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[417] arXiv:2512.02668 [pdf, ps, other]: Title: UAUTrack: Towards Unified Multimodal Anti-UAV Visual Tracking

Authors: Qionglin Ren, Dawei Zhang, Chunxu Tian, Dan Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[418] arXiv:2512.02664 [pdf, ps, other]: Title: PolarGuide-GSDR: 3D Gaussian Splatting Driven by Polarization Priors and Deferred Reflection for Real-World Reflective Scenes

Authors: Derui Shan, Qian Qiao, Hao Lu, Tao Du, Peng Lu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419] arXiv:2512.02660 [pdf, ps, other]: Title: Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation

Authors: Agathoklis Georgiou

Comments: 13 pages, 1 figure, 2 tables. Open-source implementation available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[420] arXiv:2512.02650 [pdf, ps, other]: Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Authors: Junwon Lee, Juhan Nam, Jiyoung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[421] arXiv:2512.02648 [pdf, ps, other]: Title: PoreTrack3D: A Benchmark for Dynamic 3D Gaussian Splatting in Pore-Scale Facial Trajectory Tracking

Authors: Dong Li, Jiahao Xiong, Yingda Huang, Le Chang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422] arXiv:2512.02643 [pdf, ps, other]: Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening

Authors: Yongchuan Cui, Peng Liu, Yi Zeng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423] arXiv:2512.02624 [pdf, ps, other]: Title: PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding

Authors: Zheng Huang, Xukai Liu, Tianyu Hu, Kai Zhang, Ye Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[424] arXiv:2512.02622 [pdf, ps, other]: Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence

Authors: Xuming He, Zehao Fan, Hengjia Li, Fan Zhuo, Hankun Xu, Senlin Cheng, Di Weng, Haifeng Liu, Can Ye, Boxi Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425] arXiv:2512.02621 [pdf, ps, other]: Title: Content-Aware Texturing for Gaussian Splatting

Authors: Panagiotis Papantonakis, Georgios Kopanas, Fredo Durand, George Drettakis

Comments: Project Page: this https URL

Journal-ref: Eurographics Symposium on Rendering (Symposium Track), 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[426] arXiv:2512.02576 [pdf, ps, other]: Title: Co-speech Gesture Video Generation via Motion-Based Graph Retrieval

Authors: Yafei Song, Peng Zhang, Bang Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[427] arXiv:2512.02566 [pdf, ps, other]: Title: From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature

Authors: Kun Yuan, Min Woo Sun, Zhen Chen, Alejandro Lozano, Xiangteng He, Shi Li, Nassir Navab, Xiaoxiao Sun, Nicolas Padoy, Serena Yeung-Levy

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[428] arXiv:2512.02554 [pdf, ps, other]: Title: OmniPerson: Unified Identity-Preserving Pedestrian Generation

Authors: Changxiao Ma, Chao Yuan, Xincheng Shi, Yuzhuo Ma, Yongfei Zhang, Longkun Zhou, Yujia Zhang, Shangze Li, Yifan Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429] arXiv:2512.02541 [pdf, ps, other]: Title: AVGGT: Rethinking Global Attention for Accelerating VGGT

Authors: Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, Jianfu Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430] arXiv:2512.02536 [pdf, ps, other]: Title: WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens

Authors: Jian Yang, Dacheng Yin, Xiaoxuan He, Yong Li, Fengyun Rao, Jing Lyu, Wei Zhai, Yang Cao, Zheng-Jun Zha

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[431] arXiv:2512.02520 [pdf, ps, other]: Title: On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection

Authors: Tai Le-Gia

Comments: PhD Dissertation

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[432] arXiv:2512.02517 [pdf, ps, other]: Title: SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts

Authors: Jiaqi Liu, Ronghao Fu, Lang Sun, Haoran Liu, Xiao Yang, Weipeng Zhang, Xu Na, Zhuoran Duan, Bo Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433] arXiv:2512.02512 [pdf, ps, other]: Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling

Authors: Aditya Chaudhary, Prachet Dev Singh, Ankit Jha

Comments: Accepted as a Tiny Paper at the 13th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2025), IIT Mandi, India. 3 pages, 1 figure

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[434] arXiv:2512.02505 [pdf, ps, other]: Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding

Authors: Jiaqi Liu, Ronghao Fu, Haoran Liu, Lang Sun, Bo Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[435] arXiv:2512.02498 [pdf, ps, other]: Title: dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Authors: Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, Colin Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[436] arXiv:2512.02497 [pdf, ps, other]: Title: A Large Scale Benchmark for Test Time Adaptation Methods in Medical Image Segmentation

Authors: Wenjing Yu, Shuo Jiang, Yifei Chen, Shuo Chang, Yuanhan Wang, Beining Wu, Jie Dong, Mingxuan Liu, Shenghao Zhu, Feiwei Qin, Changmiao Wang, Qiyuan Tian

Comments: 45 pages, 18 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[437] arXiv:2512.02496 [pdf, ps, other]: Title: Attention-guided reference point shifting for Gaussian-mixture-based partial point set registration

Authors: Mizuki Kikkawa, Tatsuya Yatagawa, Yutaka Ohtake, Hiromasa Suzuki

Comments: 16 pages, 9 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[438] arXiv:2512.02492 [pdf, ps, other]: Title: YingVideo-MV: Music-Driven Multi-Stage Video Generation

Authors: Jiahui Chen, Weida Wang, Runhua Shi, Huan Yang, Chaofan Ding, Zihao Chen

Comments: 18 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439] arXiv:2512.02487 [pdf, ps, other]: Title: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding

Authors: Yerim Jeon, Miso Lee, WonJun Moon, Jae-Pil Heo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[440] arXiv:2512.02485 [pdf, ps, other]: Title: UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making

Authors: Qianhan Feng, Zhongzhen Huang, Yakun Zhu, Xiaofan Zhang, Qi Dou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[441] arXiv:2512.02482 [pdf, ps, other]: Title: G-SHARP: Gaussian Surgical Hardware Accelerated Real-time Pipeline

Authors: Vishwesh Nath, Javier G. Tejero, Ruilong Li, Filippo Filicori, Mahdi Azizian, Sean D. Huver

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442] arXiv:2512.02473 [pdf, ps, other]: Title: WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling

Authors: Yuta Oshima, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[443] arXiv:2512.02469 [pdf, ps, other]: Title: TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution

Authors: Fengli Ran, Xiao Pu, Bo Liu, Xiuli Bi, Bin Xiao

Comments: Accepted in AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444] arXiv:2512.02458 [pdf, ps, other]: Title: Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration

Authors: Zhongyi Cai, Yi Du, Chen Wang, Yu Kong

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445] arXiv:2512.02457 [pdf, ps, other]: Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

Authors: Jianzong Wu, Hao Lian, Dachao Hao, Ye Tian, Qingyu Shi, Biaolong Chen, Hao Jiang, Yunhai Tong

Comments: Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446] arXiv:2512.02456 [pdf, ps, other]: Title: See, Think, Learn: A Self-Taught Multimodal Reasoner

Authors: Sourabh Sharma, Sonam Gupta, Sadbhawna

Comments: Winter Conference on Applications of Computer Vision 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[447] arXiv:2512.02453 [pdf, ps, other]: Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation

Authors: Kerui Chen, Jianrong Zhang, Ming Li, Zhonglong Zheng, Hehe Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448] arXiv:2512.02450 [pdf, ps, other]: Title: HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild

Authors: Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno, Francis Engelmann, Leonidas Guibas

Comments: NeurIPS 2025 (Datasets and Benchmarks Track) Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[449] arXiv:2512.02448 [pdf, ps, other]: Title: nuScenes Revisited: Progress and Challenges in Autonomous Driving

Authors: Whye Kit Fong, Venice Erin Liong, Kok Seang Tan, Holger Caesar

Comments: 18 pages, 17 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[450] arXiv:2512.02447 [pdf, ps, other]: Title: Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors

Authors: Fan Luo, Zeyu Gao, Xinhao Luo, Kai Zhao, Yanfeng Lu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451] arXiv:2512.02441 [pdf, ps, other]: Title: Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation

Authors: Junghwan Park, Woojin Cho, Junhyuk Heo, Darongsae Kwon, Kookjin Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[452] arXiv:2512.02438 [pdf, ps, other]: Title: Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources

Authors: Phuc Pham, Nhu Pham, Ngoc Quoc Ly

Comments: WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[453] arXiv:2512.02437 [pdf, ps, other]: Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework

Authors: Daeyoung Kim

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[454] arXiv:2512.02425 [pdf, ps, other]: Title: WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Authors: Woongyeong Yeo, Kangsan Kim, Jaehong Yoon, Sung Ju Hwang

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[455] arXiv:2512.02423 [pdf, ps, other]: Title: GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Authors: Haolong Yan, Yeqing Shen, Xin Huang, Jia Wang, Kaijun Tan, Zhixuan Liang, Hongxin Li, Zheng Ge, Osamu Yoshie, Si Li, Xiangyu Zhang, Daxin Jiang

Comments: 26 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456] arXiv:2512.02421 [pdf, ps, other]: Title: Generalizing Vision-Language Models with Dedicated Prompt Guidance

Authors: Xinyao Li, Yinjie Min, Hongbo Chen, Zhekai Du, Fengling Li, Jingjing Li

Comments: Accepted to AAAI26

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457] arXiv:2512.02413 [pdf, ps, other]: Title: MitUNet: Enhancing Floor Plan Recognition using a Hybrid Mix-Transformer and U-Net Architecture

Authors: Dmitriy Parashchuk, Alexey Kapshitskiy, Yuriy Karyakin

Comments: 9 pages, 4 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[458] arXiv:2512.02405 [pdf, ps, other]: Title: WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate

Authors: Anoop Cherian, River Doyle, Eyal Ben-Dov, Suhas Lohit, Kuan-Chuan Peng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[459] arXiv:2512.02400 [pdf, ps, other]: Title: Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation

Authors: Wentao Xiang, Haokang Zhang, Tianhang Yang, Zedong Chu, Ruihang Chu, Shichao Xie, Yujian Yuan, Jian Sun, Zhining Gu, Junjie Wang, Xiaolong Wu, Mu Xu, Yujiu Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[460] arXiv:2512.02395 [pdf, ps, other]: Title: Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Authors: Yifan Zhang, Liang Hu, Haofeng Sun, Peiyu Wang, Yichen Wei, Shukang Yin, Jiangbo Pei, Wei Shen, Peng Xia, Yi Peng, Tianyidan Xie, Eric Li, Yang Liu, Xuchen Song, Yahui Zhou

Comments: 21 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461] arXiv:2512.02394 [pdf, ps, other]: Title: Reproducing and Extending RaDelft 4D Radar with Camera-Assisted Labels

Authors: Kejia Hu, Mohammed Alsakabi, John M. Dolan, Ozan K. Tonguz

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[462] arXiv:2512.02392 [pdf, ps, other]: Title: From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking

Authors: Yuqing Shao, Yuchen Yang, Rui Yu, Weilong Li, Xu Guo, Huaicheng Yan, Wei Wang, Xiao Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[463] arXiv:2512.02375 [pdf, ps, other]: Title: On-the-fly Feedback SfM: Online Explore-and-Exploit UAV Photogrammetry with Incremental Mesh Quality-Aware Indicator and Predictive Path Planning

Authors: Liyuan Lou, Wanyun Li, Wentian Gan, Yifei Yu, Tengfei Wang, Xin Wang, Zongqian Zhan

Comments: This work was submitted to IEEE GRSM Journal for consideration. Copyright would be transferred once it gets accepted

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[464] arXiv:2512.02369 [pdf, ps, other]: Title: SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains

Authors: Qingmei Li, Yang Zhang, Peifeng Zhang, Haohuan Fu, Juepeng Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465] arXiv:2512.02368 [pdf, ps, other]: Title: Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention

Authors: Wenyi Xiong, Jian Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[466] arXiv:2512.02364 [pdf, ps, other]: Title: Tackling Tuberculosis: A Comparative Dive into Machine Learning for Tuberculosis Detection

Authors: Daanish Hindustani, Sanober Hindustani, Preston Nguyen

Journal-ref: Vol. 6, No. 1 (2024), Minnesota Undergraduate Research & Academic Journal (MURAJ)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[467] arXiv:2512.02361 [pdf, ps, other]: Title: VACoT: Rethinking Visual Data Augmentation with VLMs

Authors: Zhengzhuo Xu, Chong Sun, SiNan Du, Chen Li, Jing Lyu, Chun Yuan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[468] arXiv:2512.02359 [pdf, ps, other]: Title: WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd Counting

Authors: Bin Li, Daijie Chen, Qi Zhang

Comments: PRCV 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469] arXiv:2512.02351 [pdf, ps, other]: Title: Understanding and Harnessing Sparsity in Unified Multimodal Models

Authors: Shwai He, Chaorui Deng, Ang Li, Shen Yan

Comments: 13 pages, 13 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[470] arXiv:2512.02344 [pdf, ps, other]: Title: A multi-weight self-matching visual explanation for CNNs on SAR images

Authors: Siyuan Sun, Yongping Zhang, Hongcheng Zeng, Yamin Wang, Wei Yang, Wanting Yang, Jie Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471] arXiv:2512.02341 [pdf, ps, other]: Title: TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction

Authors: Fengyi Zhang, Tianjun Zhang, Kasra Khosoussi, Zheng Zhang, Zi Huang, Yadan Luo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[472] arXiv:2512.02339 [pdf, ps, other]: Title: Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision

Authors: Chenshuang Zhang, Kang Zhang, Joon Son Chung, In So Kweon, Junmo Kim, Chengzhi Mao

Comments: Accepted at NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[473] arXiv:2512.02290 [pdf, ps, other]: Title: Enhancing Cross Domain SAR Oil Spill Segmentation via Morphological Region Perturbation and Synthetic Label-to-SAR Generation

Authors: Andre Juarez, Luis Salsavilca, Frida Coaquira, Celso Gonzales

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[474] arXiv:2512.02273 [pdf, ps, other]: Title: Progressive Image Restoration via Text-Conditioned Video Generation

Authors: Peng Kang, Xijun Wang, Yu Yuan

Comments: First two authors contributed equally to this work. Accepted at IEEE ICNC

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[475] arXiv:2512.02268 [pdf, ps, other]: Title: Spatiotemporal Pyramid Flow Matching for Climate Emulation

Authors: Jeremy Andrew Irvin, Jiaqi Han, Zikui Wang, Abdulaziz Alharbi, Yufei Zhao, Nomin-Erdene Bayarsaikhan, Daniele Visioni, Andrew Y. Ng, Duncan Watson-Parris

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[476] arXiv:2512.02258 [pdf, ps, other]: Title: Exploring the Potentials of Spiking Neural Networks for Image Deraining

Authors: Shuang Chen, Tomas Krajnik, Farshad Arvin, Amir Atapour-Abarghouei

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[477] arXiv:2512.02231 [pdf, ps, other]: Title: See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Authors: Le Thien Phuc Nguyen, Zhuoran Yu, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee

Comments: preprint

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[478] arXiv:2512.02224 [pdf, ps, other]: Title: Towards Unified Video Quality Assessment

Authors: Chen Feng, Tianhao Peng, Fan Zhang, David Bull

Comments: 8 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[479] arXiv:2512.02198 [pdf, ps, other]: Title: Multifractal Recalibration of Neural Networks for Medical Imaging Segmentation

Authors: Miguel L. Martins, Miguel T. Coimbra, Francesco Renna

Comments: 30 pages, 9 figures, journal paper

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[480] arXiv:2512.02188 [pdf, ps, other]: Title: RobustSurg: Tackling domain generalisation for out-of-distribution surgical scene segmentation

Authors: Mansoor Ali, Maksim Richards, Gilberto Ochoa-Ruiz, Sharib Ali

Comments: Submitted to Medical Image Analysis

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481] arXiv:2512.02172 [pdf, ps, other]: Title: SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting

Authors: Pranav Asthana, Alex Hanson, Allen Tu, Tom Goldstein, Matthias Zwicker, Amitabh Varshney

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[482] arXiv:2512.02162 [pdf, ps, other]: Title: Mapping of Lesion Images to Somatic Mutations

Authors: Rahul Mehta

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[483] arXiv:2512.02161 [pdf, ps, other]: Title: FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges

Authors: Kevin David Hayes, Micah Goldblum, Vikash Sehwag, Gowthami Somepalli, Ashwinee Panda, Tom Goldstein

Comments: Accepted to NeurIPS 2025 Datasets and Benchmarks Track

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[484] arXiv:2512.02152 [pdf, ps, other]: Title: Context-Enriched Contrastive Loss: Enhancing Presentation of Inherent Sample Connections in Contrastive Learning Framework

Authors: Haojin Deng, Yimin Yang

Comments: 13 pages, 7 figures. Published in IEEE Transactions on Multimedia. Code available at: this https URL

Journal-ref: IEEE Transactions on Multimedia, Vol. 27, pp. 429-441, December 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[485] arXiv:2512.02055 [pdf, ps, other]: Title: Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scale

Authors: Mirela G. Tulbure, Julio Caineta, Mark Broich, Mollie D. Gaines, Philippe Rufin, Leon-Friedrich Thomas, Hamed Alemohammad, Jan Hemmerling, Patrick Hostert

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[486] arXiv:2512.03028 (cross-list from cs.GR) [pdf, ps, other]: Title: SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control

Authors: Yuxuan Mu, Ziyu Zhang, Yi Shi, Minami Matsumoto, Kotaro Imamura, Guy Tevet, Chuan Guo, Michael Taylor, Chang Shu, Pengcheng Xi, Xue Bin Peng

Comments: 14 pages, 9 figures

Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[487] arXiv:2512.02920 (cross-list from cs.LG) [pdf, ps, other]: Title: Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation

Authors: Ziniu Zhang, Minxuan Duan, Haris N. Koutsopoulos, Hongyang R. Zhang

Comments: 17 pages. To appear in KDD'26 Datasets

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
[488] arXiv:2512.02787 (cross-list from cs.RO) [pdf, ps, other]: Title: Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols

Authors: Xianchao Zeng, Xinyu Zhou, Youcheng Li, Jiayou Shi, Tianle Li, Liangming Chen, Lei Ren, Yong-Lu Li

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[489] arXiv:2512.02719 (cross-list from cs.CL) [pdf, ps, other]: Title: Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

Authors: Julian Ma, Jun Wang, Zafeirios Fountas

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[490] arXiv:2512.02651 (cross-list from cs.HC) [pdf, ps, other]: Title: Real-Time Multimodal Data Collection Using Smartwatches and Its Visualization in Education

Authors: Alvaro Becerra, Pablo Villegas, Ruth Cobos

Comments: Accepted in Technological Ecosystems for Enhancing Multiculturality (TEEM) 2025

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
[491] arXiv:2512.02636 (cross-list from cs.LG) [pdf, ps, other]: Title: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models

Authors: Xinyue Ai, Yutong He, Albert Gu, Ruslan Salakhutdinov, J Zico Kolter, Nicholas Matthew Boffi, Max Simchowitz

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[492] arXiv:2512.02609 (cross-list from cs.RO) [pdf, ps, other]: Title: SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction

Authors: Shengkai Wu, Jinrong Yang, Wenqiu Luo, Linfeng Gao, Chaohui Shang, Meiyu Zhi, Mingshan Sun, Fangping Yang, Liangliang Ren, Yong Zhao

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[493] arXiv:2512.02340 (cross-list from cs.AI) [pdf, ps, other]: Title: Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective

Authors: Qiyao Xue, Weichen Liu, Shiqi Wang, Haoming Wang, Yuyang Wu, Wei Gao

Comments: 23 pages, 37 figures

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[494] arXiv:2512.02306 (cross-list from cs.AI) [pdf, ps, other]: Title: OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning

Authors: Boyu Zhu, Xiaofei Wen, Wenjie Jacky Mo, Tinghui Zhu, Yanan Xie, Peng Qi, Muhao Chen

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[495] arXiv:2512.02293 (cross-list from cs.RO) [pdf, ps, other]: Title: VIGS-SLAM: Visual Inertial Gaussian Splatting SLAM

Authors: Zihan Zhu, Wei Zhang, Norbert Haala, Marc Pollefeys, Daniel Barath

Comments: Project page: this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[496] arXiv:2512.02280 (cross-list from cs.AI) [pdf, ps, other]: Title: Bridging the Gap: Toward Cognitive Autonomy in Artificial Intelligence

Authors: Noorbakhsh Amiri Golilarz, Sindhuja Penchala, Shahram Rahimi

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[497] arXiv:2512.02243 (cross-list from cs.CR) [pdf, ps, other]: Title: PhishSnap: Image-Based Phishing Detection Using Perceptual Hashing

Authors: Md Abdul Ahad Minhaz, Zannatul Zahan Meem, Md. Shohrab Hossain

Comments: IEEE Standard Formatting, 3 pages, 3 figures

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[498] arXiv:2512.02143 (cross-list from cs.GR) [pdf, ps, other]: Title: CoatFusion: Controllable Material Coating in Images

Authors: Sagie Levy, Elad Aharoni, Matan Levy, Ariel Shamir, Dani Lischinski

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[499] arXiv:2512.02088 (cross-list from eess.IV) [pdf, ps, other]: Title: Comparing Baseline and Day-1 Diffusion MRI Using Multimodal Deep Embeddings for Stroke Outcome Prediction

Authors: Sina Raeisadigh, Myles Joshua Toledo Tan, Henning Müller, Abderrahmane Hedjoudje

Comments: 5 pages, 5 figures, 2 tables

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[500] arXiv:2512.02062 (cross-list from cs.CR) [pdf, ps, other]: Title: Superpixel Attack: Enhancing Black-box Adversarial Attack with Image-driven Division Areas

Authors: Issa Oe, Keiichiro Yamamura, Hiroki Ishikura, Ryo Hamahira, Katsuki Fujisawa

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Tue, 2 Dec 2025

[501] arXiv:2512.02018 [pdf, ps, other]: Title: Data-Centric Visual Development for Self-Driving Labs

Authors: Anbang Liu, Guanzhong Hu, Jiayi Wang, Ping Guo, Han Liu

Comments: 11 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[502] arXiv:2512.02017 [pdf, ps, other]: Title: Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion

Authors: Shaowei Liu, David Yifan Yao, Saurabh Gupta, Shenlong Wang

Comments: Accepted to NeurIPS 2025. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[503] arXiv:2512.02016 [pdf, ps, other]: Title: Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now

Authors: Varun Varma Thozhiyoor, Shivam Tripathi, Venkatesh Babu Radhakrishnan, Anand Bhattad

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[504] arXiv:2512.02015 [pdf, ps, other]: Title: Generative Video Motion Editing with 3D Point Tracks

Authors: Yao-Chih Lee, Zhoutong Zhang, Jiahui Huang, Jui-Hsien Wang, Joon-Young Lee, Jia-Bin Huang, Eli Shechtman, Zhengqi Li

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[505] arXiv:2512.02014 [pdf, ps, other]: Title: TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Authors: Zhiheng Liu, Weiming Ren, Haozhe Liu, Zijian Zhou, Shoufa Chen, Haonan Qiu, Xiaoke Huang, Zhaochong An, Fanny Yang, Aditya Patel, Viktar Atliha, Tony Ng, Xiao Han, Chuyan Zhu, Chenyang Zhang, Ding Liu, Juan-Manuel Perez-Rua, Sen He, Jürgen Schmidhuber, Wenhu Chen, Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren Cong

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[506] arXiv:2512.02012 [pdf, ps, other]: Title: Improved Mean Flows: On the Challenges of Fastforward Generative Models

Authors: Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, Kaiming He

Comments: Technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[507] arXiv:2512.02009 [pdf, ps, other]: Title: AirSim360: A Panoramic Simulation Platform within Drone View

Authors: Xian Ge, Yuling Pan, Yuhang Zhang, Xiang Li, Weijun Zhang, Dizhe Zhang, Zhaoliang Wan, Xin Lin, Xiangkai Zhang, Juntao Liang, Jason Li, Wenjie Jiang, Bo Du, Ming-Hsuan Yang, Lu Qi

Comments: Project Website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[508] arXiv:2512.02006 [pdf, ps, other]: Title: MV-TAP: Tracking Any Point in Multi-View Videos

Authors: Jahyeok Koo, Inès Hyeonsu Kim, Mungyeom Kim, Junghyun Park, Seohyun Park, Jaeyeong Kim, Jung Yi, Seokju Cho, Seungryong Kim

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[509] arXiv:2512.02005 [pdf, ps, other]: Title: Learning Visual Affordance from Audio

Authors: Lidong Lu, Guo Chen, Zhu Wei, Yicheng Liu, Tong Lu

Comments: 15 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[510] arXiv:2512.01989 [pdf, ps, other]: Title: PAI-Bench: A Comprehensive Benchmark For Physical AI

Authors: Fengzhe Zhou, Jiannan Huang, Jialuo Li, Deva Ramanan, Humphrey Shi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[511] arXiv:2512.01988 [pdf, ps, other]: Title: Artemis: Structured Visual Reasoning for Perception Policy Learning

Authors: Wei Tang, Yanpeng Sun, Shan Zhang, Xiaofan Li, Piotr Koniusz, Wei Li, Na Zhao, Zechao Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[512] arXiv:2512.01975 [pdf, ps, other]: Title: SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning

Authors: Xu Zhang, Jin Yuan, Hanwang Zhang, Guojin Zhong, Yongsheng Zang, Jiacheng Lin, Zhiyong Li

Comments: Accepted by AAAI-2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[513] arXiv:2512.01960 [pdf, ps, other]: Title: SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation

Authors: Zisu Li, Hengye Lyu, Jiaxin Shi, Yufeng Zeng, Mingming Fan, Hanwang Zhang, Chen Liang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[514] arXiv:2512.01952 [pdf, ps, other]: Title: GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment

Authors: Haoyang He, Jay Patrikar, Dong-Ki Kim, Max Smith, Daniel McGann, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei, Sebastian Scherer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[515] arXiv:2512.01949 [pdf, ps, other]: Title: Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

Authors: Zhongyu Yang, Dannong Xu, Wei Pang, Yingfang Yuan

Comments: Published in Transactions on Machine Learning Research. Project at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[516] arXiv:2512.01934 [pdf, ps, other]: Title: Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory

Authors: Chenyi Wang, Yanmao Man, Raymond Muller, Ming Li, Z. Berkay Celik, Ryan Gerdes, Jonathan Petit

Comments: Accepted to Annual Computer Security Applications Conference (ACSAC) 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[517] arXiv:2512.01922 [pdf, ps, other]: Title: Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding

Authors: Zahra Mahdavi, Zahra Khodakaramimaghsoud, Hooman Khaloo, Sina Bakhshandeh Taleshani, Erfan Hashemi, Javad Mirzapour Kaleybar, Omid Nejati Manzari

Journal-ref: Computers in Biology and Medicine (2026)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[518] arXiv:2512.01908 [pdf, ps, other]: Title: SARL: Spatially-Aware Self-Supervised Representation Learning for Visuo-Tactile Perception

Authors: Gurmeher Khurana, Lan Wei, Dandan Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519] arXiv:2512.01895 [pdf, ps, other]: Title: StyleYourSmile: Cross-Domain Face Retargeting Without Paired Multi-Style Data

Authors: Avirup Dey, Vinay Namboodiri

Comments: 15 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[520] arXiv:2512.01889 [pdf, ps, other]: Title: KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM

Authors: Zaid Nasser, Mikhail Iumanov, Tianhao Li, Maxim Popov, Jaafar Mahmoud, Malik Mohrat, Ilya Obrubov, Ekaterina Derevyanka, Ivan Sosin, Sergey Kolyubin

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[521] arXiv:2512.01885 [pdf, ps, other]: Title: TransientTrack: Advanced Multi-Object Tracking and Classification of Cancer Cells with Transient Fluorescent Signals

Authors: Florian Bürger, Martim Dias Gomes, Nica Gutu, Adrián E. Granada, Noémie Moreau, Katarzyna Bozek

Comments: 13 pages, 7 figures, 2 tables. This work has been submitted to IEEE Transactions on Medical Imaging

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cell Behavior (q-bio.CB); Quantitative Methods (q-bio.QM)
[522] arXiv:2512.01853 [pdf, ps, other]: Title: COACH: Collaborative Agents for Contextual Highlighting -- A Multi-Agent Framework for Sports Video Analysis

Authors: Tsz-To Wong, Ching-Chun Huang, Hong-Han Shuai

Comments: Accepted by AAAI 2026 Workshop LaMAS

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[523] arXiv:2512.01850 [pdf, ps, other]: Title: Register Any Point: Scaling 3D Point Cloud Registration by Flow Matching

Authors: Yue Pan, Tao Sun, Liyuan Zhu, Lucas Nunes, Iro Armeni, Jens Behley, Cyrill Stachniss

Comments: 22 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[524] arXiv:2512.01843 [pdf, ps, other]: Title: PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

Authors: Zeqing Wang, Keze Wang, Lei Zhang

Comments: 17 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525] arXiv:2512.01830 [pdf, ps, other]: Title: OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

Authors: Songyan Zhang, Wenhui Huang, Zhan Chen, Chua Jiahao Collister, Qihang Huang, Chen Lv

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526] arXiv:2512.01827 [pdf, ps, other]: Title: CauSight: Learning to Supersense for Visual Causal Discovery

Authors: Yize Zhang, Meiqi Chen, Sirui Chen, Bo Peng, Yanxi Zhang, Tianyu Li, Chaochao Lu

Comments: project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[527] arXiv:2512.01821 [pdf, ps, other]: Title: Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling

Authors: Meng Cao, Haokun Lin, Haoyuan Li, Haoran Tang, Rongtao Xu, Dong An, Xue Liu, Ian Reid, Xiaodan Liang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[528] arXiv:2512.01816 [pdf, ps, other]: Title: Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

Authors: Juanxi Tian, Siyuan Li, Conghui He, Lijun Wu, Cheng Tan

Comments: 35 pages, 12 figures, 10 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[529] arXiv:2512.01803 [pdf, ps, other]: Title: Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos

Authors: Xavier Thomas, Youngsun Lim, Ananya Srinivasan, Audrey Zheng, Deepti Ghadiyaram

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[530] arXiv:2512.01789 [pdf, ps, other]: Title: SAM3-UNet: Simplified Adaptation of Segment Anything Model 3

Authors: Xinyu Xiong, Zihuang Wu, Lei Lu, Yufa Xia

Comments: Technical Report

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[531] arXiv:2512.01788 [pdf, ps, other]: Title: Learned Image Compression for Earth Observation: Implications for Downstream Segmentation Tasks

Authors: Christian Mollière, Iker Cumplido, Marco Zeulner, Lukas Liesenhoff, Matthias Schubert, Julia Gottfriedsen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[532] arXiv:2512.01774 [pdf, ps, other]: Title: Evaluating SAM2 for Video Semantic Segmentation

Authors: Syed Hesham Syed Ariff, Yun Liu, Guolei Sun, Jing Yang, Henghui Ding, Xue Geng, Xudong Jiang

Comments: 17 pages, 3 figures and 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[533] arXiv:2512.01771 [pdf, ps, other]: Title: Robust Rigid and Non-Rigid Medical Image Registration Using Learnable Edge Kernels

Authors: Ahsan Raza Siyal, Markus Haltmeier, Ruth Steiger, Malik Galijasevic, Elke Ruth Gizewski, Astrid Ellen Grams

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[534] arXiv:2512.01769 [pdf, ps, other]: Title: VideoScoop: A Non-Traditional Domain-Independent Framework For Video Analysis

Authors: Hafsa Billah

Comments: This is a report submitted as part of PhD proposal defense of Hafsa Billah

Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
[535] arXiv:2512.01763 [pdf, ps, other]: Title: HiconAgent: History Context-aware Policy Optimization for GUI Agents

Authors: Xurui Zhou, Gongwei Chen, Yuquan Xie, Zaijing Li, Kaiwen Zhou, Shuai Wang, Shuo Yang, Zhuotao Tian, Rui Shao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[536] arXiv:2512.01755 [pdf, ps, other]: Title: FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing

Authors: Yucheng Liao, Jiajun Liang, Kaiqian Cui, Baoquan Zhao, Haoran Xie, Wei Liu, Qing Li, Xudong Mao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[537] arXiv:2512.01707 [pdf, ps, other]: Title: StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Authors: Daeun Lee, Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Mohit Bansal

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[538] arXiv:2512.01701 [pdf, ps, other]: Title: SSR: Semantic and Spatial Rectification for CLIP-based Weakly Supervised Segmentation

Authors: Xiuli Bi, Die Xiao, Junchao Fan, Bin Xiao

Comments: Accepted in AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[539] arXiv:2512.01686 [pdf, ps, other]: Title: DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models

Authors: Patrick Kwon, Chen Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540] arXiv:2512.01681 [pdf, ps, other]: Title: Cross-Domain Validation of a Resection-Trained Self-Supervised Model on Multicentre Mesothelioma Biopsies

Authors: Farzaneh Seyedshahi, Francesca Damiola, Sylvie Lantuejoul, Ke Yuan, John Le Quesne

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[541] arXiv:2512.01677 [pdf, ps, other]: Title: Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation

Authors: Haodong Yan, Hang Yu, Zhide Zhong, Weilin Yuan, Xin Gong, Zehang Luo, Chengxi Heyu, Junfeng Li, Wenxuan Song, Shunbo Zhou, Haoang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[542] arXiv:2512.01675 [pdf, ps, other]: Title: GRASP: Guided Residual Adapters with Sample-wise Partitioning

Authors: Felix Nützel, Mischa Dombrowski, Bernhard Kainz

Comments: 10 pages, 4 figures, 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[543] arXiv:2512.01665 [pdf, ps, other]: Title: Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery

Authors: Zhicheng Zhao, Yin Huang, Lingma Sun, Chenglong Li, Jin Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[544] arXiv:2512.01657 [pdf, ps, other]: Title: DB-KAUNet: An Adaptive Dual Branch Kolmogorov-Arnold UNet for Retinal Vessel Segmentation

Authors: Hongyu Xu, Panpan Meng, Meng Wang, Dayu Hu, Liming Liang, Xiaoqi Sheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545] arXiv:2512.01643 [pdf, ps, other]: Title: ViT$^3$: Unlocking Test-Time Training in Vision

Authors: Dongchen Han, Yining Li, Tianyu Li, Zixuan Cao, Ziming Wang, Jun Song, Yu Cheng, Bo Zheng, Gao Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[546] arXiv:2512.01636 [pdf, ps, other]: Title: Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval

Authors: Xin Wang, Haipeng Zhang, Mang Li, Zhaohui Xia, Yueguo Chen, Yu Zhang, Chunyu Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[547] arXiv:2512.01629 [pdf, ps, other]: Title: SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge

Authors: Yumeng He, Ying Jiang, Jiayin Lu, Yin Yang, Chenfanfu Jiang

Comments: Project page: this https URL. 17 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[548] arXiv:2512.01611 [pdf, ps, other]: Title: Depth Matching Method Based on ShapeDTW for Oil-Based Mud Imager

Authors: Fengfeng Li, Zhou Feng, Hongliang Wu, Hao Zhang, Han Tian, Peng Liu, Lixin Yuan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph)
[549] arXiv:2512.01589 [pdf, ps, other]: Title: Toward Content-based Indexing and Retrieval of Head and Neck CT with Abscess Segmentation

Authors: Thao Thi Phuong Dao, Tan-Cong Nguyen, Trong-Le Do, Truong Hoang Viet, Nguyen Chi Thanh, Huynh Nguyen Thuan, Do Vo Cong Nguyen, Minh-Khoi Pham, Mai-Khiem Tran, Viet-Tham Huynh, Trong-Thuan Nguyen, Trung-Nghia Le, Vo Thanh Toan, Tam V. Nguyen, Minh-Triet Tran, Thanh Dinh Le

Comments: The 2025 IEEE International Conference on Content-Based Multimedia Indexing (IEEE CBMI)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[550] arXiv:2512.01582 [pdf, ps, other]: Title: RoleMotion: A Large-Scale Dataset towards Robust Scene-Specific Role-Playing Motion Synthesis with Fine-grained Descriptions

Authors: Junran Peng, Yiheng Huang, Silei Shen, Zeji Wei, Jingwei Yang, Baojie Wang, Yonghao He, Chuanchen Luo, Man Zhang, Xucheng Yin, Wei Sui

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[551] arXiv:2512.01563 [pdf, ps, other]: Title: MasHeNe: A Benchmark for Head and Neck CT Mass Segmentation using Window-Enhanced Mamba with Frequency-Domain Integration

Authors: Thao Thi Phuong Dao, Tan-Cong Nguyen, Nguyen Chi Thanh, Truong Hoang Viet, Trong-Le Do, Mai-Khiem Tran, Minh-Khoi Pham, Trung-Nghia Le, Minh-Triet Tran, Thanh Dinh Le

Comments: The 14th International Symposium on Information and Communication Technology Conference SoICT 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[552] arXiv:2512.01540 [pdf, ps, other]: Title: FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention

Authors: Zipeng Wang, Dan Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[553] arXiv:2512.01534 [pdf, ps, other]: Title: Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis

Authors: Alexander Frotscher, Christian F. Baumgartner, Thomas Wolfers

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[554] arXiv:2512.01533 [pdf, ps, other]: Title: Diffusion Fuzzy System: Fuzzy Rule Guided Latent Multi-Path Diffusion Modeling

Authors: Hailong Yang, Te Zhang, Kup-sze Choi, Zhaohong Deng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[555] arXiv:2512.01519 [pdf, ps, other]: Title: QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic Interactions

Authors: Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban

Subjects: Computer Vision and Pattern Recognition (cs.CV); Materials Science (cond-mat.mtrl-sci); Quantum Physics (quant-ph)
[556] arXiv:2512.01510 [pdf, ps, other]: Title: Semantic-aware Random Convolution and Source Matching for Domain Generalization in Medical Image Segmentation

Authors: Franz Thaler, Martin Urschler, Mateusz Kozinski, Matthias AF Gsell, Gernot Plank, Darko Stern

Comments: Preprint submitted to Computer Methods and Programs in Biomedicine (currently under revision)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[557] arXiv:2512.01495 [pdf, ps, other]: Title: ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark

Authors: Joanne Lin, Ruirui Lin, Yini Li, David Bull, Nantheera Anantrasirichai

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[558] arXiv:2512.01494 [pdf, other]: Title: A variational method for curve extraction with curvature-dependent energies

Authors: Majid Arthaud (ENPC, MOKAPLAN, UMich), Antonin Chambolle (CEREMADE, MOKAPLAN), Vincent Duval (MOKAPLAN)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[559] arXiv:2512.01481 [pdf, ps, other]: Title: ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling

Authors: Qisen Wang, Yifan Zhao, Peisen Shen, Jialu Li, Jia Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[560] arXiv:2512.01478 [pdf, ps, other]: Title: CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for Basketball

Authors: Omer Sela (1 and 2), Michael Chertok (1), Lior Wolf (2) ((1) Amazon, (2) Tel Aviv University)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[561] arXiv:2512.01444 [pdf, ps, other]: Title: FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar Animation

Authors: Jian Shu, Nanjie Yao, Gangjian Zhang, Junlong Ren, Yu Feng, Hao Wang

Comments: 9 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[562] arXiv:2512.01427 [pdf, ps, other]: Title: Language-Guided Open-World Anomaly Segmentation

Authors: Klara Reichard, Nikolas Brasch, Nassir Navab, Federico Tombari

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[563] arXiv:2512.01426 [pdf, ps, other]: Title: ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers

Authors: Yiyang Ma, Feng Zhou, Xuedan Yin, Pu Cao, Yonghao Dang, Jianqin Yin

Comments: 8 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[564] arXiv:2512.01424 [pdf, ps, other]: Title: ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models

Authors: Xusen Hei, Jiali Chen, Jinyu Yang, Mengchen Zhao, Yi Cai

Comments: 22 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[565] arXiv:2512.01422 [pdf, ps, other]: Title: MDiff4STR: Mask Diffusion Model for Scene Text Recognition

Authors: Yongkun Du, Miaomiao Zhao, Songlin Fan, Zhineng Chen, Caiyan Jia, Yu-Gang Jiang

Comments: Accepted by AAAI 2026 (Oral)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[566] arXiv:2512.01419 [pdf, ps, other]: Title: Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries

Authors: Tushar Pranav, Eshan Pandey, Austria Lyka Diane Bala, Aman Chadha, Indriyati Atmosukarto, Donny Soh Cheng Lock

Comments: 14 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[567] arXiv:2512.01390 [pdf, ps, other]: Title: FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution

Authors: Seungho Choi, Jeahun Sung, Jihyong Oh

Comments: Please visit our project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[568] arXiv:2512.01383 [pdf, ps, other]: Title: PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications

Authors: Yunze Liu, Zifan Wang, Peiran Wu, Jiayang Ao

Comments: Accepted by WACV2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569] arXiv:2512.01382 [pdf, ps, other]: Title: Reversible Inversion for Training-Free Exemplar-guided Image Editing

Authors: Yuke Li, Lianli Gao, Ji Zhang, Pengpeng Zeng, Lichuan Xiang, Hongkai Wen, Heng Tao Shen, Jingkuan Song

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570] arXiv:2512.01380 [pdf, ps, other]: Title: Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry Network

Authors: Tianyu Luan, Xuelu Feng, Zixin Zhu, Phani Nuney, Sheng Liu, Xuan Gong, David Doermann, Chunming Qiao, Junsong Yuan

Comments: Accepted by AAAI26

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571] arXiv:2512.01373 [pdf, ps, other]: Title: SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation

Authors: Sheng Liu, Tianyu Luan, Phani Nuney, Xuelu Feng, Junsong Yuan

Comments: Accepted by AAAI2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572] arXiv:2512.01366 [pdf, ps, other]: Title: BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud

Authors: Yunzhe Li, Jiajun Yan, Yuzhou Wei, Kechen Liu, Yize Zhao, Chong Zhang, Hongzi Zhu, Li Lu, Shan Chang, Minyi Guo

Comments: This is the author-accepted version of the paper published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Vol. 9, No. 4, Article 191, 2025. Final published version: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[573] arXiv:2512.01352 [pdf, ps, other]: Title: OpenBox: Annotate Any Bounding Boxes in 3D

Authors: In-Jae Lee, Mungyeom Kim, Kwonyoung Ryu, Pierre Musacchio, Jaesik Park

Comments: Accepted by NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[574] arXiv:2512.01348 [pdf, ps, other]: Title: Handwritten Text Recognition for Low Resource Languages

Authors: Sayantan Dey, Alireza Alaei, Partha Pratim Roy

Comments: 21 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575] arXiv:2512.01342 [pdf, ps, other]: Title: InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

Authors: Chenting Wang, Yuhan Zhu, Yicheng Xu, Jiange Yang, Ziang Yan, Yali Wang, Yi Wang, Limin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[576] arXiv:2512.01340 [pdf, ps, other]: Title: EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans

Authors: Yingjie Zhou, Xilei Zhu, Siyu Ren, Ziyi Zhao, Ziwen Wang, Farong Wen, Yu Zhou, Jiezhang Cao, Xiongkuo Min, Fengjiao Chen, Xiaoyu Li, Xuezhi Cao, Guangtao Zhai, Xiaohong Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577] arXiv:2512.01334 [pdf, ps, other]: Title: AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation

Authors: Yexin Liu, Wen-Jie Shu, Zile Huang, Haoze Zheng, Yueze Wang, Manyuan Zhang, Ser-Nam Lim, Harry Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[578] arXiv:2512.01333 [pdf, ps, other]: Title: Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI

Authors: A S M Ahsanul Sarkar Akib, Raduana Khawla, Abdul Hasib

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[579] arXiv:2512.01319 [pdf, ps, other]: Title: Rethinking Intracranial Aneurysm Vessel Segmentation: A Perspective from Computational Fluid Dynamics Applications

Authors: Feiyang Xiao, Yichi Zhang, Xigui Li, Yuanye Zhou, Chen Jiang, Xin Guo, Limei Han, Yuxin Li, Fengping Zhu, Yuan Cheng

Comments: 18 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[580] arXiv:2512.01315 [pdf, ps, other]: Title: FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection

Authors: Ashish Vashist, Qiranul Saadiyean, Suresh Sundaram, Chandra Sekhar Seelamantula

Comments: 8 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[581] arXiv:2512.01314 [pdf, ps, other]: Title: TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance

Authors: Pei Yang, Yepeng Liu, Kelly Peng, Yuan Gao, Yiren Song

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[582] arXiv:2512.01312 [pdf, ps, other]: Title: IVCR-200K: A Large-Scale Multi-turn Dialogue Benchmark for Interactive Video Corpus Retrieval

Authors: Ning Han, Yawen Zeng, Shaohua Long, Chengqing Li, Sijie Yang, Dun Tan, Jianfeng Dong, Jingjing Chen

Comments: Accepted by SIGIR2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[583] arXiv:2512.01310 [pdf, ps, other]: Title: Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging - A Study on Pretraining for Age Prediction

Authors: Yanteng Zhang, Songheng Li, Zeyu Shen, Qizhen Lan, Lipei Zhang, Yang Liu, Vince Calhoun

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584] arXiv:2512.01306 [pdf, ps, other]: Title: Gaussian Swaying: Surface-Based Framework for Aerodynamic Simulation with 3D Gaussians

Authors: Hongru Yan, Xiang Zhang, Zeyuan Chen, Fangyin Wei, Zhuowen Tu

Comments: Accepted to WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[585] arXiv:2512.01302 [pdf, ps, other]: Title: DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy

Authors: Jaewoo Song, Jooyoung Choi, Kanghyun Baek, Sangyub Lee, Daemin Park, Sungroh Yoon

Comments: Accepted to WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586] arXiv:2512.01298 [pdf, ps, other]: Title: TBT-Former: Learning Temporal Boundary Distributions for Action Localization

Authors: Thisara Rathnayaka, Uthayasanker Thayasivam

Comments: 8 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[587] arXiv:2512.01296 [pdf, ps, other]: Title: EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly

Authors: Xiaokun Pan, Zhenzhe Li, Zhichao Ye, Hongjia Zhai, Guofeng Zhang

Comments: SIGGRAPH ASIA 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[588] arXiv:2512.01292 [pdf, ps, other]: Title: Diffusion Model in Latent Space for Medical Image Segmentation Task

Authors: Huynh Trinh Ngoc, Toan Nguyen Hai, Ba Luong Son, Long Tran Quoc

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[589] arXiv:2512.01291 [pdf, ps, other]: Title: Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AI

Authors: Kamal Basha S, Athira Nambiar

Comments: Accepted to CVIP 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[590] arXiv:2512.01273 [pdf, ps, other]: Title: nnMobileNet++: Towards Efficient Hybrid Networks for Retinal Image Analysis

Authors: Xin Li, Wenhui Zhu, Xuanzhao Dong, Hao Wang, Yujian Xiong, Oana Dumitrascu, Yalin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[591] arXiv:2512.01268 [pdf, ps, other]: Title: ViscNet: Vision-Based In-line Viscometry for Fluid Mixing Process

Authors: Jongwon Sohn, Juhyeon Moon, Hyunjoon Jung, Jaewook Nam

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[592] arXiv:2512.01248 [pdf, ps, other]: Title: TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

Authors: Junyuan Zhang, Bin Wang, Qintong Zhang, Fan Wu, Zichen Wen, Jialin Lu, Junjie Shan, Ziqi Zhao, Shuya Yang, Ziling Wang, Ziyang Miao, Huaping Zhong, Yuhang Zang, Xiaoyi Dong, Ka-Ho Chow, Conghui He

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[593] arXiv:2512.01242 [pdf, ps, other]: Title: Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation

Authors: Zirui Zhao, Boye Niu, David Hsu, Wee Sun Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[594] arXiv:2512.01236 [pdf, ps, other]: Title: PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards

Authors: Shulei Wang, Longhui Wei, Xin He, Jianbo Ouyang, Hui Lu, Zhou Zhao, Qi Tian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[595] arXiv:2512.01223 [pdf, ps, other]: Title: S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance

Authors: Beining Xu, Siting Zhu, Zhao Jin, Junxian Li, Hesheng Wang

Comments: 18 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[596] arXiv:2512.01214 [pdf, ps, other]: Title: M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis

Authors: Hang Wu, Ke Sun, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

Comments: 12 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[597] arXiv:2512.01213 [pdf, ps, other]: Title: Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations

Authors: Yangbangyan Jiang, Qianqian Xu, Huiyang Shao, Zhiyong Yang, Shilong Bao, Xiaochun Cao, Qingming Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[598] arXiv:2512.01204 [pdf, ps, other]: Title: TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image

Authors: Ziqian Wang, Yonghao He, Licheng Yang, Wei Zou, Hongxuan Ma, Liu Liu, Wei Sui, Yuxin Guo, Hu Su

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[599] arXiv:2512.01178 [pdf, ps, other]: Title: VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering

Authors: Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi

Comments: arXiv admin note: text overlap with arXiv:2404.00149

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[600] arXiv:2512.01165 [pdf, ps, other]: Title: Real-Time On-the-Go Annotation Framework Using YOLO for Automated Dataset Generation

Authors: Mohamed Abdallah Salem (1), Ahmed Harb Rabia (1) ((1) North Dakota State University)

Comments: Copyright 2025 IEEE. This is the author's version of the work that has been accepted for publication in Proceedings of the 5. Interdisciplinary Conference on Electrics and Computer (INTCEC 2025) 15-16 September 2025, Chicago-USA. The final version of record is available at: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[601] arXiv:2512.01153 [pdf, ps, other]: Title: DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling

Authors: Han-Jin Lee, Han-Ju Lee, Jin-Seong Kim, Seok-Hwan Choi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[602] arXiv:2512.01148 [pdf, ps, other]: Title: SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language Models

Authors: Hamza Tahboub, Weiyan Shi, Gang Hua, Huaizu Jiang

Comments: 22 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[603] arXiv:2512.01145 [pdf, ps, other]: Title: Weakly Supervised Continuous Micro-Expression Intensity Estimation Using Temporal Deep Neural Network

Authors: Riyadh Mohammed Almushrafy (Majmaah University, Saudi Arabia)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[604] arXiv:2512.01128 [pdf, ps, other]: Title: OmniFD: A Unified Model for Versatile Face Forgery Detection

Authors: Haotian Liu, Haoyu Chen, Chenhui Pan, You Hu, Guoying Zhao, Xiaobai Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[605] arXiv:2512.01116 [pdf, ps, other]: Title: Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Authors: Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

Comments: 37 pages, 14 Figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[606] arXiv:2512.01103 [pdf, ps, other]: Title: Learning Eigenstructures of Unstructured Data Manifolds

Authors: Roy Velich, Arkadi Piven, David Bensaïd, Daniel Cremers, Thomas Dagès, Ron Kimmel

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[607] arXiv:2512.01095 [pdf, ps, other]: Title: CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

Authors: Simon Kohaut, Daniel Ochs, Shun Zhang, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[608] arXiv:2512.01094 [pdf, ps, other]: Title: Accelerating Inference of Masked Image Generators via Reinforcement Learning

Authors: Pranav Subbaraman, Shufan Li, Siyan Zhao, Aditya Grover

Comments: 15 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[609] arXiv:2512.01085 [pdf, ps, other]: Title: Generalized Medical Phrase Grounding

Authors: Wenjun Zhang, Shekhar S. Chandra, Aaron Nicolson

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[610] arXiv:2512.01059 [pdf, ps, other]: Title: Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction

Authors: Anantha Padmanaban Krishna Kumar (Boston University)

Comments: 7 pages total (6 pages main text, 1 page references), 1 figure, 2 tables. Code available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[611] arXiv:2512.01048 [pdf, ps, other]: Title: TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models

Authors: Maya Varma, Jean-Benoit Delbrouck, Sophie Ostmeier, Akshay Chaudhari, Curtis Langlotz

Comments: NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[612] arXiv:2512.01030 [pdf, ps, other]: Title: Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

Authors: Jing He, Haodong Li, Mingzhi Sheng, Ying-Cong Chen

Comments: Work done at the Hong Kong University of Science and Technology (Guangzhou). Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[613] arXiv:2512.01008 [pdf, ps, other]: Title: LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View Consistency

Authors: Zhongbin Guo, Jiahe Liu, Wenyu Gao, Yushan Li, Chengzhi Li, Ping Jian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[614] arXiv:2512.00999 [pdf, ps, other]: Title: Provenance-Driven Reliable Semantic Medical Image Vector Reconstruction via Lightweight Blockchain-Verified Latent Fingerprints

Authors: Mohsin Rasheed, Abdullah Al-Mamun

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[615] arXiv:2512.00995 [pdf, ps, other]: Title: S2AM3D: Scale-controllable Part Segmentation of 3D Point Cloud

Authors: Han Su, Tianyu Huang, Zichen Wan, Xiaohe Wu, Wangmeng Zuo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[616] arXiv:2512.00993 [pdf, ps, other]: Title: PhotoFramer: Multi-modal Image Composition Instruction

Authors: Zhiyuan You, Ke Wang, He Zhang, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong, Zhoutong Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[617] arXiv:2512.00975 [pdf, ps, other]: Title: MM-ACT: Learn from Multimodal Parallel Generation to Act

Authors: Haotian Liang, Xinyi Chen, Bin Wang, Mingkang Chen, Yitian Liu, Yuhao Zhang, Zanxin Chen, Tianshuo Yang, Yilun Chen, Jiangmiao Pang, Dong Liu, Xiaokang Yang, Yao Mu, Wenqi Shao, Ping Luo

Comments: 17 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[618] arXiv:2512.00960 [pdf, ps, other]: Title: Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction

Authors: Boran Wen, Ye Lu, Keyan Wan, Sirui Wang, Jiahong Zhou, Junxuan Liang, Xinpeng Liu, Bang Xiao, Dingbang Huang, Ruiyang Liu, Yong-Lu Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[619] arXiv:2512.00953 [pdf, ps, other]: Title: Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval

Authors: Haojian Huang, Kaijing Ma, Jin Chen, Haodong Chen, Zhou Wu, Xianghao Zang, Han Fang, Chao Ban, Hao Sun, Mulin Chen, Zhongjiang He

Comments: Accepted by AAAI 2026, 10 pages, 9 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[620] arXiv:2512.00944 [pdf, ps, other]: Title: Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation

Authors: An Yang, Chenyu Liu, Jun Du, Jianqing Gao, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Cong Liu

Journal-ref: AAAI2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[621] arXiv:2512.00936 [pdf, ps, other]: Title: SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding

Authors: Keita Otani, Tatsuya Harada

Comments: Accepted to WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[622] arXiv:2512.00927 [pdf, ps, other]: Title: LAHNet: Local Attentive Hashing Network for Point Cloud Registration

Authors: Wentao Qu, Xiaoshui Huang, Liang Xiao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[623] arXiv:2512.00912 [pdf, ps, other]: Title: ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT Slices

Authors: Abdelghafour Halimi, Ali Alibrahim, Didier Barradas-Bautista, Ronell Sicat, Abdulkader M. Afifi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[624] arXiv:2512.00911 [pdf, ps, other]: Title: Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision

Authors: Yuhao Shan, Qianyi Yuan, Jingguo Liu, Shigang Li, Jianfeng Li, Tong Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[625] arXiv:2512.00909 [pdf, ps, other]: Title: TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model

Authors: Alireza Javanmardi, Pragati Jaiswal, Tewodros Amberbir Habtegebrial, Christen Millerdurai, Shaoxiang Wang, Alain Pagani, Didier Stricker

Comments: WACV 2026, Project page available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[626] arXiv:2512.00904 [pdf, ps, other]: Title: Hierarchical Semantic Alignment for Image Clustering

Authors: Xingyu Zhu, Beier Zhu, Yunfan Li, Junfeng Fang, Shuo Wang, Kesen Zhao, Hanwang Zhang

Comments: AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[627] arXiv:2512.00903 [pdf, ps, other]: Title: SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

Authors: Chaojun Ni, Cheng Chen, Xiaofeng Wang, Zheng Zhu, Wenzhao Zheng, Boyuan Wang, Tianrun Chen, Guosheng Zhao, Haoyun Li, Zhehao Dong, Qiang Zhang, Yun Ye, Yang Wang, Guan Huang, Wenjun Mei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[628] arXiv:2512.00891 [pdf, ps, other]: Title: Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Authors: Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng Zhang

Comments: Code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[629] arXiv:2512.00887 [pdf, ps, other]: Title: Multilingual Training-Free Remote Sensing Image Captioning

Authors: Carlos Rebelo, Gil Rocha, João Daniel Silva, Bruno Martins

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[630] arXiv:2512.00885 [pdf, ps, other]: Title: HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics

Authors: Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[631] arXiv:2512.00882 [pdf, ps, other]: Title: Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge Hints

Authors: Xisheng Feng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[632] arXiv:2512.00880 [pdf, ps, other]: Title: Quantum-Inspired Spectral Geometry for Neural Operator Equivalence and Structured Pruning

Authors: Haijian Shao, Wei Liu, Xing Deng

Comments: 6 pages, 1 figure, preliminary version; concepts and simulation experiments only

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[633] arXiv:2512.00877 [pdf, ps, other]: Title: Feed-Forward 3D Gaussian Splatting Compression with Long-Context Modeling

Authors: Zhening Liu, Rui Song, Yushi Huang, Yingdong Hu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[634] arXiv:2512.00873 [pdf, ps, other]: Title: Neural Discrete Representation Learning for Sparse-View CBCT Reconstruction: From Algorithm Design to Prospective Multicenter Clinical Evaluation

Authors: Haoshen Wang, Lei Chen, Wei-Hua Zhang, Linxia Wu, Yong Luo, Zengmao Wang, Yuan Xiong, Chengcheng Zhu, Wenjuan Tang, Xueyi Zhang, Wei Zhou, Xuhua Duan, Lefei Zhang, Gao-Jun Teng, Bo Du, Huangxuan Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[635] arXiv:2512.00872 [pdf, ps, other]: Title: TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models

Authors: Tim Veenboer, George Yiasemis, Eric Marcus, Vivien Van Veldhuizen, Cees G. M. Snoek, Jonas Teuwen, Kevin B. W. Groot Lipman

Comments: 22 pages, 4 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[636] arXiv:2512.00850 [pdf, ps, other]: Title: Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting

Authors: Haishan Wang, Mohammad Hassan Vali, Arno Solin

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[637] arXiv:2512.00846 [pdf, ps, other]: Title: AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent

Authors: Neeraj Anand, Rishabh Jain, Sohan Patnaik, Balaji Krishnamurthy, Mausoom Sarkar

Comments: Accepted at WACV 2026 Conference

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[638] arXiv:2512.00832 [pdf, ps, other]: Title: PanFlow: Decoupled Motion Control for Panoramic Video Generation

Authors: Cheng Zhang, Hanwen Liang, Donny Y. Chen, Qianyi Wu, Konstantinos N. Plataniotis, Camilo Cruz Gambardella, Jianfei Cai

Comments: Accepted by AAAI. Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[639] arXiv:2512.00814 [pdf, ps, other]: Title: IRPO: Boosting Image Restoration via Post-training GRPO

Authors: Haoxuan Xu, Yi Liu, Boyuan Jiang, Jinlong Peng, Donghao Luo, Xiaobin Hu, Shuicheng Yan, Haoang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[640] arXiv:2512.00805 [pdf, ps, other]: Title: Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding

Authors: Pengfei Hu, Meng Cao, Yingyao Wang, Yi Wang, Jiahua Dong, Jun Song, Yu Cheng, Bo Zheng, Xiaodan Liang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[641] arXiv:2512.00796 [pdf, ps, other]: Title: CircleFlow: Flow-Guided Camera Blur Estimation using a Circle Grid Target

Authors: Jiajian He, Enjie Hu, Shiqi Chen, Tianchen Qiu, Huajun Feng, Zhihai Xu, Yueting Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[642] arXiv:2512.00794 [pdf, ps, other]: Title: PolarGS: Polarimetric Cues for Ambiguity-Free Gaussian Splatting with Accurate Geometry Recovery

Authors: Bo Guo, Sijia Wen, Yifan Zhao, Jia Li, Zhiming Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[643] arXiv:2512.00773 [pdf, ps, other]: Title: DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering

Authors: Toshiki Katsube, Taiga Fukuhara, Kenichiro Ando, Yusuke Mukuta, Kohei Uehara, Tatsuya Harada

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[644] arXiv:2512.00771 [pdf, ps, other]: Title: EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes

Authors: Xiaoshan Wu, Yifei Yu, Xiaoyang Lyu, Yihua Huang, Bo Wang, Baoheng Zhang, Zhongrui Wang, Xiaojuan Qi

Comments: Accepted at NeurIPS 2025 (spotlight)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[645] arXiv:2512.00765 [pdf, ps, other]: Title: The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge Patches

Authors: Haojie Ji, Te Hu, Haowen Li, Long Jin, Chongshi Xin, Yuchi Yao, Jiarui Xiao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[646] arXiv:2512.00762 [pdf, ps, other]: Title: Seeing the Wind from a Falling Leaf

Authors: Zhiyuan Gao, Jiageng Mao, Hong-Xing Yu, Haozhe Lou, Emily Yue-Ting Jia, Jernej Barbic, Jiajun Wu, Yue Wang

Comments: Accepted at NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[647] arXiv:2512.00752 [pdf, ps, other]: Title: Charts Are Not Images: On the Challenges of Scientific Chart Editing

Authors: Shawn Li, Ryan Rossi, Sungchul Kim, Sunav Choudhary, Franck Dernoncourt, Puneet Mathur, Zhengzhong Tu, Yue Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[648] arXiv:2512.00748 [pdf, ps, other]: Title: Probabilistic Modeling of Multi-rater Medical Image Segmentation for Diversity and Personalization

Authors: Ke Liu, Shangde Gao, Yichao Fu, Shangqi Gao, Chunhua Shen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[649] arXiv:2512.00744 [pdf, ps, other]: Title: Joint Multi-scale Gated Transformer and Prior-guided Convolutional Network for Learned Image Compression

Authors: Zhengxin Chen, Xiaohai He, Tingrong Zhang, Shuhua Xiong, Chao Ren

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[650] arXiv:2512.00743 [pdf, ps, other]: Title: Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards

Authors: Qiang Lyu, Zicong Chen, Chongxiao Wang, Haolin Shi, Shibo Gao, Ran Piao, Youwei Zeng, Jianlou Si, Fei Ding, Jing Li, Chun Pong Lau, Weiqiang Wang

Comments: 20 pages, 15 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[651] arXiv:2512.00723 [pdf, ps, other]: Title: TrajDiff: End-to-end Autonomous Driving without Perception Annotation

Authors: Xingtai Gui, Jianbo Zhao, Wencheng Han, Jikai Wang, Jiahao Gong, Feiyang Tan, Cheng-zhong Xu, Jianbing Shen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[652] arXiv:2512.00718 [pdf, ps, other]: Title: RS-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing Images

Authors: Deliang Wang, Peng Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[653] arXiv:2512.00714 [pdf, ps, other]: Title: Deep Learning-Based Computer Vision Models for Early Cancer Detection Using Multimodal Medical Imaging and Radiogenomic Integration Frameworks

Authors: Emmanuella Avwerosuoghene Oghenekaro

Journal-ref: International Journal of Computer Applications Technology and Research, vol. 14, no. 11, pp. 1-14, 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[654] arXiv:2512.00706 [pdf, ps, other]: Title: Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation

Authors: Chengzhi Yu, Yifan Xu, Yifan Chen, Wenyi Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[655] arXiv:2512.00700 [pdf, ps, other]: Title: CAR-Net: A Cascade Refinement Network for Rotational Motion Deblurring under Angle Information Uncertainty

Authors: Ka Chung Lai, Ahmet Cetinkaya

Comments: Accepted to AAIML 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[656] arXiv:2512.00694 [pdf, ps, other]: Title: Affordance-First Decomposition for Continual Learning in Video-Language Understanding

Authors: Mengzhu Xu, Hanzhi Liu, Ningkang Peng, Qianyu Chen, Canran Xiao

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[657] arXiv:2512.00691 [pdf, ps, other]: Title: Silhouette-based Gait Foundation Model

Authors: Dingqiang Ye, Chao Fan, Kartik Narayan, Bingzhe Wu, Chengwen Luo, Jianqiang Li, Vishal M. Patel

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[658] arXiv:2512.00677 [pdf, ps, other]: Title: Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer

Authors: Dong In Lee, Hyungjun Doh, Seunggeun Chi, Runlin Duan, Sangpil Kim, Karthik Ramani

Comments: 4D Scene Editing

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[659] arXiv:2512.00676 [pdf, ps, other]: Title: Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges

Authors: Kiri L. Wagstaff

Comments: 10 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[660] arXiv:2512.00647 [pdf, ps, other]: Title: MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba

Authors: Shanhui Liu, Rui Xu, Yunke Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[661] arXiv:2512.00641 [pdf, ps, other]: Title: Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition

Authors: Razieh Ghaedi, AmirReza BabaAhmadi, Reyer Zwiggelaar, Xinqi Fan, Nashid Alam

Comments: 17 pages, 5 figures. Accepted at the 17th Asian Conference on Machine Learning (ACML 2025), Taipei, Taiwan, December 9-12, 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[662] arXiv:2512.00639 [pdf, ps, other]: Title: Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation

Authors: Mahmoud El Hussieni

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Performance (cs.PF)
[663] arXiv:2512.00626 [pdf, ps, other]: Title: XAI-Driven Skin Disease Classification: Leveraging GANs to Augment ResNet-50 Performance

Authors: Kim Gerard A. Villanueva, Priyanka Kumar

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[664] arXiv:2512.00625 [pdf, ps, other]: Title: Automatic Pith Detection in Tree Cross-Section Images Using Deep Learning

Authors: Tzu-I Liao, Mahmoud Fakhry, Jibin Yesudas Varghese

Comments: 8 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[665] arXiv:2512.00597 [pdf, ps, other]: Title: Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models

Authors: Thuraya Alzubaidi, Farhad R. Nezami, Muzammil Behzad

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[666] arXiv:2512.00582 [pdf, ps, other]: Title: SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image Comprehension

Authors: Yue Jiang, Haiwei Xue, Minghao Han, Mingcheng Li, Xiaolu Hou, Dingkang Yang, Lihua Zhang, Xu Zheng

Comments: Accepted by AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[667] arXiv:2512.00572 [pdf, ps, other]: Title: Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models

Authors: Mohammed Mohiuddin, Syed Mohammod Minhaz Hossain, Sumaiya Khanam, Prionkar Barua, Aparup Barua, MD Tamim Hossain

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[668] arXiv:2512.00565 [pdf, ps, other]: Title: Describe Anything Anywhere At Any Moment

Authors: Nicolas Gorlo, Lukas Schmid, Luca Carlone

Comments: 14 pages, 5 figures, 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[669] arXiv:2512.00557 [pdf, ps, other]: Title: NeuroVolve: Evolving Visual Stimuli toward Programmable Neural Objectives

Authors: Haomiao Chen, Keith W Jamison, Mert R. Sabuncu, Amy Kuceyeski

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[670] arXiv:2512.00547 [pdf, ps, other]: Title: Asset-Driven Semantic Reconstruction of Dynamic Scene with Multi-Human-Object Interactions

Authors: Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[671] arXiv:2512.00539 [pdf, ps, other]: Title: SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning

Authors: Yongkang Hu, Yu Cheng, Yushuo Zhang, Yuan Xie, Zhaoxia Yin

Comments: 17 pages, 19 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[672] arXiv:2512.00534 [pdf, ps, other]: Title: Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene Update

Authors: Zeyuan An, Yanghang Xiao, Zhiying Leng, Frederick W. B. Li, Xiaohui Liang

Comments: AAAI2026 accepted

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[673] arXiv:2512.00532 [pdf, ps, other]: Title: Image Generation as a Visual Planner for Robotic Manipulation

Authors: Ye Pang

Comments: 11 pages, 9 figures. Under review at CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[674] arXiv:2512.00514 [pdf, ps, other]: Title: Terrain Sensing with Smartphone Structured Light: 2D Dynamic Time Warping for Grid Pattern Matching

Authors: Tanaka Nobuaki

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[675] arXiv:2512.00493 [pdf, ps, other]: Title: CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration

Authors: Boshi Tang, Henry Zheng, Rui Huang, Gao Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[676] arXiv:2512.00489 [pdf, ps, other]: Title: Learning What Helps: Task-Aligned Context Selection for Vision Tasks

Authors: Jingyu Guo, Emir Konuk, Fredrik Strand, Christos Matsoukas, Kevin Smith

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[677] arXiv:2512.00475 [pdf, ps, other]: Title: Structured Context Learning for Generic Event Boundary Detection

Authors: Xin Gu, Congcong Li, Xinyao Wang, Dexiang Hong, Libo Zhang, Tiejian Luo, Longyin Wen, Heng Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[678] arXiv:2512.00473 [pdf, ps, other]: Title: RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Authors: Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[679] arXiv:2512.00456 [pdf, ps, other]: Title: CausalAffect: Causal Discovery for Facial Affective Understanding

Authors: Guanyu Hu, Tangzheng Lian, Dimitrios Kollias, Oya Celiktutan, Xinyu Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[680] arXiv:2512.00450 [pdf, ps, other]: Title: RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications

Authors: Amit Kumar Gupta, Farhan Sheth, Hammad Shaikh, Dheeraj Kumar, Angkul Puniya, Deepak Panwar, Sandeep Chaurasia, Priya Mathur

Comments: 20 pages, 10 figures, 10 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[681] arXiv:2512.00438 [pdf, ps, other]: Title: FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal

Authors: Hang Xu, Linjiang Huang, Feng Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[682] arXiv:2512.00428 [pdf, ps, other]: Title: Recognizing Pneumonia in Real-World Chest X-rays with a Classifier Trained with Images Synthetically Generated by Nano Banana

Authors: Jiachuan Peng, Kyle Lam, Jianing Qiu

Comments: 9 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[683] arXiv:2512.00425 [pdf, ps, other]: Title: What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

Authors: Minh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, Dimitris Samaras

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[684] arXiv:2512.00424 [pdf, ps, other]: Title: Recovering Origin Destination Flows from Bus CCTV: Early Results from Nairobi and Kigali

Authors: Nthenya Kyatha, Jay Taneja

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[685] arXiv:2512.00422 [pdf, ps, other]: Title: PhysGen: Physically Grounded 3D Shape Generation for Industrial Design

Authors: Yingxuan You, Chen Zhao, Hantao Zhang, Mingda Xu, Pascal Fua

Comments: 14 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[686] arXiv:2512.00413 [pdf, ps, other]: Title: SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control

Authors: Ji Gan, Lingxu Chen, Jiaxu Leng, Xinbo Gao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[687] arXiv:2512.00408 [pdf, ps, other]: Title: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Authors: Lingdong Wang, Guan-Ming Su, Divya Kothandaraman, Tsung-Wei Huang, Mohammad Hajiesmaili, Ramesh K. Sitaraman

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[688] arXiv:2512.00395 [pdf, ps, other]: Title: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction

Authors: Jiazhen Liu, Mingkuan Feng, Long Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[689] arXiv:2512.00387 [pdf, ps, other]: Title: WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

Authors: Kaihang Pan, Weile Chen, Haiyi Qiu, Qifan Yu, Wendong Bu, Zehan Wang, Yun Zhu, Juncheng Li, Siliang Tang

Comments: 32 pages, 20 figures. Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[690] arXiv:2512.00385 [pdf, ps, other]: Title: EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation

Authors: Louis Geist, Loic Landrieu, Damien Robert

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[691] arXiv:2512.00381 [pdf, ps, other]: Title: Pore-scale Image Patch Dataset and A Comparative Evaluation of Pore-scale Facial Features

Authors: Dong Li, HuaLiang Lin, JiaYu Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[692] arXiv:2512.00369 [pdf, ps, other]: Title: POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models

Authors: Wenshuo Chen, Haosen Li, Shaofeng Liang, Lei Wang, Haozhe Jia, Kaishen Yuan, Jieming Wu, Bowen Tian, Yutao Yue

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[693] arXiv:2512.00368 [pdf, ps, other]: Title: THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering

Authors: Jian Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[694] arXiv:2512.00365 [pdf, ps, other]: Title: Towards aligned body representations in vision models

Authors: Andrey Gizdov, Andrea Procopio, Yichen Li, Daniel Harari, Tomer Ullman

Comments: Andrea Procopio and Andrey Gizdov have equal contributions

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[695] arXiv:2512.00363 [pdf, ps, other]: Title: MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality Adapters

Authors: Jianhong Han, Yupei Wang, Yuan Zhang, Liang Chen

Comments: Manuscript submitted to IEEE Transactions on Geoscience and Remote Sensing

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[696] arXiv:2512.00355 [pdf, ps, other]: Title: SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction

Authors: Junqiao Fan, Pengfei Liu, Haocong Rao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[697] arXiv:2512.00345 [pdf, ps, other]: Title: mmPred: Radar-based Human Motion Prediction in the Dark

Authors: Junqiao Fan, Haocong Rao, Jiarui Zhang, Jianfei Yang, Lihua Xie

Comments: This paper is accepted by AAAI-2026

Journal-ref: AAAI-2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[698] arXiv:2512.00343 [pdf, ps, other]: Title: Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models

Authors: Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[699] arXiv:2512.00336 [pdf, ps, other]: Title: MVAD: A Comprehensive Multimodal Video-Audio Dataset for AIGC Detection

Authors: Mengxue Hu, Yunfeng Diao, Changtao Miao, Jianshu Li, Zhe Li, Joey Tianyi Zhou

Comments: 7 pages, 2 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[700] arXiv:2512.00327 [pdf, ps, other]: Title: Odometry Without Correspondence from Inertially Constrained Ruled Surfaces

Authors: Chenqi Zhu, Levi Burner, Yiannis Aloimonos

Comments: 14 pages, 13 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[701] arXiv:2512.00310 [pdf, ps, other]: Title: ART-ASyn: Anatomy-aware Realistic Texture-based Anomaly Synthesis Framework for Chest X-Rays

Authors: Qinyi Cao, Jianan Fan, Weidong Cai

Comments: Accepted at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[702] arXiv:2512.00308 [pdf, ps, other]: Title: Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation

Authors: Xiao Cui, Yulei Qin, Wengang Zhou, Hongsheng Li, Houqiang Li

Comments: NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[703] arXiv:2512.00300 [pdf, ps, other]: Title: TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene Completion

Authors: Rui Qian, Haozhi Cao, Tianchen Deng, Tianxin Hu, Weixiang Guo, Shenghai Yuan, Lihua Xie

Comments: 14 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[704] arXiv:2512.00294 [pdf, ps, other]: Title: Words into World: A Task-Adaptive Agent for Language-Guided Spatial Retrieval in AR

Authors: Lixing Guo, Tobias Höllerer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[705] arXiv:2512.00281 [pdf, ps, other]: Title: Rethinking Lung Cancer Screening: AI Nodule Detection and Diagnosis Outperforms Radiologists, Leading Models, and Standards Beyond Size and Growth

Authors: Sylvain Bodard, Pierre Baudot, Benjamin Renoust, Charles Voyton, Gwendoline De Bie, Ezequiel Geremia, Van-Khoa Le, Danny Francis, Pierre-Henri Siot, Yousra Haddou, Vincent Bobin, Jean-Christophe Brisset, Carey C. Thomson, Valerie Bourdes, Benoit Huet

Comments: 25 pages, 8 figures, with supplementary information containing 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
[706] arXiv:2512.00275 [pdf, ps, other]: Title: HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse Attention

Authors: Yi Liu, Yi Wan, Xinyi Liu, Qiong Wu, Panwang Xia, Xuejun Huang, Yongjun Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[707] arXiv:2512.00269 [pdf, ps, other]: Title: USB: Unified Synthetic Brain Framework for Bidirectional Pathology-Healthy Generation and Editing

Authors: Jun Wang, Peirong Liu

Comments: 16 pages, 17 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[708] arXiv:2512.00264 [pdf, ps, other]: Title: HeartFormer: Semantic-Aware Dual-Structure Transformers for 3D Four-Chamber Cardiac Point Cloud Reconstruction

Authors: Zhengda Ma, Abhirup Banerjee

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[709] arXiv:2512.00261 [pdf, ps, other]: Title: UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse Annotations

Authors: Yuzhen Hu, Saurabh Prasad

Comments: Camera-ready for WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[710] arXiv:2512.00255 [pdf, ps, other]: Title: Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views

Authors: Kunwar Maheep Singh, Jianchun Chen, Vladislav Golyanik, Stephan J. Garbin, Thabo Beeler, Rishabh Dabral, Marc Habermann, Christian Theobalt

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[711] arXiv:2512.00226 [pdf, ps, other]: Title: DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation

Authors: Zirui Wang, Tao Zhang

Comments: Workshop on Space in Vision, Language, and Embodied AI at NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[712] arXiv:2512.00208 [pdf, ps, other]: Title: ReactionMamba: Generating Short & Long Human Reaction Sequences

Authors: Hajra Anwar Beg, Baptiste Chopin, Hao Tang, Mohamed Daoudi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[713] arXiv:2512.00198 [pdf, ps, other]: Title: Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting

Authors: Shantanu Ghosh, Vedant Parthesh Joshi, Rayan Syed, Aya Kassem, Abhishek Varshney, Payel Basak, Weicheng Dai, Judy Wawira Gichoya, Hari M. Trivedi, Imon Banerjee, Shyam Visweswaran, Clare B. Poynton, Kayhan Batmanghelich

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[714] arXiv:2512.00194 [pdf, ps, other]: Title: AutocleanEEG ICVision: Automated ICA Artifact Classification Using Vision-Language AI

Authors: Zag ElSayed, Grace Westerkamp, Gavin Gammoh, Yanchen Liu, Peyton Siekierski, Craig Erickson, Ernest Pedapati

Comments: 6 pages, 8 figures

Journal-ref: Conference ICMI2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
[715] arXiv:2512.00179 [pdf, ps, other]: Title: Efficient Edge-Compatible CNN for Speckle-Based Material Recognition in Laser Cutting Systems

Authors: Mohamed Abdallah Salem (North Dakota State University), Nourhan Zein Diab (New Mansoura University)

Comments: Copyright 2025 IEEE. This is the author's version of the work, which has been accepted for publication in the Proceedings of the 2025 IEEE 35th International Conference on Computer Theory and Applications (ICCTA 2025). The final published version will be available on IEEE Xplore.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[716] arXiv:2512.00130 [pdf, ps, other]: Title: Local and Global Context-and-Object-part-Aware Superpixel-based Data Augmentation for Deep Visual Recognition

Authors: Fadi Dornaika, Danyang Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[717] arXiv:2512.00129 [pdf, ps, other]: Title: Analysis of Incursive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain Adaptation

Authors: Jayan Adhikari, Prativa Joshi, Susish Baral

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[718] arXiv:2512.00125 [pdf, ps, other]: Title: Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance

Authors: Ruo-Syuan Mei, Sixian Jia, Guangze Li, Soo Yeon Lee, Brian Musser, William Keller, Sreten Zakula, Jorge Arinez, Chenhui Shao

Comments: Submitted to the NAMRC 54

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[719] arXiv:2512.00117 [pdf, ps, other]: Title: TinyViT: Field Deployable Transformer Pipeline for Solar Panel Surface Fault and Severity Screening

Authors: Ishwaryah Pandiarajan, Mohamed Mansoor Roomi Sindha, Uma Maheswari Pandyan, Sharafia N

Comments: 3 pages, 2 figures, ICGVIP 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[720] arXiv:2512.00103 [pdf, ps, other]: Title: Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images

Authors: Ifeanyi Okala

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[721] arXiv:2512.00091 [pdf, ps, other]: Title: Deep Filament Extraction for 3D Concrete Printing

Authors: Karam Mawas, Mehdi Maboudi, Pedro Achanccaray, Markus Gerke

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[722] arXiv:2512.00089 [pdf, ps, other]: Title: TeleViT1.0: Teleconnection-aware Vision Transformers for Subseasonal to Seasonal Wildfire Pattern Forecasts

Authors: Ioannis Prapas, Nikolaos Papadopoulos, Nikolaos-Ioannis Bountos, Dimitrios Michail, Gustau Camps-Valls, Ioannis Papoutsis

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[723] arXiv:2512.00088 [pdf, ps, other]: Title: SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features

Authors: Mohammad Zare

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[724] arXiv:2512.00087 [pdf, ps, other]: Title: Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data

Authors: Ivo Bueno, Ruikun Hou, Babette Bühler, Tim Fütterer, James Drimalla, Jonathan Kyle Foster, Peter Youngs, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci

Comments: This article has been accepted for publication in the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[725] arXiv:2512.00086 [pdf, ps, other]: Title: Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs

Authors: Davide Nadalini, Manuele Rusci, Elia Cereda, Luca Benini, Francesco Conti, Daniele Palossi

Comments: 14 pages, 9 figures, 3 tables. Associated open-source release available at: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[726] arXiv:2512.00084 [pdf, ps, other]: Title: A Fast and Efficient Modern BERT based Text-Conditioned Diffusion Model for Medical Image Segmentation

Authors: Venkata Siddharth Dhara, Pawan Kumar

Comments: 15 pages, 3 figures, Accepted in Slide 3 10th International Conference on Computer Vision & Image Processing (CVIP 2026)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[727] arXiv:2512.00082 [pdf, ps, other]: Title: Exploring Diagnostic Prompting Approach for Multimodal LLM-based Visual Complexity Assessment: A Case Study of Amazon Search Result Pages

Authors: Divendar Murtadak, Yoon Kim, Trilokya Akula

Comments: 9 pages, 4 figures, 9 tables. Study on diagnostic prompting for multimodal LLM-based visual complexity assessment of Amazon search result pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[728] arXiv:2512.00080 [pdf, ps, other]: Title: Conceptual Evaluation of Deep Visual Stereo Odometry for the MARWIN Radiation Monitoring Robot in Accelerator Tunnels

Authors: André Dehne, Juri Zach, Peer Stelldinger

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[729] arXiv:2512.00078 [pdf, ps, other]: Title: Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection

Authors: Mario de Jesus da Graca, Jörg Dahlkemper, Peer Stelldinger

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[730] arXiv:2512.00075 [pdf, ps, other]: Title: Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation

Authors: Jun Jia, Hongyi Miao, Yingjie Zhou, Wangqiu Zhou, Jianbo Zhang, Linhan Cao, Dandan Zhu, Hua Yang, Xiongkuo Min, Wei Sun, Guangtao Zhai

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[731] arXiv:2512.00073 [pdf, ps, other]: Title: ProvRain: Rain-Adaptive Denoising and Vehicle Detection via MobileNet-UNet and Faster R-CNN

Authors: Aswinkumar Varathakumaran, Nirmala Paramanandham

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[732] arXiv:2512.00065 [pdf, ps, other]: Title: Satellite to Street: Disaster Impact Estimator

Authors: Sreesritha Sai, Sai Venkata Suma Sreeja, Deepthi, Nikhil

Comments: 11 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[733] arXiv:2512.00061 [pdf, ps, other]: Title: DL-CapsNet: A Deep and Light Capsule Network

Authors: Pouya Shiri, Amirali Baniasadi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[734] arXiv:2512.00060 [pdf, ps, other]: Title: PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving

Authors: Abdolazim Rezaei, Mehdi Sookhak

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[735] arXiv:2512.00042 [pdf, ps, other]: Title: Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions

Authors: Egemen Sert, Şeyda Ertekin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
[736] arXiv:2512.00008 [pdf, ps, other]: Title: MOTION: ML-Assisted On-Device Low-Latency Motion Recognition

Authors: Veeramani Pugazhenthi, Wei-Hsiang Chu, Junwei Lu, Jadyn N. Miyahira, Soheil Salehi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[737] arXiv:2512.02020 (cross-list from cs.RO) [pdf, ps, other]: Title: EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI

Authors: Jianlei Chang, Ruofeng Mei, Wei Ke, Xiangyu Xu

Comments: Accepted by AAAI 2026. Project Page: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[738] arXiv:2512.01993 (cross-list from cs.RO) [pdf, ps, other]: Title: RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies

Authors: Guillermo Garcia-Cobo, Maximilian Igl, Peter Karkus, Zhejun Zhang, Michael Watson, Yuxiao Chen, Boris Ivanovic, Marco Pavone

Comments: Preprint

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[739] arXiv:2512.01979 (cross-list from cs.AI) [pdf, ps, other]: Title: Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback

Authors: Aiden Yiliu Li, Bizhi Yu, Daoan Lei, Tianhe Ren, Shilong Liu

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[740] arXiv:2512.01946 (cross-list from cs.RO) [pdf, ps, other]: Title: Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models

Authors: Paul Pacaud, Ricardo Garcia, Shizhe Chen, Cordelia Schmid

Comments: Code, data, and models available at this https URL. The paper contains 8 pages, 9 figures, and 6 tables

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[741] arXiv:2512.01913 (cross-list from eess.IV) [pdf, ps, other]: Title: Disentangling Progress in Medical Image Registration: Beyond Trend-Driven Architectures towards Domain-Specific Strategies

Authors: Bailiang Jian, Jiazhen Pan, Rohit Jena, Morteza Ghahremani, Hongwei Bran Li, Daniel Rueckert, Christian Wachinger, Benedikt Wiestler

Comments: Submitted to Medical Image Analysis. Journal Extension of arXiv:2407.19274

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[742] arXiv:2512.01822 (cross-list from cs.CL) [pdf, ps, other]: Title: InnoGym: Benchmarking the Innovation Potential of AI Agents

Authors: Jintian Zhang, Kewei Xu, Jingsheng Zheng, Zhuoyun Yu, Yuqi Zhu, Yujie Luo, Lanning Wei, Shuofei Qiao, Lun Du, Da Zheng, Shumin Deng, Huajun Chen, Ningyu Zhang

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[743] arXiv:2512.01818 (cross-list from cs.LG) [pdf, ps, other]: Title: Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning

Authors: Lama Alssum, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Juan C Leon Alcazar, Bernard Ghanem

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[744] arXiv:2512.01687 (cross-list from cs.NE) [pdf, ps, other]: Title: Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural Networks

Authors: Huaxu He

Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
[745] arXiv:2512.01550 (cross-list from cs.RO) [pdf, ps, other]: Title: NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction

Authors: Fei Liu, Shichao Xie, Minghua Luo, Zedong Chu, Junjun Hu, Xiaolong Wu, Mu Xu

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[746] arXiv:2512.01461 (cross-list from cs.LG) [pdf, ps, other]: Title: Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging

Authors: Kuangpu Guo, Yuhe Ding, Jian Liang, Zilei Wang, Ran He

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[747] arXiv:2512.01329 (cross-list from cs.GR) [pdf, ps, other]: Title: TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking

Authors: Hanzhi Guo, Dongdong Weng, Mo Su, Yixiao Chen, Xiaonuo Dongye, Chenyu Xu

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[748] arXiv:2512.01324 (cross-list from hep-ex) [pdf, ps, other]: Title: Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics

Authors: Samuel Young, Kazuhiro Terao

Comments: 23 pages, 15 figures, preprint. Project page at this https URL

Subjects: High Energy Physics - Experiment (hep-ex); Computer Vision and Pattern Recognition (cs.CV)
[749] arXiv:2512.01252 (cross-list from cs.LG) [pdf, ps, other]: Title: Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe

Authors: Yahui Liu, Yang Yue, Jingyuan Zhang, Chenxi Sun, Yang Zhou, Wencong Zeng, Ruiming Tang, Guorui Zhou

Comments: 9 pages, 7 figures

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[750] arXiv:2512.01181 (cross-list from cs.LG) [pdf, ps, other]: Title: First On-Orbit Demonstration of a Geospatial Foundation Model

Authors: Andrew Du, Roberto Del Prete, Alejandro Mousist, Nick Manser, Fabrice Marre, Andrew Barton, Carl Seubert, Gabriele Meoni, Tat-Jun Chin

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[751] arXiv:2512.01152 (cross-list from cs.LG) [pdf, ps, other]: Title: Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution

Authors: Shravan Chaudhari, Yoav Wald, Suchi Saria

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[752] arXiv:2512.01104 (cross-list from cs.RO) [pdf, ps, other]: Title: Estimation of Kinematic Motion from Dashcam Footage

Authors: Evelyn Zhang, Alex Richardson, Jonathan Sprinkle

Comments: 8 pages, 10 figures

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[753] arXiv:2512.01061 (cross-list from cs.RO) [pdf, ps, other]: Title: Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

Authors: Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, Shankar Sastry, Linxi "Jim" Fan, Yuke Zhu

Comments: this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[754] arXiv:2512.01009 (cross-list from cs.RO) [pdf, ps, other]: Title: FOM-Nav: Frontier-Object Maps for Object Goal Navigation

Authors: Thomas Chabal, Shizhe Chen, Jean Ponce, Cordelia Schmid

Comments: Project page: this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[755] arXiv:2512.00883 (cross-list from cs.MM) [pdf, ps, other]: Title: Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound

Authors: Jiahua Wang, Shannan Yan, Leqi Zheng, Jialong Wu, Yaoxin Mao

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[756] arXiv:2512.00818 (cross-list from cs.AI) [pdf, ps, other]: Title: Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning

Authors: Haozhen Gong, Xiaozhong Ji, Yuansen Liu, Wenbin Wu, Xiaoxiao Yan, Jingjing Liu, Kai Wu, Jiazhen Pan, Bailiang Jian, Jiangning Zhang, Xiaobin Hu, Hongwei Bran Li

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[757] arXiv:2512.00777 (cross-list from cs.RO) [pdf, ps, other]: Title: Sign Language Recognition using Bidirectional Reservoir Computing

Authors: Nitin Kumar Singh, Arie Rachmad Syulistyo, Yuichiro Tanaka, Hakaru Tamukoh

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[758] arXiv:2512.00736 (cross-list from cs.LG) [pdf, ps, other]: Title: REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories

Authors: Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk

Journal-ref: Proceedings of the Conference on Language Modeling (COLM 2025)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[759] arXiv:2512.00659 (cross-list from cs.RO) [pdf, ps, other]: Title: Fast, Robust, Permutation-and-Sign Invariant SO(3) Pattern Alignment

Authors: Anik Sarker, Alan T. Asbeck

Subjects: Robotics (cs.RO); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
[760] arXiv:2512.00403 (cross-list from cs.LG) [pdf, ps, other]: Title: SelfAI: Building a Self-Training AI System with LLM Agents

Authors: Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Xiaobing Yu, Yu Zhong, Shangqi Deng, Ufaq Khan, Jianghao Wu, Xiaofeng Liu, Imran Razzak, Xiaojun Chang, Yutong Xie

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[761] arXiv:2512.00396 (cross-list from cs.LG) [pdf, ps, other]: Title: Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement

Authors: Andrea Procopio, Marco Esposito, Sara Raggiunto, Andrey Gizdov, Alberto Belli, Paola Pierleoni

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[762] arXiv:2512.00350 (cross-list from eess.IV) [pdf, ps, other]: Title: MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation

Authors: Ruirui Huang, Jiacheng Li

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[763] arXiv:2512.00324 (cross-list from cs.RO) [pdf, ps, other]: Title: MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation

Authors: Jinda Du, Jieji Ren, Qiaojun Yu, Ningbin Zhang, Yu Deng, Xingyu Wei, Yufei Liu, Guoying Gu, Xiangyang Zhu

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[764] arXiv:2512.00287 (cross-list from cs.RO) [pdf, ps, other]: Title: RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals

Authors: Yuzheng Gao, Yuxing Long, Lei Kang, Yuchong Guo, Ziyan Yu, Shangqing Mao, Jiyao Zhang, Ruihai Wu, Dongjiang Li, Hui Shen, Hao Dong

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[765] arXiv:2512.00229 (cross-list from cs.LG) [pdf, ps, other]: Title: TIE: A Training-Inversion-Exclusion Framework for Visually Interpretable and Uncertainty-Guided Out-of-Distribution Detection

Authors: Pirzada Suhail, Rehna Afroz, Amit Sethi

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[766] arXiv:2512.00138 (cross-list from cs.AR) [pdf, ps, other]: Title: Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVS

Authors: Yuyang Li, Swasthik Muloor, Jack Laudati, Nickolas Dematteis, Yidam Park, Hana Kim, Nathan Chang, Inhee Lee

Comments: 6 pages, 12 figures, 2 tables

Subjects: Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[767] arXiv:2512.00120 (cross-list from cs.SD) [pdf, ps, other]: Title: Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment

Authors: Jiaying Hong, Ting Zhu, Thanet Markchom, Huizhi Liang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[768] arXiv:2512.00115 (cross-list from cs.SD) [pdf, ps, other]: Title: MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning

Authors: Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung

Comments: 10 pages, 5 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[769] arXiv:2512.00094 (cross-list from cs.CR) [pdf, ps, other]: Title: HMARK: Radioactive Multi-Bit Semantic-Latent Watermarking for Diffusion Models

Authors: Kexin Li, Guozhen Ding, Ilya Grishchenko, David Lie

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[770] arXiv:2512.00076 (cross-list from cs.RO) [pdf, ps, other]: Title: Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning

Authors: Minghe Gao, Juncheng Li, Yuze Lin, Xuqi Liu, Jiaming Ji, Xiaoran Pan, Zihan Xu, Xian Li, Mingjie Li, Wei Ji, Rong Wei, Rui Tang, Qizhou Wang, Kai Shen, Jun Xiao, Qi Wu, Siliang Tang, Yueting Zhuang

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[771] arXiv:2512.00074 (cross-list from cs.RO) [pdf, ps, other]: Title: Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

Authors: Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[772] arXiv:2512.00052 (cross-list from physics.geo-ph) [pdf, ps, other]: Title: Coarse-to-Fine Non-Rigid Registration for Side-Scan Sonar Mosaicking

Authors: Can Lei, Nuno Gracias, Rafael Garcia, Hayat Rajani, Huigang Wang

Subjects: Geophysics (physics.geo-ph); Computer Vision and Pattern Recognition (cs.CV)
[773] arXiv:2512.00041 (cross-list from cs.RO) [pdf, ps, other]: Title: VISTAv2: World Imagination for Indoor Vision-and-Language Navigation

Authors: Yanjia Huang, Xianshun Jiang, Xiangbo Gao, Mingyang Wu, Zhengzhong Tu

Comments: 11 pages, 5 figures

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[774] arXiv:2512.00037 (cross-list from cs.RO) [pdf, ps, other]: Title: ICD-Net: Inertial Covariance Displacement Network for Drone Visual-Inertial SLAM

Authors: Tali Orlev Shapira, Itzik Klein

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[775] arXiv:2512.00027 (cross-list from cs.RO) [pdf, ps, other]: Title: A Survey on Improving Human Robot Collaboration through Vision-and-Language Navigation

Authors: Nivedan Yakolli, Avinash Gautam, Abhijit Das, Yuankai Qi, Virendra Singh Shekhawat

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[776] arXiv:2512.00024 (cross-list from cs.RO) [pdf, ps, other]: Title: Learning from Watching: Scalable Extraction of Manipulation Trajectories from Human Videos

Authors: X. Hu, G. Ye

Comments: Accepted to RSS 2025 Workshop

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[777] arXiv:2512.00021 (cross-list from cs.RO) [pdf, ps, other]: Title: Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges

Authors: Kemal Oksuz, Alexandru Buburuzan, Anthony Knittel, Yuhan Yao, Puneet K. Dokania

Comments: Under review

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[778] arXiv:2512.00019 (cross-list from cs.RO) [pdf, ps, other]: Title: A Comprehensive Survey on Surgical Digital Twin

Authors: Afsah Sharaf Khan, Falong Fan, Doohwan DH Kim, Abdurrahman Alshareef, Dong Chen, Justin Kim, Ernest Carter, Bo Liu, Jerzy W. Rozenblit, Bernard Zeigler

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

