Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 359

[ total of 778 entries ]

Wed, 3 Dec 2025 (showing first 100 of 141 entries)

[360] arXiv:2512.03046 [pdf, ps, other]: Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

Authors: Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Shuailei Ma, Ka Leong Cheng, Wen Wang, Qingyan Bai, Yuxuan Zhang, Yanhong Zeng, Yixuan Li, Xing Zhu, Yujun Shen, Qifeng Chen

Comments: Code and demo available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[361] arXiv:2512.03045 [pdf, ps, other]: Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models

Authors: Minkyung Kwon, Jinhyeok Choi, Jiho Park, Seonghu Jeon, Jinhyuk Jang, Junyoung Seo, Minseop Kwak, Jin-Hwa Kim, Seungryong Kim

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[362] arXiv:2512.03043 [pdf, ps, other]: Title: OneThinker: All-in-one Reasoning Model for Image and Video

Authors: Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[363] arXiv:2512.03042 [pdf, ps, other]: Title: PPTArena: A Benchmark for Agentic PowerPoint Editing

Authors: Michael Ofengenden, Yunze Man, Ziqi Pang, Yu-Xiong Wang

Comments: 25 pages, 26 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[364] arXiv:2512.03041 [pdf, ps, other]: Title: MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Authors: Qinghe Wang, Xiaoyu Shi, Baolu Li, Weikang Bian, Quande Liu, Huchuan Lu, Xintao Wang, Pengfei Wan, Kun Gai, Xu Jia

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[365] arXiv:2512.03040 [pdf, ps, other]: Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

Authors: Zeqi Xiao, Yiwei Zhao, Lingxiao Li, Yushi Lan, Yu Ning, Rahul Garg, Roshni Cooper, Mohammad H. Taghavi, Xingang Pan

Comments: Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[366] arXiv:2512.03036 [pdf, ps, other]: Title: ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

Authors: Mengchen Zhang, Qi Chen, Tong Wu, Zihan Liu, Dahua Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[367] arXiv:2512.03034 [pdf, ps, other]: Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

Authors: Youxin Pang, Jiajun Liu, Lingfeng Tan, Yong Zhang, Feng Gao, Xiang Deng, Zhuoliang Kang, Xiaoming Wei, Yebin Liu

Comments: Our project website is this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[368] arXiv:2512.03020 [pdf, ps, other]: Title: Unrolled Networks are Conditional Probability Flows in MRI Reconstruction

Authors: Kehan Qi, Saumya Gupta, Qingqiao Hu, Weimin Lyu, Chao Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[369] arXiv:2512.03018 [pdf, ps, other]: Title: AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry

Authors: Xiang Xu, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Yilin Liu, Durvesh Malpure, Pete Meltzer

Comments: Accepted to SIGGRAPH Asia 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[370] arXiv:2512.03014 [pdf, ps, other]: Title: Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks

Authors: Matthew Dutson, Nathan Labiosa, Yin Li, Mohit Gupta

Comments: NeurIPS 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[371] arXiv:2512.03013 [pdf, ps, other]: Title: In-Context Sync-LoRA for Portrait Video Editing

Authors: Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[372] arXiv:2512.03010 [pdf, ps, other]: Title: SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting

Authors: Svenja Strobel, Matthias Innmann, Bernhard Egger, Marc Stamminger, Linus Franke

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
[373] arXiv:2512.03004 [pdf, ps, other]: Title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

Authors: Xiaoxue Chen, Ziyi Xiong, Yuantao Chen, Gen Li, Nan Wang, Hongcheng Luo, Long Chen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Hongyang Li, Ya-Qin Zhang, Hao Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[374] arXiv:2512.03000 [pdf, ps, other]: Title: DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Authors: Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[375] arXiv:2512.02993 [pdf, ps, other]: Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and Beyond

Authors: Yifei Zeng, Yajie Bao, Jiachen Qian, Shuang Wu, Youtian Lin, Hao Zhu, Buyu Li, Feihu Zhang, Xun Cao, Yao Yao

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[376] arXiv:2512.02991 [pdf, ps, other]: Title: GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection

Authors: Md Sohag Mia, Md Nahid Hasan, Tawhid Ahmed, Muhammad Abdullah Adnan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[377] arXiv:2512.02982 [pdf, ps, other]: Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Authors: Xiang Xu, Ao Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu

Comments: Preprint; 19 pages, 7 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[378] arXiv:2512.02981 [pdf, ps, other]: Title: InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration

Authors: Zhongyu Yang, Yingfang Yuan, Xuanming Jiang, Baoyi An, Wei Pang

Comments: Published in AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[379] arXiv:2512.02973 [pdf, ps, other]: Title: Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Authors: Yuan Xiong, Ziqi Miao, Lijun Li, Chen Qian, Jie Li, Jing Shao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[380] arXiv:2512.02972 [pdf, ps, other]: Title: BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

Authors: Guowen Zhang, Chenhang He, Liyi Chen, Lei Zhang

Comments: Accepted by AAAI26

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[381] arXiv:2512.02965 [pdf, ps, other]: Title: A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision Systems

Authors: Yuhan Chen, Yicui Shi, Guofa Li, Guangrui Bai, Jinyuan Shao, Xiangfei Huang, Wenbo Chu, Keqiang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[382] arXiv:2512.02952 [pdf, ps, other]: Title: Layout Anything: One Transformer for Universal Room Layout Estimation

Authors: Md Sohag Mia, Muhammad Abdullah Adnan

Comments: Published at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[383] arXiv:2512.02942 [pdf, ps, other]: Title: Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

Authors: Lanxiang Hu, Abhilash Shankarampeta, Yixin Huang, Zilin Dai, Haoyang Yu, Yujie Zhao, Haoqiang Kang, Daniel Zhao, Tajana Rosing, Hao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[384] arXiv:2512.02933 [pdf, ps, other]: Title: LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization

Authors: Zhihan Xiao, Lin Liu, Yixin Gao, Xiaopeng Zhang, Haoxuan Che, Songping Mai, Qi Tian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[385] arXiv:2512.02932 [pdf, ps, other]: Title: EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis

Authors: Yancheng Zhang, Guangyu Sun, Chen Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[386] arXiv:2512.02931 [pdf, ps, other]: Title: DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation

Authors: Ying Yang, Zhengyao Lv, Tianlin Pan, Haofan Wang, Binxin Yang, Hubery Yin, Chen Li, Chenyang Si

Comments: 23 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[387] arXiv:2512.02906 [pdf, ps, other]: Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding

Authors: Fan Yang, Kaihao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[388] arXiv:2512.02899 [pdf, ps, other]: Title: Glance: Accelerating Diffusion Models with 1 Sample

Authors: Zhuobai Dong, Rui Zhao, Songjie Wu, Junchao Yi, Linjie Li, Zhengyuan Yang, Lijuan Wang, Alex Jinpeng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[389] arXiv:2512.02897 [pdf, ps, other]: Title: Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models

Authors: Pierpaolo Serio, Giulio Pisaneschi, Andrea Dan Ryals, Vincenzo Infantino, Lorenzo Gentilini, Valentina Donzella, Lorenzo Pollini

Comments: 13 pages, 5 figures, 2 tables. Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[390] arXiv:2512.02895 [pdf, ps, other]: Title: MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm

Authors: Wei Chen, Chaoqun Du, Feng Gu, Wei He, Qizhen Li, Zide Liu, Xuhao Pan, Chang Ren, Xudong Rao, Chenfeng Wang, Tao Wei, Chengjun Yu, Pengfei Yu, Yufei Zheng, Chunpeng Zhou, Pan Zhou, Xuhan Zhu

Comments: 33 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[391] arXiv:2512.02870 [pdf, ps, other]: Title: Taming Camera-Controlled Video Generation with Verifiable Geometry Reward

Authors: Zhaoqing Wang, Xiaobo Xia, Zhuolin Bie, Jinlin Liu, Dongdong Yu, Jia-Wang Bian, Changhu Wang

Comments: 11 pages, 4 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392] arXiv:2512.02867 [pdf, ps, other]: Title: MICCAI STSR 2025 Challenge: Semi-Supervised Teeth and Pulp Segmentation and CBCT-IOS Registration

Authors: Yaqi Wang, Zhi Li, Chengyu Wu, Jun Liu, Yifan Zhang, Jialuo Chen, Jiaxue Ni, Qian Luo, Jin Liu, Can Han, Changkai Ji, Zhi Qin Tan, Ajo Babu George, Liangyu Chen, Qianni Zhang, Dahong Qian, Shuai Wang, Huiyu Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[393] arXiv:2512.02860 [pdf, ps, other]: Title: RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association

Authors: Abdul Hannan, Furqan Malik, Hina Jabbar, Syed Suleman Sadiq, Mubashir Noman

Comments: Ranked 3rd in Fame 2026 Challenge, ICASSP

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[394] arXiv:2512.02850 [pdf, ps, other]: Title: Are Detectors Fair to Indian IP-AIGC? A Cross-Generator Study

Authors: Vishal Dubey, Pallavi Tyagi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[395] arXiv:2512.02846 [pdf, ps, other]: Title: Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?

Authors: Manuel Benavent-Lledo, Konstantinos Bacharidis, Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros, Jose Garcia-Rodriguez

Comments: Accepted in WACV 2026 - Applications Track

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[396] arXiv:2512.02835 [pdf, ps, other]: Title: ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Authors: Yifan Li, Yingda Yin, Lingting Zhu, Weikai Chen, Shengju Qian, Xin Wang, Yanwei Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[397] arXiv:2512.02830 [pdf, ps, other]: Title: Defense That Attacks: How Robust Models Become Better Attackers

Authors: Mohamed Awad, Mahmoud Akrm, Walid Gomaa

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[398] arXiv:2512.02794 [pdf, ps, other]: Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation

Authors: Fan Wu, Cheng Chen, Zhoujie Fu, Jiacheng Wei, Yi Xu, Deheng Ye, Guosheng Lin

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[399] arXiv:2512.02793 [pdf, ps, other]: Title: IC-World: In-Context Generation for Shared World Modeling

Authors: Fan Wu, Jiacheng Wei, Ruibo Li, Yi Xu, Junyou Li, Deheng Ye, Guosheng Lin

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[400] arXiv:2512.02792 [pdf, ps, other]: Title: HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

Authors: Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, Weili Guan

Comments: Accepted by ACM MM 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[401] arXiv:2512.02790 [pdf, ps, other]: Title: UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits

Authors: Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing Lyu, Zhou Zhao, Shengyu Zhang

Comments: 31 pages, 15 figures, 12 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[402] arXiv:2512.02789 [pdf, ps, other]: Title: TrackNetV5: Residual-Driven Spatio-Temporal Refinement and Motion Direction Decoupling for Fast Object Tracking

Authors: Tang Haonan, Chen Yanjun, Jiang Lezhi

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[403] arXiv:2512.02781 [pdf, ps, other]: Title: LumiX: Structured and Coherent Text-to-Intrinsic Generation

Authors: Xu Han, Biao Zhang, Xiangjun Tang, Xianzhi Li, Peter Wonka

Comments: The code will be available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[404] arXiv:2512.02780 [pdf, ps, other]: Title: Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset

Authors: Qifan Liang, Junlin Li, Zhen Han, Xihao Wang, Zhongyuan Wang, Bin Mei

Comments: 12 pages, 15 figures. Accepted to AAAI-26 (Main Technical Track)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[405] arXiv:2512.02751 [pdf, ps, other]: Title: AttMetNet: Attention-Enhanced Deep Neural Network for Methane Plume Detection in Sentinel-2 Satellite Imagery

Authors: Rakib Ahsan, MD Sadik Hossain Shanto, Md Sultanul Arifin, Tanzima Hashem

Comments: 15 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[406] arXiv:2512.02743 [pdf, ps, other]: Title: Reasoning-Aware Multimodal Fusion for Hateful Video Detection

Authors: Shuonan Yang, Tailin Chen, Jiangbei Yue, Guangliang Cheng, Jianbo Jiao, Zeyu Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[407] arXiv:2512.02737 [pdf, ps, other]: Title: Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone

Authors: Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele Facciolo

Comments: Accepted at WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[408] arXiv:2512.02727 [pdf, ps, other]: Title: DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions

Authors: Yifan Zhou, Takehiko Ohkawa, Guwenxiao Zhou, Kanoko Goto, Takumi Hirose, Yusuke Sekikawa, Nakamasa Inoue

Comments: Accepted to WACV 2026. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[409] arXiv:2512.02715 [pdf, ps, other]: Title: GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding

Authors: Peirong Zhang, Yidan Zhang, Luxiao Xu, Jinliang Lin, Zonghao Guo, Fengxiang Wang, Xue Yang, Kaiwen Wei, Lei Wang

Comments: 11 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[410] arXiv:2512.02702 [pdf, ps, other]: Title: Tissue-mask supported inter-subject whole-body image registration in the UK Biobank -- A method benchmarking study

Authors: Yasemin Utkueri, Elin Lundström, Håkan Ahlström, Johan Öfverstedt, Joel Kullberg

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[411] arXiv:2512.02700 [pdf, ps, other]: Title: VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm

Authors: Zhenkai Wu, Xiaowen Ma, Zhenliang Ni, Dengming Zhang, Han Shu, Xin Jiang, Xinghao Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[412] arXiv:2512.02697 [pdf, ps, other]: Title: GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization

Authors: Zixuan Song, Jing Zhang, Di Wang, Zidie Zhou, Wenbin Liu, Haonan Guo, En Wang, Bo Du

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[413] arXiv:2512.02696 [pdf, ps, other]: Title: ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection

Authors: Omid Reza Heidari, Yang Wang, Xinxin Zuo

Comments: Submitted to ICASSP 2026 Conference

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[414] arXiv:2512.02686 [pdf, ps, other]: Title: ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic Data

Authors: Yuxing Liu, Yong Liu

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[415] arXiv:2512.02685 [pdf, ps, other]: Title: Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance

Authors: Huankun Sheng, Ming Li, Yixiang Wei, Yeying Fan, Yu-Hui Wen, Tieliang Gong, Yong-Jin Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[416] arXiv:2512.02681 [pdf, ps, other]: Title: PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution

Authors: Zhongbao Yang, Jiangxin Dong, Yazhou Yao, Jinhui Tang, Jinshan Pan

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[417] arXiv:2512.02668 [pdf, ps, other]: Title: UAUTrack: Towards Unified Multimodal Anti-UAV Visual Tracking

Authors: Qionglin Ren, Dawei Zhang, Chunxu Tian, Dan Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[418] arXiv:2512.02664 [pdf, ps, other]: Title: PolarGuide-GSDR: 3D Gaussian Splatting Driven by Polarization Priors and Deferred Reflection for Real-World Reflective Scenes

Authors: Derui Shan, Qian Qiao, Hao Lu, Tao Du, Peng Lu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419] arXiv:2512.02660 [pdf, ps, other]: Title: Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation

Authors: Agathoklis Georgiou

Comments: 13 pages, 1 figure, 2 tables. Open-source implementation available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[420] arXiv:2512.02650 [pdf, ps, other]: Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Authors: Junwon Lee, Juhan Nam, Jiyoung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[421] arXiv:2512.02648 [pdf, ps, other]: Title: PoreTrack3D: A Benchmark for Dynamic 3D Gaussian Splatting in Pore-Scale Facial Trajectory Tracking

Authors: Dong Li, Jiahao Xiong, Yingda Huang, Le Chang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422] arXiv:2512.02643 [pdf, ps, other]: Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening

Authors: Yongchuan Cui, Peng Liu, Yi Zeng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423] arXiv:2512.02624 [pdf, ps, other]: Title: PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding

Authors: Zheng Huang, Xukai Liu, Tianyu Hu, Kai Zhang, Ye Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[424] arXiv:2512.02622 [pdf, ps, other]: Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence

Authors: Xuming He, Zehao Fan, Hengjia Li, Fan Zhuo, Hankun Xu, Senlin Cheng, Di Weng, Haifeng Liu, Can Ye, Boxi Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425] arXiv:2512.02621 [pdf, ps, other]: Title: Content-Aware Texturing for Gaussian Splatting

Authors: Panagiotis Papantonakis, Georgios Kopanas, Fredo Durand, George Drettakis

Comments: Project Page: this https URL

Journal-ref: Eurographics Symposium on Rendering (Symposium Track), 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[426] arXiv:2512.02576 [pdf, ps, other]: Title: Co-speech Gesture Video Generation via Motion-Based Graph Retrieval

Authors: Yafei Song, Peng Zhang, Bang Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[427] arXiv:2512.02566 [pdf, ps, other]: Title: From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature

Authors: Kun Yuan, Min Woo Sun, Zhen Chen, Alejandro Lozano, Xiangteng He, Shi Li, Nassir Navab, Xiaoxiao Sun, Nicolas Padoy, Serena Yeung-Levy

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[428] arXiv:2512.02554 [pdf, ps, other]: Title: OmniPerson: Unified Identity-Preserving Pedestrian Generation

Authors: Changxiao Ma, Chao Yuan, Xincheng Shi, Yuzhuo Ma, Yongfei Zhang, Longkun Zhou, Yujia Zhang, Shangze Li, Yifan Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429] arXiv:2512.02541 [pdf, ps, other]: Title: AVGGT: Rethinking Global Attention for Accelerating VGGT

Authors: Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, Jianfu Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430] arXiv:2512.02536 [pdf, ps, other]: Title: WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens

Authors: Jian Yang, Dacheng Yin, Xiaoxuan He, Yong Li, Fengyun Rao, Jing Lyu, Wei Zhai, Yang Cao, Zheng-Jun Zha

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[431] arXiv:2512.02520 [pdf, ps, other]: Title: On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection

Authors: Tai Le-Gia

Comments: PhD Dissertation

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[432] arXiv:2512.02517 [pdf, ps, other]: Title: SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts

Authors: Jiaqi Liu, Ronghao Fu, Lang Sun, Haoran Liu, Xiao Yang, Weipeng Zhang, Xu Na, Zhuoran Duan, Bo Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433] arXiv:2512.02512 [pdf, ps, other]: Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling

Authors: Aditya Chaudhary, Prachet Dev Singh, Ankit Jha

Comments: Accepted as a Tiny Paper at the 13th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2025), IIT Mandi, India. 3 pages, 1 figure

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[434] arXiv:2512.02505 [pdf, ps, other]: Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding

Authors: Jiaqi Liu, Ronghao Fu, Haoran Liu, Lang Sun, Bo Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[435] arXiv:2512.02498 [pdf, ps, other]: Title: dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Authors: Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, Colin Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[436] arXiv:2512.02497 [pdf, ps, other]: Title: A Large Scale Benchmark for Test Time Adaptation Methods in Medical Image Segmentation

Authors: Wenjing Yu, Shuo Jiang, Yifei Chen, Shuo Chang, Yuanhan Wang, Beining Wu, Jie Dong, Mingxuan Liu, Shenghao Zhu, Feiwei Qin, Changmiao Wang, Qiyuan Tian

Comments: 45 pages, 18 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[437] arXiv:2512.02496 [pdf, ps, other]: Title: Attention-guided reference point shifting for Gaussian-mixture-based partial point set registration

Authors: Mizuki Kikkawa, Tatsuya Yatagawa, Yutaka Ohtake, Hiromasa Suzuki

Comments: 16 pages, 9 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[438] arXiv:2512.02492 [pdf, ps, other]: Title: YingVideo-MV: Music-Driven Multi-Stage Video Generation

Authors: Jiahui Chen, Weida Wang, Runhua Shi, Huan Yang, Chaofan Ding, Zihao Chen

Comments: 18 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439] arXiv:2512.02487 [pdf, ps, other]: Title: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding

Authors: Yerim Jeon, Miso Lee, WonJun Moon, Jae-Pil Heo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[440] arXiv:2512.02485 [pdf, ps, other]: Title: UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making

Authors: Qianhan Feng, Zhongzhen Huang, Yakun Zhu, Xiaofan Zhang, Qi Dou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[441] arXiv:2512.02482 [pdf, ps, other]: Title: G-SHARP: Gaussian Surgical Hardware Accelerated Real-time Pipeline

Authors: Vishwesh Nath, Javier G. Tejero, Ruilong Li, Filippo Filicori, Mahdi Azizian, Sean D. Huver

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442] arXiv:2512.02473 [pdf, ps, other]: Title: WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling

Authors: Yuta Oshima, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[443] arXiv:2512.02469 [pdf, ps, other]: Title: TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution

Authors: Fengli Ran, Xiao Pu, Bo Liu, Xiuli Bi, Bin Xiao

Comments: Accepted in AAAI 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444] arXiv:2512.02458 [pdf, ps, other]: Title: Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration

Authors: Zhongyi Cai, Yi Du, Chen Wang, Yu Kong

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445] arXiv:2512.02457 [pdf, ps, other]: Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

Authors: Jianzong Wu, Hao Lian, Dachao Hao, Ye Tian, Qingyu Shi, Biaolong Chen, Hao Jiang, Yunhai Tong

Comments: Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446] arXiv:2512.02456 [pdf, ps, other]: Title: See, Think, Learn: A Self-Taught Multimodal Reasoner

Authors: Sourabh Sharma, Sonam Gupta, Sadbhawna

Comments: Winter Conference on Applications of Computer Vision 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[447] arXiv:2512.02453 [pdf, ps, other]: Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation

Authors: Kerui Chen, Jianrong Zhang, Ming Li, Zhonglong Zheng, Hehe Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448] arXiv:2512.02450 [pdf, ps, other]: Title: HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild

Authors: Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno, Francis Engelmann, Leonidas Guibas

Comments: NeurIPS 2025 (Datasets and Benchmarks Track) Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[449] arXiv:2512.02448 [pdf, ps, other]: Title: nuScenes Revisited: Progress and Challenges in Autonomous Driving

Authors: Whye Kit Fong, Venice Erin Liong, Kok Seang Tan, Holger Caesar

Comments: 18 pages, 17 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[450] arXiv:2512.02447 [pdf, ps, other]: Title: Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors

Authors: Fan Luo, Zeyu Gao, Xinhao Luo, Kai Zhao, Yanfeng Lu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451] arXiv:2512.02441 [pdf, ps, other]: Title: Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation

Authors: Junghwan Park, Woojin Cho, Junhyuk Heo, Darongsae Kwon, Kookjin Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[452] arXiv:2512.02438 [pdf, ps, other]: Title: Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources

Authors: Phuc Pham, Nhu Pham, Ngoc Quoc Ly

Comments: WACV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[453] arXiv:2512.02437 [pdf, ps, other]: Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework

Authors: Daeyoung Kim

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[454] arXiv:2512.02425 [pdf, ps, other]: Title: WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Authors: Woongyeong Yeo, Kangsan Kim, Jaehong Yoon, Sung Ju Hwang

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[455] arXiv:2512.02423 [pdf, ps, other]: Title: GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Authors: Haolong Yan, Yeqing Shen, Xin Huang, Jia Wang, Kaijun Tan, Zhixuan Liang, Hongxin Li, Zheng Ge, Osamu Yoshie, Si Li, Xiangyu Zhang, Daxin Jiang

Comments: 26 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456] arXiv:2512.02421 [pdf, ps, other]: Title: Generalizing Vision-Language Models with Dedicated Prompt Guidance

Authors: Xinyao Li, Yinjie Min, Hongbo Chen, Zhekai Du, Fengling Li, Jingjing Li

Comments: Accepted to AAAI26

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457] arXiv:2512.02413 [pdf, ps, other]: Title: MitUNet: Enhancing Floor Plan Recognition using a Hybrid Mix-Transformer and U-Net Architecture

Authors: Dmitriy Parashchuk, Alexey Kapshitskiy, Yuriy Karyakin

Comments: 9 pages, 4 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[458] arXiv:2512.02405 [pdf, ps, other]: Title: WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate

Authors: Anoop Cherian, River Doyle, Eyal Ben-Dov, Suhas Lohit, Kuan-Chuan Peng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[459] arXiv:2512.02400 [pdf, ps, other]: Title: Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation

Authors: Wentao Xiang, Haokang Zhang, Tianhang Yang, Zedong Chu, Ruihang Chu, Shichao Xie, Yujian Yuan, Jian Sun, Zhining Gu, Junjie Wang, Xiaolong Wu, Mu Xu, Yujiu Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
