Computer Vision and Pattern Recognition
Authors and titles for recent submissions
Mon, 8 Dec 2025
- [1] arXiv:2512.05965 [pdf, ps, other]
  Title: EditThinker: Unlocking Iterative Reasoning for Any Image Editor
  Authors: Hongyu Li, Manyuan Zhang, Dian Zheng, Ziyu Guo, Yimeng Jia, Kaituo Feng, Hao Yu, Yexin Liu, Yan Feng, Peng Pei, Xunliang Cai, Linjiang Huang, Hongsheng Li, Si Liu
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [2] arXiv:2512.05960 [pdf, ps, other]
  Title: AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [3] arXiv:2512.05941 [pdf, ps, other]
  Title: Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding
  Authors: Zhiyuan Jiang, Shenghao Xie, Wenyi Li, Wenqiang Zu, Peihang Li, Jiahao Qiu, Siqi Pei, Lei Ma, Tiejun Huang, Mengdi Wang, Shilong Liu
  Comments: Code is available at this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [4] arXiv:2512.05937 [pdf, ps, other]
  Title: Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception
  Comments: 8 pages, 2 figures, 7 tables
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [5] arXiv:2512.05936 [pdf, ps, other]
  Title: Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition
  Authors: Anne Sielemann, Lena Loercher, Max-Lion Schumacher, Stefan Wolf, Masoud Roschani, Jens Ziehn
  Comments: 8 pages, 8 figures, 3 tables
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [6] arXiv:2512.05928 [pdf, ps, other]
  Title: A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition
  Comments: 18 pages, 17 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [7] arXiv:2512.05927 [pdf, ps, other]
  Title: World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [8] arXiv:2512.05922 [pdf, ps, other]
  Title: LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation
  Authors: Khang Le, Anh Mai Vu, Thi Kim Trang Vo, Ha Thach, Ngoc Bui Lam Quang, Thanh-Huy Nguyen, Minh H. N. Le, Zhu Han, Chandra Mohan, Hien Van Nguyen
  Comments: Note: Khang Le and Anh Mai Vu contributed equally
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [9] arXiv:2512.05920 [pdf, ps, other]
  Title: NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [10] arXiv:2512.05905 [pdf, ps, other]
  Title: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
  Authors: Wenhao Yan, Sheng Ye, Zhuoyi Yang, Jiayan Teng, ZhenHui Dong, Kairui Wen, Xiaotao Gu, Yong-Jin Liu, Jie Tang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [11] arXiv:2512.05866 [pdf, ps, other]
  Title: Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN Discriminator
  Comments: This paper has been accepted for presentation at the IEEE 28th International Conference on Computer and Information Technology (ICCIT), December 2025
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [12] arXiv:2512.05859 [pdf, ps, other]
  Title: Edit-aware RAW Reconstruction
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [13] arXiv:2512.05853 [pdf, ps, other]
  Title: VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack
  Authors: Shiji Zhao, Shukun Xiong, Yao Huang, Yan Jin, Zhenyu Wu, Jiyang Guan, Ranjie Duan, Jialing Tao, Hui Xue, Xingxing Wei
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [14] arXiv:2512.05830 [pdf, ps, other]
  Title: Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning
  Comments: 22 pages, 11 figures, 5 tables
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [15] arXiv:2512.05814 [pdf, ps, other]
  Title: UG-FedDA: Uncertainty-Guided Federated Domain Adaptation for Multi-Center Alzheimer's Disease Detection
  Authors: Fubao Zhu, Zhanyuan Jia, Zhiguo Wang, Huan Huang, Danyang Sun, Chuang Han, Yanting Li, Jiaofen Nan, Chen Zhao, Weihua Zhou
  Comments: The code is already available on GitHub: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [16] arXiv:2512.05809 [pdf, ps, other]
  Title: Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling
  Comments: Extended abstract at World Modeling Workshop 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [17] arXiv:2512.05802 [pdf, ps, other]
  Title: Bring Your Dreams to Life: Continual Text-to-Video Customization
  Authors: Jiahua Dong, Xudong Wang, Wenqi Liang, Zongyan Han, Meng Cao, Duzhen Zhang, Hanbin Zhao, Zhi Han, Salman Khan, Fahad Shahbaz Khan
  Comments: Accepted to AAAI2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [18] arXiv:2512.05783 [pdf, ps, other]
  Title: Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse Depth
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [19] arXiv:2512.05774 [pdf, ps, other]
  Title: Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
  Authors: Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S. Ryoo, Juan Carlos Niebles
  Comments: Website: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [20] arXiv:2512.05762 [pdf, ps, other]
  Title: FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural Operators
  Comments: Accepted for WACV
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [21] arXiv:2512.05759 [pdf, ps, other]
  Title: Label-Efficient Point Cloud Segmentation with Active Learning
  Authors: Johannes Meyer, Jasper Hoffmann, Felix Schulz, Dominik Merkle, Daniel Buescher, Alexander Reiterer, Joschka Boedecker, Wolfram Burgard
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [22] arXiv:2512.05754 [pdf, ps, other]
  Title: USV: Unified Sparsification for Accelerating Video Diffusion Models
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [23] arXiv:2512.05746 [pdf, ps, other]
  Title: HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [24] arXiv:2512.05740 [pdf, ps, other]
  Title: Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision
  Authors: Lennart Maack, Julia-Kristin Graß, Lisa-Marie Toscha, Nathaniel Melling, Alexander Schlaefer
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [25] arXiv:2512.05710 [pdf, ps, other]
  Title: Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature Learning
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [26] arXiv:2512.05698 [pdf, ps, other]
  Title: OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning
  Authors: Xusheng Guo, Wanfa Zhang, Shijia Zhao, Qiming Xia, Xiaolong Xie, Mingming Wang, Hai Wu, Chenglu Wen
  Comments: The 40th Annual AAAI Conference on Artificial Intelligence
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [27] arXiv:2512.05683 [pdf, ps, other]
  Title: Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration Correction
  Authors: Yong En Kok, Bowen Deng, Alexander Bentley, Andrew J. Parkes, Michael G. Somekh, Amanda J. Wright, Michael P. Pound
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
- [28] arXiv:2512.05674 [pdf, ps, other]
  Title: Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume Maximization
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [29] arXiv:2512.05672 [pdf, ps, other]
  Title: InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [30] arXiv:2512.05669 [pdf, ps, other]
  Title: Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric Features
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [31] arXiv:2512.05663 [pdf, ps, other]
  Title: LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection
  Authors: Johannes Meier, Jonathan Michel, Oussema Dhaouadi, Yung-Hsu Yang, Christoph Reich, Zuria Bauer, Stefan Roth, Marc Pollefeys, Jacques Kaiser, Daniel Cremers
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [32] arXiv:2512.05651 [pdf, ps, other]
  Title: Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [33] arXiv:2512.05635 [pdf, ps, other]
  Title: Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [34] arXiv:2512.05613 [pdf, ps, other]
  Title: DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model
  Authors: Pasquale De Marinis, Pieter M. Blok, Uzay Kaymak, Rogier Brussee, Gennaro Vessio, Giovanna Castellano
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [35] arXiv:2512.05610 [pdf, ps, other]
  Title: NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projections
  Authors: Juho Korkeala, Jesse Muhojoki, Josef Taher, Klaara Salolahti, Matti Hyyppä, Antero Kukko, Juha Hyyppä
  Comments: 19 pages, 8 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [36] arXiv:2512.05597 [pdf, ps, other]
  Title: Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction
  Comments: 10 pages, 8 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [37] arXiv:2512.05593 [pdf, ps, other]
  Title: Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer
  Comments: Accepted to 3DV 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [38] arXiv:2512.05571 [pdf, ps, other]
  Title: MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging
  Authors: Xingyu Zhang, Anna Reithmeir, Fryderyk Kögl, Rickmer Braren, Julia A. Schnabel, Daniel M. Lang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [39] arXiv:2512.05564 [pdf, ps, other]
  Title: ProPhy: Progressive Physical Alignment for Dynamic World Simulation
  Authors: Zijun Wang, Panwen Hu, Jing Wang, Terry Jingchen Zhang, Yuhao Cheng, Long Chen, Yiqiang Yan, Zutao Jiang, Hanhui Li, Xiaodan Liang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [40] arXiv:2512.05557 [pdf, ps, other]
  Title: 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [41] arXiv:2512.05546 [pdf, ps, other]
  Title: Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models
  Comments: 6 pages, 6 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [42] arXiv:2512.05539 [pdf, ps, other]
  Title: Ideal Observer for Segmentation of Dead Leaves Images
  Comments: 41 pages, 16 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Methodology (stat.ME)
- [43] arXiv:2512.05529 [pdf, ps, other]
  Title: See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors
  Comments: The first two authors contributed equally
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [44] arXiv:2512.05524 [pdf, ps, other]
  Title: VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [45] arXiv:2512.05515 [pdf, ps, other]
  Title: DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis
  Comments: Accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2025
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [46] arXiv:2512.05513 [pdf, ps, other]
  Title: Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [47] arXiv:2512.05511 [pdf, ps, other]
  Title: Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm
  Authors: Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Yaokun Li, Xiujun Shu, Yuanhao Feng, Bo Wang, Yimian Dai, Xiangyu Yue
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [48] arXiv:2512.05494 [pdf, ps, other]
- [49] arXiv:2512.05492 [pdf, ps, other]
  Title: WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [50] arXiv:2512.05482 [pdf, ps, other]
  Title: Concept-based Explainable Data Mining with VLM for 3D Detection
  Authors: Mai Tsujimoto
  Comments: 28 pages including appendix. Code: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [51] arXiv:2512.05481 [pdf, ps, other]
  Title: UniFS: Unified Multi-Contrast MRI Reconstruction via Frequency-Spatial Fusion
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [52] arXiv:2512.05478 [pdf, ps, other]
  Title: EmoStyle: Emotion-Driven Image Stylization
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [53] arXiv:2512.05468 [pdf, ps, other]
  Title: University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor system
  Authors: Takara Taniguchi, Yudai Ueda, Atsuya Muramatsu, Kohki Hashimoto, Ryo Yagi, Hideya Ochiai, Chaodit Aswakul
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [54] arXiv:2512.05446 [pdf, ps, other]
  Title: TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [55] arXiv:2512.05422 [pdf, ps, other]
  Title: ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [56] arXiv:2512.05418 [pdf, ps, other]
  Title: Performance Evaluation of Deep Learning for Tree Branch Segmentation in Autonomous Forestry Systems
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [57] arXiv:2512.05415 [pdf, ps, other]
  Title: Moving object detection from multi-depth images with an attention-enhanced CNN
  Authors: Masato Shibukawa, Fumi Yoshida, Toshifumi Yanagisawa, Takashi Ito, Hirohisa Kurosaki, Makoto Yoshikawa, Kohki Kamiya, Ji-an Jiang, Wesley Fraser, JJ Kavelaars, Susan Benecchi, Anne Verbiscer, Akira Hatakeyama, Hosei O, Naoya Ozaki
  Comments: 14 pages, 22 figures, submitted to PASJ
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [58] arXiv:2512.05412 [pdf, ps, other]
  Title: YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [59] arXiv:2512.05410 [pdf, ps, other]
  Title: Genetic Algorithms For Parameter Optimization for Disparity Map Generation of Radiata Pine Branch Images
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [60] arXiv:2512.05398 [pdf, ps, other]
  Title: The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos
  Comments: Code is available at this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [61] arXiv:2512.05394 [pdf, ps, other]
  Title: Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [62] arXiv:2512.05391 [pdf, ps, other]
  Title: LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models
  Authors: Qingqiao Hu, Weimin Lyu, Meilong Xu, Kehan Qi, Xiaoling Hu, Saumya Gupta, Jiawei Zhou, Chao Chen
  Comments: 20 pages
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [63] arXiv:2512.05385 [pdf, ps, other]
  Title: ShaRP: SHAllow-LayeR Pruning for Video Large Language Models Acceleration
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [64] arXiv:2512.05362 [pdf, ps, other]
  Title: PoolNet: Deep Learning for 2D to 3D Video Process Validation
  Comments: All code related to this paper can be found at this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [65] arXiv:2512.05359 [pdf, ps, other]
  Title: Group Orthogonal Low-Rank Adaptation for RGB-T Tracking
  Comments: 13 pages, 8 figures. Accepted by AAAI 2026. Extended version
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [66] arXiv:2512.05354 [pdf, ps, other]
  Title: SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training
  Comments: project page this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [67] arXiv:2512.05343 [pdf, ps, other]
  Title: SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
  Authors: Elisabetta Fedele, Francis Engelmann, Ian Huang, Or Litany, Marc Pollefeys, Leonidas Guibas
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [68] arXiv:2512.05277 [pdf, ps, other]
  Title: From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
  Authors: Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain, Ahmad Rezaei, Mohsen Gholami, Alireza Heidarikhazaei, Zhou Weimin, Yong Zhang, Mohammad Akbari
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [69] arXiv:2512.05272 [pdf, ps, other]
  Title: Inferring Compositional 4D Scenes without Ever Seeing One
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [70] arXiv:2512.05268 [pdf, ps, other]
  Title: CARD: Correlation Aware Restoration with Diffusion
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [71] arXiv:2512.05259 [pdf, ps, other]
  Title: Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data Anonymization
  Authors: Georgios Chatzichristodoulou, Niki Efthymiou, Panagiotis Filntisis, Georgios Pavlakos, Petros Maragos
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [72] arXiv:2512.05240 [pdf, ps, other]
  Title: IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction
  Authors: Dmitrii Torbunov, Onur Okuducu, Yi Huang, Odera Dim, Rebecca Coles, Yonggang Cui, Yihui Ren
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [73] arXiv:2512.05209 [pdf, ps, other]
  Title: DEAR: Dataset for Evaluating the Aesthetics of Rendering
  Authors: Vsevolod Plohotnuk, Artyom Panshin, Nikola Banić, Simone Bianco, Michael Freeman, Egor Ershov
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [74] arXiv:2512.05198 [pdf, ps, other]
  Title: Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models
  Comments: 16 pages, 10 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [75] arXiv:2512.05172 [pdf, ps, other]
  Title: Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [76] arXiv:2512.05152 [pdf, ps, other]
  Title: EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models
  Comments: 6 pages, 5 figures, published to 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France, 2025
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [77] arXiv:2512.05150 [pdf, ps, other]
  Title: TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
  Comments: arxiv v0
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [78] arXiv:2512.05145 [pdf, ps, other]
  Title: Self-Improving VLM Judges Without Human Annotations
  Authors: Inna Wanyin Lin, Yushi Hu, Shuyue Stella Li, Scott Geng, Pang Wei Koh, Luke Zettlemoyer, Tim Althoff, Marjan Ghazvininejad
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [79] arXiv:2512.05140 [pdf, other]
  Title: FlowEO: Generative Unsupervised Domain Adaptation for Earth Observation
  Authors: Georges Le Bellier (CEDRIC - VERTIGO, Cnam), Nicolas Audebert (LaSTIG, IGN, CEDRIC - VERTIGO)
  Comments: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Mar 2026, Tucson (AZ), United States
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [80] arXiv:2512.05139 [pdf, ps, other]
  Title: Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative Models
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [81] arXiv:2512.05137 [pdf, ps, other]
  Title: ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images
  Authors: Yunfei Zhang, Yizhuo He, Yuanxun Shao, Zhengtao Yao, Haoyan Xu, Junhao Dong, Zhen Yao, Zhikang Dong
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [82] arXiv:2512.05136 [pdf, ps, other]
  Title: Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes
  Authors: Yujie Xiao, Gongzhen Tang, Deyun Zhang, Jun Li, Guangkun Nie, Haoyu Wang, Shun Huang, Tong Liu, Qinghao Zhao, Kangyin Chen, Shenda Hong
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [83] arXiv:2512.05134 [pdf, ps, other]
  Title: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
  Authors: Zihao Wu
  Comments: 8 pages main, 8 pages appendix, 16 figures, 5 tables. Code: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- [84] arXiv:2512.05132 [pdf, ps, other]
  Title: Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [85] arXiv:2512.05131 [pdf, ps, other]
  Title: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
  Comments: Under review
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [86] arXiv:2512.05959 (cross-list from cs.CL) [pdf, ps, other]
  Title: M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
  Authors: David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata
  Comments: Preprint
  Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [87] arXiv:2512.05955 (cross-list from cs.RO) [pdf, ps, other]
  Title: SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
  Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [88] arXiv:2512.05932 (cross-list from cs.RO) [pdf, ps, other]
  Title: Physically-Based Simulation of Automotive LiDAR
  Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [89] arXiv:2512.05824 (cross-list from cs.AI) [pdf, ps, other]
  Title: Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade Glioma
  Authors: Hafsa Akebli (1), Adam Shephard (2), Vincenzo Della Mea (1), Nasir Rajpoot (2 and 3) ((1) University of Udine, Udine, Italy, (2) University of Warwick, Coventry, UK, (3) Histofy Ltd, Coventry, UK)
  Comments: 4 pages, 2 figures
  Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [90] arXiv:2512.05812 (cross-list from cs.RO) [pdf, ps, other]
  Title: Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
  Comments: This work has been submitted to the IEEE for possible publication
  Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [91] arXiv:2512.05665 (cross-list from cs.CL) [pdf, ps, other]
  Title: Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
  Comments: 11 pages, 6 figures. Code available at this https URL
  Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [92] arXiv:2512.05438 (cross-list from cs.HC) [pdf, ps, other]
  Title: EXR: An Interactive Immersive EHR Visualization in Extended Reality
  Authors: Benoit Marteau, Shaun Q. Y. Tan, Jieru Li, Andrew Hornback, Yishan Zhong, Shaunna Wang, Christian Lowson, Jason Woloff, Joshua M. Pahys, Steven W. Hwang, Coleman Hilton, May D. Wang
  Comments: 11 pages, 6 figures. Preprint version. This paper has been accepted to IEEE ICIR 2025. This is the author-prepared version and not the final published version. The final version will appear in IEEE Xplo
  Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
- [93] arXiv:2512.05299 (cross-list from eess.SY) [pdf, ps, other]
  Title: ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety
  Authors: Ahmad Yehia, Jiseop Byeon, Tianyi Wang, Huihai Wang, Yiming Xu, Junfeng Jiao, Christian Claudel
  Comments: 8 pages, 3 figures, 1 table
  Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Robotics (cs.RO); Image and Video Processing (eess.IV)
- [94] arXiv:2512.05126 (cross-list from eess.AS) [pdf, ps, other]
  Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
  Authors: Kaidi Wang, Yi He, Wenhao Guan, Weijie Wu, Hongwu Ding, Xiong Zhang, Di Wu, Meng Meng, Jian Luan, Lin Li, Qingyang Hong
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
Fri, 5 Dec 2025
- [95] arXiv:2512.05115 [pdf, ps, other]
  Title: Light-X: Generative 4D Video Rendering with Camera and Illumination Control
  Authors: Tianqi Liu, Zhaoxi Chen, Zihao Huang, Shaocong Xu, Saining Zhang, Chongjie Ye, Bohan Li, Zhiguo Cao, Wei Li, Hao Zhao, Ziwei Liu
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [96] arXiv:2512.05113 [pdf, ps, other]
  Title: Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting
  Comments: WACV 2025. Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [97] arXiv:2512.05112 [pdf, ps, other]
  Title: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
  Authors: Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng Li
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [98] arXiv:2512.05111 [pdf, ps, other]
  Title: ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
  Authors: Shengyuan Ding, Xinyu Fang, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiangyu Zhao, Haodong Duan, Xiaoyi Dong, Jianze Liang, Bin Wang, Conghui He, Dahua Lin, Jiaqi Wang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [99] arXiv:2512.05110 [pdf, ps, other]
  Title: ShadowDraw: From Any Object to Shadow-Drawing Compositional Art
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [100] arXiv:2512.05106 [pdf, ps, other]
  Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
- [101] arXiv:2512.05104 [pdf, ps, other]
  Title: EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [102] arXiv:2512.05098 [pdf, ps, other]
- [103] arXiv:2512.05091 [pdf, ps, other]
  Title: Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
  Authors: Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang, Xueqing Deng, Henghui Ding, Lu Qi, Anran Wang, Xiangtai Li, Ming-Hsuan Yang
  Comments: Technical Report; Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [104] arXiv:2512.05081 [pdf, ps, other]
  Title: Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [105] arXiv:2512.05079 [pdf, ps, other]
  Title: Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [106] arXiv:2512.05076 [pdf, ps, other]
  Title: BulletTime: Decoupled Control of Time and Camera Pose for Video Generation
  Authors: Yiming Wang, Qihang Zhang, Shengqu Cai, Tong Wu, Jan Ackermann, Zhengfei Kuang, Yang Zheng, Frano Rajič, Siyu Tang, Gordon Wetzstein
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [107] arXiv:2512.05060 [pdf, ps, other]
  Title: 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
  Authors: Xianfeng Wu, Yajing Bai, Minghan Li, Xianzu Wu, Xueqi Zhao, Zhongyuan Lai, Wenyu Liu, Xinggang Wang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [108] arXiv:2512.05044 [pdf, ps, other]
  Title: Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
  Comments: 18 Pages
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [109] arXiv:2512.05039 [pdf, ps, other]
  Title: Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
  Comments: Submitted for review CVPR-2025
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [110] arXiv:2512.05025 [pdf, ps, other]
  Title: RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation
  Authors: Nicolas Houdré, Diego Marcos, Hugo Riffaud de Turckheim, Dino Ienco, Laurent Wendling, Camille Kurtz, Sylvain Lobry
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [111] arXiv:2512.05021 [pdf, ps, other]
  Title: HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [112] arXiv:2512.05016 [pdf, ps, other]
  Title: Generative Neural Video Compression via Video Diffusion Prior
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [113] arXiv:2512.05006 [pdf, ps, other]
-
Title: Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent ObjectsComments: conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [114] arXiv:2512.05000 [pdf, ps, other]
-
Title: Reflection Removal through Efficient Adaptation of Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [115] arXiv:2512.04996 [pdf, ps, other]
-
Title: A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [116] arXiv:2512.04981 [pdf, ps, other]
-
Title: Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image ModelsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [117] arXiv:2512.04970 [pdf, ps, other]
-
Title: Stable Single-Pixel Contrastive Learning for Semantic and Geometric TasksComments: UniReps Workshop 2025, 12 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [118] arXiv:2512.04969 [pdf, ps, other]
-
Title: Rethinking the Use of Vision Transformers for AI-Generated Image DetectionComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [119] arXiv:2512.04967 [pdf, ps, other]
-
Title: Balanced Few-Shot Episodic Learning for Accurate Retinal Disease DiagnosisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [120] arXiv:2512.04963 [pdf, ps, other]
-
Title: GeoPE: A Unified Geometric Positional Embedding for Structured TensorsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [121] arXiv:2512.04952 [pdf, ps, other]
-
Title: FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action TokenizationAuthors: Yicheng Liu, Shiduo Zhang, Zibin Dong, Baijun Ye, Tianyuan Yuan, Xiaopeng Yu, Linqi Yin, Chenhao Lu, Junhao Shi, Luca Jiang-Tao Yu, Liangtao Zheng, Tao Jiang, Jingjing Gong, Xipeng Qiu, Hang ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [122] arXiv:2512.04943 [pdf, ps, other]
-
Title: Towards Adaptive Fusion of Multimodal Deep Networks for Human Action RecognitionAuthors: Novanto YudistiraSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [123] arXiv:2512.04939 [pdf, ps, other]
-
Title: LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token MergingAuthors: Zhijian Shu, Cheng Lin, Tao Xie, Wei Yin, Ben Li, Zhiyuan Pu, Weize Li, Yao Yao, Xun Cao, Xiaoyang Guo, Xiao-Xiao LongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [124] arXiv:2512.04927 [pdf, ps, other]
-
Title: Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral FittingAuthors: Paul HendersonComments: Accepted at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [125] arXiv:2512.04926 [pdf, ps, other]
-
Title: Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent DiffusionAuthors: Yueming Pan, Ruoyu Feng, Qi Dai, Yuqi Wang, Wenfeng Lin, Mingyu Guo, Chong Luo, Nanning ZhengSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [126] arXiv:2512.04904 [pdf, ps, other]
-
Title: ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow MatchingAuthors: Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun HuangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [127] arXiv:2512.04890 [pdf, ps, other]
-
Title: Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRIAuthors: Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Polina Golland, Benjamin BillotSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [128] arXiv:2512.04888 [pdf, ps, other]
-
Title: You Only Train Once (YOTO): A Retraining-Free Object Detection FrameworkAuthors: Priyanto Hidayatullah, Nurjannah Syakrani, Yudi Widhiyasana, Muhammad Rizqi Sholahuddin, Refdinal Tubagus, Zahri Al Adzani Hidayat, Hanri Fajar Ramadhan, Dafa Alfarizki Pratama, Farhan Muhammad YasinComments: This manuscript was first submitted to the Engineering (Elsevier Journal). The preprint version was posted to arXiv afterwards to facilitate open access and community feedbackSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [129] arXiv:2512.04883 [pdf, ps, other]
-
Title: SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded PlatformsComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [130] arXiv:2512.04875 [pdf, ps, other]
-
Title: SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion DetectionAuthors: Qing Xu, Yanqian Wang, Xiangjian Hea, Yue Li, Yixuan Zhang, Rong Qu, Wenting Duan, Zhen ChenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [131] arXiv:2512.04862 [pdf, ps, other]
-
Title: Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance SensingAuthors: Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini, Jan Ulrich Bartels, Katherine J. Kuchenbecker, Michael J. BlackComments: * Equal contribution. Minor figure corrections compared to the ICCV 2025 versionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [132] arXiv:2512.04857 [pdf, ps, other]
-
Title: Autoregressive Image Generation Needs Only a Few Lines of Cached TokensSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [133] arXiv:2512.04837 [pdf, ps, other]
-
Title: A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real WorldAuthors: Jikang Cheng, Renye Yan, Zhiyuan Yan, Yaozhong Gan, Xueyi Zhang, Zhongyuan Wang, Wei Peng, Ling LiangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [134] arXiv:2512.04832 [pdf, ps, other]
-
Title: Tokenizing Buildings: A Transformer for Layout SynthesisComments: 8 pages, 1 page References, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [135] arXiv:2512.04830 [pdf, ps, other]
-
Title: FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene SynthesisComments: Novel View Synthesis, Driving Scene, Free Trajectory, Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [136] arXiv:2512.04821 [pdf, ps, other]
-
Title: LatentFM: A Latent Flow Matching Approach for Generative Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [137] arXiv:2512.04815 [pdf, ps, other]
-
Title: RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGSAuthors: Chuanyu Fu, Guanying Chen, Yuqi Zhang, Kunbin Yao, Yuan Xiong, Chuan Huang, Shuguang Cui, Yasuyuki Matsushita, Xiaochun CaoComments: arXiv admin note: substantial text overlap with arXiv:2506.02751Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [138] arXiv:2512.04810 [pdf, ps, other]
-
Title: EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified ArchitectureComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [139] arXiv:2512.04786 [pdf, ps, other]
-
Title: LaFiTe: A Generative Latent Field for 3D Native TexturingAuthors: Chia-Hao Chen, Zi-Xin Zou, Yan-Pei Cao, Ze Yuan, Guan Luo, Xiaojuan Qi, Ding Liang, Song-Hai Zhang, Yuan-Chen GuoComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [140] arXiv:2512.04784 [pdf, ps, other]
-
Title: PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward ModelingAuthors: Bowen Ping, Chengyou Jia, Minnan Luo, Changliang Xia, Xin Shen, Zhuohang Dang, Hangwei QianSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [141] arXiv:2512.04761 [pdf, ps, other]
-
Title: Order Matters: 3D Shape Generation from Sequential VR SketchesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [142] arXiv:2512.04734 [pdf, ps, other]
-
Title: MT-Depth: Multi-task Instance feature analysis for the Depth CompletionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [143] arXiv:2512.04733 [pdf, ps, other]
-
Title: E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous DrivingAuthors: Yihong Tang, Haicheng Liao, Tong Nie, Junlin He, Ao Qu, Kehua Chen, Wei Ma, Zhenning Li, Lijun Sun, Chengzhong XuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [144] arXiv:2512.04728 [pdf, ps, other]
-
Title: Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the WildAuthors: Yigui Feng, Qinglin Wang, Haotian Mo, Yang Liu, Ke Liu, Gencheng Liu, Xinhai Chen, Siqi Shen, Songzhu Mei, Jie LiuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [145] arXiv:2512.04699 [pdf, ps, other]
-
Title: OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-ResolutionAuthors: Xinning Chai, Zhengxue Cheng, Yuhong Zhang, Hengsheng Zhang, Yingsheng Qin, Yucai Yang, Rong Xie, Li SongComments: Accepted as TCSVT, 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [146] arXiv:2512.04686 [pdf, ps, other]
-
Title: Towards Cross-View Point Correspondence in Vision-Language ModelsAuthors: Yipu Wang, Yuheng Ji, Yuyang Liu, Enshen Zhou, Ziqiang Yang, Yuxuan Tian, Ziheng Qin, Yue Liu, Huajie Tan, Cheng Chi, Zhiyuan Ma, Daniel Dajun Zeng, Xiaolong ZhengSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [147] arXiv:2512.04678 [pdf, ps, other]
-
Title: Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching DistillationAuthors: Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, Yujun Shen, Min ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [148] arXiv:2512.04677 [pdf, ps, other]
-
Title: Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite LengthAuthors: Yubo Huang, Hailong Guo, Fangtai Wu, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, Steven HoiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [149] arXiv:2512.04660 [pdf, ps, other]
-
Title: I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [150] arXiv:2512.04643 [pdf, ps, other]
-
Title: SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive DecodingAuthors: Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, Hung-Kai Chung, Kuei-Chun Wang, Yu-Chiang Frank WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [151] arXiv:2512.04619 [pdf, ps, other]
-
Title: Denoise to Track: Harnessing Video Diffusion Priors for Robust CorrespondenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [152] arXiv:2512.04599 [pdf, ps, other]
-
Title: Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shotAuthors: Sheng Hang, Chaoxiang He, Hongsheng Hu, Hanqing Hu, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [153] arXiv:2512.04597 [pdf, ps, other]
-
Title: When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question AnsweringSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [154] arXiv:2512.04585 [pdf, ps, other]
-
Title: SAM3-I: Segment Anything with InstructionsAuthors: Jingjing Li, Yue Feng, Yuchen Guo, Jincai Huang, Yongri Piao, Qi Bi, Miao Zhang, Xiaoqi Zhao, Qiang Chen, Shihao Zou, Wei Ji, Huchuan Lu, Li ChengComments: Preliminary results; work in progressSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [155] arXiv:2512.04581 [pdf, ps, other]
-
Title: Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge DistillationAuthors: Houzhang Fang, Chenxing Wu, Kun Bai, Tianqi Chen, Xiaolin Wang, Xiyang Liu, Yi Chang, Luxin YanComments: Accepted by IEEE TMMSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [156] arXiv:2512.04576 [pdf, ps, other]
-
Title: TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [157] arXiv:2512.04568 [pdf, ps, other]
-
Title: Prompt2Craft: Generating Functional Craft Assemblies with LLMsAuthors: Vitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe, Yukiyasu Domae, Weiwei Wan, Kensuke HaradaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [158] arXiv:2512.04564 [pdf, ps, other]
-
Title: Dataset creation for supervised deep learning-based analysis of microscopic images -- review of important considerations and recommendationsAuthors: Christof A. Bertram, Viktoria Weiss, Jonas Ammeling, F. Maria Schabel, Taryn A. Donovan, Frauke Wilm, Christian Marzahl, Katharina Breininger, Marc AubrevilleSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [159] arXiv:2512.04563 [pdf, ps, other]
-
Title: COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial IntelligenceAuthors: Zefeng Zhang, Xiangzhao Hao, Hengzhu Tang, Zhenyu Zhang, Jiawei Sheng, Xiaodong Li, Zhenyang Li, Li Gao, Daiting Shi, Dawei Yin, Tingwen LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [160] arXiv:2512.04554 [pdf, ps, other]
-
Title: Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question AnsweringSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [161] arXiv:2512.04542 [pdf, ps, other]
-
Title: Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian OptimizationComments: 28 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [162] arXiv:2512.04540 [pdf, ps, other]
-
Title: VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory ManagementSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [163] arXiv:2512.04537 [pdf, ps, other]
-
Title: X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at ScaleSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [164] arXiv:2512.04536 [pdf, ps, other]
-
Title: Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [165] arXiv:2512.04534 [pdf, ps, other]
-
Title: Refaçade: Editing Object with Given Reference TextureAuthors: Youze Huang (1), Penghui Ruan (2), Bojia Zi (3), Xianbiao Qi (4), Jianan Wang (5), Rong Xiao (4) ((1) University of Electronic Science and Technology of China, (2) The Hong Kong Polytechnic University, (3) The Chinese University of Hong Kong, (4) IntelliFusion Inc., (5) Astribot Inc.)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [166] arXiv:2512.04532 [pdf, ps, other]
-
Title: PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance DisentanglementAuthors: Yu-Wei Zhan, Xin Wang, Hong Chen, Tongtong Feng, Wei Feng, Ren Wang, Guangyao Li, Qing Li, Wenwu ZhuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [167] arXiv:2512.04528 [pdf, ps, other]
-
Title: Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty QuantificationAuthors: Chentao Shen, Sizhe Zheng, Bingqian Wu, Yaohua Feng, Yuanchen Fei, Mingyu Mei, Hanwen Jiang, Xiangru HuangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [168] arXiv:2512.04522 [pdf, ps, other]
-
Title: Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-IdentificationComments: 14 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [169] arXiv:2512.04521 [pdf, ps, other]
-
Title: WiFi-based Cross-Domain Gesture Recognition Using Attention MechanismSubjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
- [170] arXiv:2512.04520 [pdf, ps, other]
-
Title: Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image SegmentationAuthors: Chenlin Xu, Lei Zhang, Lituan Wang, Xinyu Pu, Pengfei Ma, Guangwu Qian, Zizhou Wang, Yan WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [171] arXiv:2512.04519 [pdf, ps, other]
-
Title: VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space MemoryAuthors: Yifei Yu, Xiaoshan Wu, Xinting Hu, Tao Hu, Yangtian Sun, Xiaoyang Lyu, Bo Wang, Lin Ma, Yuewen Ma, Zhongrui Wang, Xiaojuan QiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [172] arXiv:2512.04515 [pdf, ps, other]
-
Title: EgoLCD: Egocentric Video Generation with Long Context DiffusionAuthors: Liuzhou Zhang, Jiarui Ye, Yuanlei Wang, Ming Zhong, Mingju Cao, Wanke Xia, Bowen Zeng, Zeyu Zhang, Hao TangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [173] arXiv:2512.04511 [pdf, ps, other]
-
Title: DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [174] arXiv:2512.04504 [pdf, ps, other]
-
Title: UltraImage: Rethinking Resolution Extrapolation in Image Diffusion TransformersAuthors: Min Zhao, Bokai Yan, Xue Yang, Hongzhou Zhu, Jintao Zhang, Shilong Liu, Chongxuan Li, Jun ZhuComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [175] arXiv:2512.04499 [pdf, ps, other]
-
Title: Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [176] arXiv:2512.04496 [pdf, ps, other]
-
Title: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight RemovalAuthors: Tianci Huo, Lingfeng Qi, Yuhan Chen, Qihong Xue, Jinyuan Shao, Hai Yu, Jie Li, Zhanhua Zhang, Guofa LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [177] arXiv:2512.04487 [pdf, ps, other]
-
Title: Controllable Long-term Motion Generation with Extended Joint TargetsComments: WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [178] arXiv:2512.04485 [pdf, ps, other]
-
Title: Not All Birds Look The Same: Identity-Preserving Generation For BirdsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [179] arXiv:2512.04483 [pdf, ps, other]
-
Title: DeRA: Decoupled Representation Alignment for Video TokenizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [180] arXiv:2512.04461 [pdf, ps, other]
-
Title: UniTS: Unified Time Series Generative Model for Remote SensingAuthors: Yuxiang Zhang, Shunlin Liang, Wenyuan Li, Han Ma, Jianglei Xu, Yichuan Ma, Jiangwei Xie, Wei Li, Mengmeng Zhang, Ran Tao, Xiang-Gen XiaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [181] arXiv:2512.04459 [pdf, ps, other]
-
Title: dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable ReasoningAuthors: Yingzi Ma, Yulong Cao, Wenhao Ding, Shuibai Zhang, Yan Wang, Boris Ivanovic, Ming Jiang, Marco Pavone, Chaowei XiaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [182] arXiv:2512.04456 [pdf, ps, other]
-
Title: GuidNoise: Single-Pair Guided Diffusion for Generalized Noise SynthesisComments: AAAI2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [183] arXiv:2512.04451 [pdf, ps, other]
-
Title: StreamEQA: Towards Streaming Video Understanding for Embodied ScenariosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [184] arXiv:2512.04441 [pdf, ps, other]
-
Title: MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous DrivingAuthors: Bin Sun, Yaoguang Cao, Yan Wang, Rui Wang, Jiachen Shang, Xiejie Feng, Jiayi Lu, Jia Shi, Shichun Yang, Xiaoyu Yane, Ziying SongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [185] arXiv:2512.04426 [pdf, ps, other]
-
Title: Self-Paced and Self-Corrective Masked Prediction for Movie Trailer GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [186] arXiv:2512.04425 [pdf, ps, other]
-
Title: Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [187] arXiv:2512.04421 [pdf, ps, other]
-
Title: UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D ScenesComments: 13 pages, 10 figures, submitted to CVPR2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [188] arXiv:2512.04413 [pdf, ps, other]
-
Title: Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object DetectionComments: 12 pages, 8 figures, 11 tablesJournal-ref: IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-11Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [189] arXiv:2512.04397 [pdf, ps, other]
-
Title: Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease DetectionJournal-ref: 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 2025, pp. 1-5Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [190] arXiv:2512.04395 [pdf, ps, other]
-
Title: Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [191] arXiv:2512.04390 [pdf, ps, other]
-
Title: FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and DeblurringComments: 20 pages, 15 figures. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [192] arXiv:2512.04358 [pdf, ps, other]
-
Title: MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo MatchingAuthors: Ao Xu, Rujin Zhao, Xiong Xu, Boceng Huang, Yujia Jia, Hongfeng Long, Fuxuan Chen, Zilong Cao, Fangyuan ChenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [193] arXiv:2512.04356 [pdf, ps, other]
-
Title: Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive AlignmentComments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [194] arXiv:2512.04331 [pdf, ps, other]
-
Title: Open Set Face Forgery Detection via Dual-Level Evidence CollectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [195] arXiv:2512.04329 [pdf, ps, other]
-
Title: A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
- [196] arXiv:2512.04323 [pdf, ps, other]
-
Title: Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural NetworksComments: 17 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
- [197] arXiv:2512.04315 [pdf, ps, other]
-
Title: SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [198] arXiv:2512.04314 [pdf, ps, other]
-
Title: DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel VisionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [199] arXiv:2512.04313 [pdf, ps, other]
-
Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG DecodingAuthors: Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie ZhaoComments: 16 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [200] arXiv:2512.04311 [pdf, ps, other]
-
Title: Real-time Cricket Sorting By SexComments: 13 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
- [201] arXiv:2512.04309 [pdf, ps, other]
-
Title: Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap CorrectionComments: Submitted to CVPR 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [202] arXiv:2512.04305 [pdf, ps, other]
-
Title: How (Mis)calibrated is Your Federated CLIP and What To Do About It?Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [203] arXiv:2512.04303 [pdf, ps, other]
-
Title: Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular ApplicationsComments: Accepted in 3DV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [204] arXiv:2512.04284 [pdf, ps, other]
-
Title: Learning Single-Image Super-Resolution in the JPEG Compressed DomainComments: 7 pages, 4 figures, 2 tables, SEEDS Workshop, ICIP 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [205] arXiv:2512.04283 [pdf, ps, other]
-
Title: Plug-and-Play Image Restoration with Flow Matching: A Continuous ViewpointAuthors: Fan Jia, Yuhao Huang, Shih-Hsin Wang, Cristina Garcia-Cardona, Andrea L. Bertozzi, Bao WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [206] arXiv:2512.04282 [pdf, ps, other]
-
Title: Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion TransferSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [207] arXiv:2512.04267 [pdf, ps, other]
-
Title: UniLight: A Unified Representation for LightingAuthors: Zitian Zhang, Iliyan Georgiev, Michael Fischer, Yannick Hold-Geoffroy, Jean-François Lalonde, Valentin DeschaintreComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [208] arXiv:2512.04248 [pdf, ps, other]
-
Title: MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [209] arXiv:2512.04238 [pdf, ps, other]
-
Title: 6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language ModelsAuthors: Leon Mayer, Piotr Kalinowski, Caroline Ebersbach, Marcel Knopp, Tim Rädsch, Evangelia Christodoulou, Annika Reinke, Fiona R. Kolbinger, Lena Maier-HeinSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [210] arXiv:2512.04222 [pdf, ps, other]
-
Title: ReasonX: MLLM-Guided Intrinsic Image DecompositionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [211] arXiv:2512.04221 [pdf, ps, other]
-
Title: MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video SynthesisAuthors: Xiangyu Bai, He Liang, Bishoy Galoaa, Utsav Nandi, Shayda Moezzi, Yuhang He, Sarah OstadabbasSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [212] arXiv:2512.04219 [pdf, ps, other]
-
Title: Generalized Event Partonomy Inference with Structured Hierarchical Predictive LearningComments: 16 pages, 7 figures, 3 tables. Under ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [213] arXiv:2512.04187 [pdf, ps, other]
-
Title: OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathologyAuthors: Jinzhen Hu, Kevin Faust, Parsa Babaei Zadeh, Adrienn Bourkas, Shane Eaton, Andrew Young, Anzar Alvi, Dimitrios George Oreopoulos, Ameesha Paliwal, Assem Saleh Alrumeh, Evelyn Rose Kamski-Hennekam, Phedias DiamandisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [214] arXiv:2512.04175 [pdf, ps, other]
-
Title: Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [215] arXiv:2512.05117 (cross-list from cs.LG) [pdf, ps, other]
-
Title: The Universal Weight Subspace HypothesisComments: 37 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [216] arXiv:2512.05116 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Value Gradient Guidance for Flow Matching AlignmentComments: Accepted at NeurIPS 2025; 26 pages, 20 figuresSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [217] arXiv:2512.05114 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Deep infant brain segmentation from multi-contrast MRIComments: 8 pages, 8 figures, 1 table, website at this https URL, presented at the 2025 IEEE Asilomar Conference on Signals, Systems, and ComputersSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [218] arXiv:2512.05103 (cross-list from cs.LG) [pdf, ps, other]
-
Title: TV2TV: A Unified Framework for Interleaved Language and Video GenerationAuthors: Xiaochuang Han, Youssef Emad, Melissa Hall, John Nguyen, Karthik Padthe, Liam Robbins, Amir Bar, Delong Chen, Michal Drozdzal, Maha Elbayad, Yushi Hu, Shang-Wen Li, Sreya Dutta Roy, Jakob Verbeek, XuDong Wang, Marjan Ghazvininejad, Luke Zettlemoyer, Emily DinanSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [219] arXiv:2512.05094 (cross-list from cs.RO) [pdf, ps, other]
-
Title: From Generated Human Videos to Physically Plausible Robot TrajectoriesAuthors: James Ni, Zekai Wang, Wei Lin, Amir Bar, Yann LeCun, Trevor Darrell, Jitendra Malik, Roei HerzigComments: For project website, see this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [220] arXiv:2512.04814 (cross-list from cs.SD) [pdf, ps, other]
-
Title: Shared Multi-modal Embedding Space for Face-Voice AssociationComments: Ranked 1st in Fame 2026 Challenge, ICASSPSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
- [221] arXiv:2512.04763 (cross-list from cs.LG) [pdf, ps, other]
-
Title: MemLoRA: Distilling Expert Adapters for On-Device Memory SystemsSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [222] arXiv:2512.04705 (cross-list from cs.CC) [pdf, ps, other]
-
Title: Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge AcceleratorsComments: Submitted to IEEE Transactions on Emerging Topics in ComputingSubjects: Computational Complexity (cs.CC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
- [223] arXiv:2512.04625 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Rethinking Decoupled Knowledge Distillation: A Predictive Distribution PerspectiveComments: Accepted to IEEE TNNLSSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [224] arXiv:2512.04556 (cross-list from cs.GR) [pdf, ps, other]
-
Title: Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel ComplexComments: 10 pages, 7 figuresSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [225] arXiv:2512.04464 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double EaglesSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [226] arXiv:2512.04385 (cross-list from cs.LG) [pdf, ps, other]
-
Title: STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution ForecastingAuthors: Nan Zhou, Weijie Hong, Huandong Wang, Jianfeng Zheng, Qiuhua Wang, Yali Song, Xiao-Ping Zhang, Yong Li, Xinlei ChenSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [227] arXiv:2512.04264 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Studying Various Activation Functions and Non-IID Data for Machine Learning Model RobustnessSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [228] arXiv:2512.04092 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: The changing surface of the world's roadsAuthors: Sukanya Randhawa, Guntaj Randhawa, Clemens Langer, Francis Andorful, Benjamin Herfort, Daniel Kwakye, Omer Olchik, Sven Lautenbach, Alexander ZipfSubjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
- [229] arXiv:2512.04087 (cross-list from q-bio.NC) [pdf, ps, other]
-
Title: Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4oSubjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Thu, 4 Dec 2025
- [230] arXiv:2512.04085 [pdf, ps, other]
-
Title: Unique Lives, Shared World: Learning from Single-Life VideosAuthors: Tengda Han, Sayna Ebrahimi, Dilara Gokay, Li Yang Ku, Maks Ovsjanikov, Iva Babukova, Daniel Zoran, Viorica Patraucean, Joao Carreira, Andrew Zisserman, Dima DamenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [231] arXiv:2512.04084 [pdf, ps, other]
-
Title: SimFlow: Simplified and End-to-End Training of Latent Normalizing FlowsComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [232] arXiv:2512.04082 [pdf, ps, other]
-
Title: PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic DesignComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [233] arXiv:2512.04069 [pdf, ps, other]
-
Title: SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RLAuthors: Siyi Chen, Mikaela Angelina Uy, Chan Hee Song, Faisal Ladhak, Adithyavairavan Murali, Qing Qu, Stan Birchfield, Valts Blukis, Jonathan TremblaySubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [234] arXiv:2512.04048 [pdf, ps, other]
-
Title: Stable Signer: Hierarchical Sign Language Generative ModelComments: 12 pages, 7 figures. More Demo at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Computers and Society (cs.CY)
- [235] arXiv:2512.04040 [pdf, ps, other]
-
Title: RELIC: Interactive Video World Model with Long-Horizon MemoryAuthors: Yicong Hong, Yiqun Mei, Chongjian Ge, Yiran Xu, Yang Zhou, Sai Bi, Yannick Hold-Geoffroy, Mike Roberts, Matthew Fisher, Eli Shechtman, Kalyan Sunkavalli, Feng Liu, Zhengqi Li, Hao TanComments: 22 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [236] arXiv:2512.04039 [pdf, ps, other]
-
Title: Fast & Efficient Normalizing Flows and Applications of Image Generative ModelsAuthors: Sandeep NagarComments: PhD ThesisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [237] arXiv:2512.04025 [pdf, ps, other]
-
Title: PSA: Pyramid Sparse Attention for Efficient Video Understanding and GenerationComments: Tech reportSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [238] arXiv:2512.04021 [pdf, ps, other]
-
Title: C3G: Learning Compact 3D Representations with 2K GaussiansAuthors: Honggyu An, Jaewoo Jung, Mungyeom Kim, Sunghwan Hong, Chaehyun Kim, Kazumi Fukuda, Minkyeong Jeon, Jisang Han, Takuya Narihira, Hyuna Ko, Junsu Kim, Yuki Mitsufuji, Seungryong KimComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [239] arXiv:2512.04019 [pdf, ps, other]
-
Title: Ultra-lightweight Neural Video Representation CompressionSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [240] arXiv:2512.04015 [pdf, ps, other]
-
Title: Learning Group Actions In Disentangled Latent Image RepresentationsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [241] arXiv:2512.04012 [pdf, ps, other]
-
Title: Emergent Outlier View Rejection in Visual Geometry Grounded TransformersAuthors: Jisang Han, Sunghwan Hong, Jaewoo Jung, Wooseok Jang, Honggyu An, Qianqian Wang, Seungryong Kim, Chen FengComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [242] arXiv:2512.04007 [pdf, ps, other]
-
Title: On the Temporality for Sketch Representation LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [243] arXiv:2512.04000 [pdf, ps, other]
- [244] arXiv:2512.03996 [pdf, ps, other]
-
Title: Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding PerturbationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [245] arXiv:2512.03992 [pdf, ps, other]
-
Title: DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual DegradationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [246] arXiv:2512.03981 [pdf, ps, other]
-
Title: DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature AlignmentSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [247] arXiv:2512.03979 [pdf, ps, other]
-
Title: BlurDM: A Blur Diffusion Model for Image DeblurringComments: NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [248] arXiv:2512.03964 [pdf, ps, other]
-
Title: Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face PersonalizationComments: 17 pages, 13 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [249] arXiv:2512.03963 [pdf, ps, other]
-
Title: TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement LearningAuthors: Tao Wu, Li Yang, Gen Zhan, Yabin Zhang, Yiting Liao, Junlin Li, Deliang Fu, Li Zhang, Limin WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [250] arXiv:2512.03939 [pdf, ps, other]
-
Title: MUT3R: Motion-aware Updating Transformer for Dynamic 3D ReconstructionAuthors: Guole Shen, Tianchen Deng, Xingrui Qin, Nailin Wang, Jianyu Wang, Yanbo Wang, Yongtao Chen, Hesheng Wang, Jingchuan WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [251] arXiv:2512.03932 [pdf, ps, other]
-
Title: Beyond the Ground Truth: Enhanced Supervision for Image RestorationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [252] arXiv:2512.03918 [pdf, ps, other]
-
Title: UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive FrameworkAuthors: Youxin Pang, Yong Zhang, Ruizhi Shao, Xiang Deng, Feng Gao, Xu Xiaoming, Xiaoming Wei, Yebin LiuComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [253] arXiv:2512.03905 [pdf, ps, other]
-
Title: Zero-Shot Video Translation and Editing with Frame Spatial-Temporal CorrespondenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [254] arXiv:2512.03883 [pdf, ps, other]
-
Title: Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait EndoscopyAuthors: Jorge Tapias Gomez, Despoina Kanata, Aneesh Rangnekar, Christina Lee, Julio Garcia-Aguilar, Joshua Jesse Smith, Harini VeeraraghavanComments: 6 pages, 5 figures, 1 table, submitted to ISBI conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [255] arXiv:2512.03869 [pdf, ps, other]
-
Title: An Automated Framework for Large-Scale Graph-Based Cerebrovascular AnalysisAuthors: Daniele Falcetta, Liane S. Canas, Lorenzo Suppa, Matteo Pentassuglia, Jon Cleary, Marc Modat, Sébastien Ourselin, Maria A. ZuluagaComments: Submitted to ISBI 2026. 6 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
- [256] arXiv:2512.03862 [pdf, ps, other]
-
Title: Diminishing Returns in Self-Supervised LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [257] arXiv:2512.03854 [pdf, ps, other]
-
Title: Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern populationAuthors: Peshawa J. Muhammad Ali, Navin Vincent, Saman S. Abdulla, Han N. Mohammed Fadhl, Anders Blilie, Kelvin Szolnoky, Julia Anna Mielcarz, Xiaoyi Ji, Kimmo Kartasalo, Abdulbasit K. Al-Talabani, Nita MulliqiComments: 13 pages, 2 figures and 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [258] arXiv:2512.03852 [pdf, ps, other]
-
Title: Traffic Image Restoration under Adverse Weather via Frequency-Aware MambaComments: 12 pages, 13 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [259] arXiv:2512.03848 [pdf, ps, other]
-
Title: PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [260] arXiv:2512.03844 [pdf, ps, other]
-
Title: CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset DistillationComments: 34 pages, 24 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [261] arXiv:2512.03837 [pdf, ps, other]
-
Title: Heatmap Pooling Network for Action Recognition from RGB VideosComments: Final Version of IEEE Transactions on Pattern Analysis and Machine IntelligenceJournal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [262] arXiv:2512.03834 [pdf, ps, other]
-
Title: Lean Unet: A Compact Model for Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [263] arXiv:2512.03827 [pdf, ps, other]
-
Title: A Robust Camera-based Method for Breath Rate MeasurementAuthors: Alexey ProtopopovComments: 9 pages, 4 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [264] arXiv:2512.03817 [pdf, ps, other]
-
Title: HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to EnglishAuthors: Ahmed Nasser, Marwan Mohamed, Alaa Sherif, Basmala Mahmoud, Shereen Yehia, Asmaa Saad, Mariam S. El-Rahmany, Ensaf H. MohamedSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [265] arXiv:2512.03796 [pdf, ps, other]
-
Title: LSRS: Latent Scale Rejection Sampling for Visual Autoregressive ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [266] arXiv:2512.03794 [pdf, ps, other]
-
Title: AdaptVision: Efficient Vision-Language Models via Adaptive Visual AcquisitionComments: 15 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [267] arXiv:2512.03751 [pdf, ps, other]
-
Title: Research on Brain Tumor Classification Method Based on Improved ResNet34 NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [268] arXiv:2512.03749 [pdf, ps, other]
-
Title: Fully Unsupervised Self-debiasing of Text-to-Image Diffusion ModelsComments: Accepted at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [269] arXiv:2512.03746 [pdf, ps, other]
-
Title: Thinking with Programming Vision: Towards a Unified View for Thinking with ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [270] arXiv:2512.03745 [pdf, ps, other]
-
Title: Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-IdentificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [271] arXiv:2512.03730 [pdf, ps, other]
-
Title: Out-of-the-box: Black-box Causal Attacks on Object DetectorsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [272] arXiv:2512.03724 [pdf, ps, other]
-
Title: PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor AttentionAuthors: Ziwen Li, Xin Wang, Hanlue Zhang, Runnan Chen, Runqi Lin, Xiao He, Han Huang, Yandong Guo, Fakhri Karray, Tongliang Liu, Mingming GongSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [273] arXiv:2512.03715 [pdf, ps, other]
-
Title: DINO-RotateMatch: A Rotation-Aware Deep Framework for Robust Image Matching in Large-Scale 3D ReconstructionComments: 9 pages, 5 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [274] arXiv:2512.03701 [pdf, ps, other]
-
Title: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [275] arXiv:2512.03687 [pdf, ps, other]
-
Title: Active Visual Perception: Opportunities and ChallengesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [276] arXiv:2512.03683 [pdf, ps, other]
-
Title: GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent SpacesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [277] arXiv:2512.03673 [pdf, ps, other]
-
Title: ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [278] arXiv:2512.03667 [pdf, ps, other]
-
Title: Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical ReasoningComments: Technical reportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [279] arXiv:2512.03666 [pdf, ps, other]
-
Title: ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric VideosAuthors: Qi'ao Xu, Tianwen Qian, Yuqian Fu, Kailing Li, Yang Jiao, Jiacheng Zhang, Xiaoling Wang, Liang HeComments: 26 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [280] arXiv:2512.03663 [pdf, ps, other]
-
Title: Multi-Scale Visual Prompting for Lightweight Small-Image ClassificationAuthors: Salim KhazemSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [281] arXiv:2512.03643 [pdf, ps, other]
-
Title: Optical Context Compression Is Just (Bad) AutoencodingSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [282] arXiv:2512.03640 [pdf, ps, other]
-
Title: MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention MechanismsJournal-ref: MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15521. Springer, SingaporeSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [283] arXiv:2512.03625 [pdf, ps, other]
-
Title: FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image FeaturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [284] arXiv:2512.03621 [pdf, ps, other]
-
Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video GenerationAuthors: Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang TanComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [285] arXiv:2512.03619 [pdf, ps, other]
-
Title: LAMP: Language-Assisted Motion Planning for Controllable Video GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [286] arXiv:2512.03601 [pdf, ps, other]
-
Title: Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene UnderstandingComments: Accepted to NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [287] arXiv:2512.03598 [pdf, ps, other]
-
Title: Memory-Guided Point Cloud Completion for Dental ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [288] arXiv:2512.03597 [pdf, ps, other]
-
Title: HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ SegmentationAuthors: Fuchen Zheng, Xinyi Chen, Weixuan Li, Quanjun Li, Junhua Zhou, Xiaojiao Guo, Xuhang Chen, Chi-Man Pun, Shoujun ZhouComments: 6 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [289] arXiv:2512.03593 [pdf, ps, other]
-
Title: CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale TexturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [290] arXiv:2512.03592 [pdf, ps, other]
-
Title: Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse FoldingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [291] arXiv:2512.03590 [pdf, ps, other]
-
Title: Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video InterpolationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [292] arXiv:2512.03580 [pdf, ps, other]
-
Title: Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processesSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
- [293] arXiv:2512.03577 [pdf, ps, other]
-
Title: Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation LearningComments: 6 pages, 2 figures. Camera-ready version accepted for IEEE BIBM 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [294] arXiv:2512.03575 [pdf, ps, other]
-
Title: UniComp: Rethinking Video Compression Through Informational UniquenessSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [295] arXiv:2512.03574 [pdf, ps, other]
-
Title: Global-Local Aware Scene Text EditingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [296] arXiv:2512.03566 [pdf, ps, other]
-
Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion ModelsComments: Accepted by ACM MM Asia 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [297] arXiv:2512.03558 [pdf, ps, other]
-
Title: CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map UnderstandingAuthors: Huy Quang Ung, Guillaume Habault, Yasutaka Nishimura, Hao Niu, Roberto Legaspi, Tomoki Oya, Ryoichi Kojima, Masato Taya, Chihiro Ono, Atsunori Minamikawa, Yan LiuComments: Accepted at SIGSPATIAL 2025 (Best paper candidates), 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [298] arXiv:2512.03553 [pdf, ps, other]
-
Title: Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity MatchingAuthors: Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui GuanComments: Accepted at KDD 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [299] arXiv:2512.03542 [pdf, ps, other]
-
Title: V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time InterventionAuthors: Nan Sun, Zhenyu Zhang, Xixun Lin, Kun Wang, Yanmin Shang, Naibin Gu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, Yanan CaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [300] arXiv:2512.03540 [pdf, ps, other]
-
Title: CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image GenerationAuthors: Ruoxuan Zhang, Bin Wen, Hongxia Xie, Yi Yao, Songhan Zuo, Jian-Yu Jiang-Lin, Hong-Han Shuai, Wen-Huang ChengComments: Accepted by ACM Multimedia 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [301] arXiv:2512.03534 [pdf, ps, other]
-
Title: Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual GenerationComments: Visualizations are available at the website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [302] arXiv:2512.03532 [pdf, ps, other]
-
Title: OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [303] arXiv:2512.03520 [pdf, ps, other]
-
Title: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion GenerationComments: 15 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [304] arXiv:2512.03510 [pdf, ps, other]
-
Title: CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [305] arXiv:2512.03509 [pdf, ps, other]
-
Title: AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything ModelSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [306] arXiv:2512.03508 [pdf, ps, other]
-
Title: Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic SegmentationComments: ICCV 2025 (poster)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [307] arXiv:2512.03500 [pdf, ps, other]
-
Title: EEA: Exploration-Exploitation Agent for Long Video UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [308] arXiv:2512.03499 [pdf, ps, other]
-
Title: NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [309] arXiv:2512.03479 [pdf, ps, other]
-
Title: Towards Object-centric Understanding for Instructional VideosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [310] arXiv:2512.03477 [pdf, ps, other]
-
Title: Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma DiagnosisComments: 10 pages, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [311] arXiv:2512.03474 [pdf, ps, other]
-
Title: Procedural Mistake Detection via Action Effect ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [312] arXiv:2512.03470 [pdf, ps, other]
-
Title: Difference Decomposition Networks for Infrared Small Target DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [313] arXiv:2512.03463 [pdf, ps, other]
-
Title: Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [314] arXiv:2512.03454 [pdf, ps, other]
-
Title: Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous VehiclesAuthors: Haicheng Liao, Huanming Shen, Bonan Wang, Yongkang Li, Yihong Tang, Chengyue Wang, Dingyi Zhuang, Kehua Chen, Hai Yang, Chengzhong Xu, Zhenning LiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [315] arXiv:2512.03453 [pdf, ps, other]
-
Title: GeoVideo: Introducing Geometric Regularization into Video Generation ModelComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [316] arXiv:2512.03451 [pdf, ps, other]
-
Title: GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [317] arXiv:2512.03450 [pdf, ps, other]
-
Title: KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [318] arXiv:2512.03449 [src]
-
Title: LM-CartSeg: Automated Segmentation of Lateral and Medial Cartilage and Subchondral Bone for Radiomics AnalysisAuthors: Tongxu ZhangComments: The manuscript represents only a preliminary and substantially incomplete exploration. The author has decided not to stand by these results, and a thoroughly revised and significantly different version will be developed separately. Therefore this version is withdrawn and should not be citedSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [319] arXiv:2512.03445 [pdf, ps, other]
-
Title: Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data GenerationAuthors: Xieji Li, Siyuan Yan, Yingsheng Liu, H. Peter Soyer, Monika Janda, Victoria Mar, Zongyuan GeComments: 10 pages. Under ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [320] arXiv:2512.03430 [pdf, ps, other]
-
Title: Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion FeaturesComments: Accepted to the ICML 2025 TerraBytes Workshop (June 9, 2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [321] arXiv:2512.03427 [pdf, ps, other]
-
Title: Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry ApplicationsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [322] arXiv:2512.03424 [pdf, ps, other]
-
Title: DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [323] arXiv:2512.03418 [pdf, ps, other]
-
Title: YOLOA: Real-Time Affordance Detection via LLM AdapterComments: 13 pages, 9 figures, conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [324] arXiv:2512.03405 [pdf, ps, other]
-
Title: ViDiC: Video Difference CaptioningAuthors: Jiangtao Wu, Shihao Li, Zhaozhou Bian, Jialu Chen, Runzhe Wen, An Ping, Yiwen He, Jiakai Wang, Yuanxing Zhang, Jiaheng LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [325] arXiv:2512.03404 [pdf, ps, other]
-
Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-IdentificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [326] arXiv:2512.03370 [pdf, ps, other]
-
Title: ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [327] arXiv:2512.03369 [pdf, ps, other]
-
Title: FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread ForecastingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [328] arXiv:2512.03359 [pdf, ps, other]
-
Title: A Hybrid Deep Learning Framework with Explainable AI for Lung Cancer Classification with DenseNet169 and SVMSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [329] arXiv:2512.03350 [pdf, ps, other]
-
Title: SeeU: Seeing the Unseen World via 4D Dynamics-aware GenerationComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [330] arXiv:2512.03346 [pdf, ps, other]
-
Title: Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical KeratoconusComments: 16 pages, 7 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [331] arXiv:2512.03345 [pdf, ps, other]
-
Title: HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image RestorationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [332] arXiv:2512.03339 [pdf, ps, other]
-
Title: ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in EchocardiographyAuthors: Yeganeh Ghamary, Victoria Wu, Hooman Vaseli, Christina Luong, Teresa Tsang, Siavash Bigdeli, Purang AbolmaesumiComments: 11 pages, Accepted in IMIMIC Workshop at MICCAI 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [333] arXiv:2512.03335 [pdf, ps, other]
-
Title: Step-by-step Layered Design GenerationAuthors: Faizan Farooq Khan, K J Joseph, Koustava Goswami, Mohamed Elhoseiny, Balaji Vasan SrinivasanJournal-ref: AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [334] arXiv:2512.03317 [pdf, ps, other]
-
Title: NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map ConstructionComments: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [335] arXiv:2512.03284 [pdf, ps, other]
-
Title: SpatialReasoner: Active Perception for Large-Scale 3D Scene UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [336] arXiv:2512.03257 [pdf, ps, other]
-
Title: PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing ImagerySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [337] arXiv:2512.03247 [pdf, ps, other]
-
Title: PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space RefinementComments: Published in the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [338] arXiv:2512.03245 [pdf, ps, other]
-
Title: 2-Shots in the Dark: Low-Light Denoising with Minimal Data AcquisitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [339] arXiv:2512.03237 [pdf, ps, other]
-
Title: LLM-Guided Material Inference for 3D Point CloudsSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [340] arXiv:2512.03233 [pdf, ps, other]
-
Title: Object Counting with GPT-4o and GPT-5: A Comparative StudyComments: 5 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [341] arXiv:2512.03210 [pdf, ps, other]
-
Title: Flux4D: Flow-based Unsupervised 4D ReconstructionAuthors: Jingkang Wang, Henry Che, Yun Chen, Ze Yang, Lily Goli, Sivabalan Manivasagam, Raquel UrtasunComments: NeurIPS 2025. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
- [342] arXiv:2512.03199 [pdf, ps, other]
-
Title: Does Head Pose Correction Improve Biometric Facial Recognition?Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [343] arXiv:2512.03182 [pdf, ps, other]
-
Title: Drainage: A Unifying Framework for Addressing Class UncertaintyComments: 16 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [344] arXiv:2512.03126 [pdf, ps, other]
-
Title: Hierarchical Process Reward Models are Symbolic Vision LearnersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [345] arXiv:2512.04076 (cross-list from cs.GR) [pdf, ps, other]
-
Title: Radiance Meshes for Volumetric ReconstructionAuthors: Alexander Mai, Trevor Hedstrom, George Kopanas, Janne Kontkanen, Falko Kuester, Jonathan T. BarronComments: Website: half-potato.gitlab.io/rmSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [346] arXiv:2512.04032 (cross-list from cs.CL) [pdf, ps, other]
-
Title: Jina-VLM: Small Multilingual Vision Language ModelAuthors: Andreas Koukounas, Georgios Mastrapas, Florian Hönicke, Sedigheh Eslami, Guillaume Roncari, Scott Martens, Han XiaoComments: 18 pages, 1-7 main content, 13-18 appendix for tables and datasetSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [347] arXiv:2512.03995 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Artificial Microsaccade Compensation: Stable Vision for an OrnithopterComments: 29 pages, 5 figures, 2 tables, under reviewSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [348] arXiv:2512.03962 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image ReconstructionComments: 6 pages, 8 figures, 2025 Asilomar Conference on Signals, Systems, and Computers. Code is available at github.com/evanbell02/Tada-DIP/Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [349] arXiv:2512.03656 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy ForecastingSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [350] arXiv:2512.03556 (cross-list from cs.RO) [pdf, ps, other]
-
Title: RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RLAuthors: Yinzhou Tang, Yu Shang, Yinuo Chen, Bingwen Wei, Xin Zhang, Shu'ang Yu, Liangzhi Shi, Chao Yu, Chen Gao, Wei Wu, Yong LiSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [351] arXiv:2512.03522 (cross-list from cs.RO) [pdf, ps, other]
-
Title: MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global LocalizationComments: Accepted in IEEE Robotics and Automation Letters (2025)Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [352] arXiv:2512.03514 (cross-list from cs.IR) [pdf, ps, other]
-
Title: M3DR: Towards Universal Multilingual Multimodal Document RetrievalSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [353] arXiv:2512.03422 (cross-list from cs.RO) [pdf, ps, other]
-
Title: What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation ModelsAuthors: Tianchen Deng, Yue Pan, Shenghai Yuan, Dong Li, Chen Wang, Mingrui Li, Long Chen, Lihua Xie, Danwei Wang, Jingchuan Wang, Javier Civera, Hesheng Wang, Weidong ChenSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [354] arXiv:2512.03216 (cross-list from physics.ins-det) [pdf, ps, other]
-
Title: Kaleidoscopic Scintillation Event ImagingSubjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [355] arXiv:2512.03173 (cross-list from cs.CY) [pdf, ps, other]
-
Title: Culture Affordance Atlas: Reconciling Object Diversity Through Functional MappingJournal-ref: AAAI 2026 Social Impact TrackSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [356] arXiv:2512.03166 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual EnvironmentsSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [357] arXiv:2512.03111 (cross-list from q-bio.GN) [pdf, ps, other]
-
Title: PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-CancerAuthors: Xiaoshui Huang, Tianlin Zhu, Yifan Zuo, Xue Xia, Zonghan Wu, Jiebin Yan, Dingli Hua, Zongyi Xu, Yuming Fang, Jian ZhangComments: Accepted by AAAI 2026Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [358] arXiv:2512.03054 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided ResearchAuthors: Ciro Benito Raggio, Lucia Migliorelli, Nils Skupien, Mathias Krohmer Zabaleta, Oliver Blanck, Francesco Cicone, Giuseppe Lucio Cascini, Paolo Zaffino, Maria Francesca SpadeaComments: 22 pages, 13 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Medical Physics (physics.med-ph)
- [359] arXiv:2512.03052 (cross-list from cs.GR) [pdf, ps, other]
-
Title: LATTICE: Democratize High-Fidelity 3D Generation at ScaleAuthors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, Xiangyu YueComments: Technical ReportSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Wed, 3 Dec 2025
- [360] arXiv:2512.03046 [pdf, ps, other]
-
Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual CuesAuthors: Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Shuailei Ma, Ka Leong Cheng, Wen Wang, Qingyan Bai, Yuxuan Zhang, Yanhong Zeng, Yixuan Li, Xing Zhu, Yujun Shen, Qifeng ChenComments: Code and demo available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [361] arXiv:2512.03045 [pdf, ps, other]
-
Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion ModelsAuthors: Minkyung Kwon, Jinhyeok Choi, Jiho Park, Seonghu Jeon, Jinhyuk Jang, Junyoung Seo, Minseop Kwak, Jin-Hwa Kim, Seungryong KimComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [362] arXiv:2512.03043 [pdf, ps, other]
-
Title: OneThinker: All-in-one Reasoning Model for Image and VideoAuthors: Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu YueComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [363] arXiv:2512.03042 [pdf, ps, other]
-
Title: PPTArena: A Benchmark for Agentic PowerPoint EditingComments: 25 pages, 26 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [364] arXiv:2512.03041 [pdf, ps, other]
-
Title: MultiShotMaster: A Controllable Multi-Shot Video Generation FrameworkAuthors: Qinghe Wang, Xiaoyu Shi, Baolu Li, Weikang Bian, Quande Liu, Huchuan Lu, Xintao Wang, Pengfei Wan, Kun Gai, Xu JiaComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [365] arXiv:2512.03040 [pdf, ps, other]
-
Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video GenerationAuthors: Zeqi Xiao, Yiwei Zhao, Lingxiao Li, Yushi Lan, Yu Ning, Rahul Garg, Roshni Cooper, Mohammad H. Taghavi, Xingang PanComments: Project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [366] arXiv:2512.03036 [pdf, ps, other]
-
Title: ViSAudio: End-to-End Video-Driven Binaural Spatial Audio GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [367] arXiv:2512.03034 [pdf, ps, other]
-
Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and GenerationAuthors: Youxin Pang, Jiajun Liu, Lingfeng Tan, Yong Zhang, Feng Gao, Xiang Deng, Zhuoliang Kang, Xiaoming Wei, Yebin LiuComments: Our project website is this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [368] arXiv:2512.03020 [pdf, ps, other]
-
Title: Unrolled Networks are Conditional Probability Flows in MRI ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [369] arXiv:2512.03018 [pdf, ps, other]
-
Title: AutoBrep: Autoregressive B-Rep Generation with Unified Topology and GeometryAuthors: Xiang Xu, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Yilin Liu, Durvesh Malpure, Pete MeltzerComments: Accepted to Siggraph Asia 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [370] arXiv:2512.03014 [pdf, ps, other]
-
Title: Instant Video Models: Universal Adapters for Stabilizing Image-Based NetworksComments: NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [371] arXiv:2512.03013 [pdf, ps, other]
-
Title: In-Context Sync-LoRA for Portrait Video EditingComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [372] arXiv:2512.03010 [pdf, ps, other]
-
Title: SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel SplattingComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
- [373] arXiv:2512.03004 [pdf, ps, other]
-
Title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed ImagesAuthors: Xiaoxue Chen, Ziyi Xiong, Yuantao Chen, Gen Li, Nan Wang, Hongcheng Luo, Long Chen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Hongyang Li, Ya-Qin Zhang, Hao ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [374] arXiv:2512.03000 [pdf, ps, other]
-
Title: DynamicVerse: A Physically-Aware Multimodal Framework for 4D World ModelingAuthors: Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen FanSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [375] arXiv:2512.02993 [pdf, ps, other]
-
Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and BeyondAuthors: Yifei Zeng, Yajie Bao, Jiachen Qian, Shuang Wu, Youtian Lin, Hao Zhu, Buyu Li, Feihu Zhang, Xun Cao, Yao YaoComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [376] arXiv:2512.02991 [pdf, ps, other]
-
Title: GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [377] arXiv:2512.02982 [pdf, ps, other]
-
Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR SequencesComments: Preprint; 19 pages, 7 figures, 8 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [378] arXiv:2512.02981 [pdf, ps, other]
-
Title: InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent CollaborationComments: Published in AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [379] arXiv:2512.02973 [pdf, ps, other]
- [380] arXiv:2512.02972 [pdf, ps, other]
-
Title: BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object DetectionComments: Accepted by AAAI26Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [381] arXiv:2512.02965 [pdf, ps, other]
-
Title: A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision SystemsAuthors: Yuhan Chen, Yicui Shi, Guofa Li, Guangrui Bai, Jinyuan Shao, Xiangfei Huang, Wenbo Chu, Keqiang LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [382] arXiv:2512.02952 [pdf, ps, other]
-
Title: Layout Anything: One Transformer for Universal Room Layout EstimationComments: Published at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [383] arXiv:2512.02942 [pdf, ps, other]
-
Title: Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-BenchAuthors: Lanxiang Hu, Abhilash Shankarampeta, Yixin Huang, Zilin Dai, Haoyang Yu, Yujie Zhao, Haoqiang Kang, Daniel Zhao, Tajana Rosing, Hao ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [384] arXiv:2512.02933 [pdf, ps, other]
-
Title: LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware LocalizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [385] arXiv:2512.02932 [pdf, ps, other]
-
Title: EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [386] arXiv:2512.02931 [pdf, ps, other]
-
Title: DiverseAR: Boosting Diversity in Bitwise Autoregressive Image GenerationAuthors: Ying Yang, Zhengyao Lv, Tianlin Pan, Haofan Wang, Binxin Yang, Hubery Yin, Chen Li, Chenyang SiComments: 23 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [387] arXiv:2512.02906 [pdf, ps, other]
-
Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
- [388] arXiv:2512.02899 [pdf, ps, other]
-
Title: Glance: Accelerating Diffusion Models with 1 SampleAuthors: Zhuobai Dong, Rui Zhao, Songjie Wu, Junchao Yi, Linjie Li, Zhengyuan Yang, Lijuan Wang, Alex Jinpeng WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [389] arXiv:2512.02897 [pdf, ps, other]
-
Title: Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation ModelsAuthors: Pierpaolo Serio, Giulio Pisaneschi, Andrea Dan Ryals, Vincenzo Infantino, Lorenzo Gentilini, Valentina Donzella, Lorenzo PolliniComments: 13 pages, 5 figures, 2 tables. Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [390] arXiv:2512.02895 [pdf, ps, other]
-
Title: MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training ParadigmAuthors: Wei Chen, Chaoqun Du, Feng Gu, Wei He, Qizhen Li, Zide Liu, Xuhao Pan, Chang Ren, Xudong Rao, Chenfeng Wang, Tao Wei, Chengjun Yu, Pengfei Yu, Yufei Zheng, Chunpeng Zhou, Pan Zhou, Xuhan ZhuComments: 33 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [391] arXiv:2512.02870 [pdf, ps, other]
-
Title: Taming Camera-Controlled Video Generation with Verifiable Geometry RewardAuthors: Zhaoqing Wang, Xiaobo Xia, Zhuolin Bie, Jinlin Liu, Dongdong Yu, Jia-Wang Bian, Changhu WangComments: 11 pages, 4 figures, 7 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [392] arXiv:2512.02867 [pdf, ps, other]
-
Title: MICCAI STSR 2025 Challenge: Semi-Supervised Teeth and Pulp Segmentation and CBCT-IOS RegistrationAuthors: Yaqi Wang, Zhi Li, Chengyu Wu, Jun Liu, Yifan Zhang, Jialuo Chen, Jiaxue Ni, Qian Luo, Jin Liu, Can Han, Changkai Ji, Zhi Qin Tan, Ajo Babu George, Liangyu Chen, Qianni Zhang, Dahong Qian, Shuai Wang, Huiyu ZhouSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [393] arXiv:2512.02860 [pdf, ps, other]
-
Title: RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice AssociationComments: Ranked 3rd in Fame 2026 Challenge, ICASSPSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [394] arXiv:2512.02850 [pdf, ps, other]
-
Title: Are Detectors Fair to Indian IP-AIGC? A Cross-Generator StudySubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [395] arXiv:2512.02846 [pdf, ps, other]
-
Title: Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?Authors: Manuel Benavent-Lledo, Konstantinos Bacharidis, Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros, Jose Garcia-RodriguezComments: Accepted in WACV 2026 - Applications TrackSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [396] arXiv:2512.02835 [pdf, ps, other]
-
Title: ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [397] arXiv:2512.02830 [pdf, ps, other]
-
Title: Defense That Attacks: How Robust Models Become Better AttackersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [398] arXiv:2512.02794 [pdf, ps, other]
-
Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image GenerationComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [399] arXiv:2512.02793 [pdf, ps, other]
-
Title: IC-World: In-Context Generation for Shared World ModelingComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [400] arXiv:2512.02792 [pdf, ps, other]
-
Title: HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video RetrievalComments: Accepted by ACM MM 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [401] arXiv:2512.02790 [pdf, ps, other]
-
Title: UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched EditsAuthors: Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing Lyu, Zhou Zhao, Shengyu ZhangComments: 31 pages, 15 figures, 12 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [402] arXiv:2512.02789 [pdf, ps, other]
-
Title: TrackNetV5: Residual-Driven Spatio-Temporal Refinement and Motion Direction Decoupling for Fast Object TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [403] arXiv:2512.02781 [pdf, ps, other]
-
Title: LumiX: Structured and Coherent Text-to-Intrinsic GenerationComments: The code will be available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [404] arXiv:2512.02780 [pdf, ps, other]
-
Title: Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and DatasetComments: 12 pages, 15 figures. Accepted to AAAI-26 (Main Technical Track)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [405] arXiv:2512.02751 [pdf, ps, other]
-
Title: AttMetNet: Attention-Enhanced Deep Neural Network for Methane Plume Detection in Sentinel-2 Satellite ImageryComments: 15 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [406] arXiv:2512.02743 [pdf, ps, other]
-
Title: Reasoning-Aware Multimodal Fusion for Hateful Video DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [407] arXiv:2512.02737 [pdf, ps, other]
-
Title: Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery AloneAuthors: Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele FaccioloComments: Accepted at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [408] arXiv:2512.02727 [pdf, ps, other]
-
Title: DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in InteractionsAuthors: Yifan Zhou, Takehiko Ohkawa, Guwenxiao Zhou, Kanoko Goto, Takumi Hirose, Yusuke Sekikawa, Nakamasa InoueComments: Accepted to WACV 2026. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [409] arXiv:2512.02715 [pdf, ps, other]
-
Title: GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual GroundingAuthors: Peirong Zhang, Yidan Zhang, Luxiao Xu, Jinliang Lin, Zonghao Guo, Fengxiang Wang, Xue Yang, Kaiwen Wei, Lei WangComments: 11 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [410] arXiv:2512.02702 [pdf, ps, other]
-
Title: Tissue-mask supported inter-subject whole-body image registration in the UK Biobank -- A method benchmarking studySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [411] arXiv:2512.02700 [pdf, ps, other]
-
Title: VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning ParadigmSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [412] arXiv:2512.02697 [pdf, ps, other]
-
Title: GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-LocalizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [413] arXiv:2512.02696 [pdf, ps, other]
-
Title: ALDI-ray: Adapting the ALDI Framework for Security X-ray Object DetectionComments: Submitted to ICASSP 2026 ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [414] arXiv:2512.02686 [pdf, ps, other]
-
Title: ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic DataComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [415] arXiv:2512.02685 [pdf, ps, other]
-
Title: Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [416] arXiv:2512.02681 [pdf, ps, other]
-
Title: PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-ResolutionComments: 10 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [417] arXiv:2512.02668 [pdf, ps, other]
-
Title: UAUTrack: Towards Unified Multimodal Anti-UAV Visual TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [418] arXiv:2512.02664 [pdf, ps, other]
-
Title: PolarGuide-GSDR: 3D Gaussian Splatting Driven by Polarization Priors and Deferred Reflection for Real-World Reflective ScenesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [419] arXiv:2512.02660 [pdf, ps, other]
-
Title: Spatially-Grounded Document Retrieval via Patch-to-Region Relevance PropagationAuthors: Agathoklis GeorgiouComments: 13 pages, 1 figure, 2 tables. Open-source implementation available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
- [420] arXiv:2512.02650 [pdf, ps, other]
-
Title: Hear What Matters! Text-conditioned Selective Video-to-Audio GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [421] arXiv:2512.02648 [pdf, ps, other]
-
Title: PoreTrack3D: A Benchmark for Dynamic 3D Gaussian Splatting in Pore-Scale Facial Trajectory TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [422] arXiv:2512.02643 [pdf, ps, other]
-
Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot PansharpeningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [423] arXiv:2512.02624 [pdf, ps, other]
-
Title: PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [424] arXiv:2512.02622 [pdf, ps, other]
-
Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation IntelligenceAuthors: Xuming He, Zehao Fan, Hengjia Li, Fan Zhuo, Hankun Xu, Senlin Cheng, Di Weng, Haifeng Liu, Can Ye, Boxi WuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [425] arXiv:2512.02621 [pdf, ps, other]
-
Title: Content-Aware Texturing for Gaussian SplattingComments: Project Page: this https URLJournal-ref: Eurographics Symposium on Rendering (Symposium Track), 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [426] arXiv:2512.02576 [pdf, ps, other]
-
Title: Co-speech Gesture Video Generation via Motion-Based Graph RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [427] arXiv:2512.02566 [pdf, ps, other]
-
Title: From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific LiteratureAuthors: Kun Yuan, Min Woo Sun, Zhen Chen, Alejandro Lozano, Xiangteng He, Shi Li, Nassir Navab, Xiaoxiao Sun, Nicolas Padoy, Serena Yeung-LevySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [428] arXiv:2512.02554 [pdf, ps, other]
-
Title: OmniPerson: Unified Identity-Preserving Pedestrian GenerationAuthors: Changxiao Ma, Chao Yuan, Xincheng Shi, Yuzhuo Ma, Yongfei Zhang, Longkun Zhou, Yujia Zhang, Shangze Li, Yifan XuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [429] arXiv:2512.02541 [pdf, ps, other]
-
Title: AVGGT: Rethinking Global Attention for Accelerating VGGTAuthors: Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, Jianfu ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [430] arXiv:2512.02536 [pdf, ps, other]
-
Title: WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query TokensAuthors: Jian Yang, Dacheng Yin, Xiaoxuan He, Yong Li, Fengyun Rao, Jing Lyu, Wei Zhai, Yang Cao, Zheng-Jun ZhaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [431] arXiv:2512.02520 [pdf, ps, other]
-
Title: On the Problem of Consistent Anomalies in Zero-Shot Anomaly DetectionAuthors: Tai Le-GiaComments: PhD DissertationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- [432] arXiv:2512.02517 [pdf, ps, other]
-
Title: SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of ExpertsAuthors: Jiaqi Liu, Ronghao Fu, Lang Sun, Haoran Liu, Xiao Yang, Weipeng Zhang, Xu Na, Zhuoran Duan, Bo YangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [433] arXiv:2512.02512 [pdf, ps, other]
-
Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual UpsamplingComments: Accepted as a Tiny Paper at the 13th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2025), IIT Mandi, India. 3 pages, 1 figureSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [434] arXiv:2512.02505 [pdf, ps, other]
-
Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [435] arXiv:2512.02498 [pdf, ps, other]
-
Title: dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language ModelSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [436] arXiv:2512.02497 [pdf, ps, other]
-
Title: A Large Scale Benchmark for Test Time Adaptation Methods in Medical Image SegmentationAuthors: Wenjing Yu, Shuo Jiang, Yifei Chen, Shuo Chang, Yuanhan Wang, Beining Wu, Jie Dong, Mingxuan Liu, Shenghao Zhu, Feiwei Qin, Changmiao Wang, Qiyuan TianComments: 45 pages, 18 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [437] arXiv:2512.02496 [pdf, ps, other]
-
Title: Attention-guided reference point shifting for Gaussian-mixture-based partial point set registrationComments: 16 pages, 9 figures, 7 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [438] arXiv:2512.02492 [pdf, ps, other]
-
Title: YingVideo-MV: Music-Driven Multi-Stage Video GenerationComments: 18 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [439] arXiv:2512.02487 [pdf, ps, other]
-
Title: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [440] arXiv:2512.02485 [pdf, ps, other]
-
Title: UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-MakingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [441] arXiv:2512.02482 [pdf, ps, other]
-
Title: G-SHARP: Gaussian Surgical Hardware Accelerated Real-time PipelineAuthors: Vishwesh Nath, Javier G. Tejero, Ruilong Li, Filippo Filicori, Mahdi Azizian, Sean D. HuverSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [442] arXiv:2512.02473 [pdf, ps, other]
-
Title: WorldPack: Compressed Memory Improves Spatial Consistency in Video World ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [443] arXiv:2512.02469 [pdf, ps, other]
-
Title: TGDD: Trajectory Guided Dataset Distillation with Balanced DistributionComments: Accepted in AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [444] arXiv:2512.02458 [pdf, ps, other]
-
Title: Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and ExplorationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [445] arXiv:2512.02457 [pdf, ps, other]
-
Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video GenerationAuthors: Jianzong Wu, Hao Lian, Dachao Hao, Ye Tian, Qingyu Shi, Biaolong Chen, Hao Jiang, Yunhai TongComments: Project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [446] arXiv:2512.02456 [pdf, ps, other]
-
Title: See, Think, Learn: A Self-Taught Multimodal ReasonerComments: Winter Conference on Applications of Computer Vision 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [447] arXiv:2512.02453 [pdf, ps, other]
-
Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [448] arXiv:2512.02450 [pdf, ps, other]
-
Title: HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the WildAuthors: Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno, Francis Engelmann, Leonidas GuibasComments: NeurIPS 2025 (Datasets and Benchmarks Track) Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [449] arXiv:2512.02448 [pdf, ps, other]
-
Title: nuScenes Revisited: Progress and Challenges in Autonomous DrivingComments: 18 pages, 17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [450] arXiv:2512.02447 [pdf, ps, other]
-
Title: Temporal Dynamics Enhancer for Directly Trained Spiking Object DetectorsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [451] arXiv:2512.02441 [pdf, ps, other]
-
Title: Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [452] arXiv:2512.02438 [pdf, ps, other]
-
Title: Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing ResourcesComments: WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [453] arXiv:2512.02437 [pdf, ps, other]
-
Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model frameworkAuthors: Daeyoung KimSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [454] arXiv:2512.02425 [pdf, ps, other]
-
Title: WorldMM: Dynamic Multimodal Memory Agent for Long Video ReasoningComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [455] arXiv:2512.02423 [pdf, ps, other]
-
Title: GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement LearningAuthors: Haolong Yan, Yeqing Shen, Xin Huang, Jia Wang, Kaijun Tan, Zhixuan Liang, Hongxin Li, Zheng Ge, Osamu Yoshie, Si Li, Xiangyu Zhang, Daxin JiangComments: 26 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [456] arXiv:2512.02421 [pdf, ps, other]
-
Title: Generalizing Vision-Language Models with Dedicated Prompt GuidanceComments: Accepted to AAAI26Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [457] arXiv:2512.02413 [pdf, ps, other]
-
Title: MitUNet: Enhancing Floor Plan Recognition using a Hybrid Mix-Transformer and U-Net ArchitectureComments: 9 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [458] arXiv:2512.02405 [pdf, ps, other]
-
Title: WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent DebateSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [459] arXiv:2512.02400 [pdf, ps, other]
-
Title: Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal NavigationAuthors: Wentao Xiang, Haokang Zhang, Tianhang Yang, Zedong Chu, Ruihang Chu, Shichao Xie, Yujian Yuan, Jian Sun, Zhining Gu, Junjie Wang, Xiaolong Wu, Mu Xu, Yujiu YangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [460] arXiv:2512.02395 [pdf, ps, other]
-
Title: Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearchAuthors: Yifan Zhang, Liang Hu, Haofeng Sun, Peiyu Wang, Yichen Wei, Shukang Yin, Jiangbo Pei, Wei Shen, Peng Xia, Yi Peng, Tianyidan Xie, Eric Li, Yang Liu, Xuchen Song, Yahui ZhouComments: 21 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [461] arXiv:2512.02394 [pdf, ps, other]
-
Title: Reproducing and Extending RaDelft 4D Radar with Camera-Assisted LabelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [462] arXiv:2512.02392 [pdf, ps, other]
-
Title: From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [463] arXiv:2512.02375 [pdf, ps, other]
-
Title: On-the-fly Feedback SfM: Online Explore-and-Exploit UAV Photogrammetry with Incremental Mesh Quality-Aware Indicator and Predictive Path PlanningComments: This work was submitted to IEEE GRSM Journal for consideration. Copyright will be transferred once it is acceptedSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [464] arXiv:2512.02369 [pdf, ps, other]
-
Title: SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across DomainsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [465] arXiv:2512.02368 [pdf, ps, other]
-
Title: Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective AttentionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [466] arXiv:2512.02364 [pdf, ps, other]
-
Title: Tackling Tuberculosis: A Comparative Dive into Machine Learning for Tuberculosis DetectionJournal-ref: Vol. 6, No. 1 (2024), Minnesota Undergraduate Research & Academic Journal (MURAJ)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [467] arXiv:2512.02361 [pdf, ps, other]
- [468] arXiv:2512.02359 [pdf, ps, other]
-
Title: WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd CountingComments: PRCV 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [469] arXiv:2512.02351 [pdf, ps, other]
-
Title: Understanding and Harnessing Sparsity in Unified Multimodal ModelsComments: 13 pages, 13 figures, 8 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [470] arXiv:2512.02344 [pdf, ps, other]
-
Title: A multi-weight self-matching visual explanation for CNNs on SAR imagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [471] arXiv:2512.02341 [pdf, ps, other]
-
Title: TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [472] arXiv:2512.02339 [pdf, ps, other]
-
Title: Video Diffusion Models Excel at Tracking Similar-Looking Objects Without SupervisionComments: Accepted at NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [473] arXiv:2512.02290 [pdf, ps, other]
-
Title: Enhancing Cross Domain SAR Oil Spill Segmentation via Morphological Region Perturbation and Synthetic Label-to-SAR GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [474] arXiv:2512.02273 [pdf, ps, other]
-
Title: Progressive Image Restoration via Text-Conditioned Video GenerationComments: First two authors contributed equally to this work. IEEE ICNC AcceptedSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [475] arXiv:2512.02268 [pdf, ps, other]
-
Title: Spatiotemporal Pyramid Flow Matching for Climate EmulationAuthors: Jeremy Andrew Irvin, Jiaqi Han, Zikui Wang, Abdulaziz Alharbi, Yufei Zhao, Nomin-Erdene Bayarsaikhan, Daniele Visioni, Andrew Y. Ng, Duncan Watson-ParrisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
- [476] arXiv:2512.02258 [pdf, ps, other]
-
Title: Exploring the Potentials of Spiking Neural Networks for Image DerainingComments: Accepted By AAAI2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [477] arXiv:2512.02231 [pdf, ps, other]
-
Title: See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language ModelsAuthors: Le Thien Phuc Nguyen, Zhuoran Yu, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae LeeComments: preprintSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [478] arXiv:2512.02224 [pdf, ps, other]
-
Title: Towards Unified Video Quality AssessmentComments: 8 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [479] arXiv:2512.02198 [pdf, ps, other]
-
Title: Multifractal Recalibration of Neural Networks for Medical Imaging SegmentationComments: 30 pages, 9 figures, journal paperSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [480] arXiv:2512.02188 [pdf, ps, other]
-
Title: RobustSurg: Tackling domain generalisation for out-of-distribution surgical scene segmentationComments: Submitted to Medical Image AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [481] arXiv:2512.02172 [pdf, ps, other]
-
Title: SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian SplattingComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [482] arXiv:2512.02162 [pdf, ps, other]
-
Title: Mapping of Lesion Images to Somatic MutationsAuthors: Rahul MehtaComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
- [483] arXiv:2512.02161 [pdf, ps, other]
-
Title: FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model JudgesAuthors: Kevin David Hayes, Micah Goldblum, Vikash Sehwag, Gowthami Somepalli, Ashwinee Panda, Tom GoldsteinComments: Accepted to NeurIPS 2025 Datasets and Benchmarks TrackSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [484] arXiv:2512.02152 [pdf, ps, other]
-
Title: Context-Enriched Contrastive Loss: Enhancing Presentation of Inherent Sample Connections in Contrastive Learning FrameworkComments: 13 pages, 7 figures. Published in IEEE Transactions on Multimedia. Code available at: this https URLJournal-ref: IEEE Transactions on Multimedia, Vol. 27, pp. 429-441, December 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [485] arXiv:2512.02055 [pdf, ps, other]
-
Title: Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scaleAuthors: Mirela G. Tulbure, Julio Caineta, Mark Broich, Mollie D. Gaines, Philippe Rufin, Leon-Friedrich Thomas, Hamed Alemohammad, Jan Hemmerling, Patrick HostertSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [486] arXiv:2512.03028 (cross-list from cs.GR) [pdf, ps, other]
-
Title: SMP: Reusable Score-Matching Motion Priors for Physics-Based Character ControlAuthors: Yuxuan Mu, Ziyu Zhang, Yi Shi, Minami Matsumoto, Kotaro Imamura, Guy Tevet, Chuan Guo, Michael Taylor, Chang Shu, Pengcheng Xi, Xue Bin PengComments: 14 pages, 9 figuresSubjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [487] arXiv:2512.02920 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Learning Multimodal Embeddings for Traffic Accident Prediction and Causal EstimationComments: 17 pages. To appear in KDD'26 DatasetsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
- [488] arXiv:2512.02787 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Diagnose, Correct, and Learn from Manipulation Failures via Visual SymbolsAuthors: Xianchao Zeng, Xinyu Zhou, Youcheng Li, Jiayou Shi, Tianle Li, Liangming Chen, Lei Ren, Yong-Lu LiSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [489] arXiv:2512.02719 (cross-list from cs.CL) [pdf, ps, other]
-
Title: Emergent Bayesian Behaviour and Optimal Cue Combination in LLMsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
- [490] arXiv:2512.02651 (cross-list from cs.HC) [pdf, ps, other]
-
Title: Real-Time Multimodal Data Collection Using Smartwatches and Its Visualization in EducationComments: Accepted in Technological Ecosystems for Enhancing Multiculturality (TEEM) 2025Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
- [491] arXiv:2512.02636 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based ModelsAuthors: Xinyue Ai, Yutong He, Albert Gu, Ruslan Salakhutdinov, J Zico Kolter, Nicholas Matthew Boffi, Max SimchowitzSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [492] arXiv:2512.02609 (cross-list from cs.RO) [pdf, ps, other]
-
Title: SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action PredictionAuthors: Shengkai Wu, Jinrong Yang, Wenqiu Luo, Linfeng Gao, Chaohui Shang, Meiyu Zhi, Mingshan Sun, Fangping Yang, Liangliang Ren, Yong ZhaoSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [493] arXiv:2512.02340 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science PerspectiveComments: 23 pages, 37 figuresSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [494] arXiv:2512.02306 (cross-list from cs.AI) [pdf, ps, other]
-
Title: OmniGuard: Unified Omni-Modal Guardrails with Deliberate ReasoningSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [495] arXiv:2512.02293 (cross-list from cs.RO) [pdf, ps, other]
-
Title: VIGS-SLAM: Visual Inertial Gaussian Splatting SLAMComments: Project page: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [496] arXiv:2512.02280 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Bridging the Gap: Toward Cognitive Autonomy in Artificial IntelligenceSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [497] arXiv:2512.02243 (cross-list from cs.CR) [pdf, ps, other]
-
Title: PhishSnap: Image-Based Phishing Detection Using Perceptual HashingComments: IEEE Standard Formatting, 3 pages, 3 figuresSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [498] arXiv:2512.02143 (cross-list from cs.GR) [pdf, ps, other]
-
Title: CoatFusion: Controllable Material Coating in ImagesSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [499] arXiv:2512.02088 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Comparing Baseline and Day-1 Diffusion MRI Using Multimodal Deep Embeddings for Stroke Outcome PredictionComments: 5 pages, 5 figures, 2 tablesSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [500] arXiv:2512.02062 (cross-list from cs.CR) [pdf, ps, other]
-
Title: Superpixel Attack: Enhancing Black-box Adversarial Attack with Image-driven Division AreasSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Tue, 2 Dec 2025
- [501] arXiv:2512.02018 [pdf, ps, other]
-
Title: Data-Centric Visual Development for Self-Driving LabsComments: 11 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [502] arXiv:2512.02017 [pdf, ps, other]
-
Title: Visual Sync: Multi-Camera Synchronization via Cross-View Object MotionComments: Accepted to NeurIPS 2025. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [503] arXiv:2512.02016 [pdf, ps, other]
-
Title: Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for nowComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [504] arXiv:2512.02015 [pdf, ps, other]
-
Title: Generative Video Motion Editing with 3D Point TracksAuthors: Yao-Chih Lee, Zhoutong Zhang, Jiahui Huang, Jui-Hsien Wang, Joon-Young Lee, Jia-Bin Huang, Eli Shechtman, Zhengqi LiComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [505] arXiv:2512.02014 [pdf, ps, other]
-
Title: TUNA: Taming Unified Visual Representations for Native Unified Multimodal ModelsAuthors: Zhiheng Liu, Weiming Ren, Haozhe Liu, Zijian Zhou, Shoufa Chen, Haonan Qiu, Xiaoke Huang, Zhaochong An, Fanny Yang, Aditya Patel, Viktar Atliha, Tony Ng, Xiao Han, Chuyan Zhu, Chenyang Zhang, Ding Liu, Juan-Manuel Perez-Rua, Sen He, Jürgen Schmidhuber, Wenhu Chen, Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren CongComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [506] arXiv:2512.02012 [pdf, ps, other]
-
Title: Improved Mean Flows: On the Challenges of Fastforward Generative ModelsComments: Technical reportSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [507] arXiv:2512.02009 [pdf, ps, other]
-
Title: AirSim360: A Panoramic Simulation Platform within Drone ViewAuthors: Xian Ge, Yuling Pan, Yuhang Zhang, Xiang Li, Weijun Zhang, Dizhe Zhang, Zhaoliang Wan, Xin Lin, Xiangkai Zhang, Juntao Liang, Jason Li, Wenjie Jiang, Bo Du, Ming-Hsuan Yang, Lu QiComments: Project Website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [508] arXiv:2512.02006 [pdf, ps, other]
-
Title: MV-TAP: Tracking Any Point in Multi-View VideosAuthors: Jahyeok Koo, Inès Hyeonsu Kim, Mungyeom Kim, Junghyun Park, Seohyun Park, Jaeyeong Kim, Jung Yi, Seokju Cho, Seungryong KimComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [509] arXiv:2512.02005 [pdf, ps, other]
-
Title: Learning Visual Affordance from AudioComments: 15 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [510] arXiv:2512.01989 [pdf, ps, other]
-
Title: PAI-Bench: A Comprehensive Benchmark For Physical AISubjects: Computer Vision and Pattern Recognition (cs.CV)
- [511] arXiv:2512.01988 [pdf, ps, other]
-
Title: Artemis: Structured Visual Reasoning for Perception Policy LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [512] arXiv:2512.01975 [pdf, ps, other]
-
Title: SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioningComments: Accepted by AAAI-2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [513] arXiv:2512.01960 [pdf, ps, other]
-
Title: SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [514] arXiv:2512.01952 [pdf, ps, other]
-
Title: GrndCtrl: Grounding World Models via Self-Supervised Reward AlignmentAuthors: Haoyang He, Jay Patrikar, Dong-Ki Kim, Max Smith, Daniel McGann, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei, Sebastian SchererSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [515] arXiv:2512.01949 [pdf, ps, other]
-
Title: Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language ModelsComments: Published in Transactions on Machine Learning Research. Project at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [516] arXiv:2512.01934 [pdf, ps, other]
-
Title: Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial TrajectoryAuthors: Chenyi Wang, Yanmao Man, Raymond Muller, Ming Li, Z. Berkay Celik, Ryan Gerdes, Jonathan PetitComments: Accepted to Annual Computer Security Applications Conference (ACSAC) 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [517] arXiv:2512.01922 [pdf, ps, other]
-
Title: Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive DecodingAuthors: Zahra Mahdavi, Zahra Khodakaramimaghsoud, Hooman Khaloo, Sina Bakhshandeh Taleshani, Erfan Hashemi, Javad Mirzapour Kaleybar, Omid Nejati ManzariJournal-ref: Computers in Biology and Medicine (2026)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [518] arXiv:2512.01908 [pdf, ps, other]
-
Title: SARL: Spatially-Aware Self-Supervised Representation Learning for Visuo-Tactile PerceptionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [519] arXiv:2512.01895 [pdf, ps, other]
-
Title: StyleYourSmile: Cross-Domain Face Retargeting Without Paired Multi-Style DataComments: 15 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [520] arXiv:2512.01889 [pdf, ps, other]
-
Title: KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAMAuthors: Zaid Nasser, Mikhail Iumanov, Tianhao Li, Maxim Popov, Jaafar Mahmoud, Malik Mohrat, Ilya Obrubov, Ekaterina Derevyanka, Ivan Sosin, Sergey KolyubinSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [521] arXiv:2512.01885 [pdf, ps, other]
-
Title: TransientTrack: Advanced Multi-Object Tracking and Classification of Cancer Cells with Transient Fluorescent SignalsAuthors: Florian Bürger, Martim Dias Gomes, Nica Gutu, Adrián E. Granada, Noémie Moreau, Katarzyna BozekComments: 13 pages, 7 figures, 2 tables. This work has been submitted to IEEE Transactions on Medical ImagingSubjects: Computer Vision and Pattern Recognition (cs.CV); Cell Behavior (q-bio.CB); Quantitative Methods (q-bio.QM)
- [522] arXiv:2512.01853 [pdf, ps, other]
-
Title: COACH: Collaborative Agents for Contextual Highlighting -- A Multi-Agent Framework for Sports Video AnalysisComments: Accepted by AAAI 2026 Workshop LaMASSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [523] arXiv:2512.01850 [pdf, ps, other]
-
Title: Register Any Point: Scaling 3D Point Cloud Registration by Flow MatchingComments: 22 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [524] arXiv:2512.01843 [pdf, ps, other]
-
Title: PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V ModelsComments: 17 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [525] arXiv:2512.01830 [pdf, ps, other]
-
Title: OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-CriticSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [526] arXiv:2512.01827 [pdf, ps, other]
-
Title: CauSight: Learning to Supersense for Visual Causal DiscoveryComments: project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [527] arXiv:2512.01821 [pdf, ps, other]
-
Title: Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World ModelingAuthors: Meng Cao, Haokun Lin, Haoyuan Li, Haoran Tang, Rongtao Xu, Dong An, Xue Liu, Ian Reid, Xiaodan LiangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [528] arXiv:2512.01816 [pdf, ps, other]
-
Title: Envision: Benchmarking Unified Understanding & Generation for Causal World Process InsightsComments: 35 pages, 12 figures, 10 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [529] arXiv:2512.01803 [pdf, ps, other]
-
Title: Generative Action Tell-Tales: Assessing Human Motion in Synthesized VideosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [530] arXiv:2512.01789 [pdf, ps, other]
-
Title: SAM3-UNet: Simplified Adaptation of Segment Anything Model 3Comments: Technical ReportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [531] arXiv:2512.01788 [pdf, ps, other]
-
Title: Learned Image Compression for Earth Observation: Implications for Downstream Segmentation TasksAuthors: Christian Mollière, Iker Cumplido, Marco Zeulner, Lukas Liesenhoff, Matthias Schubert, Julia GottfriedsenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [532] arXiv:2512.01774 [pdf, ps, other]
-
Title: Evaluating SAM2 for Video Semantic SegmentationAuthors: Syed Hesham Syed Ariff, Yun Liu, Guolei Sun, Jing Yang, Henghui Ding, Xue Geng, Xudong JiangComments: 17 pages, 3 figures and 7 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [533] arXiv:2512.01771 [pdf, ps, other]
-
Title: Robust Rigid and Non-Rigid Medical Image Registration Using Learnable Edge KernelsAuthors: Ahsan Raza Siyal, Markus Haltmeier, Ruth Steiger, Malik Galijasevic, Elke Ruth Gizewski, Astrid Ellen GramsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [534] arXiv:2512.01769 [pdf, ps, other]
-
Title: VideoScoop: A Non-Traditional Domain-Independent Framework For Video AnalysisAuthors: Hafsa BillahComments: This is a report submitted as part of PhD proposal defense of Hafsa BillahSubjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
- [535] arXiv:2512.01763 [pdf, ps, other]
-
Title: HiconAgent: History Context-aware Policy Optimization for GUI AgentsAuthors: Xurui Zhou, Gongwei Chen, Yuquan Xie, Zaijing Li, Kaiwen Zhou, Shuai Wang, Shuo Yang, Zhuotao Tian, Rui ShaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [536] arXiv:2512.01755 [pdf, ps, other]
-
Title: FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image EditingAuthors: Yucheng Liao, Jiajun Liang, Kaiqian Cui, Baoquan Zhao, Haoran Xie, Wei Liu, Qing Li, Xudong MaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [537] arXiv:2512.01707 [pdf, ps, other]
-
Title: StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming VideosAuthors: Daeun Lee, Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Mohit BansalComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [538] arXiv:2512.01701 [pdf, ps, other]
-
Title: SSR: Semantic and Spatial Rectification for CLIP-based Weakly Supervised SegmentationComments: Accepted in AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [539] arXiv:2512.01686 [pdf, ps, other]
-
Title: DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [540] arXiv:2512.01681 [pdf, ps, other]
-
Title: Cross-Domain Validation of a Resection-Trained Self-Supervised Model on Multicentre Mesothelioma BiopsiesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [541] arXiv:2512.01677 [pdf, ps, other]
-
Title: Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware RepresentationAuthors: Haodong Yan, Hang Yu, Zhide Zhong, Weilin Yuan, Xin Gong, Zehang Luo, Chengxi Heyu, Junfeng Li, Wenxuan Song, Shunbo Zhou, Haoang LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [542] arXiv:2512.01675 [pdf, ps, other]
-
Title: GRASP: Guided Residual Adapters with Sample-wise PartitioningComments: 10 pages, 4 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [543] arXiv:2512.01665 [pdf, ps, other]
-
Title: Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing ImagerySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [544] arXiv:2512.01657 [pdf, ps, other]
-
Title: DB-KAUNet: An Adaptive Dual Branch Kolmogorov-Arnold UNet for Retinal Vessel SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [545] arXiv:2512.01643 [pdf, ps, other]
-
Title: ViT$^3$: Unlocking Test-Time Training in VisionAuthors: Dongchen Han, Yining Li, Tianyu Li, Zixuan Cao, Ziming Wang, Jun Song, Yu Cheng, Bo Zheng, Gao HuangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [546] arXiv:2512.01636 [pdf, ps, other]
-
Title: Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [547] arXiv:2512.01629 [pdf, ps, other]
-
Title: SPARK: Sim-ready Part-level Articulated Reconstruction with VLM KnowledgeComments: Project page: this https URL. 17 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [548] arXiv:2512.01611 [pdf, ps, other]
-
Title: Depth Matching Method Based on ShapeDTW for Oil-Based Mud ImagerSubjects: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph)
- [549] arXiv:2512.01589 [pdf, ps, other]
-
Title: Toward Content-based Indexing and Retrieval of Head and Neck CT with Abscess SegmentationAuthors: Thao Thi Phuong Dao, Tan-Cong Nguyen, Trong-Le Do, Truong Hoang Viet, Nguyen Chi Thanh, Huynh Nguyen Thuan, Do Vo Cong Nguyen, Minh-Khoi Pham, Mai-Khiem Tran, Viet-Tham Huynh, Trong-Thuan Nguyen, Trung-Nghia Le, Vo Thanh Toan, Tam V. Nguyen, Minh-Triet Tran, Thanh Dinh LeComments: The 2025 IEEE International Conference on Content-Based Multimedia Indexing (IEEE CBMI)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [550] arXiv:2512.01582 [pdf, ps, other]
-
Title: RoleMotion: A Large-Scale Dataset towards Robust Scene-Specific Role-Playing Motion Synthesis with Fine-grained DescriptionsAuthors: Junran Peng, Yiheng Huang, Silei Shen, Zeji Wei, Jingwei Yang, Baojie Wang, Yonghao He, Chuanchen Luo, Man Zhang, Xucheng Yin, Wei SuiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [551] arXiv:2512.01563 [pdf, ps, other]
-
Title: MasHeNe: A Benchmark for Head and Neck CT Mass Segmentation using Window-Enhanced Mamba with Frequency-Domain IntegrationAuthors: Thao Thi Phuong Dao, Tan-Cong Nguyen, Nguyen Chi Thanh, Truong Hoang Viet, Trong-Le Do, Mai-Khiem Tran, Minh-Khoi Pham, Trung-Nghia Le, Minh-Triet Tran, Thanh Dinh LeComments: The 14th International Symposium on Information and Communication Technology Conference SoICT 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [552] arXiv:2512.01540 [pdf, ps, other]
-
Title: FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor AttentionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [553] arXiv:2512.01534 [pdf, ps, other]
-
Title: Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [554] arXiv:2512.01533 [pdf, ps, other]
-
Title: Diffusion Fuzzy System: Fuzzy Rule Guided Latent Multi-Path Diffusion ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [555] arXiv:2512.01519 [pdf, ps, other]
-
Title: QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic InteractionsSubjects: Computer Vision and Pattern Recognition (cs.CV); Materials Science (cond-mat.mtrl-sci); Quantum Physics (quant-ph)
- [556] arXiv:2512.01510 [pdf, ps, other]
-
Title: Semantic-aware Random Convolution and Source Matching for Domain Generalization in Medical Image SegmentationAuthors: Franz Thaler, Martin Urschler, Mateusz Kozinski, Matthias AF Gsell, Gernot Plank, Darko SternComments: Preprint submitted to Computer Methods and Programs in Biomedicine (currently under revision)Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [557] arXiv:2512.01495 [pdf, ps, other]
-
Title: ELVIS: Enhance Low-Light for Video Instance Segmentation in the DarkSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [558] arXiv:2512.01494 [pdf, other]
-
Title: A variational method for curve extraction with curvature-dependent energiesAuthors: Majid Arthaud (ENPC, MOKAPLAN, UMich), Antonin Chambolle (CEREMADE, MOKAPLAN), Vincent Duval (MOKAPLAN)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [559] arXiv:2512.01481 [pdf, ps, other]
-
Title: ChronosObserver: Taming 4D World with Hyperspace Diffusion SamplingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [560] arXiv:2512.01478 [pdf, ps, other]
-
Title: CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for BasketballAuthors: Omer Sela (1 and 2), Michael Chertok (1), Lior Wolf (2) ((1) Amazon, (2) Tel Aviv University)Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
- [561] arXiv:2512.01444 [pdf, ps, other]
-
Title: FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar AnimationComments: 9 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [562] arXiv:2512.01427 [pdf, ps, other]
-
Title: Language-Guided Open-World Anomaly SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [563] arXiv:2512.01426 [pdf, ps, other]
-
Title: ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion TransformersComments: 8 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [564] arXiv:2512.01424 [pdf, ps, other]
-
Title: ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language ModelsComments: 22 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [565] arXiv:2512.01422 [pdf, ps, other]
-
Title: MDiff4STR: Mask Diffusion Model for Scene Text RecognitionComments: Accepted by AAAI 2026 (Oral)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [566] arXiv:2512.01419 [pdf, ps, other]
-
Title: Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN CountriesAuthors: Tushar Pranav, Eshan Pandey, Austria Lyka Diane Bala, Aman Chadha, Indriyati Atmosukarto, Donny Soh Cheng LockComments: 14 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [567] arXiv:2512.01390 [pdf, ps, other]
-
Title: FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-ResolutionComments: Please visit our project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [568] arXiv:2512.01383 [pdf, ps, other]
-
Title: PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic ApplicationsComments: Accepted by WACV2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [569] arXiv:2512.01382 [pdf, ps, other]
-
Title: Reversible Inversion for Training-Free Exemplar-guided Image EditingAuthors: Yuke Li, Lianli Gao, Ji Zhang, Pengpeng Zeng, Lichuan Xiang, Hongkai Wen, Heng Tao Shen, Jingkuan SongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [570] arXiv:2512.01380 [pdf, ps, other]
-
Title: Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry NetworkAuthors: Tianyu Luan, Xuelu Feng, Zixin Zhu, Phani Nuney, Sheng Liu, Xuan Gong, David Doermann, Chunming Qiao, Junsong YuanComments: Accepted by AAAI26Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [571] arXiv:2512.01373 [pdf, ps, other]
-
Title: SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape EvaluationComments: Accepted by AAAI2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [572] arXiv:2512.01366 [pdf, ps, other]
-
Title: BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single EarbudAuthors: Yunzhe Li, Jiajun Yan, Yuzhou Wei, Kechen Liu, Yize Zhao, Chong Zhang, Hongzi Zhu, Li Lu, Shan Chang, Minyi GuoComments: This is the author-accepted version of the paper published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Vol. 9, No. 4, Article 191, 2025. Final published version: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [573] arXiv:2512.01352 [pdf, ps, other]
-
Title: OpenBox: Annotate Any Bounding Boxes in 3DComments: Accepted by NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [574] arXiv:2512.01348 [pdf, ps, other]
-
Title: Handwritten Text Recognition for Low Resource LanguagesComments: 21 PagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [575] arXiv:2512.01342 [pdf, ps, other]
-
Title: InternVideo-Next: Towards General Video Foundation Models without Video-Text SupervisionAuthors: Chenting Wang, Yuhan Zhu, Yicheng Xu, Jiange Yang, Ziang Yan, Yali Wang, Yi Wang, Limin WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [576] arXiv:2512.01340 [pdf, ps, other]
-
Title: EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking HumansAuthors: Yingjie Zhou, Xilei Zhu, Siyu Ren, Ziyi Zhao, Ziwen Wang, Farong Wen, Yu Zhou, Jiezhang Cao, Xiongkuo Min, Fengjiao Chen, Xiaoyu Li, Xuezhi Cao, Guangtao Zhai, Xiaohong LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [577] arXiv:2512.01334 [pdf, ps, other]
-
Title: AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video GenerationAuthors: Yexin Liu, Wen-Jie Shu, Zile Huang, Haoze Zheng, Yueze Wang, Manyuan Zhang, Ser-Nam Lim, Harry YangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [578] arXiv:2512.01333 [pdf, ps, other]
-
Title: Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAISubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [579] arXiv:2512.01319 [pdf, ps, other]
-
Title: Rethinking Intracranial Aneurysm Vessel Segmentation: A Perspective from Computational Fluid Dynamics ApplicationsAuthors: Feiyang Xiao, Yichi Zhang, Xigui Li, Yuanye Zhou, Chen Jiang, Xin Guo, Limei Han, Yuxin Li, Fengping Zhu, Yuan ChengComments: 18 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [580] arXiv:2512.01315 [pdf, ps, other]
-
Title: FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object DetectionComments: 8 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [581] arXiv:2512.01314 [pdf, ps, other]
-
Title: TokenPure: Watermark Removal through Tokenized Appearance and Structural GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [582] arXiv:2512.01312 [pdf, ps, other]
-
Title: IVCR-200K: A Large-Scale Multi-turn Dialogue Benchmark for Interactive Video Corpus RetrievalAuthors: Ning Han, Yawen Zeng, Shaohua Long, Chengqing Li, Sijie Yang, Dun Tan, Jianfeng Dong, Jingjing ChenComments: Accepted by SIGIR2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [583] arXiv:2512.01310 [pdf, ps, other]
-
Title: Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging - A Study on Pretraining for Age PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [584] arXiv:2512.01306 [pdf, ps, other]
-
Title: Gaussian Swaying: Surface-Based Framework for Aerodynamic Simulation with 3D GaussiansComments: Accepted to WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [585] arXiv:2512.01302 [pdf, ps, other]
-
Title: DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer StrategyComments: Accepted to WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [586] arXiv:2512.01298 [pdf, ps, other]
-
Title: TBT-Former: Learning Temporal Boundary Distributions for Action LocalizationComments: 8 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [587] arXiv:2512.01296 [pdf, ps, other]
-
Title: EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the FlyComments: SIGGRAPH ASIA 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [588] arXiv:2512.01292 [pdf, ps, other]
-
Title: Diffusion Model in Latent Space for Medical Image Segmentation TaskSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [589] arXiv:2512.01291 [pdf, ps, other]
-
Title: Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AIComments: Accepted to CVIP 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [590] arXiv:2512.01273 [pdf, ps, other]
-
Title: nnMobileNet++: Towards Efficient Hybrid Networks for Retinal Image AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [591] arXiv:2512.01268 [pdf, ps, other]
-
Title: ViscNet: Vision-Based In-line Viscometry for Fluid Mixing ProcessSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [592] arXiv:2512.01248 [pdf, ps, other]
-
Title: TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table RecognitionAuthors: Junyuan Zhang, Bin Wang, Qintong Zhang, Fan Wu, Zichen Wen, Jialin Lu, Junjie Shan, Ziqi Zhao, Shuya Yang, Ziling Wang, Ziyang Miao, Huaping Zhong, Yuhang Zang, Xiaoyi Dong, Ka-Ho Chow, Conghui HeSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [593] arXiv:2512.01242 [pdf, ps, other]
-
Title: Generative Adversarial Gumbel MCTS for Abstract Visual Composition GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [594] arXiv:2512.01236 [pdf, ps, other]
-
Title: PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency RewardsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [595] arXiv:2512.01223 [pdf, ps, other]
-
Title: S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural GuidanceComments: 18 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [596] arXiv:2512.01214 [pdf, ps, other]
-
Title: M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local AnalysisComments: 12 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [597] arXiv:2512.01213 [pdf, ps, other]
-
Title: Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two FormulationsAuthors: Yangbangyan Jiang, Qianqian Xu, Huiyang Shao, Zhiyong Yang, Shilong Bao, Xiaochun Cao, Qingming HuangSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [598] arXiv:2512.01204 [pdf, ps, other]
-
Title: TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single ImageAuthors: Ziqian Wang, Yonghao He, Licheng Yang, Wei Zou, Hongxuan Ma, Liu Liu, Wei Sui, Yuxin Guo, Hu SuComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [599] arXiv:2512.01178 [pdf, ps, other]
-
Title: VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette RenderingComments: arXiv admin note: text overlap with arXiv:2404.00149Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [600] arXiv:2512.01165 [pdf, ps, other]
-
Title: Real-Time On-the-Go Annotation Framework Using YOLO for Automated Dataset GenerationComments: Copyright 2025 IEEE. This is the author's version of the work that has been accepted for publication in Proceedings of the 5. Interdisciplinary Conference on Electrics and Computer (INTCEC 2025) 15-16 September 2025, Chicago-USA. The final version of record is available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [601] arXiv:2512.01153 [pdf, ps, other]
-
Title: DPAC: Distribution-Preserving Adversarial Control for Diffusion SamplingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [602] arXiv:2512.01148 [pdf, ps, other]
-
Title: SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language ModelsComments: 22 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [603] arXiv:2512.01145 [pdf, ps, other]
-
Title: Weakly Supervised Continuous Micro-Expression Intensity Estimation Using Temporal Deep Neural NetworkAuthors: Riyadh Mohammed Almushrafy (Majmaah University, Saudi Arabia)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [604] arXiv:2512.01128 [pdf, ps, other]
-
Title: OmniFD: A Unified Model for Versatile Face Forgery DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [605] arXiv:2512.01116 [pdf, ps, other]
-
Title: Structural Prognostic Event Modeling for Multimodal Cancer Survival AnalysisComments: 37 pages, 14 FiguresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [606] arXiv:2512.01103 [pdf, ps, other]
-
Title: Learning Eigenstructures of Unstructured Data ManifoldsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [607] arXiv:2512.01095 [pdf, ps, other]
-
Title: CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State TransitionsAuthors: Simon Kohaut, Daniel Ochs, Shun Zhang, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh DhamiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [608] arXiv:2512.01094 [pdf, ps, other]
-
Title: Accelerating Inference of Masked Image Generators via Reinforcement LearningComments: 15 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [609] arXiv:2512.01085 [pdf, ps, other]
-
Title: Generalized Medical Phrase GroundingSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [610] arXiv:2512.01059 [pdf, ps, other]
-
Title: Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width ReductionAuthors: Anantha Padmanaban Krishna Kumar (Boston University)Comments: 7 pages total (6 pages main text, 1 page references), 1 figure, 2 tables. Code available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [611] arXiv:2512.01048 [pdf, ps, other]
-
Title: TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language ModelsComments: NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [612] arXiv:2512.01030 [pdf, ps, other]
-
Title: Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative ModelComments: Work done at the Hong Kong University of Science and Technology (Guangzhou). Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [613] arXiv:2512.01008 [pdf, ps, other]
-
Title: LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View ConsistencySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [614] arXiv:2512.00999 [pdf, ps, other]
-
Title: Provenance-Driven Reliable Semantic Medical Image Vector Reconstruction via Lightweight Blockchain-Verified Latent FingerprintsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [615] arXiv:2512.00995 [pdf, ps, other]
-
Title: S2AM3D: Scale-controllable Part Segmentation of 3D Point CloudSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [616] arXiv:2512.00993 [pdf, ps, other]
-
Title: PhotoFramer: Multi-modal Image Composition InstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [617] arXiv:2512.00975 [pdf, ps, other]
-
Title: MM-ACT: Learn from Multimodal Parallel Generation to ActAuthors: Haotian Liang, Xinyi Chen, Bin Wang, Mingkang Chen, Yitian Liu, Yuhao Zhang, Zanxin Chen, Tianshuo Yang, Yilun Chen, Jiangmiao Pang, Dong Liu, Xiaokang Yang, Yao Mu, Wenqi Shao, Ping LuoComments: 17 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
- [618] arXiv:2512.00960 [pdf, ps, other]
-
Title: Efficient and Scalable Monocular Human-Object Interaction Motion ReconstructionAuthors: Boran Wen, Ye Lu, Keyan Wan, Sirui Wang, Jiahong Zhou, Junxuan Liang, Xinpeng Liu, Bang Xiao, Dingbang Huang, Ruiyang Liu, Yong-Lu LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [619] arXiv:2512.00953 [pdf, ps, other]
-
Title: Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment RetrievalAuthors: Haojian Huang, Kaijing Ma, Jin Chen, Haodong Chen, Zhou Wu, Xianghao Zang, Han Fang, Chao Ban, Hao Sun, Mulin Chen, Zhongjiang HeComments: Accepted by AAAI 2026, 10 pages, 9 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [620] arXiv:2512.00944 [pdf, ps, other]
-
Title: Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian SegmentationAuthors: An Yang, Chenyu Liu, Jun Du, Jianqing Gao, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Cong LiuJournal-ref: AAAI2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [621] arXiv:2512.00936 [pdf, ps, other]
-
Title: SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph GroundingComments: Accepted to WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [622] arXiv:2512.00927 [pdf, ps, other]
-
Title: LAHNet: Local Attentive Hashing Network for Point Cloud RegistrationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [623] arXiv:2512.00912 [pdf, ps, other]
-
Title: ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT SlicesAuthors: Abdelghafour Halimi, Ali Alibrahim, Didier Barradas-Bautista, Ronell Sicat, Abdulkader M. AfifiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [624] arXiv:2512.00911 [pdf, ps, other]
-
Title: Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic VisionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [625] arXiv:2512.00909 [pdf, ps, other]
-
Title: TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion ModelAuthors: Alireza Javanmardi, Pragati Jaiswal, Tewodros Amberbir Habtegebrial, Christen Millerdurai, Shaoxiang Wang, Alain Pagani, Didier StrickerComments: WACV 2026, Project page available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [626] arXiv:2512.00904 [pdf, ps, other]
-
Title: Hierarchical Semantic Alignment for Image ClusteringComments: AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [627] arXiv:2512.00903 [pdf, ps, other]
-
Title: SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal OverheadAuthors: Chaojun Ni, Cheng Chen, Xiaofeng Wang, Zheng Zhu, Wenzhao Zheng, Boyuan Wang, Tianrun Chen, Guosheng Zhao, Haoyun Li, Zhehao Dong, Qiang Zhang, Yun Ye, Yang Wang, Guan Huang, Wenjun MeiSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [628] arXiv:2512.00891 [pdf, ps, other]
-
Title: Accelerating Streaming Video Large Language Models via Hierarchical Token CompressionAuthors: Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng ZhangComments: Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [629] arXiv:2512.00887 [pdf, ps, other]
-
Title: Multilingual Training-Free Remote Sensing Image CaptioningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [630] arXiv:2512.00885 [pdf, ps, other]
-
Title: HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction DynamicsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [631] arXiv:2512.00882 [pdf, ps, other]
-
Title: Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge HintsAuthors: Xisheng FengSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [632] arXiv:2512.00880 [pdf, ps, other]
-
Title: Quantum-Inspired Spectral Geometry for Neural Operator Equivalence and Structured PruningComments: 6 pages, 1 figure, preliminary version; concepts and simulation experiments onlySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [633] arXiv:2512.00877 [pdf, ps, other]
-
Title: Feed-Forward 3D Gaussian Splatting Compression with Long-Context ModelingAuthors: Zhening Liu, Rui Song, Yushi Huang, Yingdong Hu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [634] arXiv:2512.00873 [pdf, ps, other]
-
Title: Neural Discrete Representation Learning for Sparse-View CBCT Reconstruction: From Algorithm Design to Prospective Multicenter Clinical EvaluationAuthors: Haoshen Wang, Lei Chen, Wei-Hua Zhang, Linxia Wu, Yong Luo, Zengmao Wang, Yuan Xiong, Chengcheng Zhu, Wenjuan Tang, Xueyi Zhang, Wei Zhou, Xuhua Duan, Lefei Zhang, Gao-Jun Teng, Bo Du, Huangxuan ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [635] arXiv:2512.00872 [pdf, ps, other]
-
Title: TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation ModelsAuthors: Tim Veenboer, George Yiasemis, Eric Marcus, Vivien Van Veldhuizen, Cees G. M. Snoek, Jonas Teuwen, Kevin B. W. Groot LipmanComments: 22 pages, 4 figures, 8 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [636] arXiv:2512.00850 [pdf, ps, other]
-
Title: Smol-GS: Compact Representations for Abstract 3D Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [637] arXiv:2512.00846 [pdf, ps, other]
-
Title: AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agentComments: Accepted at WACV 2026 ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [638] arXiv:2512.00832 [pdf, ps, other]
-
Title: PanFlow: Decoupled Motion Control for Panoramic Video GenerationAuthors: Cheng Zhang, Hanwen Liang, Donny Y. Chen, Qianyi Wu, Konstantinos N. Plataniotis, Camilo Cruz Gambardella, Jianfei CaiComments: Accepted by AAAI. Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [639] arXiv:2512.00814 [pdf, ps, other]
-
Title: IRPO: Boosting Image Restoration via Post-training GRPOAuthors: Haoxuan Xu, Yi Liu, Boyuan Jiang, Jinlong Peng, Donghao Luo, Xiaobin Hu, Shuicheng Yan, Haoang LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [640] arXiv:2512.00805 [pdf, ps, other]
-
Title: Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video UnderstandingAuthors: Pengfei Hu, Meng Cao, Yingyao Wang, Yi Wang, Jiahua Dong, Jun Song, Yu Cheng, Bo Zheng, Xiaodan LiangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [641] arXiv:2512.00796 [pdf, ps, other]
-
Title: CircleFlow: Flow-Guided Camera Blur Estimation using a Circle Grid TargetSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [642] arXiv:2512.00794 [pdf, ps, other]
-
Title: PolarGS: Polarimetric Cues for Ambiguity-Free Gaussian Splatting with Accurate Geometry RecoverySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [643] arXiv:2512.00773 [pdf, ps, other]
-
Title: DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question AnsweringAuthors: Toshiki Katsube, Taiga Fukuhara, Kenichiro Ando, Yusuke Mukuta, Kohei Uehara, Tatsuya HaradaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [644] arXiv:2512.00771 [pdf, ps, other]
-
Title: EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting ScenesAuthors: Xiaoshan Wu, Yifei Yu, Xiaoyang Lyu, Yihua Huang, Bo Wang, Baoheng Zhang, Zhongrui Wang, Xiaojuan QiComments: Accepted at NeurIPS 2025 (spotlight)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [645] arXiv:2512.00765 [pdf, ps, other]
-
Title: The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge PatchesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [646] arXiv:2512.00762 [pdf, ps, other]
-
Title: Seeing the Wind from a Falling LeafAuthors: Zhiyuan Gao, Jiageng Mao, Hong-Xing Yu, Haozhe Lou, Emily Yue-Ting Jia, Jernej Barbic, Jiajun Wu, Yue WangComments: Accepted at NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [647] arXiv:2512.00752 [pdf, ps, other]
-
Title: Charts Are Not Images: On the Challenges of Scientific Chart EditingAuthors: Shawn Li, Ryan Rossi, Sungchul Kim, Sunav Choudhary, Franck Dernoncourt, Puneet Mathur, Zhengzhong Tu, Yue ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [648] arXiv:2512.00748 [pdf, ps, other]
-
Title: Probabilistic Modeling of Multi-rater Medical Image Segmentation for Diversity and PersonalizationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [649] arXiv:2512.00744 [pdf, ps, other]
-
Title: Joint Multi-scale Gated Transformer and Prior-guided Convolutional Network for Learned Image CompressionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [650] arXiv:2512.00743 [pdf, ps, other]
-
Title: Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple RewardsAuthors: Qiang Lyu, Zicong Chen, Chongxiao Wang, Haolin Shi, Shibo Gao, Ran Piao, Youwei Zeng, Jianlou Si, Fei Ding, Jing Li, Chun Pong Lau, Weiqiang WangComments: 20 pages, 15 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [651] arXiv:2512.00723 [pdf, ps, other]
-
Title: TrajDiff: End-to-end Autonomous Driving without Perception AnnotationAuthors: Xingtai Gui, Jianbo Zhao, Wencheng Han, Jikai Wang, Jiahao Gong, Feiyang Tan, Cheng-zhong Xu, Jianbing ShenSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [652] arXiv:2512.00718 [pdf, ps, other]
-
Title: RS-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [653] arXiv:2512.00714 [pdf, ps, other]
-
Title: Deep Learning-Based Computer Vision Models for Early Cancer Detection Using Multimodal Medical Imaging and Radiogenomic Integration FrameworksAuthors: Emmanuella Avwerosuoghene OghenekaroJournal-ref: International Journal of Computer Applications Technology and Research, vol. 14, no. 11, pp. 1-14, 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [654] arXiv:2512.00706 [pdf, ps, other]
-
Title: Optimizing LVLMs with On-Policy Data for Effective Hallucination MitigationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [655] arXiv:2512.00700 [pdf, ps, other]
-
Title: CAR-Net: A Cascade Refinement Network for Rotational Motion Deblurring under Angle Information UncertaintyComments: Accepted to AAIML 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [656] arXiv:2512.00694 [pdf, ps, other]
-
Title: Affordance-First Decomposition for Continual Learning in Video-Language UnderstandingComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [657] arXiv:2512.00691 [pdf, ps, other]
-
Title: Silhouette-based Gait Foundation ModelAuthors: Dingqiang Ye, Chao Fan, Kartik Narayan, Bingzhe Wu, Chengwen Luo, Jianqiang Li, Vishal M. PatelSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [658] arXiv:2512.00677 [pdf, ps, other]
-
Title: Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion TransformerComments: 4D Scene EditingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [659] arXiv:2512.00676 [pdf, ps, other]
-
Title: Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition ChallengesAuthors: Kiri L. WagstaffComments: 10 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [660] arXiv:2512.00647 [pdf, ps, other]
-
Title: MambaScope: Coarse-to-Fine Scoping for Efficient Vision MambaSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [661] arXiv:2512.00641 [pdf, ps, other]
-
Title: Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression RecognitionComments: 17 pages, 5 figures. Accepted at the 17th Asian Conference on Machine Learning (ACML 2025), Taipei, Taiwan, December 9-12, 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [662] arXiv:2512.00639 [pdf, ps, other]
-
Title: Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance SegmentationAuthors: Mahmoud El HussieniSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Performance (cs.PF)
- [663] arXiv:2512.00626 [pdf, ps, other]
-
Title: XAI-Driven Skin Disease Classification: Leveraging GANs to Augment ResNet-50 PerformanceSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [664] arXiv:2512.00625 [pdf, ps, other]
-
Title: Automatic Pith Detection in Tree Cross-Section Images Using Deep LearningComments: 8 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [665] arXiv:2512.00597 [pdf, ps, other]
-
Title: Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [666] arXiv:2512.00582 [pdf, ps, other]
-
Title: SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image ComprehensionAuthors: Yue Jiang, Haiwei Xue, Minghao Han, Mingcheng Li, Xiaolu Hou, Dingkang Yang, Lihua Zhang, Xu ZhengComments: Accepted by AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [667] arXiv:2512.00572 [pdf, ps, other]
-
Title: Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning ModelsAuthors: Mohammed Mohiuddin, Syed Mohammod Minhaz Hossain, Sumaiya Khanam, Prionkar Barua, Aparup Barua, MD Tamim HossainSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [668] arXiv:2512.00565 [pdf, ps, other]
-
Title: Describe Anything Anywhere At Any MomentComments: 14 pages, 5 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [669] arXiv:2512.00557 [pdf, ps, other]
-
Title: NeuroVolve: Evolving Visual Stimuli toward Programmable Neural ObjectivesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [670] arXiv:2512.00547 [pdf, ps, other]
-
Title: Asset-Driven Semantic Reconstruction of Dynamic Scene with Multi-Human-Object InteractionsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [671] arXiv:2512.00539 [pdf, ps, other]
-
Title: SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual LearningComments: 17 pages, 19 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [672] arXiv:2512.00534 [pdf, ps, other]
-
Title: Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene UpdateComments: AAAI2026 acceptedSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [673] arXiv:2512.00532 [pdf, ps, other]
-
Title: Image Generation as a Visual Planner for Robotic ManipulationAuthors: Ye PangComments: 11 pages, 9 figures. Under review at CVPR 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [674] arXiv:2512.00514 [pdf, ps, other]
-
Title: Terrain Sensing with Smartphone Structured Light: 2D Dynamic Time Warping for Grid Pattern MatchingAuthors: Tanaka NobuakiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [675] arXiv:2512.00493 [pdf, ps, other]
-
Title: CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model OrchestrationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [676] arXiv:2512.00489 [pdf, ps, other]
-
Title: Learning What Helps: Task-Aligned Context Selection for Vision TasksSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [677] arXiv:2512.00475 [pdf, ps, other]
-
Title: Structured Context Learning for Generic Event Boundary DetectionAuthors: Xin Gu, Congcong Li, Xinyao Wang, Dexiang Hong, Libo Zhang, Tiejian Luo, Longyin Wen, Heng FanSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [678] arXiv:2512.00473 [pdf, ps, other]
-
Title: RealGen: Photorealistic Text-to-Image Generation via Detector-Guided RewardsAuthors: Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia LiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [679] arXiv:2512.00456 [pdf, ps, other]
-
Title: CausalAffect: Causal Discovery for Facial Affective UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [680] arXiv:2512.00450 [pdf, ps, other]
-
Title: RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources ApplicationsAuthors: Amit Kumar Gupta, Farhan Sheth, Hammad Shaikh, Dheeraj Kumar, Angkul Puniya, Deepak Panwar, Sandeep Chaurasia, Priya MathurComments: 20 pages, 10 figures, 10 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [681] arXiv:2512.00438 [pdf, ps, other]
-
Title: FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward SignalSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [682] arXiv:2512.00428 [pdf, ps, other]
-
Title: Recognizing Pneumonia in Real-World Chest X-rays with a Classifier Trained with Images Synthetically Generated by Nano BananaComments: 9 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [683] arXiv:2512.00425 [pdf, ps, other]
-
Title: What about gravity in video generation? Post-Training Newton's Laws with Verifiable RewardsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [684] arXiv:2512.00424 [pdf, ps, other]
-
Title: Recovering Origin Destination Flows from Bus CCTV: Early Results from Nairobi and KigaliSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [685] arXiv:2512.00422 [pdf, ps, other]
-
Title: PhysGen: Physically Grounded 3D Shape Generation for Industrial DesignComments: 14 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [686] arXiv:2512.00413 [pdf, ps, other]
-
Title: SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style ControlSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [687] arXiv:2512.00408 [pdf, ps, other]
-
Title: Low-Bitrate Video Compression through Semantic-Conditioned DiffusionAuthors: Lingdong Wang, Guan-Ming Su, Divya Kothandaraman, Tsung-Wei Huang, Mohammad Hajiesmaili, Ramesh K. SitaramanSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [688] arXiv:2512.00395 [pdf, ps, other]
-
Title: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [689] arXiv:2512.00387 [pdf, ps, other]
-
Title: WiseEdit: Benchmarking Cognition- and Creativity-Informed Image EditingAuthors: Kaihang Pan, Weile Chen, Haiyi Qiu, Qifan Yu, Wendong Bu, Zehan Wang, Yun Zhu, Juncheng Li, Siliang TangComments: 32 pages, 20 figures. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [690] arXiv:2512.00385 [pdf, ps, other]
-
Title: EZ-SP: Fast and Lightweight Superpoint-Based 3D SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [691] arXiv:2512.00381 [pdf, ps, other]
-
Title: Pore-scale Image Patch Dataset and A Comparative Evaluation of Pore-scale Facial FeaturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [692] arXiv:2512.00369 [pdf, ps, other]
-
Title: POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion ModelsAuthors: Wenshuo Chen, Haosen Li, Shaofeng Liang, Lei Wang, Haozhe Jia, Kaishen Yuan, Jieming Wu, Bowen Tian, Yutao YueSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [693] arXiv:2512.00368 [pdf, ps, other]
-
Title: THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View ClusteringAuthors: Jian ZhuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [694] arXiv:2512.00365 [pdf, ps, other]
-
Title: Towards aligned body representations in vision modelsComments: Andrea Procopio and Andrey Gizdov have equal contributionsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [695] arXiv:2512.00363 [pdf, ps, other]
-
Title: MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality AdaptersComments: Manuscript submitted to IEEE Transactions on Geoscience and Remote SensingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [696] arXiv:2512.00355 [pdf, ps, other]
-
Title: SMamDiff: Spatial Mamba for Stochastic Human Motion PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [697] arXiv:2512.00345 [pdf, ps, other]
-
Title: mmPred: Radar-based Human Motion Prediction in the DarkComments: This paper is accepted by AAAI-2026Journal-ref: AAAI-2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [698] arXiv:2512.00343 [pdf, ps, other]
-
Title: Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [699] arXiv:2512.00336 [pdf, ps, other]
-
Title: MVAD: A Comprehensive Multimodal Video-Audio Dataset for AIGC DetectionComments: 7 pages, 2 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [700] arXiv:2512.00327 [pdf, ps, other]
-
Title: Odometry Without Correspondence from Inertially Constrained Ruled SurfacesComments: 14 pages, 13 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [701] arXiv:2512.00310 [pdf, ps, other]
-
Title: ART-ASyn: Anatomy-aware Realistic Texture-based Anomaly Synthesis Framework for Chest X-RaysComments: Accepted in WACV2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [702] arXiv:2512.00308 [pdf, ps, other]
-
Title: Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset DistillationComments: NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [703] arXiv:2512.00300 [pdf, ps, other]
-
Title: TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene CompletionComments: 14 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [704] arXiv:2512.00294 [pdf, ps, other]
-
Title: Words into World: A Task-Adaptive Agent for Language-Guided Spatial Retrieval in ARSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
- [705] arXiv:2512.00281 [pdf, ps, other]
-
Title: Rethinking Lung Cancer Screening: AI Nodule Detection and Diagnosis Outperforms Radiologists, Leading Models, and Standards Beyond Size and GrowthAuthors: Sylvain Bodard, Pierre Baudot, Benjamin Renoust, Charles Voyton, Gwendoline De Bie, Ezequiel Geremia, Van-Khoa Le, Danny Francis, Pierre-Henri Siot, Yousra Haddou, Vincent Bobin, Jean-Christophe Brisset, Carey C. Thomson, Valerie Bourdes, Benoit HuetComments: 25 pages, 8 figures, with supplementary information containing 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
- [706] arXiv:2512.00275 [pdf, ps, other]
-
Title: HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse AttentionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [707] arXiv:2512.00269 [pdf, ps, other]
-
Title: USB: Unified Synthetic Brain Framework for Bidirectional Pathology-Healthy Generation and EditingComments: 16 pages, 17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [708] arXiv:2512.00264 [pdf, ps, other]
-
Title: HeartFormer: Semantic-Aware Dual-Structure Transformers for 3D Four-Chamber Cardiac Point Cloud ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [709] arXiv:2512.00261 [pdf, ps, other]
-
Title: UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse AnnotationsComments: Camera-ready for WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [710] arXiv:2512.00255 [pdf, ps, other]
-
Title: Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse ViewsAuthors: Kunwar Maheep Singh, Jianchun Chen, Vladislav Golyanik, Stephan J. Garbin, Thabo Beeler, Rishabh Dabral, Marc Habermann, Christian TheobaltSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [711] arXiv:2512.00226 [pdf, ps, other]
-
Title: DenseScan: Advancing 3D Scene Understanding with 2D Dense AnnotationComments: Workshop on Space in Vision, Language, and Embodied AI at NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [712] arXiv:2512.00208 [pdf, ps, other]
-
Title: ReactionMamba: Generating Short & Long Human Reaction SequencesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [713] arXiv:2512.00198 [pdf, ps, other]
-
Title: Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and ReportingAuthors: Shantanu Ghosh, Vedant Parthesh Joshi, Rayan Syed, Aya Kassem, Abhishek Varshney, Payel Basak, Weicheng Dai, Judy Wawira Gichoya, Hari M. Trivedi, Imon Banerjee, Shyam Visweswaran, Clare B. Poynton, Kayhan BatmanghelichSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [714] arXiv:2512.00194 [pdf, ps, other]
-
Title: AutocleanEEG ICVision: Automated ICA Artifact Classification Using Vision-Language AIAuthors: Zag ElSayed, Grace Westerkamp, Gavin Gammoh, Yanchen Liu, Peyton Siekierski, Craig Erickson, Ernest PedapatiComments: 6 pages, 8 figuresJournal-ref: Conference ICMI2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
- [715] arXiv:2512.00179 [pdf, ps, other]
-
Title: Efficient Edge-Compatible CNN for Speckle-Based Material Recognition in Laser Cutting SystemsAuthors: Mohamed Abdallah Salem (North Dakota State University), Nourhan Zein Diab (New Mansoura University)Comments: Copyright 2025 IEEE. This is the author's version of the work that has been Accepted for publication in the Proceedings of the 2025 IEEE The 35th International Conference on Computer Theory and Applications (ICCTA 2025). Final published version will be available on IEEE XploreSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [716] arXiv:2512.00130 [pdf, ps, other]
-
Title: Local and Global Context-and-Object-part-Aware Superpixel-based Data Augmentation for Deep Visual RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [717] arXiv:2512.00129 [pdf, ps, other]
-
Title: Analysis of Incursive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [718] arXiv:2512.00125 [pdf, ps, other]
-
Title: Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class ImbalanceAuthors: Ruo-Syuan Mei, Sixian Jia, Guangze Li, Soo Yeon Lee, Brian Musser, William Keller, Sreten Zakula, Jorge Arinez, Chenhui ShaoComments: Submitted to the NAMRC 54Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [719] arXiv:2512.00117 [pdf, ps, other]
-
Title: TinyViT: Field Deployable Transformer Pipeline for Solar Panel Surface Fault and Severity ScreeningComments: 3 pages, 2 figures, ICGVIP 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [720] arXiv:2512.00103 [pdf, ps, other]
-
Title: Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived ImagesAuthors: Ifeanyi OkalaSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [721] arXiv:2512.00091 [pdf, ps, other]
-
Title: Deep Filament Extraction for 3D Concrete PrintingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [722] arXiv:2512.00089 [pdf, ps, other]
-
Title: TeleViT1.0: Teleconnection-aware Vision Transformers for Subseasonal to Seasonal Wildfire Pattern ForecastsAuthors: Ioannis Prapas, Nikolaos Papadopoulos, Nikolaos-Ioannis Bountos, Dimitrios Michail, Gustau Camps-Valls, Ioannis PapoutsisComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [723] arXiv:2512.00088 [pdf, ps, other]
-
Title: SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic FeaturesAuthors: Mohammad ZareSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [724] arXiv:2512.00087 [pdf, ps, other]
-
Title: Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom DataAuthors: Ivo Bueno, Ruikun Hou, Babette Bühler, Tim Fütterer, James Drimalla, Jonathan Kyle Foster, Peter Youngs, Peter Gerjets, Ulrich Trautwein, Enkelejda KasneciComments: This article has been accepted for publication in the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [725] arXiv:2512.00086 [pdf, ps, other]
-
Title: Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUsComments: 14 pages, 9 figures, 3 tables. Associated open-source release available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [726] arXiv:2512.00084 [pdf, ps, other]
-
Title: A Fast and Efficient Modern BERT based Text-Conditioned Diffusion Model for Medical Image SegmentationComments: 15 pages, 3 figures, Accepted in Slide 3 10th International Conference on Computer Vision & Image Processing (CVIP 2026)Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [727] arXiv:2512.00082 [pdf, ps, other]
-
Title: Exploring Diagnostic Prompting Approach for Multimodal LLM-based Visual Complexity Assessment: A Case Study of Amazon Search Result PagesComments: 9 pages, 4 figures, 9 tables. Study on diagnostic prompting for multimodal LLM-based visual complexity assessment of Amazon search result pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [728] arXiv:2512.00080 [pdf, ps, other]
-
Title: Conceptual Evaluation of Deep Visual Stereo Odometry for the MARWIN Radiation Monitoring Robot in Accelerator TunnelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [729] arXiv:2512.00078 [pdf, ps, other]
-
Title: Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [730] arXiv:2512.00075 [pdf, ps, other]
-
Title: Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image GenerationAuthors: Jun Jia, Hongyi Miao, Yingjie Zhou, Wangqiu Zhou, Jianbo Zhang, Linhan Cao, Dandan Zhu, Hua Yang, Xiongkuo Min, Wei Sun, Guangtao ZhaiSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [731] arXiv:2512.00073 [pdf, ps, other]
-
Title: ProvRain: Rain-Adaptive Denoising and Vehicle Detection via MobileNet-UNet and Faster R-CNNSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [732] arXiv:2512.00065 [pdf, ps, other]
-
Title: Satellite to Street: Disaster Impact EstimatorComments: 11 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [733] arXiv:2512.00061 [pdf, ps, other]
-
Title: DL-CapsNet: A Deep and Light Capsule NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [734] arXiv:2512.00060 [pdf, ps, other]
-
Title: PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [735] arXiv:2512.00042 [pdf, ps, other]
-
Title: Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam QuestionsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
- [736] arXiv:2512.00008 [pdf, ps, other]
-
Title: MOTION: ML-Assisted On-Device Low-Latency Motion RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
- [737] arXiv:2512.02020 (cross-list from cs.RO) [pdf, ps, other]
-
Title: EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AIComments: Accepted by AAAI 2026. Project Page: this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [738] arXiv:2512.01993 (cross-list from cs.RO) [pdf, ps, other]
-
Title: RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving PoliciesAuthors: Guillermo Garcia-Cobo, Maximilian Igl, Peter Karkus, Zhejun Zhang, Michael Watson, Yuxiao Chen, Boris Ivanovic, Marco PavoneComments: PreprintSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [739] arXiv:2512.01979 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference FeedbackSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [740] arXiv:2512.01946 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language ModelsComments: Code, Data, and Models available at this https URL. The paper contains 8 pages, 9 figures, 6 tablesSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [741] arXiv:2512.01913 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Disentangling Progress in Medical Image Registration: Beyond Trend-Driven Architectures towards Domain-Specific StrategiesAuthors: Bailiang Jian, Jiazhen Pan, Rohit Jena, Morteza Ghahremani, Hongwei Bran Li, Daniel Rueckert, Christian Wachinger, Benedikt WiestlerComments: Submitted to Medical Image Analysis. Journal Extension of arXiv:2407.19274Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [742] arXiv:2512.01822 (cross-list from cs.CL) [pdf, ps, other]
-
Title: InnoGym: Benchmarking the Innovation Potential of AI AgentsAuthors: Jintian Zhang, Kewei Xu, Jingsheng Zheng, Zhuoyun Yu, Yuqi Zhu, Yujie Luo, Lanning Wei, Shuofei Qiao, Lun Du, Da Zheng, Shumin Deng, Huajun Chen, Ningyu ZhangComments: Work in progressSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
- [743] arXiv:2512.01818 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual LearningAuthors: Lama Alssum, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Juan C Leon Alcazar, Bernard GhanemSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [744] arXiv:2512.01687 (cross-list from cs.NE) [pdf, ps, other]
-
Title: Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural NetworksAuthors: Huaxu HeSubjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
- [745] arXiv:2512.01550 (cross-list from cs.RO) [pdf, ps, other]
-
Title: NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation PredictionSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [746] arXiv:2512.01461 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task MergingSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [747] arXiv:2512.01329 (cross-list from cs.GR) [pdf, ps, other]
-
Title: TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and TrackingSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [748] arXiv:2512.01324 (cross-list from hep-ex) [pdf, ps, other]
-
Title: Panda: Self-distillation of Reusable Sensor-level Representations for High Energy PhysicsComments: 23 pages, 15 figures, preprint. Project page at this https URLSubjects: High Energy Physics - Experiment (hep-ex); Computer Vision and Pattern Recognition (cs.CV)
- [749] arXiv:2512.01252 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Efficient Training of Diffusion Mixture-of-Experts Models: A Practical RecipeAuthors: Yahui Liu, Yang Yue, Jingyuan Zhang, Chenxi Sun, Yang Zhou, Wencong Zeng, Ruiming Tang, Guorui ZhouComments: 9 pages, 7 figuresSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [750] arXiv:2512.01181 (cross-list from cs.LG) [pdf, ps, other]
-
Title: First On-Orbit Demonstration of a Geospatial Foundation ModelAuthors: Andrew Du, Roberto Del Prete, Alejandro Mousist, Nick Manser, Fabrice Marre, Andrew Barton, Carl Seubert, Gabriele Meoni, Tat-Jun ChinSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [751] arXiv:2512.01152 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient SolutionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [752] arXiv:2512.01104 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Estimation of Kinematic Motion from Dashcam FootageComments: 8 pages, 10 figuresSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [753] arXiv:2512.01061 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy TransferAuthors: Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, Shankar Sastry, Linxi "Jim" Fan, Yuke ZhuComments: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [754] arXiv:2512.01009 (cross-list from cs.RO) [pdf, ps, other]
-
Title: FOM-Nav: Frontier-Object Maps for Object Goal NavigationComments: Project page: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [755] arXiv:2512.00883 (cross-list from cs.MM) [pdf, ps, other]
-
Title: Audio-Visual World Models: Towards Multisensory Imagination in Sight and SoundSubjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
- [756] arXiv:2512.00818 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal ReasoningAuthors: Haozhen Gong, Xiaozhong Ji, Yuansen Liu, Wenbin Wu, Xiaoxiao Yan, Jingjing Liu, Kai Wu, Jiazhen Pan, Bailiang Jian, Jiangning Zhang, Xiaobin Hu, Hongwei Bran LiSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [757] arXiv:2512.00777 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Sign Language Recognition using Bidirectional Reservoir ComputingSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [758] arXiv:2512.00736 (cross-list from cs.LG) [pdf, ps, other]
-
Title: REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame TrajectoriesJournal-ref: Proceedings of the Conference on Language Modeling (COLM 2025)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [759] arXiv:2512.00659 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Fast, Robust, Permutation-and-Sign Invariant SO(3) Pattern AlignmentSubjects: Robotics (cs.RO); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
- [760] arXiv:2512.00403 (cross-list from cs.LG) [pdf, ps, other]
-
Title: SelfAI: Building a Self-Training AI System with LLM AgentsAuthors: Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Xiaobing Yu, Yu Zhong, Shangqi Deng, Ufaq Khan, Jianghao Wu, Xiaofeng Liu, Imran Razzak, Xiaojun Chang, Yutong XieSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [761] arXiv:2512.00396 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor PlacementAuthors: Andrea Procopio, Marco Esposito, Sara Raggiunto, Andrey Gizdov, Alberto Belli, Paola PierleoniSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [762] arXiv:2512.00350 (cross-list from eess.IV) [pdf, ps, other]
-
Title: MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image SegmentationSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [763] arXiv:2512.00324 (cross-list from cs.RO) [pdf, ps, other]
-
Title: MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous ManipulationAuthors: Jinda Du, Jieji Ren, Qiaojun Yu, Ningbin Zhang, Yu Deng, Xingyu Wei, Yufei Liu, Guoying Gu, Xiangyang ZhuSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [764] arXiv:2512.00287 (cross-list from cs.RO) [pdf, ps, other]
-
Title: RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real ManualsAuthors: Yuzheng Gao, Yuxing Long, Lei Kang, Yuchong Guo, Ziyan Yu, Shangqing Mao, Jiyao Zhang, Ruihai Wu, Dongjiang Li, Hui Shen, Hao DongSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [765] arXiv:2512.00229 (cross-list from cs.LG) [pdf, ps, other]
-
Title: TIE: A Training-Inversion-Exclusion Framework for Visually Interpretable and Uncertainty-Guided Out-of-Distribution DetectionSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
- [766] arXiv:2512.00138 (cross-list from cs.AR) [pdf, ps, other]
-
Title: Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVSAuthors: Yuyang Li, Swasthik Muloor, Jack Laudati, Nickolas Dematteis, Yidam Park, Hana Kim, Nathan Chang, Inhee LeeComments: 6 pages, 12 figures, and 2 tablesSubjects: Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [767] arXiv:2512.00120 (cross-list from cs.SD) [pdf, ps, other]
-
Title: Art2Music: Generating Music for Art Images with Multi-modal Feeling AlignmentSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
- [768] arXiv:2512.00115 (cross-list from cs.SD) [pdf, ps, other]
-
Title: MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual LearningComments: 10 pages, 5 figuresSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [769] arXiv:2512.00094 (cross-list from cs.CR) [pdf, ps, other]
-
Title: HMARK: Radioactive Multi-Bit Semantic-Latent Watermarking for Diffusion ModelsSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [770] arXiv:2512.00076 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong LearningAuthors: Minghe Gao, Juncheng Li, Yuze Lin, Xuqi Liu, Jiaming Ji, Xiaoran Pan, Zihan Xu, Xian Li, Mingjie Li, Wei Ji, Rong Wei, Rui Tang, Qizhou Wang, Kai Shen, Jun Xiao, Qi Wu, Siliang Tang, Yueting ZhuangSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [771] arXiv:2512.00074 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot LearningAuthors: Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing XuSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [772] arXiv:2512.00052 (cross-list from physics.geo-ph) [pdf, ps, other]
-
Title: Coarse-to-Fine Non-Rigid Registration for Side-Scan Sonar MosaickingSubjects: Geophysics (physics.geo-ph); Computer Vision and Pattern Recognition (cs.CV)
- [773] arXiv:2512.00041 (cross-list from cs.RO) [pdf, ps, other]
-
Title: VISTAv2: World Imagination for Indoor Vision-and-Language NavigationComments: 11 pages, 5 figuresSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [774] arXiv:2512.00037 (cross-list from cs.RO) [pdf, ps, other]
-
Title: ICD-Net: Inertial Covariance Displacement Network for Drone Visual-Inertial SLAMSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [775] arXiv:2512.00027 (cross-list from cs.RO) [pdf, ps, other]
-
Title: A Survey on Improving Human Robot Collaboration through Vision-and-Language NavigationSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [776] arXiv:2512.00024 (cross-list from cs.RO) [pdf, ps, other]
- [777] arXiv:2512.00021 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open ChallengesComments: Under reviewSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [778] arXiv:2512.00019 (cross-list from cs.RO) [pdf, ps, other]
-
Title: A Comprehensive Survey on Surgical Digital TwinAuthors: Afsah Sharaf Khan, Falong Fan, Doohwan DH Kim, Abdurrahman Alshareef, Dong Chen, Justin Kim, Ernest Carter, Bo Liu, Jerzy W. Rozenblit, Bernard ZeiglerSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)