Computer Vision and Pattern Recognition

Authors and titles for recent submissions


Mon, 8 Dec 2025

[1]  arXiv:2512.05965 [pdf, ps, other]
Title: EditThinker: Unlocking Iterative Reasoning for Any Image Editor
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[2]  arXiv:2512.05960 [pdf, ps, other]
Title: AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[3]  arXiv:2512.05941 [pdf, ps, other]
Title: Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding
Comments: Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[4]  arXiv:2512.05937 [pdf, ps, other]
Title: Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception
Comments: 8 pages, 2 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[5]  arXiv:2512.05936 [pdf, ps, other]
Title: Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition
Comments: 8 pages, 8 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[6]  arXiv:2512.05928 [pdf, ps, other]
Title: A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition
Comments: 18 pages, 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[7]  arXiv:2512.05927 [pdf, ps, other]
Title: World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[8]  arXiv:2512.05922 [pdf, ps, other]
Title: LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation
Comments: Note: Khang Le and Anh Mai Vu contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[9]  arXiv:2512.05920 [pdf, ps, other]
Title: NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[10]  arXiv:2512.05905 [pdf, ps, other]
Title: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[11]  arXiv:2512.05866 [pdf, ps, other]
Title: Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN Discriminator
Comments: This paper has been accepted for presentation at the IEEE 28th International Conference on Computer and Information Technology (ICCIT), December 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[12]  arXiv:2512.05859 [pdf, ps, other]
Title: Edit-aware RAW Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[13]  arXiv:2512.05853 [pdf, ps, other]
Title: VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[14]  arXiv:2512.05830 [pdf, ps, other]
Title: Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning
Comments: 22 pages, 11 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[15]  arXiv:2512.05814 [pdf, ps, other]
Title: UG-FedDA: Uncertainty-Guided Federated Domain Adaptation for Multi-Center Alzheimer's Disease Detection
Comments: The code is already available on GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[16]  arXiv:2512.05809 [pdf, ps, other]
Title: Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling
Comments: Extended abstract at World Modeling Workshop 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[17]  arXiv:2512.05802 [pdf, ps, other]
Title: Bring Your Dreams to Life: Continual Text-to-Video Customization
Comments: Accepted to AAAI2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[18]  arXiv:2512.05783 [pdf, ps, other]
Title: Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse Depth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[19]  arXiv:2512.05774 [pdf, ps, other]
Title: Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
Comments: Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[20]  arXiv:2512.05762 [pdf, ps, other]
Title: FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural Operators
Comments: Accepted for WACV
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[21]  arXiv:2512.05759 [pdf, ps, other]
Title: Label-Efficient Point Cloud Segmentation with Active Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[22]  arXiv:2512.05754 [pdf, ps, other]
Title: USV: Unified Sparsification for Accelerating Video Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[23]  arXiv:2512.05746 [pdf, ps, other]
Title: HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[24]  arXiv:2512.05740 [pdf, ps, other]
Title: Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[25]  arXiv:2512.05710 [pdf, ps, other]
Title: Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[26]  arXiv:2512.05698 [pdf, ps, other]
Title: OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning
Comments: The 40th Annual AAAI Conference on Artificial Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[27]  arXiv:2512.05683 [pdf, ps, other]
Title: Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration Correction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
[28]  arXiv:2512.05674 [pdf, ps, other]
Title: Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume Maximization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[29]  arXiv:2512.05672 [pdf, ps, other]
Title: InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[30]  arXiv:2512.05669 [pdf, ps, other]
Title: Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[31]  arXiv:2512.05663 [pdf, ps, other]
Title: LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[32]  arXiv:2512.05651 [pdf, ps, other]
Title: Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[33]  arXiv:2512.05635 [pdf, ps, other]
Title: Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[34]  arXiv:2512.05613 [pdf, ps, other]
Title: DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[35]  arXiv:2512.05610 [pdf, ps, other]
Title: NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projections
Comments: 19 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[36]  arXiv:2512.05597 [pdf, ps, other]
Title: Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction
Comments: 10 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[37]  arXiv:2512.05593 [pdf, ps, other]
Title: Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer
Comments: Accepted to 3DV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[38]  arXiv:2512.05571 [pdf, ps, other]
Title: MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[39]  arXiv:2512.05564 [pdf, ps, other]
Title: ProPhy: Progressive Physical Alignment for Dynamic World Simulation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[40]  arXiv:2512.05557 [pdf, ps, other]
Title: 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[41]  arXiv:2512.05546 [pdf, ps, other]
Title: Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models
Comments: 6 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[42]  arXiv:2512.05539 [pdf, ps, other]
Title: Ideal Observer for Segmentation of Dead Leaves Images
Comments: 41 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Methodology (stat.ME)
[43]  arXiv:2512.05529 [pdf, ps, other]
Title: See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors
Comments: The first two authors contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[44]  arXiv:2512.05524 [pdf, ps, other]
Title: VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[45]  arXiv:2512.05515 [pdf, ps, other]
Title: DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis
Comments: Accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[46]  arXiv:2512.05513 [pdf, ps, other]
Title: Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[47]  arXiv:2512.05511 [pdf, ps, other]
Title: Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[48]  arXiv:2512.05494 [pdf, ps, other]
Title: Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation
Comments: Accepted to AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[49]  arXiv:2512.05492 [pdf, ps, other]
Title: WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[50]  arXiv:2512.05482 [pdf, ps, other]
Title: Concept-based Explainable Data Mining with VLM for 3D Detection
Authors: Mai Tsujimoto
Comments: 28 pages including appendix. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[51]  arXiv:2512.05481 [pdf, ps, other]
Title: UniFS: Unified Multi-Contrast MRI Reconstruction via Frequency-Spatial Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[52]  arXiv:2512.05478 [pdf, ps, other]
Title: EmoStyle: Emotion-Driven Image Stylization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[53]  arXiv:2512.05468 [pdf, ps, other]
Title: University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor system
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[54]  arXiv:2512.05446 [pdf, ps, other]
Title: TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[55]  arXiv:2512.05422 [pdf, ps, other]
Title: ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[56]  arXiv:2512.05418 [pdf, ps, other]
Title: Performance Evaluation of Deep Learning for Tree Branch Segmentation in Autonomous Forestry Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[57]  arXiv:2512.05415 [pdf, ps, other]
Title: Moving object detection from multi-depth images with an attention-enhanced CNN
Comments: 14 pages, 22 figures, submitted to PASJ
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[58]  arXiv:2512.05412 [pdf, ps, other]
Title: YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[59]  arXiv:2512.05410 [pdf, ps, other]
Title: Genetic Algorithms For Parameter Optimization for Disparity Map Generation of Radiata Pine Branch Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[60]  arXiv:2512.05398 [pdf, ps, other]
Title: The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos
Comments: Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[61]  arXiv:2512.05394 [pdf, ps, other]
Title: Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[62]  arXiv:2512.05391 [pdf, ps, other]
Title: LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models
Comments: 20 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[63]  arXiv:2512.05385 [pdf, ps, other]
Title: ShaRP: SHAllow-LayeR Pruning for Video Large Language Models Acceleration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[64]  arXiv:2512.05362 [pdf, ps, other]
Title: PoolNet: Deep Learning for 2D to 3D Video Process Validation
Comments: All code related to this paper can be found at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[65]  arXiv:2512.05359 [pdf, ps, other]
Title: Group Orthogonal Low-Rank Adaptation for RGB-T Tracking
Comments: 13 pages, 8 figures. Accepted by AAAI 2026. Extended version
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[66]  arXiv:2512.05354 [pdf, ps, other]
Title: SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[67]  arXiv:2512.05343 [pdf, ps, other]
Title: SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[68]  arXiv:2512.05277 [pdf, ps, other]
Title: From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[69]  arXiv:2512.05272 [pdf, ps, other]
Title: Inferring Compositional 4D Scenes without Ever Seeing One
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[70]  arXiv:2512.05268 [pdf, ps, other]
Title: CARD: Correlation Aware Restoration with Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[71]  arXiv:2512.05259 [pdf, ps, other]
Title: Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data Anonymization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[72]  arXiv:2512.05240 [pdf, ps, other]
Title: IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[73]  arXiv:2512.05209 [pdf, ps, other]
Title: DEAR: Dataset for Evaluating the Aesthetics of Rendering
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[74]  arXiv:2512.05198 [pdf, ps, other]
Title: Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models
Comments: 16 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[75]  arXiv:2512.05172 [pdf, ps, other]
Title: Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[76]  arXiv:2512.05152 [pdf, ps, other]
Title: EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models
Comments: 6 pages, 5 figures, published at the 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France, 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[77]  arXiv:2512.05150 [pdf, ps, other]
Title: TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Comments: arxiv v0
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[78]  arXiv:2512.05145 [pdf, ps, other]
Title: Self-Improving VLM Judges Without Human Annotations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[79]  arXiv:2512.05140 [pdf, other]
Title: FlowEO: Generative Unsupervised Domain Adaptation for Earth Observation
Authors: Georges Le Bellier (CEDRIC - VERTIGO, Cnam), Nicolas Audebert (LaSTIG, IGN, CEDRIC - VERTIGO)
Comments: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Mar 2026, Tucson (AZ), United States
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[80]  arXiv:2512.05139 [pdf, ps, other]
Title: Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[81]  arXiv:2512.05137 [pdf, ps, other]
Title: ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[82]  arXiv:2512.05136 [pdf, ps, other]
Title: Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[83]  arXiv:2512.05134 [pdf, ps, other]
Title: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
Authors: Zihao Wu
Comments: 8 pages main, 8 pages appendix, 16 figures, 5 tables. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[84]  arXiv:2512.05132 [pdf, ps, other]
Title: Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[85]  arXiv:2512.05131 [pdf, ps, other]
Title: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[86]  arXiv:2512.05959 (cross-list from cs.CL) [pdf, ps, other]
Title: M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
Comments: Preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[87]  arXiv:2512.05955 (cross-list from cs.RO) [pdf, ps, other]
Title: SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[88]  arXiv:2512.05932 (cross-list from cs.RO) [pdf, ps, other]
Title: Physically-Based Simulation of Automotive LiDAR
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[89]  arXiv:2512.05824 (cross-list from cs.AI) [pdf, ps, other]
Title: Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade Glioma
Authors: Hafsa Akebli (1), Adam Shephard (2), Vincenzo Della Mea (1), Nasir Rajpoot (2 and 3) ((1) University of Udine, Udine, Italy, (2) University of Warwick, Coventry, UK, (3) Histofy Ltd, Coventry, UK)
Comments: 4 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[90]  arXiv:2512.05812 (cross-list from cs.RO) [pdf, ps, other]
Title: Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[91]  arXiv:2512.05665 (cross-list from cs.CL) [pdf, ps, other]
Title: Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
Comments: 11 pages, 6 figures. Code available at this https URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[92]  arXiv:2512.05438 (cross-list from cs.HC) [pdf, ps, other]
Title: EXR: An Interactive Immersive EHR Visualization in Extended Reality
Comments: 11 pages, 6 figures. Preprint version. This paper has been accepted to IEEE ICIR 2025. This is the author-prepared version and not the final published version. The final version will appear in IEEE Xplo
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[93]  arXiv:2512.05299 (cross-list from eess.SY) [pdf, ps, other]
Title: ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety
Comments: 8 pages, 3 figures, 1 table
Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Robotics (cs.RO); Image and Video Processing (eess.IV)
[94]  arXiv:2512.05126 (cross-list from eess.AS) [pdf, ps, other]
Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Fri, 5 Dec 2025

[95]  arXiv:2512.05115 [pdf, ps, other]
Title: Light-X: Generative 4D Video Rendering with Camera and Illumination Control
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[96]  arXiv:2512.05113 [pdf, ps, other]
Title: Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting
Comments: WACV 2025. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[97]  arXiv:2512.05112 [pdf, ps, other]
Title: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[98]  arXiv:2512.05111 [pdf, ps, other]
Title: ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[99]  arXiv:2512.05110 [pdf, ps, other]
Title: ShadowDraw: From Any Object to Shadow-Drawing Compositional Art
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[100]  arXiv:2512.05106 [pdf, ps, other]
Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
[101]  arXiv:2512.05104 [pdf, ps, other]
Title: EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[102]  arXiv:2512.05098 [pdf, ps, other]
Title: SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards
Authors: Yuan Gao, Jin Song
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[103]  arXiv:2512.05091 [pdf, ps, other]
Title: Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Comments: Technical Report; Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[104]  arXiv:2512.05081 [pdf, ps, other]
Title: Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[105]  arXiv:2512.05079 [pdf, ps, other]
Title: Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[106]  arXiv:2512.05076 [pdf, ps, other]
Title: BulletTime: Decoupled Control of Time and Camera Pose for Video Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[107]  arXiv:2512.05060 [pdf, ps, other]
Title: 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
Comments: Code: this https URL, Webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[108]  arXiv:2512.05044 [pdf, ps, other]
Title: Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Comments: 18 Pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[109]  arXiv:2512.05039 [pdf, ps, other]
Title: Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
Comments: Submitted for review CVPR-2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[110]  arXiv:2512.05025 [pdf, ps, other]
Title: RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[111]  arXiv:2512.05021 [pdf, ps, other]
Title: HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[112]  arXiv:2512.05016 [pdf, ps, other]
Title: Generative Neural Video Compression via Video Diffusion Prior
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[113]  arXiv:2512.05006 [pdf, ps, other]
Title: Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects
Comments: conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[114]  arXiv:2512.05000 [pdf, ps, other]
Title: Reflection Removal through Efficient Adaptation of Diffusion Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[115]  arXiv:2512.04996 [pdf, ps, other]
Title: A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[116]  arXiv:2512.04981 [pdf, ps, other]
Title: Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[117]  arXiv:2512.04970 [pdf, ps, other]
Title: Stable Single-Pixel Contrastive Learning for Semantic and Geometric Tasks
Comments: UniReps Workshop 2025, 12 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[118]  arXiv:2512.04969 [pdf, ps, other]
Title: Rethinking the Use of Vision Transformers for AI-Generated Image Detection
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[119]  arXiv:2512.04967 [pdf, ps, other]
Title: Balanced Few-Shot Episodic Learning for Accurate Retinal Disease Diagnosis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[120]  arXiv:2512.04963 [pdf, ps, other]
Title: GeoPE: A Unified Geometric Positional Embedding for Structured Tensors
Authors: Yupu Yao, Bowen Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[121]  arXiv:2512.04952 [pdf, ps, other]
Title: FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[122]  arXiv:2512.04943 [pdf, ps, other]
Title: Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[123]  arXiv:2512.04939 [pdf, ps, other]
Title: LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[124]  arXiv:2512.04927 [pdf, ps, other]
Title: Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral Fitting
Authors: Paul Henderson
Comments: Accepted at WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[125]  arXiv:2512.04926 [pdf, ps, other]
Title: Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[126]  arXiv:2512.04904 [pdf, ps, other]
Title: ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[127]  arXiv:2512.04890 [pdf, ps, other]
Title: Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[128]  arXiv:2512.04888 [pdf, ps, other]
Title: You Only Train Once (YOTO): A Retraining-Free Object Detection Framework
Comments: This manuscript was first submitted to Engineering (an Elsevier journal). The preprint version was posted to arXiv afterwards to facilitate open access and community feedback
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[129]  arXiv:2512.04883 [pdf, ps, other]
Title: SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded Platforms
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[130]  arXiv:2512.04875 [pdf, ps, other]
Title: SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[131]  arXiv:2512.04862 [pdf, ps, other]
Title: Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Comments: * Equal contribution. Minor figure corrections compared to the ICCV 2025 version
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[132]  arXiv:2512.04857 [pdf, ps, other]
Title: Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[133]  arXiv:2512.04837 [pdf, ps, other]
Title: A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[134]  arXiv:2512.04832 [pdf, ps, other]
Title: Tokenizing Buildings: A Transformer for Layout Synthesis
Comments: 8 pages, 1 page References, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[135]  arXiv:2512.04830 [pdf, ps, other]
Title: FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis
Comments: Novel View Synthesis, Driving Scene, Free Trajectory, Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[136]  arXiv:2512.04821 [pdf, ps, other]
Title: LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[137]  arXiv:2512.04815 [pdf, ps, other]
Title: RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS
Comments: arXiv admin note: substantial text overlap with arXiv:2506.02751
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[138]  arXiv:2512.04810 [pdf, ps, other]
Title: EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[139]  arXiv:2512.04786 [pdf, ps, other]
Title: LaFiTe: A Generative Latent Field for 3D Native Texturing
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[140]  arXiv:2512.04784 [pdf, ps, other]
Title: PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[141]  arXiv:2512.04761 [pdf, ps, other]
Title: Order Matters: 3D Shape Generation from Sequential VR Sketches
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[142]  arXiv:2512.04734 [pdf, ps, other]
Title: MT-Depth: Multi-task Instance feature analysis for the Depth Completion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[143]  arXiv:2512.04733 [pdf, ps, other]
Title: E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[144]  arXiv:2512.04728 [pdf, ps, other]
Title: Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[145]  arXiv:2512.04699 [pdf, ps, other]
Title: OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution
Comments: Accepted as TCSVT, 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[146]  arXiv:2512.04686 [pdf, ps, other]
Title: Towards Cross-View Point Correspondence in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[147]  arXiv:2512.04678 [pdf, ps, other]
Title: Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[148]  arXiv:2512.04677 [pdf, ps, other]
Title: Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[149]  arXiv:2512.04660 [pdf, ps, other]
Title: I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[150]  arXiv:2512.04643 [pdf, ps, other]
Title: SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[151]  arXiv:2512.04619 [pdf, ps, other]
Title: Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[152]  arXiv:2512.04599 [pdf, ps, other]
Title: Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[153]  arXiv:2512.04597 [pdf, ps, other]
Title: When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[154]  arXiv:2512.04585 [pdf, ps, other]
Title: SAM3-I: Segment Anything with Instructions
Comments: Preliminary results; work in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[155]  arXiv:2512.04581 [pdf, ps, other]
Title: Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge Distillation
Comments: Accepted by IEEE TMM
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[156]  arXiv:2512.04576 [pdf, ps, other]
Title: TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[157]  arXiv:2512.04568 [pdf, ps, other]
Title: Prompt2Craft: Generating Functional Craft Assemblies with LLMs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[158]  arXiv:2512.04564 [pdf, ps, other]
Title: Dataset creation for supervised deep learning-based analysis of microscopic images -- review of important considerations and recommendations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[159]  arXiv:2512.04563 [pdf, ps, other]
Title: COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[160]  arXiv:2512.04554 [pdf, ps, other]
Title: Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[161]  arXiv:2512.04542 [pdf, ps, other]
Title: Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian Optimization
Comments: 28 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[162]  arXiv:2512.04540 [pdf, ps, other]
Title: VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[163]  arXiv:2512.04537 [pdf, ps, other]
Title: X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[164]  arXiv:2512.04536 [pdf, ps, other]
Title: Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[165]  arXiv:2512.04534 [pdf, ps, other]
Title: Refaçade: Editing Object with Given Reference Texture
Authors: Youze Huang (1), Penghui Ruan (2), Bojia Zi (3), Xianbiao Qi (4), Jianan Wang (5), Rong Xiao (4) ((1) University of Electronic Science and Technology of China, (2) The Hong Kong Polytechnic University, (3) The Chinese University of Hong Kong, (4) IntelliFusion Inc., (5) Astribot Inc.)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[166]  arXiv:2512.04532 [pdf, ps, other]
Title: PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[167]  arXiv:2512.04528 [pdf, ps, other]
Title: Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty Quantification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[168]  arXiv:2512.04522 [pdf, ps, other]
Title: Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification
Comments: 14 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[169]  arXiv:2512.04521 [pdf, ps, other]
Title: WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
[170]  arXiv:2512.04520 [pdf, ps, other]
Title: Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[171]  arXiv:2512.04519 [pdf, ps, other]
Title: VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[172]  arXiv:2512.04515 [pdf, ps, other]
Title: EgoLCD: Egocentric Video Generation with Long Context Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[173]  arXiv:2512.04511 [pdf, ps, other]
Title: DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[174]  arXiv:2512.04504 [pdf, ps, other]
Title: UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[175]  arXiv:2512.04499 [pdf, ps, other]
Title: Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[176]  arXiv:2512.04496 [pdf, ps, other]
Title: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[177]  arXiv:2512.04487 [pdf, ps, other]
Title: Controllable Long-term Motion Generation with Extended Joint Targets
Comments: WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[178]  arXiv:2512.04485 [pdf, ps, other]
Title: Not All Birds Look The Same: Identity-Preserving Generation For Birds
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[179]  arXiv:2512.04483 [pdf, ps, other]
Title: DeRA: Decoupled Representation Alignment for Video Tokenization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[180]  arXiv:2512.04461 [pdf, ps, other]
Title: UniTS: Unified Time Series Generative Model for Remote Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[181]  arXiv:2512.04459 [pdf, ps, other]
Title: dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[182]  arXiv:2512.04456 [pdf, ps, other]
Title: GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis
Comments: AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[183]  arXiv:2512.04451 [pdf, ps, other]
Title: StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[184]  arXiv:2512.04441 [pdf, ps, other]
Title: MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[185]  arXiv:2512.04426 [pdf, ps, other]
Title: Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[186]  arXiv:2512.04425 [pdf, ps, other]
Title: Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[187]  arXiv:2512.04421 [pdf, ps, other]
Title: UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D Scenes
Comments: 13 pages, 10 figures, submitted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[188]  arXiv:2512.04413 [pdf, ps, other]
Title: Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection
Comments: 12 pages, 8 figures, 11 tables
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-11
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[189]  arXiv:2512.04397 [pdf, ps, other]
Title: Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection
Journal-ref: 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 2025, pp. 1-5
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[190]  arXiv:2512.04395 [pdf, ps, other]
Title: Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[191]  arXiv:2512.04390 [pdf, ps, other]
Title: FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring
Comments: 20 pages, 15 figures. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[192]  arXiv:2512.04358 [pdf, ps, other]
Title: MAFNet:Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[193]  arXiv:2512.04356 [pdf, ps, other]
Title: Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment
Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[194]  arXiv:2512.04331 [pdf, ps, other]
Title: Open Set Face Forgery Detection via Dual-Level Evidence Collection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[195]  arXiv:2512.04329 [pdf, ps, other]
Title: A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
[196]  arXiv:2512.04323 [pdf, ps, other]
Title: Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural Networks
Comments: 17 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[197]  arXiv:2512.04315 [pdf, ps, other]
Title: SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[198]  arXiv:2512.04314 [pdf, ps, other]
Title: DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[199]  arXiv:2512.04313 [pdf, ps, other]
Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
Comments: 16 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[200]  arXiv:2512.04311 [pdf, ps, other]
Title: Real-time Cricket Sorting By Sex
Comments: 13 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[201]  arXiv:2512.04309 [pdf, ps, other]
Title: Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
Comments: Submitted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[202]  arXiv:2512.04305 [pdf, ps, other]
Title: How (Mis)calibrated is Your Federated CLIP and What To Do About It?
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[203]  arXiv:2512.04303 [pdf, ps, other]
Title: Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications
Comments: Accepted in 3DV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[204]  arXiv:2512.04284 [pdf, ps, other]
Title: Learning Single-Image Super-Resolution in the JPEG Compressed Domain
Comments: 7 pages, 4 figures, 2 tables, SEEDS Workshop, ICIP 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[205]  arXiv:2512.04283 [pdf, ps, other]
Title: Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[206]  arXiv:2512.04282 [pdf, ps, other]
Title: Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[207]  arXiv:2512.04267 [pdf, ps, other]
Title: UniLight: A Unified Representation for Lighting
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[208]  arXiv:2512.04248 [pdf, ps, other]
Title: MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[209]  arXiv:2512.04238 [pdf, ps, other]
Title: 6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[210]  arXiv:2512.04222 [pdf, ps, other]
Title: ReasonX: MLLM-Guided Intrinsic Image Decomposition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[211]  arXiv:2512.04221 [pdf, ps, other]
Title: MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[212]  arXiv:2512.04219 [pdf, ps, other]
Title: Generalized Event Partonomy Inference with Structured Hierarchical Predictive Learning
Comments: 16 pages, 7 figures, 3 tables. Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[213]  arXiv:2512.04187 [pdf, ps, other]
Title: OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[214]  arXiv:2512.04175 [pdf, ps, other]
Title: Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[215]  arXiv:2512.05117 (cross-list from cs.LG) [pdf, ps, other]
Title: The Universal Weight Subspace Hypothesis
Comments: 37 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[216]  arXiv:2512.05116 (cross-list from cs.LG) [pdf, ps, other]
Title: Value Gradient Guidance for Flow Matching Alignment
Comments: Accepted at NeurIPS 2025; 26 pages, 20 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[217]  arXiv:2512.05114 (cross-list from cs.LG) [pdf, ps, other]
Title: Deep infant brain segmentation from multi-contrast MRI
Comments: 8 pages, 8 figures, 1 table, website at this https URL, presented at the 2025 IEEE Asilomar Conference on Signals, Systems, and Computers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[218]  arXiv:2512.05103 (cross-list from cs.LG) [pdf, ps, other]
Title: TV2TV: A Unified Framework for Interleaved Language and Video Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[219]  arXiv:2512.05094 (cross-list from cs.RO) [pdf, ps, other]
Title: From Generated Human Videos to Physically Plausible Robot Trajectories
Comments: For project website, see this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[220]  arXiv:2512.04814 (cross-list from cs.SD) [pdf, ps, other]
Title: Shared Multi-modal Embedding Space for Face-Voice Association
Comments: Ranked 1st in Fame 2026 Challenge, ICASSP
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[221]  arXiv:2512.04763 (cross-list from cs.LG) [pdf, ps, other]
Title: MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[222]  arXiv:2512.04705 (cross-list from cs.CC) [pdf, ps, other]
Title: Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators
Comments: Submitted to IEEE Transactions on Emerging Topics in Computing
Subjects: Computational Complexity (cs.CC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
[223]  arXiv:2512.04625 (cross-list from cs.LG) [pdf, ps, other]
Title: Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective
Comments: Accepted to IEEE TNNLS
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[224]  arXiv:2512.04556 (cross-list from cs.GR) [pdf, ps, other]
Title: Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
Comments: 10 pages, 7 figures
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[225]  arXiv:2512.04464 (cross-list from cs.LG) [pdf, ps, other]
Title: Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[226]  arXiv:2512.04385 (cross-list from cs.LG) [pdf, ps, other]
Title: STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[227]  arXiv:2512.04264 (cross-list from cs.LG) [pdf, ps, other]
Title: Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[228]  arXiv:2512.04092 (cross-list from physics.soc-ph) [pdf, ps, other]
Title: The changing surface of the world's roads
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[229]  arXiv:2512.04087 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o
Authors: Sui He, Shenbin Qian
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Thu, 4 Dec 2025

[230]  arXiv:2512.04085 [pdf, ps, other]
Title: Unique Lives, Shared World: Learning from Single-Life Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[231]  arXiv:2512.04084 [pdf, ps, other]
Title: SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[232]  arXiv:2512.04082 [pdf, ps, other]
Title: PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[233]  arXiv:2512.04069 [pdf, ps, other]
Title: SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[234]  arXiv:2512.04048 [pdf, ps, other]
Title: Stable Signer: Hierarchical Sign Language Generative Model
Comments: 12 pages, 7 figures. More Demo at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Computers and Society (cs.CY)
[235]  arXiv:2512.04040 [pdf, ps, other]
Title: RELIC: Interactive Video World Model with Long-Horizon Memory
Comments: 22 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[236]  arXiv:2512.04039 [pdf, ps, other]
Title: Fast & Efficient Normalizing Flows and Applications of Image Generative Models
Authors: Sandeep Nagar
Comments: PhD Thesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[237]  arXiv:2512.04025 [pdf, ps, other]
Title: PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[238]  arXiv:2512.04021 [pdf, ps, other]
Title: C3G: Learning Compact 3D Representations with 2K Gaussians
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[239]  arXiv:2512.04019 [pdf, ps, other]
Title: Ultra-lightweight Neural Video Representation Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[240]  arXiv:2512.04015 [pdf, ps, other]
Title: Learning Group Actions In Disentangled Latent Image Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[241]  arXiv:2512.04012 [pdf, ps, other]
Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[242]  arXiv:2512.04007 [pdf, ps, other]
Title: On the Temporality for Sketch Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[243]  arXiv:2512.04000 [pdf, ps, other]
Title: Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[244]  arXiv:2512.03996 [pdf, ps, other]
Title: Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[245]  arXiv:2512.03992 [pdf, ps, other]
Title: DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[246]  arXiv:2512.03981 [pdf, ps, other]
Title: DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[247]  arXiv:2512.03979 [pdf, ps, other]
Title: BlurDM: A Blur Diffusion Model for Image Deblurring
Comments: NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[248]  arXiv:2512.03964 [pdf, ps, other]
Title: Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization
Comments: 17 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[249]  arXiv:2512.03963 [pdf, ps, other]
Title: TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[250]  arXiv:2512.03939 [pdf, ps, other]
Title: MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[251]  arXiv:2512.03932 [pdf, ps, other]
Title: Beyond the Ground Truth: Enhanced Supervision for Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[252]  arXiv:2512.03918 [pdf, ps, other]
Title: UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[253]  arXiv:2512.03905 [pdf, ps, other]
Title: Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence
Comments: Code: this https URL, Project: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[254]  arXiv:2512.03883 [pdf, ps, other]
Title: Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy
Comments: 6 pages, 5 figures, 1 table, submitted to ISBI conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[255]  arXiv:2512.03869 [pdf, ps, other]
Title: An Automated Framework for Large-Scale Graph-Based Cerebrovascular Analysis
Comments: Submitted to ISBI 2026. 6 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[256]  arXiv:2512.03862 [pdf, ps, other]
Title: Diminishing Returns in Self-Supervised Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[257]  arXiv:2512.03854 [pdf, ps, other]
Title: Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern population
Comments: 13 pages, 2 figures and 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[258]  arXiv:2512.03852 [pdf, ps, other]
Title: Traffic Image Restoration under Adverse Weather via Frequency-Aware Mamba
Comments: 12 pages, 13 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[259]  arXiv:2512.03848 [pdf, ps, other]
Title: PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[260]  arXiv:2512.03844 [pdf, ps, other]
Title: CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation
Comments: 34 pages, 24 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[261]  arXiv:2512.03837 [pdf, ps, other]
Title: Heatmap Pooling Network for Action Recognition from RGB Videos
Comments: Final Version of IEEE Transactions on Pattern Analysis and Machine Intelligence
Journal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[262]  arXiv:2512.03834 [pdf, ps, other]
Title: Lean Unet: A Compact Model for Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[263]  arXiv:2512.03827 [pdf, ps, other]
Title: A Robust Camera-based Method for Breath Rate Measurement
Comments: 9 pages, 4 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[264]  arXiv:2512.03817 [pdf, ps, other]
Title: HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[265]  arXiv:2512.03796 [pdf, ps, other]
Title: LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[266]  arXiv:2512.03794 [pdf, ps, other]
Title: AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
Comments: 15 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[267]  arXiv:2512.03751 [pdf, ps, other]
Title: Research on Brain Tumor Classification Method Based on Improved ResNet34 Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[268]  arXiv:2512.03749 [pdf, ps, other]
Title: Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models
Comments: Accepted at WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[269]  arXiv:2512.03746 [pdf, ps, other]
Title: Thinking with Programming Vision: Towards a Unified View for Thinking with Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[270]  arXiv:2512.03745 [pdf, ps, other]
Title: Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[271]  arXiv:2512.03730 [pdf, ps, other]
Title: Out-of-the-box: Black-box Causal Attacks on Object Detectors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[272]  arXiv:2512.03724 [pdf, ps, other]
Title: PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[273]  arXiv:2512.03715 [pdf, ps, other]
Title: DINO-RotateMatch: A Rotation-Aware Deep Framework for Robust Image Matching in Large-Scale 3D Reconstruction
Comments: 9 pages, 5 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[274]  arXiv:2512.03701 [pdf, ps, other]
Title: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[275]  arXiv:2512.03687 [pdf, ps, other]
Title: Active Visual Perception: Opportunities and Challenges
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[276]  arXiv:2512.03683 [pdf, ps, other]
Title: GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[277]  arXiv:2512.03673 [pdf, ps, other]
Title: ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[278]  arXiv:2512.03667 [pdf, ps, other]
Title: Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[279]  arXiv:2512.03666 [pdf, ps, other]
Title: ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos
Comments: 26 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[280]  arXiv:2512.03663 [pdf, ps, other]
Title: Multi-Scale Visual Prompting for Lightweight Small-Image Classification
Authors: Salim Khazem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[281]  arXiv:2512.03643 [pdf, ps, other]
Title: Optical Context Compression Is Just (Bad) Autoencoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[282]  arXiv:2512.03640 [pdf, ps, other]
Title: MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms
Journal-ref: MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15521. Springer, Singapore
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[283]  arXiv:2512.03625 [pdf, ps, other]
Title: FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[284]  arXiv:2512.03621 [pdf, ps, other]
Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[285]  arXiv:2512.03619 [pdf, ps, other]
Title: LAMP: Language-Assisted Motion Planning for Controllable Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[286]  arXiv:2512.03601 [pdf, ps, other]
Title: Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding
Comments: Accepted to NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[287]  arXiv:2512.03598 [pdf, ps, other]
Title: Memory-Guided Point Cloud Completion for Dental Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[288]  arXiv:2512.03597 [pdf, ps, other]
Title: HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ Segmentation
Comments: 6 pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[289]  arXiv:2512.03593 [pdf, ps, other]
Title: CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale Textures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[290]  arXiv:2512.03592 [pdf, ps, other]
Title: Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse Folding
Authors: Guang Yang, Lei Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[291]  arXiv:2512.03590 [pdf, ps, other]
Title: Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[292]  arXiv:2512.03580 [pdf, ps, other]
Title: Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[293]  arXiv:2512.03577 [pdf, ps, other]
Title: Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation Learning
Comments: 6 pages, 2 figures. Camera-ready version accepted for IEEE BIBM 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[294]  arXiv:2512.03575 [pdf, ps, other]
Title: UniComp: Rethinking Video Compression Through Informational Uniqueness
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[295]  arXiv:2512.03574 [pdf, ps, other]
Title: Global-Local Aware Scene Text Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[296]  arXiv:2512.03566 [pdf, ps, other]
Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models
Comments: Accepted by ACM MM Asia 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[297]  arXiv:2512.03558 [pdf, ps, other]
Title: CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding
Comments: Accepted at SIGSPATIAL 2025 (Best paper candidates), 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[298]  arXiv:2512.03553 [pdf, ps, other]
Title: Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching
Comments: Accepted at KDD 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[299]  arXiv:2512.03542 [pdf, ps, other]
Title: V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[300]  arXiv:2512.03540 [pdf, ps, other]
Title: CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
Comments: Accepted by ACM Multimedia 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[301]  arXiv:2512.03534 [pdf, ps, other]
Title: Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
Comments: Visualizations are available at the website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[302]  arXiv:2512.03532 [pdf, ps, other]
Title: OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[303]  arXiv:2512.03520 [pdf, ps, other]
Title: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[304]  arXiv:2512.03510 [pdf, ps, other]
Title: CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[305]  arXiv:2512.03509 [pdf, ps, other]
Title: AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[306]  arXiv:2512.03508 [pdf, ps, other]
Title: Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation
Comments: ICCV 2025 (poster)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[307]  arXiv:2512.03500 [pdf, ps, other]
Title: EEA: Exploration-Exploitation Agent for Long Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[308]  arXiv:2512.03499 [pdf, ps, other]
Title: NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[309]  arXiv:2512.03479 [pdf, ps, other]
Title: Towards Object-centric Understanding for Instructional Videos
Authors: Wenliang Guo, Yu Kong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[310]  arXiv:2512.03477 [pdf, ps, other]
Title: Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis
Comments: 10 pages, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[311]  arXiv:2512.03474 [pdf, ps, other]
Title: Procedural Mistake Detection via Action Effect Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[312]  arXiv:2512.03470 [pdf, ps, other]
Title: Difference Decomposition Networks for Infrared Small Target Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[313]  arXiv:2512.03463 [pdf, ps, other]
Title: Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[314]  arXiv:2512.03454 [pdf, ps, other]
Title: Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[315]  arXiv:2512.03453 [pdf, ps, other]
Title: GeoVideo: Introducing Geometric Regularization into Video Generation Model
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[316]  arXiv:2512.03451 [pdf, ps, other]
Title: GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[317]  arXiv:2512.03450 [pdf, ps, other]
Title: KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[318]  arXiv:2512.03449 [src]
Title: LM-CartSeg: Automated Segmentation of Lateral and Medial Cartilage and Subchondral Bone for Radiomics Analysis
Authors: Tongxu Zhang
Comments: The manuscript represents only a preliminary and substantially incomplete exploration. The author has decided not to stand by these results, and a thoroughly revised and significantly different version will be developed separately. Therefore, this version is withdrawn and should not be cited.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[319]  arXiv:2512.03445 [pdf, ps, other]
Title: Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation
Comments: 10 pages. Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[320]  arXiv:2512.03430 [pdf, ps, other]
Title: Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features
Comments: Accepted to the ICML 2025 TerraBytes Workshop (June 9, 2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[321]  arXiv:2512.03427 [pdf, ps, other]
Title: Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[322]  arXiv:2512.03424 [pdf, ps, other]
Title: DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[323]  arXiv:2512.03418 [pdf, ps, other]
Title: YOLOA: Real-Time Affordance Detection via LLM Adapter
Comments: 13 pages, 9 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[324]  arXiv:2512.03405 [pdf, ps, other]
Title: ViDiC: Video Difference Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[325]  arXiv:2512.03404 [pdf, ps, other]
Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[326]  arXiv:2512.03370 [pdf, ps, other]
Title: ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[327]  arXiv:2512.03369 [pdf, ps, other]
Title: FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[328]  arXiv:2512.03359 [pdf, ps, other]
Title: A Hybrid Deep Learning Framework with Explainable AI for Lung Cancer Classification with DenseNet169 and SVM
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[329]  arXiv:2512.03350 [pdf, ps, other]
Title: SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[330]  arXiv:2512.03346 [pdf, ps, other]
Title: Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical Keratoconus
Comments: 16 pages, 7 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[331]  arXiv:2512.03345 [pdf, ps, other]
Title: HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[332]  arXiv:2512.03339 [pdf, ps, other]
Title: ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in Echocardiography
Comments: 11 pages, Accepted in IMIMIC Workshop at MICCAI 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[333]  arXiv:2512.03335 [pdf, ps, other]
Title: Step-by-step Layered Design Generation
Journal-ref: AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[334]  arXiv:2512.03317 [pdf, ps, other]
Title: NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction
Comments: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[335]  arXiv:2512.03284 [pdf, ps, other]
Title: SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[336]  arXiv:2512.03257 [pdf, ps, other]
Title: PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[337]  arXiv:2512.03247 [pdf, ps, other]
Title: PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement
Comments: Published in the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[338]  arXiv:2512.03245 [pdf, ps, other]
Title: 2-Shots in the Dark: Low-Light Denoising with Minimal Data Acquisition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[339]  arXiv:2512.03237 [pdf, ps, other]
Title: LLM-Guided Material Inference for 3D Point Clouds
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[340]  arXiv:2512.03233 [pdf, ps, other]
Title: Object Counting with GPT-4o and GPT-5: A Comparative Study
Comments: 5 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[341]  arXiv:2512.03210 [pdf, ps, other]
Title: Flux4D: Flow-based Unsupervised 4D Reconstruction
Comments: NeurIPS 2025. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[342]  arXiv:2512.03199 [pdf, ps, other]
Title: Does Head Pose Correction Improve Biometric Facial Recognition?
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[343]  arXiv:2512.03182 [pdf, ps, other]
Title: Drainage: A Unifying Framework for Addressing Class Uncertainty
Comments: 16 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[344]  arXiv:2512.03126 [pdf, ps, other]
Title: Hierarchical Process Reward Models are Symbolic Vision Learners
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[345]  arXiv:2512.04076 (cross-list from cs.GR) [pdf, ps, other]
Title: Radiance Meshes for Volumetric Reconstruction
Comments: Website: half-potato.gitlab.io/rm
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[346]  arXiv:2512.04032 (cross-list from cs.CL) [pdf, ps, other]
Title: Jina-VLM: Small Multilingual Vision Language Model
Comments: 18 pages, 1-7 main content, 13-18 appendix for tables and dataset
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[347]  arXiv:2512.03995 (cross-list from cs.RO) [pdf, ps, other]
Title: Artificial Microsaccade Compensation: Stable Vision for an Ornithopter
Comments: 29 pages, 5 figures, 2 tables, under review
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[348]  arXiv:2512.03962 (cross-list from eess.IV) [pdf, ps, other]
Title: Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image Reconstruction
Comments: 6 pages, 8 figures, 2025 Asilomar Conference on Signals, Systems, and Computers. Code is available at github.com/evanbell02/Tada-DIP/
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[349]  arXiv:2512.03656 (cross-list from cs.LG) [pdf, ps, other]
Title: Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy Forecasting
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[350]  arXiv:2512.03556 (cross-list from cs.RO) [pdf, ps, other]
Title: RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[351]  arXiv:2512.03522 (cross-list from cs.RO) [pdf, ps, other]
Title: MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization
Comments: Accepted in IEEE Robotics and Automation Letters (2025)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[352]  arXiv:2512.03514 (cross-list from cs.IR) [pdf, ps, other]
Title: M3DR: Towards Universal Multilingual Multimodal Document Retrieval
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[353]  arXiv:2512.03422 (cross-list from cs.RO) [pdf, ps, other]
Title: What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[354]  arXiv:2512.03216 (cross-list from physics.ins-det) [pdf, ps, other]
Title: Kaleidoscopic Scintillation Event Imaging
Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[355]  arXiv:2512.03173 (cross-list from cs.CY) [pdf, ps, other]
Title: Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
Journal-ref: AAAI 2026 Social Impact Track
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[356]  arXiv:2512.03166 (cross-list from cs.RO) [pdf, ps, other]
Title: Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[357]  arXiv:2512.03111 (cross-list from q-bio.GN) [pdf, ps, other]
Title: PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer
Comments: Accepted by AAAI 2026
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[358]  arXiv:2512.03054 (cross-list from cs.LG) [pdf, ps, other]
Title: Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided Research
Comments: 22 pages, 13 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Medical Physics (physics.med-ph)
[359]  arXiv:2512.03052 (cross-list from cs.GR) [pdf, ps, other]
Title: LATTICE: Democratize High-Fidelity 3D Generation at Scale
Comments: Technical Report
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Wed, 3 Dec 2025

[360]  arXiv:2512.03046 [pdf, ps, other]
Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
Comments: Code and demo available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[361]  arXiv:2512.03045 [pdf, ps, other]
Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[362]  arXiv:2512.03043 [pdf, ps, other]
Title: OneThinker: All-in-one Reasoning Model for Image and Video
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[363]  arXiv:2512.03042 [pdf, ps, other]
Title: PPTArena: A Benchmark for Agentic PowerPoint Editing
Comments: 25 pages, 26 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[364]  arXiv:2512.03041 [pdf, ps, other]
Title: MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[365]  arXiv:2512.03040 [pdf, ps, other]
Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[366]  arXiv:2512.03036 [pdf, ps, other]
Title: ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[367]  arXiv:2512.03034 [pdf, ps, other]
Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation
Comments: Our project website is this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[368]  arXiv:2512.03020 [pdf, ps, other]
Title: Unrolled Networks are Conditional Probability Flows in MRI Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[369]  arXiv:2512.03018 [pdf, ps, other]
Title: AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry
Comments: Accepted to Siggraph Asia 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[370]  arXiv:2512.03014 [pdf, ps, other]
Title: Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
Comments: NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[371]  arXiv:2512.03013 [pdf, ps, other]
Title: In-Context Sync-LoRA for Portrait Video Editing
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[372]  arXiv:2512.03010 [pdf, ps, other]
Title: SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
[373]  arXiv:2512.03004 [pdf, ps, other]
Title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[374]  arXiv:2512.03000 [pdf, ps, other]
Title: DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[375]  arXiv:2512.02993 [pdf, ps, other]
Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and Beyond
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[376]  arXiv:2512.02991 [pdf, ps, other]
Title: GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[377]  arXiv:2512.02982 [pdf, ps, other]
Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
Comments: Preprint; 19 pages, 7 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[378]  arXiv:2512.02981 [pdf, ps, other]
Title: InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
Comments: Published in AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[379]  arXiv:2512.02973 [pdf, ps, other]
Title: Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[380]  arXiv:2512.02972 [pdf, ps, other]
Title: BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
Comments: Accepted by AAAI26
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[381]  arXiv:2512.02965 [pdf, ps, other]
Title: A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[382]  arXiv:2512.02952 [pdf, ps, other]
Title: Layout Anything: One Transformer for Universal Room Layout Estimation
Comments: Published at WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[383]  arXiv:2512.02942 [pdf, ps, other]
Title: Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[384]  arXiv:2512.02933 [pdf, ps, other]
Title: LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[385]  arXiv:2512.02932 [pdf, ps, other]
Title: EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[386]  arXiv:2512.02931 [pdf, ps, other]
Title: DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation
Comments: 23 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[387]  arXiv:2512.02906 [pdf, ps, other]
Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[388]  arXiv:2512.02899 [pdf, ps, other]
Title: Glance: Accelerating Diffusion Models with 1 Sample
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[389]  arXiv:2512.02897 [pdf, ps, other]
Title: Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models
Comments: 13 pages, 5 figures, 2 tables. Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[390]  arXiv:2512.02895 [pdf, ps, other]
Title: MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm
Comments: 33 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[391]  arXiv:2512.02870 [pdf, ps, other]
Title: Taming Camera-Controlled Video Generation with Verifiable Geometry Reward
Comments: 11 pages, 4 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392]  arXiv:2512.02867 [pdf, ps, other]
Title: MICCAI STSR 2025 Challenge: Semi-Supervised Teeth and Pulp Segmentation and CBCT-IOS Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[393]  arXiv:2512.02860 [pdf, ps, other]
Title: RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
Comments: Ranked 3rd in Fame 2026 Challenge, ICASSP
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[394]  arXiv:2512.02850 [pdf, ps, other]
Title: Are Detectors Fair to Indian IP-AIGC? A Cross-Generator Study
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[395]  arXiv:2512.02846 [pdf, ps, other]
Title: Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
Comments: Accepted in WACV 2026 - Applications Track
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[396]  arXiv:2512.02835 [pdf, ps, other]
Title: ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[397]  arXiv:2512.02830 [pdf, ps, other]
Title: Defense That Attacks: How Robust Models Become Better Attackers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[398]  arXiv:2512.02794 [pdf, ps, other]
Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[399]  arXiv:2512.02793 [pdf, ps, other]
Title: IC-World: In-Context Generation for Shared World Modeling
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[400]  arXiv:2512.02792 [pdf, ps, other]
Title: HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval
Comments: Accepted by ACM MM 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[401]  arXiv:2512.02790 [pdf, ps, other]
Title: UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits
Comments: 31 pages, 15 figures, 12 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[402]  arXiv:2512.02789 [pdf, ps, other]
Title: TrackNetV5: Residual-Driven Spatio-Temporal Refinement and Motion Direction Decoupling for Fast Object Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[403]  arXiv:2512.02781 [pdf, ps, other]
Title: LumiX: Structured and Coherent Text-to-Intrinsic Generation
Comments: The code will be available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[404]  arXiv:2512.02780 [pdf, ps, other]
Title: Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset
Comments: 12 pages, 15 figures. Accepted to AAAI-26 (Main Technical Track)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[405]  arXiv:2512.02751 [pdf, ps, other]
Title: AttMetNet: Attention-Enhanced Deep Neural Network for Methane Plume Detection in Sentinel-2 Satellite Imagery
Comments: 15 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[406]  arXiv:2512.02743 [pdf, ps, other]
Title: Reasoning-Aware Multimodal Fusion for Hateful Video Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[407]  arXiv:2512.02737 [pdf, ps, other]
Title: Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone
Comments: Accepted at WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[408]  arXiv:2512.02727 [pdf, ps, other]
Title: DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions
Comments: Accepted to WACV 2026. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[409]  arXiv:2512.02715 [pdf, ps, other]
Title: GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding
Comments: 11 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[410]  arXiv:2512.02702 [pdf, ps, other]
Title: Tissue-mask supported inter-subject whole-body image registration in the UK Biobank -- A method benchmarking study
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[411]  arXiv:2512.02700 [pdf, ps, other]
Title: VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[412]  arXiv:2512.02697 [pdf, ps, other]
Title: GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[413]  arXiv:2512.02696 [pdf, ps, other]
Title: ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection
Comments: Submitted to ICASSP 2026 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[414]  arXiv:2512.02686 [pdf, ps, other]
Title: ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic Data
Authors: Yuxing Liu, Yong Liu
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[415]  arXiv:2512.02685 [pdf, ps, other]
Title: Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[416]  arXiv:2512.02681 [pdf, ps, other]
Title: PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[417]  arXiv:2512.02668 [pdf, ps, other]
Title: UAUTrack: Towards Unified Multimodal Anti-UAV Visual Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[418]  arXiv:2512.02664 [pdf, ps, other]
Title: PolarGuide-GSDR: 3D Gaussian Splatting Driven by Polarization Priors and Deferred Reflection for Real-World Reflective Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419]  arXiv:2512.02660 [pdf, ps, other]
Title: Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation
Comments: 13 pages, 1 figure, 2 tables. Open-source implementation available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[420]  arXiv:2512.02650 [pdf, ps, other]
Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[421]  arXiv:2512.02648 [pdf, ps, other]
Title: PoreTrack3D: A Benchmark for Dynamic 3D Gaussian Splatting in Pore-Scale Facial Trajectory Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422]  arXiv:2512.02643 [pdf, ps, other]
Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423]  arXiv:2512.02624 [pdf, ps, other]
Title: PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[424]  arXiv:2512.02622 [pdf, ps, other]
Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425]  arXiv:2512.02621 [pdf, ps, other]
Title: Content-Aware Texturing for Gaussian Splatting
Comments: Project Page: this https URL
Journal-ref: Eurographics Symposium on Rendering (Symposium Track), 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[426]  arXiv:2512.02576 [pdf, ps, other]
Title: Co-speech Gesture Video Generation via Motion-Based Graph Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[427]  arXiv:2512.02566 [pdf, ps, other]
Title: From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[428]  arXiv:2512.02554 [pdf, ps, other]
Title: OmniPerson: Unified Identity-Preserving Pedestrian Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429]  arXiv:2512.02541 [pdf, ps, other]
Title: AVGGT: Rethinking Global Attention for Accelerating VGGT
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430]  arXiv:2512.02536 [pdf, ps, other]
Title: WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[431]  arXiv:2512.02520 [pdf, ps, other]
Title: On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection
Authors: Tai Le-Gia
Comments: PhD Dissertation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[432]  arXiv:2512.02517 [pdf, ps, other]
Title: SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433]  arXiv:2512.02512 [pdf, ps, other]
Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling
Comments: Accepted as a Tiny Paper at the 13th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2025), IIT Mandi, India. 3 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[434]  arXiv:2512.02505 [pdf, ps, other]
Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[435]  arXiv:2512.02498 [pdf, ps, other]
Title: dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[436]  arXiv:2512.02497 [pdf, ps, other]
Title: A Large Scale Benchmark for Test Time Adaptation Methods in Medical Image Segmentation
Comments: 45 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[437]  arXiv:2512.02496 [pdf, ps, other]
Title: Attention-guided reference point shifting for Gaussian-mixture-based partial point set registration
Comments: 16 pages, 9 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[438]  arXiv:2512.02492 [pdf, ps, other]
Title: YingVideo-MV: Music-Driven Multi-Stage Video Generation
Comments: 18 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439]  arXiv:2512.02487 [pdf, ps, other]
Title: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[440]  arXiv:2512.02485 [pdf, ps, other]
Title: UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[441]  arXiv:2512.02482 [pdf, ps, other]
Title: G-SHARP: Gaussian Surgical Hardware Accelerated Real-time Pipeline
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442]  arXiv:2512.02473 [pdf, ps, other]
Title: WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[443]  arXiv:2512.02469 [pdf, ps, other]
Title: TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
Comments: Accepted in AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444]  arXiv:2512.02458 [pdf, ps, other]
Title: Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445]  arXiv:2512.02457 [pdf, ps, other]
Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446]  arXiv:2512.02456 [pdf, ps, other]
Title: See, Think, Learn: A Self-Taught Multimodal Reasoner
Comments: Winter Conference on Applications of Computer Vision 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[447]  arXiv:2512.02453 [pdf, ps, other]
Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448]  arXiv:2512.02450 [pdf, ps, other]
Title: HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild
Comments: NeurIPS 2025 (Datasets and Benchmarks Track). Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[449]  arXiv:2512.02448 [pdf, ps, other]
Title: nuScenes Revisited: Progress and Challenges in Autonomous Driving
Comments: 18 pages, 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[450]  arXiv:2512.02447 [pdf, ps, other]
Title: Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451]  arXiv:2512.02441 [pdf, ps, other]
Title: Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[452]  arXiv:2512.02438 [pdf, ps, other]
Title: Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources
Comments: WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[453]  arXiv:2512.02437 [pdf, ps, other]
Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework
Authors: Daeyoung Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[454]  arXiv:2512.02425 [pdf, ps, other]
Title: WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[455]  arXiv:2512.02423 [pdf, ps, other]
Title: GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
Comments: 26 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456]  arXiv:2512.02421 [pdf, ps, other]
Title: Generalizing Vision-Language Models with Dedicated Prompt Guidance
Comments: Accepted to AAAI26
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457]  arXiv:2512.02413 [pdf, ps, other]
Title: MitUNet: Enhancing Floor Plan Recognition using a Hybrid Mix-Transformer and U-Net Architecture
Comments: 9 pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[458]  arXiv:2512.02405 [pdf, ps, other]
Title: WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[459]  arXiv:2512.02400 [pdf, ps, other]
Title: Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[460]  arXiv:2512.02395 [pdf, ps, other]
Title: Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
Comments: 21 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2512.02394 [pdf, ps, other]
Title: Reproducing and Extending RaDelft 4D Radar with Camera-Assisted Labels
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[462]  arXiv:2512.02392 [pdf, ps, other]
Title: From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[463]  arXiv:2512.02375 [pdf, ps, other]
Title: On-the-fly Feedback SfM: Online Explore-and-Exploit UAV Photogrammetry with Incremental Mesh Quality-Aware Indicator and Predictive Path Planning
Comments: This work was submitted to IEEE GRSM Journal for consideration. Copyright would be transferred once it gets accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[464]  arXiv:2512.02369 [pdf, ps, other]
Title: SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465]  arXiv:2512.02368 [pdf, ps, other]
Title: Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[466]  arXiv:2512.02364 [pdf, ps, other]
Title: Tackling Tuberculosis: A Comparative Dive into Machine Learning for Tuberculosis Detection
Journal-ref: Vol. 6, No. 1 (2024), Minnesota Undergraduate Research & Academic Journal (MURAJ)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[467]  arXiv:2512.02361 [pdf, ps, other]
Title: VACoT: Rethinking Visual Data Augmentation with VLMs
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[468]  arXiv:2512.02359 [pdf, ps, other]
Title: WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd Counting
Comments: PRCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469]  arXiv:2512.02351 [pdf, ps, other]
Title: Understanding and Harnessing Sparsity in Unified Multimodal Models
Comments: 13 pages, 13 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[470]  arXiv:2512.02344 [pdf, ps, other]
Title: A multi-weight self-matching visual explanation for CNNs on SAR images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471]  arXiv:2512.02341 [pdf, ps, other]
Title: TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[472]  arXiv:2512.02339 [pdf, ps, other]
Title: Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
Comments: Accepted at NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[473]  arXiv:2512.02290 [pdf, ps, other]
Title: Enhancing Cross Domain SAR Oil Spill Segmentation via Morphological Region Perturbation and Synthetic Label-to-SAR Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[474]  arXiv:2512.02273 [pdf, ps, other]
Title: Progressive Image Restoration via Text-Conditioned Video Generation
Comments: First two authors contributed equally to this work. IEEE ICNC Accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[475]  arXiv:2512.02268 [pdf, ps, other]
Title: Spatiotemporal Pyramid Flow Matching for Climate Emulation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[476]  arXiv:2512.02258 [pdf, ps, other]
Title: Exploring the Potentials of Spiking Neural Networks for Image Deraining
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[477]  arXiv:2512.02231 [pdf, ps, other]
Title: See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Comments: preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[478]  arXiv:2512.02224 [pdf, ps, other]
Title: Towards Unified Video Quality Assessment
Comments: 8 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[479]  arXiv:2512.02198 [pdf, ps, other]
Title: Multifractal Recalibration of Neural Networks for Medical Imaging Segmentation
Comments: 30 pages, 9 figures, journal paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[480]  arXiv:2512.02188 [pdf, ps, other]
Title: RobustSurg: Tackling domain generalisation for out-of-distribution surgical scene segmentation
Comments: Submitted to Medical Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481]  arXiv:2512.02172 [pdf, ps, other]
Title: SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[482]  arXiv:2512.02162 [pdf, ps, other]
Title: Mapping of Lesion Images to Somatic Mutations
Authors: Rahul Mehta
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[483]  arXiv:2512.02161 [pdf, ps, other]
Title: FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges
Comments: Accepted to NeurIPS 2025 Datasets and Benchmarks Track
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[484]  arXiv:2512.02152 [pdf, ps, other]
Title: Context-Enriched Contrastive Loss: Enhancing Presentation of Inherent Sample Connections in Contrastive Learning Framework
Comments: 13 pages, 7 figures. Published in IEEE Transactions on Multimedia. Code available at: this https URL
Journal-ref: IEEE Transactions on Multimedia, Vol. 27, pp. 429-441, December 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[485]  arXiv:2512.02055 [pdf, ps, other]
Title: Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scale
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[486]  arXiv:2512.03028 (cross-list from cs.GR) [pdf, ps, other]
Title: SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
Comments: 14 pages, 9 figures
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[487]  arXiv:2512.02920 (cross-list from cs.LG) [pdf, ps, other]
Title: Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation
Comments: 17 pages. To appear in KDD'26 Datasets
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
[488]  arXiv:2512.02787 (cross-list from cs.RO) [pdf, ps, other]
Title: Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[489]  arXiv:2512.02719 (cross-list from cs.CL) [pdf, ps, other]
Title: Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[490]  arXiv:2512.02651 (cross-list from cs.HC) [pdf, ps, other]
Title: Real-Time Multimodal Data Collection Using Smartwatches and Its Visualization in Education
Comments: Accepted in Technological Ecosystems for Enhancing Multiculturality (TEEM) 2025
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
[491]  arXiv:2512.02636 (cross-list from cs.LG) [pdf, ps, other]
Title: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[492]  arXiv:2512.02609 (cross-list from cs.RO) [pdf, ps, other]
Title: SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[493]  arXiv:2512.02340 (cross-list from cs.AI) [pdf, ps, other]
Title: Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective
Comments: 23 pages, 37 figures
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[494]  arXiv:2512.02306 (cross-list from cs.AI) [pdf, ps, other]
Title: OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[495]  arXiv:2512.02293 (cross-list from cs.RO) [pdf, ps, other]
Title: VIGS-SLAM: Visual Inertial Gaussian Splatting SLAM
Comments: Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[496]  arXiv:2512.02280 (cross-list from cs.AI) [pdf, ps, other]
Title: Bridging the Gap: Toward Cognitive Autonomy in Artificial Intelligence
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[497]  arXiv:2512.02243 (cross-list from cs.CR) [pdf, ps, other]
Title: PhishSnap: Image-Based Phishing Detection Using Perceptual Hashing
Comments: IEEE Standard Formatting, 3 pages, 3 figures
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[498]  arXiv:2512.02143 (cross-list from cs.GR) [pdf, ps, other]
Title: CoatFusion: Controllable Material Coating in Images
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[499]  arXiv:2512.02088 (cross-list from eess.IV) [pdf, ps, other]
Title: Comparing Baseline and Day-1 Diffusion MRI Using Multimodal Deep Embeddings for Stroke Outcome Prediction
Comments: 5 pages, 5 figures, 2 tables
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[500]  arXiv:2512.02062 (cross-list from cs.CR) [pdf, ps, other]
Title: Superpixel Attack: Enhancing Black-box Adversarial Attack with Image-driven Division Areas
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Tue, 2 Dec 2025

[501]  arXiv:2512.02018 [pdf, ps, other]
Title: Data-Centric Visual Development for Self-Driving Labs
Comments: 11 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[502]  arXiv:2512.02017 [pdf, ps, other]
Title: Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
Comments: Accepted to NeurIPS 2025. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[503]  arXiv:2512.02016 [pdf, ps, other]
Title: Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[504]  arXiv:2512.02015 [pdf, ps, other]
Title: Generative Video Motion Editing with 3D Point Tracks
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[505]  arXiv:2512.02014 [pdf, ps, other]
Title: TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[506]  arXiv:2512.02012 [pdf, ps, other]
Title: Improved Mean Flows: On the Challenges of Fastforward Generative Models
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[507]  arXiv:2512.02009 [pdf, ps, other]
Title: AirSim360: A Panoramic Simulation Platform within Drone View
Comments: Project Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[508]  arXiv:2512.02006 [pdf, ps, other]
Title: MV-TAP: Tracking Any Point in Multi-View Videos
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[509]  arXiv:2512.02005 [pdf, ps, other]
Title: Learning Visual Affordance from Audio
Comments: 15 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[510]  arXiv:2512.01989 [pdf, ps, other]
Title: PAI-Bench: A Comprehensive Benchmark For Physical AI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[511]  arXiv:2512.01988 [pdf, ps, other]
Title: Artemis: Structured Visual Reasoning for Perception Policy Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[512]  arXiv:2512.01975 [pdf, ps, other]
Title: SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
Comments: Accepted by AAAI-2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[513]  arXiv:2512.01960 [pdf, ps, other]
Title: SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[514]  arXiv:2512.01952 [pdf, ps, other]
Title: GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[515]  arXiv:2512.01949 [pdf, ps, other]
Title: Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
Comments: Published in Transactions on Machine Learning Research, Project in this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[516]  arXiv:2512.01934 [pdf, ps, other]
Title: Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory
Comments: Accepted to Annual Computer Security Applications Conference (ACSAC) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[517]  arXiv:2512.01922 [pdf, ps, other]
Title: Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding
Journal-ref: Computers in Biology and Medicine (2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[518]  arXiv:2512.01908 [pdf, ps, other]
Title: SARL: Spatially-Aware Self-Supervised Representation Learning for Visuo-Tactile Perception
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519]  arXiv:2512.01895 [pdf, ps, other]
Title: StyleYourSmile: Cross-Domain Face Retargeting Without Paired Multi-Style Data
Comments: 15 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[520]  arXiv:2512.01889 [pdf, ps, other]
Title: KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[521]  arXiv:2512.01885 [pdf, ps, other]
Title: TransientTrack: Advanced Multi-Object Tracking and Classification of Cancer Cells with Transient Fluorescent Signals
Comments: 13 pages, 7 figures, 2 tables. This work has been submitted to IEEE Transactions on Medical Imaging
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cell Behavior (q-bio.CB); Quantitative Methods (q-bio.QM)
[522]  arXiv:2512.01853 [pdf, ps, other]
Title: COACH: Collaborative Agents for Contextual Highlighting -- A Multi-Agent Framework for Sports Video Analysis
Comments: Accepted by AAAI 2026 Workshop LaMAS
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[523]  arXiv:2512.01850 [pdf, ps, other]
Title: Register Any Point: Scaling 3D Point Cloud Registration by Flow Matching
Comments: 22 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[524]  arXiv:2512.01843 [pdf, ps, other]
Title: PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
Comments: 17 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525]  arXiv:2512.01830 [pdf, ps, other]
Title: OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526]  arXiv:2512.01827 [pdf, ps, other]
Title: CauSight: Learning to Supersense for Visual Causal Discovery
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[527]  arXiv:2512.01821 [pdf, ps, other]
Title: Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[528]  arXiv:2512.01816 [pdf, ps, other]
Title: Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
Comments: 35 pages, 12 figures, 10 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[529]  arXiv:2512.01803 [pdf, ps, other]
Title: Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[530]  arXiv:2512.01789 [pdf, ps, other]
Title: SAM3-UNet: Simplified Adaptation of Segment Anything Model 3
Comments: Technical Report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[531]  arXiv:2512.01788 [pdf, ps, other]
Title: Learned Image Compression for Earth Observation: Implications for Downstream Segmentation Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[532]  arXiv:2512.01774 [pdf, ps, other]
Title: Evaluating SAM2 for Video Semantic Segmentation
Comments: 17 pages, 3 figures and 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[533]  arXiv:2512.01771 [pdf, ps, other]
Title: Robust Rigid and Non-Rigid Medical Image Registration Using Learnable Edge Kernels
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[534]  arXiv:2512.01769 [pdf, ps, other]
Title: VideoScoop: A Non-Traditional Domain-Independent Framework For Video Analysis
Authors: Hafsa Billah
Comments: This is a report submitted as part of PhD proposal defense of Hafsa Billah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
[535]  arXiv:2512.01763 [pdf, ps, other]
Title: HiconAgent: History Context-aware Policy Optimization for GUI Agents
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[536]  arXiv:2512.01755 [pdf, ps, other]
Title: FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[537]  arXiv:2512.01707 [pdf, ps, other]
Title: StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[538]  arXiv:2512.01701 [pdf, ps, other]
Title: SSR: Semantic and Spatial Rectification for CLIP-based Weakly Supervised Segmentation
Comments: Accepted in AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[539]  arXiv:2512.01686 [pdf, ps, other]
Title: DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540]  arXiv:2512.01681 [pdf, ps, other]
Title: Cross-Domain Validation of a Resection-Trained Self-Supervised Model on Multicentre Mesothelioma Biopsies
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[541]  arXiv:2512.01677 [pdf, ps, other]
Title: Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[542]  arXiv:2512.01675 [pdf, ps, other]
Title: GRASP: Guided Residual Adapters with Sample-wise Partitioning
Comments: 10 pages, 4 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[543]  arXiv:2512.01665 [pdf, ps, other]
Title: Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[544]  arXiv:2512.01657 [pdf, ps, other]
Title: DB-KAUNet: An Adaptive Dual Branch Kolmogorov-Arnold UNet for Retinal Vessel Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2512.01643 [pdf, ps, other]
Title: ViT$^3$: Unlocking Test-Time Training in Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[546]  arXiv:2512.01636 [pdf, ps, other]
Title: Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[547]  arXiv:2512.01629 [pdf, ps, other]
Title: SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge
Comments: Project page: this https URL. 17 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[548]  arXiv:2512.01611 [pdf, ps, other]
Title: Depth Matching Method Based on ShapeDTW for Oil-Based Mud Imager
Subjects: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph)
[549]  arXiv:2512.01589 [pdf, ps, other]
Title: Toward Content-based Indexing and Retrieval of Head and Neck CT with Abscess Segmentation
Comments: The 2025 IEEE International Conference on Content-Based Multimedia Indexing (IEEE CBMI)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[550]  arXiv:2512.01582 [pdf, ps, other]
Title: RoleMotion: A Large-Scale Dataset towards Robust Scene-Specific Role-Playing Motion Synthesis with Fine-grained Descriptions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[551]  arXiv:2512.01563 [pdf, ps, other]
Title: MasHeNe: A Benchmark for Head and Neck CT Mass Segmentation using Window-Enhanced Mamba with Frequency-Domain Integration
Comments: The 14th International Symposium on Information and Communication Technology Conference SoICT 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[552]  arXiv:2512.01540 [pdf, ps, other]
Title: FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention
Authors: Zipeng Wang, Dan Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[553]  arXiv:2512.01534 [pdf, ps, other]
Title: Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[554]  arXiv:2512.01533 [pdf, ps, other]
Title: Diffusion Fuzzy System: Fuzzy Rule Guided Latent Multi-Path Diffusion Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[555]  arXiv:2512.01519 [pdf, ps, other]
Title: QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic Interactions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Materials Science (cond-mat.mtrl-sci); Quantum Physics (quant-ph)
[556]  arXiv:2512.01510 [pdf, ps, other]
Title: Semantic-aware Random Convolution and Source Matching for Domain Generalization in Medical Image Segmentation
Comments: Preprint submitted to Computer Methods and Programs in Biomedicine (currently under revision)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[557]  arXiv:2512.01495 [pdf, ps, other]
Title: ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[558]  arXiv:2512.01494 [pdf, other]
Title: A variational method for curve extraction with curvature-dependent energies
Authors: Majid Arthaud (ENPC, MOKAPLAN, UMich), Antonin Chambolle (CEREMADE, MOKAPLAN), Vincent Duval (MOKAPLAN)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[559]  arXiv:2512.01481 [pdf, ps, other]
Title: ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[560]  arXiv:2512.01478 [pdf, ps, other]
Title: CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for Basketball
Authors: Omer Sela (1 and 2), Michael Chertok (1), Lior Wolf (2) ((1) Amazon, (2) Tel Aviv University)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[561]  arXiv:2512.01444 [pdf, ps, other]
Title: FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar Animation
Comments: 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[562]  arXiv:2512.01427 [pdf, ps, other]
Title: Language-Guided Open-World Anomaly Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[563]  arXiv:2512.01426 [pdf, ps, other]
Title: ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[564]  arXiv:2512.01424 [pdf, ps, other]
Title: ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models
Comments: 22 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[565]  arXiv:2512.01422 [pdf, ps, other]
Title: MDiff4STR: Mask Diffusion Model for Scene Text Recognition
Comments: Accepted by AAAI 2026 (Oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[566]  arXiv:2512.01419 [pdf, ps, other]
Title: Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries
Comments: 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[567]  arXiv:2512.01390 [pdf, ps, other]
Title: FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
Comments: Please visit our project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[568]  arXiv:2512.01383 [pdf, ps, other]
Title: PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications
Comments: Accepted by WACV2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569]  arXiv:2512.01382 [pdf, ps, other]
Title: Reversible Inversion for Training-Free Exemplar-guided Image Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570]  arXiv:2512.01380 [pdf, ps, other]
Title: Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry Network
Comments: Accepted by AAAI26
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571]  arXiv:2512.01373 [pdf, ps, other]
Title: SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation
Comments: Accepted by AAAI2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572]  arXiv:2512.01366 [pdf, ps, other]
Title: BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud
Comments: This is the author-accepted version of the paper published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Vol. 9, No. 4, Article 191, 2025. Final published version: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[573]  arXiv:2512.01352 [pdf, ps, other]
Title: OpenBox: Annotate Any Bounding Boxes in 3D
Comments: Accepted by NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[574]  arXiv:2512.01348 [pdf, ps, other]
Title: Handwritten Text Recognition for Low Resource Languages
Comments: 21 Pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575]  arXiv:2512.01342 [pdf, ps, other]
Title: InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[576]  arXiv:2512.01340 [pdf, ps, other]
Title: EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577]  arXiv:2512.01334 [pdf, ps, other]
Title: AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[578]  arXiv:2512.01333 [pdf, ps, other]
Title: Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[579]  arXiv:2512.01319 [pdf, ps, other]
Title: Rethinking Intracranial Aneurysm Vessel Segmentation: A Perspective from Computational Fluid Dynamics Applications
Comments: 18 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[580]  arXiv:2512.01315 [pdf, ps, other]
Title: FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection
Comments: 8 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[581]  arXiv:2512.01314 [pdf, ps, other]
Title: TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[582]  arXiv:2512.01312 [pdf, ps, other]
Title: IVCR-200K: A Large-Scale Multi-turn Dialogue Benchmark for Interactive Video Corpus Retrieval
Comments: Accepted by SIGIR2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[583]  arXiv:2512.01310 [pdf, ps, other]
Title: Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging - A Study on Pretraining for Age Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584]  arXiv:2512.01306 [pdf, ps, other]
Title: Gaussian Swaying: Surface-Based Framework for Aerodynamic Simulation with 3D Gaussians
Comments: Accepted to WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[585]  arXiv:2512.01302 [pdf, ps, other]
Title: DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
Comments: Accepted to WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586]  arXiv:2512.01298 [pdf, ps, other]
Title: TBT-Former: Learning Temporal Boundary Distributions for Action Localization
Comments: 8 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[587]  arXiv:2512.01296 [pdf, ps, other]
Title: EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly
Comments: SIGGRAPH ASIA 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[588]  arXiv:2512.01292 [pdf, ps, other]
Title: Diffusion Model in Latent Space for Medical Image Segmentation Task
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[589]  arXiv:2512.01291 [pdf, ps, other]
Title: Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AI
Comments: Accepted to CVIP 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[590]  arXiv:2512.01273 [pdf, ps, other]
Title: nnMobileNet++: Towards Efficient Hybrid Networks for Retinal Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[591]  arXiv:2512.01268 [pdf, ps, other]
Title: ViscNet: Vision-Based In-line Viscometry for Fluid Mixing Process
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[592]  arXiv:2512.01248 [pdf, ps, other]
Title: TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[593]  arXiv:2512.01242 [pdf, ps, other]
Title: Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[594]  arXiv:2512.01236 [pdf, ps, other]
Title: PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[595]  arXiv:2512.01223 [pdf, ps, other]
Title: S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
Comments: 18 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[596]  arXiv:2512.01214 [pdf, ps, other]
Title: M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis
Comments: 12 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[597]  arXiv:2512.01213 [pdf, ps, other]
Title: Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[598]  arXiv:2512.01204 [pdf, ps, other]
Title: TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[599]  arXiv:2512.01178 [pdf, ps, other]
Title: VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering
Comments: arXiv admin note: text overlap with arXiv:2404.00149
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[600]  arXiv:2512.01165 [pdf, ps, other]
Title: Real-Time On-the-Go Annotation Framework Using YOLO for Automated Dataset Generation
Authors: Mohamed Abdallah Salem (1), Ahmed Harb Rabia (1) ((1) North Dakota State University)
Comments: Copyright 2025 IEEE. This is the author's version of the work that has been accepted for publication in Proceedings of the 5. Interdisciplinary Conference on Electrics and Computer (INTCEC 2025) 15-16 September 2025, Chicago-USA. The final version of record is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[601]  arXiv:2512.01153 [pdf, ps, other]
Title: DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[602]  arXiv:2512.01148 [pdf, ps, other]
Title: SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language Models
Comments: 22 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[603]  arXiv:2512.01145 [pdf, ps, other]
Title: Weakly Supervised Continuous Micro-Expression Intensity Estimation Using Temporal Deep Neural Network
Authors: Riyadh Mohammed Almushrafy (Majmaah University, Saudi Arabia)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[604]  arXiv:2512.01128 [pdf, ps, other]
Title: OmniFD: A Unified Model for Versatile Face Forgery Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[605]  arXiv:2512.01116 [pdf, ps, other]
Title: Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
Comments: 37 pages, 14 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[606]  arXiv:2512.01103 [pdf, ps, other]
Title: Learning Eigenstructures of Unstructured Data Manifolds
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[607]  arXiv:2512.01095 [pdf, ps, other]
Title: CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[608]  arXiv:2512.01094 [pdf, ps, other]
Title: Accelerating Inference of Masked Image Generators via Reinforcement Learning
Comments: 15 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[609]  arXiv:2512.01085 [pdf, ps, other]
Title: Generalized Medical Phrase Grounding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[610]  arXiv:2512.01059 [pdf, ps, other]
Title: Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction
Authors: Anantha Padmanaban Krishna Kumar (Boston University)
Comments: 7 pages total (6 pages main text, 1 page references), 1 figure, 2 tables. Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[611]  arXiv:2512.01048 [pdf, ps, other]
Title: TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models
Comments: NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[612]  arXiv:2512.01030 [pdf, ps, other]
Title: Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
Comments: Work done at the Hong Kong University of Science and Technology (Guangzhou). Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[613]  arXiv:2512.01008 [pdf, ps, other]
Title: LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View Consistency
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[614]  arXiv:2512.00999 [pdf, ps, other]
Title: Provenance-Driven Reliable Semantic Medical Image Vector Reconstruction via Lightweight Blockchain-Verified Latent Fingerprints
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[615]  arXiv:2512.00995 [pdf, ps, other]
Title: S2AM3D: Scale-controllable Part Segmentation of 3D Point Cloud
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[616]  arXiv:2512.00993 [pdf, ps, other]
Title: PhotoFramer: Multi-modal Image Composition Instruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[617]  arXiv:2512.00975 [pdf, ps, other]
Title: MM-ACT: Learn from Multimodal Parallel Generation to Act
Comments: 17 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[618]  arXiv:2512.00960 [pdf, ps, other]
Title: Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[619]  arXiv:2512.00953 [pdf, ps, other]
Title: Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval
Comments: Accepted by AAAI 2026, 10 pages, 9 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[620]  arXiv:2512.00944 [pdf, ps, other]
Title: Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation
Journal-ref: AAAI2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[621]  arXiv:2512.00936 [pdf, ps, other]
Title: SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding
Comments: Accepted to WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[622]  arXiv:2512.00927 [pdf, ps, other]
Title: LAHNet: Local Attentive Hashing Network for Point Cloud Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[623]  arXiv:2512.00912 [pdf, ps, other]
Title: ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT Slices
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[624]  arXiv:2512.00911 [pdf, ps, other]
Title: Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[625]  arXiv:2512.00909 [pdf, ps, other]
Title: TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model
Comments: WACV 2026, Project page available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[626]  arXiv:2512.00904 [pdf, ps, other]
Title: Hierarchical Semantic Alignment for Image Clustering
Comments: AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[627]  arXiv:2512.00903 [pdf, ps, other]
Title: SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[628]  arXiv:2512.00891 [pdf, ps, other]
Title: Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
Comments: Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[629]  arXiv:2512.00887 [pdf, ps, other]
Title: Multilingual Training-Free Remote Sensing Image Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[630]  arXiv:2512.00885 [pdf, ps, other]
Title: HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[631]  arXiv:2512.00882 [pdf, ps, other]
Title: Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge Hints
Authors: Xisheng Feng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[632]  arXiv:2512.00880 [pdf, ps, other]
Title: Quantum-Inspired Spectral Geometry for Neural Operator Equivalence and Structured Pruning
Comments: 6 pages, 1 figure, preliminary version; concepts and simulation experiments only
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[633]  arXiv:2512.00877 [pdf, ps, other]
Title: Feed-Forward 3D Gaussian Splatting Compression with Long-Context Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[634]  arXiv:2512.00873 [pdf, ps, other]
Title: Neural Discrete Representation Learning for Sparse-View CBCT Reconstruction: From Algorithm Design to Prospective Multicenter Clinical Evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[635]  arXiv:2512.00872 [pdf, ps, other]
Title: TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models
Comments: 22 pages, 4 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[636]  arXiv:2512.00850 [pdf, ps, other]
Title: Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[637]  arXiv:2512.00846 [pdf, ps, other]
Title: AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent
Comments: Accepted at WACV 2026 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[638]  arXiv:2512.00832 [pdf, ps, other]
Title: PanFlow: Decoupled Motion Control for Panoramic Video Generation
Comments: Accepted by AAAI. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[639]  arXiv:2512.00814 [pdf, ps, other]
Title: IRPO: Boosting Image Restoration via Post-training GRPO
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[640]  arXiv:2512.00805 [pdf, ps, other]
Title: Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[641]  arXiv:2512.00796 [pdf, ps, other]
Title: CircleFlow: Flow-Guided Camera Blur Estimation using a Circle Grid Target
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[642]  arXiv:2512.00794 [pdf, ps, other]
Title: PolarGS: Polarimetric Cues for Ambiguity-Free Gaussian Splatting with Accurate Geometry Recovery
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[643]  arXiv:2512.00773 [pdf, ps, other]
Title: DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[644]  arXiv:2512.00771 [pdf, ps, other]
Title: EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes
Comments: Accepted at NeurIPS 2025 (spotlight)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[645]  arXiv:2512.00765 [pdf, ps, other]
Title: The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge Patches
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[646]  arXiv:2512.00762 [pdf, ps, other]
Title: Seeing the Wind from a Falling Leaf
Comments: Accepted at NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[647]  arXiv:2512.00752 [pdf, ps, other]
Title: Charts Are Not Images: On the Challenges of Scientific Chart Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[648]  arXiv:2512.00748 [pdf, ps, other]
Title: Probabilistic Modeling of Multi-rater Medical Image Segmentation for Diversity and Personalization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[649]  arXiv:2512.00744 [pdf, ps, other]
Title: Joint Multi-scale Gated Transformer and Prior-guided Convolutional Network for Learned Image Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[650]  arXiv:2512.00743 [pdf, ps, other]
Title: Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
Comments: 20 pages, 15 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[651]  arXiv:2512.00723 [pdf, ps, other]
Title: TrajDiff: End-to-end Autonomous Driving without Perception Annotation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[652]  arXiv:2512.00718 [pdf, ps, other]
Title: RS-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[653]  arXiv:2512.00714 [pdf, ps, other]
Title: Deep Learning-Based Computer Vision Models for Early Cancer Detection Using Multimodal Medical Imaging and Radiogenomic Integration Frameworks
Journal-ref: International Journal of Computer Applications Technology and Research, vol. 14, no. 11, pp. 1-14, 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[654]  arXiv:2512.00706 [pdf, ps, other]
Title: Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[655]  arXiv:2512.00700 [pdf, ps, other]
Title: CAR-Net: A Cascade Refinement Network for Rotational Motion Deblurring under Angle Information Uncertainty
Comments: Accepted to AAIML 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[656]  arXiv:2512.00694 [pdf, ps, other]
Title: Affordance-First Decomposition for Continual Learning in Video-Language Understanding
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[657]  arXiv:2512.00691 [pdf, ps, other]
Title: Silhouette-based Gait Foundation Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[658]  arXiv:2512.00677 [pdf, ps, other]
Title: Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer
Comments: 4D Scene Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[659]  arXiv:2512.00676 [pdf, ps, other]
Title: Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges
Authors: Kiri L. Wagstaff
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[660]  arXiv:2512.00647 [pdf, ps, other]
Title: MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[661]  arXiv:2512.00641 [pdf, ps, other]
Title: Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition
Comments: 17 pages, 5 figures. Accepted at the 17th Asian Conference on Machine Learning (ACML 2025), Taipei, Taiwan, December 9-12, 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[662]  arXiv:2512.00639 [pdf, ps, other]
Title: Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Performance (cs.PF)
[663]  arXiv:2512.00626 [pdf, ps, other]
Title: XAI-Driven Skin Disease Classification: Leveraging GANs to Augment ResNet-50 Performance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[664]  arXiv:2512.00625 [pdf, ps, other]
Title: Automatic Pith Detection in Tree Cross-Section Images Using Deep Learning
Comments: 8 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[665]  arXiv:2512.00597 [pdf, ps, other]
Title: Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[666]  arXiv:2512.00582 [pdf, ps, other]
Title: SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image Comprehension
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[667]  arXiv:2512.00572 [pdf, ps, other]
Title: Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[668]  arXiv:2512.00565 [pdf, ps, other]
Title: Describe Anything Anywhere At Any Moment
Comments: 14 pages, 5 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[669]  arXiv:2512.00557 [pdf, ps, other]
Title: NeuroVolve: Evolving Visual Stimuli toward Programmable Neural Objectives
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[670]  arXiv:2512.00547 [pdf, ps, other]
Title: Asset-Driven Sematic Reconstruction of Dynamic Scene with Multi-Human-Object Interactions
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[671]  arXiv:2512.00539 [pdf, ps, other]
Title: SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning
Comments: 17 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[672]  arXiv:2512.00534 [pdf, ps, other]
Title: Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene Update
Comments: AAAI2026 accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[673]  arXiv:2512.00532 [pdf, ps, other]
Title: Image Generation as a Visual Planner for Robotic Manipulation
Authors: Ye Pang
Comments: 11 pages, 9 figures. Under review at CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[674]  arXiv:2512.00514 [pdf, ps, other]
Title: Terrain Sensing with Smartphone Structured Light: 2D Dynamic Time Warping for Grid Pattern Matching
Authors: Tanaka Nobuaki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[675]  arXiv:2512.00493 [pdf, ps, other]
Title: CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[676]  arXiv:2512.00489 [pdf, ps, other]
Title: Learning What Helps: Task-Aligned Context Selection for Vision Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[677]  arXiv:2512.00475 [pdf, ps, other]
Title: Structured Context Learning for Generic Event Boundary Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[678]  arXiv:2512.00473 [pdf, ps, other]
Title: RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[679]  arXiv:2512.00456 [pdf, ps, other]
Title: CausalAffect: Causal Discovery for Facial Affective Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[680]  arXiv:2512.00450 [pdf, ps, other]
Title: RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications
Comments: 20 pages, 10 figures, 10 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[681]  arXiv:2512.00438 [pdf, ps, other]
Title: FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[682]  arXiv:2512.00428 [pdf, ps, other]
Title: Recognizing Pneumonia in Real-World Chest X-rays with a Classifier Trained with Images Synthetically Generated by Nano Banana
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[683]  arXiv:2512.00425 [pdf, ps, other]
Title: What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[684]  arXiv:2512.00424 [pdf, ps, other]
Title: Recovering Origin Destination Flows from Bus CCTV: Early Results from Nairobi and Kigali
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[685]  arXiv:2512.00422 [pdf, ps, other]
Title: PhysGen: Physically Grounded 3D Shape Generation for Industrial Design
Comments: 14 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[686]  arXiv:2512.00413 [pdf, ps, other]
Title: SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[687]  arXiv:2512.00408 [pdf, ps, other]
Title: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[688]  arXiv:2512.00395 [pdf, ps, other]
Title: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[689]  arXiv:2512.00387 [pdf, ps, other]
Title: WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
Comments: 32 pages, 20 figures. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[690]  arXiv:2512.00385 [pdf, ps, other]
Title: EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[691]  arXiv:2512.00381 [pdf, ps, other]
Title: Pore-scale Image Patch Dataset and A Comparative Evaluation of Pore-scale Facial Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[692]  arXiv:2512.00369 [pdf, ps, other]
Title: POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[693]  arXiv:2512.00368 [pdf, ps, other]
Title: THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering
Authors: Jian Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[694]  arXiv:2512.00365 [pdf, ps, other]
Title: Towards aligned body representations in vision models
Comments: Andrea Procopio and Andrey Gizdov have equal contributions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[695]  arXiv:2512.00363 [pdf, ps, other]
Title: MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality Adapters
Comments: Manuscript submitted to IEEE Transactions on Geoscience and Remote Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[696]  arXiv:2512.00355 [pdf, ps, other]
Title: SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[697]  arXiv:2512.00345 [pdf, ps, other]
Title: mmPred: Radar-based Human Motion Prediction in the Dark
Comments: This paper is accepted by AAAI-2026
Journal-ref: AAAI-2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[698]  arXiv:2512.00343 [pdf, ps, other]
Title: Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[699]  arXiv:2512.00336 [pdf, ps, other]
Title: MVAD: A Comprehensive Multimodal Video-Audio Dataset for AIGC Detection
Comments: 7 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[700]  arXiv:2512.00327 [pdf, ps, other]
Title: Odometry Without Correspondence from Inertially Constrained Ruled Surfaces
Comments: 14 pages, 13 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[701]  arXiv:2512.00310 [pdf, ps, other]
Title: ART-ASyn: Anatomy-aware Realistic Texture-based Anomaly Synthesis Framework for Chest X-Rays
Comments: Accepted in WACV2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[702]  arXiv:2512.00308 [pdf, ps, other]
Title: Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation
Comments: NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[703]  arXiv:2512.00300 [pdf, ps, other]
Title: TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene Completion
Comments: 14 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[704]  arXiv:2512.00294 [pdf, ps, other]
Title: Words into World: A Task-Adaptive Agent for Language-Guided Spatial Retrieval in AR
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[705]  arXiv:2512.00281 [pdf, ps, other]
Title: Rethinking Lung Cancer Screening: AI Nodule Detection and Diagnosis Outperforms Radiologists, Leading Models, and Standards Beyond Size and Growth
Comments: 25 pages, 8 figures, with supplementary information containing 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
[706]  arXiv:2512.00275 [pdf, ps, other]
Title: HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[707]  arXiv:2512.00269 [pdf, ps, other]
Title: USB: Unified Synthetic Brain Framework for Bidirectional Pathology-Healthy Generation and Editing
Authors: Jun Wang, Peirong Liu
Comments: 16 pages, 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[708]  arXiv:2512.00264 [pdf, ps, other]
Title: HeartFormer: Semantic-Aware Dual-Structure Transformers for 3D Four-Chamber Cardiac Point Cloud Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[709]  arXiv:2512.00261 [pdf, ps, other]
Title: UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse Annotations
Comments: Camera-ready for WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[710]  arXiv:2512.00255 [pdf, ps, other]
Title: Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[711]  arXiv:2512.00226 [pdf, ps, other]
Title: DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation
Authors: Zirui Wang, Tao Zhang
Comments: Workshop on Space in Vision, Language, and Embodied AI at NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[712]  arXiv:2512.00208 [pdf, ps, other]
Title: ReactionMamba: Generating Short & Long Human Reaction Sequences
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[713]  arXiv:2512.00198 [pdf, ps, other]
Title: Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[714]  arXiv:2512.00194 [pdf, ps, other]
Title: AutocleanEEG ICVision: Automated ICA Artifact Classification Using Vision-Language AI
Comments: 6 pages, 8 figures
Journal-ref: Conference ICMI2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
[715]  arXiv:2512.00179 [pdf, ps, other]
Title: Efficient Edge-Compatible CNN for Speckle-Based Material Recognition in Laser Cutting Systems
Authors: Mohamed Abdallah Salem (North Dakota State University), Nourhan Zein Diab (New Mansoura University)
Comments: Copyright 2025 IEEE. This is the author's version of the work that has been Accepted for publication in the Proceedings of the 2025 IEEE The 35th International Conference on Computer Theory and Applications (ICCTA 2025). Final published version will be available on IEEE Xplore
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[716]  arXiv:2512.00130 [pdf, ps, other]
Title: Local and Global Context-and-Object-part-Aware Superpixel-based Data Augmentation for Deep Visual Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[717]  arXiv:2512.00129 [pdf, ps, other]
Title: Analysis of Incursive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[718]  arXiv:2512.00125 [pdf, ps, other]
Title: Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance
Comments: Submitted to the NAMRC 54
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[719]  arXiv:2512.00117 [pdf, ps, other]
Title: TinyViT: Field Deployable Transformer Pipeline for Solar Panel Surface Fault and Severity Screening
Comments: 3 pages, 2 figures, ICGVIP 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[720]  arXiv:2512.00103 [pdf, ps, other]
Title: Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images
Authors: Ifeanyi Okala
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[721]  arXiv:2512.00091 [pdf, ps, other]
Title: Deep Filament Extraction for 3D Concrete Printing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[722]  arXiv:2512.00089 [pdf, ps, other]
Title: TeleViT1.0: Teleconnection-aware Vision Transformers for Subseasonal to Seasonal Wildfire Pattern Forecasts
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[723]  arXiv:2512.00088 [pdf, ps, other]
Title: SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features
Authors: Mohammad Zare
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[724]  arXiv:2512.00087 [pdf, ps, other]
Title: Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
Comments: This article has been accepted for publication in the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[725]  arXiv:2512.00086 [pdf, ps, other]
Title: Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs
Comments: 14 pages, 9 figures, 3 tables. Associated open-source release available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[726]  arXiv:2512.00084 [pdf, ps, other]
Title: A Fast and Efficient Modern BERT based Text-Conditioned Diffusion Model for Medical Image Segmentation
Comments: 15 pages, 3 figures, Accepted in Slide 3 10th International Conference on Computer Vision & Image Processing (CVIP 2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[727]  arXiv:2512.00082 [pdf, ps, other]
Title: Exploring Diagnostic Prompting Approach for Multimodal LLM-based Visual Complexity Assessment: A Case Study of Amazon Search Result Pages
Comments: 9 pages, 4 figures, 9 tables. Study on diagnostic prompting for multimodal LLM-based visual complexity assessment of Amazon search result pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[728]  arXiv:2512.00080 [pdf, ps, other]
Title: Conceptual Evaluation of Deep Visual Stereo Odometry for the MARWIN Radiation Monitoring Robot in Accelerator Tunnels
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[729]  arXiv:2512.00078 [pdf, ps, other]
Title: Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[730]  arXiv:2512.00075 [pdf, ps, other]
Title: Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[731]  arXiv:2512.00073 [pdf, ps, other]
Title: ProvRain: Rain-Adaptive Denoising and Vehicle Detection via MobileNet-UNet and Faster R-CNN
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[732]  arXiv:2512.00065 [pdf, ps, other]
Title: Satellite to Street: Disaster Impact Estimator
Comments: 11 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[733]  arXiv:2512.00061 [pdf, ps, other]
Title: DL-CapsNet: A Deep and Light Capsule Network
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[734]  arXiv:2512.00060 [pdf, ps, other]
Title: PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[735]  arXiv:2512.00042 [pdf, ps, other]
Title: Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
[736]  arXiv:2512.00008 [pdf, ps, other]
Title: MOTION: ML-Assisted On-Device Low-Latency Motion Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[737]  arXiv:2512.02020 (cross-list from cs.RO) [pdf, ps, other]
Title: EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI
Comments: Accepted by AAAI 2026. Project Page: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[738]  arXiv:2512.01993 (cross-list from cs.RO) [pdf, ps, other]
Title: RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies
Comments: Preprint
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[739]  arXiv:2512.01979 (cross-list from cs.AI) [pdf, ps, other]
Title: Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[740]  arXiv:2512.01946 (cross-list from cs.RO) [pdf, ps, other]
Title: Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models
Comments: Code, Data, and Models available at this https URL. The paper contains 8 pages, 9 figures, 6 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[741]  arXiv:2512.01913 (cross-list from eess.IV) [pdf, ps, other]
Title: Disentangling Progress in Medical Image Registration: Beyond Trend-Driven Architectures towards Domain-Specific Strategies
Comments: Submitted to Medical Image Analysis. Journal Extension of arXiv:2407.19274
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[742]  arXiv:2512.01822 (cross-list from cs.CL) [pdf, ps, other]
Title: InnoGym: Benchmarking the Innovation Potential of AI Agents
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[743]  arXiv:2512.01818 (cross-list from cs.LG) [pdf, ps, other]
Title: Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[744]  arXiv:2512.01687 (cross-list from cs.NE) [pdf, ps, other]
Title: Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural Networks
Authors: Huaxu He
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
[745]  arXiv:2512.01550 (cross-list from cs.RO) [pdf, ps, other]
Title: NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[746]  arXiv:2512.01461 (cross-list from cs.LG) [pdf, ps, other]
Title: Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[747]  arXiv:2512.01329 (cross-list from cs.GR) [pdf, ps, other]
Title: TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[748]  arXiv:2512.01324 (cross-list from hep-ex) [pdf, ps, other]
Title: Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics
Comments: 23 pages, 15 figures, preprint. Project page at this https URL
Subjects: High Energy Physics - Experiment (hep-ex); Computer Vision and Pattern Recognition (cs.CV)
[749]  arXiv:2512.01252 (cross-list from cs.LG) [pdf, ps, other]
Title: Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Comments: 9 pages, 7 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[750]  arXiv:2512.01181 (cross-list from cs.LG) [pdf, ps, other]
Title: First On-Orbit Demonstration of a Geospatial Foundation Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[751]  arXiv:2512.01152 (cross-list from cs.LG) [pdf, ps, other]
Title: Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[752]  arXiv:2512.01104 (cross-list from cs.RO) [pdf, ps, other]
Title: Estimation of Kinematic Motion from Dashcam Footage
Comments: 8 pages, 10 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[753]  arXiv:2512.01061 (cross-list from cs.RO) [pdf, ps, other]
Title: Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer
Comments: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[754]  arXiv:2512.01009 (cross-list from cs.RO) [pdf, ps, other]
Title: FOM-Nav: Frontier-Object Maps for Object Goal Navigation
Comments: Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[755]  arXiv:2512.00883 (cross-list from cs.MM) [pdf, ps, other]
Title: Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[756]  arXiv:2512.00818 (cross-list from cs.AI) [pdf, ps, other]
Title: Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[757]  arXiv:2512.00777 (cross-list from cs.RO) [pdf, ps, other]
Title: Sign Language Recognition using Bidirectional Reservoir Computing
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[758]  arXiv:2512.00736 (cross-list from cs.LG) [pdf, ps, other]
Title: REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Journal-ref: Proceedings of the Conference on Language Modeling (COLM 2025)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[759]  arXiv:2512.00659 (cross-list from cs.RO) [pdf, ps, other]
Title: Fast, Robust, Permutation-and-Sign Invariant SO(3) Pattern Alignment
Subjects: Robotics (cs.RO); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
[760]  arXiv:2512.00403 (cross-list from cs.LG) [pdf, ps, other]
Title: SelfAI: Building a Self-Training AI System with LLM Agents
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[761]  arXiv:2512.00396 (cross-list from cs.LG) [pdf, ps, other]
Title: Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[762]  arXiv:2512.00350 (cross-list from eess.IV) [pdf, ps, other]
Title: MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[763]  arXiv:2512.00324 (cross-list from cs.RO) [pdf, ps, other]
Title: MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[764]  arXiv:2512.00287 (cross-list from cs.RO) [pdf, ps, other]
Title: RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[765]  arXiv:2512.00229 (cross-list from cs.LG) [pdf, ps, other]
Title: TIE: A Training-Inversion-Exclusion Framework for Visually Interpretable and Uncertainty-Guided Out-of-Distribution Detection
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[766]  arXiv:2512.00138 (cross-list from cs.AR) [pdf, ps, other]
Title: Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVS
Comments: 6 pages, 12 figures, 2 tables
Subjects: Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[767]  arXiv:2512.00120 (cross-list from cs.SD) [pdf, ps, other]
Title: Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[768]  arXiv:2512.00115 (cross-list from cs.SD) [pdf, ps, other]
Title: MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning
Comments: 10 pages, 5 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[769]  arXiv:2512.00094 (cross-list from cs.CR) [pdf, ps, other]
Title: HMARK: Radioactive Multi-Bit Semantic-Latent Watermarking for Diffusion Models
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[770]  arXiv:2512.00076 (cross-list from cs.RO) [pdf, ps, other]
Title: Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[771]  arXiv:2512.00074 (cross-list from cs.RO) [pdf, ps, other]
Title: Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[772]  arXiv:2512.00052 (cross-list from physics.geo-ph) [pdf, ps, other]
Title: Coarse-to-Fine Non-Rigid Registration for Side-Scan Sonar Mosaicking
Subjects: Geophysics (physics.geo-ph); Computer Vision and Pattern Recognition (cs.CV)
[773]  arXiv:2512.00041 (cross-list from cs.RO) [pdf, ps, other]
Title: VISTAv2: World Imagination for Indoor Vision-and-Language Navigation
Comments: 11 pages, 5 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[774]  arXiv:2512.00037 (cross-list from cs.RO) [pdf, ps, other]
Title: ICD-Net: Inertial Covariance Displacement Network for Drone Visual-Inertial SLAM
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[775]  arXiv:2512.00027 (cross-list from cs.RO) [pdf, ps, other]
Title: A Survey on Improving Human Robot Collaboration through Vision-and-Language Navigation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[776]  arXiv:2512.00024 (cross-list from cs.RO) [pdf, ps, other]
Title: Learning from Watching: Scalable Extraction of Manipulation Trajectories from Human Videos
Authors: X. Hu, G. Ye
Comments: Accepted to RSS 2025 Workshop
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[777]  arXiv:2512.00021 (cross-list from cs.RO) [pdf, ps, other]
Title: Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges
Comments: Under review
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[778]  arXiv:2512.00019 (cross-list from cs.RO) [pdf, ps, other]
Title: A Comprehensive Survey on Surgical Digital Twin
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)