Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 421

[ total of 778 entries: 1-50 | ... | 272-321 | 322-371 | 372-421 | 422-471 | 472-521 | 522-571 | 572-621 | ... | 772-778 ]

Wed, 3 Dec 2025 (continued, showing 50 of 141 entries)

[422]  arXiv:2512.02643 [pdf, ps, other]
Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423]  arXiv:2512.02624 [pdf, ps, other]
Title: PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[424]  arXiv:2512.02622 [pdf, ps, other]
Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425]  arXiv:2512.02621 [pdf, ps, other]
Title: Content-Aware Texturing for Gaussian Splatting
Comments: Project Page: this https URL
Journal-ref: Eurographics Symposium on Rendering (Symposium Track), 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[426]  arXiv:2512.02576 [pdf, ps, other]
Title: Co-speech Gesture Video Generation via Motion-Based Graph Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[427]  arXiv:2512.02566 [pdf, ps, other]
Title: From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[428]  arXiv:2512.02554 [pdf, ps, other]
Title: OmniPerson: Unified Identity-Preserving Pedestrian Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429]  arXiv:2512.02541 [pdf, ps, other]
Title: AVGGT: Rethinking Global Attention for Accelerating VGGT
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430]  arXiv:2512.02536 [pdf, ps, other]
Title: WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[431]  arXiv:2512.02520 [pdf, ps, other]
Title: On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection
Authors: Tai Le-Gia
Comments: PhD Dissertation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[432]  arXiv:2512.02517 [pdf, ps, other]
Title: SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433]  arXiv:2512.02512 [pdf, ps, other]
Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling
Comments: Accepted as a Tiny Paper at the 13th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2025), IIT Mandi, India. 3 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[434]  arXiv:2512.02505 [pdf, ps, other]
Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[435]  arXiv:2512.02498 [pdf, ps, other]
Title: dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[436]  arXiv:2512.02497 [pdf, ps, other]
Title: A Large Scale Benchmark for Test Time Adaptation Methods in Medical Image Segmentation
Comments: 45 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[437]  arXiv:2512.02496 [pdf, ps, other]
Title: Attention-guided reference point shifting for Gaussian-mixture-based partial point set registration
Comments: 16 pages, 9 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[438]  arXiv:2512.02492 [pdf, ps, other]
Title: YingVideo-MV: Music-Driven Multi-Stage Video Generation
Comments: 18 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439]  arXiv:2512.02487 [pdf, ps, other]
Title: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[440]  arXiv:2512.02485 [pdf, ps, other]
Title: UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[441]  arXiv:2512.02482 [pdf, ps, other]
Title: G-SHARP: Gaussian Surgical Hardware Accelerated Real-time Pipeline
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442]  arXiv:2512.02473 [pdf, ps, other]
Title: WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[443]  arXiv:2512.02469 [pdf, ps, other]
Title: TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
Comments: Accepted in AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444]  arXiv:2512.02458 [pdf, ps, other]
Title: Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445]  arXiv:2512.02457 [pdf, ps, other]
Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446]  arXiv:2512.02456 [pdf, ps, other]
Title: See, Think, Learn: A Self-Taught Multimodal Reasoner
Comments: Winter Conference on Applications of Computer Vision 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[447]  arXiv:2512.02453 [pdf, ps, other]
Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448]  arXiv:2512.02450 [pdf, ps, other]
Title: HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild
Comments: NeurIPS 2025 (Datasets and Benchmarks Track) Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[449]  arXiv:2512.02448 [pdf, ps, other]
Title: nuScenes Revisited: Progress and Challenges in Autonomous Driving
Comments: 18 pages, 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[450]  arXiv:2512.02447 [pdf, ps, other]
Title: Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451]  arXiv:2512.02441 [pdf, ps, other]
Title: Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[452]  arXiv:2512.02438 [pdf, ps, other]
Title: Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources
Comments: WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[453]  arXiv:2512.02437 [pdf, ps, other]
Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework
Authors: Daeyoung Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[454]  arXiv:2512.02425 [pdf, ps, other]
Title: WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[455]  arXiv:2512.02423 [pdf, ps, other]
Title: GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
Comments: 26 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456]  arXiv:2512.02421 [pdf, ps, other]
Title: Generalizing Vision-Language Models with Dedicated Prompt Guidance
Comments: Accepted to AAAI26
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457]  arXiv:2512.02413 [pdf, ps, other]
Title: MitUNet: Enhancing Floor Plan Recognition using a Hybrid Mix-Transformer and U-Net Architecture
Comments: 9 pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[458]  arXiv:2512.02405 [pdf, ps, other]
Title: WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[459]  arXiv:2512.02400 [pdf, ps, other]
Title: Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[460]  arXiv:2512.02395 [pdf, ps, other]
Title: Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
Comments: 21 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2512.02394 [pdf, ps, other]
Title: Reproducing and Extending RaDelft 4D Radar with Camera-Assisted Labels
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[462]  arXiv:2512.02392 [pdf, ps, other]
Title: From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[463]  arXiv:2512.02375 [pdf, ps, other]
Title: On-the-fly Feedback SfM: Online Explore-and-Exploit UAV Photogrammetry with Incremental Mesh Quality-Aware Indicator and Predictive Path Planning
Comments: This work was submitted to IEEE GRSM Journal for consideration. Copyright will be transferred once it is accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[464]  arXiv:2512.02369 [pdf, ps, other]
Title: SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465]  arXiv:2512.02368 [pdf, ps, other]
Title: Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[466]  arXiv:2512.02364 [pdf, ps, other]
Title: Tackling Tuberculosis: A Comparative Dive into Machine Learning for Tuberculosis Detection
Journal-ref: Vol. 6, No. 1 (2024), Minnesota Undergraduate Research & Academic Journal (MURAJ)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[467]  arXiv:2512.02361 [pdf, ps, other]
Title: VACoT: Rethinking Visual Data Augmentation with VLMs
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[468]  arXiv:2512.02359 [pdf, ps, other]
Title: WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd Counting
Comments: PRCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469]  arXiv:2512.02351 [pdf, ps, other]
Title: Understanding and Harnessing Sparsity in Unified Multimodal Models
Comments: 13 pages, 13 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[470]  arXiv:2512.02344 [pdf, ps, other]
Title: A multi-weight self-matching visual explanation for CNNs on SAR images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471]  arXiv:2512.02341 [pdf, ps, other]
Title: TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)