Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 28

[ total of 747 entries: 1-100 | 29-128 | 129-228 | 229-328 | 329-428 | ... | 729-747 ]

Mon, 15 Dec 2025 (continued, showing last 76 of 104 entries)

[29]  arXiv:2512.11534 [pdf, ps, other]
Title: HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
Comments: 18 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[30]  arXiv:2512.11524 [pdf, ps, other]
Title: Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using LiDAR HD Reference Data across Metropolitan France
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[31]  arXiv:2512.11510 [pdf, ps, other]
Title: Reconstruction as a Bridge for Event-Based Visual Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[32]  arXiv:2512.11508 [pdf, ps, other]
Title: On Geometric Understanding and Learned Data Priors in VGGT
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[33]  arXiv:2512.11507 [pdf, ps, other]
Title: SSA3D: Text-Conditioned Assisted Self-Supervised Framework for Automatic Dental Abutment Design
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[34]  arXiv:2512.11503 [pdf, ps, other]
Title: TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[35]  arXiv:2512.11490 [pdf, ps, other]
Title: VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing
Comments: 21 pages, 7 figures, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[36]  arXiv:2512.11480 [pdf, ps, other]
Title: CADMorph: Geometry-Driven Parametric CAD Editing via a Plan-Generate-Verify Loop
Comments: NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[37]  arXiv:2512.11465 [pdf, ps, other]
Title: DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation
Comments: AAAI-26
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[38]  arXiv:2512.11464 [pdf, ps, other]
Title: Exploring MLLM-Diffusion Information Transfer with MetaCanvas
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[39]  arXiv:2512.11458 [pdf, ps, other]
Title: Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[40]  arXiv:2512.11446 [pdf, ps, other]
Title: YawDD+: Frame-level Annotations for Accurate Yawn Prediction
Comments: Submitted to the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[41]  arXiv:2512.11438 [pdf, ps, other]
Title: Flowception: Temporally Expansive Flow Matching for Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[42]  arXiv:2512.11423 [pdf, ps, other]
Title: JoyAvatar: Real-time and Infinite Audio-Driven Avatar Generation with Autoregressive Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[43]  arXiv:2512.11401 [pdf, ps, other]
Title: Collaborative Reconstruction and Repair for Multi-class Industrial Anomaly Detection
Comments: Accepted to Data Intelligence 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[44]  arXiv:2512.11395 [pdf, ps, other]
Title: FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[45]  arXiv:2512.11393 [pdf, ps, other]
Title: The N-Body Problem: Parallel Execution from Single-Person Egocentric Video
Comments: project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[46]  arXiv:2512.11373 [pdf, ps, other]
Title: Out-of-Distribution Segmentation via Wasserstein-Based Evidential Uncertainty
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[47]  arXiv:2512.11369 [pdf, ps, other]
Title: Assisted Refinement Network Based on Channel Information Interaction for Camouflaged and Salient Object Detection
Comments: 15 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[48]  arXiv:2512.11360 [pdf, ps, other]
Title: Reliable Detection of Minute Targets in High-Resolution Aerial Imagery across Temporal Shifts
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[49]  arXiv:2512.11356 [pdf, ps, other]
Title: Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[50]  arXiv:2512.11354 [pdf, ps, other]
Title: A Multi-Mode Structured Light 3D Imaging System with Multi-Source Information Fusion for Underwater Pipeline Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[51]  arXiv:2512.11350 [pdf, ps, other]
Title: Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[52]  arXiv:2512.11340 [pdf, ps, other]
Title: Task-Specific Distance Correlation Matching for Few-Shot Action Recognition
Comments: 9 pages, 4 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[53]  arXiv:2512.11336 [pdf, ps, other]
Title: UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
Comments: 22 pages, 13 figures, technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[54]  arXiv:2512.11335 [pdf, ps, other]
Title: FreqDINO: Frequency-Guided Adaptation for Generalized Boundary-Aware Ultrasound Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[55]  arXiv:2512.11327 [pdf, ps, other]
Title: Physics-Informed Video Flare Synthesis and Removal Leveraging Motion Independence between Flare and Scene
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[56]  arXiv:2512.11325 [pdf, ps, other]
Title: MLLM Machine Unlearning via Visual Knowledge Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[57]  arXiv:2512.11321 [pdf, ps, other]
Title: KeyframeFace: From Text to Expressive Facial Keyframes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[58]  arXiv:2512.11319 [pdf, ps, other]
Title: SATMapTR: Satellite Image Enhanced Online HD Map Construction
Comments: 9 pages (+ 3 pages of Appendix)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[59]  arXiv:2512.11301 [pdf, ps, other]
Title: MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction
Comments: ACM MM 2025 Dataset Track
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[60]  arXiv:2512.11296 [pdf, ps, other]
Title: Few-Shot VLM-Based G-Code and HMI Verification in CNC Machining
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[61]  arXiv:2512.11293 [pdf, ps, other]
Title: Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[62]  arXiv:2512.11284 [pdf, ps, other]
Title: RcAE: Recursive Reconstruction Framework for Unsupervised Industrial Anomaly Detection
Comments: 19 pages, 7 figures, to be published in AAAI-26
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[63]  arXiv:2512.11274 [pdf, ps, other]
Title: FilmWeaver: Weaving Consistent Multi-Shot Videos with Cache-Guided Autoregressive Diffusion
Comments: AAAI-2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[64]  arXiv:2512.11267 [pdf, ps, other]
Title: Evaluating the Efficacy of Sentinel-2 versus Aerial Imagery in Serrated Tussock Classification
Comments: Accepted in Earthsense 2025 (IEEE International Conference on Next-Gen Technologies of Artificial Intelligence and Geoscience Remote Sensing)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[65]  arXiv:2512.11260 [pdf, ps, other]
Title: Do We Need Reformer for Vision? An Experimental Comparison with Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[66]  arXiv:2512.11253 [pdf, ps, other]
Title: PersonaLive! Expressive Portrait Image Animation for Live Streaming
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[67]  arXiv:2512.11239 [pdf, ps, other]
Title: Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[68]  arXiv:2512.11237 [pdf, ps, other]
Title: WildCap: Facial Appearance Capture in the Wild via Hybrid Inverse Rendering
Comments: Technical report. project page: this https URL; code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[69]  arXiv:2512.11234 [pdf, ps, other]
Title: RoomPilot: Controllable Synthesis of Interactive Indoor Environments via Multimodal Semantic Parsing
Comments: 20 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[70]  arXiv:2512.11229 [pdf, ps, other]
Title: REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation
Comments: 10 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[71]  arXiv:2512.11226 [pdf, ps, other]
Title: FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[72]  arXiv:2512.11225 [pdf, ps, other]
Title: VFMF: World Modeling by Forecasting Vision Foundation Model Features
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[73]  arXiv:2512.11215 [pdf, ps, other]
Title: SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
Comments: Accepted to WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[74]  arXiv:2512.11203 [pdf, ps, other]
Title: AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[75]  arXiv:2512.11199 [pdf, ps, other]
Title: CADKnitter: Compositional CAD Generation from Text and Geometry Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[76]  arXiv:2512.11189 [pdf, ps, other]
Title: Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization
Comments: BinEgo360@ICCV25
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[77]  arXiv:2512.11186 [pdf, ps, other]
Title: Lightweight 3D Gaussian Splatting Compression via Video Codec
Comments: Accepted by DCC2026 Oral
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[78]  arXiv:2512.11167 [pdf, ps, other]
Title: Image Tiling for High-Resolution Reasoning: Balancing Local Detail with Global Context
Comments: Accepted in AAAI 2025 Workshop on Reproducible AI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[79]  arXiv:2512.11141 [pdf, ps, other]
Title: Learning complete and explainable visual representations from itemized text supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[80]  arXiv:2512.11130 [pdf, ps, other]
Title: Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[81]  arXiv:2512.11121 [pdf, ps, other]
Title: Learning from a Generative Oracle: Domain Adaptation for Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[82]  arXiv:2512.11104 [pdf, ps, other]
Title: Information-driven Fusion of Pathology Foundation Models for Enhanced Disease Characterization
Comments: 29 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[83]  arXiv:2512.11099 [pdf, ps, other]
Title: VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[84]  arXiv:2512.11098 [pdf, ps, other]
Title: Vision-Language Models for Infrared Industrial Sensing in Additive Manufacturing Scene Description
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[85]  arXiv:2512.11076 [pdf, ps, other]
Title: E-CHUM: Event-based Cameras for Human Detection and Urban Monitoring
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[86]  arXiv:2512.11061 [pdf, ps, other]
Title: VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation
Comments: Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[87]  arXiv:2512.11060 [pdf, ps, other]
Title: Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning
Comments: 23 pages, 8 figures, 6 tables. Full paper under review for MIDL 2026 (Medical Imaging with Deep Learning)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[88]  arXiv:2512.11057 [pdf, ps, other]
Title: Weakly Supervised Tuberculosis Localization in Chest X-rays through Knowledge Distillation
Comments: 18 pages, 9 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[89]  arXiv:2512.11016 [pdf, ps, other]
Title: SoccerMaster: A Vision Foundation Model for Soccer Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[90]  arXiv:2512.11015 [pdf, ps, other]
Title: Leveraging Text Guidance for Enhancing Demographic Fairness in Gender Classification
Authors: Anoop Krishnan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[91]  arXiv:2512.11797 (cross-list from cs.RO) [pdf, ps, other]
Title: AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis
Comments: Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[92]  arXiv:2512.11745 (cross-list from eess.IV) [pdf, ps, other]
Title: mViSE: A Visual Search Engine for Analyzing Multiplex IHC Brain Tissue Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[93]  arXiv:2512.11695 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Particle Image Velocimetry Refinement via Consensus ADMM
Comments: Code: this https URL
Subjects: Fluid Dynamics (physics.flu-dyn); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Optimization and Control (math.OC)
[94]  arXiv:2512.11676 (cross-list from math.PR) [pdf, ps, other]
Title: Stochastics of shapes and Kunita flows
Subjects: Probability (math.PR); Computer Vision and Pattern Recognition (cs.CV)
[95]  arXiv:2512.11582 (cross-list from cs.LG) [pdf, ps, other]
Title: Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
Comments: Code and pretrained models available at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
[96]  arXiv:2512.11532 (cross-list from cs.DC) [pdf, ps, other]
Title: Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[97]  arXiv:2512.11433 (cross-list from cs.AI) [pdf, other]
Title: Back to the Baseline: Examining Baseline Effects on Explainability Metrics
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[98]  arXiv:2512.11399 (cross-list from cs.CL) [pdf, ps, other]
Title: Minimal Clips, Maximum Salience: Long Video Summarization via Key Moment Extraction
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[99]  arXiv:2512.11243 (cross-list from cs.LG) [pdf, ps, other]
Title: Task-Aware Multi-Expert Architecture For Lifelong Deep Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[100]  arXiv:2512.11218 (cross-list from cs.RO) [pdf, ps, other]
Title: Seeing to Act, Prompting to Specify: A Bayesian Factorization of Vision Language Action Policy
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[101]  arXiv:2512.11194 (cross-list from cs.LG) [pdf, ps, other]
Title: Beyond Memorization: Gradient Projection Enables Selective Learning in Diffusion Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[102]  arXiv:2512.11145 (cross-list from cs.LG) [pdf, ps, other]
Title: Autoencoder-based Semi-Supervised Dimensionality Reduction and Clustering for Scientific Ensembles
Comments: Research Internship Project
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[103]  arXiv:2512.11047 (cross-list from cs.RO) [pdf, ps, other]
Title: WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[104]  arXiv:2512.10966 (cross-list from cs.LG) [pdf, ps, other]
Title: Multimodal Fusion of Regional Brain Experts for Interpretable Alzheimer's Disease Diagnosis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Fri, 12 Dec 2025 (showing first 24 of 118 entries)

[105]  arXiv:2512.10959 [pdf, ps, other]
Title: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[106]  arXiv:2512.10958 [pdf, ps, other]
Title: WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
Comments: Preprint; 80 pages, 37 figures, 29 tables; Project Page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[107]  arXiv:2512.10957 [pdf, ps, other]
Title: SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[108]  arXiv:2512.10956 [pdf, ps, other]
Title: Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[109]  arXiv:2512.10955 [pdf, ps, other]
Title: Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[110]  arXiv:2512.10954 [pdf, ps, other]
Title: Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[111]  arXiv:2512.10950 [pdf, ps, other]
Title: E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training
Comments: Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[112]  arXiv:2512.10949 [pdf, ps, other]
Title: Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation
Comments: Code is released at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[113]  arXiv:2512.10948 [pdf, ps, other]
Title: ClusIR: Towards Cluster-Guided All-in-One Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[114]  arXiv:2512.10947 [pdf, ps, other]
Title: Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[115]  arXiv:2512.10945 [pdf, ps, other]
Title: MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation
Comments: IEEE TPAMI, Project Page: this https URL
Journal-ref: in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 12, pp. 11400-11416, 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[116]  arXiv:2512.10943 [pdf, ps, other]
Title: AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[117]  arXiv:2512.10942 [pdf, ps, other]
Title: VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[118]  arXiv:2512.10941 [pdf, ps, other]
Title: Mull-Tokens: Modality-Agnostic Latent Thinking
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[119]  arXiv:2512.10940 [pdf, ps, other]
Title: OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[120]  arXiv:2512.10939 [pdf, ps, other]
Title: GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting
Comments: IEEE/CVF Winter Conference on Applications of Computer Vision 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[121]  arXiv:2512.10935 [pdf, ps, other]
Title: Any4D: Unified Feed-Forward Metric 4D Reconstruction
Comments: Project Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[122]  arXiv:2512.10932 [pdf, ps, other]
Title: BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[123]  arXiv:2512.10927 [pdf, ps, other]
Title: FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
Comments: Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[124]  arXiv:2512.10894 [pdf, ps, other]
Title: DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[125]  arXiv:2512.10888 [pdf, ps, other]
Title: PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[126]  arXiv:2512.10881 [pdf, ps, other]
Title: MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[127]  arXiv:2512.10867 [pdf, ps, other]
Title: From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[128]  arXiv:2512.10863 [pdf, ps, other]
Title: MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)