Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 386

Mon, 29 Dec 2025 (continued, showing last 54 of 96 entries)

[387]  arXiv:2512.21734 [pdf, ps, other]
Title: Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[388]  arXiv:2512.21714 [pdf, ps, other]
Title: AstraNav-World: World Model for Foresight Control and Consistency
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[389]  arXiv:2512.21710 [pdf, ps, other]
Title: RAPTOR: Real-Time High-Resolution UAV Video Prediction with Efficient Video Attention
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[390]  arXiv:2512.21707 [pdf, ps, other]
Title: Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction
Comments: 12 pages, 7 figures, Accepted by AAAI 2026 (oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[391]  arXiv:2512.21695 [pdf, ps, other]
Title: FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection
Comments: Accepted for publication in 2025 28th International Conference on Computer and Information Technology (ICCIT)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392]  arXiv:2512.21694 [pdf, ps, other]
Title: BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks
Comments: Accepted for publication in 2025 28th International Conference on Computer and Information Technology (ICCIT)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[393]  arXiv:2512.21693 [pdf, ps, other]
Title: Prior-AttUNet: Retinal OCT Fluid Segmentation Based on Normal Anatomical Priors and Attention Gating
Authors: Li Yang, Yuting Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[394]  arXiv:2512.21692 [pdf, ps, other]
Title: ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[395]  arXiv:2512.21691 [pdf, ps, other]
Title: Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective
Comments: 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[396]  arXiv:2512.21684 [pdf, ps, other]
Title: SlideChain: Semantic Provenance for Lecture Understanding via Blockchain Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[397]  arXiv:2512.21683 [pdf, ps, other]
Title: Contrastive Graph Modeling for Cross-Domain Few-Shot Medical Image Segmentation
Comments: Accepted to IEEE Transactions on Medical Imaging (T-MI), 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[398]  arXiv:2512.21675 [pdf, ps, other]
Title: UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Comments: 27 pages, 14 figures, 17 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[399]  arXiv:2512.21673 [pdf, ps, other]
Title: Comparative Analysis of Deep Learning Models for Perception in Autonomous Vehicles
Authors: Jalal Khan
Comments: 6 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[400]  arXiv:2512.21670 [pdf, ps, other]
Title: The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
Comments: 10 pages, 5 figures, Initial Work
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[401]  arXiv:2512.21643 [pdf, ps, other]
Title: Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[402]  arXiv:2512.21641 [pdf, ps, other]
Title: TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[403]  arXiv:2512.21637 [pdf, ps, other]
Title: Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[404]  arXiv:2512.21618 [pdf, ps, other]
Title: SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[405]  arXiv:2512.21617 [pdf, ps, other]
Title: CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective
Comments: 12 pages, 5 figures, accepted by IEEE TMM
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[406]  arXiv:2512.21616 [pdf, ps, other]
Title: TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant
Comments: Accepted by KDD 2026 research track. Code and data are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[407]  arXiv:2512.21599 [pdf, ps, other]
Title: GaussianEM: Model compositional and conformational heterogeneity using 3D Gaussians
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[408]  arXiv:2512.21598 [pdf, ps, other]
Title: From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement
Comments: 12 pages. Accepted by KDD 2026 research track. Codes are released at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[409]  arXiv:2512.21584 [pdf, ps, other]
Title: UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation
Authors: Linxuan Fan (1), Juntao Jiang (2), Weixuan Liu (3), Zhucun Xue (2), Jiajun Lv (2), Jiangning Zhang (2), Yong Liu (2) ((1) Data Science Institute, Vanderbilt University, Nashville, USA (2) College of Control Science and Engineering, Zhejiang University, Hangzhou, China (3) School of Computer Science and Technology, East China Normal University, Shanghai, China)
Comments: 12 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[410]  arXiv:2512.21582 [pdf, ps, other]
Title: LLM-Free Image Captioning Evaluation in Reference-Flexible Settings
Comments: Accepted for presentation at AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[411]  arXiv:2512.21576 [pdf, ps, other]
Title: Towards Long-window Anchoring in Vision-Language Model Distillation
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[412]  arXiv:2512.21562 [pdf, ps, other]
Title: Exploration of Reproducible Generated Image Detection
Authors: Yihang Duan
Comments: AAAI workshop RAI accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[413]  arXiv:2512.21560 [pdf, ps, other]
Title: Toward Intelligent Scene Augmentation for Context-Aware Object Placement and Sponsor-Logo Integration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[414]  arXiv:2512.21545 [pdf, ps, other]
Title: EraseLoRA: MLLM-Driven Foreground Exclusion and Background Subtype Aggregation for Dataset-Free Object Removal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[415]  arXiv:2512.21542 [pdf, ps, other]
Title: Vision Transformers are Circulant Attention Learners
Comments: AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[416]  arXiv:2512.21529 [pdf, ps, other]
Title: Hierarchy-Aware Fine-Tuning of Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[417]  arXiv:2512.21514 [pdf, ps, other]
Title: DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[418]  arXiv:2512.21513 [pdf, ps, other]
Title: MuS-Polar3D: A Benchmark Dataset for Computational Polarimetric 3D Imaging under Multi-Scattering Conditions
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419]  arXiv:2512.21512 [pdf, ps, other]
Title: Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art
Comments: Accepted at the 2025 28th International Conference on Computer and Information Technology (ICCIT). 6 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[420]  arXiv:2512.21508 [pdf, ps, other]
Title: Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification
Comments: Accepted at the 2025 28th International Conference on Computer and Information Technology (ICCIT). 6 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[421]  arXiv:2512.21507 [pdf, ps, other]
Title: SVBench: Evaluation of Video Generation Models on Social Reasoning
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422]  arXiv:2512.21495 [pdf, ps, other]
Title: Generative Multi-Focus Image Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423]  arXiv:2512.21476 [pdf, ps, other]
Title: GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification
Comments: Work in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[424]  arXiv:2512.21472 [pdf, ps, other]
Title: IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset
Comments: 11 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425]  arXiv:2512.21459 [pdf, ps, other]
Title: CCAD: Compressed Global Feature Conditioned Anomaly Detection
Comments: 18 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[426]  arXiv:2512.21452 [pdf, ps, other]
Title: Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism
Comments: Accepted for publication in IEEE Transactions on Geoscience and Remote Sensing
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2025, 63, 5213217
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[427]  arXiv:2512.21434 [pdf, ps, other]
Title: Scalable Deep Subspace Clustering Network
Comments: Published at the 2025 IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA)
Journal-ref: Proceedings of the IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA), 2025, pp. 1-10
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[428]  arXiv:2512.21414 [pdf, ps, other]
Title: A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[429]  arXiv:2512.21402 [pdf, ps, other]
Title: Understanding Virality: A Rubric based Vision-Language Model Framework for Short-Form Edutainment Evaluation
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430]  arXiv:2512.22016 (cross-list from cs.HC) [pdf, ps, other]
Title: SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[431]  arXiv:2512.21988 (cross-list from eess.IV) [pdf, ps, other]
Title: The Color-Clinical Decoupling: Why Perceptual Calibration Fails Clinical Biomarkers in Smartphone Dermatology
Authors: Sungwoo Kang
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[432]  arXiv:2512.21975 (cross-list from eess.IV) [pdf, ps, other]
Title: RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring
Comments: 2 pages, 2 figures. Accepted by IEEE ICTA 2025
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[433]  arXiv:2512.21789 (cross-list from cs.CL) [pdf, ps, other]
Title: Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning
Comments: Accepted to the 5th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE 2026)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[434]  arXiv:2512.21747 (cross-list from cs.HC) [pdf, ps, other]
Title: Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG
Comments: 8 Pages, 3 Figures, 1 Table
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[435]  arXiv:2512.21743 (cross-list from cs.LG) [pdf, ps, other]
Title: Dynamic Feedback Engines: Layer-Wise Control for Self-Regulating Continual Learning
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[436]  arXiv:2512.21602 (cross-list from cs.LG) [pdf, ps, other]
Title: Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[437]  arXiv:2512.21593 (cross-list from stat.ML) [pdf, ps, other]
Title: Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models
Authors: Takuro Kutsuna
Comments: 40 pages
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[438]  arXiv:2512.21516 (cross-list from cs.LG) [pdf, ps, other]
Title: Global-Graph Guided and Local-Graph Weighted Contrastive Learning for Unified Clustering on Incomplete and Noise Multi-View Data
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[439]  arXiv:2512.21510 (cross-list from cs.LG) [pdf, ps, other]
Title: Missing Pattern Tree based Decision Grouping and Ensemble for Deep Incomplete Multi-View Clustering
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[440]  arXiv:2512.21372 (cross-list from eess.IV) [pdf, ps, other]
Title: A Graph-Augmented Knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Thu, 25 Dec 2025

[441]  arXiv:2512.21338 [pdf, ps, other]
Title: HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
Comments: Project Page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442]  arXiv:2512.21337 [pdf, ps, other]
Title: Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[443]  arXiv:2512.21334 [pdf, ps, other]
Title: Streaming Video Instruction Tuning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444]  arXiv:2512.21333 [pdf, ps, other]
Title: Fast SAM2 with Text-Driven Token Pruning
Comments: 28 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445]  arXiv:2512.21331 [pdf, ps, other]
Title: TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446]  arXiv:2512.21302 [pdf, ps, other]
Title: AndroidLens: Long-latency Evaluation with Nested Sub-targets for Android GUI Agents
Comments: 23 pages, 13 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[447]  arXiv:2512.21287 [pdf, ps, other]
Title: Post-Processing Mask-Based Table Segmentation for Structural Coordinate Extraction
Authors: Suren Bandara
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448]  arXiv:2512.21284 [pdf, ps, other]
Title: Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[449]  arXiv:2512.21276 [pdf, ps, other]
Title: GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[450]  arXiv:2512.21268 [pdf, ps, other]
Title: ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451]  arXiv:2512.21264 [pdf, ps, other]
Title: AnyAD: Unified Any-Modality Anomaly Detection in Incomplete Multi-Sequence MRI
Comments: 15 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[452]  arXiv:2512.21252 [pdf, ps, other]
Title: DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[453]  arXiv:2512.21237 [pdf, ps, other]
Title: SegMo: Segment-aligned Text to 3D Human Motion Generation
Comments: The IEEE/CVF Winter Conference on Applications of Computer Vision 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[454]  arXiv:2512.21221 [pdf, ps, other]
Title: Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval
Comments: System description paper for EVENTA Grand Challenge Track 2 at ACM Multimedia 2025 (MM '25). Ranked 4th place. 6 pages, 1 figure, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[455]  arXiv:2512.21218 [pdf, ps, other]
Title: Latent Implicit Visual Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456]  arXiv:2512.21209 [pdf, ps, other]
Title: Human Motion Estimation with Everyday Wearables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457]  arXiv:2512.21194 [pdf, ps, other]
Title: VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[458]  arXiv:2512.21185 [pdf, ps, other]
Title: UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
Comments: 14 pages, 10 figures, Technical Report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[459]  arXiv:2512.21183 [pdf, ps, other]
Title: Towards Arbitrary Motion Completing via Hierarchical Continuous Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[460]  arXiv:2512.21174 [pdf, ps, other]
Title: A Turn Toward Better Alignment: Few-Shot Generative Adaptation with Equivariant Feature Rotation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2512.21150 [pdf, ps, other]
Title: ORCA: Object Recognition and Comprehension for Archiving Marine Species
Comments: Accepted by The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[462]  arXiv:2512.21135 [pdf, ps, other]
Title: TGC-Net: A Structure-Aware and Semantically-Aligned Framework for Text-Guided Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[463]  arXiv:2512.21126 [pdf, ps, other]
Title: MarineEval: Assessing the Marine Intelligence of Vision-Language Models
Comments: Accepted by The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)
[464]  arXiv:2512.21104 [pdf, ps, other]
Title: FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465]  arXiv:2512.21095 [pdf, ps, other]
Title: UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[466]  arXiv:2512.21094 [pdf, ps, other]
Title: T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[467]  arXiv:2512.21083 [pdf, ps, other]
Title: Hierarchical Modeling Approach to Fast and Accurate Table Recognition
Authors: Takaya Kawakatsu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[468]  arXiv:2512.21078 [pdf, ps, other]
Title: UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469]  arXiv:2512.21064 [pdf, ps, other]
Title: Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition
Comments: Accepted by Machine Intelligence Research (Journal Impact Factor 8.7, 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[470]  arXiv:2512.21058 [pdf, ps, other]
Title: Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control
Comments: 32 pages, 17 figures, and 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471]  arXiv:2512.21054 [pdf, ps, other]
Title: DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors
Comments: Accepted in WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[472]  arXiv:2512.21053 [pdf, ps, other]
Title: Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera
Comments: 9 pages, 5 figures. In Proceedings of the 32nd ACM International Conference on Multimedia (MM '24)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[473]  arXiv:2512.21050 [pdf, ps, other]
Title: Matrix Completion Via Reweighted Logarithmic Norm Minimization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[474]  arXiv:2512.21040 [pdf, ps, other]
Title: A Large-Depth-Range Layer-Based Hologram Dataset for Machine Learning-Based 3D Computer-Generated Holography
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
[475]  arXiv:2512.21038 [pdf, ps, other]
Title: Next-Scale Prediction: A Self-Supervised Approach for Real-World Image Denoising
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[476]  arXiv:2512.21032 [pdf, ps, other]
Title: Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model
Comments: Accepted by 2025 IEEE International Joint Conference on Biometrics (IJCB 2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[477]  arXiv:2512.21019 [pdf, ps, other]
Title: Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[478]  arXiv:2512.21015 [pdf, ps, other]
Title: FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing
Comments: Accepted by IEEE Transactions on Multimedia (TMM)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[479]  arXiv:2512.21011 [pdf, ps, other]
Title: Granular-ball Guided Masking: Structure-aware Data Augmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[480]  arXiv:2512.21004 [pdf, ps, other]
Title: Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481]  arXiv:2512.21003 [pdf, ps, other]
Title: MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds
Comments: 21 pages, 17 figures, 5 tables, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[482]  arXiv:2512.20988 [pdf, ps, other]
Title: PUFM++: Point Cloud Upsampling via Enhanced Flow Matching
Comments: 21 pages, 15 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[483]  arXiv:2512.20980 [pdf, ps, other]
Title: X-ray Insights Unleashed: Pioneering the Enhancement of Multi-Label Long-Tail Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[484]  arXiv:2512.20976 [pdf, ps, other]
Title: XGrid-Mapping: Explicit Implicit Hybrid Grid Submaps for Efficient Incremental Neural LiDAR Mapping
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[485]  arXiv:2512.20975 [pdf, ps, other]
Title: SPOT!: Map-Guided LLM Agent for Unsupervised Multi-CCTV Dynamic Object Tracking
Comments: 33 pages, 27 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[486]  arXiv:2512.20937 [pdf, ps, other]
Title: Beyond Artifacts: Real-Centric Envelope Modeling for Reliable AI-Generated Image Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[487]  arXiv:2512.20936 [pdf, ps, other]
Title: Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[488]  arXiv:2512.20934 [pdf, ps, other]
Title: Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Comments: Project Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)
[489]  arXiv:2512.20927 [pdf, ps, other]
Title: Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting
Comments: Will be updated
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[490]  arXiv:2512.20921 [pdf, ps, other]
Title: Self-supervised Multiplex Consensus Mamba for General Image Fusion
Comments: Accepted by AAAI 2026, 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[491]  arXiv:2512.20907 [pdf, ps, other]
Title: PanoGrounder: Bridging 2D and 3D with Panoramic Scene Representations for VLM-based 3D Visual Grounding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[492]  arXiv:2512.20901 [pdf, ps, other]
Title: Benchmarking and Enhancing VLM for Compressed Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[493]  arXiv:2512.20898 [pdf, ps, other]
Title: DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[494]  arXiv:2512.20892 [pdf, ps, other]
Title: Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[495]  arXiv:2512.20871 [pdf, ps, other]
Title: NeRV360: Neural Representation for 360-Degree Videos with a Viewport Decoder
Comments: 2026 IIEEJ International Conference on Image Electronics and Visual Computing (IEVC)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[496]  arXiv:2512.20866 [pdf, ps, other]
Title: Lightweight framework for underground pipeline recognition and spatial localization based on multi-view 2D GPR images
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2025, 63, 5110115
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[497]  arXiv:2512.20858 [pdf, ps, other]
Title: ALIVE: An Avatar-Lecture Interactive Video Engine with Content-Aware Retrieval for Real-Time Interaction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[498]  arXiv:2512.20839 [pdf, ps, other]
Title: Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[499]  arXiv:2512.20833 [pdf, ps, other]
Title: CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
Authors: Vidit Agrawal (1,2), John Peters (1,2), Tyler N. Thompson (1,2), Mohammad Vali Sanian (3,4), Chau Pham (5), Nikita Moshkov (6), Arshad Kazi (1,2), Aditya Pillai (1,2), Jack Freeman (1), Byunguk Kang (7,8), Samouil L. Farhi (8), Ernest Fraenkel (7), Ron Stewart (1), Lassi Paavolainen (3,4), Bryan A. Plummer (5), Juan C. Caicedo (1,2) ((1) Morgridge Institute for Research, Madison, WI, USA, (2) University of Wisconsin-Madison, Madison, WI, USA, (3) Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland, (4) University of Helsinki, Helsinki, Finland, (5) Boston University, Boston, MA, USA, (6) Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany, (7) Massachusetts Institute of Technology, Cambridge, MA, USA, (8) Broad Institute of MIT and Harvard, Cambridge, MA, USA)
Comments: 47 Pages, 23 Figures, 26 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[500]  arXiv:2512.20815 [pdf, ps, other]
Title: Learning to Sense for Driving: Joint Optics-Sensor-Model Co-Design for Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[501]  arXiv:2512.20783 [pdf, ps, other]
Title: NULLBUS: Multimodal Mixed-Supervision for Breast Ultrasound Segmentation via Nullable Global-Local Prompts
Comments: 5 pages, 2 figures, and 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[502]  arXiv:2512.20770 [pdf, ps, other]
Title: OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[503]  arXiv:2512.20746 [pdf, ps, other]
Title: TrashDet: Iterative Neural Architecture Search for Efficient Waste Detection
Authors: Tony Tran, Bin Hu
Comments: 10 pages. The paper has been accepted by the WACV 2026 workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[504]  arXiv:2512.20735 [pdf, ps, other]
Title: VL4Gaze: Unleashing Vision-Language Models for Gaze Following
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[505]  arXiv:2512.21315 (cross-list from cs.LG) [pdf, ps, other]
Title: Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[506]  arXiv:2512.21241 (cross-list from cs.LG) [pdf, ps, other]
Title: Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks
Comments: Published at AAAI 2026 (Oral). This version corresponds to the conference proceedings; v2 will include the appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[507]  arXiv:2512.21220 (cross-list from cs.AI) [pdf, ps, other]
Title: RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Comments: 11 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[508]  arXiv:2512.21201 (cross-list from cs.RO) [pdf, ps, other]
Title: Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[509]  arXiv:2512.21180 (cross-list from physics.med-ph) [pdf, ps, other]
Title: Equivariant Multiscale Learned Invertible Reconstruction for Cone Beam CT: From Simulated to Real Data
Comments: 29 pages. arXiv admin note: substantial text overlap with arXiv:2401.11256
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)
[510]  arXiv:2512.21118 (cross-list from cs.LG) [pdf, ps, other]
Title: STLDM: Spatio-Temporal Latent Diffusion Model for Precipitation Nowcasting
Comments: Accepted by TMLR. Camera-ready submission
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[511]  arXiv:2512.21099 (cross-list from cs.GR) [pdf, ps, other]
Title: TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars
Comments: 3DV 2026, Project page with videos: this https URL
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[512]  arXiv:2512.21065 (cross-list from cs.RO) [pdf, ps, other]
Title: Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation
Comments: Submitted to IEEE Journal
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[513]  arXiv:2512.20963 (cross-list from cs.LG) [pdf, ps, other]
Title: Generalization of Diffusion Models Arises with a Balanced Representation Space
Comments: 40 pages, 19 figures. The first two authors contributed equally
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[514]  arXiv:2512.20674 (cross-list from cs.LG) [pdf, ps, other]
Title: HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[515]  arXiv:2512.20655 (cross-list from cs.LG) [pdf, ps, other]
Title: MaskOpt: A Large-Scale Mask Optimization Dataset to Advance AI in Integrated Circuit Manufacturing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[516]  arXiv:2512.20642 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Flow Gym
Comments: Code: this https URL
Subjects: Fluid Dynamics (physics.flu-dyn); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE); Computational Physics (physics.comp-ph)
[517]  arXiv:2512.20626 (cross-list from cs.AI) [pdf, ps, other]
Title: MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Wed, 24 Dec 2025

[518]  arXiv:2512.20619 [pdf, ps, other]
Title: SemanticGen: Video Generation in Semantic Space
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519]  arXiv:2512.20617 [pdf, ps, other]
Title: SpatialTree: How Spatial Abilities Branch Out in MLLMs
Comments: webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[520]  arXiv:2512.20615 [pdf, ps, other]
Title: Active Intelligence in Video Avatars via Closed-loop World Modeling
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[521]  arXiv:2512.20610 [pdf, ps, other]
Title: FedPOD: the deployable units of training for federated learning
Comments: 12 pages, 12 figures, MICCAI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[522]  arXiv:2512.20606 [pdf, ps, other]
Title: Repurposing Video Diffusion Transformers for Robust Point Tracking
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[523]  arXiv:2512.20563 [pdf, ps, other]
Title: LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[524]  arXiv:2512.20561 [pdf, ps, other]
Title: FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
Comments: Under submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525]  arXiv:2512.20557 [pdf, ps, other]
Title: Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526]  arXiv:2512.20556 [pdf, ps, other]
Title: Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios
Comments: Accepted to WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[527]  arXiv:2512.20538 [pdf, ps, other]
Title: AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment
Comments: 18 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[528]  arXiv:2512.20531 [pdf, ps, other]
Title: SirenPose: Dynamic Scene Reconstruction via Geometric Supervision
Comments: Under submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[529]  arXiv:2512.20501 [pdf, ps, other]
Title: Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition
Authors: Gorjan Radevski
Comments: Ph.D. manuscript; Supervisors/Mentors: Marie-Francine Moens and Tinne Tuytelaars
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[530]  arXiv:2512.20487 [pdf, ps, other]
Title: Multi-temporal Adaptive Red-Green-Blue and Long-Wave Infrared Fusion for You Only Look Once-Based Landmine Detection from Unmanned Aerial Systems
Comments: 21 pages with 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[531]  arXiv:2512.20479 [pdf, ps, other]
Title: UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images
Comments: 22 pages, 25 figures, SIGGRAPH Asia 2025, Conference Paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[532]  arXiv:2512.20451 [pdf, ps, other]
Title: Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[533]  arXiv:2512.20432 [pdf, ps, other]
Title: High Dimensional Data Decomposition for Anomaly Detection of Textured Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[534]  arXiv:2512.20431 [pdf, ps, other]
Title: Skin Lesion Classification Using a Soft Voting Ensemble of Convolutional Neural Networks
Comments: Authors' version of the paper published in proceedings of ECCE, DOI: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[535]  arXiv:2512.20417 [pdf, ps, other]
Title: Chain-of-Anomaly Thoughts with Large Vision-Language Models
Comments: 2 pages, 3 figures, 1 table. Accepted for RECPAD 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[536]  arXiv:2512.20409 [pdf, ps, other]
Title: DETACH: Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[537]  arXiv:2512.20377 [pdf, ps, other]
Title: SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images
Comments: Accepted by AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[538]  arXiv:2512.20376 [pdf, ps, other]
Title: Linking Faces and Voices Across Languages: Insights from the FAME 2026 Challenge
Comments: Accepted at ICASSP 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[539]  arXiv:2512.20362 [pdf, ps, other]
Title: CRAFT: Continuous Reasoning and Agentic Feedback Tuning for Multimodal Text-to-Image Generation
Comments: 37 pages, 42 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540]  arXiv:2512.20340 [pdf, ps, other]
Title: The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[541]  arXiv:2512.20296 [pdf, ps, other]
Title: TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[542]  arXiv:2512.20288 [pdf, ps, other]
Title: UbiQVision: Quantifying Uncertainty in XAI for Image Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[543]  arXiv:2512.20260 [pdf, ps, other]
Title: ${D}^{3}${ETOR}: ${D}$ebate-Enhanced Pseudo Labeling and Frequency-Aware Progressive ${D}$ebiasing for Weakly-Supervised Camouflaged Object ${D}$etection with Scribble Annotations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[544]  arXiv:2512.20257 [pdf, ps, other]
Title: LADLE-MM: Limited Annotation based Detector with Learned Ensembles for Multimodal Misinformation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2512.20255 [pdf, ps, other]
Title: BiCoR-Seg: Bidirectional Co-Refinement Framework for High-Resolution Remote Sensing Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[546]  arXiv:2512.20251 [pdf, ps, other]
Title: Degradation-Aware Metric Prompting for Hyperspectral Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[547]  arXiv:2512.20236 [pdf, ps, other]
Title: IndicDLP: A Foundational Dataset for Multi-Lingual and Multi-Domain Document Layout Parsing
Comments: Accepted in ICDAR 2025 (Oral Presentation) - Best Student Paper Runner-Up Award
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[548]  arXiv:2512.20217 [pdf, ps, other]
Title: LiteFusion: Taming 3D Object Detectors from Vision-Based to Multi-Modal with Minimal Adaptation
Comments: 13 pages, 9 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[549]  arXiv:2512.20213 [pdf, ps, other]
Title: JDPNet: A Network Based on Joint Degradation Processing for Underwater Image Enhancement
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[550]  arXiv:2512.20194 [pdf, ps, other]
Title: Generative Latent Coding for Ultra-Low Bitrate Image Compression
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[551]  arXiv:2512.20174 [pdf, ps, other]
Title: Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
Comments: CVPR 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[552]  arXiv:2512.20157 [pdf, ps, other]
Title: AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model
Comments: 17 pages, 8 figures, 11 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[553]  arXiv:2512.20153 [pdf, ps, other]
Title: CoDi -- an exemplar-conditioned diffusion model for low-shot counting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[554]  arXiv:2512.20148 [pdf, ps, other]
Title: Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)
Comments: 33 pages, excluding appendices. 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[555]  arXiv:2512.20128 [pdf, ps, other]
Title: milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
Comments: Accepted at WACV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[556]  arXiv:2512.20120 [pdf, ps, other]
Title: HEART-VIT: Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[557]  arXiv:2512.20117 [pdf, ps, other]
Title: DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[558]  arXiv:2512.20113 [src]
Title: Multi Modal Attention Networks with Uncertainty Quantification for Automated Concrete Bridge Deck Delamination Detection
Comments: the authors are going to substantially edit the paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[559]  arXiv:2512.20107 [pdf, ps, other]
Title: UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis
Comments: Accepted to NeurIPS 2025. The first two authors contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[560]  arXiv:2512.20105 [pdf, ps, other]
Title: LiDARDraft: Generating LiDAR Point Cloud from Versatile Inputs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[561]  arXiv:2512.20104 [pdf, ps, other]
Title: Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[562]  arXiv:2512.20088 [pdf, ps, other]
Title: Item Region-based Style Classification Network (IRSN): A Fashion Style Classifier Based on Domain Knowledge of Fashion Experts
Comments: This is a pre-print of an article published in Applied Intelligence. The final authenticated version is available online at: this https URL
Journal-ref: Applied Intelligence, Vol. 54, pp. 6197-6209 (2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[563]  arXiv:2512.20070 [pdf, ps, other]
Title: Progressive Learned Image Compression for Machine Perception
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[564]  arXiv:2512.20042 [pdf, ps, other]
Title: Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval
Comments: 7 pages, 5 figures. System description for the EVENTA Grand Challenge (Track 1) at ACM MM'25
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[565]  arXiv:2512.20033 [pdf, ps, other]
Title: FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[566]  arXiv:2512.20032 [pdf, ps, other]
Title: VALLR-Pin: Uncertainty-Factorized Visual Speech Recognition for Mandarin with Pinyin Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[567]  arXiv:2512.20029 [pdf, ps, other]
Title: $\text{H}^2$em: Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[568]  arXiv:2512.20026 [pdf, ps, other]
Title: MAPI-GNN: Multi-Activation Plane Interaction Graph Neural Network for Multimodal Medical Diagnosis
Comments: Accepted by Proceedings of the AAAI Conference on Artificial Intelligence 40 (AAAI-26)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569]  arXiv:2512.20025 [pdf, ps, other]
Title: A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570]  arXiv:2512.20013 [pdf, ps, other]
Title: SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571]  arXiv:2512.20011 [pdf, ps, other]
Title: PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572]  arXiv:2512.20000 [pdf, ps, other]
Title: Few-Shot-Based Modular Image-to-Video Adapter for Diffusion Models
Comments: GitHub page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[573]  arXiv:2512.19990 [pdf, ps, other]
Title: A Dual-Branch Local-Global Framework for Cross-Resolution Land Cover Mapping
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[574]  arXiv:2512.19989 [pdf, ps, other]
Title: A Novel CNN Gradient Boosting Ensemble for Guava Disease Detection
Comments: Accepted at IEEE ICCIT 2025. This is the author accepted manuscript
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[575]  arXiv:2512.19982 [pdf, ps, other]
Title: WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification
Authors: Le Feng, Li Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[576]  arXiv:2512.19954 [pdf, ps, other]
Title: HistoWAS: A Pathomics Framework for Large-Scale Feature-Wide Association Studies of Tissue Topology and Patient Outcomes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577]  arXiv:2512.19949 [pdf, ps, other]
Title: How Much 3D Do Video Foundation Models Encode?
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[578]  arXiv:2512.19943 [pdf, ps, other]
Title: SE360: Semantic Edit in 360$^\circ$ Panoramas via Hierarchical Data Construction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[579]  arXiv:2512.19941 [pdf, ps, other]
Title: Block-Recurrent Dynamics in Vision Transformers
Comments: 25 pages, 15 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[580]  arXiv:2512.19934 [pdf, ps, other]
Title: Vehicle-centric Perception via Multimodal Structured Pre-training
Comments: Journal extension of VehicleMAE (AAAI 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[581]  arXiv:2512.19928 [pdf, ps, other]
Title: Unified Brain Surface and Volume Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[582]  arXiv:2512.19918 [pdf, ps, other]
Title: Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[583]  arXiv:2512.19871 [pdf, ps, other]
Title: HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction
Comments: 11 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584]  arXiv:2512.19850 [pdf, ps, other]
Title: RANSAC Scoring Functions: Analysis and Reality Check
Authors: A. Shekhovtsov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
[585]  arXiv:2512.19823 [pdf, ps, other]
Title: Learning to Refocus with Video Diffusion Models
Comments: Code and data are available at this https URL. SIGGRAPH Asia 2025, Dec. 2025
Journal-ref: Proceedings of the SIGGRAPH Asia 2025, pp. 1-11, 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586]  arXiv:2512.19817 [pdf, ps, other]
Title: Generating the Past, Present and Future from a Motion-Blurred Image
Comments: Code and data are available at this https URL
Journal-ref: ACM Trans. Graph. (SIGGRAPH Asia 2025), vol. 44, no. 6, pp. 1-15, Dec. 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[587]  arXiv:2512.19711 [pdf, ps, other]
Title: PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[588]  arXiv:2512.20618 (cross-list from cs.AI) [pdf, ps, other]
Title: LongVideoAgent: Multi-Agent Reasoning with Long Videos
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[589]  arXiv:2512.20595 (cross-list from cs.CL) [pdf, ps, other]
Title: Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs
Comments: 27 pages, 5 figures, 9 tables. Cube available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[590]  arXiv:2512.20464 (cross-list from physics.optics) [pdf, ps, other]
Title: Snapshot 3D image projection using a diffractive decoder
Comments: 22 Pages, 8 Figures
Subjects: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Applied Physics (physics.app-ph)
[591]  arXiv:2512.20436 (cross-list from eess.IV) [pdf, ps, other]
Title: Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[592]  arXiv:2512.20420 (cross-list from cs.LG) [pdf, ps, other]
Title: Simplifying Multi-Task Architectures Through Task-Specific Normalization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[593]  arXiv:2512.20387 (cross-list from cs.AI) [pdf, ps, other]
Title: Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems
Comments: 10 pages, 9 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[594]  arXiv:2512.20374 (cross-list from eess.IV) [pdf, ps, other]
Title: CLIP Based Region-Aware Feature Fusion for Automated BBPS Scoring in Colonoscopy Images
Comments: 12 pages, 9 figures, BMVC 2025 submission
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[595]  arXiv:2512.20350 (cross-list from cs.LG) [pdf, ps, other]
Title: Field-Space Attention for Structure-Preserving Earth System Transformers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Mathematical Physics (math-ph)
[596]  arXiv:2512.20299 (cross-list from cs.RO) [pdf, ps, other]
Title: KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[597]  arXiv:2512.20249 (cross-list from cs.LG) [pdf, ps, other]
Title: Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion
Authors: Xuanyu Hu
Comments: 15 pages, 2 figures, 4 tables. Submitted to ICPR 2026
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[598]  arXiv:2512.20233 (cross-list from cs.LG) [pdf, ps, other]
Title: How I Met Your Bias: Investigating Bias Amplification in Diffusion Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[599]  arXiv:2512.20145 (cross-list from cs.CL) [pdf, ps, other]
Title: Retrieval-augmented Prompt Learning for Pre-trained Foundation Models
Comments: IEEE/ACM Transactions on Audio, Speech and Language Processing
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[600]  arXiv:2512.20129 (cross-list from cs.HC) [pdf, ps, other]
Title: Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs
Comments: CHI 2025, Project page: this https URL
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[601]  arXiv:2512.20056 (cross-list from cs.AI) [pdf, ps, other]
Title: Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[602]  arXiv:2512.19731 (cross-list from cs.LG) [pdf, ps, other]
Title: Exploring Deep-to-Shallow Transformable Neural Networks for Intelligent Embedded Systems
Comments: Accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[603]  arXiv:2512.18099 (cross-list from eess.AS) [pdf, ps, other]
Title: SAM Audio: Segment Anything in Audio
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV)
