Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 536

[ total of 759 entries: 1-100 | ... | 237-336 | 337-436 | 437-536 | 537-636 | 637-736 | 737-759 ]

Thu, 4 Dec 2025 (continued, showing last 82 of 130 entries)

[537]  arXiv:2512.03667 [pdf, ps, other]
Title: Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[538]  arXiv:2512.03666 [pdf, ps, other]
Title: ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos
Comments: 26 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[539]  arXiv:2512.03663 [pdf, ps, other]
Title: Multi-Scale Visual Prompting for Lightweight Small-Image Classification
Authors: Salim Khazem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540]  arXiv:2512.03643 [pdf, ps, other]
Title: Optical Context Compression Is Just (Bad) Autoencoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[541]  arXiv:2512.03640 [pdf, ps, other]
Title: MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms
Journal-ref: MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15521. Springer, Singapore
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[542]  arXiv:2512.03625 [pdf, ps, other]
Title: FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[543]  arXiv:2512.03621 [pdf, ps, other]
Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[544]  arXiv:2512.03619 [pdf, ps, other]
Title: LAMP: Language-Assisted Motion Planning for Controllable Video Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2512.03601 [pdf, ps, other]
Title: Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding
Comments: Accepted to NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[546]  arXiv:2512.03598 [pdf, ps, other]
Title: Memory-Guided Point Cloud Completion for Dental Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[547]  arXiv:2512.03597 [pdf, ps, other]
Title: HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ Segmentation
Comments: 6 pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[548]  arXiv:2512.03593 [pdf, ps, other]
Title: CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale Textures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[549]  arXiv:2512.03592 [pdf, ps, other]
Title: Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse Folding
Authors: Guang Yang, Lei Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[550]  arXiv:2512.03590 [pdf, ps, other]
Title: Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[551]  arXiv:2512.03580 [pdf, ps, other]
Title: Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[552]  arXiv:2512.03577 [pdf, ps, other]
Title: Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation Learning
Comments: 6 pages, 2 figures. Camera-ready version accepted for IEEE BIBM 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[553]  arXiv:2512.03575 [pdf, ps, other]
Title: UniComp: Rethinking Video Compression Through Informational Uniqueness
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[554]  arXiv:2512.03574 [pdf, ps, other]
Title: Global-Local Aware Scene Text Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[555]  arXiv:2512.03566 [pdf, ps, other]
Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models
Comments: Accepted by ACM MM Asia 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[556]  arXiv:2512.03558 [pdf, ps, other]
Title: CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding
Comments: Accepted at SIGSPATIAL 2025 (Best paper candidates), 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[557]  arXiv:2512.03553 [pdf, ps, other]
Title: Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching
Comments: Accepted at KDD 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[558]  arXiv:2512.03542 [pdf, ps, other]
Title: V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[559]  arXiv:2512.03540 [pdf, ps, other]
Title: CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
Comments: Accepted by ACM Multimedia 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[560]  arXiv:2512.03534 [pdf, ps, other]
Title: Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
Comments: Visualizations are available at the website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[561]  arXiv:2512.03532 [pdf, ps, other]
Title: OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[562]  arXiv:2512.03520 [pdf, ps, other]
Title: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[563]  arXiv:2512.03510 [pdf, ps, other]
Title: CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[564]  arXiv:2512.03509 [pdf, ps, other]
Title: AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[565]  arXiv:2512.03508 [pdf, ps, other]
Title: Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation
Comments: ICCV 2025 (poster)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[566]  arXiv:2512.03500 [pdf, ps, other]
Title: EEA: Exploration-Exploitation Agent for Long Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[567]  arXiv:2512.03499 [pdf, ps, other]
Title: NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[568]  arXiv:2512.03479 [pdf, ps, other]
Title: Towards Object-centric Understanding for Instructional Videos
Authors: Wenliang Guo, Yu Kong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569]  arXiv:2512.03477 [pdf, ps, other]
Title: Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis
Comments: 10 pages, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[570]  arXiv:2512.03474 [pdf, ps, other]
Title: Procedural Mistake Detection via Action Effect Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571]  arXiv:2512.03470 [pdf, ps, other]
Title: Difference Decomposition Networks for Infrared Small Target Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572]  arXiv:2512.03463 [pdf, ps, other]
Title: Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[573]  arXiv:2512.03454 [pdf, ps, other]
Title: Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[574]  arXiv:2512.03453 [pdf, ps, other]
Title: GeoVideo: Introducing Geometric Regularization into Video Generation Model
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575]  arXiv:2512.03451 [pdf, ps, other]
Title: GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[576]  arXiv:2512.03450 [pdf, ps, other]
Title: KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[577]  arXiv:2512.03449 [src]
Title: LM-CartSeg: Automated Segmentation of Lateral and Medial Cartilage and Subchondral Bone for Radiomics Analysis
Authors: Tongxu Zhang
Comments: The manuscript represents only a preliminary and substantially incomplete exploration. The author has decided not to stand by these results, and a thoroughly revised and significantly different version will be developed separately. Therefore this version is withdrawn and should not be cited.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[578]  arXiv:2512.03445 [pdf, ps, other]
Title: Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation
Comments: 10 pages. Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[579]  arXiv:2512.03430 [pdf, ps, other]
Title: Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features
Comments: Accepted to the ICML 2025 TerraBytes Workshop (June 9, 2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[580]  arXiv:2512.03427 [pdf, ps, other]
Title: Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[581]  arXiv:2512.03424 [pdf, ps, other]
Title: DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[582]  arXiv:2512.03418 [pdf, ps, other]
Title: YOLOA: Real-Time Affordance Detection via LLM Adapter
Comments: 13 pages, 9 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[583]  arXiv:2512.03405 [pdf, ps, other]
Title: ViDiC: Video Difference Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584]  arXiv:2512.03404 [pdf, ps, other]
Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[585]  arXiv:2512.03370 [pdf, ps, other]
Title: ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586]  arXiv:2512.03369 [pdf, ps, other]
Title: FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[587]  arXiv:2512.03359 [pdf, ps, other]
Title: A Hybrid Deep Learning Framework with Explainable AI for Lung Cancer Classification with DenseNet169 and SVM
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[588]  arXiv:2512.03350 [pdf, ps, other]
Title: SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[589]  arXiv:2512.03346 [pdf, ps, other]
Title: Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical Keratoconus
Comments: 16 pages, 7 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[590]  arXiv:2512.03345 [pdf, ps, other]
Title: HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[591]  arXiv:2512.03339 [pdf, ps, other]
Title: ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in Echocardiography
Comments: 11 pages, Accepted in IMIMIC Workshop at MICCAI 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[592]  arXiv:2512.03335 [pdf, ps, other]
Title: Step-by-step Layered Design Generation
Journal-ref: AAAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[593]  arXiv:2512.03317 [pdf, ps, other]
Title: NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction
Comments: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[594]  arXiv:2512.03284 [pdf, ps, other]
Title: SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[595]  arXiv:2512.03257 [pdf, ps, other]
Title: PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[596]  arXiv:2512.03247 [pdf, ps, other]
Title: PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement
Comments: Published in the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[597]  arXiv:2512.03245 [pdf, ps, other]
Title: 2-Shots in the Dark: Low-Light Denoising with Minimal Data Acquisition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[598]  arXiv:2512.03237 [pdf, ps, other]
Title: LLM-Guided Material Inference for 3D Point Clouds
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[599]  arXiv:2512.03233 [pdf, ps, other]
Title: Object Counting with GPT-4o and GPT-5: A Comparative Study
Comments: 5 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[600]  arXiv:2512.03210 [pdf, ps, other]
Title: Flux4D: Flow-based Unsupervised 4D Reconstruction
Comments: NeurIPS 2025. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[601]  arXiv:2512.03199 [pdf, ps, other]
Title: Does Head Pose Correction Improve Biometric Facial Recognition?
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[602]  arXiv:2512.03182 [pdf, ps, other]
Title: Drainage: A Unifying Framework for Addressing Class Uncertainty
Comments: 16 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[603]  arXiv:2512.03126 [pdf, ps, other]
Title: Hierarchical Process Reward Models are Symbolic Vision Learners
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[604]  arXiv:2512.04076 (cross-list from cs.GR) [pdf, ps, other]
Title: Radiance Meshes for Volumetric Reconstruction
Comments: Website: half-potato.gitlab.io/rm
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[605]  arXiv:2512.04032 (cross-list from cs.CL) [pdf, ps, other]
Title: Jina-VLM: Small Multilingual Vision Language Model
Comments: 18 pages, 1-7 main content, 13-18 appendix for tables and dataset
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[606]  arXiv:2512.03995 (cross-list from cs.RO) [pdf, ps, other]
Title: Artificial Microsaccade Compensation: Stable Vision for an Ornithopter
Comments: 29 pages, 5 figures, 2 tables, under review
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[607]  arXiv:2512.03962 (cross-list from eess.IV) [pdf, ps, other]
Title: Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image Reconstruction
Comments: 6 pages, 8 figures, 2025 Asilomar Conference on Signals, Systems, and Computers. Code is available at github.com/evanbell02/Tada-DIP/
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[608]  arXiv:2512.03656 (cross-list from cs.LG) [pdf, ps, other]
Title: Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy Forecasting
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[609]  arXiv:2512.03556 (cross-list from cs.RO) [pdf, ps, other]
Title: RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[610]  arXiv:2512.03522 (cross-list from cs.RO) [pdf, ps, other]
Title: MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization
Comments: Accepted in IEEE Robotics and Automation Letters (2025)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[611]  arXiv:2512.03514 (cross-list from cs.IR) [pdf, ps, other]
Title: M3DR: Towards Universal Multilingual Multimodal Document Retrieval
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[612]  arXiv:2512.03422 (cross-list from cs.RO) [pdf, ps, other]
Title: What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[613]  arXiv:2512.03216 (cross-list from physics.ins-det) [pdf, ps, other]
Title: Kaleidoscopic Scintillation Event Imaging
Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[614]  arXiv:2512.03173 (cross-list from cs.CY) [pdf, ps, other]
Title: Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
Journal-ref: AAAI 2026 Social Impact Track
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[615]  arXiv:2512.03166 (cross-list from cs.RO) [pdf, ps, other]
Title: Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[616]  arXiv:2512.03111 (cross-list from q-bio.GN) [pdf, ps, other]
Title: PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer
Comments: Accepted by AAAI 2026
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[617]  arXiv:2512.03054 (cross-list from cs.LG) [pdf, ps, other]
Title: Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided Research
Comments: 22 pages, 13 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Medical Physics (physics.med-ph)
[618]  arXiv:2512.03052 (cross-list from cs.GR) [pdf, ps, other]
Title: LATTICE: Democratize High-Fidelity 3D Generation at Scale
Comments: Technical Report
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Wed, 3 Dec 2025 (showing first 18 of 141 entries)

[619]  arXiv:2512.03046 [pdf, ps, other]
Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
Comments: Code and demo available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[620]  arXiv:2512.03045 [pdf, ps, other]
Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[621]  arXiv:2512.03043 [pdf, ps, other]
Title: OneThinker: All-in-one Reasoning Model for Image and Video
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[622]  arXiv:2512.03042 [pdf, ps, other]
Title: PPTArena: A Benchmark for Agentic PowerPoint Editing
Comments: Project webpage: this https URL GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[623]  arXiv:2512.03041 [pdf, ps, other]
Title: MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[624]  arXiv:2512.03040 [pdf, ps, other]
Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[625]  arXiv:2512.03036 [pdf, ps, other]
Title: ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[626]  arXiv:2512.03034 [pdf, ps, other]
Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation
Comments: Our project website is this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[627]  arXiv:2512.03020 [pdf, ps, other]
Title: Unrolled Networks are Conditional Probability Flows in MRI Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[628]  arXiv:2512.03018 [pdf, ps, other]
Title: AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry
Comments: Accepted to SIGGRAPH Asia 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[629]  arXiv:2512.03014 [pdf, ps, other]
Title: Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
Comments: NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[630]  arXiv:2512.03013 [pdf, ps, other]
Title: In-Context Sync-LoRA for Portrait Video Editing
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[631]  arXiv:2512.03010 [pdf, ps, other]
Title: SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
[632]  arXiv:2512.03004 [pdf, ps, other]
Title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[633]  arXiv:2512.03000 [pdf, ps, other]
Title: DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[634]  arXiv:2512.02993 [pdf, ps, other]
Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and Beyond
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[635]  arXiv:2512.02991 [pdf, ps, other]
Title: GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[636]  arXiv:2512.02982 [pdf, ps, other]
Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
Comments: Preprint; 19 pages, 7 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)