Computer Vision and Pattern Recognition
Authors and titles for recent submissions
Total of 749 entries
Wed, 10 Dec 2025
- [1] arXiv:2512.08931 [pdf, ps, other]
  - Title: Astra: General Interactive World Model with Autoregressive Denoising
  - Comments: Code is available at: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [2] arXiv:2512.08930 [pdf, ps, other]
  - Title: Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment
  - Authors: Youming Deng, Songyou Peng, Junyi Zhang, Kathryn Heal, Tiancheng Sun, John Flynn, Steve Marschner, Lucy Chai
  - Comments: Project Page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [3] arXiv:2512.08924 [pdf, ps, other]
  - Title: Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
  - Authors: Chuhan Zhang, Guillaume Le Moing, Skanda Koppula, Ignacio Rocco, Liliane Momeni, Junyu Xie, Shuyang Sun, Rahul Sukthankar, Joëlle K Barral, Raia Hadsell, Zoubin Ghahramani, Andrew Zisserman, Junlin Zhang, Mehdi SM Sajjadi
  - Comments: Project Page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [4] arXiv:2512.08922 [pdf, ps, other]
  - Title: Unified Diffusion Transformer for High-fidelity Text-Aware Image Restoration
  - Authors: Jin Hyeon Kim, Paul Hyunbin Cho, Claire Kim, Jaewon Min, Jaeeun Lee, Jihye Park, Yeji Choi, Seungryong Kim
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [5] arXiv:2512.08912 [pdf, ps, other]
  - Title: LiDAS: Lighting-driven Dynamic Active Sensing for Nighttime Perception
  - Comments: Preprint. 12 pages, 9 figures. Project page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [6] arXiv:2512.08905 [pdf, ps, other]
  - Title: Self-Evolving 3D Scene Generation from a Single Image
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [7] arXiv:2512.08897 [pdf, ps, other]
  - Title: UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [8] arXiv:2512.08889 [pdf, ps, other]
  - Title: No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
  - Comments: Project webpage: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [9] arXiv:2512.08888 [pdf, ps, other]
  - Title: Accelerated Rotation-Invariant Convolution for UAV Image Segmentation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [10] arXiv:2512.08881 [pdf, ps, other]
  - Title: SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [11] arXiv:2512.08873 [pdf, ps, other]
  - Title: Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
  - Comments: 6 pages
  - Journal-ref: 2024 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
- [12] arXiv:2512.08860 [pdf, ps, other]
  - Title: Tri-Bench: Stress-Testing VLM Reliability on Spatial Reasoning under Camera Tilt and Object Interference
  - Authors: Amit Bendkhale
  - Comments: 6 pages, 3 figures. Code and data: this https URL. Accepted to the AAAI 2026 Workshop on Trust and Control in Agentic AI (TrustAgent)
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [13] arXiv:2512.08854 [pdf, ps, other]
  - Title: Generation is Required for Data-Efficient Perception
  - Comments: Preprint
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [14] arXiv:2512.08829 [pdf, ps, other]
  - Title: InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
  - Comments: 16 pages, 8 figures, conference or other essential info
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [15] arXiv:2512.08820 [pdf, ps, other]
  - Title: Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning
  - Authors: Yi Zhang, Chun-Wun Cheng, Junyi He, Ke Yu, Yushun Tang, Carola-Bibiane Schönlieb, Zhihai He, Angelica I. Aviles-Rivero
  - Comments: Accepted in IEEE Transactions on Multimedia (TMM)
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [16] arXiv:2512.08789 [pdf, ps, other]
  - Title: MatteViT: High-Frequency-Aware Document Shadow Removal with Shadow Matte Guidance
  - Comments: 10 pages, 7 figures, 5 tables
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [17] arXiv:2512.08785 [pdf, ps, other]
  - Title: LoFA: Learning to Predict Personalized Priors for Fast Adaptation of Visual Generative Models
  - Comments: Project page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [18] arXiv:2512.08774 [pdf, ps, other]
  - Title: Refining Visual Artifacts in Diffusion Models via Explainable AI-based Flaw Activation Maps
  - Comments: 10 pages, 9 figures, 7 tables
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [19] arXiv:2512.08765 [pdf, ps, other]
  - Title: Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
  - Authors: Ruihang Chu, Yefei He, Zhekai Chen, Shiwei Zhang, Xiaogang Xu, Bin Xia, Dingdong Wang, Hongwei Yi, Xihui Liu, Hengshuang Zhao, Yu Liu, Yingya Zhang, Yujiu Yang
  - Comments: NeurIPS 2025. Code and data available at this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [20] arXiv:2512.08751 [pdf, ps, other]
  - Title: Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge Devices
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
- [21] arXiv:2512.08747 [pdf, ps, other]
  - Title: A Scalable Pipeline Combining Procedural 3D Graphics and Guided Diffusion for Photorealistic Synthetic Training Data Generation in White Button Mushroom Segmentation
  - Comments: 20 pages, 8 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [22] arXiv:2512.08738 [pdf, ps, other]
  - Title: Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture
  - Comments: To appear at AACL-IJCNLP 2025 Workshop WSLP
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [23] arXiv:2512.08733 [pdf, ps, other]
  - Title: Mitigating Individual Skin Tone Bias in Skin Lesion Classification through Distribution-Aware Reweighting
  - Authors: Kuniko Paxton, Zeinab Dehghani, Koorosh Aslansefat, Dhavalkumar Thakker, Yiannis Papadopoulos
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [24] arXiv:2512.08730 [pdf, ps, other]
  - Title: SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [25] arXiv:2512.08700 [pdf, ps, other]
  - Title: Scale-invariant and View-relational Representation Learning for Full Surround Monocular Depth
  - Authors: Kyumin Hwang, Wonhyeok Choi, Kiljoon Han, Wonjoon Choi, Minwoo Choi, Yongcheon Na, Minwoo Park, Sunghoon Im
  - Comments: Accepted at IEEE Robotics and Automation Letters (RA-L) 2026
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [26] arXiv:2512.08697 [pdf, ps, other]
  - Title: What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [27] arXiv:2512.08673 [pdf, ps, other]
  - Title: Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds
  - Comments: 16 pages, 6 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [28] arXiv:2512.08648 [pdf, ps, other]
  - Title: Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank
  - Authors: Shaofeng Zhang, Xuanqi Chen, Ning Liao, Haoxiang Zhao, Xiaoxing Wang, Haoru Tan, Sitong Wu, Xiaosong Jia, Qi Fan, Junchi Yan
  - Comments: 19 pages, 19 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [29] arXiv:2512.08647 [pdf, ps, other]
  - Title: C-DIRA: Computationally Efficient Dynamic ROI Routing and Domain-Invariant Adversarial Learning for Lightweight Driver Behavior Recognition
  - Authors: Keito Inoshita
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [30] arXiv:2512.08645 [pdf, ps, other]
  - Title: Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation
  - Comments: 19 pages, 13 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [31] arXiv:2512.08639 [pdf, ps, other]
  - Title: Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
  - Comments: Under Review, 12 pages, 9 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [32] arXiv:2512.08627 [pdf, ps, other]
  - Title: Trajectory Densification and Depth from Perspective-based Blur
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [33] arXiv:2512.08625 [pdf, ps, other]
  - Title: OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics
  - Comments: 8 pages, 4 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [34] arXiv:2512.08606 [pdf, ps, other]
  - Title: Decoupling Template Bias in CLIP: Harnessing Empty Prompts for Enhanced Few-Shot Learning
  - Comments: 14 pages, 8 figures, Association for the Advancement of Artificial Intelligence (AAAI2026, poster)
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [35] arXiv:2512.08589 [pdf, ps, other]
  - Title: Automated Pollen Recognition in Optical and Holographic Microscopy Images
  - Authors: Swarn Singh Warshaneyan, Maksims Ivanovs, Blaž Cugmas, Inese Bērziņa, Laura Goldberga, Mindaugas Tamosiunas, Roberts Kadiķis
  - Comments: 8 pages, 10 figures, 4 tables, 20 references. Date of Conference: 13-14 June 2025. Date Added to IEEE Xplore: 10 July 2025. Electronic ISBN: 979-8-3315-0969-9. Print on Demand (PoD) ISBN: 979-8-3315-0970-5. DOI: 10.1109/AICCONF64766.2025.11064260. Conference Location: Prague, Czech Republic. Online Access: this https URL
  - Journal-ref: 2025 3rd Cognitive Models and Artificial Intelligence Conference (AICCONF), vol. 1, no. 1, pp. 1-8, Prague, Czech Republic, IEEE, 2025
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [36] arXiv:2512.08577 [pdf, ps, other]
  - Title: Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [37] arXiv:2512.08572 [pdf, ps, other]
  - Title: From Cells to Survival: Hierarchical Analysis of Cell Inter-Relations in Multiplex Microscopy for Lung Cancer Prognosis
  - Authors: Olle Edgren Schüllerqvist, Jens Baumann, Joakim Lindblad, Love Nordling, Artur Mezheyeuski, Patrick Micke, Nataša Sladoje
  - Comments: 5 pages, 3 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [38] arXiv:2512.08569 [pdf, ps, other]
  - Title: Instance-Aware Test-Time Segmentation for Continual Domain Shifts
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [39] arXiv:2512.08564 [pdf, ps, other]
  - Title: Modular Neural Image Signal Processing
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [40] arXiv:2512.08560 [pdf, ps, other]
  - Title: BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain
  - Authors: Navve Wasserman, Matias Cosarinsky, Yuval Golbari, Aude Oliva, Antonio Torralba, Tamar Rott Shaham, Michal Irani
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [41] arXiv:2512.08557 [pdf, ps, other]
  - Title: SSCATeR: Sparse Scatter-Based Convolution Algorithm with Temporal Data Recycling for Real-Time 3D Object Detection in LiDAR Point Clouds
  - Comments: 22 Pages, 26 Figures, This work has been submitted to the IEEE Sensors Journal for possible publication
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [42] arXiv:2512.08547 [pdf, ps, other]
  - Title: An Iteration-Free Fixed-Point Estimator for Diffusion Inversion
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [43] arXiv:2512.08542 [pdf, ps, other]
  - Title: A Novel Wasserstein Quaternion Generative Adversarial Network for Color Image Generation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
- [44] arXiv:2512.08537 [pdf, ps, other]
  - Title: Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [45] arXiv:2512.08535 [pdf, ps, other]
  - Title: Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [46] arXiv:2512.08534 [pdf, ps, other]
  - Title: PaintFlow: A Unified Framework for Interactive Oil Paintings Editing and Generation
  - Comments: 14 pages
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [47] arXiv:2512.08529 [pdf, ps, other]
  - Title: MVP: Multiple View Prediction Improves GUI Grounding
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [48] arXiv:2512.08524 [pdf, ps, other]
  - Title: Beyond Real Weights: Hypercomplex Representations for Stable Quantization
  - Authors: Jawad Ibn Ahad, Maisha Rahman, Amrijit Biswas, Muhammad Rafsan Kabir, Robin Krambroeckers, Sifat Momen, Nabeel Mohammed, Shafin Rahman
  - Comments: Accepted in Winter Conference on Applications of Computer Vision (WACV) 2026
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [49] arXiv:2512.08511 [pdf, ps, other]
  - Title: Thinking with Images via Self-Calling Agent
  - Comments: Code will be released at this https URL soon
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [50] arXiv:2512.08506 [pdf, ps, other]
  - Title: OCCDiff: Occupancy Diffusion Model for High-Fidelity 3D Building Reconstruction from Noisy Point Clouds
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [51] arXiv:2512.08505 [pdf, ps, other]
  - Title: Beyond the Noise: Aligning Prompts with Latent Representations in Diffusion Models
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [52] arXiv:2512.08503 [pdf, ps, other]
  - Title: Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [53] arXiv:2512.08498 [pdf, ps, other]
  - Title: On-the-fly Large-scale 3D Reconstruction from Multi-Camera Rigs
  - Authors: Yijia Guo, Tong Hu, Zhiwei Li, Liwen Hu, Keming Qian, Xitong Lin, Shengbo Chen, Tiejun Huang, Lei Ma
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [54] arXiv:2512.08486 [pdf, ps, other]
  - Title: Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions
  - Comments: Code is available at: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [55] arXiv:2512.08478 [pdf, ps, other]
  - Title: Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
  - Authors: Yuning Gong, Yifei Liu, Yifan Zhan, Muyao Niu, Xueying Li, Yuanjun Liao, Jiaming Chen, Yuanyuan Gao, Jiaqi Chen, Minming Chen, Li Zhou, Yuning Zhang, Wei Wang, Xiaoqing Hou, Huaxi Huang, Shixiang Tang, Le Ma, Dingwen Zhang, Xue Yang, Junchi Yan, Yanchi Zhang, Yinqiang Zheng, Xiao Sun, Zhihang Zhong
  - Comments: Project page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [56] arXiv:2512.08477 [pdf, ps, other]
  - Title: ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention
  - Authors: Huiguo He, Pengyu Yan, Ziqi Yi, Weizhi Zhong, Zheng Liu, Yejun Tang, Huan Yang, Kun Gai, Guanbin Li, Lianwen Jin
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [57] arXiv:2512.08467 [pdf, ps, other]
  - Title: Team-Aware Football Player Tracking with SAM: An Appearance-Based Approach to Occlusion Recovery
  - Comments: 8 pages, 5 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [58] arXiv:2512.08445 [pdf, ps, other]
  - Title: Uncertainty-Aware Subset Selection for Robust Visual Explainability under Distribution Shifts
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [59] arXiv:2512.08441 [pdf, ps, other]
  - Title: Leveraging Multispectral Sensors for Color Correction in Mobile Cameras
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [60] arXiv:2512.08439 [pdf, ps, other]
  - Title: LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [61] arXiv:2512.08430 [pdf, ps, other]
  - Title: SDT-6D: Fully Sparse Depth-Transformer for Staged End-to-End 6D Pose Estimation in Industrial Multi-View Bin Picking
  - Comments: Accepted to WACV 2026. Preprint version
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [62] arXiv:2512.08410 [pdf, ps, other]
  - Title: Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval
  - Authors: Tao Chen, Shaobo Ju, Qiong Wu, Chenxin Fang, Kun Zhang, Jun Peng, Hui Li, Yiyi Zhou, Rongrong Ji
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [63] arXiv:2512.08406 [pdf, ps, other]
  - Title: SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [64] arXiv:2512.08400 [pdf, ps, other]
  - Title: Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries
  - Comments: The paper has been accepted for publication at Northern Lights Deep Learning (NLDL) Conference 2025
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [65] arXiv:2512.08397 [pdf, ps, other]
  - Title: Detection of Digital Facial Retouching utilizing Face Beauty Information
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [66] arXiv:2512.08378 [pdf, ps, other]
  - Title: Simultaneous Enhancement and Noise Suppression under Complex Illumination Conditions
  - Comments: The paper has been accepted and officially published by IEEE Transactions on Instrumentation and Measurement
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [67] arXiv:2512.08374 [pdf, ps, other]
  - Title: The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
  - Authors: Bozhou Li, Xinda Xue, Sihan Yang, Yang Shi, Xinlong Chen, Yushuo Guan, Yuanxing Zhang, Wentao Zhang
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [68] arXiv:2512.08362 [pdf, ps, other]
  - Title: SCU-CGAN: Enhancing Fire Detection through Synthetic Fire Image Generation and Dataset Augmentation
  - Comments: Accepted for main track at MobiSec 2024 (not published in the proceedings)
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [69] arXiv:2512.08358 [pdf, ps, other]
  - Title: TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
  - Authors: Jiahao Lu, Weitao Xiong, Jiacheng Deng, Peng Li, Tianyu Huang, Zhiyang Dou, Cheng Lin, Sai-Kit Yeung, Yuan Liu
  - Comments: Accepted by NeurIPS 2025. Project Page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [70] arXiv:2512.08337 [pdf, ps, other]
  - Title: DINO-BOLDNet: A DINOv3-Guided Multi-Slice Attention Network for T1-to-BOLD Generation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [71] arXiv:2512.08334 [pdf, ps, other]
  - Title: HybridSplat: Fast Reflection-baked Gaussian Tracing using Hybrid Splatting
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [72] arXiv:2512.08331 [pdf, ps, other]
  - Title: Bi^2MAC: Bimodal Bi-Adaptive Mask-Aware Convolution for Remote Sensing Pansharpening
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [73] arXiv:2512.08330 [pdf, ps, other]
  - Title: PointDico: Contrastive 3D Representation Learning Guided by Diffusion Models
  - Comments: Accepted by IJCNN 2025
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [74] arXiv:2512.08329 [pdf, ps, other]
  - Title: Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models
  - Comments: 32 pages, 17 figures, 1 table, 5 algorithms, preprint
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [75] arXiv:2512.08327 [pdf, ps, other]
  - Title: Low Rank Support Quaternion Matrix Machine
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [76] arXiv:2512.08325 [pdf, ps, other]
  - Title: GeoDiffMM: Geometry-Guided Conditional Diffusion for Motion Magnification
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [77] arXiv:2512.08323 [pdf, ps, other]
  - Title: Detecting Dental Landmarks from Intraoral 3D Scans: the 3DTeethLand challenge
  - Authors: Achraf Ben-Hamadou, Nour Neifar, Ahmed Rekik, Oussama Smaoui, Firas Bouzguenda, Sergi Pujades, Niels van Nistelrooij, Shankeeth Vinayahalingam, Kaibo Shi, Hairong Jin, Youyi Zheng, Tibor Kubík, Oldřich Kodym, Petr Šilling, Kateřina Trávníčková, Tomáš Mojžiš, Jan Matula, Jeffry Hartanto, Xiaoying Zhu, Kim-Ngan Nguyen, Tudor Dascalu, Huikai Wu, Weijie Liu, Shaojie Zhuang, Guangshun Wei, Yuanfeng Zhou
  - Comments: MICCAI 2024, 3DTeethLand, Challenge report, under review
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [78] arXiv:2512.08317 [pdf, ps, other]
  - Title: GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [79] arXiv:2512.08309 [pdf, ps, other]
  - Title: Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise in Infinite, Real-Time Terrain Generation
  - Authors: Alexander Goslin
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
- [80] arXiv:2512.08294 [pdf, ps, other]
  - Title: OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation
  - Authors: Yexin Liu, Manyuan Zhang, Yueze Wang, Hongyu Li, Dian Zheng, Weiming Zhang, Changsheng Lu, Xunliang Cai, Yan Feng, Peng Pei, Harry Yang
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [81] arXiv:2512.08282 [pdf, ps, other]
  - Title: PAVAS: Physics-Aware Video-to-Audio Synthesis
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
- [82] arXiv:2512.08269 [pdf, ps, other]
  - Title: EgoX: Egocentric Video Generation from a Single Exocentric Video
  - Comments: 21 pages, project page: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [83] arXiv:2512.08262 [pdf, ps, other]
  - Title: RLCNet: An end-to-end deep learning framework for simultaneous online calibration of LiDAR, RADAR, and Camera
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [84] arXiv:2512.08254 [pdf, ps, other]
  - Title: SFP: Real-World Scene Recovery Using Spatial and Frequency Priors
  - Comments: 10 pages, 13 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [85] arXiv:2512.08253 [pdf, ps, other]
  - Title: Query-aware Hub Prototype Learning for Few-Shot 3D Point Cloud Semantic Segmentation
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [86] arXiv:2512.08247 [pdf, ps, other]
  - Title: Distilling Future Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection
  - Comments: AAAI-26
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [87] arXiv:2512.08243 [pdf, ps, other]
  - Title: Residual-SwinCA-Net: A Channel-Aware Integrated Residual CNN-Swin Transformer for Malignant Lesion Segmentation in BUSI
  - Authors: Saeeda Naz, Saddam Hussain Khan (Artificial Intelligence Lab, Department of Computer Systems Engineering, University of Engineering and Applied Sciences (UEAS), Swat, Pakistan)
  - Comments: 26 Pages, 10 Figures, 4 Tables
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [88] arXiv:2512.08240 [pdf, ps, other]
  - Title: HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
  - Authors: Jusheng Zhang, Xiaoyang Guo, Kaitong Cai, Qinhan Lv, Yijia Fan, Wenhao Chai, Jian Wang, Keze Wang
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [89] arXiv:2512.08237 [pdf, ps, other]
  - Title: FastBEV++: Fast by Algorithm, Deployable by Design
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [90] arXiv:2512.08229 [pdf, ps, other]
  - Title: Geometry-Aware Sparse Depth Sampling for High-Fidelity RGB-D Depth Completion in Robotic Systems
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [91] arXiv:2512.08228 [pdf, ps, other]
  - Title: MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
  - Authors: Jusheng Zhang, Kaitong Cai, Xiaoyang Guo, Sidi Liu, Qinhan Lv, Ruiqi Chen, Jing Yang, Yijia Fan, Xiaofei Sun, Jian Wang, Ziliang Chen, Liang Lin, Keze Wang
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [92] arXiv:2512.08227 [pdf, ps, other]
  - Title: New VVC profiles targeting Feature Coding for Machines
  - Comments: Accepted for presentation at ICIP 2025 workshop on Coding for Machines
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [93] arXiv:2512.08223 [pdf, ps, other]
  - Title: SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [94] arXiv:2512.08221 [pdf, ps, other]
  - Title: VisKnow: Constructing Visual Knowledge Base for Object Understanding
  - Comments: 16 pages, 12 figures, 7 tables. Under review
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [95] arXiv:2512.08215 [pdf, ps, other]
  - Title: Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [96] arXiv:2512.08198 [pdf, ps, other]
  - Title: Animal Re-Identification on Microcontrollers
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [97] arXiv:2512.08180 [pdf, ps, other]
  - Title: GeoLoom: High-quality Geometric Diagram Generation from Textual Input
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [98] arXiv:2512.08163 [pdf, ps, other]
  - Title: Accuracy Does Not Guarantee Human-Likeness in Monocular Depth Estimators
  - Comments: 22 pages, 12 figures, 1 table
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [99] arXiv:2512.08161 [pdf, ps, other]
  - Title: Fourier-RWKV: A Multi-State Perception Network for Efficient Image Dehazing
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [100] arXiv:2512.08135 [pdf, ps, other]
  - Title: CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning
  - Comments: Accepted to WACV 2026
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [101] arXiv:2512.08075 [pdf, ps, other]
  - Title: Identification of Deforestation Areas in the Amazon Rainforest Using Change Detection Models
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [102] arXiv:2512.08048 [pdf, ps, other]
  - Title: Mask to Adapt: Simple Random Masking Enables Robust Continual Test-Time Learning
  - Authors: Chandler Timm C. Doloriel
  - Comments: ongoing work
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [103] arXiv:2512.08042 [pdf, ps, other]
  - Title: Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking
  - Authors: Chandler Timm C. Doloriel, Habib Ullah, Kristian Hovde Liland, Fadi Al Machot, Ngai-Man Cheung
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [104] arXiv:2512.08040 [pdf, ps, other]
  - Title: Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [105] arXiv:2512.08038 [pdf, ps, other]
  - Title: SSplain: Sparse and Smooth Explainer for Retinopathy of Prematurity Classification
  - Authors: Elifnur Sunger, Tales Imbiriba, Peter Campbell, Deniz Erdogmus, Stratis Ioannidis, Jennifer Dy
  - Comments: 20 pages, 16 figures
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [106] arXiv:2512.08016 [pdf, ps, other]
  - Title: FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
  - Authors: Jiyoon Pyo, Yuankun Jiao, Dongwon Jung, Zekun Li, Leeje Jang, Sofia Kirsanova, Jina Kim, Yijun Lin, Qin Liu, Junyi Xie, Hadi Askari, Nan Xu, Muhao Chen, Yao-Yi Chiang
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [107] arXiv:2512.07984 [pdf, ps, other]
  - Title: Restrictive Hierarchical Semantic Segmentation for Stratified Tooth Layer Detection
  - Comments: 13 pages, 7 figures, 3 tables
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [108] arXiv:2512.07951 [pdf, ps, other]
  - Title: Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality
  - Authors: Zekai Luo, Zongze Du, Zhouhang Zhu, Hao Zhong, Muzhi Zhu, Wen Wang, Yuling Xi, Chenchen Jing, Hao Chen, Chunhua Shen
  - Comments: Project webpage: this https URL
  - Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [109] arXiv:2512.07925 [pdf, ps, other]
  - Title: Near-real time fires detection using satellite imagery in Sudan conflict
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [110] arXiv:2512.07838 [pdf, ps, other]
  - Title: Detection of Cyberbullying in GIF using AI
  - Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
- [111] arXiv:2512.08715 (cross-list from cs.PF) [pdf, ps, other]
  - Title: Multi-domain performance analysis with scores tailored to user preferences
  - Subjects: Performance (cs.PF); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [112] arXiv:2512.08629 (cross-list from cs.AI) [pdf, ps, other]
  - Title: See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm
  - Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [113] arXiv:2512.08545 (cross-list from cs.CL) [pdf, ps, other]
  - Title: Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks
  - Comments: 22 pages, 2 tables, 9 figures
  - Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
- [114] arXiv:2512.08500 (cross-list from cs.GR) [pdf, ps, other]
  - Title: Learning to Control Physically-simulated 3D Characters via Generating and Mimicking 2D Motions
  - Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [115] arXiv:2512.08360 (cross-list from cs.NE) [pdf, ps, other]
  - Title: Conditional Morphogenesis: Emergent Generation of Structural Digits via Neural Cellular Automata
  - Authors: Ali Sakour
  - Comments: 13 pages, 5 figures. Code available at: this https URL
  - Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [116] arXiv:2512.08284 (cross-list from physics.geo-ph) [pdf, ps, other]
  - Title: Self-Reinforced Deep Priors for Reparameterized Full Waveform Inversion
  - Comments: Submitted to GEOPHYSICS
  - Subjects: Geophysics (physics.geo-ph); Computer Vision and Pattern Recognition (cs.CV)
- [117] arXiv:2512.08271 (cross-list from cs.RO) [pdf, ps, other]
  - Title: Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation
  - Comments: Published and Presented at 3rd Workshop on Human-Centric Multilateral Teleoperation in ICRA 2025
  - Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [118] arXiv:2512.08216 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentationSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [119] arXiv:2512.08188 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World ModelAuthors: Wenjiang Xu, Cindy Wang, Rui Fang, Mingkang Zhang, Lusong Li, Jing Xu, Jiayuan Gu, Zecui Zeng, Rui ChenComments: Website at this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [120] arXiv:2512.08170 (cross-list from cs.RO) [pdf, ps, other]
-
Title: RAVES-Calib: Robust, Accurate and Versatile Extrinsic Self Calibration Using Optimal Geometric FeaturesSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [121] arXiv:2512.08153 (cross-list from cs.LG) [pdf, ps, other]
-
Title: TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [122] arXiv:2512.08125 (cross-list from eess.IV) [pdf, ps, other]
-
Title: FlowSteer: Conditioning Flow Field for Consistent Image RestorationSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [123] arXiv:2512.08099 (cross-list from math.NA) [pdf, ps, other]
-
Title: Generalizations of the Normalized Radon Cumulative Distribution Transform for Limited Data RecognitionSubjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
- [124] arXiv:2512.08029 (cross-list from cs.LG) [pdf, ps, other]
-
Title: CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent SpaceSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [125] arXiv:2512.07998 (cross-list from cs.RO) [pdf, ps, other]
-
Title: DIJIT: A Robotic Head for an Active ObserverAuthors: Mostafa Kamali Tabrizi, Mingshi Chi, Bir Bikram Dey, Yu Qing Yuan, Markus D. Solbach, Yiqian Liu, Michael Jenkin, John K. TsotsosSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [126] arXiv:2512.07981 (cross-list from cs.LG) [pdf, ps, other]
-
Title: CIP-Net: Continual Interpretable Prototype-based NetworkSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [127] arXiv:2512.07976 (cross-list from cs.RO) [pdf, ps, other]
-
Title: VLD: Visual Language Goal Distance for Reinforcement Learning NavigationSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [128] arXiv:2512.07969 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Sparse Variable Projection in Robotic Perception: Exploiting Separable Structure for Efficient Nonlinear OptimizationComments: 8 pages, submitted for reviewSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [129] arXiv:2512.07884 (cross-list from cs.LG) [pdf, ps, other]
-
Title: GSPN-2: Efficient Parallel Sequence ModelingAuthors: Hongjun Wang, Yitong Jiang, Collin McCarthy, David Wehr, Hanrong Ye, Xinhao Li, Ka Chun Cheung, Wonmin Byeon, Jinwei Gu, Ke Chen, Kai Han, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Sifei LiuComments: NeurIPS 2025Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [130] arXiv:2512.07855 (cross-list from cs.LG) [pdf, ps, other]
-
Title: LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer ModelSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [131] arXiv:2512.05791 (cross-list from physics.med-ph) [pdf, ps, other]
-
Title: Fast and Robust Diffusion Posterior Sampling for MR Image Reconstruction Using the Preconditioned Unadjusted Langevin AlgorithmComments: Submitted to Magnetic Resonance in MedicineSubjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Probability (math.PR)
Tue, 9 Dec 2025
- [132] arXiv:2512.07834 [pdf, ps, other]
-
Title: Voxify3D: Pixel Art Meets Volumetric RenderingComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [133] arXiv:2512.07833 [pdf, ps, other]
-
Title: Relational Visual SimilarityAuthors: Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, Yuheng LiComments: Project page, data, and code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [134] arXiv:2512.07831 [pdf, ps, other]
-
Title: UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video GenerationAuthors: Jiehui Huang, Yuechen Zhang, Xu He, Yuan Gao, Zhi Cen, Bin Xia, Yan Zhou, Xin Tao, Pengfei Wan, Jiaya JiaComments: Project Website this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [135] arXiv:2512.07829 [pdf, ps, other]
-
Title: One Layer Is Enough: Adapting Pretrained Visual Encoders for Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [136] arXiv:2512.07826 [pdf, ps, other]
-
Title: OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video EditingAuthors: Haoyang He, Jie Wang, Jiangning Zhang, Zhucun Xue, Xingyuan Bu, Qiangpeng Yang, Shilei Wen, Lei XieComments: 38 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [137] arXiv:2512.07821 [pdf, ps, other]
-
Title: WorldReel: 4D Video Generation with Consistent Geometry and Motion ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [138] arXiv:2512.07807 [pdf, ps, other]
-
Title: Lang3D-XL: Language Embedded 3D Gaussians for Large-scale ScenesComments: Accepted to SIGGRAPH Asia 2025. Project webpage: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [139] arXiv:2512.07806 [pdf, ps, other]
-
Title: Multi-view Pyramid Transformer: Look Coarser to See BroaderComments: Project page: see this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [140] arXiv:2512.07802 [pdf, ps, other]
-
Title: OneStory: Coherent Multi-Shot Video Generation with Adaptive MemoryAuthors: Zhaochong An, Menglin Jia, Haonan Qiu, Zijian Zhou, Xiaoke Huang, Zhiheng Liu, Weiming Ren, Kumara Kahatapitiya, Ding Liu, Sen He, Chenyang Zhang, Tao Xiang, Fanny Yang, Serge Belongie, Tian XieComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [141] arXiv:2512.07778 [pdf, ps, other]
-
Title: Distribution Matching Variational AutoEncoderSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [142] arXiv:2512.07776 [pdf, ps, other]
-
Title: GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population MonitoringAuthors: Maximilian Schall, Felix Leonard Knöfel, Noah Elias König, Jan Jonas Kubeler, Maximilian von Klinski, Joan Wilhelm Linnemann, Xiaoshi Liu, Iven Jelle Schlegelmilch, Ole Woyciniuk, Alexandra Schild, Dante Wasmuht, Magdalena Bermejo Espinet, German Illera Basas, Gerard de MeloComments: Accepted at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [143] arXiv:2512.07760 [pdf, ps, other]
-
Title: Modality-Aware Bias Mitigation and Invariance Learning for Unsupervised Visible-Infrared Person Re-IdentificationComments: Accepted to AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [144] arXiv:2512.07756 [pdf, ps, other]
-
Title: UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [145] arXiv:2512.07747 [pdf, ps, other]
-
Title: Unison: A Fully Automatic, Task-Universal, and Low-Cost Framework for Unified Understanding and GenerationAuthors: Shihao Zhao, Yitong Chen, Zeyinzi Jiang, Bojia Zi, Shaozhe Hao, Yu Liu, Chaojie Mao, Kwan-Yee K. WongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [146] arXiv:2512.07745 [pdf, ps, other]
-
Title: DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous DrivingAuthors: Jialv Zou, Shaoyu Chen, Bencheng Liao, Zhiyu Zheng, Yuehao Song, Lefei Zhang, Qian Zhang, Wenyu Liu, Xinggang WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [147] arXiv:2512.07738 [pdf, ps, other]
-
Title: HLTCOE Evaluation Team at TREC 2025: VQA TrackAuthors: Dengjia Zhang, Charles Weng, Katherine Guerrerio, Yi Lu, Kenton Murray, Alexander Martin, Reno Kriz, Benjamin Van DurmeComments: 7 pages, 1 figureSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [148] arXiv:2512.07733 [pdf, ps, other]
-
Title: SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental ImagerySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [149] arXiv:2512.07730 [pdf, ps, other]
-
Title: SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object HallucinationComments: WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [150] arXiv:2512.07729 [pdf, ps, other]
-
Title: Improving action classification with brain-inspired deep networksSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [151] arXiv:2512.07720 [pdf, ps, other]
-
Title: ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar CreationAuthors: Fan Yang, Heyuan Li, Peihao Li, Weihao Yuan, Lingteng Qiu, Chaoyue Song, Cheng Chen, Yisheng He, Shifeng Zhang, Xiaoguang Han, Steven Hoi, Guosheng LinComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [152] arXiv:2512.07712 [pdf, ps, other]
-
Title: UnCageNet: Tracking and Pose Estimation of Caged AnimalComments: 9 pages, 2 figures, 2 tables. Accepted to the Indian Conference on Computer Vision, Graphics, and Image Processing (ICVGIP 2025), Mandi, IndiaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [153] arXiv:2512.07703 [pdf, ps, other]
-
Title: PVeRA: Probabilistic Vector-Based Random Matrix AdaptationAuthors: Leo Fillioux, Enzo Ferrante, Paul-Henry Cournède, Maria Vakalopoulou, Stergios ChristodoulidisSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [154] arXiv:2512.07702 [pdf, ps, other]
-
Title: Guiding What Not to Generate: Automated Negative Prompting for Text-Image AlignmentComments: WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [155] arXiv:2512.07698 [pdf, ps, other]
-
Title: sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data OnlySubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [156] arXiv:2512.07674 [pdf, ps, other]
-
Title: DIST-CLIP: Arbitrary Metadata and Image Guided MRI Harmonization via Disentangled Anatomy-Contrast RepresentationsAuthors: Mehmet Yigit Avci, Pedro Borges, Virginia Fernandez, Paul Wright, Mehmet Yigitsoy, Sebastien Ourselin, Jorge CardosoSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [157] arXiv:2512.07668 [pdf, ps, other]
-
Title: EgoCampus: Egocentric Pedestrian Eye Gaze Model and DatasetSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [158] arXiv:2512.07661 [pdf, ps, other]
-
Title: Optimization-Guided Diffusion for Interactive Scene GenerationAuthors: Shiaho Li, Naisheng Ye, Tianyu Li, Kashyap Chitta, Tuo An, Peng Su, Boyang Wang, Haiou Liu, Chen Lv, Hongyang LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [159] arXiv:2512.07652 [pdf, ps, other]
-
Title: An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific ResearchSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [160] arXiv:2512.07651 [pdf, ps, other]
-
Title: Liver Fibrosis Quantification and Analysis: The LiQA Dataset and Baseline MethodSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [161] arXiv:2512.07628 [pdf, ps, other]
-
Title: MoCA: Mixture-of-Components Attention for Scalable Compositional 3D GenerationAuthors: Zhiqi Li, Wenhuan Li, Tengfei Wang, Zhenwei Wang, Junta Wu, Haoyuan Wang, Yunhan Yang, Zehuan Huang, Yang Li, Peidong Liu, Chunchao GuoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [162] arXiv:2512.07606 [pdf, ps, other]
-
Title: Decomposition Sampling for Efficient Region Annotations in Active LearningAuthors: Jingna Qiu, Frauke Wilm, Mathias Öttl, Jonas Utz, Maja Schlereth, Moritz Schillinger, Marc Aubreville, Katharina BreiningerSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [163] arXiv:2512.07599 [pdf, ps, other]
-
Title: Online Segment Any 3D Thing as Instance TrackingComments: NeurIPS 2025, Code is at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [164] arXiv:2512.07596 [pdf, ps, other]
-
Title: More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic SurgeryAuthors: Wenzhen Dong, Jieming Yu, Yiming Huang, Hongqiu Wang, Lei Zhu, Albert C. S. Chung, Hongliang Ren, Long BaiComments: Technical ReportSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [165] arXiv:2512.07590 [pdf, ps, other]
-
Title: Robust Variational Model Based Tailored UNet: Leveraging Edge Detector and Mean Curvature for Improved Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [166] arXiv:2512.07584 [pdf, ps, other]
-
Title: LongCat-Image Technical ReportAuthors: Meituan LongCat Team: Hanghang Ma, Haoxian Tan, Jiale Huang, Junqiang Wu, Jun-Yan He, Lishuai Gao, Songlin Xiao, Xiaoming Wei, Xiaoqi Ma, Xunliang Cai, Yayong Guan, Jie HuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [167] arXiv:2512.07580 [pdf, ps, other]
-
Title: All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMsAuthors: Yahong Wang, Juncheng Wu, Zhangkai Ni, Longzhen Yang, Yihang Liu, Chengmei Yang, Ying Wen, Xianfeng Tang, Hui Liu, Yuyin Zhou, Lianghua HeSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [168] arXiv:2512.07568 [pdf, ps, other]
-
Title: Dual-Stream Cross-Modal Representation Learning via Residual Semantic DecorrelationAuthors: Xuecheng Li, Weikuan Jia, Alisher Kurbonaliev, Qurbonaliev Alisher, Khudzhamkulov Rustam, Ismoilov Shuhratjon, Eshmatov Javhariddin, Yuanjie ZhengSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
- [169] arXiv:2512.07564 [pdf, ps, other]
-
Title: Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language ModelsComments: 24 pages, 3 figures, 2 tables. Training-free self-correction framework for vision-language models. Code and implementation details will be released at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [170] arXiv:2512.07527 [pdf, ps, other]
-
Title: From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite ImagesAuthors: Fei Yu, Yu Liu, Luyang Tang, Mingchao Sun, Zengye Ge, Rui Bu, Yuchao Jin, Haisen Zhao, He Sun, Yangyan Li, Mu Xu, Wenzheng Chen, Baoquan ChenSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [171] arXiv:2512.07514 [pdf, ps, other]
-
Title: MeshRipple: Structured Autoregressive Generation of Artist-MeshesAuthors: Junkai Lin, Hang Long, Huipeng Guo, Jielei Zhang, JiaYi Yang, Tianle Guo, Yang Yang, Jianwen Li, Wenxiao Zhang, Matthias Nießner, Wei YangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [172] arXiv:2512.07504 [pdf, ps, other]
-
Title: ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing PointsComments: Accepted to WACV 2026, 8 pages, supplementary included. Dataset and code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [173] arXiv:2512.07503 [pdf, ps, other]
-
Title: SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image GenerationAuthors: Yao Teng, Zhihuan Jiang, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [174] arXiv:2512.07500 [pdf, ps, other]
-
Title: MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion TransformerSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [175] arXiv:2512.07498 [pdf, ps, other]
-
Title: Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral PriorComments: 16 pages (including appendix)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [176] arXiv:2512.07480 [pdf, ps, other]
-
Title: Single-step Diffusion-based Video Coding with Semantic-Temporal GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [177] arXiv:2512.07469 [pdf, ps, other]
-
Title: Unified Video Editing with Temporal ReasonerComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [178] arXiv:2512.07426 [pdf, ps, other]
-
Title: When normalization hallucinates: unseen risks in AI-powered whole slide image processingAuthors: Karel Moens, Matthew B. Blaschko, Tinne Tuytelaars, Bart Diricx, Jonas De Vylder, Mustafa YousifComments: 4 pages, accepted for oral presentation at SPIE Medical Imaging, 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [179] arXiv:2512.07415 [pdf, ps, other]
-
Title: Data-driven Exploration of Mobility Interaction PatternsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [180] arXiv:2512.07410 [pdf, ps, other]
-
Title: InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction GraphsAuthors: Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Lan Xu, Jingyi Yu, Jingya WangComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [181] arXiv:2512.07394 [pdf, ps, other]
-
Title: Reconstructing Objects along Hand Interaction Timelines in Egocentric VideoComments: webpage: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [182] arXiv:2512.07391 [pdf, ps, other]
-
Title: GlimmerNet: A Lightweight Grouped Dilated Depthwise Convolutions for UAV-Based Emergency MonitoringAuthors: Đorđe NedeljkovićSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [183] arXiv:2512.07385 [pdf, ps, other]
-
Title: How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New BaselineAuthors: Chunhui Zhang, Li Liu, Zhipeng Zhang, Yong Wang, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng WangComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [184] arXiv:2512.07383 [pdf, ps, other]
-
Title: LogicCBMs: Logic-Enhanced Concept-Based LearningComments: 18 pages, 19 figures, WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [185] arXiv:2512.07381 [pdf, ps, other]
-
Title: Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic ObjectsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [186] arXiv:2512.07379 [pdf, ps, other]
-
Title: Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and EfficiencyComments: 22 pages, 16 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [187] arXiv:2512.07360 [pdf, ps, other]
-
Title: Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic SegmentationComments: Accepted to WACV2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [188] arXiv:2512.07351 [pdf, ps, other]
-
Title: DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake DetectionAuthors: Sayeem Been Zaman, Wasimul Karim, Arefin Ittesafun Abian, Reem E. Mohamed, Md Rafiqul Islam, Asif Karim, Sami AzamSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD)
- [189] arXiv:2512.07348 [pdf, ps, other]
-
Title: MICo-150K: A Comprehensive Dataset Advancing Multi-Image CompositionAuthors: Xinyu Wei, Kangrui Cen, Hongyang Wei, Zhen Guo, Bairui Li, Zeqing Wang, Jinrui Zhang, Lei ZhangComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [190] arXiv:2512.07345 [pdf, ps, other]
-
Title: Debiasing Diffusion Priors via 3D Attention for Consistent Gaussian SplattingComments: 15 pages, 8 figures, 5 tables, 2 algorithms, Accepted by AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [191] arXiv:2512.07338 [pdf, ps, other]
-
Title: Generalized Referring Expression Segmentation on Aerial PhotosComments: Submitted to IEEE J-STARSSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [192] arXiv:2512.07331 [pdf, ps, other]
-
Title: The Inductive Bottleneck: Data-Driven Emergence of Representational Sparsity in Vision TransformersAuthors: Kanishk AwadhiyaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [193] arXiv:2512.07328 [pdf, ps, other]
-
Title: ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [194] arXiv:2512.07305 [pdf, ps, other]
-
Title: Reevaluating Automated Wildlife Species Detection: A Reproducibility Study on a Custom Image DatasetAuthors: Tobias Abraham HaiderSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [195] arXiv:2512.07302 [pdf, ps, other]
-
Title: Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task PromptsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [196] arXiv:2512.07276 [pdf, ps, other]
-
Title: Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial ImageryComments: Accepted to WACV 2026. Camera-ready-based version with minor edits for readability (no change in the contents)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [197] arXiv:2512.07275 [pdf, ps, other]
- [198] arXiv:2512.07273 [pdf, ps, other]
-
Title: RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language TranslationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [199] arXiv:2512.07269 [pdf, ps, other]
-
Title: A graph generation pipeline for critical infrastructures based on heuristics, images and depth dataSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [200] arXiv:2512.07253 [pdf, ps, other]
-
Title: DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video EnhancementComments: 18 pages, 8 figures, and 7 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [201] arXiv:2512.07251 [pdf, ps, other]
-
Title: See More, Change Less: Anatomy-Aware Diffusion for Contrast EnhancementAuthors: Junqi Liu, Zejun Wu, Pedro R. A. S. Bassi, Xinze Zhou, Wenxuan Li, Ibrahim E. Hamamci, Sezgin Er, Tianyu Lin, Yi Luo, Szymon Płotka, Bjoern Menze, Daguang Xu, Kai Ding, Kang Wang, Yang Yang, Yucheng Tang, Alan L. Yuille, Zongwei ZhouSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [202] arXiv:2512.07247 [pdf, ps, other]
-
Title: AdLift: Lifting Adversarial Perturbations to Safeguard 3D Gaussian Splatting Assets Against Instruction-Driven EditingComments: 40 pages, 34 figures, 18 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [203] arXiv:2512.07245 [pdf, ps, other]
-
Title: Zero-Shot Textual Explanations via Translating Decision-Critical FeaturesComments: 11+6 pages, 8 figures, 4 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [204] arXiv:2512.07241 [pdf, ps, other]
-
Title: Squeezed-Eff-Net: Edge-Computed Boost of Tomography Based Brain Tumor Classification leveraging Hybrid Neural Network ArchitectureAuthors: Md. Srabon Chowdhury, Syeda Fahmida Tanzim, Sheekar Banerjee, Ishtiak Al Mamoon, AKM Muzahidul IslamSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [205] arXiv:2512.07237 [pdf, ps, other]
-
Title: Unified Camera Positional Encoding for Controlled Video GenerationAuthors: Cheng Zhang, Boying Li, Meng Wei, Yan-Pei Cao, Camilo Cruz Gambardella, Dinh Phung, Jianfei CaiComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [206] arXiv:2512.07234 [pdf, ps, other]
-
Title: Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [207] arXiv:2512.07230 [pdf, ps, other]
-
Title: STRinGS: Selective Text Refinement in Gaussian SplattingAuthors: Abhinav Raundhal, Gaurav Behera, P J Narayanan, Ravi Kiran Sarvadevabhatla, Makarand TapaswiComments: Accepted to WACV 2026. Project Page, see this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [208] arXiv:2512.07229 [pdf, ps, other]
-
Title: ReLKD: Inter-Class Relation Learning with Knowledge Distillation for Generalized Category DiscoveryComments: Accepted to the Main Track of the 28th European Conference on Artificial Intelligence (ECAI 2025). To appear in the proceedings published by IOS Press (DOI: 10.3233/FAIA413)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [209] arXiv:2512.07228 [pdf, ps, other]
-
Title: Towards Robust Protective Perturbation against DeepFake Face SwappingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [210] arXiv:2512.07215 [pdf, ps, other]
-
Title: VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose EstimationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [211] arXiv:2512.07211 [pdf, ps, other]
-
Title: Object Pose Distribution Estimation for Determining Revolution and Reflection Uncertainty in Point CloudsComments: 8 pages, 8 figures, 5 tables, ICCR 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [212] arXiv:2512.07206 [pdf, ps, other]
-
Title: AutoLugano: A Deep Learning Framework for Fully Automated Lymphoma Segmentation and Lugano Staging on FDG-PET/CTAuthors: Boyang Pan, Zeyu Zhang, Hongyu Meng, Bin Cui, Yingying Zhang, Wenli Hou, Junhao Li, Langdi Zhong, Xiaoxiao Chen, Xiaoyu Xu, Changjin Zuo, Chao Cheng, Nan-Jie GongSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [213] arXiv:2512.07203 [pdf, ps, other]
-
Title: MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent ReasoningComments: 7 pages, 1 figureSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [214] arXiv:2512.07201 [pdf, ps, other]
-
Title: Understanding Diffusion Models via Code ExecutionAuthors: Cheng YuSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [215] arXiv:2512.07198 [pdf, ps, other]
-
Title: Generating Storytelling Images with Rich Chains-of-ReasoningSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [216] arXiv:2512.07197 [pdf, ps, other]
-
Title: SUCCESS-GS: Survey of Compactness and Compression for Efficient Static and Dynamic Gaussian SplattingComments: The first three authors contributed equally to this work. The last two authors are co-corresponding authors. Please visit our project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [217] arXiv:2512.07192 [pdf, ps, other]
-
Title: HVQ-CGIC: Enabling Hyperprior Entropy Modeling for VQ-Based Controllable Generative Image CompressionComments: 12 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [218] arXiv:2512.07191 [pdf, ps, other]
-
Title: RefLSM: Linearized Structural-Prior Reflectance Model for Medical Image Segmentation and Bias-Field CorrectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [219] arXiv:2512.07190 [pdf, ps, other]
-
Title: Integrating Multi-scale and Multi-filtration Topological Features for Medical Image ClassificationAuthors: Pengfei Gu, Huimin Li, Haoteng Tang, Dongkuan (DK) Xu, Erik Enriquez, DongChul Kim, Bin Fu, Danny Z. ChenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [220] arXiv:2512.07186 [pdf, ps, other]
-
Title: START: Spatial and Textual Learning for Chart UnderstandingComments: WACV2026 Camera ReadySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [221] arXiv:2512.07171 [pdf, ps, other]
-
Title: TIDE: Two-Stage Inverse Degradation Estimation with Guided Prior Disentanglement for Underwater Image RestorationComments: 21 pages, 11 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [222] arXiv:2512.07170 [pdf, ps, other]
-
Title: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer ApproachSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [223] arXiv:2512.07166 [pdf, ps, other]
-
Title: When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM EditingComments: 9 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [224] arXiv:2512.07165 [pdf, ps, other]
-
Title: MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [225] arXiv:2512.07155 [pdf, ps, other]
-
Title: CHIMERA: Adaptive Cache Injection and Semantic Anchor Prompting for Zero-shot Image Morphing with Morphing-oriented MetricsComments: Please visit our project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [226] arXiv:2512.07141 [pdf, ps, other]
-
Title: Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [227] arXiv:2512.07136 [pdf, ps, other]
-
Title: A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and ReasoningAuthors: Siyang Jiang, Mu Yuan, Xiang Ji, Bufang Yang, Zeyu Liu, Lilin Xu, Yang Li, Yuting He, Liran Dong, Wenrui Lu, Zhenyu Yan, Xiaofan Jiang, Wei Gao, Hongkai Chen, Guoliang XingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [228] arXiv:2512.07135 [pdf, ps, other]
-
Title: TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement LearningAuthors: Zebin Xing, Pengxuan Yang, Linbo Wang, Yichen Zhang, Yiming Hu, Yupeng Zheng, Junli Wang, Yinfeng Gao, Guang Li, Kun Ma, Long Chen, Zhongpu Xia, Qichao Zhang, Hangjun Ye, Dongbin ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [229] arXiv:2512.07128 [pdf, ps, other]
-
Title: MulCLIP: A Multi-level Alignment Framework for Enhancing Fine-grained Long-context CLIPSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [230] arXiv:2512.07126 [pdf, ps, other]
-
Title: Training-free Clothing Region of Interest Self-correction for Virtual Try-OnComments: 16 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [231] arXiv:2512.07110 [pdf, ps, other]
-
Title: MSN: Multi-directional Similarity Network for Hand-crafted and Deep-synthesized Copy-Move Forgery DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [232] arXiv:2512.07107 [pdf, ps, other]
-
Title: COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D SupervisionComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [233] arXiv:2512.07078 [pdf, ps, other]
-
Title: DFIR-DETR: Frequency Domain Enhancement and Dynamic Feature Aggregation for Cross-Scene Small Object DetectionComments: 16 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [234] arXiv:2512.07076 [pdf, ps, other]
-
Title: Context-measure: Contextualizing Metric for CamouflageComments: Technical ReportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [235] arXiv:2512.07065 [pdf, ps, other]
-
Title: Persistent Homology-Guided Frequency Filtering for Image CompressionComments: 17 pages, 8 figures, code available at github.com/RMATH3/persistent-homology-compressionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [236] arXiv:2512.07062 [pdf, ps, other]
-
Title: $\mathrm{D}^{\mathrm{3}}$-Predictor: Noise-Free Deterministic Diffusion for Dense PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [237] arXiv:2512.07052 [pdf, ps, other]
-
Title: RAVE: Rate-Adaptive Visual Encoding for 3D Gaussian SplattingAuthors: Hoang-Nhat Tran, Francesco Di Sario, Gabriele Spadaro, Giuseppe Valenzise, Enzo TartaglioneSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [238] arXiv:2512.07051 [pdf, ps, other]
-
Title: DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image SegmentationComments: 11 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [239] arXiv:2512.07037 [pdf, ps, other]
-
Title: Evaluating and Preserving High-level Fidelity in Super-ResolutionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [240] arXiv:2512.07034 [pdf, ps, other]
-
Title: Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent CuesAuthors: Tuan-Anh Vu, Hai Nguyen-Truong, Ziqiang Zheng, Binh-Son Hua, Qing Guo, Ivor Tsang, Sai-Kit YeungComments: Accepted to WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [241] arXiv:2512.06981 [pdf, ps, other]
-
Title: Selective Masking based Self-Supervised Learning for Image Semantic SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [242] arXiv:2512.06949 [pdf, ps, other]
-
Title: Can We Go Beyond Visual Features? Neural Tissue Relation Modeling for Relational Graph Analysis in Non-Melanoma Skin HistologyComments: 19 pages, 5 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [243] arXiv:2512.06921 [pdf, ps, other]
-
Title: NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy IdentificationAuthors: Ziyang Song, Zelin Zang, Xiaofan Ye, Boqiang Xu, Long Bai, Jinlin Wu, Hongliang Ren, Hongbin Liu, Jiebo Luo, Zhen LeiComments: Accepted by IEEE ICIA 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [244] arXiv:2512.06905 [pdf, ps, other]
-
Title: Scaling Zero-Shot Reference-to-Video GenerationAuthors: Zijian Zhou, Shikun Liu, Haozhe Liu, Haonan Qiu, Zhaochong An, Weiming Ren, Zhiheng Liu, Xiaoke Huang, Kam Woh Ng, Tian Xie, Xiao Han, Yuren Cong, Hang Li, Chuyan Zhu, Aditya Patel, Tao Xiang, Sen HeComments: Website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [245] arXiv:2512.06888 [pdf, ps, other]
-
Title: Overcoming Small Data Limitations in Video-Based Infant Respiration EstimationAuthors: Liyang Song, Hardik Bishnoi, Sai Kumar Reddy Manne, Sarah Ostadabbas, Briana J. Taylor, Michael WanSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [246] arXiv:2512.06886 [pdf, ps, other]
-
Title: Balanced Learning for Domain Adaptive Semantic SegmentationComments: Accepted by International Conference on Machine Learning (ICML 2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [247] arXiv:2512.06885 [pdf, ps, other]
-
Title: JoPano: Unified Panorama Generation via Joint ModelingComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [248] arXiv:2512.06882 [pdf, ps, other]
-
Title: Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian FusionComments: Accepted to BMVC 2025 (Sheffield, UK, Nov 24-27, 2025). Supplementary video and poster available upon requestSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [249] arXiv:2512.06877 [pdf, ps, other]
-
Title: SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene ClassificationComments: Accepted and presented in ICSPISSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [250] arXiv:2512.06870 [pdf, ps, other]
-
Title: Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding PerspectiveComments: Accepted by Conference on Neural Information Processing Systems (NeurIPS 2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [251] arXiv:2512.06866 [pdf, ps, other]
-
Title: Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe PriorComments: Accepted by NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [252] arXiv:2512.06865 [pdf, ps, other]
-
Title: Spatial Retrieval Augmented Autonomous DrivingAuthors: Xiaosong Jia, Chenhe Zhang, Yule Jiang, Songbur Wong, Zhiyuan Zhang, Chen Chen, Shaofeng Zhang, Xuanhe Zhou, Xue Yang, Junchi Yan, Yu-Gang JiangComments: Demo Page: this https URL with open sourced code, dataset, and checkpointsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [253] arXiv:2512.06864 [pdf, ps, other]
-
Title: Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-TrainingComments: Accepted to WACV 2026. arXiv admin note: substantial text overlap with arXiv:2508.19808Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [254] arXiv:2512.06862 [pdf, ps, other]
-
Title: Omni-Referring Image SegmentationAuthors: Qiancheng Zheng, Yunhang Shen, Gen Luo, Baiyang Song, Xing Sun, Xiaoshuai Sun, Yiyi Zhou, Rongrong JiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [255] arXiv:2512.06849 [pdf, ps, other]
-
Title: Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CTAuthors: Matan Atad, Alexander W. Marka, Lisa Steinhelfer, Anna Curto-Vilalta, Yannik Leonhardt, Sarah C. Foreman, Anna-Sophia Walburga Dietrich, Robert Graf, Alexandra S. Gersing, Bjoern Menze, Daniel Rueckert, Jan S. Kirschke, Hendrik MöllerComments: In submissionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [256] arXiv:2512.06845 [pdf, ps, other]
-
Title: Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [257] arXiv:2512.06840 [pdf, ps, other]
-
Title: CADE: Continual Weakly-supervised Video Anomaly Detection with EnsemblesComments: Accepted to WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [258] arXiv:2512.06838 [pdf, ps, other]
-
Title: SparseCoop: Cooperative Perception with Kinematic-Grounded QueriesAuthors: Jiahao Wang, Zhongwei Jiang, Wenchao Sun, Jiaru Zhong, Haibao Yu, Yuner Zhang, Chenyang Lu, Chuang Zhang, Lei He, Shaobing Xu, Jianqiang WangComments: Accepted by AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [259] arXiv:2512.06818 [pdf, ps, other]
-
Title: MeshSplatting: Differentiable Rendering with Opaque MeshesAuthors: Jan Held, Sanghyun Son, Renaud Vandeghen, Daniel Rebain, Matheus Gadelha, Yi Zhou, Anthony Cioppa, Ming C. Lin, Marc Van Droogenbroeck, Andrea TagliasacchiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [260] arXiv:2512.06811 [pdf, ps, other]
-
Title: RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language ModelsComments: Accepted by AAAI 2026 (Oral)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
- [261] arXiv:2512.06810 [pdf, ps, other]
-
Title: MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement LearningAuthors: Yueqian Wang, Songxiang Liu, Disong Wang, Nuo Xu, Guanglu Wan, Huishuai Zhang, Dongyan ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [262] arXiv:2512.06802 [pdf, ps, other]
-
Title: VDOT: Efficient Unified Video Creation via Optimal Transport DistillationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [263] arXiv:2512.06793 [pdf, ps, other]
-
Title: Generalized Geometry Encoding Volume for Real-time Stereo MatchingComments: Accepted by AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [264] arXiv:2512.06783 [pdf, ps, other]
-
Title: Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-VideosComments: 16 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [265] arXiv:2512.06774 [pdf, ps, other]
-
Title: RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [266] arXiv:2512.06769 [pdf, ps, other]
-
Title: Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial UnderstandingAuthors: Hang Yin, Xiaomin He, PeiWen Yuan, Yiwei Li, Jiayi Shi, Wenxiao Fan, Shaoxiong Feng, Kan LiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [267] arXiv:2512.06763 [pdf, ps, other]
-
Title: JOCA: Task-Driven Joint Optimisation of Camera Hardware and Adaptive Camera Control AlgorithmsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [268] arXiv:2512.06759 [pdf, ps, other]
-
Title: VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language PriorsComments: 12 pages, 13 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [269] arXiv:2512.06750 [pdf, ps, other]
-
Title: UARE: A Unified Vision-Language Model for Image Quality Assessment, Restoration, and EnhancementAuthors: Weiqi Li, Xuanyu Zhang, Bin Chen, Jingfen Xie, Yan Wang, Kexin Zhang, Junlin Li, Li Zhang, Jian Zhang, Shijie ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [270] arXiv:2512.06746 [pdf, ps, other]
-
Title: Task-Model Alignment: A Simple Path to Generalizable AI-Generated Image DetectionAuthors: Ruoxin Chen, Jiahui Gao, Kaiqing Lin, Keyue Zhang, Yandan Zhao, Isabel Guan, Taiping Yao, Shouhong DingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [271] arXiv:2512.06738 [pdf, ps, other]
-
Title: FedSCAl: Leveraging Server and Client Alignment for Unsupervised Federated Source-Free Domain AdaptationComments: Accepted to Winter Conference on Applications of Computer Vision (WACV) 2026, Round 1Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [272] arXiv:2512.06736 [pdf, ps, other]
-
Title: Graph Convolutional Long Short-Term Memory Attention Network for Post-Stroke Compensatory Movement Detection Based on Skeleton DataSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [273] arXiv:2512.06726 [pdf, ps, other]
-
Title: The Role of Entropy in Visual Grounding: Analysis and OptimizationAuthors: Shuo Li, Jiajun Sun, Zhihao Zhang, Xiaoran Fan, Senjie Jin, Hui Li, Yuming Yang, Junjie Ye, Lixing Shen, Tao Ji, Tao Gui, Qi Zhang, Xuanjing HuangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [274] arXiv:2512.06689 [pdf, ps, other]
-
Title: Lightweight Wasserstein Audio-Visual Model for Unified Speech Enhancement and SeparationComments: Accepted to ASRU 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
- [275] arXiv:2512.06684 [pdf, ps, other]
-
Title: EMGauss: Continuous Slice-to-3D Reconstruction via Dynamic Gaussian Modeling in Volume Electron MicroscopySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [276] arXiv:2512.06674 [pdf, ps, other]
-
Title: RunawayEvil: Jailbreaking the Image-to-Video Generative ModelsAuthors: Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng ShanSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [277] arXiv:2512.06673 [pdf, ps, other]
-
Title: 1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and ReasoningAuthors: Shida Gao, Feng Xue, Xiangfeng Wang, Anlong Ming, Teng Long, Yihua Shao, Haozhe Wang, Zhaowen Lin, Wei Wang, Nicu SebeSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [278] arXiv:2512.06663 [pdf, ps, other]
-
Title: CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language TasksSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [279] arXiv:2512.06662 [pdf, ps, other]
-
Title: Personalized Image Descriptions from Attention SequencesAuthors: Ruoyu Xue, Hieu Le, Jingyi Xu, Sounak Mondal, Abe Leite, Gregory Zelinsky, Minh Hoai, Dimitris SamarasComments: 10 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [280] arXiv:2512.06657 [pdf, ps, other]
-
Title: TextMamba: Scene Text Detector with MambaSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [281] arXiv:2512.06642 [pdf, ps, other]
-
Title: Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-ResolutionAuthors: Achmad Ardani Prasha, Clavino Ourizqi Rachmadi, Muhamad Fauzan Ibnu Syahlan, Naufal Rahfi Anugerah, Nanda Garin Raditya, Putri Amelia, Sabrina Laila Mutiara, Hilman Syachr RamadhanComments: 21 pages, 7 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [282] arXiv:2512.06613 [pdf, ps, other]
-
Title: Hierarchical Deep Learning for Diatom Image Classification: A Multi-Level Taxonomic ApproachAuthors: Yueying KeComments: 10 pages, 6 figures, 2 tables, IEEE conference format. Submitted as course projectSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [283] arXiv:2512.06612 [pdf, ps, other]
-
Title: Learning Relative Gene Expression Trends from Pathology Images in Spatial TranscriptomicsComments: NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [284] arXiv:2512.06598 [pdf, ps, other]
-
Title: From Remote Sensing to Multiple Time Horizons Forecasts: Transformers Model for CyanoHAB Intensity in Lake ChamplainAuthors: Muhammad Adil, Patrick J. Clemins, Andrew W. Schroth, Panagiotis D. Oikonomou, Donna M. Rizzo, Peter D. F. Isles, Xiaohan Zhang, Kareem I. Hannoun, Scott Turnbull, Noah B. Beckage, Asim Zia, Safwan WshahComments: 23 pages, 15 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [285] arXiv:2512.06581 [pdf, ps, other]
-
Title: MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video UnderstandingAuthors: Yuhao Su, Anwesa Choudhuri, Zhongpai Gao, Benjamin Planche, Van Nguyen Nguyen, Meng Zheng, Yuhan Shen, Arun Innanje, Terrence Chen, Ehsan Elhamifar, Ziyan WuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [286] arXiv:2512.06575 [pdf, ps, other]
-
Title: Proof of Concept for Mammography Classification with Enhanced Compactness and Separability ModulesAuthors: Fariza DahesComments: 26 pages, 16 figures, 2 tables; proof of concept on mammography classification with compactness/separability modules and interactive dashboard; preprint submitted to arXiv cs.LGSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [287] arXiv:2512.06565 [pdf, ps, other]
-
Title: GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose EstimationAuthors: Xiujin LiuComments: 1 figure, 2 tables, 14 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [288] arXiv:2512.06562 [pdf, ps, other]
-
Title: SUGAR: A Sweeter Spot for Generative Unlearning of Many IdentitiesAuthors: Dung Thuy Nguyen, Quang Nguyen, Preston K. Robinette, Eli Jiang, Taylor T. Johnson, Kevin LeachSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [289] arXiv:2512.06560 [pdf, ps, other]
-
Title: Bridging spatial awareness and global context in medical image segmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [290] arXiv:2512.06531 [pdf, ps, other]
-
Title: Novel Deep Learning Architectures for Classification and Segmentation of Brain Tumors from MRI ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [291] arXiv:2512.06530 [pdf, ps, other]
-
Title: On The Role of K-Space Acquisition in MRI Reconstruction Domain-GeneralizationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [292] arXiv:2512.06521 [pdf, ps, other]
-
Title: ShadowWolf -- Automatic Labelling, Evaluation and Model Training Optimised for Camera Trap Wildlife ImagesAuthors: Jens Dede (1), Anna Förster (1) ((1) Department of Sustainable Communication Networks, University of Bremen, Bibliothekstr. 1, 28359, Bremen, Bremen, Germany)Comments: 31 pages + appendixSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [293] arXiv:2512.06504 [pdf, ps, other]
-
Title: Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data FusionAuthors: Andrii Lysyi, Anatoliy Sachenko, Pavlo Radiuk, Mykola Lysyi, Oleksandr Melnychenko, Diana ZahorodniaSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [294] arXiv:2512.06485 [pdf, ps, other]
-
Title: Sanvaad: A Multimodal Accessibility Framework for ISL Recognition and Voice-Based InteractionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [295] arXiv:2512.06447 [pdf, ps, other]
-
Title: Towards Stable Cross-Domain Depression Recognition under Missing ModalitiesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [296] arXiv:2512.06438 [pdf, ps, other]
-
Title: AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head AvatarsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [297] arXiv:2512.06434 [pdf, ps, other]
-
Title: Automated Deep Learning Estimation of Anthropometric Measurements for Preparticipation Cardiovascular ScreeningComments: 8 pages, 2 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [298] arXiv:2512.06426 [pdf, ps, other]
-
Title: When Gender is Hard to See: Multi-Attribute Support for Long-Range RecognitionComments: 12 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [299] arXiv:2512.06424 [pdf, ps, other]
-
Title: DragMesh: Interactive 3D Generation Made EasySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [300] arXiv:2512.06422 [pdf, ps, other]
-
Title: A Perception CNN for Facial Expression RecognitionComments: in IEEE Transactions on Image Processing (2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [301] arXiv:2512.06421 [pdf, ps, other]
-
Title: Rethinking Training Dynamics in Scale-wise Autoregressive GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [302] arXiv:2512.06400 [pdf, ps, other]
-
Title: Perceptual Region-Driven Infrared-Visible Co-Fusion for Extreme Scene EnhancementComments: The paper has been accepted and officially published by IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENTSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [303] arXiv:2512.06379 [pdf, ps, other]
- [304] arXiv:2512.06377 [pdf, ps, other]
- [305] arXiv:2512.06376 [pdf, ps, other]
-
Title: Are AI-Generated Driving Videos Ready for Autonomous Driving? A Diagnostic Evaluation FrameworkSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [306] arXiv:2512.06373 [pdf, ps, other]
-
Title: VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement LearningComments: The project page is this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [307] arXiv:2512.06368 [pdf, ps, other]
-
Title: HuPrior3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular VideosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [308] arXiv:2512.06363 [pdf, ps, other]
-
Title: Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack DetectionAuthors: Jiabao Guo, Yadian Wang, Hui Ma, Yuhao Fu, Ju Jia, Hui Liu, Shengeng Tang, Lechao Cheng, Yunfeng Diao, Ajian LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [309] arXiv:2512.06358 [pdf, ps, other]
-
Title: Rectifying Latent Space for Generative Single-Image Reflection RemovalSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [310] arXiv:2512.06353 [pdf, ps, other]
-
Title: TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision SearchAuthors: Kaicheng Yang, Kaisen Yang, Baiting Wu, Xun Zhang, Qianrui Yang, Haotong Qin, He Zhang, Yulun ZhangComments: Code and Supplementary Material could be found at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [311] arXiv:2512.06345 [pdf, ps, other]
-
Title: CLUENet: Cluster Attention Makes Neural Networks Have EyesComments: 10 pages, 6 figures, 2026 Association for the Advancement of Artificial IntelligenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [312] arXiv:2512.06344 [pdf, ps, other]
-
Title: Beyond Hallucinations: A Multimodal-Guided Task-Aware Generative Image Compression for Ultra-Low BitrateSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [313] arXiv:2512.06332 [pdf, ps, other]
-
Title: CryoHype: Reconstructing a thousand cryo-EM structures with transformer-based hypernetworksSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [314] arXiv:2512.06330 [pdf, ps, other]
-
Title: S2WMamba: A Spectral-Spatial Wavelet Mamba for PansharpeningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [315] arXiv:2512.06328 [pdf, ps, other]
-
Title: ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language ModelsComments: Accepted as an Oral presentation at AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [316] arXiv:2512.06306 [pdf, ps, other]
-
Title: Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose EstimationAuthors: Haoxian Zhou, Chuanzhi Xu, Langyi Chen, Haodong Chen, Yuk Ying Chung, Qiang Qu, Xaoming Chen, Weidong CaiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [317] arXiv:2512.06290 [pdf, ps, other]
-
Title: StrokeNet: Unveiling How to Learn Fine-Grained Interactions in Online Handwritten Stroke ClassificationComments: 17 pages, 5 figuresJournal-ref: ICDAR 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [318] arXiv:2512.06282 [pdf, ps, other]
-
Title: A Sleep Monitoring System Based on Audio, Video and Depth InformationComments: Accepted in the Computer Vision, Graphics and Image Processing (CVGIP 2013)Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [319] arXiv:2512.06281 [pdf, ps, other]
-
Title: Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language ModelsAuthors: Hengzhuang Li, Xinsong Zhang, Qiming Peng, Bin Luo, Han Hu, Dengyang Jiang, Han-Jia Ye, Teng Zhang, Hai JinSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [320] arXiv:2512.06276 [pdf, ps, other]
-
Title: RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression ComprehensionAuthors: Tianyi Gao, Hao Li, Han Fang, Xin Wei, Xiaodong Dong, Hongbo Sun, Ye Yuan, Zhongjiang He, Jinglin Xu, Jingmin Xin, Hao SunSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [321] arXiv:2512.06275 [pdf, ps, other]
-
Title: FacePhys: State of the Heart LearningAuthors: Kegang Wang, Jiankai Tang, Yuntao Wang, Xin Liu, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Daniel McDuffSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [322] arXiv:2512.06269 [pdf, ps, other]
- [323] arXiv:2512.06258 [pdf, ps, other]
-
Title: Knowing the Answer Isn't Enough: Fixing Reasoning Path Failures in LVLMsAuthors: Chaoyang Wang, Yangfan He, Yiyang Zhou, Yixuan Wang, Jiaqi Liu, Peng Xia, Zhengzhong Tu, Mohit Bansal, Huaxiu YaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [324] arXiv:2512.06255 [pdf, ps, other]
-
Title: Language-driven Fine-grained RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [325] arXiv:2512.06251 [pdf, ps, other]
-
Title: NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow NetworksAuthors: Fangzhou Lin, Yuping Wang, Yuliang Guo, Zixun Huang, Xinyu Huang, Haichong Zhang, Kazunori Yamada, Zhengzhong Tu, Liu Ren, Ziming ZhangComments: 12 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [326] arXiv:2512.06232 [pdf, ps, other]
-
Title: Opinion: Learning Intuitive Physics May Require More than Visual DataSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [327] arXiv:2512.06230 [pdf, ps, other]
-
Title: GPU-GLMB: Assessing the Scalability of GPU-Accelerated Multi-Hypothesis TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [328] arXiv:2512.06221 [pdf, ps, other]
-
Title: Revisiting SVD and Wavelet Difference Reduction for Lossy Image Compression: A Reproducibility StudyAuthors: Alena MakarovaComments: 15 pages, 13 figures. Reproducibility studySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [329] arXiv:2512.06206 [pdf, ps, other]
-
Title: The MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024: Efficient and Robust Aggregation Methods for Federated LearningAuthors: Akis Linardos, Sarthak Pati, Ujjwal Baid, Brandon Edwards, Patrick Foley, Kevin Ta, Verena Chung, Micah Sheller, Muhammad Irfan Khan, Mojtaba Jafaritadi, Elina Kontio, Suleiman Khan, Leon Mächler, Ivan Ezhov, Suprosanna Shit, Johannes C. Paetzold, Gustav Grimberg, Manuel A. Nickel, David Naccache, Vasilis Siomos, Jonathan Passerat-Palmbach, Giacomo Tarroni, Daewoon Kim, Leonard L. Klausmann, Prashant Shah, Bjoern Menze, Dimitrios Makris, Spyridon BakasComments: Published at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URLJournal-ref: Machine.Learning.for.Biomedical.Imaging. 3 (2025)Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [330] arXiv:2512.06190 [pdf, ps, other]
-
Title: Multi-Modal Zero-Shot Prediction of Color Trajectories in Food DryingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [331] arXiv:2512.06185 [pdf, ps, other]
-
Title: SPOOF: Simple Pixel Operations for Out-of-Distribution FoolingComments: 10 pages with 8 figures, plus 13 pages and 16 figures of supplementary materialSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [332] arXiv:2512.06179 [pdf, ps, other]
-
Title: Physics-Grounded Attached Shadow Detection Using Approximate 3D Geometry and Light DirectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [333] arXiv:2512.06174 [pdf, ps, other]
-
Title: Physics-Grounded Shadow Generation from Monocular 3D Geometry Priors and Approximate Light DirectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [334] arXiv:2512.06171 [pdf, ps, other]
-
Title: Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect DetectionComments: 11 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [335] arXiv:2512.06158 [pdf, ps, other]
-
Title: Tracking-Guided 4D Generation: Foundation-Tracker Motion Priors for 3D Model AnimationAuthors: Su Sun, Cheng Zhao, Himangi Mittal, Gaurav Mittal, Rohith Kukkala, Yingjie Victor Chen, Mei ChenComments: 15 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [336] arXiv:2512.06105 [pdf, ps, other]
-
Title: Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report GenerationAuthors: Junwen Zheng, Xinran Xu, Li Rong Wang, Chang Cai, Lucinda Siyun Tan, Dingyuan Wang, Hong Liang Tey, Xiuyi FanComments: AAAI-26-AIASubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [337] arXiv:2512.06103 [pdf, ps, other]
-
Title: SpectraIrisPAD: Leveraging Vision Foundation Models for Spectrally Conditioned Multispectral Iris Presentation Attack DetectionComments: Accepted in IEEE T-BIOMSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [338] arXiv:2512.06096 [pdf, ps, other]
-
Title: BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [339] arXiv:2512.06080 [pdf, ps, other]
-
Title: Shoot-Bounce-3D: Single-Shot Occlusion-Aware 3D from Lidar by Decomposing Two-Bounce LightAuthors: Tzofi Klinghoffer, Siddharth Somasundaram, Xiaoyu Xiang, Yuchen Fan, Christian Richardt, Akshat Dave, Ramesh Raskar, Rakesh RanjanComments: SIGGRAPH Asia 2025. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [340] arXiv:2512.06065 [pdf, ps, other]
-
Title: EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video EditingAuthors: Runjia Li, Moayed Haji-Ali, Ashkan Mirzaei, Chaoyang Wang, Arpit Sahni, Ivan Skorokhodov, Aliaksandr Siarohin, Tomas Jakab, Junlin Han, Sergey Tulyakov, Philip Torr, Willi MenapaceComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [341] arXiv:2512.06058 [pdf, ps, other]
-
Title: Representation Learning for Point Cloud UnderstandingAuthors: Siming YanComments: 181 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [342] arXiv:2512.06032 [pdf, ps, other]
-
Title: The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [343] arXiv:2512.06024 [pdf, ps, other]
-
Title: Neural reconstruction of 3D ocean wave hydrodynamics from camera sensingSubjects: Computer Vision and Pattern Recognition (cs.CV); Fluid Dynamics (physics.flu-dyn)
- [344] arXiv:2512.06020 [pdf, ps, other]
-
Title: PrefGen: Multimodal Preference Learning for Preference-Conditioned Image GenerationComments: Project Page: https://prefgen.github.io/Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [345] arXiv:2512.06014 [pdf, ps, other]
-
Title: Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 DatasetsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [346] arXiv:2512.06013 [pdf, ps, other]
-
Title: VAT: Vision Action Transformer by Unlocking Full Representation of ViTSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [347] arXiv:2512.06012 [pdf, ps, other]
-
Title: High-Throughput Unsupervised Profiling of the Morphology of 316L Powder Particles for Use in Additive ManufacturingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [348] arXiv:2512.06010 [pdf, other]
-
Title: Fast and Flexible Robustness Certificates for Semantic SegmentationAuthors: Thomas Massena (IRIT-MISFIT, DTIPG - SNCF, UT3), Corentin Friedrich, Franck Mamalet, Mathieu Serrurier (IRIT-MISFIT)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [349] arXiv:2512.06006 [pdf, ps, other]
-
Title: Simple Agents Outperform Experts in Biomedical Imaging Workflow OptimizationAuthors: Xuefei (Julie) Wang, Kai A. Horstmann, Ethan Lin, Jonathan Chen, Alexander R. Farhang, Sophia Stiles, Atharva Sehgal, Jonathan Light, David Van Valen, Yisong Yue, Jennifer J. SunSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [350] arXiv:2512.06003 [pdf, ps, other]
-
Title: PrunedCaps: A Case For Primary Capsules DiscriminationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [351] arXiv:2512.05996 [pdf, ps, other]
-
Title: FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and CountingComments: 18 pages, under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Robotics (cs.RO); Image and Video Processing (eess.IV)
- [352] arXiv:2512.05993 [pdf, ps, other]
-
Title: Domain-Specific Foundation Model Improves AI-Based Analysis of NeuropathologyAuthors: Ruchika Verma, Shrishtee Kandoi, Robina Afzal, Shengjia Chen, Jannes Jegminat, Michael W. Karlovich, Melissa Umphlett, Timothy E. Richardson, Kevin Clare, Quazi Hossain, Jorge Samanamud, Phyllis L. Faust, Elan D. Louis, Ann C. McKee, Thor D. Stein, Jonathan D. Cherry, Jesse Mez, Anya C. McGoldrick, Dalilah D. Quintana Mora, Melissa J. Nirenberg, Ruth H. Walker, Yolfrankcis Mendez, Susan Morgello, Dennis W. Dickson, Melissa E. Murray, Carlos Cordon-Cardo, Nadejda M. Tsankova, Jamie M. Walker, Diana K. Dangoor, Stephanie McQuillan, Emma L. Thorn, Claudia De Sanctis, Shuying Li, Thomas J. Fuchs, Kurt Farrell, John F. Crary, Gabriele CampanellaSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [353] arXiv:2512.05991 [pdf, ps, other]
-
Title: EmoDiffTalk:Emotion-aware Diffusion for Editable 3D Gaussian Talking HeadAuthors: Chang Liu, Tianjiao Jing, Chengcheng Ma, Xuanqi Zhou, Zhengxuan Lian, Qin Jin, Hongliang Yuan, Shi-Sheng HuangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [354] arXiv:2512.05988 [pdf, ps, other]
-
Title: VG3T: Visual Geometry Grounded Gaussian TransformerSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [355] arXiv:2512.05987 [pdf, ps, other]
-
Title: Adaptive Dataset Quantization: A New Direction for Dataset PruningComments: Accepted by ICCPR 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [356] arXiv:2512.05969 [pdf, ps, other]
-
Title: Video Models Start to Solve Chess, Maze, Sudoku, Mental Rotation, and Raven' MatricesAuthors: Hokin DengSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [357] arXiv:2512.07687 (cross-list from cs.CL) [pdf, ps, other]
-
Title: HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMsSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [358] arXiv:2512.07576 (cross-list from eess.IV) [pdf, ps, other]
-
Title: R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray SegmentationAuthors: Xuecheng Li, Weikuan Jia, Komildzhon Sharipov, Sharipov Hotam Beknazarovich, Farzona S. Ataeva, Qurbonaliev Alisher, Yuanjie ZhengSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [359] arXiv:2512.07574 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Precise Liver Tumor Segmentation in CT Using a Hybrid Deep Learning-Radiomics FrameworkAuthors: Xuecheng Li, Weikuan Jia, Komildzhon Sharipov, Alimov Ruslan, Lutfuloev Mazbutdzhon, Ismoilov Shuhratjon, Yuanjie ZhengSubjects: Image and Video Processing (eess.IV); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [360] arXiv:2512.07558 (cross-list from cs.LG) [pdf, ps, other]
-
Title: ReLaX: Reasoning with Latent Exploration for Large Reasoning ModelsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [361] arXiv:2512.07509 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Exploring possible vector systems for faster training of neural networks with preconfigured latent spacesAuthors: Nikita GabdullinComments: 9 pages, 5 figures, 1 table, 4 equationsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [362] arXiv:2512.07459 (cross-list from cs.GR) [pdf, ps, other]
-
Title: Human Geometry Distribution for 3D Animation GenerationSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [363] arXiv:2512.07437 (cross-list from cs.LG) [pdf, ps, other]
-
Title: KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World ModelsComments: 23 pages, 8 figures, 3 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
- [364] arXiv:2512.07419 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language ModelsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [365] arXiv:2512.07390 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Towards Reliable Test-Time Adaptation: Style Invariance as a Correctness LikelihoodComments: Accepted to WACV 2026Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [366] arXiv:2512.07355 (cross-list from cs.AI) [pdf, ps, other]
-
Title: A Geometric Unification of Concept Learning with Concept ConesComments: 22 pagesSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [367] arXiv:2512.07259 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Affine Subspace Models and Clustering for Patch-Based Image DenoisingComments: Asilomar Conference on Signals, Systems, and Computers 2025Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [368] arXiv:2512.07224 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Clinical Interpretability of Deep Learning Segmentation Through Shapley-Derived Agreement and Uncertainty MetricsAuthors: Tianyi Ren, Daniel Low, Pittra Jaengprajak, Juampablo Heras Rivera, Jacob Ruzevick, Mehmet KurtSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [369] arXiv:2512.07150 (cross-list from cs.LG) [pdf, ps, other]
-
Title: FlowLPS: Langevin-Proximal Sampling for Flow-based Inverse Problem SolversSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [370] arXiv:2512.07142 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket SearchComments: This work plans to be submitted to the IEEE for possible publicationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
- [371] arXiv:2512.07132 (cross-list from cs.CL) [pdf, ps, other]
-
Title: DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal ReasoningAuthors: Nithin Sivakumaran, Justin Chih-Yao Chen, David Wan, Yue Zhang, Jaehong Yoon, Elias Stengel-Eskin, Mohit BansalComments: Code: this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [372] arXiv:2512.07130 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous DrivingAuthors: Zebin Xing, Yupeng Zheng, Qichao Zhang, Zhixing Ding, Pengxuan Yang, Songen Gu, Zhongpu Xia, Dongbin ZhaoSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [373] arXiv:2512.07040 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Transformation of Biological Networks into Images via Semantic Cartography for Visual Interpretation and Scalable Deep AnalysisSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [374] arXiv:2512.06990 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme PatientsSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [375] arXiv:2512.06963 (cross-list from cs.RO) [pdf, ps, other]
-
Title: VideoVLA: Video Generators Can Be Generalizable Robot ManipulatorsAuthors: Yichao Shen, Fangyun Wei, Zhiying Du, Yaobo Liang, Yan Lu, Jiaolong Yang, Nanning Zheng, Baining GuoComments: Project page: this https URLJournal-ref: The Thirty-ninth Annual Conference on Neural Information Processing Systems(NeurIPS2025)Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [376] arXiv:2512.06951 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR ChallengeComments: 2025 NeurIPS Behavior Challenge 1st place solutionSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [377] arXiv:2512.06868 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Dynamic Visual SLAM using a General 3D PriorComments: 8 pagesSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [378] arXiv:2512.06848 (cross-list from cs.CL) [pdf, ps, other]
-
Title: AquaFusionNet: Lightweight VisionSensor Fusion Framework for Real-Time Pathogen Detection and Water Quality Anomaly Prediction on Edge DevicesComments: 9Pages, 3 figure, Politeknik Negeri BanyuwangiSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [379] arXiv:2512.06757 (cross-list from cs.SD) [pdf, ps, other]
-
Title: XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice AssociationComments: FAME 2026 Technical ReportSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
- [380] arXiv:2512.06737 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step DynamicsComments: 80 pages, 6 tables, 2 figures, 5 appendices, proof-of-conceptSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
- [381] arXiv:2512.06730 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Enhancing Interpretability of AR-SSVEP-Based Motor Intention Recognition via CNN-BiLSTM and SHAP Analysis on EEG DataSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [382] arXiv:2512.06665 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Rethinking Robustness: A New Approach to Evaluating Feature Attribution MethodsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [383] arXiv:2512.06649 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Estimating Black Carbon Concentration from Urban Traffic Using Vision-Based Machine LearningAuthors: Camellia Zakaria, Aryan Sadeghi, Weaam Jaafar, Junshi Xu, Alex Mariakakis, Marianne HatzopoulouComments: 12 pages, 16 figures, 4 tables, 4 pages Appendix, in submission and under review for ACM MobiSys 2026 as of December 6th, 2025Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Emerging Technologies (cs.ET)
- [384] arXiv:2512.06648 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural NetworkAuthors: Xiao LiComments: in Chinese languageSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [385] arXiv:2512.06628 (cross-list from cs.RO) [pdf, ps, other]
-
Title: MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical AlignmentAuthors: Ruicheng Zhang, Mingyang Zhang, Jun Zhou, Zhangrui Guo, Xiaofan Liu, Zunnan Xu, Zhizhou Zhong, Puxin Yan, Haocheng Luo, Xiu LiSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [386] arXiv:2512.06609 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Vector Quantization using Gaussian Variational AutoencoderAuthors: Tongda Xu, Wendi Zheng, Jiajun He, Jose Miguel Hernandez-Lobato, Yan Wang, Ya-Qin Zhang, Jie TangSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [387] arXiv:2512.06589 (cross-list from cs.CR) [pdf, ps, other]
-
Title: OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense EvaluationAuthors: Xiaojun Jia, Jie Liao, Qi Guo, Teng Ma, Simeng Qin, Ranjie Duan, Tianlin Li, Yihao Huang, Zhitao Zeng, Dongxian Wu, Yiming Li, Wenqi Ren, Xiaochun Cao, Yang LiuSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [388] arXiv:2512.06147 (cross-list from cs.RO) [pdf, ps, other]
-
Title: GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind TravelersAuthors: Hochul Hwang, Soowan Yang, Jahir Sadik Monon, Nicholas A Giudice, Sunghoon Ivan Lee, Joydeep Biswas, Donghyun KimSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [389] arXiv:2512.06008 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Semantic Temporal Single-photon LiDARAuthors: Fang Li, Tonglin Mu, Shuling Li, Junran Guo, Keyuan Li, Jianing Li, Ziyang Luo, Xiaodong Fan, Ye Chen, Yunfeng Liu, Hong Cai, Lip Ket Chin, Jinbei Zhang, Shihai SunComments: 14 pages, 5 figures. And any comment is welcomeSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantum Physics (quant-ph)
- [390] arXiv:2512.05992 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Stronger is not better: Better Augmentations in Contrastive Learning for Medical Image SegmentationComments: NeurIPS Black in AI workshop - 2022Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Mon, 8 Dec 2025
- [391] arXiv:2512.05965 [pdf, ps, other]
-
Title: EditThinker: Unlocking Iterative Reasoning for Any Image EditorAuthors: Hongyu Li, Manyuan Zhang, Dian Zheng, Ziyu Guo, Yimeng Jia, Kaituo Feng, Hao Yu, Yexin Liu, Yan Feng, Peng Pei, Xunliang Cai, Linjiang Huang, Hongsheng Li, Si LiuComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [392] arXiv:2512.05960 [pdf, ps, other]
-
Title: AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image EnhancementSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [393] arXiv:2512.05941 [pdf, ps, other]
-
Title: Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI GroundingAuthors: Zhiyuan Jiang, Shenghao Xie, Wenyi Li, Wenqiang Zu, Peihang Li, Jiahao Qiu, Siqi Pei, Lei Ma, Tiejun Huang, Mengdi Wang, Shilong LiuComments: Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [394] arXiv:2512.05937 [pdf, ps, other]
-
Title: Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV PerceptionComments: 8 pages, 2 figures, 7 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [395] arXiv:2512.05936 [pdf, ps, other]
-
Title: Synset Signset Germany: a Synthetic Dataset for German Traffic Sign RecognitionAuthors: Anne Sielemann, Lena Loercher, Max-Lion Schumacher, Stefan Wolf, Masoud Roschani, Jens ZiehnComments: 8 pages, 8 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [396] arXiv:2512.05928 [pdf, ps, other]
-
Title: A Comparative Study on Synthetic Facial Data Generation Techniques for Face RecognitionComments: 18 pages, 17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [397] arXiv:2512.05927 [pdf, ps, other]
-
Title: World Models That Know When They Don't Know: Controllable Video Generation with Calibrated UncertaintySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [398] arXiv:2512.05922 [pdf, ps, other]
-
Title: LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology SegmentationAuthors: Khang Le, Anh Mai Vu, Thi Kim Trang Vo, Ha Thach, Ngoc Bui Lam Quang, Thanh-Huy Nguyen, Minh H. N. Le, Zhu Han, Chandra Mohan, Hien Van NguyenComments: Note: Khang Le and Anh Mai Vu contributed equallySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [399] arXiv:2512.05920 [pdf, ps, other]
-
Title: NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [400] arXiv:2512.05905 [pdf, ps, other]
-
Title: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose RepresentationsAuthors: Wenhao Yan, Sheng Ye, Zhuoyi Yang, Jiayan Teng, ZhenHui Dong, Kairui Wen, Xiaotao Gu, Yong-Jin Liu, Jie TangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [401] arXiv:2512.05866 [pdf, ps, other]
-
Title: Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN DiscriminatorComments: This paper has been accepted for presentation at the IEEE 28th International Conference on Computer and Information Technology (ICCIT), December 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [402] arXiv:2512.05859 [pdf, ps, other]
-
Title: Edit-aware RAW ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [403] arXiv:2512.05853 [pdf, ps, other]
-
Title: VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential AttackAuthors: Shiji Zhao, Shukun Xiong, Yao Huang, Yan Jin, Zhenyu Wu, Jiyang Guan, Ranjie Duan, Jialing Tao, Hui Xue, Xingxing WeiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [404] arXiv:2512.05830 [pdf, ps, other]
-
Title: Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep LearningComments: 22 pages, 11 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [405] arXiv:2512.05814 [pdf, ps, other]
-
Title: UG-FedDA: Uncertainty-Guided Federated Domain Adaptation for Multi-Center Alzheimer's Disease DetectionAuthors: Fubao Zhu, Zhanyuan Jia, Zhiguo Wang, Huan Huang, Danyang Sun, Chuang Han, Yanting Li, Jiaofen Nan, Chen Zhao, Weihua ZhouComments: The code is already available on GitHub: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [406] arXiv:2512.05809 [pdf, ps, other]
-
Title: Probing the effectiveness of World Models for Spatial Reasoning through Test-time ScalingComments: Extended abstract at World Modeling Workshop 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [407] arXiv:2512.05802 [pdf, ps, other]
-
Title: Bring Your Dreams to Life: Continual Text-to-Video CustomizationAuthors: Jiahua Dong, Xudong Wang, Wenqi Liang, Zongyan Han, Meng Cao, Duzhen Zhang, Hanbin Zhao, Zhi Han, Salman Khan, Fahad Shahbaz KhanComments: Accepted to AAAI2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [408] arXiv:2512.05783 [pdf, ps, other]
-
Title: Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse DepthSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [409] arXiv:2512.05774 [pdf, ps, other]
-
Title: Active Video Perception: Iterative Evidence Seeking for Agentic Long Video UnderstandingAuthors: Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S. Ryoo, Juan Carlos NieblesComments: Website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [410] arXiv:2512.05762 [pdf, ps, other]
-
Title: FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural OperatorsComments: Accepted for WACVSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [411] arXiv:2512.05759 [pdf, ps, other]
-
Title: Label-Efficient Point Cloud Segmentation with Active LearningAuthors: Johannes Meyer, Jasper Hoffmann, Felix Schulz, Dominik Merkle, Daniel Buescher, Alexander Reiterer, Joschka Boedecker, Wolfram BurgardSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [412] arXiv:2512.05754 [pdf, ps, other]
-
Title: USV: Unified Sparsification for Accelerating Video Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [413] arXiv:2512.05746 [pdf, ps, other]
-
Title: HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [414] arXiv:2512.05740 [pdf, ps, other]
-
Title: Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic ExcisionAuthors: Lennart Maack, Julia-Kristin Graß, Lisa-Marie Toscha, Nathaniel Melling, Alexander SchlaeferSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [415] arXiv:2512.05710 [pdf, ps, other]
-
Title: Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [416] arXiv:2512.05698 [pdf, ps, other]
-
Title: OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors ReasoningAuthors: Xusheng Guo, Wanfa Zhang, Shijia Zhao, Qiming Xia, Xiaolong Xie, Mingming Wang, Hai Wu, Chenglu WenComments: The 40th Annual AAAI Conference on Artificial IntelligenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [417] arXiv:2512.05683 [pdf, ps, other]
-
Title: Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration CorrectionAuthors: Yong En Kok, Bowen Deng, Alexander Bentley, Andrew J. Parkes, Michael G. Somekh, Amanda J. Wright, Michael P. PoundSubjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
- [418] arXiv:2512.05674 [pdf, ps, other]
-
Title: Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume MaximizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [419] arXiv:2512.05672 [pdf, ps, other]
-
Title: InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse ProblemSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [420] arXiv:2512.05669 [pdf, ps, other]
-
Title: Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric FeaturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [421] arXiv:2512.05663 [pdf, ps, other]
-
Title: LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D DetectionAuthors: Johannes Meier, Jonathan Michel, Oussema Dhaouadi, Yung-Hsu Yang, Christoph Reich, Zuria Bauer, Stefan Roth, Marc Pollefeys, Jacques Kaiser, Daniel CremersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [422] arXiv:2512.05651 [pdf, ps, other]
-
Title: Self-Supervised AI-Generated Image Detection: A Camera Metadata PerspectiveSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [423] arXiv:2512.05635 [pdf, ps, other]
-
Title: Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired DataSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [424] arXiv:2512.05613 [pdf, ps, other]
-
Title: DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation ModelAuthors: Pasquale De Marinis, Pieter M. Blok, Uzay Kaymak, Rogier Brussee, Gennaro Vessio, Giovanna CastellanoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [425] arXiv:2512.05610 [pdf, ps, other]
-
Title: NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projectionsAuthors: Juho Korkeala, Jesse Muhojoki, Josef Taher, Klaara Salolahti, Matti Hyyppä, Antero Kukko, Juha HyyppäComments: 19 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [426] arXiv:2512.05597 [pdf, ps, other]
-
Title: Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token PredictionComments: 10 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [427] arXiv:2512.05593 [pdf, ps, other]
-
Title: Learning High-Fidelity Cloth Animation via Skinning-Free Image TransferComments: Accepted to 3DV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [428] arXiv:2512.05571 [pdf, ps, other]
-
Title: MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical ImagingAuthors: Xingyu Zhang, Anna Reithmeir, Fryderyk Kögl, Rickmer Braren, Julia A. Schnabel, Daniel M. LangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [429] arXiv:2512.05564 [pdf, ps, other]
-
Title: ProPhy: Progressive Physical Alignment for Dynamic World SimulationAuthors: Zijun Wang, Panwen Hu, Jing Wang, Terry Jingchen Zhang, Yuhao Cheng, Long Chen, Yiqiang Yan, Zutao Jiang, Hanhui Li, Xiaodan LiangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [430] arXiv:2512.05557 [pdf, ps, other]
-
Title: 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence ConsistencySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [431] arXiv:2512.05546 [pdf, ps, other]
-
Title: Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language ModelsComments: 6 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [432] arXiv:2512.05539 [pdf, ps, other]
-
Title: Ideal Observer for Segmentation of Dead Leaves ImagesComments: 41 pages, 16 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Methodology (stat.ME)
- [433] arXiv:2512.05529 [pdf, ps, other]
-
Title: See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth PriorsComments: The first two authors contributed equallySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [434] arXiv:2512.05524 [pdf, ps, other]
-
Title: VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [435] arXiv:2512.05515 [pdf, ps, other]
-
Title: DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment AnalysisComments: Accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [436] arXiv:2512.05513 [pdf, ps, other]
-
Title: Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded ReasoningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [437] arXiv:2512.05511 [pdf, ps, other]
-
Title: Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient ParadigmAuthors: Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Yaokun Li, Xiujun Shu, Yuanhao Feng, Bo Wang, Yimian Dai, Xiangyu YueSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [438] arXiv:2512.05494 [pdf, ps, other]
- [439] arXiv:2512.05492 [pdf, ps, other]
-
Title: WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency FieldSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [440] arXiv:2512.05482 [pdf, ps, other]
-
Title: Concept-based Explainable Data Mining with VLM for 3D DetectionAuthors: Mai TsujimotoComments: 28 pages including appendix. Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [441] arXiv:2512.05481 [pdf, ps, other]
-
Title: UniFS: Unified Multi-Contrast MRI Reconstruction via Frequency-Spatial FusionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [442] arXiv:2512.05478 [pdf, ps, other]
-
Title: EmoStyle: Emotion-Driven Image StylizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [443] arXiv:2512.05468 [pdf, ps, other]
-
Title: University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor systemAuthors: Takara Taniguchi, Yudai Ueda, Atsuya Muramatsu, Kohki Hashimoto, Ryo Yagi, Hideya Ochiai, Chaodit AswakulSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [444] arXiv:2512.05446 [pdf, ps, other]
-
Title: TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS CompressionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [445] arXiv:2512.05422 [pdf, ps, other]
-
Title: ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information InteractionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [446] arXiv:2512.05418 [pdf, ps, other]
-
Title: Performance Evaluation of Deep Learning for Tree Branch Segmentation in Autonomous Forestry SystemsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [447] arXiv:2512.05415 [pdf, ps, other]
-
Title: Moving object detection from multi-depth images with an attention-enhanced CNNAuthors: Masato Shibukawa, Fumi Yoshida, Toshifumi Yanagisawa, Takashi Ito, Hirohisa Kurosaki, Makoto Yoshikawa, Kohki Kamiya, Ji-an Jiang, Wesley Fraser, JJ Kavelaars, Susan Benecchi, Anne Verbiscer, Akira Hatakeyama, Hosei O, Naoya OzakiComments: 14 pages, 22 figures, submitted to PASJSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [448] arXiv:2512.05412 [pdf, ps, other]
-
Title: YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning ApplicationsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [449] arXiv:2512.05410 [pdf, ps, other]
-
Title: Genetic Algorithms For Parameter Optimization for Disparity Map Generation of Radiata Pine Branch ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [450] arXiv:2512.05398 [pdf, ps, other]
-
Title: The Dynamic Prior: Understanding 3D Structures for Casual Dynamic VideosComments: Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [451] arXiv:2512.05394 [pdf, ps, other]
-
Title: Delving into Latent Spectral Biasing of Video VAEs for Superior DiffusabilitySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [452] arXiv:2512.05391 [pdf, ps, other]
-
Title: LoC-Path: Learning to Compress for Pathology Multimodal Large Language ModelsAuthors: Qingqiao Hu, Weimin Lyu, Meilong Xu, Kehan Qi, Xiaoling Hu, Saumya Gupta, Jiawei Zhou, Chao ChenComments: 20 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [453] arXiv:2512.05385 [pdf, ps, other]
-
Title: ShaRP: SHAllow-LayeR Pruning for Video Large Language Models AccelerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [454] arXiv:2512.05362 [pdf, ps, other]
-
Title: PoolNet: Deep Learning for 2D to 3D Video Process ValidationComments: All code related to this paper can be found at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [455] arXiv:2512.05359 [pdf, ps, other]
-
Title: Group Orthogonal Low-Rank Adaptation for RGB-T TrackingComments: 13 pages, 8 figures. Accepted by AAAI 2026. Extended versionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [456] arXiv:2512.05354 [pdf, ps, other]
-
Title: SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time TrainingComments: project page this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [457] arXiv:2512.05343 [pdf, ps, other]
-
Title: SpaceControl: Introducing Test-Time Spatial Control to 3D Generative ModelingAuthors: Elisabetta Fedele, Francis Engelmann, Ian Huang, Or Litany, Marc Pollefeys, Leonidas GuibasComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [458] arXiv:2512.05277 [pdf, ps, other]
-
Title: From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language ModelAuthors: Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain, Ahmad Rezaei, Mohsen Gholami, Alireza Heidarikhazaei, Zhou Weimin, Yong Zhang, Mohammad AkbariSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [459] arXiv:2512.05272 [pdf, ps, other]
-
Title: Inferring Compositional 4D Scenes without Ever Seeing OneComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [460] arXiv:2512.05268 [pdf, ps, other]
-
Title: CARD: Correlation Aware Restoration with DiffusionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [461] arXiv:2512.05259 [pdf, ps, other]
-
Title: Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data AnonymizationAuthors: Georgios Chatzichristodoulou, Niki Efthymiou, Panagiotis Filntisis, Georgios Pavlakos, Petros MaragosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [462] arXiv:2512.05240 [pdf, ps, other]
-
Title: IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video ReconstructionAuthors: Dmitrii Torbunov, Onur Okuducu, Yi Huang, Odera Dim, Rebecca Coles, Yonggang Cui, Yihui RenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [463] arXiv:2512.05209 [pdf, ps, other]
-
Title: DEAR: Dataset for Evaluating the Aesthetics of RenderingAuthors: Vsevolod Plohotnuk, Artyom Panshin, Nikola Banić, Simone Bianco, Michael Freeman, Egor ErshovSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [464] arXiv:2512.05198 [pdf, ps, other]
-
Title: Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion ModelsComments: 16 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [465] arXiv:2512.05172 [pdf, ps, other]
-
Title: Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [466] arXiv:2512.05152 [pdf, ps, other]
-
Title: EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer ModelsComments: 6 pages, 5 figures, published to 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France, 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [467] arXiv:2512.05150 [pdf, ps, other]
-
Title: TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial FlowsComments: arxiv v0Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [468] arXiv:2512.05145 [pdf, ps, other]
-
Title: Self-Improving VLM Judges Without Human AnnotationsAuthors: Inna Wanyin Lin, Yushi Hu, Shuyue Stella Li, Scott Geng, Pang Wei Koh, Luke Zettlemoyer, Tim Althoff, Marjan GhazvininejadSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [469] arXiv:2512.05140 [pdf, other]
-
Title: FlowEO: Generative Unsupervised Domain Adaptation for Earth ObservationAuthors: Georges Le Bellier (CEDRIC - VERTIGO, Cnam), Nicolas Audebert (LaSTIG, IGN, CEDRIC - VERTIGO)Comments: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Mar 2026, Tucson (AZ), United StatesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [470] arXiv:2512.05139 [pdf, ps, other]
-
Title: Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [471] arXiv:2512.05137 [pdf, ps, other]
-
Title: ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged ImagesAuthors: Yunfei Zhang, Yizhuo He, Yuanxun Shao, Zhengtao Yao, Haoyan Xu, Junhao Dong, Zhen Yao, Zhikang DongSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [472] arXiv:2512.05136 [pdf, ps, other]
-
Title: Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography OutcomesAuthors: Yujie Xiao, Gongzhen Tang, Deyun Zhang, Jun Li, Guangkun Nie, Haoyu Wang, Shun Huang, Tong Liu, Qinghao Zhao, Kangyin Chen, Shenda HongSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [473] arXiv:2512.05134 [pdf, ps, other]
-
Title: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion ModelsAuthors: Zihao WuComments: 8 pages main, 8 pages appendix, 16 figures, 5 tables. Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- [474] arXiv:2512.05132 [pdf, ps, other]
-
Title: Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution TrainingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [475] arXiv:2512.05131 [pdf, ps, other]
-
Title: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language GuidanceComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [476] arXiv:2512.05959 (cross-list from cs.CL) [pdf, ps, other]
-
Title: M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAGAuthors: David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra WinataComments: PreprintSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [477] arXiv:2512.05955 (cross-list from cs.RO) [pdf, ps, other]
-
Title: SIMPACT: Simulation-Enabled Action Planning using Vision-Language ModelsSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [478] arXiv:2512.05932 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Physically-Based Simulation of Automotive LiDARSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [479] arXiv:2512.05824 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade GliomaAuthors: Hafsa Akebli (1), Adam Shephard (2), Vincenzo Della Mea (1), Nasir Rajpoot (2 and 3) ((1) University of Udine, Udine, Italy, (2) University of Warwick, Coventry, UK, (3) Histofy Ltd, Coventry, UK)Comments: 4 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [480] arXiv:2512.05812 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Toward Efficient and Robust Behavior Models for Multi-Agent Driving SimulationComments: This work has been submitted to the IEEE for possible publicationSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [481] arXiv:2512.05665 (cross-list from cs.CL) [pdf, ps, other]
-
Title: Interleaved Latent Visual Reasoning with Selective Perceptual ModelingComments: 11 pages, 6 figures. Code available at this https URLSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [482] arXiv:2512.05438 (cross-list from cs.HC) [pdf, ps, other]
-
Title: EXR: An Interactive Immersive EHR Visualization in Extended RealityAuthors: Benoit Marteau, Shaun Q. Y. Tan, Jieru Li, Andrew Hornback, Yishan Zhong, Shaunna Wang, Christian Lowson, Jason Woloff, Joshua M. Pahys, Steven W. Hwang, Coleman Hilton, May D. WangComments: 11 pages, 6 figures. Preprint version. This paper has been accepted to IEEE ICIR 2025. This is the author-prepared version and not the final published version. The final version will appear in IEEE XploSubjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
- [483] arXiv:2512.05299 (cross-list from eess.SY) [pdf, ps, other]
-
Title: ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU SafetyAuthors: Ahmad Yehia, Jiseop Byeon, Tianyi Wang, Huihai Wang, Yiming Xu, Junfeng Jiao, Christian ClaudelComments: 8 pages, 3 figures, 1 tableSubjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Robotics (cs.RO); Image and Video Processing (eess.IV)
- [484] arXiv:2512.05126 (cross-list from eess.AS) [pdf, ps, other]
-
Title: SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS ModelAuthors: Kaidi Wang, Yi He, Wenhao Guan, Weijie Wu, Hongwu Ding, Xiong Zhang, Di Wu, Meng Meng, Jian Luan, Lin Li, Qingyang HongSubjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
Fri, 5 Dec 2025
- [485] arXiv:2512.05115 [pdf, ps, other]
-
Title: Light-X: Generative 4D Video Rendering with Camera and Illumination ControlAuthors: Tianqi Liu, Zhaoxi Chen, Zihao Huang, Shaocong Xu, Saining Zhang, Chongjie Ye, Bohan Li, Zhiguo Cao, Wei Li, Hao Zhao, Ziwei LiuComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [486] arXiv:2512.05113 [pdf, ps, other]
-
Title: Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection SplattingComments: WACV 2026. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [487] arXiv:2512.05112 [pdf, ps, other]
-
Title: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept GenerationAuthors: Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng LiComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [488] arXiv:2512.05111 [pdf, ps, other]
-
Title: ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual ReasoningAuthors: Shengyuan Ding, Xinyu Fang, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiangyu Zhao, Haodong Duan, Xiaoyi Dong, Jianze Liang, Bin Wang, Conghui He, Dahua Lin, Jiaqi WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [489] arXiv:2512.05110 [pdf, ps, other]
-
Title: ShadowDraw: From Any Object to Shadow-Drawing Compositional ArtComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [490] arXiv:2512.05106 [pdf, ps, other]
-
Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
- [491] arXiv:2512.05104 [pdf, ps, other]
-
Title: EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency ModulationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [492] arXiv:2512.05098 [pdf, ps, other]
- [493] arXiv:2512.05091 [pdf, ps, other]
-
Title: Visual Reasoning Tracer: Object-Level Grounded Reasoning BenchmarkAuthors: Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang, Xueqing Deng, Henghui Ding, Lu Qi, Anran Wang, Xiangtai Li, Ming-Hsuan YangComments: Technical Report; Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [494] arXiv:2512.05081 [pdf, ps, other]
-
Title: Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative CompressionComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [495] arXiv:2512.05079 [pdf, ps, other]
-
Title: Object Reconstruction under Occlusion with Generative Priors and Contact-induced ConstraintsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [496] arXiv:2512.05076 [pdf, ps, other]
-
Title: BulletTime: Decoupled Control of Time and Camera Pose for Video GenerationAuthors: Yiming Wang, Qihang Zhang, Shengqu Cai, Tong Wu, Jan Ackermann, Zhengfei Kuang, Yang Zheng, Frano Rajič, Siyu Tang, Gordon WetzsteinComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [497] arXiv:2512.05060 [pdf, ps, other]
-
Title: 4DLangVGGT: 4D Language-Visual Geometry Grounded TransformerAuthors: Xianfeng Wu, Yajing Bai, Minghan Li, Xianzu Wu, Xueqi Zhao, Zhongyuan Lai, Wenyu Liu, Xinggang WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [498] arXiv:2512.05044 [pdf, ps, other]
-
Title: Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single ImageComments: 18 PagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [499] arXiv:2512.05039 [pdf, ps, other]
-
Title: Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual EncodingComments: Submitted for review CVPR-2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [500] arXiv:2512.05025 [pdf, ps, other]
-
Title: RAMEN: Resolution-Adjustable Multimodal Encoder for Earth ObservationAuthors: Nicolas Houdré, Diego Marcos, Hugo Riffaud de Turckheim, Dino Ienco, Laurent Wendling, Camille Kurtz, Sylvain LobrySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [501] arXiv:2512.05021 [pdf, ps, other]
-
Title: HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [502] arXiv:2512.05016 [pdf, ps, other]
-
Title: Generative Neural Video Compression via Video Diffusion PriorSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [503] arXiv:2512.05006 [pdf, ps, other]
-
Title: Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent ObjectsComments: conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [504] arXiv:2512.05000 [pdf, ps, other]
-
Title: Reflection Removal through Efficient Adaptation of Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [505] arXiv:2512.04996 [pdf, ps, other]
-
Title: A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [506] arXiv:2512.04981 [pdf, ps, other]
-
Title: Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image ModelsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [507] arXiv:2512.04970 [pdf, ps, other]
-
Title: Stable Single-Pixel Contrastive Learning for Semantic and Geometric TasksComments: UniReps Workshop 2025, 12 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [508] arXiv:2512.04969 [pdf, ps, other]
-
Title: Rethinking the Use of Vision Transformers for AI-Generated Image DetectionComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [509] arXiv:2512.04967 [pdf, ps, other]
-
Title: Balanced Few-Shot Episodic Learning for Accurate Retinal Disease DiagnosisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [510] arXiv:2512.04963 [pdf, ps, other]
-
Title: GeoPE: A Unified Geometric Positional Embedding for Structured TensorsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [511] arXiv:2512.04952 [pdf, ps, other]
-
Title: FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action TokenizationAuthors: Yicheng Liu, Shiduo Zhang, Zibin Dong, Baijun Ye, Tianyuan Yuan, Xiaopeng Yu, Linqi Yin, Chenhao Lu, Junhao Shi, Luca Jiang-Tao Yu, Liangtao Zheng, Tao Jiang, Jingjing Gong, Xipeng Qiu, Hang ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [512] arXiv:2512.04943 [pdf, ps, other]
-
Title: Towards Adaptive Fusion of Multimodal Deep Networks for Human Action RecognitionAuthors: Novanto YudistiraSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [513] arXiv:2512.04939 [pdf, ps, other]
-
Title: LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token MergingAuthors: Zhijian Shu, Cheng Lin, Tao Xie, Wei Yin, Ben Li, Zhiyuan Pu, Weize Li, Yao Yao, Xun Cao, Xiaoyang Guo, Xiao-Xiao LongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [514] arXiv:2512.04927 [pdf, ps, other]
-
Title: Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral FittingAuthors: Paul HendersonComments: Accepted at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [515] arXiv:2512.04926 [pdf, ps, other]
-
Title: Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent DiffusionAuthors: Yueming Pan, Ruoyu Feng, Qi Dai, Yuqi Wang, Wenfeng Lin, Mingyu Guo, Chong Luo, Nanning ZhengSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [516] arXiv:2512.04904 [pdf, ps, other]
-
Title: ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow MatchingAuthors: Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun HuangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [517] arXiv:2512.04890 [pdf, ps, other]
-
Title: Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRIAuthors: Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Polina Golland, Benjamin BillotSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [518] arXiv:2512.04888 [pdf, ps, other]
-
Title: You Only Train Once (YOTO): A Retraining-Free Object Detection FrameworkAuthors: Priyanto Hidayatullah, Nurjannah Syakrani, Yudi Widhiyasana, Muhammad Rizqi Sholahuddin, Refdinal Tubagus, Zahri Al Adzani Hidayat, Hanri Fajar Ramadhan, Dafa Alfarizki Pratama, Farhan Muhammad YasinComments: This manuscript was first submitted to the Engineering (Elsevier Journal). The preprint version was posted to arXiv afterwards to facilitate open access and community feedbackSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [519] arXiv:2512.04883 [pdf, ps, other]
-
Title: SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded PlatformsComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [520] arXiv:2512.04875 [pdf, ps, other]
-
Title: SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion DetectionAuthors: Qing Xu, Yanqian Wang, Xiangjian Hea, Yue Li, Yixuan Zhang, Rong Qu, Wenting Duan, Zhen ChenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [521] arXiv:2512.04862 [pdf, ps, other]
-
Title: Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance SensingAuthors: Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini, Jan Ulrich Bartels, Katherine J. Kuchenbecker, Michael J. BlackComments: * Equal contribution. Minor figure corrections compared to the ICCV 2025 versionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [522] arXiv:2512.04857 [pdf, ps, other]
-
Title: Autoregressive Image Generation Needs Only a Few Lines of Cached TokensSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [523] arXiv:2512.04837 [pdf, ps, other]
-
Title: A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real WorldAuthors: Jikang Cheng, Renye Yan, Zhiyuan Yan, Yaozhong Gan, Xueyi Zhang, Zhongyuan Wang, Wei Peng, Ling LiangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [524] arXiv:2512.04832 [pdf, ps, other]
-
Title: Tokenizing Buildings: A Transformer for Layout SynthesisComments: 8 pages, 1 page References, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [525] arXiv:2512.04830 [pdf, ps, other]
-
Title: FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene SynthesisComments: Novel View Synthesis, Driving Scene, Free Trajectory, Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [526] arXiv:2512.04821 [pdf, ps, other]
-
Title: LatentFM: A Latent Flow Matching Approach for Generative Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [527] arXiv:2512.04815 [pdf, ps, other]
-
Title: RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGSAuthors: Chuanyu Fu, Guanying Chen, Yuqi Zhang, Kunbin Yao, Yuan Xiong, Chuan Huang, Shuguang Cui, Yasuyuki Matsushita, Xiaochun CaoComments: arXiv admin note: substantial text overlap with arXiv:2506.02751Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [528] arXiv:2512.04810 [pdf, ps, other]
-
Title: EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified ArchitectureComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [529] arXiv:2512.04786 [pdf, ps, other]
-
Title: LaFiTe: A Generative Latent Field for 3D Native TexturingAuthors: Chia-Hao Chen, Zi-Xin Zou, Yan-Pei Cao, Ze Yuan, Guan Luo, Xiaojuan Qi, Ding Liang, Song-Hai Zhang, Yuan-Chen GuoComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [530] arXiv:2512.04784 [pdf, ps, other]
-
Title: PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward ModelingAuthors: Bowen Ping, Chengyou Jia, Minnan Luo, Changliang Xia, Xin Shen, Zhuohang Dang, Hangwei QianSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [531] arXiv:2512.04761 [pdf, ps, other]
-
Title: Order Matters: 3D Shape Generation from Sequential VR SketchesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [532] arXiv:2512.04734 [pdf, ps, other]
-
Title: MT-Depth: Multi-task Instance feature analysis for the Depth CompletionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [533] arXiv:2512.04733 [pdf, ps, other]
-
Title: E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous DrivingAuthors: Yihong Tang, Haicheng Liao, Tong Nie, Junlin He, Ao Qu, Kehua Chen, Wei Ma, Zhenning Li, Lijun Sun, Chengzhong XuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [534] arXiv:2512.04728 [pdf, ps, other]
-
Title: Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the WildAuthors: Yigui Feng, Qinglin Wang, Haotian Mo, Yang Liu, Ke Liu, Gencheng Liu, Xinhai Chen, Siqi Shen, Songzhu Mei, Jie LiuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [535] arXiv:2512.04699 [pdf, ps, other]
-
Title: OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-ResolutionAuthors: Xinning Chai, Zhengxue Cheng, Yuhong Zhang, Hengsheng Zhang, Yingsheng Qin, Yucai Yang, Rong Xie, Li SongComments: Accepted as TCSVT, 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [536] arXiv:2512.04686 [pdf, ps, other]
-
Title: Towards Cross-View Point Correspondence in Vision-Language ModelsAuthors: Yipu Wang, Yuheng Ji, Yuyang Liu, Enshen Zhou, Ziqiang Yang, Yuxuan Tian, Ziheng Qin, Yue Liu, Huajie Tan, Cheng Chi, Zhiyuan Ma, Daniel Dajun Zeng, Xiaolong ZhengSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [537] arXiv:2512.04678 [pdf, ps, other]
-
Title: Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching DistillationAuthors: Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, Yujun Shen, Min ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [538] arXiv:2512.04677 [pdf, ps, other]
-
Title: Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite LengthAuthors: Yubo Huang, Hailong Guo, Fangtai Wu, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, Steven HoiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [539] arXiv:2512.04660 [pdf, ps, other]
-
Title: I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [540] arXiv:2512.04643 [pdf, ps, other]
-
Title: SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive DecodingAuthors: Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, Hung-Kai Chung, Kuei-Chun Wang, Yu-Chiang Frank WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [541] arXiv:2512.04619 [pdf, ps, other]
-
Title: Denoise to Track: Harnessing Video Diffusion Priors for Robust CorrespondenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [542] arXiv:2512.04599 [pdf, ps, other]
-
Title: Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shotAuthors: Sheng Hang, Chaoxiang He, Hongsheng Hu, Hanqing Hu, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [543] arXiv:2512.04597 [pdf, ps, other]
-
Title: When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question AnsweringSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [544] arXiv:2512.04585 [pdf, ps, other]
-
Title: SAM3-I: Segment Anything with InstructionsAuthors: Jingjing Li, Yue Feng, Yuchen Guo, Jincai Huang, Yongri Piao, Qi Bi, Miao Zhang, Xiaoqi Zhao, Qiang Chen, Shihao Zou, Wei Ji, Huchuan Lu, Li ChengComments: Preliminary results; work in progressSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [545] arXiv:2512.04581 [pdf, ps, other]
-
Title: Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge DistillationAuthors: Houzhang Fang, Chenxing Wu, Kun Bai, Tianqi Chen, Xiaolin Wang, Xiyang Liu, Yi Chang, Luxin YanComments: Accepted by IEEE TMMSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [546] arXiv:2512.04576 [pdf, ps, other]
-
Title: TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [547] arXiv:2512.04568 [pdf, ps, other]
-
Title: Prompt2Craft: Generating Functional Craft Assemblies with LLMsAuthors: Vitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe, Yukiyasu Domae, Weiwei Wan, Kensuke HaradaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [548] arXiv:2512.04564 [pdf, ps, other]
-
Title: Dataset creation for supervised deep learning-based analysis of microscopic images -- review of important considerations and recommendationsAuthors: Christof A. Bertram, Viktoria Weiss, Jonas Ammeling, F. Maria Schabel, Taryn A. Donovan, Frauke Wilm, Christian Marzahl, Katharina Breininger, Marc AubrevilleSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [549] arXiv:2512.04563 [pdf, ps, other]
-
Title: COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial IntelligenceAuthors: Zefeng Zhang, Xiangzhao Hao, Hengzhu Tang, Zhenyu Zhang, Jiawei Sheng, Xiaodong Li, Zhenyang Li, Li Gao, Daiting Shi, Dawei Yin, Tingwen LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [550] arXiv:2512.04554 [pdf, ps, other]
-
Title: Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question AnsweringSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [551] arXiv:2512.04542 [pdf, ps, other]
-
Title: Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian OptimizationComments: 28 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [552] arXiv:2512.04540 [pdf, ps, other]
-
Title: VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory ManagementSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [553] arXiv:2512.04537 [pdf, ps, other]
-
Title: X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at ScaleSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [554] arXiv:2512.04536 [pdf, ps, other]
-
Title: Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [555] arXiv:2512.04534 [pdf, ps, other]
-
Title: Refaçade: Editing Object with Given Reference TextureAuthors: Youze Huang (1), Penghui Ruan (2), Bojia Zi (3), Xianbiao Qi (4), Jianan Wang (5), Rong Xiao (4) ((1) University of Electronic Science and Technology of China, (2) The Hong Kong Polytechnic University, (3) The Chinese University of Hong Kong, (4) IntelliFusion Inc., (5) Astribot Inc.)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [556] arXiv:2512.04532 [pdf, ps, other]
-
Title: PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance DisentanglementAuthors: Yu-Wei Zhan, Xin Wang, Hong Chen, Tongtong Feng, Wei Feng, Ren Wang, Guangyao Li, Qing Li, Wenwu ZhuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [557] arXiv:2512.04528 [pdf, ps, other]
-
Title: Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty QuantificationAuthors: Chentao Shen, Sizhe Zheng, Bingqian Wu, Yaohua Feng, Yuanchen Fei, Mingyu Mei, Hanwen Jiang, Xiangru HuangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [558] arXiv:2512.04522 [pdf, ps, other]
-
Title: Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-IdentificationComments: 14 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [559] arXiv:2512.04521 [pdf, ps, other]
-
Title: WiFi-based Cross-Domain Gesture Recognition Using Attention MechanismSubjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
- [560] arXiv:2512.04520 [pdf, ps, other]
-
Title: Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image SegmentationAuthors: Chenlin Xu, Lei Zhang, Lituan Wang, Xinyu Pu, Pengfei Ma, Guangwu Qian, Zizhou Wang, Yan WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [561] arXiv:2512.04519 [pdf, ps, other]
-
Title: VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space MemoryAuthors: Yifei Yu, Xiaoshan Wu, Xinting Hu, Tao Hu, Yangtian Sun, Xiaoyang Lyu, Bo Wang, Lin Ma, Yuewen Ma, Zhongrui Wang, Xiaojuan QiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [562] arXiv:2512.04515 [pdf, ps, other]
-
Title: EgoLCD: Egocentric Video Generation with Long Context DiffusionAuthors: Liuzhou Zhang, Jiarui Ye, Yuanlei Wang, Ming Zhong, Mingju Cao, Wanke Xia, Bowen Zeng, Zeyu Zhang, Hao TangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [563] arXiv:2512.04511 [pdf, ps, other]
-
Title: DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [564] arXiv:2512.04504 [pdf, ps, other]
-
Title: UltraImage: Rethinking Resolution Extrapolation in Image Diffusion TransformersAuthors: Min Zhao, Bokai Yan, Xue Yang, Hongzhou Zhu, Jintao Zhang, Shilong Liu, Chongxuan Li, Jun ZhuComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [565] arXiv:2512.04499 [pdf, ps, other]
-
Title: Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [566] arXiv:2512.04496 [pdf, ps, other]
-
Title: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight RemovalAuthors: Tianci Huo, Lingfeng Qi, Yuhan Chen, Qihong Xue, Jinyuan Shao, Hai Yu, Jie Li, Zhanhua Zhang, Guofa LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [567] arXiv:2512.04487 [pdf, ps, other]
-
Title: Controllable Long-term Motion Generation with Extended Joint TargetsComments: WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [568] arXiv:2512.04485 [pdf, ps, other]
-
Title: Not All Birds Look The Same: Identity-Preserving Generation For BirdsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [569] arXiv:2512.04483 [pdf, ps, other]
-
Title: DeRA: Decoupled Representation Alignment for Video TokenizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [570] arXiv:2512.04461 [pdf, ps, other]
-
Title: UniTS: Unified Time Series Generative Model for Remote SensingAuthors: Yuxiang Zhang, Shunlin Liang, Wenyuan Li, Han Ma, Jianglei Xu, Yichuan Ma, Jiangwei Xie, Wei Li, Mengmeng Zhang, Ran Tao, Xiang-Gen XiaSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [571] arXiv:2512.04459 [pdf, ps, other]
-
Title: dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable ReasoningAuthors: Yingzi Ma, Yulong Cao, Wenhao Ding, Shuibai Zhang, Yan Wang, Boris Ivanovic, Ming Jiang, Marco Pavone, Chaowei XiaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [572] arXiv:2512.04456 [pdf, ps, other]
-
Title: GuidNoise: Single-Pair Guided Diffusion for Generalized Noise SynthesisComments: AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [573] arXiv:2512.04451 [pdf, ps, other]
-
Title: StreamEQA: Towards Streaming Video Understanding for Embodied ScenariosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [574] arXiv:2512.04441 [pdf, ps, other]
-
Title: MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous DrivingAuthors: Bin Sun, Yaoguang Cao, Yan Wang, Rui Wang, Jiachen Shang, Xiejie Feng, Jiayi Lu, Jia Shi, Shichun Yang, Xiaoyu Yan, Ziying SongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [575] arXiv:2512.04426 [pdf, ps, other]
-
Title: Self-Paced and Self-Corrective Masked Prediction for Movie Trailer GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [576] arXiv:2512.04425 [pdf, ps, other]
-
Title: Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [577] arXiv:2512.04421 [pdf, ps, other]
-
Title: UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D ScenesComments: 13 pages, 10 figures, submitted to CVPR 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [578] arXiv:2512.04413 [pdf, ps, other]
-
Title: Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object DetectionComments: 12 pages, 8 figures, 11 tablesJournal-ref: IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-11Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [579] arXiv:2512.04397 [pdf, ps, other]
-
Title: Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease DetectionJournal-ref: 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 2025, pp. 1-5Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [580] arXiv:2512.04395 [pdf, ps, other]
-
Title: Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [581] arXiv:2512.04390 [pdf, ps, other]
-
Title: FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and DeblurringComments: 20 pages, 15 figures. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [582] arXiv:2512.04358 [pdf, ps, other]
-
Title: MAFNet:Multi-frequency Adaptive Fusion Network for Real-time Stereo MatchingAuthors: Ao Xu, Rujin Zhao, Xiong Xu, Boceng Huang, Yujia Jia, Hongfeng Long, Fuxuan Chen, Zilong Cao, Fangyuan ChenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [583] arXiv:2512.04356 [pdf, ps, other]
-
Title: Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive AlignmentComments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [584] arXiv:2512.04331 [pdf, ps, other]
-
Title: Open Set Face Forgery Detection via Dual-Level Evidence CollectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [585] arXiv:2512.04329 [pdf, ps, other]
-
Title: A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
- [586] arXiv:2512.04323 [pdf, ps, other]
-
Title: Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural NetworksComments: 17 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
- [587] arXiv:2512.04315 [pdf, ps, other]
-
Title: SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [588] arXiv:2512.04314 [pdf, ps, other]
-
Title: DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel VisionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [589] arXiv:2512.04313 [pdf, ps, other]
-
Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG DecodingAuthors: Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie ZhaoComments: 16 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [590] arXiv:2512.04311 [pdf, ps, other]
-
Title: Real-time Cricket Sorting By SexComments: 13 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
- [591] arXiv:2512.04309 [pdf, ps, other]
-
Title: Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap CorrectionComments: Submitted to CVPR 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [592] arXiv:2512.04305 [pdf, ps, other]
-
Title: How (Mis)calibrated is Your Federated CLIP and What To Do About It?Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [593] arXiv:2512.04303 [pdf, ps, other]
-
Title: Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular ApplicationsComments: Accepted in 3DV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [594] arXiv:2512.04284 [pdf, ps, other]
-
Title: Learning Single-Image Super-Resolution in the JPEG Compressed DomainComments: 7 pages, 4 figures, 2 tables, SEEDS Workshop, ICIP 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [595] arXiv:2512.04283 [pdf, ps, other]
-
Title: Plug-and-Play Image Restoration with Flow Matching: A Continuous ViewpointAuthors: Fan Jia, Yuhao Huang, Shih-Hsin Wang, Cristina Garcia-Cardona, Andrea L. Bertozzi, Bao WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [596] arXiv:2512.04282 [pdf, ps, other]
-
Title: Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion TransferSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [597] arXiv:2512.04267 [pdf, ps, other]
-
Title: UniLight: A Unified Representation for LightingAuthors: Zitian Zhang, Iliyan Georgiev, Michael Fischer, Yannick Hold-Geoffroy, Jean-François Lalonde, Valentin DeschaintreComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [598] arXiv:2512.04248 [pdf, ps, other]
-
Title: MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [599] arXiv:2512.04238 [pdf, ps, other]
-
Title: 6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language ModelsAuthors: Leon Mayer, Piotr Kalinowski, Caroline Ebersbach, Marcel Knopp, Tim Rädsch, Evangelia Christodoulou, Annika Reinke, Fiona R. Kolbinger, Lena Maier-HeinSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [600] arXiv:2512.04222 [pdf, ps, other]
-
Title: ReasonX: MLLM-Guided Intrinsic Image DecompositionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [601] arXiv:2512.04221 [pdf, ps, other]
-
Title: MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video SynthesisAuthors: Xiangyu Bai, He Liang, Bishoy Galoaa, Utsav Nandi, Shayda Moezzi, Yuhang He, Sarah OstadabbasSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [602] arXiv:2512.04219 [pdf, ps, other]
-
Title: Generalized Event Partonomy Inference with Structured Hierarchical Predictive LearningComments: 16 pages, 7 figures, 3 tables. Under ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [603] arXiv:2512.04187 [pdf, ps, other]
-
Title: OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathologyAuthors: Jinzhen Hu, Kevin Faust, Parsa Babaei Zadeh, Adrienn Bourkas, Shane Eaton, Andrew Young, Anzar Alvi, Dimitrios George Oreopoulos, Ameesha Paliwal, Assem Saleh Alrumeh, Evelyn Rose Kamski-Hennekam, Phedias DiamandisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [604] arXiv:2512.04175 [pdf, ps, other]
-
Title: Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [605] arXiv:2512.05117 (cross-list from cs.LG) [pdf, ps, other]
-
Title: The Universal Weight Subspace HypothesisComments: 37 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [606] arXiv:2512.05116 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Value Gradient Guidance for Flow Matching AlignmentComments: Accepted at NeurIPS 2025; 26 pages, 20 figuresSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [607] arXiv:2512.05114 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Deep infant brain segmentation from multi-contrast MRIComments: 8 pages, 8 figures, 1 table, website at this https URL, presented at the 2025 IEEE Asilomar Conference on Signals, Systems, and ComputersSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [608] arXiv:2512.05103 (cross-list from cs.LG) [pdf, ps, other]
-
Title: TV2TV: A Unified Framework for Interleaved Language and Video GenerationAuthors: Xiaochuang Han, Youssef Emad, Melissa Hall, John Nguyen, Karthik Padthe, Liam Robbins, Amir Bar, Delong Chen, Michal Drozdzal, Maha Elbayad, Yushi Hu, Shang-Wen Li, Sreya Dutta Roy, Jakob Verbeek, XuDong Wang, Marjan Ghazvininejad, Luke Zettlemoyer, Emily DinanSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [609] arXiv:2512.05094 (cross-list from cs.RO) [pdf, ps, other]
-
Title: From Generated Human Videos to Physically Plausible Robot TrajectoriesAuthors: James Ni, Zekai Wang, Wei Lin, Amir Bar, Yann LeCun, Trevor Darrell, Jitendra Malik, Roei HerzigComments: For project website, see this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [610] arXiv:2512.04814 (cross-list from cs.SD) [pdf, ps, other]
-
Title: Shared Multi-modal Embedding Space for Face-Voice AssociationComments: Ranked 1st in Fame 2026 Challenge, ICASSPSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
- [611] arXiv:2512.04763 (cross-list from cs.LG) [pdf, ps, other]
-
Title: MemLoRA: Distilling Expert Adapters for On-Device Memory SystemsSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [612] arXiv:2512.04705 (cross-list from cs.CC) [pdf, ps, other]
-
Title: Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge AcceleratorsComments: Submitted to IEEE Transactions on Emerging Topics in ComputingSubjects: Computational Complexity (cs.CC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
- [613] arXiv:2512.04625 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Rethinking Decoupled Knowledge Distillation: A Predictive Distribution PerspectiveComments: Accepted to IEEE TNNLSSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [614] arXiv:2512.04556 (cross-list from cs.GR) [pdf, ps, other]
-
Title: Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel ComplexComments: 10 pages, 7 figuresSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [615] arXiv:2512.04464 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double EaglesSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [616] arXiv:2512.04385 (cross-list from cs.LG) [pdf, ps, other]
-
Title: STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution ForecastingAuthors: Nan Zhou, Weijie Hong, Huandong Wang, Jianfeng Zheng, Qiuhua Wang, Yali Song, Xiao-Ping Zhang, Yong Li, Xinlei ChenSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [617] arXiv:2512.04264 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Studying Various Activation Functions and Non-IID Data for Machine Learning Model RobustnessSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [618] arXiv:2512.04092 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: The changing surface of the world's roadsAuthors: Sukanya Randhawa, Guntaj Randhawa, Clemens Langer, Francis Andorful, Benjamin Herfort, Daniel Kwakye, Omer Olchik, Sven Lautenbach, Alexander ZipfSubjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
- [619] arXiv:2512.04087 (cross-list from q-bio.NC) [pdf, ps, other]
-
Title: Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4oSubjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Thu, 4 Dec 2025
- [620] arXiv:2512.04085 [pdf, ps, other]
-
Title: Unique Lives, Shared World: Learning from Single-Life VideosAuthors: Tengda Han, Sayna Ebrahimi, Dilara Gokay, Li Yang Ku, Maks Ovsjanikov, Iva Babukova, Daniel Zoran, Viorica Patraucean, Joao Carreira, Andrew Zisserman, Dima DamenSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [621] arXiv:2512.04084 [pdf, ps, other]
-
Title: SimFlow: Simplified and End-to-End Training of Latent Normalizing FlowsComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [622] arXiv:2512.04082 [pdf, ps, other]
-
Title: PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic DesignComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [623] arXiv:2512.04069 [pdf, ps, other]
-
Title: SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RLAuthors: Siyi Chen, Mikaela Angelina Uy, Chan Hee Song, Faisal Ladhak, Adithyavairavan Murali, Qing Qu, Stan Birchfield, Valts Blukis, Jonathan TremblaySubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [624] arXiv:2512.04048 [pdf, ps, other]
-
Title: Stable Signer: Hierarchical Sign Language Generative ModelComments: 12 pages, 7 figures. More Demo at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Computers and Society (cs.CY)
- [625] arXiv:2512.04040 [pdf, ps, other]
-
Title: RELIC: Interactive Video World Model with Long-Horizon MemoryAuthors: Yicong Hong, Yiqun Mei, Chongjian Ge, Yiran Xu, Yang Zhou, Sai Bi, Yannick Hold-Geoffroy, Mike Roberts, Matthew Fisher, Eli Shechtman, Kalyan Sunkavalli, Feng Liu, Zhengqi Li, Hao TanComments: 22 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [626] arXiv:2512.04039 [pdf, ps, other]
-
Title: Fast & Efficient Normalizing Flows and Applications of Image Generative ModelsAuthors: Sandeep NagarComments: PhD ThesisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [627] arXiv:2512.04025 [pdf, ps, other]
-
Title: PSA: Pyramid Sparse Attention for Efficient Video Understanding and GenerationComments: Tech reportSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [628] arXiv:2512.04021 [pdf, ps, other]
-
Title: C3G: Learning Compact 3D Representations with 2K GaussiansAuthors: Honggyu An, Jaewoo Jung, Mungyeom Kim, Sunghwan Hong, Chaehyun Kim, Kazumi Fukuda, Minkyeong Jeon, Jisang Han, Takuya Narihira, Hyuna Ko, Junsu Kim, Yuki Mitsufuji, Seungryong KimComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [629] arXiv:2512.04019 [pdf, ps, other]
-
Title: Ultra-lightweight Neural Video Representation CompressionSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [630] arXiv:2512.04015 [pdf, ps, other]
-
Title: Learning Group Actions In Disentangled Latent Image RepresentationsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [631] arXiv:2512.04012 [pdf, ps, other]
-
Title: Emergent Outlier View Rejection in Visual Geometry Grounded TransformersAuthors: Jisang Han, Sunghwan Hong, Jaewoo Jung, Wooseok Jang, Honggyu An, Qianqian Wang, Seungryong Kim, Chen FengComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [632] arXiv:2512.04007 [pdf, ps, other]
-
Title: On the Temporality for Sketch Representation LearningComments: Preprint submitted to Pattern Recognition LettersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [633] arXiv:2512.04000 [pdf, ps, other]
- [634] arXiv:2512.03996 [pdf, ps, other]
-
Title: Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding PerturbationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [635] arXiv:2512.03992 [pdf, ps, other]
-
Title: DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual DegradationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [636] arXiv:2512.03981 [pdf, ps, other]
-
Title: DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature AlignmentSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [637] arXiv:2512.03979 [pdf, ps, other]
-
Title: BlurDM: A Blur Diffusion Model for Image DeblurringComments: NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [638] arXiv:2512.03964 [pdf, ps, other]
-
Title: Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face PersonalizationComments: 17 pages, 13 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [639] arXiv:2512.03963 [pdf, ps, other]
-
Title: TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement LearningAuthors: Tao Wu, Li Yang, Gen Zhan, Yabin Zhang, Yiting Liao, Junlin Li, Deliang Fu, Li Zhang, Limin WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [640] arXiv:2512.03939 [pdf, ps, other]
-
Title: MUT3R: Motion-aware Updating Transformer for Dynamic 3D ReconstructionAuthors: Guole Shen, Tianchen Deng, Xingrui Qin, Nailin Wang, Jianyu Wang, Yanbo Wang, Yongtao Chen, Hesheng Wang, Jingchuan WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [641] arXiv:2512.03932 [pdf, ps, other]
-
Title: Beyond the Ground Truth: Enhanced Supervision for Image RestorationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [642] arXiv:2512.03918 [pdf, ps, other]
-
Title: UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive FrameworkAuthors: Youxin Pang, Yong Zhang, Ruizhi Shao, Xiang Deng, Feng Gao, Xu Xiaoming, Xiaoming Wei, Yebin LiuComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [643] arXiv:2512.03905 [pdf, ps, other]
-
Title: Zero-Shot Video Translation and Editing with Frame Spatial-Temporal CorrespondenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [644] arXiv:2512.03883 [pdf, ps, other]
-
Title: Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait EndoscopyAuthors: Jorge Tapias Gomez, Despoina Kanata, Aneesh Rangnekar, Christina Lee, Julio Garcia-Aguilar, Joshua Jesse Smith, Harini VeeraraghavanComments: 6 pages, 5 figures, 1 table, submitted to ISBI conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [645] arXiv:2512.03869 [pdf, ps, other]
-
Title: An Automated Framework for Large-Scale Graph-Based Cerebrovascular AnalysisAuthors: Daniele Falcetta, Liane S. Canas, Lorenzo Suppa, Matteo Pentassuglia, Jon Cleary, Marc Modat, Sébastien Ourselin, Maria A. ZuluagaComments: Submitted to ISBI 2026. 6 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
- [646] arXiv:2512.03862 [pdf, ps, other]
-
Title: Diminishing Returns in Self-Supervised LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [647] arXiv:2512.03854 [pdf, ps, other]
-
Title: Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern populationAuthors: Peshawa J. Muhammad Ali, Navin Vincent, Saman S. Abdulla, Han N. Mohammed Fadhl, Anders Blilie, Kelvin Szolnoky, Julia Anna Mielcarz, Xiaoyi Ji, Kimmo Kartasalo, Abdulbasit K. Al-Talabani, Nita MulliqiComments: 13 pages, 2 figures and 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [648] arXiv:2512.03852 [pdf, ps, other]
-
Title: Traffic Image Restoration under Adverse Weather via Frequency-Aware MambaComments: 12 pages, 13 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [649] arXiv:2512.03848 [pdf, ps, other]
-
Title: PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [650] arXiv:2512.03844 [pdf, ps, other]
-
Title: CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset DistillationComments: 34 pages, 24 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [651] arXiv:2512.03837 [pdf, ps, other]
-
Title: Heatmap Pooling Network for Action Recognition from RGB VideosComments: Final Version of IEEE Transactions on Pattern Analysis and Machine IntelligenceJournal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [652] arXiv:2512.03834 [pdf, ps, other]
-
Title: Lean Unet: A Compact Model for Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [653] arXiv:2512.03827 [pdf, ps, other]
-
Title: A Robust Camera-based Method for Breath Rate MeasurementAuthors: Alexey ProtopopovComments: 9 pages, 4 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [654] arXiv:2512.03817 [pdf, ps, other]
-
Title: HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to EnglishAuthors: Ahmed Nasser, Marwan Mohamed, Alaa Sherif, Basmala Mahmoud, Shereen Yehia, Asmaa Saad, Mariam S. El-Rahmany, Ensaf H. MohamedSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [655] arXiv:2512.03796 [pdf, ps, other]
-
Title: LSRS: Latent Scale Rejection Sampling for Visual Autoregressive ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [656] arXiv:2512.03794 [pdf, ps, other]
-
Title: AdaptVision: Efficient Vision-Language Models via Adaptive Visual AcquisitionComments: 15 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [657] arXiv:2512.03751 [pdf, ps, other]
-
Title: Research on Brain Tumor Classification Method Based on Improved ResNet34 NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [658] arXiv:2512.03749 [pdf, ps, other]
-
Title: Fully Unsupervised Self-debiasing of Text-to-Image Diffusion ModelsComments: Accepted at WACV 2026Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [659] arXiv:2512.03746 [pdf, ps, other]
-
Title: Thinking with Programming Vision: Towards a Unified View for Thinking with ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [660] arXiv:2512.03745 [pdf, ps, other]
-
Title: Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-IdentificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [661] arXiv:2512.03730 [pdf, ps, other]
-
Title: Out-of-the-box: Black-box Causal Attacks on Object DetectorsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [662] arXiv:2512.03724 [pdf, ps, other]
-
Title: PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor AttentionAuthors: Ziwen Li, Xin Wang, Hanlue Zhang, Runnan Chen, Runqi Lin, Xiao He, Han Huang, Yandong Guo, Fakhri Karray, Tongliang Liu, Mingming GongSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [663] arXiv:2512.03715 [pdf, ps, other]
-
Title: DINO-RotateMatch: A Rotation-Aware Deep Framework for Robust Image Matching in Large-Scale 3D ReconstructionComments: 9 pages, 5 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [664] arXiv:2512.03701 [pdf, ps, other]
-
Title: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [665] arXiv:2512.03687 [pdf, ps, other]
-
Title: Active Visual Perception: Opportunities and ChallengesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [666] arXiv:2512.03683 [pdf, ps, other]
-
Title: GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent SpacesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [667] arXiv:2512.03673 [pdf, ps, other]
-
Title: ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [668] arXiv:2512.03667 [pdf, ps, other]
-
Title: Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical ReasoningComments: Technical reportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [669] arXiv:2512.03666 [pdf, ps, other]
-
Title: ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric VideosAuthors: Qi'ao Xu, Tianwen Qian, Yuqian Fu, Kailing Li, Yang Jiao, Jiacheng Zhang, Xiaoling Wang, Liang HeComments: 26 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [670] arXiv:2512.03663 [pdf, ps, other]
-
Title: Multi-Scale Visual Prompting for Lightweight Small-Image ClassificationAuthors: Salim KhazemSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [671] arXiv:2512.03643 [pdf, ps, other]
-
Title: Optical Context Compression Is Just (Bad) AutoencodingSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [672] arXiv:2512.03640 [pdf, ps, other]
-
Title: MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention MechanismsJournal-ref: MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15521. Springer, SingaporeSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [673] arXiv:2512.03625 [pdf, ps, other]
-
Title: FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image FeaturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [674] arXiv:2512.03621 [pdf, ps, other]
-
Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video GenerationAuthors: Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang TanComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [675] arXiv:2512.03619 [pdf, ps, other]
-
Title: LAMP: Language-Assisted Motion Planning for Controllable Video GenerationComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [676] arXiv:2512.03601 [pdf, ps, other]
-
Title: Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene UnderstandingComments: Accepted to NeurIPS 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [677] arXiv:2512.03598 [pdf, ps, other]
-
Title: Memory-Guided Point Cloud Completion for Dental ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [678] arXiv:2512.03597 [pdf, ps, other]
-
Title: HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ SegmentationAuthors: Fuchen Zheng, Xinyi Chen, Weixuan Li, Quanjun Li, Junhua Zhou, Xiaojiao Guo, Xuhang Chen, Chi-Man Pun, Shoujun ZhouComments: 6 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [679] arXiv:2512.03593 [pdf, ps, other]
-
Title: CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale TexturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [680] arXiv:2512.03592 [pdf, ps, other]
-
Title: Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse FoldingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [681] arXiv:2512.03590 [pdf, ps, other]
-
Title: Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video InterpolationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [682] arXiv:2512.03580 [pdf, ps, other]
-
Title: Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processesSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
- [683] arXiv:2512.03577 [pdf, ps, other]
-
Title: Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation LearningComments: 6 pages, 2 figures. Camera-ready version accepted for IEEE BIBM 2025Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [684] arXiv:2512.03575 [pdf, ps, other]
-
Title: UniComp: Rethinking Video Compression Through Informational UniquenessSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [685] arXiv:2512.03574 [pdf, ps, other]
-
Title: Global-Local Aware Scene Text EditingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [686] arXiv:2512.03566 [pdf, ps, other]
-
Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion ModelsComments: Accepted by ACM MM Asia2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [687] arXiv:2512.03558 [pdf, ps, other]
-
Title: CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map UnderstandingAuthors: Huy Quang Ung, Guillaume Habault, Yasutaka Nishimura, Hao Niu, Roberto Legaspi, Tomoki Oya, Ryoichi Kojima, Masato Taya, Chihiro Ono, Atsunori Minamikawa, Yan LiuComments: Accepted at SIGSPATIAL 2025 (Best paper candidates), 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [688] arXiv:2512.03553 [pdf, ps, other]
-
Title: Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity MatchingAuthors: Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui GuanComments: Accepted at KDD 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [689] arXiv:2512.03542 [pdf, ps, other]
-
Title: V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time InterventionAuthors: Nan Sun, Zhenyu Zhang, Xixun Lin, Kun Wang, Yanmin Shang, Naibin Gu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, Yanan CaoSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [690] arXiv:2512.03540 [pdf, ps, other]
-
Title: CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image GenerationAuthors: Ruoxuan Zhang, Bin Wen, Hongxia Xie, Yi Yao, Songhan Zuo, Jian-Yu Jiang-Lin, Hong-Han Shuai, Wen-Huang ChengComments: Accepted by ACM Multimedia 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [691] arXiv:2512.03534 [pdf, ps, other]
-
Title: Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual GenerationComments: Visualizations are available at the website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [692] arXiv:2512.03532 [pdf, ps, other]
-
Title: OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [693] arXiv:2512.03520 [pdf, ps, other]
-
Title: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion GenerationComments: 15 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [694] arXiv:2512.03510 [pdf, ps, other]
-
Title: CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [695] arXiv:2512.03509 [pdf, ps, other]
-
Title: AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything ModelSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [696] arXiv:2512.03508 [pdf, ps, other]
-
Title: Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic SegmentationComments: ICCV 2025 (poster)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [697] arXiv:2512.03500 [pdf, ps, other]
-
Title: EEA: Exploration-Exploitation Agent for Long Video UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [698] arXiv:2512.03499 [pdf, ps, other]
-
Title: NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [699] arXiv:2512.03479 [pdf, ps, other]
-
Title: Towards Object-centric Understanding for Instructional VideosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [700] arXiv:2512.03477 [pdf, ps, other]
-
Title: Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma DiagnosisComments: 10 pages, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [701] arXiv:2512.03474 [pdf, ps, other]
-
Title: Procedural Mistake Detection via Action Effect ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [702] arXiv:2512.03470 [pdf, ps, other]
-
Title: Difference Decomposition Networks for Infrared Small Target DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [703] arXiv:2512.03463 [pdf, ps, other]
-
Title: Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [704] arXiv:2512.03454 [pdf, ps, other]
-
Title: Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous VehiclesAuthors: Haicheng Liao, Huanming Shen, Bonan Wang, Yongkang Li, Yihong Tang, Chengyue Wang, Dingyi Zhuang, Kehua Chen, Hai Yang, Chengzhong Xu, Zhenning LiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [705] arXiv:2512.03453 [pdf, ps, other]
-
Title: GeoVideo: Introducing Geometric Regularization into Video Generation ModelComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [706] arXiv:2512.03451 [pdf, ps, other]
-
Title: GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [707] arXiv:2512.03450 [pdf, ps, other]
-
Title: KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [708] arXiv:2512.03449 [src]
-
Title: LM-CartSeg: Automated Segmentation of Lateral and Medial Cartilage and Subchondral Bone for Radiomics AnalysisAuthors: Tongxu ZhangComments: The manuscript represents only a preliminary and substantially incompleted exploration. The author has decided not to stand by these results, and a thoroughly revised and significantly different version will be developed separately. Therefore this version is withdrawn and should not be citedSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [709] arXiv:2512.03445 [pdf, ps, other]
-
Title: Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data GenerationAuthors: Xieji Li, Siyuan Yan, Yingsheng Liu, H. Peter Soyer, Monika Janda, Victoria Mar, Zongyuan GeComments: 10 pages. Under ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [710] arXiv:2512.03430 [pdf, ps, other]
-
Title: Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion FeaturesComments: Accepted to the ICML 2025 TerraBytes Workshop (June 9, 2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [711] arXiv:2512.03427 [pdf, ps, other]
-
Title: Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry ApplicationsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [712] arXiv:2512.03424 [pdf, ps, other]
-
Title: DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [713] arXiv:2512.03418 [pdf, ps, other]
-
Title: YOLOA: Real-Time Affordance Detection via LLM AdapterComments: 13 pages, 9 figures, conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [714] arXiv:2512.03405 [pdf, ps, other]
-
Title: ViDiC: Video Difference CaptioningAuthors: Jiangtao Wu, Shihao Li, Zhaozhou Bian, Jialu Chen, Runzhe Wen, An Ping, Yiwen He, Jiakai Wang, Yuanxing Zhang, Jiaheng LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [715] arXiv:2512.03404 [pdf, ps, other]
-
Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-IdentificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [716] arXiv:2512.03370 [pdf, ps, other]
-
Title: ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [717] arXiv:2512.03369 [pdf, ps, other]
-
Title: FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread ForecastingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [718] arXiv:2512.03359 [pdf, ps, other]
-
Title: A Hybrid Deep Learning Framework with Explainable AI for Lung Cancer Classification with DenseNet169 and SVMSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [719] arXiv:2512.03350 [pdf, ps, other]
-
Title: SeeU: Seeing the Unseen World via 4D Dynamics-aware GenerationComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [720] arXiv:2512.03346 [pdf, ps, other]
-
Title: Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical KeratoconusComments: 16 pages, 7 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [721] arXiv:2512.03345 [pdf, ps, other]
-
Title: HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image RestorationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [722] arXiv:2512.03339 [pdf, ps, other]
-
Title: ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in EchocardiographyAuthors: Yeganeh Ghamary, Victoria Wu, Hooman Vaseli, Christina Luong, Teresa Tsang, Siavash Bigdeli, Purang AbolmaesumiComments: 11 pages, Accepted in IMIMIC Workshop at MICCAI 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [723] arXiv:2512.03335 [pdf, ps, other]
-
Title: Step-by-step Layered Design GenerationAuthors: Faizan Farooq Khan, K J Joseph, Koustava Goswami, Mohamed Elhoseiny, Balaji Vasan SrinivasanJournal-ref: AAAI 2026Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [724] arXiv:2512.03317 [pdf, ps, other]
-
Title: NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map ConstructionComments: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [725] arXiv:2512.03284 [pdf, ps, other]
-
Title: SpatialReasoner: Active Perception for Large-Scale 3D Scene UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [726] arXiv:2512.03257 [pdf, ps, other]
-
Title: PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing ImagerySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [727] arXiv:2512.03247 [pdf, ps, other]
-
Title: PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space RefinementComments: Published in the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [728] arXiv:2512.03245 [pdf, ps, other]
-
Title: 2-Shots in the Dark: Low-Light Denoising with Minimal Data AcquisitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [729] arXiv:2512.03237 [pdf, ps, other]
-
Title: LLM-Guided Material Inference for 3D Point CloudsSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [730] arXiv:2512.03233 [pdf, ps, other]
-
Title: Object Counting with GPT-4o and GPT-5: A Comparative StudyComments: 5 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [731] arXiv:2512.03210 [pdf, ps, other]
-
Title: Flux4D: Flow-based Unsupervised 4D ReconstructionAuthors: Jingkang Wang, Henry Che, Yun Chen, Ze Yang, Lily Goli, Sivabalan Manivasagam, Raquel UrtasunComments: NeurIPS 2025. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
- [732] arXiv:2512.03199 [pdf, ps, other]
-
Title: Does Head Pose Correction Improve Biometric Facial Recognition?Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [733] arXiv:2512.03182 [pdf, ps, other]
-
Title: Drainage: A Unifying Framework for Addressing Class UncertaintyComments: 16 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [734] arXiv:2512.03126 [pdf, ps, other]
-
Title: Hierarchical Process Reward Models are Symbolic Vision LearnersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [735] arXiv:2512.04076 (cross-list from cs.GR) [pdf, ps, other]
-
Title: Radiance Meshes for Volumetric ReconstructionAuthors: Alexander Mai, Trevor Hedstrom, George Kopanas, Janne Kontkanen, Falko Kuester, Jonathan T. BarronComments: Website: half-potato.gitlab.io/rmSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- [736] arXiv:2512.04032 (cross-list from cs.CL) [pdf, ps, other]
-
Title: Jina-VLM: Small Multilingual Vision Language ModelAuthors: Andreas Koukounas, Georgios Mastrapas, Florian Hönicke, Sedigheh Eslami, Guillaume Roncari, Scott Martens, Han XiaoComments: 18 pages, 1-7 main content, 13-18 appendix for tables and datasetSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [737] arXiv:2512.03995 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Artificial Microsaccade Compensation: Stable Vision for an OrnithopterComments: 29 pages, 5 figures, 2 tables, under reviewSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [738] arXiv:2512.03962 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image ReconstructionComments: 6 pages, 8 figures, 2025 Asilomar Conference on Signals, Systems, and Computers. Code is available at github.com/evanbell02/Tada-DIP/Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [739] arXiv:2512.03656 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy ForecastingSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [740] arXiv:2512.03556 (cross-list from cs.RO) [pdf, ps, other]
-
Title: RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RLAuthors: Yinzhou Tang, Yu Shang, Yinuo Chen, Bingwen Wei, Xin Zhang, Shu'ang Yu, Liangzhi Shi, Chao Yu, Chen Gao, Wei Wu, Yong LiSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [741] arXiv:2512.03522 (cross-list from cs.RO) [pdf, ps, other]
-
Title: MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global LocalizationComments: Accepted in IEEE Robotics and Automation Letters (2025)Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [742] arXiv:2512.03514 (cross-list from cs.IR) [pdf, ps, other]
-
Title: M3DR: Towards Universal Multilingual Multimodal Document RetrievalSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [743] arXiv:2512.03422 (cross-list from cs.RO) [pdf, ps, other]
-
Title: What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation ModelsAuthors: Tianchen Deng, Yue Pan, Shenghai Yuan, Dong Li, Chen Wang, Mingrui Li, Long Chen, Lihua Xie, Danwei Wang, Jingchuan Wang, Javier Civera, Hesheng Wang, Weidong ChenSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [744] arXiv:2512.03216 (cross-list from physics.ins-det) [pdf, ps, other]
-
Title: Kaleidoscopic Scintillation Event ImagingSubjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [745] arXiv:2512.03173 (cross-list from cs.CY) [pdf, ps, other]
-
Title: Culture Affordance Atlas: Reconciling Object Diversity Through Functional MappingJournal-ref: AAAI 2026 Social Impact TrackSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [746] arXiv:2512.03166 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual EnvironmentsSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [747] arXiv:2512.03111 (cross-list from q-bio.GN) [pdf, ps, other]
-
Title: PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-CancerAuthors: Xiaoshui Huang, Tianlin Zhu, Yifan Zuo, Xue Xia, Zonghan Wu, Jiebin Yan, Dingli Hua, Zongyi Xu, Yuming Fang, Jian ZhangComments: Accepted by AAAI 2026Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [748] arXiv:2512.03054 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided ResearchAuthors: Ciro Benito Raggio, Lucia Migliorelli, Nils Skupien, Mathias Krohmer Zabaleta, Oliver Blanck, Francesco Cicone, Giuseppe Lucio Cascini, Paolo Zaffino, Maria Francesca SpadeaComments: 22 pages, 13 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Medical Physics (physics.med-ph)
- [749] arXiv:2512.03052 (cross-list from cs.GR) [pdf, ps, other]
-
Title: LATTICE: Democratize High-Fidelity 3D Generation at ScaleAuthors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, Xiangyu YueComments: Technical ReportSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)