Paper Radar Daily | 2026-04-22

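One-line takeaway: today's papers center on a new parameter-efficient fine-tuning paradigm (ShadowPEFT), dynamic correction of multimodal hallucination (PSRD), an agent evaluation benchmark (AJ-Bench), and semantic abstraction for robot world models (MWM). Agents and inference optimization remain the most active directions.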

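Summary

ShadowPEFT proposes a shared shadow module in place of LoRA's distributed low-rank perturbations. At a comparable parameter budget it matches or exceeds LoRA/DoRA, and it supports edge deployment.
PSRD shows that multimodal hallucination follows phase-wise dynamics and corrects it dynamically at decoding time with a lightweight reward model, cutting the hallucination rate of LLaVA-1.5-7B by 50%.
AJ-Bench is the first systematic evaluation of the Agent-as-a-Judge paradigm, covering 155 tasks and 516 annotated trajectories across three domains: search, data systems, and GUI.
Mask World Model changes the prediction target of a video-diffusion world model from pixels to semantic masks, substantially improving the generalization and robustness of robot policies.
SmoothCruiser (from "Planning in entropy-regularized Markov decision processes and games") achieves a problem-independent O(1/ε⁴) sample complexity for planning in regularized MDPs. It was published at NeurIPS and shows a notable citation velocity.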
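As background for the SmoothCruiser entry, the block below writes out the textbook entropy-regularized Bellman equation and shows what an O(1/ε⁴) rate implies in practice. It is a sketch under stated assumptions, not the paper's own notation: the symbols λ (regularization strength), γ (discount), r (reward), P (transition kernel), and the constant C are chosen here for illustration.

```latex
% Entropy-regularized (soft) Bellman equation, standard textbook form.
% Assumed symbols: \lambda > 0 regularization strength, \gamma discount factor.
V(s) = \max_{\pi(\cdot \mid s)} \Big\{ \sum_a \pi(a \mid s)\, Q(s,a)
        + \lambda\, \mathcal{H}\big(\pi(\cdot \mid s)\big) \Big\}
     = \lambda \log \sum_a \exp\!\big( Q(s,a) / \lambda \big),
\qquad
Q(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[ V(s') \big].

% Reading an O(1/\varepsilon^4) sample complexity, with C an unspecified constant:
N(\varepsilon) \le \frac{C}{\varepsilon^{4}}
\quad\Longrightarrow\quad
\frac{N(\varepsilon/2)}{N(\varepsilon)} \approx 2^{4} = 16 .
```

In words: the hard max over actions becomes a smooth log-sum-exp, and under a 1/ε⁴ rate, halving the target error multiplies the sample budget by roughly 16.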

📌 Top picks (cross-hits)

ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning (HF↑18 / trending #9 / watchlist: inference) → a shared shadow module replaces LoRA for layer-level fine-tuning
Mitigating Multimodal Hallucination via Phase-wise Self-reward (HF↑1 / trending #11 / watchlist: inference) → phase-wise self-reward signals dynamically correct multimodal hallucination
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation (HF↑11 / trending #7 / watchlist: agent) → agents act as judges to evaluate behavior in complex environments
Planning in entropy-regularized Markov decision processes and games (NeurIPS / citation_velocity:13) → planning in regularized MDPs reaches polynomial sample complexity
A-MAR: Agent-based Multimodal Art Retrieval (watchlist: reasoning + agent) → structured reasoning plans guide multimodal art retrieval
Mask World Model: Predicting What Matters for Robust Robot Policy Learning (watchlist: robot policy + world model) → a semantic-mask world model improves robot generalization
Time Series Augmented Generation for Financial Applications (watchlist: reasoning + agent) → an evaluation framework for LLM-agent tool calling on financial time series
GRAFT: Geometric Refinement and Fitting Transformer for Human Scene Reconstruction (watchlist: reasoning + inference / citation_velocity:1) → iterative geometric refinement for fast human-scene reconstruction

🏷 Watchlist category hits

Agent / Reasoning

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments - a dataset of 331 environments that can be reward-hacked, for evaluating agent robustness (score 5.0)
What Makes an LLM a Good Optimizer? - analyzes trajectory characteristics of LLM-guided optimization (HF trending #4, score 4.6)
A Self-Evolving Framework for Efficient Terminal Agents - observation compression lets terminal agents self-evolve (score 4.5)
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment - dialectical alignment mitigates actor-observer asymmetry in agents (score 4.5)

Reasoning / Benchmark

Chain-of-Thought Degrades Visual Spatial Reasoning - CoT turns out to reduce multimodal visual spatial reasoning ability (HF trending #13, score 4.7)
Mind’s Eye: Visual Abstraction, Transformation and Composition Benchmark - a benchmark for visual abstract reasoning (HF trending #15, score 4.5)
Pause or Fabricate? Training LMs for Grounded Reasoning - trains models to pause rather than fabricate when uncertain (score 4.0)

Inference / PEFT

Micro Language Models Enable Instant Responses - micro language models deliver instant inference responses (HF trending #2, score 4.8)
RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation - geometry-driven selection of LoRA parameters (HF trending #1, score 3.4)

Robot / VLA / World Model

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Transfer - a unified physical language for transferring policies from humans to humanoid robots (score 4.5)

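🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal for this section today: S2 returned no similar papers, and the similar_papers field is empty for every candidate.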

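🧑‍🔬 Newly surfaced authors / teams

Michal Valko (Google DeepMind / Inria): 3 papers among today's candidates (SmoothCruiser among them), published at NeurIPS with a notable citation_velocity; research spans RL planning and sampling. Not in the tracked_authors list.
- Representative work: Planning in entropy-regularized Markov decision processes and games
Gerard Pons-Moll's group (University of Tübingen): two papers hit, GRAFT and InHabit (Pradyumna YM, Nikita Kister, and István Sárándi co-occur), focused on 3D reconstruction of human-scene interaction.
- Representative work: GRAFT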

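📉 Coverage gaps and uncertainty

s2_similar_unavailable: the Semantic Scholar similar-papers field is empty for every candidate, so the further-reading section could not be filled.
The affiliations field is empty for most candidates (the HF JSON carries no institution data, and S2 indexing lags), so affiliation hits rest on inference rather than structured data.
130 candidates today; the highest Top picks score is only 5.6 (no tracked_author or tracked_affiliation bonus applied), so overall signal density is moderate.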

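Sources and cross-validation notes

This issue cross-references three sources: arXiv, HuggingFace Daily Papers, and Semantic Scholar. HF trending supplies the community-interest signal, S2 supplies citation velocity and TLDRs, and arXiv is the primary source for the papers themselves. All three sources responded and none was downgraded as a whole; the one exception is the S2 similar_papers field, which came back empty for every candidate. Conclusions are anchored to the arXiv preprint text, and HF upvotes serve only as a secondary ranking aid.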
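The three-source cross-check described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `cross_hits` and `top_picks`, the source keys, the title normalization, and the `min_sources` threshold are hypothetical and do not describe the pipeline actually used for this digest.

```python
from collections import defaultdict


def cross_hits(sources):
    """Map each normalized paper title to the sorted list of sources that returned it."""
    seen = defaultdict(set)
    for source_name, titles in sources.items():
        for title in titles:
            # Hypothetical normalization: trim whitespace and lowercase the title.
            seen[title.strip().lower()].add(source_name)
    return {title: sorted(names) for title, names in seen.items()}


def top_picks(sources, min_sources=2):
    """Keep papers seen in at least `min_sources` sources, most-confirmed first."""
    hits = cross_hits(sources)
    picks = [(title, names) for title, names in hits.items() if len(names) >= min_sources]
    # Sort by number of confirming sources (descending), then by title for stability.
    return sorted(picks, key=lambda item: (-len(item[1]), item[0]))


# Toy input; the source-to-paper assignments are invented for the example.
example = {
    "arxiv": ["ShadowPEFT", "AJ-Bench", "GRAFT"],
    "hf_daily": ["ShadowPEFT", "AJ-Bench"],
    "s2": ["GRAFT", "ShadowPEFT"],
}

picks = top_picks(example)
for title, names in picks:
    print(title, names)
```

Counting confirming sources rather than summing raw scores keeps a single noisy signal, such as HF upvotes, from dominating the ranking, which matches the stated intent of using upvotes only as a secondary aid.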

Hanzhi's BLOG

[Papers · 2026-04-22]