Paper Radar Daily | 2026-05-05

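One-line takeaway: all three sources are in for today's 124 candidates (arXiv 100 / HF Daily 27 / full S2 metadata backfill), and VLA / Embodied Reasoning dominates the main line: MolmoAct2 (AI2/UW, score 9.5) leads the board with 105 upvotes on a single paper, and Meta's Code World Model Preparedness Report is today's only signal with an identifiable frontier-lab byline.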

Summary

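VLA × Embodied Reasoning main line: MolmoAct2 (top of the board, 9.5) sweeps 7 simulation and real-world benchmarks; Latent Bridge VLA (4.5) and Sim-to-Real VLA (2.5) sit on the same line as companion work.
Agentic RL stays hot: Odysseus (VLM decision-making over 100+ turns in games, 7.0) and T²PO (uncertainty-guided multi-turn RL, 5.9) both go straight into Top picks; the watchlist keyword agent matches 11 papers and reasoning matches 14.
The only frontier-lab signal: Meta released the Code World Model Preparedness Report (6.6). It is not a new model but a Frontier AI Framework risk assessment of CWM, similar in positioning to Anthropic's responsible scaling reports.
MoE / Inference is quiet: only 3 MoE matches across the whole board (including MASCing, which made Top picks), 2 for DPO, 1 for speculative decoding, and 1 for scheduler; the corresponding main lines sit out this week.
S2 similar-paper links are missing: the similar_papers field came back empty for all 124 candidates, so the extended_reading section is empty today.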

📌 Top picks (cross-source hits)

MolmoAct2: Action Reasoning Models for Real-world Deployment (HF #5 · 105 upvotes · cs.RO · AI2/UW team: Ali Farhadi / Dieter Fox / Ranjay Krishna) → An open-source VLA base model that beats Pi-05 across 7 simulation and real-world settings; on 13 embodied reasoning benchmarks, MolmoER overtakes GPT-5 and Gemini Robotics ER-1.5.
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning (HF #15 · 11 upvotes · watchlist:reasoning,agent) → Uses a pretrained VLM as a strong action prior; long-horizon RL is markedly more sample-efficient than classical deep RL.
Generative Modeling with Orbit-Space Particle Flow Matching (HF #8 · watchlist:inference,dpo) → OGPP, a particle-native flow matching framework, exploits permutation invariance to markedly reduce the per-index task burden.
Code World Model Preparedness Report (HF #4 · Meta · watchlist:reasoning,world model) → Meta's frontier risk assessment of CWM: no additional catastrophic risk found.
PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments (HF #12 · 3 upvotes · watchlist:reasoning,agent) → The first physician-agent benchmark built on real EHRs, closer to autonomous clinical workflows than static QA.
T²PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning (HF #11 · watchlist:reasoning,agent) → Uses token-level and turn-level uncertainty to adaptively trigger thinking interventions, stabilizing multi-turn agentic RL.
MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks (HF #17 · watchlist:moe,inference) → An activation-mask framework that reconfigures MoE safety behavior without retraining.
From Context to Skills: Can Language Models Learn from Context Skillfully? (HF #13 · 98 upvotes · watchlist:agent,inference) → Ctx2Skill, a self-evolving framework, discovers, refines, and selects context-relevant skills without human supervision.

🏷 Watchlist category hits

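The categories[] field is mostly empty for arXiv-sourced candidates, so papers are grouped by watchlist keyword, and each group lists its 4 highest-scoring papers.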
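The grouping rule above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the candidate fields `title`, `score`, and `watchlist`, the function name `top_per_keyword`, and the sample entries are all hypothetical.

```python
from collections import defaultdict

def top_per_keyword(candidates, top_pick_titles, k=4):
    """Group non-Top-pick candidates by watchlist keyword, keep the k best by score."""
    groups = defaultdict(list)
    for cand in candidates:
        if cand["title"] in top_pick_titles:
            continue  # Top picks are listed separately, so skip them here
        for keyword in cand["watchlist"]:
            groups[keyword].append(cand)
    # sorted() is stable, so candidates with equal scores keep their input order
    return {
        keyword: sorted(items, key=lambda c: c["score"], reverse=True)[:k]
        for keyword, items in groups.items()
    }

# Synthetic candidates (placeholder titles, not real papers)
candidates = [
    {"title": "A", "score": 4.9, "watchlist": ["reasoning"]},
    {"title": "B", "score": 9.5, "watchlist": ["reasoning", "agent"]},
    {"title": "C", "score": 4.1, "watchlist": ["reasoning"]},
    {"title": "D", "score": 4.5, "watchlist": ["reasoning", "agent"]},
    {"title": "E", "score": 2.5, "watchlist": ["reasoning"]},
    {"title": "F", "score": 4.5, "watchlist": ["reasoning"]},
]
result = top_per_keyword(candidates, top_pick_titles={"B"})
```

A paper carrying several keywords appears in every matching group, which is consistent with entries above that list more than one watchlist tag.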

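reasoning (14 candidates; 4 listed beyond Top picks)

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs (4.9 · HF #1 · 11 upvotes): a lightweight PVM module that counters visual-signal decay in LVLMs and speeds up convergence of internal predictions.
Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs (4.5): suggestive observations on unlocking the latent reasoning capacity of visual latents in MLLMs.
ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review (4.5): an agent pipeline for reproducibility assessment in support of peer review.
Perceptual Flow Network for Visually Grounded Reasoning (4.1): a perceptual-flow network for visually grounded reasoning.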

agent (11 candidates; 4 listed beyond Top picks)

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation (5.2): Psi-RAG improves F1 on cross-document multi-hop QA by +25.9% over RAPTOR and +7.4% over HippoRAG2.
AcademiClaw: When Students Set Challenges for AI Agents (4.5 · HF #10): an AI agent evaluation set with student-written challenges.
DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks (4.0): SLAM for dynamic environments driven by a generative GNN.
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration (2.5): orchestration-oriented RL for LLM multi-agent systems.

inference (11 candidates; 4 listed beyond Top picks)

Latent Bridge: Feature Delta Prediction for Efficient Dual-System Vision-Language-Action Models (4.5 · watchlist:inference,vla): a dual-system VLA that cuts inference cost by predicting feature deltas.
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection (4.0): compression-aware adaptive selection of γ for speculative decoding.
Linearizing Vision Transformer with Test-Time Training (2.5): linearizes the ViT inference path with TTT.
The Bayesian Reflex: Online Learning as the Autonomic Nervous System of Modern AI (2.5): frames online learning as the autonomic nervous system of modern AI.

moe / vla / world model (sparse hits, listed together)

moe: Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE (4.0): grafts DiT-MoE onto a unified multimodal model.
vla (beyond Top picks): Seeing Realism from Simulation: Efficient Video Transfer for VLA (2.5): sim-to-real video transfer for VLA.
world model: apart from Top pick #4 (Meta CWM Preparedness), this category has no standalone incremental signal today.

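🔗 Extended reading (Semantic Scholar similar papers)

No high-confidence incremental signal in this section today (S2 returned no similar papers). The similar_papers field is empty for all 124 candidates, possibly because S2's citation graph had not finished indexing the day's new arXiv entries within the 5/4 crawl window. This is flagged as s2_similar_unavailable in coverage_gaps; readers are advised to follow each Top pick's s2_url and check the similar-paper list on the S2 website directly.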

🧑‍🔬 Newly surfaced authors / teams

Today's discovery scan found no candidates that meet the bar.

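The affiliations[] field in the candidate JSON is empty for all 124 entries (neither HF Daily nor arXiv carries an institution field in its metadata), so the automated pipeline cannot bucket candidates as frontier / oss-lab;
All candidates were matched against the watchlist tracked_authors (Jason Wei / Yann LeCun / Ilya Sutskever / Tri Dao / Sergey Levine, among others) using strict full-name matching: zero hits;
Soft signals: the MolmoAct2 author list includes Ali Farhadi / Dieter Fox / Ranjay Krishna / Joyce Chai, who are strongly associated with AI2 / UW, but these names are not yet in the tracked_authors seed list; the Code World Model Preparedness Report comes from Meta, and whether to extend the seed list needs manual review.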
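As an illustration of the strict full-name match described above, here is a minimal sketch. The `authors` field and the function name `match_tracked_authors` are assumptions, and the sketch normalizes only case and surrounding whitespace, which may be looser or stricter than what the real pipeline does.

```python
def match_tracked_authors(candidates, tracked_authors):
    """Return (title, author) pairs whose full name exactly matches a tracked author."""
    # Assumption: "strict" means whole-name equality after trimming and case folding
    tracked = {name.strip().casefold() for name in tracked_authors}
    hits = []
    for cand in candidates:
        for author in cand.get("authors", []):
            if author.strip().casefold() in tracked:
                hits.append((cand["title"], author))
    return hits

# Names taken from the digest; the record layout itself is hypothetical
tracked_authors = ["Jason Wei", "Yann LeCun", "Ilya Sutskever", "Tri Dao", "Sergey Levine"]
candidates = [
    {
        "title": "MolmoAct2",
        "authors": ["Ali Farhadi", "Dieter Fox", "Ranjay Krishna", "Joyce Chai"],
    },
]
hits = match_tracked_authors(candidates, tracked_authors)
```

With whole-name equality, partial overlaps such as a shared surname never count as hits, which keeps false positives low at the cost of missing name variants.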

📉 Coverage gaps and uncertainty

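s2_similar_unavailable: similar_papers is empty for every candidate, so today's extended reading section has no data.
affiliations_missing (soft): affiliations[] is empty for 124/124 candidates, so frontier lab / university / startup attribution cannot be identified automatically, which affects new-author and tracked-affiliation hits. Suggested fix for the next paper_fetch.py revision: try pulling affiliations from the PDF first page or the S2 paper detail.
venue_empty (soft): the venue field is empty for 120/124 papers (mostly arXiv preprints, which is expected; not counted as a hard gap).

confidence_flags: s2_similar_link_degraded, affiliation_metadata_unavailable.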
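One way gap counts like these could be derived is sketched below. It is a hedged sketch only: the rule "raise a flag when a field is empty for every candidate" is an assumption, as are the function names and the synthetic records built to mirror today's counts.

```python
def empty_field_counts(candidates, fields):
    """Count, per field, how many candidates have it missing or empty."""
    return {f: sum(1 for c in candidates if not c.get(f)) for f in fields}

def coverage_flags(candidates):
    """Return (counts, flags); a flag is raised only when a field is empty everywhere."""
    total = len(candidates)
    counts = empty_field_counts(candidates, ["similar_papers", "affiliations", "venue"])
    flags = []
    if total and counts["similar_papers"] == total:
        flags.append("s2_similar_unavailable")
    if total and counts["affiliations"] == total:
        flags.append("affiliations_missing")
    return counts, flags

# Synthetic records: 124 candidates, 4 of which carry a placeholder venue
candidates = [
    {"similar_papers": [], "affiliations": [], "venue": "placeholder" if i < 4 else ""}
    for i in range(124)
]
counts, flags = coverage_flags(candidates)
```

venue is counted but never flagged here, matching the digest's treatment of empty venues as expected for preprints.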

Sources and cross-validation notes

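Three-source recall: arXiv 100 papers (including HF overlap) / HF Daily 27 papers / full Semantic Scholar metadata backfill (120/124 carry s2_tldr). Cross-source overlap (the three buckets sum to 124): arXiv∩HF∩S2 = 3 papers, HF∩S2 = 24 papers, arXiv∩S2 = 97 papers.
Ranking weights are anchored to paper_watchlist.yaml::ranking_weights (hf_trending_rank + watchlist_keyword + nice_to_have), and conclusions follow the priority order primary > metadata > curated > other: method and benchmark claims for MolmoAct2 and the other Top picks are anchored to the arXiv abstract and S2 tldr, while HF upvotes serve only as a popularity signal, not as evidence for a paper's results.
0/124 are flagged seen_before=true: the 14-day rolling seen-pool window produced no false hits, and every entry today is new.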
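The overlap figures above are consistent with disjoint buckets: 3 + 97 = 100 arXiv papers, 3 + 24 = 27 HF Daily papers, and 3 + 24 + 97 = 124 candidates in total. A minimal sketch of that accounting with plain set operations follows; the function name, bucket keys, and synthetic IDs are hypothetical and chosen only to reproduce the reported counts.

```python
def overlap_buckets(arxiv_ids, hf_ids, s2_ids):
    """Split candidate IDs into disjoint buckets by which sources returned them."""
    arxiv_ids, hf_ids, s2_ids = set(arxiv_ids), set(hf_ids), set(s2_ids)
    return {
        "arxiv_hf_s2": len(arxiv_ids & hf_ids & s2_ids),
        "hf_s2_only": len((hf_ids & s2_ids) - arxiv_ids),
        "arxiv_s2_only": len((arxiv_ids & s2_ids) - hf_ids),
    }

# Synthetic IDs: p0..p99 from arXiv, p97..p123 from HF Daily, S2 covering both
arxiv = {f"p{i}" for i in range(100)}
hf = {f"p{i}" for i in range(97, 124)}
s2 = arxiv | hf
buckets = overlap_buckets(arxiv, hf, s2)
total_candidates = len(arxiv | hf)
```

Reporting disjoint buckets rather than raw pairwise intersections makes the totals easy to audit, since the bucket sizes must add up to the candidate count.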

Hanzhi's BLOG

[Papers · 2026-05-05]