Paper Radar Daily | 2026-05-07

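One-line takeaway: reasoning and agent are the day's two dominant threads (15 hits each in the top 30), and OpenSearch-VL, First-Token, and RLDX-1, which appear in all three sources, are the strongest signals. The S2 similar-paper graph and affiliations are both missing, so the Further reading and New authors sections are left empty.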

Summary

Candidate pool: 134 papers (after deduplicating across arXiv, HF Daily, and S2), already sorted by ranking_score.
Thread signals: in the top 30, reasoning 15 hits, agent 15, inference 9, world model 5, VLA 2, MoE 1.
Three-source intersection: 2605.05185 OpenSearch-VL, 2605.05166 First-Token, and 2605.03269 RLDX-1 each appear in arXiv, HF Daily, and S2.
HF Daily leaders: trending #1 SWE-WebDevBench, #2 First-Token, #3 MiniCPM-o 4.5, #5 When-to-Think, #7 ResRL, #10 OpenSearch-VL (77↑), #13 JoyAI-Image, #15 RLDX-1 (80↑).
Degradation flags: s2_similar_unavailable plus empty affiliations for every candidate → the Further reading and New authors sections are explicitly left empty.
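The pool construction summarized above (three-source deduplication, the three-source intersection, and the top-30 keyword tally) can be sketched roughly as follows. This is a minimal sketch, not the actual pipeline: the record fields (`arxiv_id`, `source`, `keywords`, `ranking_score`), the source labels, and taking the maximum score on merge are all assumptions made for illustration.

```python
from collections import Counter

def merge_candidates(records):
    """Merge per-source records into one candidate per arXiv id (hypothetical schema)."""
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            rec["arxiv_id"],
            {"arxiv_id": rec["arxiv_id"], "sources": set(),
             "keywords": set(), "ranking_score": 0.0},
        )
        entry["sources"].add(rec["source"])
        entry["keywords"].update(rec.get("keywords", []))
        # Assumption: keep the highest score seen; the real scoring rule is not given.
        entry["ranking_score"] = max(entry["ranking_score"],
                                     rec.get("ranking_score", 0.0))
    # Highest ranking_score first, matching the "already sorted" pool.
    return sorted(merged.values(), key=lambda c: c["ranking_score"], reverse=True)

def three_source_intersection(candidates,
                              required=frozenset({"arxiv", "hf", "s2"})):
    """Ids of candidates that were seen in every required source."""
    return [c["arxiv_id"] for c in candidates if required <= c["sources"]]

def keyword_counts(candidates, top_n=30):
    """Count watchlist keyword hits among the first top_n candidates."""
    counts = Counter()
    for c in candidates[:top_n]:
        counts.update(c["keywords"])
    return counts

# Toy records, not the real 134-paper pool.
records = [
    {"arxiv_id": "2605.05185", "source": "arxiv", "keywords": ["reasoning", "agent"], "ranking_score": 7.0},
    {"arxiv_id": "2605.05185", "source": "hf", "ranking_score": 7.0},
    {"arxiv_id": "2605.05185", "source": "s2", "ranking_score": 7.0},
    {"arxiv_id": "2605.05126", "source": "arxiv", "keywords": ["reasoning", "inference", "VLA"], "ranking_score": 6.5},
]
pool = merge_candidates(records)
print(three_source_intersection(pool))
print(keyword_counts(pool).most_common())
```

Keying on the arXiv id keeps the merge simple; a real pipeline would likely also need a fallback key for records that lack one.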

📌 Top picks (cross-source hits)

Sorted by ranking_score in descending order, capped at 8.

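When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning (HF 1↑; trending #5 · score=7.5) → Parallel reasoning that decides when to disclose output, balancing latency against accuracy.
- Rationale: HF trending #5; matches watchlist keywords reasoning/MoE; includes benchmark + SFT.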
- tldr_en：This work introduces Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation and improves accuracy–content-latency Pareto trade-offs under token-level proxies such as inter-update waiting.
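- Links: HF · S2
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents (HF 77↑; trending #10 · score=7.0) → Open-source multimodal search agent that introduces a fault-tolerant GRPO training algorithm.
- Rationale: in the arXiv + HF + S2 three-source intersection; HF trending #10 with 77 upvotes; matches reasoning/agent.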
- tldr_en：This work introduces OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning and proposes a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping.
- Links: HF · S2
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning (HF 3↑; trending #7 · score=6.8) → Residual RL that improves both reasoning and generation diversity.
- Rationale: HF trending #7; matches reasoning/agent; spans 12 math/code/agent benchmarks.
- tldr_en：Positive sample projection Residual Reinforcement Learning (ResRL) is proposed that decouples similar semantic distributions among positive and negative responses, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling.
- Links: HF · S2
ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation (new arXiv submission · score=6.5) → A 4D spatiotemporally consistent VLA framework for robotic manipulation.
- Rationale: matches three keywords (reasoning/inference/VLA); new cs.RO submission; introduces CS-Thinker.
- tldr_en：This work proposes ConsisVLA-4D, a unified and efficient framework that enhances spatiotemporal consistency in 3D perception and 4D reasoning and introduces CS-Thinker to achieve cross-scene spatiotemporal consistency as actions unfold.
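- Links: HF · S2
RLDX-1 Technical Report (HF 80↑; trending #15 · score=6.5) → A unified dexterous-manipulation policy built on a Multi-Stream Action Transformer.
- Rationale: HF trending #15 with 80 upvotes; matches inference/VLA; centered on the MSAT architecture.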
- tldr_en：This work introduces RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention.
- Links: HF · S2
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation (HF 7↑; trending #13 · score=6.2) → A spatial-intelligence model that unifies understanding, generation, and editing.
- Rationale: HF trending #13; matches reasoning + world model; proposes JoyAI-Image.
- tldr_en：The bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence.
- Links: HF · S2
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies (HF 2↑; trending #1 · score=5.9) → An end-to-end agent benchmark for vibe-coding platforms.
- Rationale: HF trending #1; matches agent; positioned as a "virtual software agency" evaluation.
- tldr_en：SWE-WebDev Bench is released as a community benchmark to enable larger-scale replication to establish generality and help platform builders identify and address four recurring shortcomings in the current generation of AI app builders.
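- Links: HF · S2
The First Token Knows: Single-Decode Confidence for Hallucination Detection (trending #2 · score=5.3) → First-token entropy alone rivals self-consistency for hallucination detection.
- Rationale: three-source intersection; HF trending #2; matches inference; avoids the N× decoding cost of repeated sampling.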
- tldr_en：First-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering.
- Links: HF · S2
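The selection rule stated at the top of this section (ranking_score descending, capped at 8) could look roughly like the sketch below. The candidate dict shape and the function name `select_top_picks` are hypothetical; how ranking_score itself is computed is not described in this report and is not modeled here.

```python
def select_top_picks(candidates, cap=8):
    """Return up to `cap` candidates, highest ranking_score first.

    sorted() is stable, so candidates with equal scores keep their input order.
    """
    return sorted(candidates, key=lambda c: c["ranking_score"], reverse=True)[:cap]

# Toy pool: only the titles and scores are taken from the list above.
pool = [
    {"title": "RLDX-1", "ranking_score": 6.5},
    {"title": "When-to-Think", "ranking_score": 7.5},
    {"title": "ConsisVLA-4D", "ranking_score": 6.5},
    {"title": "OpenSearch-VL", "ranking_score": 7.0},
]
for pick in select_top_picks(pool, cap=3):
    print(pick["title"], pick["ranking_score"])
```

Because the sort is stable, tie order is decided by whatever order the pool arrives in, so a deterministic upstream order matters when two papers share a score (as the two 6.5 entries do here).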

🏷 Watchlist hits by category

Grouped by arXiv primary category, excluding papers already in Top picks, with at most 4 per category.
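As a rough illustration of that grouping rule, the sketch below buckets candidates by primary category, skips anything already in Top picks, and stops filling a bucket at four entries. The field names (`arxiv_id`, `primary_category`) and the function name are assumptions, and the input is assumed to be pre-sorted by ranking_score.

```python
from collections import defaultdict

def watchlist_by_category(candidates, top_pick_ids, per_category_cap=4):
    """Group candidates by arXiv primary category, excluding Top picks.

    `candidates` is assumed to be sorted by ranking_score (descending), so
    each category keeps its highest-ranked entries up to the cap.
    """
    grouped = defaultdict(list)
    for cand in candidates:
        if cand["arxiv_id"] in top_pick_ids:
            continue  # already shown in Top picks
        bucket = grouped[cand["primary_category"]]
        if len(bucket) < per_category_cap:
            bucket.append(cand)
    # Alphabetical category order, as in the listing that follows.
    return dict(sorted(grouped.items()))

# Toy input with made-up ids.
toy = [
    {"arxiv_id": "a1", "primary_category": "cs.CV"},
    {"arxiv_id": "a2", "primary_category": "cs.AI"},
    {"arxiv_id": "a3", "primary_category": "cs.CV"},
]
print(watchlist_by_category(toy, top_pick_ids={"a3"}))
```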

cond-mat.mtrl-sci

cs.AI

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents - Elastic context orchestration for long-horizon search agents. (hits: watchlist_keyword:reasoning,agent+nice_to_have:benchmark)
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation - Lightweight agent routing based on selective delegation. (hits: watchlist_keyword:agent,inference+nice_to_have:benchmark)
Executable World Models for ARC-AGI-3 in the Era of Coding Agents - Executable world models and coding agents for ARC-AGI-3. (hits: watchlist_keyword:agent,world model)

cs.AR

Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours - An agent builds a TurboQuant inference accelerator in 80 hours. (hits: watchlist_keyword:agent,inference)

cs.CL

Misaligned by Reward: Socially Undesirable Preferences in LLMs - A study of socially misaligned preferences in reward-driven LLMs. (hits: watchlist_keyword:reasoning+nice_to_have:benchmark,evaluation)

cs.CR

Agentic Vulnerability Reasoning on Windows COM Binaries - Agent-based vulnerability reasoning over Windows COM binaries. (hits: watchlist_keyword:reasoning,agent+nice_to_have:benchmark)

cs.CV

LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore) - Challenge report on the physical plausibility of 4D world models. (hits: watchlist_keyword:world model+nice_to_have:benchmark,evaluation+citation_velocity:2.0)
Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging - An optimal-transport framework for zero-shot medical anomaly localisation with VLMs. (hits: watchlist_keyword:reasoning,inference+nice_to_have:benchmark,evaluation)
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World - Physics-consistent 3D asset generation for virtual worlds. (hits: hf_trending_rank:17+watchlist_keyword:agent+nice_to_have:embodied)
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models - On-policy self-distillation for fine-tuning step-distilled diffusion models. (hits: hf_trending_rank:18+watchlist_keyword:inference+nice_to_have:fine-tuning)

cs.DC

cs.GR

cs.LG

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers - Self-induced outcome potential for turn-level credit assignment in agents. (hits: watchlist_keyword:reasoning,agent+nice_to_have:benchmark)
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior - Steering along representation manifolds reveals geometry shared by neural network representations and behavior. (hits: watchlist_keyword:reasoning,world model)
Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime - Controlling the rollout pass rate in binary-reward RL. (hits: watchlist_keyword:reasoning,agent)

cs.NI

cs.RO

Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout - A driver-centric, traffic-conditioned latent world model. (hits: watchlist_keyword:world model+nice_to_have:benchmark,evaluation)

cs.SD

cs.SE

eess.IV

eess.SY

math.CA

math.PR

math.ST

q-bio.NC

stat.ML

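🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal for this section today (S2 returned no similar papers). Under the hard constraint, no external backfill is attempted; the section will be backfilled the next time S2 is healthy.

🧑‍🔬 New authors / teams

Today's discovery scan found no qualifying candidates. The affiliations field is empty for all 134 candidates in this batch (the HF Daily JSON carries no institutions, and arXiv listing parsing yielded no affiliation strings), so new-author and institution attribution cannot be done reliably. Nothing is added just to fill the section.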

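📉 Coverage gaps and uncertainty

s2_similar_unavailable: the candidate JSON carries no similar_papers field; under the hard constraint, external backfill is skipped and the Further reading section is left empty.
affiliations_missing_for_all_candidates: 134/134 candidates have affiliations=[]; institution-level attribution is deferred, and the New authors / teams section is left empty.
All three sources live: paper_fetch.err is empty; arxiv, hf_daily, and semantic_scholar all returned successfully.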
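The flags above could be derived from the candidate pool with a check along these lines. This is only a sketch under assumptions: the `similar_papers` and `affiliations` field names come from this report, but the function name, the shape of `fetch_errors` (source name mapped to an error message), and the `source_failed:` flag prefix are hypothetical.

```python
def coverage_gaps(candidates, fetch_errors):
    """Return a list of coverage-gap flags for one daily run."""
    gaps = []
    # No candidate carries any similar papers: the S2 similarity graph is unusable.
    if not any(c.get("similar_papers") for c in candidates):
        gaps.append("s2_similar_unavailable")
    # Every candidate has an empty or missing affiliations list.
    if candidates and all(not c.get("affiliations") for c in candidates):
        gaps.append("affiliations_missing_for_all_candidates")
    # Hypothetical flag: one entry per source that reported a fetch error.
    gaps.extend(f"source_failed:{name}" for name in sorted(fetch_errors))
    return gaps

# A run shaped like today's: both fields empty everywhere, no fetch errors.
today = [{"arxiv_id": "2605.05185", "affiliations": []},
         {"arxiv_id": "2605.05166", "affiliations": []}]
print(coverage_gaps(today, fetch_errors={}))
```

Sections that depend on a flagged field can then be left empty explicitly rather than filled with low-confidence content.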

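Sources and cross-validation notes

arXiv (primary): anchors the preprint PDF, category, and submission date; 1 paper in this issue is arXiv-only (2605.05126 ConsisVLA-4D).
HF Daily (curated): provides trending rank and upvotes as a freshness signal; 38/134 candidates have a trending rank.
Semantic Scholar (metadata): provides s2_tldr and cross-source paper IDs; the similar-paper graph is empty for this issue, as recorded in coverage_gaps.
Conflict priority: primary > metadata > curated > other; every Top pick can be independently looked up on at least one of arXiv or HF Daily.
Three-source intersection: 2605.05185 / 2605.05166 / 2605.03269 (source=arxiv+hf+s2) are the three highest-confidence entries.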
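One way to apply the conflict priority listed above is to pick, for each field, the non-empty value from the highest-priority tier. The sketch below assumes each source value arrives tagged with its tier name; the function name, the input shape, and treating unknown tiers as `other` are illustrative choices, not something this report specifies.

```python
# Lower number = higher priority, following primary > metadata > curated > other.
PRIORITY = {"primary": 0, "metadata": 1, "curated": 2, "other": 3}

def resolve_field(values):
    """Pick one value from (tier, value) pairs using the conflict priority.

    Empty values (None, "", []) are skipped; unknown tiers count as "other".
    Returns None when no tier supplies a usable value.
    """
    usable = [
        (PRIORITY.get(tier, PRIORITY["other"]), value)
        for tier, value in values
        if value not in (None, "", [])
    ]
    if not usable:
        return None
    # min() returns the first minimal item, so input order breaks ties within a tier.
    return min(usable, key=lambda pair: pair[0])[1]

# Hypothetical conflicting titles for one paper.
title = resolve_field([
    ("curated", "RLDX-1"),
    ("primary", "RLDX-1 Technical Report"),
    ("metadata", "RLDX-1 Technical Report."),
])
print(title)  # RLDX-1 Technical Report
```

Skipping empty values means a lower-priority source can still fill a field the primary source left blank.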

generated_at UTC: 2026-05-07T15:32:59.730995+00:00

Hanzhi's BLOG

[Papers · 2026-05-07]