[Papers · 2026-05-07]

Paper Radar Daily | 2026-05-07

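One-line takeaway: reasoning and agent were the two strongest threads of the day (15 hits each in the top 30), and the three-source intersection of OpenSearch-VL, First-Token, and RLDX-1 is the strongest signal. Both the S2 similar-paper graph and affiliations are missing, so the Further reading and New authors sections are left empty.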

Summary

  • Candidate pool: 134 papers (after deduplicating across arXiv, HF Daily, and S2), already sorted by ranking_score.
  • Main-thread signals: in the top 30, reasoning 15 hits, agent 15, inference 9, world model 5, VLA 2, MoE 1.
  • Three-source intersection: 2605.05185 OpenSearch-VL, 2605.05166 First-Token, and 2605.03269 RLDX-1 each appear in arXiv, HF Daily, and S2.
  • HF Daily leaders: trending #1 SWE-WebDevBench, #2 First-Token, #3 MiniCPM-o 4.5, #5 When-to-Think, #7 ResRL, #10 OpenSearch-VL (77↑), #13 JoyAI-Image, #15 RLDX-1 (80↑).
  • Degradation flags: s2_similar_unavailable plus empty affiliations for every candidate → the Further reading and New authors sections are explicitly left empty.
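The keyword tallies and the three-source intersection above could be derived roughly as follows. This is only a sketch: the candidate schema is not shown in this report, so the field names `paper_id`, `matched_keywords`, and the "+"-joined `source` string are assumptions for illustration.

```python
from collections import Counter

def keyword_counts(candidates, top_n=30):
    """Count, per keyword, how many of the first top_n candidates hit it.

    Assumes candidates are already sorted by ranking_score and that each
    carries a hypothetical 'matched_keywords' list.
    """
    counts = Counter()
    for cand in candidates[:top_n]:
        # set() so one paper contributes at most one hit per keyword
        counts.update(set(cand.get("matched_keywords", [])))
    return counts

def three_source_intersection(candidates, required=("arxiv", "hf", "s2")):
    """Return paper ids whose hypothetical 'source' string lists every required source."""
    needed = set(required)
    return [
        cand["paper_id"]
        for cand in candidates
        if needed <= set(cand.get("source", "").split("+"))
    ]

if __name__ == "__main__":
    sample = [
        {"paper_id": "A", "source": "arxiv+hf+s2", "matched_keywords": ["reasoning", "agent"]},
        {"paper_id": "B", "source": "arxiv+hf", "matched_keywords": ["reasoning"]},
    ]
    print(keyword_counts(sample))
    print(three_source_intersection(sample))
```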

📌 Top picks (cross-source hits)

Sorted by ranking_score, descending; capped at 8.

  • When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning (HF 1↑; trending #5 · score=7.5) → Parallel reasoning that decides when to disclose output, trading off latency against accuracy.
    • Why: HF trending #5; matches watchlist keywords reasoning/MoE; includes benchmark + SFT.
    • tldr_en: This work introduces Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation and improves accuracy–content-latency Pareto trade-offs under token-level proxies such as inter-update waiting.
    • Links: HF · S2
  • OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents (HF 77↑; trending #10 · score=7.0) → Open-source multimodal search agent that introduces a fault-tolerant GRPO training algorithm.
    • Why: three-source intersection (arXiv + HF + S2); HF trending #10 with 77 upvotes; matches reasoning/agent.
    • tldr_en: This work introduces OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning and proposes a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping.
    • Links: HF · S2
  • ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning (HF 3↑; trending #7 · score=6.8) → Residual RL that improves both reasoning and generation diversity.
    • Why: HF trending #7; matches reasoning/agent; covers 12 math/code/agent benchmarks.
    • tldr_en: Positive sample projection Residual Reinforcement Learning (ResRL) is proposed that decouples similar semantic distributions among positive and negative responses, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling.
    • Links: HF · S2
  • ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation (new arXiv submission · score=6.5) → A 4D spatiotemporally consistent VLA framework for robotic manipulation.
    • Why: matches three keywords, reasoning/inference/VLA; new cs.RO submission that introduces CS-Thinker.
    • tldr_en: This work proposes ConsisVLA-4D, a unified and efficient framework that enhances spatiotemporal consistency in 3D perception and 4D reasoning and introduces CS-Thinker to achieve cross-scene spatiotemporal consistency as actions unfold.
    • Links: HF · S2
  • RLDX-1 Technical Report (HF 80↑; trending #15 · score=6.5) → A unified dexterous-manipulation policy built on a multi-stream action Transformer.
    • Why: HF trending #15 with 80 upvotes; matches inference/VLA; built around the MSAT architecture.
    • tldr_en: This work introduces RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention.
    • Links: HF · S2
  • Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation (HF 7↑; trending #13 · score=6.2) → A spatial-intelligence model that unifies understanding, generation, and editing.
    • Why: HF trending #13; matches reasoning + world model; proposes JoyAI-Image.
    • tldr_en: The bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence.
    • Links: HF · S2
  • SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies (HF 2↑; trending #1 · score=5.9) → End-to-end agent benchmark for vibe-coding platforms.
    • Why: HF trending #1; matches agent; framed as a "virtual software agency" evaluation.
    • tldr_en: SWE-WebDev Bench is released as a community benchmark to enable larger-scale replication to establish generality and help platform builders identify and address four recurring shortcomings in the current generation of AI app builders.
    • Links: HF · S2
  • The First Token Knows: Single-Decode Confidence for Hallucination Detection (trending #2 · score=5.3) → First-token entropy alone rivals self-consistency for hallucination detection.
    • Why: three-source intersection; HF trending #2; matches inference; avoids the N× decoding cost.
    • tldr_en: First-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering.
    • Links: HF · S2
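The selection rule for this section (ranking_score descending, capped at 8) can be sketched as below. Only `ranking_score` is a field named in this report; the `title` key and the helper name `top_picks` are hypothetical, and any cross-source filtering the real pipeline applies before ranking is not modelled here.

```python
def top_picks(candidates, cap=8):
    """Return up to `cap` candidates ordered by ranking_score, highest first.

    sorted() is stable, so candidates with equal scores keep their input order.
    """
    ranked = sorted(candidates, key=lambda cand: cand["ranking_score"], reverse=True)
    return ranked[:cap]

if __name__ == "__main__":
    sample = [
        {"title": "paper-x", "ranking_score": 6.5},
        {"title": "paper-y", "ranking_score": 7.5},
        {"title": "paper-z", "ranking_score": 6.5},
    ]
    for cand in top_picks(sample, cap=2):
        print(cand["title"], cand["ranking_score"])
```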

🏷 Watchlist category hits

Grouped by arXiv primary category, excluding papers already in Top picks; at most 4 per category.
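A minimal sketch of this grouping rule, assuming each candidate exposes hypothetical `paper_id` and `primary_category` fields and that the input is already sorted by ranking_score, so the first 4 kept per category are the highest ranked:

```python
from collections import defaultdict

def watchlist_by_category(candidates, top_pick_ids, per_category_cap=4):
    """Group candidates by primary category, skipping Top picks, capped per category."""
    groups = defaultdict(list)
    for cand in candidates:
        if cand["paper_id"] in top_pick_ids:
            continue  # already shown in Top picks
        bucket = groups[cand["primary_category"]]
        if len(bucket) < per_category_cap:
            bucket.append(cand)
    # Alphabetical category order, matching the listing in this section
    return dict(sorted(groups.items()))

if __name__ == "__main__":
    sample = [
        {"paper_id": "p1", "primary_category": "cs.RO"},
        {"paper_id": "p2", "primary_category": "cs.CL"},
        {"paper_id": "p3", "primary_category": "cs.CL"},
    ]
    grouped = watchlist_by_category(sample, top_pick_ids={"p1"})
    for category, papers in grouped.items():
        print(category, [p["paper_id"] for p in papers])
```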

cond-mat.mtrl-sci

cs.AI

cs.AR

cs.CL

cs.CR

cs.CV

cs.DC

cs.GR

cs.LG

cs.NI

cs.RO

cs.SD

cs.SE

eess.IV

eess.SY

math.CA

math.PR

math.ST

q-bio.NC

stat.ML

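🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal in this section today (S2 returned no similar papers). Per the hard constraint, no external backfill fetch was made; this will be backfilled once S2 is healthy again.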

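🧑‍🔬 Newly seen authors / teams

Today's discovery scan found no candidates that meet the bar. The affiliations field is empty for all 134 candidates in this batch (the HF Daily JSON carries no institutions, and arXiv listing parsing did not recover an institution string), so new-author and institution attribution cannot be done reliably; nothing is added just to pad the section.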

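📉 Coverage gaps and uncertainty

  • s2_similar_unavailable: the candidate JSON carries no similar_papers field; per the hard constraint, external backfill was skipped and the Further reading section is left empty.
  • affiliations_missing_for_all_candidates: 134/134 candidates have affiliations=[]; institution-level attribution is deferred and the New authors / teams section is left empty.
  • All three sources are live: paper_fetch.err is empty, and arxiv / hf_daily / semantic_scholar all returned successfully.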
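The two degradation flags above could be computed from the candidate list along these lines. The flag names and the `similar_papers` / `affiliations` fields come from this report; the function itself is an illustrative sketch, not the pipeline's actual check.

```python
def coverage_gaps(candidates):
    """Return degradation flags implied by missing fields across all candidates."""
    gaps = []
    # No candidate carries a non-empty similar_papers list
    if not any(cand.get("similar_papers") for cand in candidates):
        gaps.append("s2_similar_unavailable")
    # Every candidate has a missing or empty affiliations list
    if candidates and all(not cand.get("affiliations") for cand in candidates):
        gaps.append("affiliations_missing_for_all_candidates")
    return gaps

if __name__ == "__main__":
    batch = [{"affiliations": []}, {"affiliations": []}]
    print(coverage_gaps(batch))
```

A section renderer can then leave Further reading or New authors empty whenever the matching flag is present, instead of fetching data from elsewhere.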

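Sources and cross-validation notes

  • arXiv (primary): anchors the preprint PDF, category, and submission date; 1 paper in this issue is arXiv-only (2605.05126 ConsisVLA-4D).
  • HF Daily (curated): provides trending rank and upvotes as a freshness signal; 38/134 candidates have a trending rank.
  • Semantic Scholar (metadata): provides s2_tldr and cross-source paper IDs; the similar-paper graph is empty in this issue, which is recorded in coverage_gaps.
  • Conflict priority: primary > metadata > curated > other; every Top pick can be retrieved independently on at least one of arXiv or HF Daily.
  • Three-source intersection: 2605.05185 / 2605.05166 / 2605.03269 (source=arxiv+hf+s2); these three carry the highest confidence.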
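The conflict priority above (primary > metadata > curated > other) can be read as a simple ordered lookup. The sketch below assumes the source keys `arxiv`, `semantic_scholar`, and `hf_daily` map to those tiers as described in this section; the helper name `resolve_field` and the per-field dict shape are hypothetical.

```python
TIER_RANK = {"primary": 0, "metadata": 1, "curated": 2, "other": 3}
SOURCE_TIER = {
    "arxiv": "primary",
    "semantic_scholar": "metadata",
    "hf_daily": "curated",
}

def resolve_field(values_by_source):
    """Pick the non-empty value from the highest-priority source.

    values_by_source maps a source name to that source's value for one field.
    Unknown sources fall into the 'other' tier.
    """
    def rank(source):
        return TIER_RANK[SOURCE_TIER.get(source, "other")]

    for source in sorted(values_by_source, key=rank):
        value = values_by_source[source]
        if value not in (None, "", []):
            return value
    return None

if __name__ == "__main__":
    title = resolve_field({
        "hf_daily": "RLDX-1",
        "arxiv": "RLDX-1 Technical Report",
    })
    print(title)
```

Falling through to the next tier when a value is empty keeps a blank primary field from masking usable metadata; whether the real pipeline does this is not stated in the report.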

generated_at UTC: 2026-05-07T15:32:59.730995+00:00