Paper Radar Daily | 2026-05-15

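Bottom line: 134 candidates today. Two threads reinforce each other: long-term memory and reasoning paths for agents and reasoning models, and inference infrastructure (MoE routing, speculative decoding latency, long-video world models). HF Daily pushed STALE (a probe for agent memory staleness) and SANA-WM (a 2.6B minute-scale world model) up the rankings, while on the arXiv side speculative decoding and MoE inference optimization form a stable systems-layer base.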

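Summary

Agent memory work arrived in a cluster: STALE, MemEye, and MemLens approach it from three angles (belief staleness, vision-centric multimodal memory, and a long-term multimodal memory benchmark), suggesting that long-horizon agent memory is becoming the next heavily contested benchmark area.
Reasoning and test-time compute optimization form a dense cluster: OpenDeepThink (parallel Bradley–Terry aggregation), Dual-Dimensional Consistency (adaptive budget-quality trade-off), and Closed-Loop Visual Reasoning (closed-loop verified T2I) all combine reasoning, verification, and budget into a new design pattern.
Inference infrastructure saw a run of related work: BEAM (binary activation masks for MoE), the Interpretable SD Latency Model (speculative decoding in production), and Performance-Driven policy optimization for speculative decoding line up across three layers: method, system, and scheduling.
World models and VLA advanced on the same day: SANA-WM uses hybrid linear attention to fit a minute-scale 720p video world model into 2.6B parameters, and Pace-and-Path Correction gives VLA models training-free compensation for temporal dynamics. The two meet at the seam between cs.CV and robotics.
Self-Evolving Agentic Post-Training (RewardHarness) recasts reward modeling as context evolution rather than weight optimization. It is one of the few method papers that directly targets the data-efficiency bottleneck of reward models.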

📌 Top picks (cross-source hits)

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? (HF 37 upvotes / hf_trending_rank:21 / watchlist:reasoning+agent+inference) → a three-dimensional probe of memory staleness and implicit updating in LLMs
RewardHarness: Self-Evolving Agentic Post-Training (hf_trending_rank:13 / watchlist:reasoning+agent) → a self-evolving reward framework that replaces fine-tuning with context evolution
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer (HF 46 upvotes / hf_trending_rank:28 / watchlist:world model+inference) → a 2.6B hybrid linear attention world model for minute-scale 720p video
BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE (hf_trending_rank:7 / watchlist:moe+inference) → trainable binary masks for token-adaptive routing in MoE
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models (hf_trending_rank:14 / watchlist:vla+inference) → training-free Pace-and-Path correction for temporal blind spots in VLA models
An Interpretable Latency Model for Speculative Decoding in LLM Serving (cs.LG/cs.PF / watchlist:speculative decoding+inference+moe) → an interpretable inference latency model for production SD serving
COTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completion (cs.CL/cs.AI / watchlist:reasoning+agent) → probabilistic chain-of-thought completion for preventive clinical consultation
Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning (hf_trending_rank:17 / watchlist:reasoning+inference) → a closed-loop verified reasoning framework that unlocks complex text-to-image generation

🏷 Watchlist hits by category

cs.CL

Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Window (watchlist:speculative decoding+inference) → policy optimization for speculative decoding with an adaptive window
Improving Multi-turn Dialogue Consistency with Self-Recall Thinking (watchlist:reasoning+inference) → self-recall thinking to improve multi-turn dialogue consistency
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Use (watchlist:agent) → a reproducible evaluation framework for tool use, from text to voice
ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World (watchlist:inference) → inclusive, efficient embeddings for multilingual use
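
For readers less familiar with why the draft window matters in speculative decoding, here is a minimal sketch of the standard analytic model. It is not the method of either speculative decoding paper listed in this issue. It assumes each drafted token is accepted independently with probability `alpha`, and that one draft-model step costs `cost_ratio` times one target-model step; the function names are illustrative only.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each of the gamma drafted tokens is accepted independently
    with probability alpha (a simplifying assumption, not a measured fact).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the target pass.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Expected speedup over plain autoregressive decoding.

    cost_ratio is the assumed cost of one draft step relative to one
    target step; each round costs gamma draft steps plus one target step.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


def best_window(alpha: float, cost_ratio: float, max_gamma: int = 16) -> int:
    """Pick the draft window that maximizes the modeled speedup."""
    return max(range(max_gamma + 1),
               key=lambda g: expected_speedup(alpha, g, cost_ratio))


if __name__ == "__main__":
    for a in (0.3, 0.6, 0.9):
        g = best_window(a, cost_ratio=0.05)
        print(f"alpha={a}: best window={g}, "
              f"speedup={expected_speedup(a, g, 0.05):.2f}")
```

Under this model the best window grows with the acceptance rate, which is the basic reason to adapt the window at runtime rather than fix it.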

cs.LG

FutureSim: Replaying World Events to Evaluate Adaptive Agents (watchlist:reasoning+agent) → replays world events to evaluate adaptive agents
Boosting RL with Verifiable Rewards via Randomly Selected Feedback (hf_trending_rank:1 / nice_to_have:benchmark+sft) → RL with verifiable rewards boosted by randomly selected feedback
Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution (watchlist:quantization) → unlearning that stays permanent under quantization, via circuit attribution
Natural Synthesis: Outperforming Reactive Synthesis Tools with Large Reasoning Models (watchlist:reasoning) → large reasoning models outperform traditional reactive synthesis tools

cs.CV

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both (watchlist:reasoning+agent) → one-word switching between agentic and latent visual reasoning
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory (watchlist:reasoning+agent) → vision-centric evaluation of multimodal agent memory
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO (watchlist:inference+dpo) → real-time autoregressive video extrapolation with consistency-model GRPO
LATERN: Test-Time Context-Aware Explainable Video Anomaly Detection (watchlist:reasoning+inference) → test-time, context-aware explainable video anomaly detection

cs.AI

OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation (watchlist:reasoning+test-time compute) → parallel reasoning driven by Bradley–Terry aggregation
Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference (watchlist:reasoning+inference) → dual-dimensional consistency across budget and quality for adaptive inference
Orchard: An Open-Source Agentic Modeling Framework (watchlist:reasoning+agent) → an open-source agentic modeling framework
APWA: A Distributed Architecture for Parallelizable Agentic Workflows (watchlist:reasoning+agent) → a distributed architecture for parallelizable agentic workflows
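
As background for the Bradley–Terry aggregation named in the OpenDeepThink entry, the sketch below fits a generic Bradley–Terry model to pairwise win counts using the classic minorization-maximization update. It is not taken from the paper: the `wins` matrix layout, the `prior` pseudo-count, and the idea of ranking parallel candidates by the fitted strengths are assumptions made for illustration.

```python
def bradley_terry(wins, prior=0.5, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is how many times candidate i beat candidate j.
    prior adds pseudo-wins to every ordered pair so that strengths stay
    strictly positive (an illustrative smoothing choice).
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    if n == 0:
        return []
    # Smoothed copy of the win matrix; the diagonal stays zero.
    w = [[(wins[i][j] + prior) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(w[i][j] for j in range(n) if j != i)
            denom = sum((w[i][j] + w[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            # MM update: wins divided by expected-comparison weight.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Three hypothetical candidate answers compared pairwise by a judge.
    wins = [
        [0, 3, 4],
        [1, 0, 3],
        [0, 1, 0],
    ]
    strengths = bradley_terry(wins)
    best = max(range(len(strengths)), key=strengths.__getitem__)
    print("strengths:", [round(s, 3) for s in strengths], "best:", best)
```

Compared with majority voting, a pairwise model like this can still rank candidates when not every pair has been compared the same number of times.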

🔗 Further reading (similar papers from Semantic Scholar)

No high-confidence incremental signal in this section today (S2 returned no similar papers).

🧑‍🔬 Newly surfaced authors / teams

Today's discovery scan found no candidates that met the threshold.

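📉 Coverage gaps and uncertainty

s2_similar_unavailable: Semantic Scholar returned no similar_papers field in the candidate JSON, so the further-reading section cannot expand across papers and shows only same-day watchlist hits.
Some HF-listed papers still lack categories and affiliations from the arXiv metadata system (Top picks items 1/2/3/4/5/8), so institution signals are missing for now; they will be backfilled within 24-48h as arXiv indexing completes.
New-author discovery is empty today: missing affiliations in the candidate JSON, compounded by arXiv metadata delay, meant that no tracked_affiliations rule matched. This does not mean there are no new authors.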
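
The gap checks described above could be expressed roughly as follows. This is a hypothetical sketch, not the pipeline's actual code: the field names `similar_papers`, `categories`, and `affiliations` and the slug `s2_similar_unavailable` appear in this report, while the record shape, the `title` key, and the slug `arxiv_metadata_incomplete` are assumptions made for illustration.

```python
def coverage_gaps(candidates):
    """Return a mapping from gap slug to the affected paper titles.

    candidates is assumed to be a list of dicts loaded from the
    candidate JSON; missing keys are treated the same as empty values.
    """
    gaps = {}
    titles = [c.get("title", "") for c in candidates]

    # No candidate carries any similar papers: S2 similarity is unavailable.
    if candidates and not any(c.get("similar_papers") for c in candidates):
        gaps["s2_similar_unavailable"] = titles

    # Papers whose arXiv metadata has not been backfilled yet.
    # "arxiv_metadata_incomplete" is a hypothetical slug name.
    incomplete = [
        c.get("title", "")
        for c in candidates
        if not c.get("categories") or not c.get("affiliations")
    ]
    if incomplete:
        gaps["arxiv_metadata_incomplete"] = incomplete

    return gaps


if __name__ == "__main__":
    sample = [
        {"title": "A", "categories": ["cs.CL"], "affiliations": ["Lab X"]},
        {"title": "B"},
    ]
    print(coverage_gaps(sample))
```

Reporting a gap as a named slug with the affected titles keeps an empty section distinguishable from a section with no real signal.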

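Sources and cross-validation

Source mix for this issue: arXiv (primary, preprints and methods) + HuggingFace Daily Papers (curated, community trending signal) + Semantic Scholar (metadata, citations and tldr enrichment). arXiv provides the ground truth for all Top picks, and HF is used only to weight trending order. S2 backfilled s2_tldr for 1 of 8 Top picks (STALE); s2_url / s2_tldr are missing for the rest, so all method-level conclusions are anchored strictly to arXiv abstracts. No single-source degradation slug (arxiv_unavailable / hf_daily_unavailable / s2_unavailable) was triggered in this issue, but S2's similar-paper graph was not returned and has been recorded in coverage_gaps.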

Hanzhi's BLOG

[Papers · 2026-05-15]