Paper Radar Daily | 2026-05-14

One-line takeaway: the agent / reasoning and inference / serving threads both lit up the HF Daily list today. PersonalAI 2.0, RealICU, and FrameSkip sit near the top of trending; on the inference side, Attention Once Is All You Need uses persistent KV to claim a 5.9× speedup over vLLM-class engines, and a position paper argues that joules/token belongs in inference benchmarks.

Summary

Four of the HF Daily trending top 10 go straight into Top picks (RealICU #5, PersonalAI 2.0 #6, FrameSkip #8, PNAPO #9), and MemReread joins them from #15; the density of agent / reasoning work is unusually high.
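The inference / serving side offers three complementary narratives: Attention Once Is All You Need pushes a streaming-only engine, MinT pushes serving for millions of LoRA adapters, and FlowCompile pushes compile-time optimization of workflows.
The position paper "LLM Inference Should Be Evaluated as Energy-to-Token Production" reframes KV compression, quantization, and routing as energy-efficiency levers and challenges how inference benchmarks are scored.
VLA / embodied output also stays dense (FrameSkip, GTA-VLA, Realtime-VLA FLASH, DAWN), with the focus shifting from architecture to three levers: data frames, inference scheduling, and closed-loop world models.
Semantic Scholar returned no similar papers today, so the further-reading section is downgraded; HF candidates carry no affiliations field, so the new-author scan was skipped.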

📌 Top picks (cross-source hits)

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents

arxiv:2605.13481 · HF #6, 1 upvote, score 7.4 · Hits: hf_trending_rank:6, watchlist_keyword:reasoning,agent, nice_to_have:benchmark,evaluation

TL;DR: A GraphRAG agent driven by knowledge graph traversal; reports SOTA on 6 benchmarks.
Why selected: Triple hit (hf_trending:6 + watchlist:reasoning/agent + benchmark/evaluation); the authors report beating LightRAG, RAPTOR, and HippoRAG2 on 6 RAG benchmarks.
Links: https://arxiv.org/abs/2605.13481 · https://huggingface.co/papers/2605.13481
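
The hit tags on each entry mix two uses of the comma: it separates signal groups and also separates values inside a group. A minimal sketch of how such a string could be split, assuming every group starts with a lowercase `name:` key (the function name `parse_hits` is hypothetical and not part of the digest pipeline):

```python
import re


def parse_hits(hits: str) -> dict[str, list[str]]:
    """Split 'key:v1,v2, key2:v3' into {'key': ['v1', 'v2'], 'key2': ['v3']}."""
    # A new group starts wherever a comma is followed by a 'name:' key.
    groups = re.split(r",\s*(?=[a-z_]+:)", hits.strip())
    parsed: dict[str, list[str]] = {}
    for group in groups:
        key, _, values = group.partition(":")
        parsed[key.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return parsed


if __name__ == "__main__":
    tags = "hf_trending_rank:6, watchlist_keyword:reasoning,agent, nice_to_have:benchmark,evaluation"
    print(parse_hits(tags))
```

Values that contain spaces, such as `kv cache`, stay intact because the lookahead only fires on a key immediately followed by a colon.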

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

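arxiv:2605.13542 · HF #5, 3 upvotes, score 7.0 · Hits: hf_trending_rank:5, watchlist_keyword:reasoning,agent, nice_to_have:benchmark

TL;DR: An ICU reasoning benchmark on MIMIC-IV with hindsight annotation; exposes recall and anchoring biases in LLMs.
Why selected: hf_trending:5 + watchlist:reasoning/agent. It moves clinical decision evaluation from imitating historical actions to hindsight annotation, and on the method side ships ICU-Evo, a structured-memory agent.
Links: https://arxiv.org/abs/2605.13542 · https://huggingface.co/papers/2605.13542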

FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

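arxiv:2605.13757 · HF #8, 19 upvotes, score 6.7 · Hits: hf_trending_rank:8, watchlist_keyword:inference,vla, nice_to_have:benchmark

TL;DR: A frame-selection layer for VLA training; with 20% of the frames, the average success rate across three benchmarks rises from 66.5 to 76.2.
Why selected: hf_trending:8 + watchlist:inference/vla. A pure dataloader change that touches neither the architecture nor the inference pipeline, validated across RoboCasa-GR1, SimplerEnv, and LIBERO.
Links: http://arxiv.org/abs/2605.13757v1 · https://huggingface.co/papers/2605.13757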
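
To make "a dataloader-only change that keeps 20% of the frames" concrete, here is a minimal sketch of top-k frame selection. It assumes each frame already has an informativeness score; how FrameSkip actually scores frames is not stated in this digest, so `scores`, `keep_ratio`, and `select_frames` are illustrative names only.

```python
def select_frames(scores: list[float], keep_ratio: float = 0.2) -> list[int]:
    """Return indices of the highest-scoring frames, in temporal order."""
    if not scores:
        return []
    # Always keep at least one frame so an episode never becomes empty.
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Re-sort the survivors by index so the trajectory order is preserved.
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Hypothetical per-frame scores for a 10-frame episode.
    episode_scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.0, 0.4, 0.5, 0.6, 0.7]
    print(select_frames(episode_scores))
```

Because the filter only decides which indices the dataloader yields, the model and the inference path stay untouched, which matches the property the entry highlights.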

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

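arxiv:2605.13784 · score 6.5 · Hits: watchlist_keyword:inference,kv cache,scheduler, nice_to_have:benchmark

TL;DR: A streaming inference engine with persistent KV; query latency is decoupled from context length, and it is reported to run 5.9× faster than vLLM and similar engines.
Why selected: Triple watchlist hit (inference / kv cache / scheduler). It proposes stateful sessions plus Flash Queries that use a budget to grab idle GPU cycles, paired with a cell-budget multi-tenant scheduler.
Links: http://arxiv.org/abs/2605.13784v1 · https://huggingface.co/papers/2605.13784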
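
The idea of a stateful session with persistent KV can be illustrated with a toy single-head attention cache. This is a sketch under simplifying assumptions, not the paper's engine: `StatefulSession` and its methods are hypothetical names, keys and values are passed in precomputed, and the only point shown is that context is ingested once while later queries reuse the cache.

```python
import math


class StatefulSession:
    """Toy single-head attention session that keeps its KV cache between queries."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []
        self.tokens_encoded = 0  # how many context tokens were ever ingested

    def append(self, keys: list[list[float]], values: list[list[float]]) -> None:
        """Ingest new context once; earlier tokens are never re-encoded."""
        if len(keys) != len(values):
            raise ValueError("keys and values must have the same length")
        self.keys.extend(keys)
        self.values.extend(values)
        self.tokens_encoded += len(keys)

    def query(self, q: list[float]) -> list[float]:
        """Attend over the cached keys and values without re-ingesting context."""
        if not self.keys:
            raise ValueError("session has no cached context")
        scale = math.sqrt(len(q))
        logits = [sum(a * b for a, b in zip(q, k)) / scale for k in self.keys]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[d] for w, v in zip(weights, self.values)) / total
            for d in range(dim)
        ]


if __name__ == "__main__":
    session = StatefulSession()
    session.append([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(session.query([0.0, 0.0]), session.tokens_encoded)
    print(session.query([1.0, 0.0]), session.tokens_encoded)
```

Each query in this toy still scans the whole cache, so it does not reproduce the latency decoupling the paper claims; it only shows why repeated queries need not pay the prefill cost again.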

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

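arxiv:2605.13647 · score 6.5 · Hits: watchlist_keyword:reasoning,agent,inference, nice_to_have:benchmark

TL;DR: Treats the LLM workflow as a compilation target and produces multiple latency/accuracy configurations offline; up to 6.4× speedup.
Why selected: watchlist:reasoning/agent/inference. It shifts the viewpoint from routing to compile-time DSE (design space exploration) and yields a reusable set of configurations rather than a single routing decision.
Links: http://arxiv.org/abs/2605.13647v1 · https://huggingface.co/papers/2605.13647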
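
One common way to turn offline measurements into a reusable set of latency/accuracy configurations is to keep only the Pareto-optimal ones and then pick by budget at deploy time. The sketch below shows that generic technique; it is not claimed to be FlowCompile's algorithm, and the `Config` fields and sample numbers are made up for illustration.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    name: str
    latency_ms: float
    accuracy: float


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep configs that no other config beats on both latency and accuracy."""
    front: list[Config] = []
    best_accuracy = float("-inf")
    # Walk from fastest to slowest; a slower config survives only if it is more accurate.
    for cfg in sorted(configs, key=lambda c: (c.latency_ms, -c.accuracy)):
        if cfg.accuracy > best_accuracy:
            front.append(cfg)
            best_accuracy = cfg.accuracy
    return front


def pick(front: list[Config], latency_budget_ms: float) -> Optional[Config]:
    """Most accurate frontier config within the latency budget, or None."""
    feasible = [c for c in front if c.latency_ms <= latency_budget_ms]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


if __name__ == "__main__":
    # Hypothetical measurements, not taken from the paper.
    measured = [
        Config("a", 100, 0.70),
        Config("b", 200, 0.80),
        Config("c", 250, 0.75),
        Config("d", 400, 0.90),
    ]
    front = pareto_front(measured)
    print([c.name for c in front], pick(front, 300))
```

Computing the frontier once and selecting by budget later is what makes the configuration set reusable, in contrast to a router that decides per request.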

Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

arxiv:2605.11733 · HF #38, 2 upvotes, score 6.5 · Hits: watchlist_keyword:reasoning,quantization,inference, nice_to_have:benchmark

TL;DR: Argues for evaluating inference in joules/token and folds KV compression and quantization into the set of energy-efficiency levers.
Why selected: watchlist:reasoning/quantization/inference. A position-paper-level call to add a PUE-adjusted power dimension to inference benchmarks.
Links: https://arxiv.org/abs/2605.11733 · https://huggingface.co/papers/2605.11733
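
A PUE-adjusted joules/token figure can be computed from quantities most serving setups already log. The sketch below assumes the simplest accounting: average accelerator power times wall-clock time, scaled by PUE (facility energy divided by IT energy), divided by generated tokens. The paper's exact definition is not given in this digest, so the function name and parameters are assumptions.

```python
def joules_per_token(
    avg_power_watts: float,
    duration_s: float,
    tokens: int,
    pue: float = 1.0,
) -> float:
    """Facility-level energy per generated token, in joules."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    if pue < 1.0:
        # PUE is facility energy over IT energy, so it cannot drop below 1.
        raise ValueError("pue must be >= 1.0")
    it_energy_joules = avg_power_watts * duration_s  # watts x seconds = joules
    return it_energy_joules * pue / tokens


if __name__ == "__main__":
    # Hypothetical run: 700 W average draw for 10 s, 3500 tokens, PUE of 1.2.
    print(joules_per_token(700.0, 10.0, 3500, pue=1.2))
```

Under this accounting, anything that cuts time or power per token, such as KV compression or quantization, shows up directly as a lower joules/token value, which is the sense in which the paper treats them as energy-efficiency levers.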

Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs

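arxiv:2605.09433 · HF #9, 6 upvotes, score 6.1 · Hits: hf_trending_rank:9, watchlist_keyword:dpo,preference optimization

TL;DR: DPO for rectified flow (RF) models: keeping prior-noise pairs improves alignment metrics even with less training compute.
Why selected: hf_trending:9 + watchlist:dpo/preference optimization. A prior-noise-aware alignment paradigm designed specifically around the straight-line trajectories of rectified flow.
Links: https://arxiv.org/abs/2605.09433 · https://huggingface.co/papers/2605.09433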

MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

arxiv:2605.10268 · HF #15, 2 upvotes, score 5.5 · Hits: hf_trending_rank:15, watchlist_keyword:reasoning,agent

TL;DR: A long-context agent that reads in a stream and uses RL to trigger rereading; it recovers evidence without retrieval.
Why selected: hf_trending:15 + watchlist:reasoning/agent. It shores up the memory-while-reading paradigm, featuring rereading triggered by question decomposition and linear time.
Links: https://arxiv.org/abs/2605.10268 · https://huggingface.co/papers/2605.10268

🏷 Watchlist hits by category

agent / reasoning (4 papers)

SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation arxiv:2605.10376 - SleepWalk is a three-tier indoor/outdoor VLN benchmark that probes where VLMs fall short at following instructions in 3D space. (watchlist:reasoning/agent; 2,472 3D scenes across three difficulty tiers, exposing grounding failures under occlusion and multi-step instructions.)
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution arxiv:2605.09942 - HAGE models agent memory as a weighted relational graph trained with RL, for steadier long-horizon reasoning. (watchlist:reasoning/agent + RL; upgrades memory retrieval to query-conditioned traversal of a multi-relational graph.)
Harnessing Agentic Evolution arxiv:2605.13821 - AEvo has a meta-agent edit the procedure that drives evolution instead of producing candidates directly. (watchlist:reasoning/agent + benchmark/evaluation; +26% on average against 5 evolution baselines.)
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning arxiv:2605.13037 - MAP maps the environment first and then acts; on ARC-AGI-3 it lifts 22 of 25 game environments from near zero. (hf_trending:29 + watchlist:reasoning/agent; a plug-and-play paradigm modeled on cognitive map theory.)

inference / serving (4 papers)

MinT: Managed Infrastructure for Training and Serving Millions of LLMs arxiv:2605.13779 - MinT (MindLab): infrastructure for training and serving millions of LoRA adapters, validated up to a 1T-parameter MoE. (hf_trending:21 + watchlist:moe/distributed training; MLA/DSA attention path, measured serving speedup of 8.5×.)
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation arxiv:2605.04651 - FAAST: forward-only adaptation with closed-form fast weights, saving 90% of adaptation time. (hf_trending:2 + watchlist:inference; markedly faster than backprop training, with constant-time inference.)
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs arxiv:2605.13778 - Realtime-VLA FLASH: speculative inference for diffusion VLAs, with a 3.04× speedup on LIBERO. (watchlist:inference/vla; draft-verify plus phase-aware fallback, replacing most 58 ms full inference passes with 7.8 ms drafts.)
F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking arxiv:2605.12995 - F-GRPO unifies candidate generation and ranking in a single autoregressive pass, trained jointly with GRPO. (hf_trending:13 + watchlist:inference; a two-stage group-relative advantage addresses credit assignment.)
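
The Realtime-VLA FLASH numbers above (58 ms full inference, 7.8 ms draft, 3.04× speedup) can be related with a back-of-the-envelope model. The sketch assumes a step costs either the draft latency or the full latency and ignores verification and fallback overhead, which the real system may account for differently; both function names are hypothetical.

```python
def expected_latency_ms(
    draft_fraction: float, draft_ms: float = 7.8, full_ms: float = 58.0
) -> float:
    """Mean per-step latency when a fraction of steps is served by the draft."""
    return draft_fraction * draft_ms + (1.0 - draft_fraction) * full_ms


def draft_fraction_for_speedup(
    speedup: float, draft_ms: float = 7.8, full_ms: float = 58.0
) -> float:
    """Fraction of draft-served steps needed to reach a given speedup over full inference."""
    target_ms = full_ms / speedup
    return (full_ms - target_ms) / (full_ms - draft_ms)


if __name__ == "__main__":
    fraction = draft_fraction_for_speedup(3.04)
    print(round(fraction, 3), round(expected_latency_ms(fraction), 2))
```

Under this toy model, a 3.04× speedup corresponds to roughly 78% of steps being served by drafts, which is consistent with the entry's wording that most full passes are replaced.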

vla / embodied (2 papers)

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models arxiv:2605.13632 - GTA-VLA: an interactive VLA CoT that accepts human priors in the form of affordances, boxes, and traces. (watchlist:reasoning/vla + benchmark/embodied; 81.2% SOTA on SimplerEnv WidowX.)
The DAWN of World-Action Interactive Models arxiv:2605.11550 - DAWN: a latent world-action model in which a World Predictor and an Action Denoiser feed each other. (watchlist:inference/world model; planning results on autonomous driving benchmarks are comparable to WAM baselines.)

long-context / ICL (1 paper)

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn arxiv:2605.13511 - Many-Shot CoT-ICL: only reasoning models scale with more shots, and similarity-based retrieval breaks down on reasoning tasks. (hf_trending:22 + watchlist:reasoning/long context; proposes the CDS demonstration ordering method, +5.42pp.)

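🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal for this section today (S2 returned no similar papers). Among the selected papers, only FAAST carries an s2_url, and its similar_papers field is None, so a non-empty further-reading list cannot be built; s2_similar_unavailable has been written to coverage_gaps.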

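🧑‍🔬 Newly seen authors / teams

Today's discovery scan found no qualifying candidates. All 145 candidates fetched from HuggingFace Daily have an empty array in the affiliations field, and the arXiv API attached no institution data either, so all four rule groups in paper_groups_seed.yaml (frontier-labs / oss-ai-labs / robotics-labs / systems-labs) were skipped. The tracked_authors list likewise did not fire: no author name in this batch directly matches the watchlist.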

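📉 Coverage gaps and uncertainty

s2_similar_unavailable: the Semantic Scholar Graph API returned no similar_papers for the Top picks during the fetch stage, so today's further-reading section is empty.
affiliations_absent_in_hf_metadata: the institution field is empty for every candidate in the HF Daily fetch path, so the new author / institution discovery script had no evidence to consume and skipped the scan.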
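
The two gap codes above follow directly from empty fields in the fetched metadata. A minimal sketch of such a check, assuming candidates and picks are plain dicts with the `similar_papers` and `affiliations` fields named in this digest (the function `detect_coverage_gaps` itself is hypothetical, not the pipeline's actual code):

```python
def detect_coverage_gaps(candidates: list[dict], top_picks: list[dict]) -> list[str]:
    """Return gap codes for signals that are missing across the whole batch."""
    gaps: list[str] = []
    # None and [] are both falsy, so either counts as "no similar papers".
    if not any(pick.get("similar_papers") for pick in top_picks):
        gaps.append("s2_similar_unavailable")
    # An empty affiliations array on every candidate leaves nothing to match on.
    if not any(cand.get("affiliations") for cand in candidates):
        gaps.append("affiliations_absent_in_hf_metadata")
    return gaps


if __name__ == "__main__":
    # Hypothetical records shaped like today's batch.
    batch = [{"affiliations": []} for _ in range(3)]
    picks = [{"s2_url": "https://example.org/paper", "similar_papers": None}]
    print(detect_coverage_gaps(batch, picks))
```

Recording the gap as a code rather than silently emitting an empty section keeps the downgrade visible to readers of the digest.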

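Sources and cross-validation notes

arXiv and HuggingFace Daily both returned normally today, yielding 145 candidates; Semantic Scholar returned only 4 metadata records with an s2_url, all with empty similar_papers. The further-reading section is therefore downgraded to empty, as recorded in coverage_gaps.

All Top picks conclusions are anchored to the arXiv abstract (primary source); HF trending rank serves only as a popularity signal, not as evidence of results. tldr_cn is a compressed translation of the abstract and does not cite the S2 tldr (empty for this batch).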

Hanzhi's BLOG

[Papers · 2026-05-14]