Paper Radar Daily | 2026-05-21

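One-sentence takeaway: among today's 142 candidates, two threads each show a concentrated signal of 3 papers: extreme KV cache quantization (OCTOPUS / OScaR / Mix-Quant) and DPO/RLHF preference alignment. KV quantization is the day's strongest methodological cluster.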

Summary

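Among today's 142 candidates, reasoning / agent / inference-engineering papers account for 30+ hits and clearly dominate. Within that group, KV cache quantization appears in 3 papers on the same day (OCTOPUS / OScaR / Mix-Quant), the strongest concentrated methodological signal. Next, DPO/RLHF preference optimization also appears in 3 papers on the same day, including an explicit proof of the conditional equivalence of DPO and RLHF. Semantic Scholar enriched only 6 candidates and returned no similar-paper graph, so further reading is empty today.
Thread 1 (inference engineering / KV cache quantization): OCTOPUS (joint octahedral triplets), OScaR (per-channel extreme compression), and Mix-Quant (FP4 prefill + precise decoding) offer three different design points on the same day and can be read side by side.
Thread 2 (preference alignment theory): 2605.20834 proves the conditional equivalence DPO≡RLHF and spells out the failure modes; 2605.21266 proposes training on informative rollouts for offline DPO in RLVR. Together they tighten the picture from both the theoretical and the engineering side.
Thread 3 (agents / agentic engineering): IndusAgent (an industrial anomaly detection agent) and Mix-Quant bring the prefill-heavy load of agent inference paths to center stage; several agent benchmarks, including CutVerse, DeepWeb-Bench, and Pilot Audit, appear on the same day, but with lower confidence than the Top picks.
Thread 4 (driving VLA robustness): Lost in Fog (sensor perturbations expose reasoning fragility) and DriveMA (rethinking the VLA language interface) appear on the same day; the VLA engineering layer is tightening up.
Low S2 enrichment coverage: only 6 of the 142 candidates received S2 metadata, so today tldr_en is mostly empty and citation_count is null across the board.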
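To make the "per-channel" design point in Thread 1 concrete, here is a minimal sketch of generic symmetric per-channel quantization applied to a toy key matrix. It is an illustration of the general idea only, not the method of OScaR, OCTOPUS, or Mix-Quant; the function names, the 4-bit setting, and the toy values are all assumptions made for this example.

```python
def quantize_per_channel(rows, bits=4):
    """Symmetric quantization with one scale per channel (column).

    rows: list of token vectors, each a list of floats (tokens x channels).
    Returns integer codes and the per-channel scales.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit symmetric codes
    n_channels = len(rows[0])
    scales = []
    for c in range(n_channels):
        peak = max(abs(row[c]) for row in rows)
        # Guard against an all-zero channel to avoid division by zero.
        scales.append(peak / qmax if peak > 0 else 1.0)
    codes = [
        [max(-qmax, min(qmax, round(row[c] / scales[c]))) for c in range(n_channels)]
        for row in rows
    ]
    return codes, scales


def dequantize(codes, scales):
    """Map integer codes back to floats using the per-channel scales."""
    return [[q * scales[c] for c, q in enumerate(row)] for row in codes]


# Toy "key cache": channel 1 has a much larger range than channel 0.
keys = [[0.10, -8.0], [0.05, 4.0], [-0.20, 2.0]]
codes, scales = quantize_per_channel(keys, bits=4)
restored = dequantize(codes, scales)
```

Because each channel gets its own scale, the small-range channel is not crushed by the outlier channel, which is the usual motivation for per-channel schemes on KV caches.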

📌 Top picks (cross-source hits)

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools (HF rank 15 · 33 upvotes)
- Authors: Rongbin Tan, Fangfang Lin, Zhenlong Yuan, Min Qiu, et al.
- Quick read: Tool-augmented agent improves open-vocabulary industrial anomaly detection
- Why selected: On the HF daily trending list (rank 15, 33 upvotes) and matches three watchlist keywords: reasoning / agent / inference.
- Link: HF
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs (HF rank 10 · 21 upvotes)
- Authors: Haiquan Lu, Zigeng Chen, Gongfan Fang, Xinyin Ma, et al.
- Quick read: Phase-aware quantization that speeds up prefill for agentic LLMs
- Why selected: HF rank 10 with 21 upvotes; the agent + quantization + inference keywords all hit, and quantized inference is a core watchlist topic.
- Link: HF
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization (HF rank 12 · 3 upvotes)
- Authors: Mark Boss, Vikram Voleti, Simon Donné, Shimon Vainer
- Quick read: Joint rotation of triplets for extreme KV cache quantization
- Why selected: HF rank 12; KV cache quantization, forming a trio with today's OScaR and Mix-Quant in the same direction.
- Link: HF
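Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment (HF rank 19 · 3 upvotes)
- Authors: Zhiqin Yang, Yonggang Zhang, Wei Xue, Dong Fang, et al.
- Quick read: DPO and RLHF are equivalent only conditionally, under an implicit assumption
- Why selected: HF rank 19; the DPO/RLHF equivalence is a foundational question in preference optimization, and the theoretical result states the failure conditions explicitly.
- Link: HF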
How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR
- Authors: Richa Verma, Balaraman Ravindran
- Quick read: Informative rollouts for offline preference optimization in RLVR
- Why selected: Watchlist hits on reasoning + dpo + preference optimization; an offline cost-reduction route for RLVR.
- Link: HF
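OScaR: The Occam’s Razor for Extreme KV Cache Quantization in LLMs and Beyond (HF rank 25 · 35 upvotes)
- Authors: Zunhai Su, Rui Yang, Chao Zhang, Yaxiu Liu, et al.
- Quick read: An Occam's razor approach to extreme KV cache quantization
- Why selected: HF rank 25 with 35 upvotes; same topic as OCTOPUS but via a per-channel route, good for comparative reading.
- Link: HF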
Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection (HF rank 16 · 2 upvotes · arXiv.org)
- Authors: Guanglong Sun, Siyuan Zhang, Liyuan Wang, Jun Zhu, et al.
- Quick read: Orthogonal gradient projection mitigates the safety alignment tax
- Why selected: HF rank 16 and the only Top pick covered by both HF and S2; S2 already provides a tldr, and the topic sits close to the alignment tax.
- S2 tldr: OGPSA is a lightweight update rule that estimates a low-rank reference subspace from gradients on a small set of general-capability data and removes from each safety gradient the component lying in this subspace and is the steepest local safety-descent direction subject to first-order preservation constraints on the reference objectives.
- Link: HF · S2
Generative Recursive Reasoning (HF rank 34 · 17 upvotes · tracked author: yoshua bengio)
- Authors: Junyeob Baek, Mingyu Jo, Minsu Kim, Mengye Ren, et al.
- Quick read: Generative recursive reasoning models latent states as probabilistic trajectories
- Why selected: Watchlist hits on reasoning + inference; Yoshua Bengio as co-author triggers the tracked_author bonus.
- Link: HF
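
The S2 tldr for the Safety Alignment paper above describes removing from each safety gradient the component that lies in a reference subspace. The following is a rough sketch of that projection step only, under simplifying assumptions: the subspace is taken as the span of a few reference gradients and orthonormalized with Gram-Schmidt, whereas the paper's low-rank estimator may differ. All names and toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(gradient, basis):
    """Remove from `gradient` its component inside span(basis)."""
    g = list(gradient)
    for b in basis:
        coeff = dot(g, b)
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g


# Assumed toy data: gradients from general-capability (reference) batches.
reference_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
safety_grad = [0.5, -2.0, 3.0]

basis = orthonormal_basis(reference_grads)
projected = project_out(safety_grad, basis)
```

After projection, the update is orthogonal to every reference gradient, so to first order it does not change the reference objectives; that is the intuition behind the "first-order preservation constraints" wording in the tldr.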

🏷 Watchlist category hits

Quantization / KV cache

2605.21427 PALS: Power-Aware LLM Serving for Mixture-of-Experts Models — watchlist_keyword:moe,inference
2605.21264 FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs — watchlist_keyword:moe, nice_to_have:fine-tuning

Agents / Agent benchmarks

2605.19484 CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing — hf_trending_rank:9, watchlist_keyword:agent, nice_to_have:benchmark,evaluation
2605.21482 DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation — watchlist_keyword:reasoning,agent, nice_to_have:benchmark,evaluation
2605.21404 What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit and an Open Scoring Schema — watchlist_keyword:agent,inference, nice_to_have:benchmark,evaluation
2605.14747 Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining — hf_trending_rank:7, watchlist_keyword:agent, nice_to_have:benchmark

Driving and VLA

2605.21446 Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs — watchlist_keyword:reasoning,inference,vla
2605.21273 DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions — watchlist_keyword:reasoning,inference,vla
2605.21414 PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction — watchlist_keyword:vla, nice_to_have:benchmark

🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal in this section today (S2 returned no similar papers).

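🧑‍🔬 Newly surfaced authors / teams

Today's discovery scan found no candidates that meet the bar (the HF candidate JSON carries no affiliations, and the only tracked author hit is Yoshua Bengio, already counted under Top pick #8).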

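📉 Coverage gaps and uncertainty

s2_similar_unavailable: the candidate JSON did not prefetch the similar_papers field, so no further reading is written for this issue.
s2_enrichment_partial: only 6 of the 142 candidates were enriched by S2 (hf+s2); most Top picks lack tldr_en / citation data.
hf_affiliation_missing: the HF Daily JSON carries no affiliations, so institution detection for new authors falls back to empty.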

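Sources and cross-validation notes

Three sources are mixed: 94 new arXiv preprints (primary), 36 HF Daily papers (curated trending), 6 HF+S2 cross hits (metadata enrichment), and 6 arXiv+HF overlaps. Conclusions are anchored to the arXiv preprints throughout; HF trending serves only as a curation signal, not as evidence for a paper's results.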
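
As a quick sanity check on the counts above, the sketch below adds up the four source buckets and computes the S2 enrichment share. It assumes the four buckets are disjoint, which is consistent with the stated total of 142; the dictionary keys are hypothetical labels, not fields from the actual candidate JSON.

```python
# Source buckets as reported in the paragraph above (assumed disjoint).
source_counts = {
    "arxiv_only": 94,  # new arXiv preprints (primary)
    "hf_daily": 36,    # HF Daily curated trending
    "hf_s2": 6,        # HF + S2 cross hits (metadata enrichment)
    "arxiv_hf": 6,     # arXiv + HF overlaps
}

total = sum(source_counts.values())
s2_coverage_pct = 100 * source_counts["hf_s2"] / total

print(f"total candidates: {total}")
print(f"S2 enrichment coverage: {s2_coverage_pct:.1f}%")
```

The buckets sum to the 142 candidates quoted throughout, and the 6 enriched papers correspond to roughly 4% coverage, which is why tldr_en and citation data are missing for most entries.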

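The arxiv_url of every Top pick comes from the original field in the crawled candidate JSON, with no secondary network access. Each tldr_cn is a one-sentence condensation written by the agent from the abstract (or from s2_tldr, which applies only to 2602.07892), without independent translation or invented conclusions.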

Hanzhi's BLOG

[Papers · 2026-05-21]