Paper Radar Daily | 2026-05-23

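One-line takeaway: today's papers concentrate on two themes, agents and reasoning. The top of the HF trending list is dominated by agentic planning and agent benchmarks, and the strongest signal is the self-regulated simulative planning paper from Eric P. Xing's team (highest composite score, 8.3).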

Summary

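This issue pulls 119 candidates from three sources, arXiv, HuggingFace Daily, and Semantic Scholar (sorted by ranking_score, descending); all three sources returned normally. Today's main line is tightly focused on agents and reasoning: 6 of the 8 Top picks match the agent or reasoning keywords, and the top 7 slots of the HF trending list are almost entirely taken by agentic planning, agent benchmarks, and streaming diffusion generation. The strongest single paper is "Efficient Agentic Reasoning Through Self-Regulated Simulative Planning" from Eric P. Xing's team (highest composite score, 8.3; HF trending #7); the systems-side highlight is KVServe (KV cache communication compression for disaggregated serving). S2 enrichment is sparse: only a handful of the 119 candidates carry an s2_tldr, and no similar-paper graph was returned, so Further reading is empty today. The candidate JSON carries no affiliations field, so institution and author attributions are inferred from public background (cross_checked=false).
Candidate pool: 119 papers (arXiv + HF Daily + S2, all three sources returned); Top picks are the first 8 by ranking_score, descending.
Topic distribution: agent and reasoning keywords have the densest hits; systems work (KV cache / sandboxing / scheduling) is scattered but has highlights.
Gaps: the S2 similar-paper graph is entirely empty → Further reading is empty; candidates have no affiliations field → institution/author attributions are inferred.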
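The selection rule above (sort by ranking_score, keep the first 8, then count keyword matches) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `watchlist_keywords` field name and both helper functions are assumptions, and the last entry in the sample data is a hypothetical placeholder added only to show truncation. Scores and keywords for the other eight entries are copied from the Top picks list in this issue.

```python
from typing import Dict, List


def select_top_picks(candidates: List[Dict], k: int = 8) -> List[Dict]:
    """Return the k candidates with the highest ranking_score (stable on ties)."""
    ranked = sorted(candidates, key=lambda c: c.get("ranking_score", 0.0), reverse=True)
    return ranked[:k]


def count_keyword_hits(picks: List[Dict], keywords: List[str]) -> int:
    """Count picks that match at least one of the given watchlist keywords."""
    wanted = set(keywords)
    return sum(1 for p in picks if wanted & set(p.get("watchlist_keywords", [])))


# "watchlist_keywords" is an assumed field name; values mirror this issue's list.
candidates = [
    {"title": "KVServe", "ranking_score": 4.0, "watchlist_keywords": ["inference", "kv cache"]},
    {"title": "Rule2DRC", "ranking_score": 5.6, "watchlist_keywords": ["agent"]},
    {"title": "Efficient Agentic Reasoning Through Self-Regulated Simulative Planning",
     "ranking_score": 8.3, "watchlist_keywords": ["reasoning", "agent", "world model"]},
    {"title": "Spreadsheet-RL", "ranking_score": 3.5, "watchlist_keywords": ["agent"]},
    {"title": "AutoRubric-T2I", "ranking_score": 5.8, "watchlist_keywords": ["reasoning"]},
    {"title": "HarnessAPI", "ranking_score": 4.0, "watchlist_keywords": ["agent", "dpo"]},
    {"title": "Forecasting Scientific Progress with Artificial Intelligence",
     "ranking_score": 5.5, "watchlist_keywords": ["reasoning"]},
    {"title": "Live Music Diffusion Models", "ranking_score": 5.2,
     "watchlist_keywords": ["inference"]},
    # Hypothetical low-scoring placeholder, not a real candidate.
    {"title": "placeholder", "ranking_score": 1.0, "watchlist_keywords": []},
]

top_picks = select_top_picks(candidates)
agent_or_reasoning = count_keyword_hits(top_picks, ["agent", "reasoning"])
print(top_picks[0]["title"], agent_or_reasoning)
```

With the sample data, the count of agent/reasoning matches among the eight picks agrees with the "6 of 8" figure quoted in the summary.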

📌 Top picks (cross-source hits)

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning (HF 4↑, trending #7, hf_trending_rank:7/watchlist_keyword:reasoning,agent,world model; score 8.3) → Uses self-regulated simulative planning so agents learn when and how to plan
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment (HF 4↑, trending #2, hf_trending_rank:2/watchlist_keyword:reasoning/nice_to_have:benchmark,evaluation; score 5.8) → Rule-based reward model that makes text-to-image alignment more robust
Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation (HF 4↑, trending #4, hf_trending_rank:4/watchlist_keyword:agent/nice_to_have:benchmark,evaluation; score 5.6) → Large-scale benchmark for LLM agents that synthesize DRC scripts
- S2 tldr: Rule2DRC is introduced, a large-scale benchmark for DRC script coding agents with 1,000 rule-to-script tasks and 13,921 evaluation chip layouts for execution-based scoring and SplitTester, a tester agent for program selection that uses execution feedback to generate discriminative test cases and separate previously indistinguishable candidate scripts, substantially improving Best-of-N selection performance.
Forecasting Scientific Progress with Artificial Intelligence (HF 33↑, trending #5, hf_trending_rank:5/watchlist_keyword:reasoning/nice_to_have:benchmark,evaluation; score 5.5) → Time-indexed evaluation framework testing whether AI can forecast scientific progress
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators (HF 2↑, trending #3, hf_trending_rank:3/watchlist_keyword:inference/nice_to_have:fine-tuning; score 5.2) → Efficient fine-tuning and post-training for interactive, streaming diffusion music generation
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools (watchlist_keyword:agent,dpo; score 4.0) → Skill-first framework that unifies registration of HTTP endpoints and MCP tools
KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving (HF 10↑, trending #37, watchlist_keyword:inference,kv cache; score 4.0) → Adaptive KV cache communication compression for disaggregated inference serving
- S2 tldr: KVServe is the first service-aware and adaptive KV communication compression framework for disaggregated LLM serving and unifies KV compression into a modular strategy space with new components and cross-method recomposition.
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning (HF 32↑, trending #45, watchlist_keyword:agent/nice_to_have:benchmark,fine-tuning,evaluation; score 3.5) → Uses reinforcement learning to improve LLM agents on realistic spreadsheet tasks

🏷 Watchlist hits by category

Lists only papers freshly fetched in this run that match a watchlist keyword but did not make Top picks (at most 4 per category).
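The filter stated above can be sketched roughly as below. It is an illustration under assumptions, not the pipeline's code: the field names `id`, `primary_category`, `fresh`, and `watchlist_keywords` and the function name are hypothetical, and the sample data is synthetic. The sketch implements only the stated rule (keyword match required); entries admitted on other signals, such as a trending rank alone, would need an extra condition.

```python
from collections import defaultdict
from typing import Dict, List, Set


def watchlist_sections(candidates: List[Dict], top_pick_ids: Set[str],
                       per_category: int = 4) -> Dict[str, List[Dict]]:
    """Group fresh, keyword-matching, non-Top-pick papers by category, capped per category."""
    sections = defaultdict(list)
    for paper in candidates:
        if paper["id"] in top_pick_ids:
            continue  # already shown under Top picks
        if not paper.get("fresh", False):
            continue  # only papers fetched in this run
        if not paper.get("watchlist_keywords"):
            continue  # must match at least one watchlist keyword
        bucket = sections[paper["primary_category"]]
        if len(bucket) < per_category:
            bucket.append(paper)  # input order is kept, so pre-sort if rank matters
    return dict(sections)


# Synthetic data: six cs.AI papers (one is a Top pick), plus three edge cases.
candidates = [
    {"id": f"p{i}", "primary_category": "cs.AI", "fresh": True,
     "watchlist_keywords": ["agent"]}
    for i in range(6)
] + [
    {"id": "q1", "primary_category": "cs.RO", "fresh": True, "watchlist_keywords": ["agent"]},
    {"id": "q2", "primary_category": "cs.CL", "fresh": False, "watchlist_keywords": ["reasoning"]},
    {"id": "q3", "primary_category": "cs.CV", "fresh": True, "watchlist_keywords": []},
]

sections = watchlist_sections(candidates, top_pick_ids={"p0"})
for category, papers in sections.items():
    print(category, [p["id"] for p in papers])
```

Capping inside the loop keeps the first matches in input order, so the candidates would need to be sorted by ranking_score beforehand if the cap should keep the highest-ranked papers.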

cs.AI

Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models (watchlist_keyword:agent/nice_to_have:benchmark,evaluation) → Combines LLM-based and acoustic models for multimodal emotion analysis of political speech
WorkstreamBench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance (watchlist_keyword:agent/nice_to_have:benchmark,evaluation) → Evaluates LLM agents on end-to-end financial spreadsheet tasks
AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters (watchlist_keyword:agent/nice_to_have:benchmark,evaluation) → Agentic evaluation of humans and LLMs as text-to-image prompters
Deep Reinforcement Learning for Flexible Job Shop Scheduling with Random Job Arrivals (watchlist_keyword:agent/nice_to_have:benchmark) → Deep reinforcement learning for flexible job shop scheduling with random job arrivals

cs.CL

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety (watchlist_keyword:agent/nice_to_have:benchmark,evaluation) → Benchmark of gradual, multi-turn attacks on agent safety
Evaluating Commercial AI Chatbots as News Intermediaries (watchlist_keyword:reasoning/nice_to_have:evaluation) → Evaluates how commercial AI chatbots perform as news intermediaries
Tokenization with Split Trees (watchlist_keyword:inference) → Redesigns tokenization with split trees to improve inference efficiency
Self-Policy Distillation via Capability-Selective Subspace Projection (watchlist_keyword:reasoning) → Self-policy distillation via capability-selective subspace projection

cs.CV

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning (watchlist_keyword:reasoning/nice_to_have:benchmark,evaluation) → Motion-controllable video generation with visual reasoning
Cambrian-P: Pose-Grounded Video Understanding (watchlist_keyword:reasoning/nice_to_have:benchmark) → Pose-grounded video understanding model
SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation (watchlist_keyword:reasoning/nice_to_have:benchmark) → Interpretable alignment for reasoning segmentation using sparse autoencoders
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild (hf_trending_rank:9) → Geometry-aware, setup-agnostic modeling of human motion in the wild

cs.LG

MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data (watchlist_keyword:inference/nice_to_have:benchmark,evaluation) → Bidirectional Mamba with explicit missing-data modeling for assessing cognitive load from eye-gaze data
The Distillation Game: Adaptive Attacks & Efficient Defenses (watchlist_keyword:reasoning/nice_to_have:evaluation) → Adaptive attacks and efficient defenses for distillation
Vector Policy Optimization: Training for Diversity Improves Test-Time Search (watchlist_keyword:inference) → Trains for diversity to improve test-time search
Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration (watchlist_keyword:agent) → Episodic memory and persistent worlds in support of 3D exploration

cs.RO

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning (watchlist_keyword:agent) → Multi-agent reinforcement learning for superhuman, safe racing

cs.OS

DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback (watchlist_keyword:agent/nice_to_have:benchmark,evaluation) → Millisecond-level sandbox checkpoint/rollback for scaling stateful AI agents

cs.SE

Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents (watchlist_keyword:agent/nice_to_have:evaluation) → Contract-style skill governance framework for enterprise AI agents

🔗 Further reading (Semantic Scholar similar papers)

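No high-confidence incremental signal in this section today (S2 returned no similar papers). The candidate JSON did not prefetch the similar_papers field, and the skill constraints rule out separate external lookups, so extended_reading=[].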

🧑‍🔬 Newly surfaced authors / teams

Eric P. Xing (Carnegie Mellon University / MBZUAI (inferred), group: systems-labs, cross_checked=false)
- Senior author of today's highest-scoring Top pick, "Efficient Agentic Reasoning…"; appears 2 times in the candidate pool; not in tracked_authors
- Evidence: https://arxiv.org/abs/2605.22138 , https://huggingface.co/papers/2605.22138
James Zou (Stanford University (inferred), group: systems-labs, cross_checked=false)
- Author of the Top pick "Forecasting Scientific Progress with Artificial Intelligence"; appears 2 times in the candidate pool; Stanford background, but not in tracked_authors
- Evidence: http://arxiv.org/abs/2605.22681v1 , https://huggingface.co/papers/2605.22681

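All affiliations are inferred from public background (the candidate JSON has no affiliations field) and have not been cross-checked. Discovered authors are not persisted automatically; they are added to tracked_authors only after a manual review at the weekend.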

📉 Coverage gaps and uncertainty

s2_similar_unavailable
affiliations_unavailable
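s2_tldr_sparse: only a few of the 119 candidates (e.g. Rule2DRC / KVServe) carry an S2 tldr; tldr_en is left empty for most
affiliations_missing: the candidate JSON has no affiliations field, so institution/author attributions are inferred and not cross-checked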
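One way flags like these could be derived from the candidate list is sketched below. This is an assumption-laden illustration: the function name and the 10% sparsity threshold are hypothetical, and the sample data is synthetic, built only to mirror the counts mentioned in this issue (119 candidates, two with an S2 tldr). The field names `similar_papers`, `affiliations`, and `s2_tldr` follow the names used in this report.

```python
from typing import Dict, List


def coverage_gaps(candidates: List[Dict], sparse_threshold: float = 0.1) -> List[str]:
    """Return gap flags based on which enrichment fields are missing or rare."""
    gaps = []
    if not any(c.get("similar_papers") for c in candidates):
        gaps.append("s2_similar_unavailable")
    if not any(c.get("affiliations") for c in candidates):
        gaps.append("affiliations_unavailable")
    total = len(candidates)
    with_tldr = sum(1 for c in candidates if c.get("s2_tldr"))
    # sparse_threshold is a hypothetical cutoff, not a value from the pipeline.
    if total and with_tldr / total < sparse_threshold:
        gaps.append("s2_tldr_sparse")
    return gaps


# Synthetic pool: 119 candidates, only the first two carry an s2_tldr.
sample = [{"s2_tldr": "tldr text" if i < 2 else None} for i in range(119)]
gaps = coverage_gaps(sample)
print(gaps)
```

Deriving the flags from the data rather than writing them by hand keeps the gap list consistent with what the sources actually returned on a given day.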

Sources and cross-validation notes

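The anchor source is arXiv (primary). HF Daily (curated) supplies the trending signal and upvotes and dominates today's ranking; Semantic Scholar (metadata) contributes only an occasional tldr and no similar-paper graph. Conclusions are anchored on the arXiv preprints themselves, and HF popularity is used only as a ranking weight.
No daily report has been saved in the past 7 days (recent_daily is empty), so there are no deduplication conflicts with history; seen_before is false for all 8 Top picks. tldr_cn is condensed and translated from the abstract/s2_tldr; tldr_en is copied only when S2 returns one and is not translated independently.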

Hanzhi's BLOG

[Papers · 2026-05-23]