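Paper Radar Daily | 2026-05-23
One-line takeaway: today's paper-level storyline is tightly concentrated on the twin themes of agents and reasoning. The top of the HF trending list is dominated by agentic planning and agent benchmarks, and the strongest signal is the self-regulated simulative planning paper from Eric P. Xing's group (composite score 8.3, ranked first).
Summary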
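- This issue pulls 119 candidates from three sources, arXiv + HuggingFace Daily + Semantic Scholar (sorted by ranking_score descending); all three sources returned normally. Today's storyline is tightly concentrated on agents and reasoning: 6 of the 8 Top picks hit the agent/reasoning keywords, and the top 7 of the HF trending list are almost entirely taken up by agentic planning, agent benchmarks, and streaming diffusion generation. The strongest single paper is "Efficient Agentic Reasoning Through Self-Regulated Simulative Planning" from Eric P. Xing's group (composite score 8.3, ranked first; HF trending #7); the systems highlight is KVServe (KV cache communication compression for disaggregated serving). S2 enrichment is sparse: only a handful of the 119 candidates carry an s2_tldr, and no similar-paper graph was returned, so extended reading is empty today. The candidate JSON carries no affiliations field, so institution and author attribution is inferred from public background (cross_checked=false).
- Candidate pool: 119 papers (arXiv + HF Daily + S2, all three sources returned); Top picks are the first 8 by ranking_score descending.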
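- Topic distribution: agent and reasoning keywords have the densest hits; systems work (KV cache / sandboxing / scheduling) is scattered but has highlights.
- Gaps: the S2 similar-paper graph is entirely empty → extended reading is empty; candidates carry no affiliations field → institution/author attribution is inferred.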
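The selection rule described above (sort by ranking_score descending, keep the first 8, then count agent/reasoning keyword hits) can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the scores and keywords are copied from the Top picks list, while the `watchlist_keywords` field name, the function names, and the extra low-scoring record are assumptions made for the example.

```python
from typing import Any

# Scores and keywords copied from the Top picks list in this report.
# The "watchlist_keywords" field name and the low-scoring record are hypothetical.
CANDIDATES: list[dict[str, Any]] = [
    {"title": "AutoRubric-T2I", "ranking_score": 5.8, "watchlist_keywords": ["reasoning"]},
    {"title": "Hypothetical low-scoring paper", "ranking_score": 1.0, "watchlist_keywords": []},
    {"title": "Efficient Agentic Reasoning", "ranking_score": 8.3,
     "watchlist_keywords": ["reasoning", "agent", "world model"]},
    {"title": "Rule2DRC", "ranking_score": 5.6, "watchlist_keywords": ["agent"]},
    {"title": "Forecasting Scientific Progress", "ranking_score": 5.5,
     "watchlist_keywords": ["reasoning"]},
    {"title": "Live Music Diffusion Models", "ranking_score": 5.2,
     "watchlist_keywords": ["inference"]},
    {"title": "HarnessAPI", "ranking_score": 4.0, "watchlist_keywords": ["agent", "dpo"]},
    {"title": "KVServe", "ranking_score": 4.0, "watchlist_keywords": ["inference", "kv cache"]},
    {"title": "Spreadsheet-RL", "ranking_score": 3.5, "watchlist_keywords": ["agent"]},
]


def select_top_picks(candidates: list[dict[str, Any]], k: int = 8) -> list[dict[str, Any]]:
    """Return the k candidates with the highest ranking_score, highest first."""
    ranked = sorted(candidates, key=lambda c: c.get("ranking_score", 0.0), reverse=True)
    return ranked[:k]


def count_keyword_hits(picks: list[dict[str, Any]], keywords: set[str]) -> int:
    """Count picks whose watchlist keywords overlap the given keyword set."""
    return sum(1 for p in picks if keywords & set(p.get("watchlist_keywords", [])))


if __name__ == "__main__":
    picks = select_top_picks(CANDIDATES)
    hits = count_keyword_hits(picks, {"agent", "reasoning"})
    print(f"{hits} of {len(picks)} Top picks hit agent/reasoning")
```

Python's `sorted` is stable, so the two papers tied at 4.0 keep their input order; a real pipeline would presumably need an explicit tie-break.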
📌 Top picks (cross-source hits)
- Efficient Agentic Reasoning Through Self-Regulated Simulative Planning (HF 4↑, trending #7, hf_trending_rank:7 / watchlist_keyword:reasoning,agent,world model; score 8.3) → Uses self-regulated simulative planning so that agents learn when and how to plan
- AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment (HF 4↑, trending #2, hf_trending_rank:2 / watchlist_keyword:reasoning / nice_to_have:benchmark,evaluation; score 5.8) → Rule-based reward model that makes text-to-image alignment more robust
- Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation (HF 4↑, trending #4, hf_trending_rank:4 / watchlist_keyword:agent / nice_to_have:benchmark,evaluation; score 5.6) → Large-scale benchmark of LLM agents for DRC script synthesis
- S2 tldr: Rule2DRC is introduced, a large-scale benchmark for DRC script coding agents with 1,000 rule-to-script tasks and 13,921 evaluation chip layouts for execution-based scoring and SplitTester, a tester agent for program selection that uses execution feedback to generate discriminative test cases and separate previously indistinguishable candidate scripts, substantially improving Best-of-N selection performance.
- Forecasting Scientific Progress with Artificial Intelligence (HF 33↑, trending #5, hf_trending_rank:5 / watchlist_keyword:reasoning / nice_to_have:benchmark,evaluation; score 5.5) → Temporally structured evaluation framework testing whether AI can forecast scientific progress
- Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators (HF 2↑, trending #3, hf_trending_rank:3 / watchlist_keyword:inference / nice_to_have:fine-tuning; score 5.2) → Efficient fine-tuning and post-training for interactive, streaming diffusion music generation
- HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools (watchlist_keyword:agent,dpo; score 4.0) → Skill-first framework that unifies HTTP endpoints and MCP tool registration
- KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving (HF 10↑, trending #37, watchlist_keyword:inference,kv cache; score 4.0) → Adaptive KV cache communication compression for disaggregated inference serving
- S2 tldr: KVServe is the first service-aware and adaptive KV communication compression framework for disaggregated LLM serving and unifies KV compression into a modular strategy space with new components and cross-method recomposition.
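- Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning (HF 32↑, trending #45, watchlist_keyword:agent / nice_to_have:benchmark,fine-tuning,evaluation; score 3.5) → Uses reinforcement learning to improve LLM agents on realistic spreadsheet tasks
🏷 Watchlist hits by category
Lists only papers freshly fetched in this run that hit a watchlist keyword but did not make Top picks (at most 4 per category).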
cs.AI
- Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models (watchlist_keyword:agent / nice_to_have:benchmark,evaluation) → Multimodal emotion analysis of political speech combining LLM-based and acoustic models
- WorkstreamBench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance (watchlist_keyword:agent / nice_to_have:benchmark,evaluation) → Evaluates LLM agents on end-to-end financial spreadsheet tasks
- AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters (watchlist_keyword:agent / nice_to_have:benchmark,evaluation) → Agentic evaluation of humans and LLMs as text-to-image prompters
- Deep Reinforcement Learning for Flexible Job Shop Scheduling with Random Job Arrivals (watchlist_keyword:agent / nice_to_have:benchmark) → Deep reinforcement learning for flexible job shop scheduling with random arrivals
cs.CL
- Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety (watchlist_keyword:agent / nice_to_have:benchmark,evaluation) → Benchmark of gradually escalating attacks for multi-turn agentic safety
- Evaluating Commercial AI Chatbots as News Intermediaries (watchlist_keyword:reasoning / nice_to_have:evaluation) → Evaluates how commercial AI chatbots perform as news intermediaries
- Tokenization with Split Trees (watchlist_keyword:inference) → Redesigns tokenization with split trees to improve inference efficiency
- Self-Policy Distillation via Capability-Selective Subspace Projection (watchlist_keyword:reasoning) → Self-policy distillation via capability-selective subspace projection
cs.CV
- MotiMotion: Motion-Controlled Video Generation with Visual Reasoning (watchlist_keyword:reasoning / nice_to_have:benchmark,evaluation) → Motion-controlled video generation with visual reasoning
- Cambrian-P: Pose-Grounded Video Understanding (watchlist_keyword:reasoning / nice_to_have:benchmark) → Pose-grounded video understanding model
- SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation (watchlist_keyword:reasoning / nice_to_have:benchmark) → Interpretable alignment for reasoning segmentation using sparse autoencoders
- AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild (hf_trending_rank:9) → Geometry-aware, setup-agnostic modeling of human motion in the wild
cs.LG
- MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data (watchlist_keyword:inference / nice_to_have:benchmark,evaluation) → Bidirectional Mamba with explicit missing-data modeling for cognitive load assessment from eye tracking
- The Distillation Game: Adaptive Attacks & Efficient Defenses (watchlist_keyword:reasoning / nice_to_have:evaluation) → Adaptive attacks and efficient defenses in the distillation setting
- Vector Policy Optimization: Training for Diversity Improves Test-Time Search (watchlist_keyword:inference) → Trains for diversity to improve test-time search
- Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration (watchlist_keyword:agent) → Episodic memory and persistent worlds in support of 3D exploration
cs.RO
- Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning (watchlist_keyword:agent) → Multi-agent reinforcement learning for superhuman, safe racing
cs.OS
- DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback (watchlist_keyword:agent / nice_to_have:benchmark,evaluation) → Millisecond-level sandbox checkpoint/rollback to scale stateful AI agents
cs.SE
- Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents (watchlist_keyword:agent / nice_to_have:evaluation) → Contract-style skill governance framework for enterprise AI agents
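
The grouping rule stated at the top of this section (keyword hit, not already a Top pick, at most 4 per category) could be approximated as in the sketch below. The field names `id`, `primary_category`, and `watchlist_keywords`, and the function name, are assumptions for illustration; the candidate JSON schema is not shown in this report.

```python
from collections import defaultdict
from typing import Any


def watchlist_hits_by_category(
    candidates: list[dict[str, Any]],
    top_pick_ids: set[str],
    per_category: int = 4,
) -> dict[str, list[dict[str, Any]]]:
    """Group watchlist hits by category, skipping Top picks and capping each group.

    Candidates are assumed to arrive already sorted by ranking_score descending,
    so the first papers seen in each category are the ones that are kept.
    """
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for paper in candidates:
        if paper["id"] in top_pick_ids:
            continue  # already listed under Top picks
        if not paper.get("watchlist_keywords"):
            continue  # no watchlist keyword hit
        bucket = grouped[paper["primary_category"]]
        if len(bucket) < per_category:
            bucket.append(paper)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    sample = [
        {"id": f"p{i}", "primary_category": "cs.AI", "watchlist_keywords": ["agent"]}
        for i in range(6)
    ]
    sample.append({"id": "q1", "primary_category": "cs.CL", "watchlist_keywords": []})
    for category, papers in watchlist_hits_by_category(sample, {"p0"}).items():
        print(category, [p["id"] for p in papers])
```

Note that this sketch filters on keywords only; an entry admitted on a trending signal alone would need an additional condition.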
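🔗 Extended reading (Semantic Scholar similar papers)
No high-confidence incremental signal in this section today (S2 returned no similar papers). The candidate JSON did not prefetch the similar_papers field, and under the skill's constraints no separate external lookup is made, so extended_reading=[].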
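🧑‍🔬 Newly surfaced authors / teams
- Eric P. Xing (Carnegie Mellon University / MBZUAI (inferred), group: systems-labs, cross_checked=false)
- Senior author of today's highest-scoring Top pick, "Efficient Agentic Reasoning…"; appears 2 times in the candidate pool; not in tracked_authors
- Evidence: https://arxiv.org/abs/2605.22138 , https://huggingface.co/papers/2605.22138
- James Zou (Stanford University (inferred), group: systems-labs, cross_checked=false)
- Author of the Top pick "Forecasting Scientific Progress with Artificial Intelligence"; appears 2 times in the candidate pool; Stanford background but not in tracked_authors
- Evidence: http://arxiv.org/abs/2605.22681v1 , https://huggingface.co/papers/2605.22681
Institutions are all inferred from public background (the candidate JSON has no affiliations field) and have not been cross-checked; author discoveries are not persisted automatically and need manual review at the weekend before being added to tracked_authors.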
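
One way to surface such authors is to count how many candidate papers each name appears on and keep those that recur but are not yet tracked. The sketch below assumes an `authors` list on each candidate and a threshold of 2 appearances; both are illustrative guesses based on the entries above, not the documented rule.

```python
from collections import Counter
from typing import Any


def discover_authors(
    candidates: list[dict[str, Any]],
    tracked_authors: set[str],
    min_count: int = 2,
) -> list[tuple[str, int]]:
    """Return (author, paper_count) pairs for untracked authors who recur.

    Each author is counted at most once per paper, and the result is sorted
    by name so the output is deterministic.
    """
    counts = Counter(
        name for paper in candidates for name in set(paper.get("authors", []))
    )
    return sorted(
        (name, n)
        for name, n in counts.items()
        if n >= min_count and name not in tracked_authors
    )


if __name__ == "__main__":
    # Hypothetical candidate pool, for illustration only.
    pool = [
        {"authors": ["Author A", "Author B"]},
        {"authors": ["Author A", "Author C"]},
        {"authors": ["Author D"]},
        {"authors": ["Author D"]},
    ]
    print(discover_authors(pool, tracked_authors={"Author D"}))
```

The function only proposes names; consistent with the note above, adding them to tracked_authors stays a manual step.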
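📉 Coverage gaps and uncertainty
s2_similar_unavailable, affiliations_unavailable
- s2_tldr_sparse: only a few of the 119 candidates (e.g. Rule2DRC / KVServe) carry an S2 tldr; tldr_en is left empty for most
- affiliations_missing: the candidate JSON has no affiliations field, so institution/author attribution is inferred and has not been cross-checked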
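Sources and cross-validation notes
- The anchor is arXiv (primary); HF Daily (curated) supplies trending signals and upvotes and dominates today's ranking; Semantic Scholar (metadata) contributes only sporadic tldr entries and no similar-paper graph. Conclusions are anchored in the arXiv preprints themselves; HF popularity serves only as a ranking weight.
- No daily report has been saved in the past 7 days (recent_daily is empty), so there are no historical dedup conflicts; seen_before is false for all 8 Top picks. tldr_cn is condensed and translated from the abstract/s2_tldr; tldr_en is copied only when S2 returns one and is not translated independently.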