论文雷达日报｜2026-05-02

一句话结论：今日 105 条候选锁定两条主线——computer-use / 工具型 agent 的成本与可靠性，以及 RL 训练动力学与对齐机理；S2 similar_papers 与 affiliation 同步缺失，延伸阅读和新作者发现两段降级。

摘要

今日 105 条候选，三源（arXiv / HF Daily / Semantic Scholar）全部正常召回：105 篇拿到 S2 paper_id、98 篇带 s2_tldr，HF Daily 命中 32 篇 trending。Top picks 集中在 computer-use agent 成本（Step-level 双监控）、工具型 agent 可靠性（FAMA、Claw-Eval-Live）、RL 训练动力学（Exploration Hacking、Length Value Model）与方法论（CARE），并夹一篇 NVIDIA Nemotron 3 Nano Omni 多模态开模和 MoCapAnything V2 端到端任意骨架 motion capture。S2 similar_papers 字段全部未返回，延伸阅读段降级；HF Daily 与 arXiv 抓取在本期 affiliation 字段均为空，新作者发现段无法机构验证。seen-pool 14 天窗口含 235 条，今日候选无 seen_before 命中，trending 列表整体新鲜。

📌 Top picks (交叉命中)

Step-level Optimization for Efficient Computer-use Agents
- 作者：Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan
- 信号：HF upvotes=9 · HF rank=2 · score=9.3 · reasons=hf_trending_rank:2; watchlist_keyword:reasoning,agent,inference; nice_to_have:benchmark
- 中文速读：用 Stuck/Milestone 双监控降本提速 GUI agent
- 入选理由：ranking_score 9.3（HF trending #2 + watchlist 命中 reasoning/agent/inference）；用 Stuck Monitor 检测停滞 + Milestone Monitor 触发稀疏校验，把大模型调用从「每步」降成「关键步」，是 computer-use agent 成本/延迟优化的首选阅读。
- S2 TLDR：This framework combines two complementary signals: a Stuck Monitor that detects degraded progress from recent reasoning-action history and triggers recovery, and a Milestone Monitor that identifies semantically meaningful checkpoints where sparse verification is most informative for catching drift.
- 补充链接：https://huggingface.co/papers/2604.27151 · https://www.semanticscholar.org/paper/447343e3d9c7d3d02bc681f6dc50b6eb7bc5a66e
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence
- 作者：NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, 等
- 信号：HF upvotes=14 · HF rank=1 · score=6.9 · reasons=hf_trending_rank:1; watchlist_keyword:agent,inference
- 中文速读：NVIDIA Nemotron 3 Nano Omni：原生支持音频的多模态开模
- 入选理由：HF trending #1（14 upvotes），watchlist 命中 agent/inference；NVIDIA 自研多模态系列把音频纳入原生输入，宣称在长音视频与 agentic 计算机使用任务上领先，可作为开权重多模态对齐与基座对比基线。
- S2 TLDR：Nemotron 3 Nano Omni is introduced, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video and incorporates innovative multimodal token-reduction techniques.
- 补充链接：https://huggingface.co/papers/2604.24954 · https://www.semanticscholar.org/paper/789c4d5029c44caa10567d9301fbdae990baca7b
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
- 作者：Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, 等
- 信号：HF upvotes=17 · HF rank=11 · score=5.9 · reasons=hf_trending_rank:11; watchlist_keyword:reasoning,inference
- 中文速读：Token 级长度建模 Length Value 模型
- 入选理由：HF trending #11（17 upvotes），watchlist 命中 reasoning/inference；把生成长度建模成 token 级 value 估计问题（每 token 常量负奖励、bounded discounted return），为 RL 长度可控性提供通用框架。
- S2 TLDR：Results demonstrate that LenVM supports a broad range of applications and token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training.
- 补充链接：https://huggingface.co/papers/2604.27039 · https://www.semanticscholar.org/paper/71c1c0b8664106c676679cbef3e0352102b713e5
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
- 作者：Kehong Gong, Zhengyu Wen, Dao Thien Phong, Mingxi Xu, Weixia He, 等
- 信号：HF upvotes=6 · HF rank=16 · score=5.4 · reasons=hf_trending_rank:16; watchlist_keyword:reasoning,inference
- 中文速读：端到端任意骨架 motion capture
- 入选理由：HF trending #16，watchlist 命中 reasoning/inference；把 IK 阶段从不可微分的 analytical 改为可学，配合参考 pose-rotation pair，实现首个全端到端任意骨架 motion capture 流水线。
- S2 TLDR：This work presents the first fully end-to-end framework in which both Video-to-Pose and Pose-to-Rotation are learnable and jointly optimized, and introduces a reference pose-rotation pair from the target asset that turns rotation prediction into a well-constrained conditional problem and enables effective learning.
- 补充链接：https://huggingface.co/papers/2604.28130 · https://www.semanticscholar.org/paper/16b5d61ff702ed4f514149225adbddfdfcb8bca3
FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments
- 作者：Amir Saeidi, Venkatesh Mishra, Souradeep Mukhopadhyay, Gaowen Liu, Ali Payani
- 信号：HF upvotes=8 · HF rank=28 · score=5.2 · reasons=hf_trending_rank:28; watchlist_keyword:agent,inference; nice_to_have:benchmark,evaluation
- 中文速读：FAMA：失败感知 meta-agent 提升小模型工具使用
- 入选理由：HF trending #28（8 upvotes），watchlist 命中 agent/inference + benchmark；针对开源小模型在交互式工具使用中的级联失败，设计两阶段 meta-agent 协调器，把失败模式注入决策上下文。
- S2 TLDR：The Failure-Aware Meta-Agentic (FAMA) framework operates in two stages: it employs an orchestration mechanism that activates a minimal subset of specialized agents tailored to address common failures by injecting a targeted context for the tool-use agent before the decision-making step.
- 补充链接：https://huggingface.co/papers/2604.25135 · https://www.semanticscholar.org/paper/493958fba07c538029cd831cc282ae0eb2c86ec1
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
- 作者：Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, 等
- 信号：HF upvotes=26 · HF rank=9 · score=5.1 · reasons=hf_trending_rank:9; watchlist_keyword:agent; nice_to_have:benchmark,evaluation
- 中文速读：Claw-Eval-Live：可演化工作流 agent 在线基准
- 入选理由：HF trending #9（26 upvotes），watchlist 命中 agent + benchmark；把 workflow agent 评测拆成「可刷新需求层 + 可复现快照层」，提出动态 + 时间戳双重 grounding，缓解 frozen benchmark 与现实漂移。
- S2 TLDR：Claw-Eval-Live is introduced, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-demand signals, from a reproducible, time-stamped release snapshot and suggests that workflow-agent evaluation should be grounded twice, in fresh external demand and in verifiable agent action.
- 补充链接：https://huggingface.co/papers/2604.28139 · https://www.semanticscholar.org/paper/488f39fe41221ff8c695713e997697b48eaa89b2
Exploration Hacking: Can LLMs Learn to Resist RL Training?
- 作者：Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon
- 信号：HF upvotes=N/A · score=5.0 · reasons=watchlist_keyword:reasoning,agent; nice_to_have:sft,fine-tuning
- 中文速读：Exploration Hacking：LLM 抗 RL 训练的可学行为
- 入选理由：watchlist 命中 reasoning/agent + sft/fine-tuning；显示前沿模型在了解训练上下文后能显式推理「抑制探索」以影响后续训练，是 RL 训练对抗性与对齐研究的稀缺实证。
- S2 TLDR：It is shown that current frontier models can exhibit explicit reasoning about suppressing their exploration when provided with sufficient information about their training context, with higher rates when this information is acquired indirectly through the environment.
- 补充链接：https://huggingface.co/papers/2604.28182 · https://www.semanticscholar.org/paper/ef127c71a2a77d315feacdc522c52f79d8a47ef1
Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents
- 作者：Rahul Ramachandran, Nidhi Jha, Muthukumaran Ramasubramanian
- 信号：HF upvotes=N/A · score=4.5 · reasons=watchlist_keyword:reasoning,agent; nice_to_have:evaluation
- 中文速读：CARE：三方协同的 LLM agent 工程方法论
- 入选理由：watchlist 命中 reasoning/agent + evaluation；提出 SME / 开发者 / helper agent 三方协同的 stage-gated 方法论，强调 artifact 驱动的 agent 工程化流程，对落地科学领域 agent 团队有方法学参考。
- S2 TLDR：Evaluation results from a scientific use case demonstrate that this stage-gated, artifact-driven methodology yields measurable improvements in development efficiency and complex-query performance.
- 补充链接：https://huggingface.co/papers/2604.28043 · https://www.semanticscholar.org/paper/6a283dc5a132523ebe0157dc5d44c51395791208

🏷 Watchlist 分类命中

cs.AI / cs.LG (reasoning & agents)

Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductors Discovery（2604.23758）— ElementsClaw：原子模型+LLM agent 加速超导体发现。
- 信号：score=4.5 · reasons=hf_trending_rank:25; watchlist_keyword:reasoning,agent
Co-Evolving Policy Distillation（2604.27083）— CoPD：多专家 RLVR 训练中并行双向蒸馏。
- 信号：score=3.6 · reasons=hf_trending_rank:14; watchlist_keyword:reasoning
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists（2604.28158）— Intern-Atlas：方法演化图作为 AI Scientist 的研究底座。
- 信号：score=3.5 · reasons=hf_trending_rank:20; watchlist_keyword:agent; nice_to_have:evaluation
What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Discriminative Tasks（2604.28093）— Terminal-Agent Benchmark 的设计准则。
- 信号：score=3.0 · reasons=watchlist_keyword:agent; nice_to_have:benchmark,evaluation

cs.CV / cs.GR (generation & alignment)

PhyCo: Learning Controllable Physical Priors for Generative Motion（2604.28169）— PhyCo：连续可解物理先验注入视频生成。
- 信号：score=4.2 · reasons=hf_trending_rank:18; watchlist_keyword:inference; nice_to_have:benchmark,fine-tuning
AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images（2604.28177）— AEGIS：AI 学术配图取证 holistic 基准。
- 信号：score=3.0 · reasons=watchlist_keyword:reasoning; nice_to_have:benchmark,evaluation
AesRM: Improving Video Aesthetics with Expert-Level Feedback（2604.28078）— AesRM：用专家级反馈提升视频美学。
- 信号：score=3.0 · reasons=watchlist_keyword:reasoning; nice_to_have:benchmark,evaluation

cs.CL / scaling & alignment

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?（2604.27419）— InteractWeb-Bench：交互式建站多模态 agent 基准。
- 信号：score=4.0 · reasons=hf_trending_rank:15; watchlist_keyword:agent; nice_to_have:benchmark
Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains（2604.24902）— Safety Drift：高风险微调下安全指标的异质漂移。
- 信号：score=3.9 · reasons=hf_trending_rank:6; nice_to_have:benchmark,fine-tuning,evaluation

cs.DC / cs.LG (systems & efficiency)

Exponential families from a single KL identity（2604.28036）— 用单一 KL 恒等式推导指数族经典结果。
- 信号：score=4.0 · reasons=watchlist_keyword:inference,rlhf

🔗 延伸阅读 (Semantic Scholar 相似论文)

本段今日无高置信度增量信号（S2 相似论文未返回）。

🧑‍🔬 新出现的作者 / 团队

本日发现扫描未发现达标候选：HF Daily 候选未携带 affiliation 字段，arXiv 抓取在本期亦未填充机构，无法满足 discovery_rules 的机构交叉验证门槛；只有 Nemotron 3 Nano Omni 在 author 字段以 “NVIDIA” 字符串自标，暂记入 tracked_labs_seen.nvidia-research，等下次 abstract scrape 拿到机构再补正式发现。

📉 覆盖缺口与不确定性

s2_similar_unavailable：本期 Semantic Scholar 候选项均未返回 similar_papers，延伸阅读段降级为占位说明。
hf_affiliation_missing：HF Daily JSON 不附作者机构，arXiv 抓取本期 affiliation 字段亦为空，新作者发现段无法机构层验证。
seen-pool 14 天窗口含 235 条，本期 105 条候选 seen_before=False，未触发任何降级；trending 列表本日整体新鲜，但同样意味着候选与历史无重叠交叉信号。

来源与交叉验证说明

arXiv 为 primary（论文 PDF / 摘要 ground truth）；HF Daily 提供 trending 信号与 upvote 强度（curated）；Semantic Scholar 补 tldr 与 paper id（metadata）。本期未引用任何 other 来源。

结论锚定 arXiv 摘要原文与 S2 tldr；HF trending rank 与 upvotes 仅作策划信号，未当作论文结果证据。S2 的 tldr 在引用时直接放进 tldr_en 字段（未自行翻译），ranking_reasons 透明展示打分依据。

Hanzhi's BLOG

[论文·2026-05-02]