Paper Radar Daily | 2026-04-29

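One-line takeaway: HF traffic dominates today's list. The main threads are agent evaluation and distillation (AutoResearchBench, TCOD, RecursiveMAS, GoClick) and audio/vision model alignment (Step-Audio-R1.5, the post-training framework for video generation, blind spots in evaluator VLMs), plus one paper on synthetic-data-driven safety guardrails (BARRED).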

Summary

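HF Daily and arXiv were aligned as two sources. All of the Top 8 come from HF trending (ranks 3-20), cross-checked against S2 metadata, and nothing today shows a clear new-institution signal matching tracked_authors / tracked_affiliations.
Agents are the most active area today: RecursiveMAS (latent-space recursive multi-agent computation), AutoResearchBench (a benchmark for scientific literature discovery), TCOD (a temporal curriculum for multi-turn OPD), and GoClick (a 230M GUI grounding VLM) all made the list, echoing the "agent evaluation overfitting" theme from yesterday's paper-digest.
Audio/vision alignment: Step-Audio-R1.5 moves audio reasoning from RLVR to RLHF; video generation gets a systematic post-training framework; and Seeing Isn't Believing systematically challenges the reliability of VLMs used as evaluators.
Safety: BARRED synthesizes boundary samples through reflection plus adversarial debate to train custom guardrails, one of the few synthetic-data works that takes the boundary-alignment route.
The candidate JSON holds 136 entries, all with seen_before=false, so no downranking is needed; the S2 similar_papers field is missing on every entry, so Further reading has no high-confidence incremental signal today.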

📌 Top picks (cross-source hits)

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate · Quick read: synthesizes boundary samples via reflection plus asymmetric debate to train custom guardrails.
- Why selected: hf_trending_rank:11 + watchlist_keyword reasoning,agent,inference, and it also hits nice_to_have:fine-tuning. It is today's only explicit safety/guardrail synthetic-data work.
- Authors: Arnon Mazza, Elad Levi
- Evidence links: https://arxiv.org/abs/2604.25203 | https://huggingface.co/papers/2604.25203 | https://www.semanticscholar.org/paper/06ac37d1f3436c6cd5f6b4ecc58266f0639fa214
Step-Audio-R1.5 Technical Report · Quick read: StepFun switches audio reasoning from RLVR to RLHF.
- Why selected: hf_trending_rank:3 (12 HF upvotes, near the top of the list) + watchlist reasoning,rlhf + benchmark,evaluation. It states a clear alignment paradigm shift aimed at long-turn dialogue.
- Authors: Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu, Fei Tian, Yayue Deng, Jun Chen, …, Daxin Jiang (the large StepFun team)
- Evidence links: https://arxiv.org/abs/2604.25719 | https://huggingface.co/papers/2604.25719 | https://www.semanticscholar.org/paper/0369a5befb51700bd14c4ef2a29efcd8a0646537
Recursive Multi-Agent Systems (RecursiveMAS) · Quick read: recasts multi-agent collaboration as a unified latent recursive computation.
- Why selected: hf_trending_rank:20 (62 HF upvotes, second highest today) + hits in all three arXiv categories cs.AI/CL/LG + watchlist reasoning,agent,inference. It extends the Loop-LM scaling idea to the MAS dimension.
- Authors: Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, …, Markus J. Buehler, James Zou
- Evidence links: http://arxiv.org/abs/2604.25917v1 | https://huggingface.co/papers/2604.25917 | https://www.semanticscholar.org/paper/797777f63417da747628cc3891f88ef7360d7b6f
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery · Quick read: an open-ended benchmark that evaluates agents on scientific literature discovery.
- Why selected: hf_trending_rank:5 (23 HF upvotes) + watchlist reasoning,agent + benchmark,evaluation. It positions itself on three properties, "research-oriented / literature-focused / open-ended", which aligns closely with paper-digest's own positioning.
- Authors: Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, …, Zhicheng Dou
- Evidence links: https://arxiv.org/abs/2604.25256 | https://huggingface.co/papers/2604.25256 | https://www.semanticscholar.org/paper/86f82b41dab50da936b9454d169a633e9051d361
TCOD: Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents · Quick read: stabilizes multi-turn on-policy distillation (OPD) with a trajectory-length curriculum.
- Why selected: hf_trending_rank:8 + watchlist reasoning,agent + benchmark,evaluation. It explicitly identifies the KL instability of vanilla OPD on multi-turn agents and is one of the few methodology papers on multi-turn distillation.
- Authors: Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng
- Evidence links: https://arxiv.org/abs/2604.24005 | https://huggingface.co/papers/2604.24005 | https://www.semanticscholar.org/paper/9a7aeb4c4cddd5386f2b6caecc99bef920324475
GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction · Quick read: a 230M-parameter VLM for on-device GUI element grounding.
- Why selected: hf_trending_rank:9 + watchlist agent,inference. The paper claims parity with larger VLMs on GUI visual grounding, with on-device-friendly latency, which is directly useful for the GUI agent deployment chain.
- Authors: Hongxin Li, Yuntao Chen, Zhaoxiang Zhang
- Evidence links: https://arxiv.org/abs/2604.23941 | https://huggingface.co/papers/2604.23941 | https://www.semanticscholar.org/paper/46846256efa89d0449797cb87f4b62b329d1287b
A Systematic Post-Train Framework for Video Generation · Quick read: a four-stage alignment pipeline for post-training video diffusion models.
- Why selected: hf_trending_rank:16 + watchlist inference,rlhf + sft,fine-tuning. It names three pain points for industrial deployment (prompt sensitivity, temporal inconsistency, inference cost) and offers a systematic remedy.
- Authors: Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, …, Nan Duan, Ping Luo
- Evidence links: https://arxiv.org/abs/2604.25427 | https://huggingface.co/papers/2604.25427 | https://www.semanticscholar.org/paper/79860bc258aa4c80c92781ae1351710835a389bd
Seeing Isn’t Believing: Uncovering Blind Spots in Evaluator Vision-Language Models · Quick read: systematically exposes the biases and blind spots of VLMs used as evaluators.
- Why selected: hf_trending_rank:7 + watchlist reasoning + benchmark,evaluation. Just as "LLM-as-judge" is being extended to VLMs, it provides a reusable perturbation-based evaluation framework that exposes four failure classes: object hallucination, spatial, factual, and visual fidelity.
- Authors: Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra
- Evidence links: https://arxiv.org/abs/2604.21523 | https://huggingface.co/papers/2604.21523 | https://www.semanticscholar.org/paper/eaf832eea8a78afafab697370a0535237983ade8

🏷 Watchlist category hits

Agents / multi-agent (cs.AI / cs.CL / cs.LG):
- OxyGent: a modular, observable, evolvable multi-agent framework (the Oxy abstraction) · https://arxiv.org/abs/2604.25602
- Think Before You Act — Neurocognitive Governance Model for Autonomous AI Agents · https://arxiv.org/abs/2604.25684
- Toward Scalable Terminal Task Synthesis via Skill Graphs · https://arxiv.org/abs/2604.25727
- DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios (cs.CL) · https://arxiv.org/abs/2604.25914
- AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark · https://arxiv.org/abs/2604.24441
Reasoning / alignment (reasoning, dpo, rlhf):
- Backtranslation Augmented DPO for Neural Machine Translation (cs.CL): backtranslation-augmented DPO for NMT · https://arxiv.org/abs/2604.25702
- Walking Through Uncertainty: Empirical Study of Uncertainty Estimation for Audio-Aware LLMs (eess.AS / cs.CL / cs.LG) · https://arxiv.org/abs/2604.25591
- How Fast Should a Model Commit to Supervision? (cs.LG / cs.AI): an SFT-timing / curriculum perspective · https://arxiv.org/abs/2604.25907
Inference / quantization / speculative decoding:
- SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compression (eess.SP, cs.DC) · https://arxiv.org/abs/2604.25777
- QB-LIF: Learnable-Scale Quantized Burst Neurons for Efficient SNNs (cs.CV) · https://arxiv.org/abs/2604.25688
- Scalable Inference Architectures for Compound AI Systems: A Production Deployment Study (cs.AI) · https://arxiv.org/abs/2604.25724
Robotics / Embodied:
- KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning (cs.RO) · https://arxiv.org/abs/2604.25788
- IndustryAssetEQA: Neurosymbolic Operational Intelligence System for Embodied QA · https://arxiv.org/abs/2604.23446

🔗 Further reading (Semantic Scholar similar papers)

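No high-confidence incremental signal in this section today: S2 returned no similar papers, and the similar_papers field is missing on all 136 candidates in this batch. See s2_similar_unavailable in the 📉 coverage gaps section below.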

🧑‍🔬 Newly surfaced authors / teams

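Today's discovery scan found no candidates that meet the bar. The affiliations field in the candidate JSON is empty throughout, so tracked_affiliations hits cannot be judged reliably, and under the discovery_rules principle of preferring omission over padding, no names are forced onto the list. Author anchors worth manual follow-up: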

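Yuxin Zhang / Daxin Jiang et al. (StepFun; the Step-Audio series keeps iterating): consider adding to tracked_affiliations only after the institutional affiliation is confirmed manually.
Hongxin Li, Yuntao Chen, Zhaoxiang Zhang (two GUI agent submissions: GoClick + AutoGUI-v2): the same author group contributed two GUI papers in one day; suggest adding them to the watch pool at the next scan.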

📉 Coverage gaps and uncertainty

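s2_similar_unavailable: the similar_papers field is missing on all 136 candidates, so Further reading has no S2 increment today; no external fetch is performed (per the step-3 hard constraint).
affiliations_empty: the affiliations arrays in the candidate JSON are all empty, so tracked_affiliations hits are 0 and the new_authors section cannot make institution-level judgments; only soft hints by author name are possible.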
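confidence_flags: HF Daily trending signals dominate (all 8 Top picks listed above carry an hf_trending_rank), while arXiv-only candidates generally have ranking_score ≤ 4.0. Today's ranking sample therefore skews toward community hype, so consumers citing top_picks[0:3] in AI daily should also check the abstracts.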
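
The two batch-level gap flags above could be derived mechanically from the candidate records. The following is a minimal sketch, assuming each candidate is a dict with optional `similar_papers` and `affiliations` keys; the helper name `coverage_flags` and the record shape are illustrative assumptions, not the pipeline's actual code.

```python
from typing import Any


def coverage_flags(candidates: list[dict[str, Any]]) -> list[str]:
    """Return gap flags that hold for the whole batch (hypothetical helper)."""
    flags: list[str] = []
    # Raise a flag only when *every* candidate lacks the field
    # (key missing, None, or an empty list all count as "lacking").
    if candidates and all(not c.get("similar_papers") for c in candidates):
        flags.append("s2_similar_unavailable")
    if candidates and all(not c.get("affiliations") for c in candidates):
        flags.append("affiliations_empty")
    return flags


# Toy batch mirroring today's situation: no similar_papers, empty affiliations.
batch = [
    {"title": "paper-a", "affiliations": []},
    {"title": "paper-b", "affiliations": []},
]
print(coverage_flags(batch))
```

Requiring the condition to hold for every candidate keeps the flag a statement about the batch, which matches how both gaps are described above (all 136 entries affected), rather than about individual papers.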

Sources and cross-validation notes

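All three sources are in place: arXiv (primary) + HuggingFace Daily (curated) + Semantic Scholar (metadata). paper_fetch.py emitted no WARN, and /tmp/paper_fetch.err is empty.
Source breakdown (136 total): arxiv+s2 94, hf+s2 36, arxiv+hf+s2 6. Only 6 entries are hit by both HF and arXiv, so Top picks are boosted mainly by HF trending.
Conclusions are anchored by primary > metadata > curated > other: every Top pick gives arxiv_url as the main evidence, with hf_url / s2_url as supplements.
Contract with the market-briefing AI daily: top_picks[0:3] (BARRED / Step-Audio-R1.5 / RecursiveMAS) may be cited by AI daily in its "Key people and community signals" section.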
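
The source breakdown and the evidence-priority rule above can be sanity-checked in a few lines. This is a sketch under assumptions: `SOURCE_COUNTS` restates the counts quoted above, while `SOURCE_TIER`, `TIER_ORDER`, and `order_evidence` are hypothetical names introduced only for illustration.

```python
# Counts as quoted in the source breakdown above.
SOURCE_COUNTS = {"arxiv+s2": 94, "hf+s2": 36, "arxiv+hf+s2": 6}

# Tier of each source as described above; priority is
# primary > metadata > curated > other.
SOURCE_TIER = {"arxiv": "primary", "s2": "metadata", "hf": "curated"}
TIER_ORDER = ["primary", "metadata", "curated", "other"]


def order_evidence(links: dict[str, str]) -> list[str]:
    """Sort evidence URLs by source tier (hypothetical helper)."""
    def rank(source: str) -> int:
        # Unknown sources fall into the lowest tier, "other".
        return TIER_ORDER.index(SOURCE_TIER.get(source, "other"))
    return [links[s] for s in sorted(links, key=rank)]


total = sum(SOURCE_COUNTS.values())
both = sum(
    n for combo, n in SOURCE_COUNTS.items()
    if {"arxiv", "hf"} <= set(combo.split("+"))
)
print(total, both)  # 136 6

links = {
    "hf": "https://huggingface.co/papers/2604.25203",
    "s2": "https://www.semanticscholar.org/paper/06ac37d1f3436c6cd5f6b4ecc58266f0639fa214",
    "arxiv": "https://arxiv.org/abs/2604.25203",
}
print(order_evidence(links)[0])  # the arXiv URL comes first
```

Under this strict tier ordering the Semantic Scholar link would sort ahead of the HF link. The Top pick entries above list arXiv first and then HF and S2, which satisfies only the part of the rule that is stated explicitly: arxiv_url leads as the main evidence.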

Hanzhi's BLOG

[Papers · 2026-04-29]