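Paper Radar Daily | 2026-05-11
One-line takeaway: three tracks, Agent × Multimodal Search × Inference acceleration, all scored dense hits today. The DTap red-teaming platform (co-authored by Percy Liang) puts agent safety and evaluation at the top of the list, SpecBlock reports a hard number of +8-13% mean speedup over EAGLE-3, and ReasonMaxxer advances an RL-free counter-thesis that challenges the RLVR paradigm.
Summary
All 47 of today's candidates are covered by the three sources (arXiv 47 / HF Daily 47 cross-listed / Semantic Scholar 32 hits). The main storyline rests on three tracks: Agent × Multimodal Search × Inference acceleration. The DTap red-teaming platform (co-authored by Percy Liang) and LLMs-Improving-LLMs take the top two slots, putting agent evaluation and automated test-time scaling (TTS) at the head of the list. HyperEyes and InterLV-Search land on the same day and deliver a paired signal for multimodal agentic search, one on the method side and one on the benchmark side. SpecBlock supplies the only hard benchmark number on today's inference-acceleration track (+8-13% vs EAGLE-3). ReasonMaxxer advances an RL-free counter-thesis that challenges the RLVR paradigm, a fairly strong methodological challenge. MoE has 2 hits (MACE-Dance is the first concrete application), and DPO, long context, and speculative decoding have 1-2 hits each. The S2 similar-papers link returned nothing for any candidate, so the further-reading section is empty; the affiliations field is empty for every candidate, so attribution at the institution / team level is not possible.
📌 Top picks (cross-source hits)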
- 2605.04808 DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
  - Quick read: DTap is a red-teaming platform that stress-tests AI agents across 14 domains and 50+ environments.
  - S2 TLDR: The DecodingTrust-Agent Platform (DTap) is introduced, the first controllable and interactive red-teaming platform for AI agents, spanning 14 real-world domains and over 50 simulation environments that replicate widely used systems such as Google Workspace, Paypal, and Slack.
  - Why selected: hf_trending_rank:14 + watchlist:agent + tracked_author:Percy Liang; the first controllable, interactive red-teaming platform for agents, hitting both the safety and evaluation tracks. (score=6.6, hf_upvotes=14, reasons=hf_trending_rank:14; watchlist_keyword:agent; nice_to_have:evaluation; tracked_author:percy liang)
  - Authors: Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie, Jiawei Zhang et al.
  - Links: https://arxiv.org/abs/2605.04808 / https://huggingface.co/papers/2605.04808 / https://www.semanticscholar.org/paper/8b2bd7e1a717663be85a78f1486a8f3f415c551c
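- 2605.08083 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
  - Quick read: An agent automatically discovers test-time scaling (TTS) strategies that outperform hand-crafted heuristic scheduling.
  - Why selected: triple watchlist hit (watchlist:reasoning,agent,inference) + 51 HF upvotes (no HF trending rank, but one of the more upvoted entries today); for the first time, the design of TTS itself is handed to an LLM agent to iterate on. (score=6.5, hf_upvotes=51, reasons=watchlist_keyword:reasoning,agent,inference; nice_to_have:benchmark)
  - Authors: Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu et al.
  - Links: https://arxiv.org/abs/2605.08083 / https://huggingface.co/papers/2605.08083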
- 2605.06716 From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms
  - Quick read: A survey of LLM agent memory that proposes a three-stage Storage → Reflection → Experience framework.
  - S2 TLDR: This survey proposes a novel evolutionary framework for LLM agent memory mechanisms, formalizing the development process into three stages: Storage (trajectory preservation), Reflection (trajectory refinement), and Experience (trajectory abstraction).
  - Why selected: watchlist:agent + citation_velocity:4.0; systematizes scattered research on memory mechanisms and has high value as an engineering reference. (score=6.0, hf_upvotes=5, reasons=watchlist_keyword:agent; citation_velocity:4.0)
  - Authors: Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin, Kaixin Li et al.
  - Links: https://arxiv.org/abs/2605.06716 / https://huggingface.co/papers/2605.06716 / https://www.semanticscholar.org/paper/ed20847a506473433843fe31f9024667c0f47325
- 2512.18181 MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
  - Quick read: Cascaded MoE in which a Motion Expert and an Appearance Expert together synthesize music-driven dance video.
  - S2 TLDR: MACE-Dance is presented, a music-driven dance video generation framework with cascaded Mixture-of-Experts (MoE), where the Motion Expert performs music-to-3D motion generation while enforcing kinematic plausibility and artistic expressiveness, whereas the Appearance Expert carries out motion- and reference-conditioned video synthesis.
  - Why selected: hf_trending_rank:12 + 80 upvotes + watchlist:moe + multiple benchmark/fine-tuning/evaluation hits; the first concrete application on the MoE track this week. (score=5.596, hf_upvotes=80, reasons=hf_trending_rank:12; watchlist_keyword:moe; nice_to_have:benchmark,fine-tuning,evaluation; citation_velocity:0.296)
  - Authors: Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang et al.
  - Links: https://arxiv.org/abs/2512.18181 / https://huggingface.co/papers/2512.18181 / https://www.semanticscholar.org/paper/042c61783b406feb5ca8489f34213f837f8474a1
- 2605.06241 Rethinking RL for LLM Reasoning: It’s Sparse Policy Selection, Not Capability Learning
  - Quick read: ReasonMaxxer replaces RL with an entropy-gated contrastive loss and trains in minutes on a single GPU.
  - S2 TLDR: ReasonMaxxer, a minimal RL-free method that applies contrastive loss only at entropy-gated decision points, matches or exceeds full RL performance while requiring only tens of problems and minutes of single-GPU training, a reduction in training cost of roughly three orders of magnitude.
  - Why selected: hf_trending_rank:4 + watchlist:reasoning; poses an RL-free counter-thesis to the current RLVR paradigm, with training cost cut by roughly three orders of magnitude, a fairly strong methodological challenge. (score=5.1, hf_upvotes=2, reasons=hf_trending_rank:4; watchlist_keyword:reasoning; nice_to_have:benchmark)
  - Authors: Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna
  - Links: https://arxiv.org/abs/2605.06241 / https://huggingface.co/papers/2605.06241 / https://www.semanticscholar.org/paper/3fd6e403b398fa1ecf2618cce026272724ab6a5e
- 2605.07177 HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
  - Quick read: HyperEyes writes efficiency into the RL objective and recasts multimodal search as parallel atomic actions.
  - S2 TLDR: This work presents HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurrent search across multiple entities while treating inference efficiency as a first-class training objective and introduces IMEB, a human-curated benchmark of 300 instances that jointly evaluates search capability and efficiency.
  - Why selected: hf_trending_rank:24 + 54 upvotes + watchlist:agent,inference + benchmark; introduces the IMEB benchmark and pairs with today's InterLV-Search as a double signal for multimodal agentic search. (score=5.1, hf_upvotes=54, reasons=hf_trending_rank:24; watchlist_keyword:agent,inference; nice_to_have:benchmark)
  - Authors: Guankai Li, Jiabin Chen, Yi Xu, Xichen Zhang, Yuan Lu
  - Links: https://arxiv.org/abs/2605.07177 / https://huggingface.co/papers/2605.07177 / https://www.semanticscholar.org/paper/21877966de98d8e37e7b5e0c7de4834ed2f9c8ad
- 2605.07243 SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
  - Quick read: SpecBlock's block-iterative speculative decoding improves mean speedup over EAGLE-3 by 8-13%.
  - S2 TLDR: This paper proposes SpecBlock, a block-iterative drafter that combines path dependence with cheap drafting, and shows that SpecBlock improves mean speedup by 8-13% over EAGLE-3 at 44-52% of its drafting cost, and cost-aware adaptation extends this lead to 11-19%.
  - Why selected: hf_trending_rank:20 + watchlist:inference,speculative decoding; the only hard benchmark number on today's inference-acceleration track, at just 44-52% of EAGLE-3's drafting cost. (score=5.0, hf_upvotes=2, reasons=hf_trending_rank:20; watchlist_keyword:inference,speculative decoding)
  - Authors: Weijie Shi, Qiang Xu, Fan Deng, Yaguang Wu, Jiarun Liu, Yehong Xu et al.
  - Links: https://arxiv.org/abs/2605.07243 / https://huggingface.co/papers/2605.07243 / https://www.semanticscholar.org/paper/0a8c0922e16a6c1fd098dc663278c9a2acb13986
- 2605.07510 InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
  - Quick read: InterLV-Search is the first benchmark for interleaved multimodal agentic search.
  - Why selected: watchlist:agent,dpo + benchmark/evaluation; brings visual evidence into the search trajectory and is the second signal on today's agent-evaluation front. (score=5.0, hf_upvotes=5, reasons=watchlist_keyword:agent,dpo; nice_to_have:benchmark,evaluation)
  - Authors: Bohan Hou, Jiuning Gu, Jiayan Guo, Ronghao Dang, Sicong Leng, Xin Li et al.
  - Links: https://arxiv.org/abs/2605.07510 / https://huggingface.co/papers/2605.07510
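Each entry above carries a `reasons=` string in which semicolons separate tags and commas separate values inside one tag. As a minimal sketch of how such a string could be turned into structured data, the helper below splits it into a dictionary. The function name `parse_reasons` is hypothetical and the grammar is inferred from the entries in this report, not taken from `paper_fetch.py`.

```python
def parse_reasons(reasons: str) -> dict[str, list[str]]:
    """Split 'key:v1,v2; key2:v3' into {'key': ['v1', 'v2'], 'key2': ['v3']}.

    Assumption: ';' separates tags, the first ':' separates a tag name from
    its values, and ',' separates values (format inferred from this report).
    """
    parsed: dict[str, list[str]] = {}
    for part in reasons.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(":")
        parsed[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return parsed


# Reasons string copied from the DTap entry (2605.04808).
dtap = parse_reasons(
    "hf_trending_rank:14; watchlist_keyword:agent; "
    "nice_to_have:evaluation; tracked_author:percy liang"
)
print(dtap["tracked_author"])  # ['percy liang']
```

Values stay as strings here; a consumer that needs the trending rank as a number would convert `dtap["hf_trending_rank"][0]` itself.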
🏷 Watchlist category hits
Entries already listed under Top picks are excluded; each bucket lists at most 4 fallback candidates.
agent (4 fallback)
- 2605.03353 SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents — score=4.5, hf_upvotes=6; reasons: hf_trending_rank:5; watchlist_keyword:agent
- 2604.25325 R^3-SQL: Ranking Reward and Resampling for Text-to-SQL — score=4.5, hf_upvotes=1; reasons: hf_trending_rank:10; watchlist_keyword:agent; nice_to_have:benchmark
- 2605.06455 PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors — score=4.2, hf_upvotes=2; reasons: hf_trending_rank:8; watchlist_keyword:agent
- 2605.07447 Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs — score=4.1, hf_upvotes=1; reasons: hf_trending_rank:9; watchlist_keyword:agent
reasoning (3 fallback)
- 2605.05997 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding — score=5.0, hf_upvotes=15; reasons: watchlist_keyword:reasoning,inference; nice_to_have:benchmark,fine-tuning
- 2605.08043 SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation — score=3.6, hf_upvotes=7; reasons: hf_trending_rank:19; watchlist_keyword:reasoning; nice_to_have:benchmark
- 2605.06139 Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex — score=2.0, hf_upvotes=57; reasons: watchlist_keyword:reasoning
inference (4 fallback)
- 2605.05997 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding — score=5.0, hf_upvotes=15; reasons: watchlist_keyword:reasoning,inference; nice_to_have:benchmark,fine-tuning
- 2605.08044 Fast Byte Latent Transformer — score=4.0, hf_upvotes=5; reasons: watchlist_keyword:inference,speculative decoding
- 2605.07363 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference — score=4.0, hf_upvotes=11; reasons: watchlist_keyword:long context,inference
- 2605.06105 Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility — score=3.9, hf_upvotes=1; reasons: hf_trending_rank:16; watchlist_keyword:inference; nice_to_have:benchmark
moe (1 fallback)
- 2602.03473 Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts — score=3.1, hf_upvotes=7; reasons: hf_trending_rank:29; watchlist_keyword:moe; nice_to_have:benchmark,evaluation
dpo (1 fallback)
- 2605.00933 CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining — score=4.4, hf_upvotes=1; reasons: hf_trending_rank:6; watchlist_keyword:dpo
long context (1 fallback)
- 2605.07363 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference — score=4.0, hf_upvotes=11; reasons: watchlist_keyword:long context,inference
speculative decoding (1 fallback)
- 2605.08044 Fast Byte Latent Transformer — score=4.0, hf_upvotes=5; reasons: watchlist_keyword:inference,speculative decoding
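The bucketing rule stated at the top of this section (exclude Top picks, at most 4 fallback candidates per bucket) can be sketched roughly as follows. The function `build_fallback_buckets`, the record fields, and the descending-score ordering are assumptions made for illustration; the report does not say how `paper_fetch.py` orders entries within a bucket. As in the listing above, a paper may appear in several buckets, since only Top picks are removed.

```python
from collections import defaultdict

MAX_PER_BUCKET = 4  # cap stated in this section


def build_fallback_buckets(candidates, top_pick_ids, cap=MAX_PER_BUCKET):
    """Group non-Top-pick papers by watchlist keyword, keeping the best `cap`."""
    buckets = defaultdict(list)
    for paper in candidates:
        if paper["arxiv_id"] in top_pick_ids:
            continue  # already shown under Top picks
        for keyword in paper["watchlist_keywords"]:
            buckets[keyword].append(paper)
    # Assumed ordering: highest score first; sorted() is stable, so ties
    # keep their input order.
    return {
        kw: sorted(papers, key=lambda p: p["score"], reverse=True)[:cap]
        for kw, papers in buckets.items()
    }


# Scores and keywords copied from entries in this report.
candidates = [
    {"arxiv_id": "2605.07243", "score": 5.0, "watchlist_keywords": ["inference", "speculative decoding"]},
    {"arxiv_id": "2605.05997", "score": 5.0, "watchlist_keywords": ["reasoning", "inference"]},
    {"arxiv_id": "2605.08044", "score": 4.0, "watchlist_keywords": ["inference", "speculative decoding"]},
    {"arxiv_id": "2605.07363", "score": 4.0, "watchlist_keywords": ["long context", "inference"]},
    {"arxiv_id": "2605.06105", "score": 3.9, "watchlist_keywords": ["inference"]},
]
top_pick_ids = {"2605.07243"}  # SpecBlock is a Top pick, so it is skipped

buckets = build_fallback_buckets(candidates, top_pick_ids)
for keyword, papers in buckets.items():
    print(keyword, [p["arxiv_id"] for p in papers])
```

With these five records the `inference` bucket comes out in the same order as the listing above, and SpecBlock is absent from both of its keyword buckets.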
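🔗 Further reading (Semantic Scholar similar papers)
No high-confidence incremental signal for this section today (S2 similar papers were not returned). Coverage gap: s2_similar_unavailable.
🧑‍🔬 Newly appearing authors / teams
With affiliations / categories empty for every candidate, today's attribution relies solely on the tracked_author tag in ranking_reasons. The DTap red-teaming paper (arxiv:2605.04808), co-authored by Percy Liang, is the only identifiable activity signal from a known watchlist author today; no other candidate surfaced a qualifying new author or team.
- Percy Liang: co-author on 2605.04808, "DecodingTrust-Agent Platform (DTap)". Recorded as an activity signal for an already-tracked person (a known watchlist tracked_author appearing in today's bylines); because the affiliations field is empty for the remaining candidates, new-author screening is not possible.
📉 Coverage gaps and uncertainty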
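- s2_similar_unavailable: the S2 similar_papers field is None for every candidate, so the further-reading section is empty.
- affiliations_unavailable: affiliations[] is empty for all 47 candidates, so new findings cannot be attributed at the institution / team level.
- s2_partial_coverage: 15/47 candidates lack an s2_url (including popular entries such as LLMs-Improving-LLMs and InterLV-Search); their tldr_en is left blank and no substitute translation was made.
- confidence_flags: ranking_relies_on_hf_upvotes_and_keyword_only / no_tracked_lab_attribution_today.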
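The coverage flags above are simple checks over candidate metadata. A minimal sketch of how they could be derived is shown below. The function `coverage_gaps` and the record layout are hypothetical, and the 47 records are synthetic placeholders built only to reproduce the counts reported here (32 with an `s2_url`, 15 without); they are not the real candidate list.

```python
def coverage_gaps(candidates):
    """Return {flag_name: 'affected/total'} for the gaps that apply."""
    total = len(candidates)
    gaps = {}
    # Flag when no candidate returned a similar_papers value at all.
    if all(c.get("similar_papers") is None for c in candidates):
        gaps["s2_similar_unavailable"] = f"{total}/{total}"
    # Flag when every affiliations list is missing or empty.
    if all(not c.get("affiliations") for c in candidates):
        gaps["affiliations_unavailable"] = f"{total}/{total}"
    # Flag partial Semantic Scholar coverage when some s2_url values are missing.
    missing_s2 = sum(1 for c in candidates if not c.get("s2_url"))
    if missing_s2 > 0:
        gaps["s2_partial_coverage"] = f"{missing_s2}/{total}"
    return gaps


# Synthetic placeholder records matching the reported counts.
candidates = [
    {
        "arxiv_id": f"placeholder-{i}",
        "s2_url": "https://www.semanticscholar.org/paper/placeholder" if i < 32 else None,
        "similar_papers": None,
        "affiliations": [],
    }
    for i in range(47)
]

gaps = coverage_gaps(candidates)
print(gaps)
```

Reporting each flag as an `affected/total` string mirrors the `15/47` notation used in this section.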
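Sources and cross-validation notes
- arXiv (primary): 47 entries; serves as the anchor for conclusions, cited via arxiv_url.
- HuggingFace Daily Papers (curated): all 47 entries cross-listed; hf_upvotes / hf_trending_rank are used only as attention indicators.
- Semantic Scholar (metadata): 32/47 hits; provides tldr_en / citation_velocity; similar_papers was not returned for any candidate.
The tldr_cn for Top picks is condensed from s2_tldr or the first sentence of the abstract; no external fetch or translation was triggered. ranking_score is produced in a single pass by paper_fetch.py, with no second-stage re-ranking. Source mix: arXiv 47 / HF 47 / S2 32 (primary>metadata>curated>other).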