[Papers · 2026-05-11]

Paper Radar Daily | 2026-05-11

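Bottom line: three threads hit at high density today: agents, multimodal search, and inference acceleration. The DTap red-teaming platform (co-authored by Percy Liang) puts agent safety and evaluation at the top of the list, SpecBlock reports a hard number (8-13% mean speedup over EAGLE-3), and ReasonMaxxer offers an RL-free counter-thesis that challenges the RLVR paradigm.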

Summary

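Today's 47 candidates draw on all three sources (arXiv 47 / HF Daily 47 cross-listed / Semantic Scholar 32 hits). The main line is carried by three threads: agents, multimodal search, and inference acceleration. The DTap red-teaming platform (co-authored by Percy Liang) and LLMs-Improving-LLMs share the top of the list, bringing agent evaluation and automated test-time scaling (TTS) to the front. HyperEyes and InterLV-Search land on the same day, giving multimodal agentic search both a method signal and a benchmark signal. SpecBlock supplies the only hard benchmark number on today's inference-acceleration thread (8-13% mean speedup over EAGLE-3). ReasonMaxxer offers an RL-free counter-thesis that challenges the RLVR paradigm, a strong challenge on the methods side. MoE has 2 hits (MACE-Dance is the first concrete application); DPO, long context, and speculative decoding have 1-2 hits each. S2 similar-paper links were not returned for any candidate, so the Further reading section is empty; the affiliations field is empty for every candidate, so no institution- or team-level attribution is possible.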

📌 Top picks (cross-source hits)

  • 2605.04808 DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

    • Quick read: The DTap red-teaming platform stress-tests AI agents across 14 domains and 50+ environments.
    • S2 TLDR:The DecodingTrust-Agent Platform (DTap) is introduced, the first controllable and interactive red-teaming platform for AI agents, spanning 14 real-world domains and over 50 simulation environments that replicate widely used systems such as Google Workspace, Paypal, and Slack.
    • Why selected: hf_trending_rank:14 + watchlist:agent + tracked_author:Percy Liang; the first controllable and interactive red-teaming platform for agents, hitting both the safety and evaluation threads. (score=6.6, hf_upvotes=14, reasons=hf_trending_rank:14; watchlist_keyword:agent; nice_to_have:evaluation; tracked_author:percy liang)
    • Authors: Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie, Jiawei Zhang, et al.
    • Links: https://arxiv.org/abs/2605.04808 / https://huggingface.co/papers/2605.04808 / https://www.semanticscholar.org/paper/8b2bd7e1a717663be85a78f1486a8f3f415c551c
  • 2605.08083 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

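    • Quick read: Agents automatically discover TTS strategies that outperform hand-designed heuristic scheduling.
    • Why selected: triple watchlist hit (reasoning, agent, inference) + 51 HF upvotes (not in the HF trending rank, but among the most-upvoted candidates today); the first to hand the design of TTS itself to an LLM agent for iteration. (score=6.5, hf_upvotes=51, reasons=watchlist_keyword:reasoning,agent,inference; nice_to_have:benchmark)
    • Authors: Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, et al.
    • Links: https://arxiv.org/abs/2605.08083 / https://huggingface.co/papers/2605.08083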
  • 2605.06716 From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

    • Quick read: A survey of LLM agent memory that proposes a three-stage Storage → Reflection → Experience framework.
    • S2 TLDR:This survey proposes a novel evolutionary framework for LLM agent memory mechanisms, formalizing the development process into three stages: Storage (trajectory preservation), Reflection (trajectory refinement), and Experience (trajectory abstraction).
    • Why selected: watchlist:agent + citation_velocity:4.0; systematizes scattered research on memory mechanisms and has high value as an engineering reference. (score=6.0, hf_upvotes=5, reasons=watchlist_keyword:agent; citation_velocity:4.0)
    • Authors: Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin, Kaixin Li, et al.
    • Links: https://arxiv.org/abs/2605.06716 / https://huggingface.co/papers/2605.06716 / https://www.semanticscholar.org/paper/ed20847a506473433843fe31f9024667c0f47325
  • 2512.18181 MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation

    • Quick read: Cascaded MoE in which a Motion Expert and an Appearance Expert together synthesize music-driven dance videos.
    • S2 TLDR:MACE-Dance is presented, a music-driven dance video generation framework with cascaded Mixture-of-Experts (MoE), where the Motion Expert performs music-to-3D motion generation while enforcing kinematic plausibility and artistic expressiveness, whereas the Appearance Expert carries out motion- and reference-conditioned video synthesis.
    • Why selected: hf_trending_rank:12 + 80 upvotes + watchlist:moe + multiple benchmark/fine-tuning/evaluation hits; the first concrete application on the MoE thread this week. (score=5.596, hf_upvotes=80, reasons=hf_trending_rank:12; watchlist_keyword:moe; nice_to_have:benchmark,fine-tuning,evaluation; citation_velocity:0.296)
    • Authors: Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang, et al.
    • Links: https://arxiv.org/abs/2512.18181 / https://huggingface.co/papers/2512.18181 / https://www.semanticscholar.org/paper/042c61783b406feb5ca8489f34213f837f8474a1
  • 2605.06241 Rethinking RL for LLM Reasoning: It’s Sparse Policy Selection, Not Capability Learning

    • Quick read: ReasonMaxxer replaces RL with an entropy-gated contrastive loss and trains in minutes on a single GPU.
    • S2 TLDR:ReasonMaxxer, a minimal RL-free method that applies contrastive loss only at entropy-gated decision points, matches or exceeds full RL performance while requiring only tens of problems and minutes of single-GPU training, a reduction in training cost of roughly three orders of magnitude.
    • Why selected: hf_trending_rank:4 + watchlist:reasoning; poses an RL-free counter-thesis to the current RLVR paradigm, with training cost reduced by roughly three orders of magnitude, a strong challenge on the methods side. (score=5.1, hf_upvotes=2, reasons=hf_trending_rank:4; watchlist_keyword:reasoning; nice_to_have:benchmark)
    • Authors: Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna
    • Links: https://arxiv.org/abs/2605.06241 / https://huggingface.co/papers/2605.06241 / https://www.semanticscholar.org/paper/3fd6e403b398fa1ecf2618cce026272724ab6a5e
  • 2605.07177 HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

    • Quick read: HyperEyes writes efficiency into the RL objective and recasts multimodal search as parallel atomic actions.
    • S2 TLDR:This work presents HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurrent search across multiple entities while treating inference efficiency as a first-class training objective and introduces IMEB, a human-curated benchmark of 300 instances that jointly evaluates search capability and efficiency.
    • Why selected: hf_trending_rank:24 + 54 upvotes + watchlist:agent,inference + benchmark; introduces the IMEB benchmark and, together with today's InterLV-Search, forms a double signal for multimodal agentic search. (score=5.1, hf_upvotes=54, reasons=hf_trending_rank:24; watchlist_keyword:agent,inference; nice_to_have:benchmark)
    • Authors: Guankai Li, Jiabin Chen, Yi Xu, Xichen Zhang, Yuan Lu
    • Links: https://arxiv.org/abs/2605.07177 / https://huggingface.co/papers/2605.07177 / https://www.semanticscholar.org/paper/21877966de98d8e37e7b5e0c7de4834ed2f9c8ad
  • 2605.07243 SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

    • Quick read: SpecBlock's block-iterative speculative decoding improves mean speedup by 8-13% over EAGLE-3.
    • S2 TLDR:This paper proposes SpecBlock, a block-iterative drafter that combines path dependence with cheap drafting, and shows that SpecBlock improves mean speedup by 8-13% over EAGLE-3 at 44-52% of its drafting cost, and cost-aware adaptation extends this lead to 11-19%.
    • Why selected: hf_trending_rank:20 + watchlist:inference,speculative decoding; the only hard benchmark number on today's inference-acceleration thread, at only 44-52% of EAGLE-3's drafting cost. (score=5.0, hf_upvotes=2, reasons=hf_trending_rank:20; watchlist_keyword:inference,speculative decoding)
    • Authors: Weijie Shi, Qiang Xu, Fan Deng, Yaguang Wu, Jiarun Liu, Yehong Xu, et al.
    • Links: https://arxiv.org/abs/2605.07243 / https://huggingface.co/papers/2605.07243 / https://www.semanticscholar.org/paper/0a8c0922e16a6c1fd098dc663278c9a2acb13986
  • 2605.07510 InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search

    • Quick read: InterLV-Search is the first benchmark for interleaved multimodal agentic search.
    • Why selected: watchlist:agent,dpo + benchmark/evaluation; brings visual evidence into the search trajectory and is the second signal on today's agent-evaluation front. (score=5.0, hf_upvotes=5, reasons=watchlist_keyword:agent,dpo; nice_to_have:benchmark,evaluation)
    • Authors: Bohan Hou, Jiuning Gu, Jiayan Guo, Ronghao Dang, Sicong Leng, Xin Li, et al.
    • Links: https://arxiv.org/abs/2605.07510 / https://huggingface.co/papers/2605.07510

🏷 Watchlist category hits

Entries already listed under Top picks are excluded; each bucket lists at most 4 fallback candidates.

agent (4 fallback)

  • 2605.03353 SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents — score=4.5, hf_upvotes=6; reasons: hf_trending_rank:5; watchlist_keyword:agent
  • 2604.25325 R^3-SQL: Ranking Reward and Resampling for Text-to-SQL — score=4.5, hf_upvotes=1; reasons: hf_trending_rank:10; watchlist_keyword:agent; nice_to_have:benchmark
  • 2605.06455 PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors — score=4.2, hf_upvotes=2; reasons: hf_trending_rank:8; watchlist_keyword:agent
  • 2605.07447 Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs — score=4.1, hf_upvotes=1; reasons: hf_trending_rank:9; watchlist_keyword:agent

reasoning (3 fallback)

  • 2605.05997 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding — score=5.0, hf_upvotes=15; reasons: watchlist_keyword:reasoning,inference; nice_to_have:benchmark,fine-tuning
  • 2605.08043 SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation — score=3.6, hf_upvotes=7; reasons: hf_trending_rank:19; watchlist_keyword:reasoning; nice_to_have:benchmark
  • 2605.06139 Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex — score=2.0, hf_upvotes=57; reasons: watchlist_keyword:reasoning

inference (4 fallback)

  • 2605.05997 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding — score=5.0, hf_upvotes=15; reasons: watchlist_keyword:reasoning,inference; nice_to_have:benchmark,fine-tuning
  • 2605.08044 Fast Byte Latent Transformer — score=4.0, hf_upvotes=5; reasons: watchlist_keyword:inference,speculative decoding
  • 2605.07363 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference — score=4.0, hf_upvotes=11; reasons: watchlist_keyword:long context,inference
  • 2605.06105 Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility — score=3.9, hf_upvotes=1; reasons: hf_trending_rank:16; watchlist_keyword:inference; nice_to_have:benchmark

moe (1 fallback)

  • 2602.03473 Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts — score=3.1, hf_upvotes=7; reasons: hf_trending_rank:29; watchlist_keyword:moe; nice_to_have:benchmark,evaluation

dpo (1 fallback)

  • 2605.00933 CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining — score=4.4, hf_upvotes=1; reasons: hf_trending_rank:6; watchlist_keyword:dpo

long context (1 fallback)

  • 2605.07363 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference — score=4.0, hf_upvotes=11; reasons: watchlist_keyword:long context,inference

speculative decoding (1 fallback)

  • 2605.08044 Fast Byte Latent Transformer — score=4.0, hf_upvotes=5; reasons: watchlist_keyword:inference,speculative decoding
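
The bucketing rule stated at the top of this section (drop Top picks, list at most 4 fallback candidates per bucket) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `build_fallback_buckets`, the record layout, and the choice to order each bucket by descending score are all assumptions. Deduplication is applied only against Top picks, so a paper tagged with several keywords may appear in several buckets, as 2605.05997 does above.

```python
from collections import defaultdict

MAX_PER_BUCKET = 4  # cap stated in this section


def build_fallback_buckets(candidates, top_pick_ids, cap=MAX_PER_BUCKET):
    """Group non-Top-pick candidates by watchlist keyword (hypothetical helper).

    Each bucket keeps at most `cap` entries, ordered by descending score
    (the ordering is an assumption for illustration).
    """
    buckets = defaultdict(list)
    for paper in candidates:
        if paper["id"] in top_pick_ids:
            continue  # already listed under Top picks
        for keyword in paper["keywords"]:
            buckets[keyword].append(paper)
    return {
        keyword: sorted(papers, key=lambda p: p["score"], reverse=True)[:cap]
        for keyword, papers in buckets.items()
    }


# Records assembled from entries in this digest; the field names are assumed.
candidates = [
    {"id": "2605.04808", "score": 6.6, "keywords": ["agent"]},
    {"id": "2605.05997", "score": 5.0, "keywords": ["reasoning", "inference"]},
    {"id": "2605.08044", "score": 4.0, "keywords": ["inference", "speculative decoding"]},
    {"id": "2605.07363", "score": 4.0, "keywords": ["long context", "inference"]},
]
buckets = build_fallback_buckets(candidates, top_pick_ids={"2605.04808"})
for keyword, papers in buckets.items():
    print(keyword, [p["id"] for p in papers])
```

In this sample the only agent-tagged record is a Top pick, so no agent bucket is produced at all.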

🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal for this section today (S2 similar papers were not returned). Coverage gap: s2_similar_unavailable

🧑‍🔬 Newly appearing authors / teams

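Because affiliations and categories are empty for every candidate, attribution today relies solely on the tracked_author tag in ranking_reasons. The DTap red-teaming paper (arxiv:2605.04808), co-authored by Percy Liang, is the only identifiable activity signal from a known watchlist author today; no new author or team meeting the threshold was found among the other candidates.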

  • Percy Liang: co-author of 2605.04808, "DecodingTrust-Agent Platform (DTap)". Recorded as an activity signal from an already-tracked watchlist author; with the affiliations field empty for the other candidates, new authors cannot be screened.

📉 Coverage gaps and uncertainty

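  • s2_similar_unavailable: the S2 similar_papers field is None for all candidates, so the Further reading section is empty.
  • affiliations_unavailable: affiliations[] is empty for all 47 candidates, so new findings cannot be attributed at the institution / team level.
  • s2_partial_coverage: 15/47 candidates lack an s2_url (including popular entries such as LLMs-Improving-LLMs and InterLV-Search); their tldr_en is left empty and no substitute translation was made.
  • confidence_flags: ranking_relies_on_hf_upvotes_and_keyword_only / no_tracked_lab_attribution_today.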
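
The gap flags above follow from simple counts over the candidate records. The sketch below shows one way they might be derived; the function `coverage_flags`, the record layout, and the placeholder data are assumptions for illustration, and only the field names (s2_url, similar_papers, affiliations) and the 32/47 split come from this report.

```python
def coverage_flags(candidates):
    """Derive coverage-gap flags from candidate metadata (hypothetical helper)."""
    total = len(candidates)
    missing_s2 = sum(1 for c in candidates if not c.get("s2_url"))
    flags = []
    if all(c.get("similar_papers") is None for c in candidates):
        flags.append("s2_similar_unavailable")
    if all(not c.get("affiliations") for c in candidates):
        flags.append("affiliations_unavailable")
    if 0 < missing_s2 < total:
        flags.append(f"s2_partial_coverage:{missing_s2}/{total}")
    return flags


# Synthetic placeholder records matching the counts reported above:
# 47 candidates, 32 of which have an S2 hit.
candidates = [
    {
        "s2_url": "placeholder" if i < 32 else None,
        "similar_papers": None,
        "affiliations": [],
    }
    for i in range(47)
]
flags = coverage_flags(candidates)
print(flags)
```

The 15/47 figure is the complement of the 32/47 Semantic Scholar hit rate, so the two numbers in this report are consistent with each other.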

Sources and cross-validation notes

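  • arXiv (primary): 47 entries, used as the anchor for conclusions and cited via arxiv_url.
  • HuggingFace Daily Papers (curated): all 47 entries cross-listed; hf_upvotes / hf_trending_rank serve only as attention indicators.
  • Semantic Scholar (metadata): 32/47 hits, providing tldr_en / citation_velocity; similar_papers was not returned for any candidate.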

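The tldr_cn for Top picks is condensed from s2_tldr / the first sentence of the abstract, with no external fetch or translation triggered; ranking_score is produced in a single pass by paper_fetch.py, with no second re-ranking. Source mix: arXiv 47 / HF 47 / S2 32 (primary>metadata>curated>other).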