Paper Radar Daily | 2026-05-10

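One-line takeaway: today's HF trending list is led by an "expert composition / modularity" theme. Three papers landed on the same day: MoE pretraining (EMO), geometric stacking of VLM experts (GeoStack), and trajectory abstraction for agentic RL (StraTA). Half of the Top 8 (4 of 8) carry a watchlist keyword hit on agent / MoE / inference, which suggests community attention is tilting markedly toward making large models decomposable and recomposable.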

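Summary

MoE / modularity is today's strongest signal: EMO (hf#1, score 8.9) and GeoStack (hf#4) take two different routes, natively modular pretraining vs. post-training geometric stacking, and give the claim that "experts can be trained independently and then composed" two same-day data points.
Agentic RL keeps producing at high density: StraTA, Skill1, Auto Research, and AI Co-Mathematician make four agent / RL papers in one day. AI Co-Mathematician claims a new SOTA of 48% on FrontierMath Tier 4, which warrants separate follow-up on reproducibility.
Scaling laws revisited: Prescriptive Scaling Laws for Data Constrained Training characterizes overfitting in the data-constrained regime with a single parameter, challenging how well Chinchilla transfers to repeated data.
All three sources online, but metadata is thin: arXiv + HF Daily + Semantic Scholar all returned candidates, but affiliations and arXiv categories are empty for all 27 candidates, and S2 returned similar_papers for none of them. As a result, "Watchlist category hits" falls back to keyword grouping and "Further reading" is left empty.
New-author scan below threshold: the candidate JSON carries no affiliation signal, so the institution / team first-appearance rule in discovery_rules.md cannot fire. No new_authors entries today.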

📌 Top picks (cross-hits)

EMO: Pretraining Mixture of Experts for Emergent Modularity (HF#1 / 5 upvotes / hit: moe + inference) → MoE pretraining that yields semantic-level expert modularity and composition.
- tldr_en: EMO is an MoE designed for modularity, i.e. the independent use and composition of expert subsets, without requiring human-defined priors. It enables selective expert use, and its expert subsets specialize at semantic levels, in contrast to the low-level syntactic specialization observed in standard MoEs.
- Why selected: #1 on HF trending today, plus hits on two watchlist keywords (mixture of experts / inference).
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction (HF#3 / 16 upvotes / hit: agent) → Strategic trajectory abstraction improves long-horizon decision-making in agentic RL.
- tldr_en: StraTA is a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL) and trains strategy generation and action execution jointly with a hierarchical GRPO-style rollout design, further enhanced by diverse strategy rollout and critical self-judgment.
- Why selected: HF trending #3, an agent watchlist hit, and it addresses an evident gap in agentic RL: credit assignment over long horizons.
GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs (HF#4 / 2 upvotes / hit: inference) → Geometric stacking that modularly composes multi-domain VLM experts.
- tldr_en: GeoStack (Geometric Stacking) is a modular framework that allows independently trained domain experts to be composed into a unified model.
- Why selected: HF trending #4. Together with EMO it reinforces the day's "modular / composable experts" theme, but via post-training geometric constraints.
Generative Quantum-inspired Kolmogorov-Arnold Eigensolver (HF#6 / 2 upvotes / nice-to-have: benchmark+evaluation) → Quantum-inspired KAN reduces the overhead of quantum circuit generation in HPC settings.
- tldr_en: Results indicate that quantum-inspired Kolmogorov-Arnold networks can reduce classical-side overhead while preserving circuit-generation quality, offering a scalable route for HPC-quantum co-design on near-term quantum platforms.
- Why selected: HF trending #6. A cross-domain (HPC × quantum chemistry) application of KANs that engineering readers can use as a quantum-classical co-design case study.
Prescriptive Scaling Laws for Data Constrained Training (HF#5 / 4 upvotes / nice-to-have: scaling law) → A one-parameter scaling law for weight decay under data constraints.
- tldr_en: The paper provides a scaling-law explanation for recent findings that optimal weight decay in data-constrained regimes is an order of magnitude larger than standard practice. Its one-parameter form isolates overfitting in a single coefficient, enabling direct comparison across training configurations.
- Why selected: HF trending #5, and it directly challenges Chinchilla's assumptions under repeated data, which makes it actionable for pretraining decisions.
PianoCoRe: Combined and Refined Piano MIDI Dataset (HF#2 / 4 upvotes / venue: TISMIR) → A consolidated multi-source piano MIDI dataset plus performance rendering.
- tldr_en: PianoCoRe is presented, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora, and an expressive performance rendering model trained on PianoCoRe demonstrates improved robustness to unseen pieces compared to models trained on raw or smaller datasets.
- Why selected: a dataset paper reaching HF trending #2 signals strong community interest, and it is a TISMIR journal paper (venue hit). Its expressive rendering benchmark can serve as a cross-check.
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance (HF#12 / 1 upvote / nice-to-have: benchmark+evaluation) → A ~140K-pair video background replacement dataset built with decoupled foreground / background generation.
- tldr_en: This paper designs a scalable pipeline that generates foreground and background guidance in a decoupled manner with strict quality filtering, and introduces Sparkle, a dataset of ~140K video pairs spanning five common background-change themes, alongside Sparkle-Bench, the largest evaluation benchmark tailored for background replacement to date.
- Why selected: fills a gap in video editing, where current public datasets skew toward local edits. It ships with the Sparkle-Bench evaluation set, which eases engineering reuse.
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (HF#34 / 9 upvotes / hit: agent + benchmark) → An AI workbench for mathematicians, scoring 48% on FrontierMath Tier 4.
- tldr_en: The AI co-mathematician is a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research and achieves state of the art results on hard problem-solving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.
- Why selected: a DeepMind-style author list (including Pushmeet Kohli, Fernanda Viégas, Martin Wattenberg) plus a claimed new SOTA of 48% on FrontierMath Tier 4. Hits both the agent and benchmark watchlists.

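🏷 Watchlist category hits

Note: the categories field in the candidate JSON is entirely empty (see coverage_gaps), so this section groups by watchlist keyword theme instead of arXiv category subsections. Only hits that did not make Top picks are listed.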

agent

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning (HF#33 / 60 upvotes) - a skill system that jointly evolves an agent corpus with RL. It has the highest community upvote count and is worth reading alongside Top pick #2, StraTA.
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes (HF#37 / 12 upvotes) - a closed-loop auto-research agent that automatically produces training recipes.
MiA-Signature: Approximating Global Activation for Long-Context Understanding (HF#27 / 46 upvotes) - derivative work on long-context inference / agent capability, focused on global approximation of attention.

reasoning

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key (HF#35 / 10 upvotes / +benchmark) - examines whether expressiveness is the bottleneck when RL teaches LLMs long-horizon reasoning.
A Foundation Model for Zero-Shot Logical Rule Induction (HF#40 / 3 upvotes / +benchmark) - zero-shot logical rule induction that combines ILP with foundation models.

inference

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction (HF#46 / 56 upvotes) - real-time full-duplex multimodal inference from the MiniCPM series, with high engineering reuse value.
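
The keyword-theme grouping used in this section, in place of arXiv categories, can be illustrated with a minimal sketch. It is not the digest's actual pipeline code: the `WATCHLIST` contents, the `title` / `abstract` field names, and the `group_by_keyword` helper are all assumptions made for illustration, and matching is plain case-insensitive substring search.

```python
from collections import defaultdict

# Hypothetical watchlist (theme -> keyword phrases); not the digest's real config.
WATCHLIST = {
    "agent": ["agent"],
    "moe": ["mixture of experts"],
    "inference": ["inference"],
}


def group_by_keyword(candidates, watchlist, exclude_titles=()):
    """Group paper titles by watchlist theme via case-insensitive substring match.

    `candidates` is a list of dicts with assumed keys "title" and "abstract".
    Titles in `exclude_titles` (e.g. papers already in Top picks) are skipped.
    """
    excluded = set(exclude_titles)
    groups = defaultdict(list)
    for paper in candidates:
        title = paper.get("title", "")
        if title in excluded:
            continue
        text = f"{title} {paper.get('abstract', '')}".lower()
        for theme, keywords in watchlist.items():
            if any(keyword in text for keyword in keywords):
                groups[theme].append(title)
    return dict(groups)


candidates = [
    {"title": "Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning"},
    {"title": "EMO: Pretraining Mixture of Experts for Emergent Modularity"},
    {"title": "PianoCoRe: Combined and Refined Piano MIDI Dataset"},
]
top_picks = ["EMO: Pretraining Mixture of Experts for Emergent Modularity"]
print(group_by_keyword(candidates, WATCHLIST, exclude_titles=top_picks))
```

Substring matching is deliberately loose ("agent" also matches "agents" and "agentic"), so a real pipeline would likely need word-boundary handling to avoid false positives on short keywords.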

🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal in this section today (S2 returned no similar papers).

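🧑‍🔬 Newly appearing authors / teams

Today's discovery scan found no candidates that meet the threshold. The affiliations field is empty for every paper in the candidate JSON, so nothing can be cross-checked at the institution level. At the author level there is also no first-appearance hit against the watchlist's tracked_authors (for example, the DeepMind team members on AI Co-Mathematician are regulars from a known institution and do not count as "newly appearing").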

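📉 Coverage gaps and uncertainty

s2_similar_unavailable - Semantic Scholar returned no similar_papers for any candidate, so the "Further reading" section is empty.
arxiv_categories_unavailable - arXiv categories are empty for all 27 candidates, so "Watchlist category hits" groups by watchlist keyword theme instead of arXiv subcategory sections such as cs.CL / cs.LG.
affiliations_unavailable - affiliations are empty throughout the candidate JSON, so the institution-level discovery rules for "new authors / teams" cannot fire. Affiliations such as DeepMind for the AI Co-Mathematician authors can only be inferred from the author list, not confirmed from metadata.
single_source_dominant_hf - the vast majority of the 27 candidates come from HF Daily trending (the hf_trending_rank field is present on most), while direct arXiv pulls and S2 contribute too little leading signal, so today's ranking depends heavily on HF signals.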
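
As a rough illustration of how the four gap flags above might be derived from the candidate JSON, here is a minimal sketch. The field names `similar_papers`, `categories`, `affiliations`, and `hf_trending_rank` come from this report, but the `detect_coverage_gaps` helper and the 0.8 share threshold for "vast majority" are assumptions, not the pipeline's actual logic.

```python
def detect_coverage_gaps(candidates, hf_share_threshold=0.8):
    """Return coverage-gap flags for a list of candidate dicts.

    A metadata field counts as unavailable only if it is missing or empty
    on every candidate. The HF-dominance threshold is an assumed value.
    """
    gaps = []
    if not candidates:
        return gaps

    def all_empty(field):
        # True when no candidate carries a non-empty value for `field`.
        return all(not paper.get(field) for paper in candidates)

    if all_empty("similar_papers"):
        gaps.append("s2_similar_unavailable")
    if all_empty("categories"):
        gaps.append("arxiv_categories_unavailable")
    if all_empty("affiliations"):
        gaps.append("affiliations_unavailable")

    # Share of candidates that carry an HF trending rank.
    hf_count = sum(1 for p in candidates if p.get("hf_trending_rank") is not None)
    if hf_count / len(candidates) >= hf_share_threshold:
        gaps.append("single_source_dominant_hf")
    return gaps


# Toy records, not real pipeline output.
toy = [
    {"hf_trending_rank": 1, "categories": [], "affiliations": [], "similar_papers": []},
    {"hf_trending_rank": 2},
]
print(detect_coverage_gaps(toy))
```

Requiring a field to be empty on every candidate keeps the flags conservative: a single populated record suppresses the flag, which matches the "all empty" wording of the gaps listed above.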

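Sources and cross-validation notes

Candidate retrieval for this issue covered three sources:

arXiv (primary) - provides paper IDs, abstracts, and PDF links, and is the anchor for conclusions. Categories metadata is missing in this issue.
HF Daily (curated) - provides trending rank and upvote signals, and is the dominant source of today's ranking_score. The "single_source_dominant_hf" gap above stems from this.
Semantic Scholar (metadata) - provides s2_tldr (27/27 hits), but similar_papers is missing across the board, so the citation-graph dimension cannot support the "Further reading" section.

No source was fully offline, but thin metadata (categories / affiliations / similar_papers) puts this issue closer to a degraded "single source + tldr enrichment" mode. The ranking_score ordering of Top picks keeps the original order from paper_fetch.py, with no second-pass reranking.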

Hanzhi's BLOG
