Paper Radar Daily | 2026-05-10
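One-line takeaway: today's HF trending list is led by an "expert composition / modularity" thread. MoE pretraining (EMO), geometric stacking of VLM experts (GeoStack), and trajectory abstraction for agentic RL (StraTA) all appeared on the same day, and half of the Top 8 picks hit the watchlist keywords agent / MoE / inference. This suggests community attention is shifting markedly toward making large models decomposable and then recomposable.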
Summary
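- MoE / modularity is today's strongest signal: EMO (HF#1, score 8.9) and GeoStack (HF#4) take two different routes (natively modular pretraining vs. post-training geometric stacking) and supply two same-day data points for the claim that experts can be trained independently and then composed.
- Agentic RL output remains dense: StraTA, Skill1, Auto Research, and AI Co-Mathematician are four agent / RL papers from a single day. AI Co-Mathematician claims a new SOTA of 48% on FrontierMath Tier 4; its reproducibility deserves separate follow-up.
- Scaling laws revisited: Prescriptive Scaling Laws for Data Constrained Training characterizes overfitting in the data-constrained regime with a single parameter, challenging how well Chinchilla transfers to repeated data.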
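- All three sources online, but metadata is thin: arXiv, HF Daily, and Semantic Scholar all returned candidates, but affiliations and arXiv categories are empty for all 27 candidates, and S2 returned no similar_papers for any of them. As a result, "Watchlist category hits" is grouped by keyword instead, and "Further reading" is left empty.
- New-author scan below threshold: the candidate JSON carries no affiliation signal, so the institution / team first-appearance rules in discovery_rules.md cannot fire, and nothing is added to new_authors today.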
📌 Top picks (cross-source hits)
- EMO: Pretraining Mixture of Experts for Emergent Modularity (HF#1 / 5 upvotes / hit: moe + inference) → MoE pretraining that yields semantic-level expert modularity and composition.
- tldr_en: EMO is an MoE designed for modularity (the independent use and composition of expert subsets) that enables selective expert use without requiring human-defined priors; its expert subsets specialize at the semantic level, in contrast to the low-level syntactic specialization observed in standard MoEs.
- Why selected: #1 on HF trending today, plus hits on two watchlist keywords (mixture of experts / inference).
- StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction (HF#3 / 16 upvotes / hit: agent) → Strategic trajectory abstraction to improve long-horizon decision-making in agentic RL.
- tldr_en: StraTA is a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL) and trains strategy generation and action execution jointly with a hierarchical GRPO-style rollout design, further enhanced by diverse strategy rollout and critical self-judgment.
- Why selected: HF trending #3 and an agent watchlist hit; it also addresses a clear gap in agentic RL, namely credit assignment over long horizons.
- GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs (HF#4 / 2 upvotes / hit: inference) → Geometric stacking to modularly compose multi-domain VLM experts.
- tldr_en: GeoStack (Geometric Stacking) is introduced, a modular framework that allows independently trained domain experts to be composed into a unified model.
- Why selected: HF trending #4; together with EMO it reinforces today's "modular / composable experts" thread, but via post-training geometric constraints.
- Generative Quantum-inspired Kolmogorov-Arnold Eigensolver (HF#6 / 2 upvotes / nice-to-have: benchmark+evaluation) → Quantum-inspired KAN that reduces quantum-circuit generation overhead in HPC settings.
- tldr_en: Results indicate that quantum-inspired Kolmogorov-Arnold networks can reduce classical-side overhead while preserving circuit-generation quality, offering a scalable route for HPC-quantum co-design on near-term quantum platforms.
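- Why selected: HF trending #6; a cross-domain (HPC × quantum chemistry) application of KAN that engineering readers can use as a quantum-classical co-design case study.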
- Prescriptive Scaling Laws for Data Constrained Training (HF#5 / 4 upvotes / nice-to-have: scaling law) → One-parameter scaling law for weight decay under data constraints.
- tldr_en: The paper provides a scaling-law explanation for recent findings that optimal weight decay in data-constrained regimes is an order of magnitude larger than standard practice; its one-parameter form isolates overfitting in a single coefficient, enabling direct comparison across training configurations.
- Why selected: HF trending #5; it directly challenges Chinchilla's assumptions under repeated data and is actionable for pretraining decisions.
- PianoCoRe: Combined and Refined Piano MIDI Dataset (HF#2 / 4 upvotes / venue: TISMIR) → Consolidated multi-source piano MIDI dataset plus expressive performance rendering.
- tldr_en: PianoCoRe is presented, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora, and an expressive performance rendering model trained on PianoCoRe demonstrates improved robustness to unseen pieces compared to models trained on raw or smaller datasets.
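- Why selected: ranking #2 on HF trending as a dataset paper indicates strong community interest, and it is a TISMIR journal publication (venue hit); its expressive rendering benchmark can serve for cross-validation.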
- Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance (HF#12 / 1 upvote / nice-to-have: benchmark+evaluation) → Decoupled foreground / background guidance for video background replacement, with a dataset of ~140K pairs.
- tldr_en: This paper designs a scalable pipeline that generates foreground and background guidance in a decoupled manner with strict quality filtering, and introduces Sparkle, a dataset of ~140K video pairs spanning five common background-change themes, alongside Sparkle-Bench, the largest evaluation benchmark tailored for background replacement to date.
- Why selected: fills a gap in video editing, where current public datasets skew toward local edits, and ships with the Sparkle-Bench evaluation set for easy engineering reuse.
- AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (HF#34 / 9 upvotes / hit: agent + benchmark) → AI workbench for mathematicians; 48% on FrontierMath Tier 4.
- tldr_en: The AI co-mathematician is a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research and achieves state of the art results on hard problem-solving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.
- Why selected: a DeepMind-style author list (including Pushmeet Kohli, Fernanda Viégas, Martin Wattenberg), a claimed new SOTA of 48% on FrontierMath Tier 4, and hits on both the agent and benchmark watchlist keywords.
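🏷 Watchlist category hits
Note: the categories field in the candidate JSON is entirely empty (see coverage_gaps), so this section groups by watchlist keyword theme instead of arXiv category subsections. Only hits that did not make Top picks are listed.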
agent
- Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning (HF#33 / 60 upvotes): a skill system that co-evolves an agent corpus with RL; it has the highest community upvote count and is worth reading alongside Top pick #2, StraTA.
- Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes (HF#37 / 12 upvotes): a closed-loop auto-research agent that automatically produces training recipes.
- MiA-Signature: Approximating Global Activation for Long-Context Understanding (HF#27 / 46 upvotes): follow-on work on long-context reasoning / agent capability, focused on global approximation of attention.
reasoning
- Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key (HF#35 / 10 upvotes / +benchmark): examines whether expressiveness is the bottleneck when RL teaches LLMs long-horizon reasoning.
- A Foundation Model for Zero-Shot Logical Rule Induction (HF#40 / 3 upvotes / +benchmark): zero-shot logical rule induction that combines ILP with foundation models.
inference
- MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction (HF#46 / 56 upvotes): real-time full-duplex multimodal inference from the MiniCPM series, with high engineering reuse value.
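The keyword fallback used in this section can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the theme-to-term mapping `WATCHLIST_THEMES`, the function name `group_by_watchlist_theme`, and the `title` / `abstract` field names are assumptions, and plain substring matching is a simplification.

```python
from typing import Any, Iterable

# Hypothetical theme -> search-term mapping; the real watchlist is not shown in this report.
WATCHLIST_THEMES: dict[str, list[str]] = {
    "agent": ["agent"],
    "reasoning": ["reasoning"],
    "inference": ["inference"],
}


def group_by_watchlist_theme(
    candidates: Iterable[dict[str, Any]],
    themes: dict[str, list[str]] = WATCHLIST_THEMES,
    exclude_titles: Iterable[str] = (),
) -> dict[str, list[str]]:
    """Group paper titles by the first watchlist theme whose terms they mention."""
    excluded = set(exclude_titles)  # e.g. titles already listed under Top picks
    groups: dict[str, list[str]] = {theme: [] for theme in themes}
    for paper in candidates:
        title = paper["title"]
        if title in excluded:
            continue
        # Case-insensitive substring search over title plus abstract (if present).
        text = f"{title} {paper.get('abstract', '')}".lower()
        for theme, terms in themes.items():
            if any(term in text for term in terms):
                groups[theme].append(title)
                break  # assign each paper to a single theme only
    return groups
```

Substring matching means "agent" also catches "agents" and "agentic"; a stricter matcher would be needed if false positives become a problem.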
🔗 Further reading (Semantic Scholar similar papers)
No high-confidence incremental signal for this section today (S2 returned no similar papers).
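🧑‍🔬 Newly appearing authors / teams
Today's discovery scan found no qualifying candidates. The affiliations field is empty for every paper in the candidate JSON, so cross-validation along the institution dimension is not possible. Along the author dimension there is also no first-appearance hit against tracked_authors in the watchlist (for example, the DeepMind team members on AI Co-Mathematician are regulars from a known institution and do not count as "newly appearing").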
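📉 Coverage gaps and uncertainty
- s2_similar_unavailable: Semantic Scholar returned no similar_papers for any candidate, so the "Further reading" section is empty.
- arxiv_categories_unavailable: arXiv categories are empty for all 27 candidates, so "Watchlist category hits" is grouped by watchlist keyword theme instead of arXiv subcategory subsections such as cs.CL / cs.LG.
- affiliations_unavailable: affiliations are empty throughout the candidate JSON, so the institution-based discovery rules for "Newly appearing authors / teams" cannot fire; affiliations such as DeepMind for the AI Co-Mathematician authors can only be inferred from the author list, not confirmed from metadata.
- single_source_dominant_hf: the large majority of the 27 candidates come from HF Daily trending (the hf_trending_rank field is present on most of them); direct arXiv pulls and S2 contribute little leading signal, so today's ranking depends heavily on HF signals.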
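A check that derives these gap codes from the candidate JSON might look like the sketch below. The field names (similar_papers, categories, affiliations, hf_trending_rank) and gap codes come from this report; the function name `detect_coverage_gaps` and the 0.8 dominance cutoff are assumptions made for illustration, not the pipeline's actual logic.

```python
from typing import Any


def detect_coverage_gaps(
    candidates: list[dict[str, Any]],
    hf_dominance_threshold: float = 0.8,  # assumed cutoff, not stated in the report
) -> list[str]:
    """Return gap codes for metadata that is missing across all candidates."""
    if not candidates:
        return []

    def all_empty(field: str) -> bool:
        # A field counts as empty when it is absent, None, or an empty list/string.
        return all(not paper.get(field) for paper in candidates)

    gaps: list[str] = []
    if all_empty("similar_papers"):
        gaps.append("s2_similar_unavailable")
    if all_empty("categories"):
        gaps.append("arxiv_categories_unavailable")
    if all_empty("affiliations"):
        gaps.append("affiliations_unavailable")

    # Share of candidates that carry an HF trending rank.
    hf_share = sum(
        paper.get("hf_trending_rank") is not None for paper in candidates
    ) / len(candidates)
    if hf_share >= hf_dominance_threshold:
        gaps.append("single_source_dominant_hf")
    return gaps
```

Each code is emitted only when the field is empty for every candidate, which matches how the gaps are described here; a partial-coverage variant would need per-field ratios instead.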
Sources and cross-validation notes
This issue's candidate fetch covers three sources:
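- arXiv (primary): provides paper IDs, abstracts, and PDF links, and anchors the conclusions; categories metadata is missing in this issue.
- HF Daily (curated): provides trending rank and upvote signals and is the dominant input to today's ranking_score; the single_source_dominant_hf gap above stems from this.
- Semantic Scholar (metadata): provides s2_tldr (27/27 hits), but similar_papers is missing throughout, so the citation graph cannot support the "Further reading" section.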
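No single source was fully offline, but the thin metadata (categories / affiliations / similar_papers) puts this issue closer to a degraded "single source + tldr enrichment" mode. Top picks keep the original ranking_score order from paper_fetch.py, with no secondary re-ranking.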