Paper Radar Daily | 2026-04-25

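Bottom line: today's candidates were driven by HF Daily plus S2 metadata. The Top picks concentrate heavily on LLM agents / embodied reasoning, plus one MoE systems paper. arXiv primary-category coverage is thin, and S2 returned no similar papers.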

Summary

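Of the 8 Top picks, 7 are anchored on LLM agents / reasoning (e.g. embodied, long-horizon, GUI, social, personalized memory), and 1 moves to the MoE systems side (temporally extended MoE).
HF trending and S2 metadata both came through, but the arXiv primary fetch left many papers without category information: most candidates have categories=[], which limits the depth of the Watchlist category-hits section.
Across the three sources, S2 similar_papers is empty for every candidate, so the Further reading section is downgraded to an explanatory note rather than concrete new entries.
The seen-pool holds 138 earlier candidates within its 14-day window, and no candidate in this batch is flagged seen_before=True, so recall freshness is good.
The affiliations field is empty for every entry in the candidate JSON (HF JSON omits it by default, and S2 returned no institutions this time), so new-author / tracked-lab discovery is downgraded.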
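The freshness check above amounts to a lookup against a 14-day seen-pool. A minimal sketch, assuming each candidate is a dict with a hypothetical `arxiv_id` key and that the seen-pool maps arXiv IDs to the date each was first recorded; only the `seen_before` flag and the 14-day window come from this report, and the IDs below are made up.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # seen-pool window used in this report


def mark_seen(candidates, seen_pool, today):
    """Set seen_before=True for candidates first recorded within the window.

    seen_pool: hypothetical mapping arxiv_id -> date the paper was first seen.
    """
    cutoff = today - timedelta(days=WINDOW_DAYS)
    for c in candidates:
        first_seen = seen_pool.get(c["arxiv_id"])
        # Unknown IDs, or IDs older than the window, count as fresh.
        c["seen_before"] = first_seen is not None and first_seen >= cutoff
    return candidates


# Illustrative run with made-up IDs.
pool = {"2604.00001": date(2026, 4, 20), "2604.00002": date(2026, 4, 1)}
batch = [{"arxiv_id": "2604.00001"}, {"arxiv_id": "2604.00002"}, {"arxiv_id": "2604.00003"}]
mark_seen(batch, pool, date(2026, 4, 25))
print([c["seen_before"] for c in batch])  # [True, False, False]
```

A batch with no `True` flags, as in today's run, means every candidate is either new or older than the window.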

📌 Top picks (cross-hits)

3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding (HF trending #2 · score 9.8) → Mitigates hallucination in 3D embodied agents with inference-time visual contrastive decoding.
- Why selected: hf_trending_rank:2 + watchlist_keyword:reasoning,agent,inference + nice_to_have:benchmark,embodied; it is a cross-hit on all three axes: embodied, reasoning, and inference-time intervention.
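For readers new to visual contrastive decoding: the generic idea (as in the original VCD formulation for 2D vision-language models, not necessarily 3D-VCD's exact variant) contrasts next-token logits conditioned on the clean visual input with logits conditioned on a distorted copy, boosting tokens that the real input actually supports. A minimal NumPy sketch of one decoding step, assuming both logit vectors are already computed; `alpha` (contrast strength) and `beta` (plausibility cutoff) are illustrative hyperparameters.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def contrastive_next_token_probs(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """Generic visual contrastive decoding for a single decoding step."""
    logits_clean = np.asarray(logits_clean, dtype=float)
    logits_distorted = np.asarray(logits_distorted, dtype=float)
    # Amplify evidence from the clean input and subtract what the distorted
    # input predicts anyway (e.g. pure language priors).
    contrast = (1.0 + alpha) * logits_clean - alpha * logits_distorted
    # Adaptive plausibility constraint: only keep tokens the clean-input model
    # itself considers reasonably likely, so the contrast cannot promote junk.
    p_clean = softmax(logits_clean)
    keep = p_clean >= beta * p_clean.max()
    contrast = np.where(keep, contrast, -np.inf)
    return softmax(contrast)


# Token 0 is predicted just as strongly without real visual evidence;
# token 1 depends on the clean input.
clean = [2.0, 1.9, -3.0]
distorted = [2.0, 0.0, -3.0]
probs = contrastive_next_token_probs(clean, distorted)
print(int(np.argmax(clean)), int(np.argmax(probs)))  # 0 1
```

Greedy decoding on the clean logits alone would pick token 0; the contrast shifts the choice to the visually grounded token 1.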
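Temporally Extended Mixture-of-Experts Models (HF trending #1 · score 6.9) → A temporally extended MoE that keeps a chosen group of experts active across multiple tokens.
- Why selected: hf_trending_rank:1 + watchlist_keyword:moe,inference; applies the RL options framework to MoE expert-switching decisions and targets a systems bottleneck directly: once experts spill out of memory, expert churn renders offloading ineffective.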
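To make the churn argument concrete: under offloading, every expert that enters the active set has to be fetched from slower memory. The hypothetical sketch below counts those fetches for standard per-token top-k routing versus a naive policy that only re-selects the top-k set every `hold` tokens. It only illustrates why temporally extended selection can cut churn; it is not the paper's options-based method, and random scores stand in for a real router.

```python
import numpy as np


def expert_loads(router_scores, k, hold=1):
    """Count expert loads (experts entering the active set) over a token sequence.

    router_scores: array of shape (num_tokens, num_experts).
    hold: re-select the top-k set only every `hold` tokens;
          hold=1 is standard per-token routing.
    """
    active = set()
    loads = 0
    for t, row in enumerate(router_scores):
        if t % hold == 0:
            chosen = set(np.argsort(row)[-k:].tolist())
            loads += len(chosen - active)  # newly needed experts must be fetched
            active = chosen
    return loads


rng = np.random.default_rng(0)
scores = rng.normal(size=(64, 16))  # stand-in router scores: 64 tokens, 16 experts
print("per-token:", expert_loads(scores, k=2, hold=1))
print("hold=8:  ", expert_loads(scores, k=2, hold=8))
```

Holding the set can never cost more loads than per-token routing on the same scores (every expert added at a re-selection point must also enter somewhere along the per-token path), but it trades away routing quality in between; learning when switching is worth it is the part the paper addresses.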
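PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents (HF trending #7 · IEEE Access · score 6.8) → A systematic comparison of knowledge-graph long-term memory for personalized LLM agents.
- Why selected: hf_trending_rank:7 + watchlist_keyword:reasoning,agent + nice_to_have:benchmark; a hybrid hyperedge graph design plus a system-level comparison of KG storage/retrieval schemes, with high value as an engineering reference.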
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks (HF trending #11 · 16 upvotes · score 6.4) → A decision agent and a learnable skill bank co-evolve to tackle long-horizon tasks.
- Why selected: hf_trending_rank:11 + watchlist_keyword:reasoning,agent + nice_to_have:benchmark; the COSPLAY framework distills unlabeled rollouts into a reusable skill bank, targeting long-horizon partial observability.
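LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics (HF trending #5 · 79 upvotes · score 6.0) → Feeds VLMs visualizations plus tables so they can perform difficulty-stratified time-series reasoning.
- Why selected: hf_trending_rank:5 + watchlist_keyword:reasoning + nice_to_have:benchmark,fine-tuning,evaluation; 79 HF upvotes signal clear community interest, and the paper formalizes time-series reasoning as a 4-level family of cognitive tasks, released with the HiTSR dataset.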
VLAA-GUI: Knowing When to Stop, Recover, and Search (HF trending #16 · 12 upvotes · score 5.9) → A modular GUI agent that knows when to stop, recover, and search.
- Why selected: hf_trending_rank:16 + watchlist_keyword:agent,vla + nice_to_have:benchmark; it directly addresses two major GUI-agent failure modes, declaring success prematurely and getting stuck in loops, and reports SOTA on both a Linux and a Windows benchmark.
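Trust but Verify: Introducing DAVinCI – A Framework for Dual Attribution and Verification in Claim Inference for Language Models (HF trending #12 · score 5.8) → A dual-attribution plus verification framework for auditing LLM factuality.
- Why selected: hf_trending_rank:12 + watchlist_keyword:reasoning,inference; the authors are from Adobe Research, and the modular DAVinCI implementation can be attached to an existing LLM pipeline to add attribution + verification.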
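SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution (score 5.0) → Uses Shapley values for reward attribution in social-dialogue RL.
- Why selected: watchlist_keyword:reasoning,agent + nice_to_have:benchmark,evaluation; it distributes an episode-level social-dialogue reward across individual utterances using cooperative-game Shapley values, and its 7B model is reportedly on par with GPT-4o / Claude 3.5.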
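The Shapley idea behind SAVOIR's reward attribution can be shown with exact Shapley values over a handful of utterances. A minimal sketch, assuming a hypothetical set function `value` that scores any subset of utterances (in practice an episode-level evaluator would play this role). Exact enumeration is exponential in the number of utterances, so real systems generally approximate it; this is a textbook illustration, not SAVOIR's implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley value of each of n players (utterances) under set function `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):  # size of the coalition that excludes i: 0 .. n-1
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                # Marginal contribution of utterance i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_episode_reward(utterances):
    # Hypothetical reward: utterance 0 is worth 3, utterance 1 is worth 1,
    # together they earn a synergy bonus of 2, and utterance 2 adds nothing.
    r = 3.0 * (0 in utterances) + 1.0 * (1 in utterances)
    if 0 in utterances and 1 in utterances:
        r += 2.0
    return r


phi = shapley_values(3, toy_episode_reward)
print([round(p, 6) for p in phi])  # [4.0, 2.0, 0.0]
```

The per-utterance credits sum to the full episode reward (6.0), and the synergy bonus is split evenly between the two utterances that jointly earn it. This efficiency property is what makes Shapley values a natural way to turn one episode-level reward into per-utterance signals.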

🏷 Watchlist category hits

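Many HF papers in the candidate JSON have categories=[] (HF metadata does not include the arXiv category, and this run's arXiv API query did not cross-match them either). Below are the most active entries that do have a primary category: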

cs.CV

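Vista4D: Video Reshooting with 4D Point Clouds (HF trending #4 · score 4.6) → Video reshooting via 4D point clouds. It drew strong attention on the HF channel but only weakly hit the watchlist (on inference), so it did not make the Top picks.
Grounding Video Reasoning in Physical Signals (score 3.0) → Anchors video reasoning in physical signals.
From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change Imagery (score 3.0) → A comparative evaluation of VLMs on discourse analysis of climate-change imagery.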

cs.RO

Task-Driven Co-Design of Heterogeneous Multi-Robot Systems (score 4.0) → Task-driven co-design of heterogeneous multi-robot systems.
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis (score 1.0) → Spatiotemporal-aware view synthesis that makes robot manipulation robust to viewpoint changes.

cs.CL

A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from News (score 2.5) → Multimodal text + graph event extraction from news.
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale (score 2.1) → Industrial-scale, real-time discovery of risk events from noisy customer incidents.
Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Choice and Moral Defensibility (score 2.0) → A machine-behavior study of LLMs facing relational moral dilemmas.

cs.AI

Inferring High-Level Events from Timestamped Data: Complexity and Medical Applications (score 2.5) → Infers high-level events from timestamped data, with medical application case studies.
From Research Question to Scientific Workflow: Leveraging Agentic AI for Science (score 2.0) → Agentic AI that automatically maps research questions to scientific workflows.
Bounding the Black Box: A Statistical Certification Framework for AI Risk Regulation (score 2.0) → A statistical certification framework to support AI risk regulation.

cs.LG

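Fine-Tuning Regimes Define Distinct Continual Learning Problems (score 1.5) → Different fine-tuning regimes define fundamentally different families of continual learning problems.
Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Inconsistency (score 1.0) → Shows that temporal taskification in streaming continual learning is a source of evaluation inconsistency.
Low-Rank Adaptation Redux for Large Models (score 0.5) → Revisits the LoRA setup for large models.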

🔗 Further reading (Semantic Scholar similar papers)

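No high-confidence new signal for this section today (S2 returned no similar papers): the similar_papers field is empty for every entry in the candidate JSON. Under the hard constraint in SKILL.md against running a separate fetch for this, the section is downgraded to a note and no external entries are added; see the "Coverage gaps and uncertainty" section for details.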

🧑‍🔬 Newly appearing authors / teams

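Today's discovery scan found no qualifying candidates. The affiliations field in the candidate JSON is empty throughout (HF JSON carries no institutions, and S2 returned no affiliation metadata this time), so the "institution / project affiliation" evidence required by discovery_rules.md cannot be verified. Following SKILL.md's no-padding principle, this section is left empty today and may be backfilled after tomorrow's re-fetch.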

📉 Coverage gaps and uncertainty

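s2_similar_unavailable: 0 entries in the candidate JSON carry similar_papers, so the Further reading section has no external data to anchor on. SKILL.md explicitly says not to run a separate Bash query against S2 for this, so that section stays an empty array.
s2_affiliations_missing: all 87 candidates have an empty affiliations array, which in turn means (a) tracked_labs_seen cannot be determined and (b) the new_authors section is downgraded.
arxiv_category_partial: a substantial share of HF-sourced candidates have categories=[] (the HF API omits the arXiv category, and cross-matching against arXiv found no match), so the Watchlist category-hits section can only list the subset in the intersection that has a category.
paper_fetch.err exists, but it only records an earlier failed run (an environment path issue; the rerun with the system python succeeded). All three sources returned data successfully.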
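The three gap flags above reduce to simple counts over the candidate JSON. A minimal sketch, assuming the candidate file is a JSON list of objects carrying the `similar_papers`, `affiliations`, and `categories` fields named in this report. The flag names mirror the ones above; the function name, the thresholds, and the inline sample are hypothetical.

```python
import json


def coverage_flags(candidates):
    """Count candidates whose enrichment fields came back empty or missing."""
    total = len(candidates)
    no_similar = sum(1 for c in candidates if not c.get("similar_papers"))
    no_affil = sum(1 for c in candidates if not c.get("affiliations"))
    no_cat = sum(1 for c in candidates if not c.get("categories"))
    return {
        "total": total,
        "empty_similar_papers": no_similar,
        "empty_affiliations": no_affil,
        "empty_categories": no_cat,
        # Section-level flags: S2 enrichment is "unavailable/missing" only if it
        # is empty for every candidate; categories are "partial" if any are empty.
        "s2_similar_unavailable": total > 0 and no_similar == total,
        "s2_affiliations_missing": total > 0 and no_affil == total,
        "arxiv_category_partial": no_cat > 0,
    }


sample = json.loads(
    '[{"similar_papers": [], "affiliations": [], "categories": ["cs.CV"]},'
    ' {"similar_papers": [], "affiliations": [], "categories": []}]'
)
print(coverage_flags(sample))
```

Computing the flags from the same JSON that feeds the report keeps the "Coverage gaps" section consistent with the sections it explains.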

Sources and cross-validation notes

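This issue used all three sources together: the arXiv API (primary; newly published preprints), HuggingFace Daily Papers (curated; community picks + trending signal), and Semantic Scholar (metadata; TLDR + citation counts). Every Top pick is a dual-source HF + S2 hit (source: hf+s2), and all 8 have an s2_tldr from S2 that serves as a one-sentence English summary. In this batch, arXiv primary mainly contributed the low-score long tail of cs.AI / cs.LG / cs.CV / cs.CL papers that forms the base of the Watchlist category section. Conclusions are anchored strictly to arXiv URLs; HF trending rank and hf_upvotes serve only as an extra "community attention" weight, not as evidence for a paper's results.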

Hanzhi's BLOG

[Papers · 2026-04-25]