Paper Radar Daily | 2026-04-18
One-line takeaway: Today's papers center on RL paradigm innovations (VGF optimal transport, SD-Zero self-distillation, PreRL reinforcement learning in pre-train space); continuous diffusion language modeling (LangFlow) matches discrete methods for the first time; and a visual-generation reward model (RationalRewards, 99 HF upvotes) plus a GUI-agent safety benchmark (AgentHazard) are drawing wide attention.
Summary
28 candidate papers passed three-source cross-screening. The 8 top picks center on new RL paradigms and methodological breakthroughs in language modeling; watchlist hits cover reasoning (2 papers), agent (4), and inference/VLM (2). RL is especially dense today: VGF recasts behavior-regularized RL as an optimal transport problem, SD-Zero turns binary rewards into dense supervision via self-revision, and RationalRewards has the reward model output structured rationales rather than a single score. LangFlow matches discrete methods in continuous diffusion language modeling for the first time, a milestone result for that direction. On agent safety, AgentHazard shows that commercial GUI agents are misled by third-party content 42% of the time.
📌 Top picks (cross-source hits)
1. Reinforcement Learning via Value Gradient Flow (VGF)
- Authors: Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang
- tldr_cn: Recasts behavior-regularized RL as an optimal transport problem, guiding particles with a value gradient flow
- tldr_en: This paper proposes Value Gradient Flow (VGF), a scalable new paradigm for behavior-regularized RL that eliminates explicit policy parameterization while remaining expressive and flexible; this enables adaptive test-time scaling by adjusting the transport budget.
- Selection rationale: hf_trending_rank:1 + nice_to_have:benchmark; SOTA on both offline RL and LLM RL
- Links: arXiv | HF
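As a rough intuition for the particle view, here is a toy sketch under my own assumptions (the 1-D value function, step size, and regularizer weight are invented for illustration, not taken from the paper): action particles move along the gradient of a value function, a quadratic pull toward the behavior policy acts as the regularizer, and the number of transport steps plays the role of the adjustable test-time budget.

```python
import random

def value_grad(x):
    # Gradient of a hypothetical 1-D value function Q(x) = -(x - 3)^2.
    return -2.0 * (x - 3.0)

def transport(particles, behavior_mean=0.0, steps=200, eta=0.05, alpha=0.1):
    """Move action particles along the value gradient, with a quadratic pull
    toward the behavior policy; `steps` stands in for the transport budget
    the abstract says can be adjusted at test time."""
    out = list(particles)
    for _ in range(steps):
        out = [x + eta * (value_grad(x) - alpha * (x - behavior_mean)) for x in out]
    return out

random.seed(0)
init = [random.gauss(0.0, 0.5) for _ in range(256)]  # samples near the behavior policy
final = transport(init)
mean = sum(final) / len(final)
```

With these toy settings the particles settle at the regularized optimum 6/2.1 ≈ 2.86 rather than the unregularized peak at 3.0, illustrating how the behavior term trades off against pure value maximization.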
2. Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
- Authors: Yinghui He, Simran Kaur, Adithya Bhaskar et al. (Danqi Chen, Sanjeev Arora)
- tldr_cn: Single-model self-revision turns binary rewards into dense token-level supervision
- tldr_en: Self-Distillation Zero (SD-Zero) is substantially more sample-efficient to train than RL, requires neither an external teacher nor high-quality demonstrations, and outperforms strong baselines including Rejection Fine-Tuning (RFT), GRPO, and Self-Distillation Fine-Tuning (SDFT).
- Selection rationale: watchlist_keyword:reasoning + nice_to_have:benchmark,fine-tuning; an alternative to GRPO
- Links: arXiv | HF
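A minimal sketch of the core idea of converting a 0/1 outcome into token-level targets; the token comparison and the weighting scheme below are my own illustrative assumptions, not the paper's algorithm:

```python
def dense_supervision(draft, revision, reward):
    """Turn a binary episode reward into per-token targets (toy sketch).

    draft, revision: lists of token strings; reward: 0 or 1.
    Returns (target_sequence, per_token_weight).
    """
    if reward == 1:
        # Correct draft: reinforce it directly with uniform weight.
        return draft, [1.0] * len(draft)
    # Incorrect draft: the model's own revision becomes the target, so every
    # token carries a learning signal instead of one scalar. Tokens the
    # revision changed get full weight; unchanged tokens get a smaller weight
    # (a hypothetical choice to emphasize the corrected spans).
    weights = []
    for i, tok in enumerate(revision):
        changed = i >= len(draft) or draft[i] != tok
        weights.append(1.0 if changed else 0.2)
    return revision, weights

draft = ["2", "+", "2", "=", "5"]
revision = ["2", "+", "2", "=", "4"]
targets, weights = dense_supervision(draft, revision, reward=0)
```

Here the failed draft yields a full target sequence with the corrected final token up-weighted, which is the sense in which a single binary reward becomes dense supervision.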
3. LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
- Authors: Yuxin Chen, Chumeng Liang, Hangke Sui et al.
- tldr_cn: The first continuous diffusion language model to match discrete diffusion methods
- tldr_en:LangFlow provides the first clear evidence that continuous diffusion is a promising paradigm for language modeling, by connecting embedding-space DLMs to Flow Matching via Bregman divergence and proposing an information-uniform principle for setting the noise schedule.
- Selection rationale: watchlist_keyword:scheduler + nice_to_have:benchmark,evaluation; a methodological milestone
- Links: arXiv | HF
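For background on the Flow Matching side of the connection, here is the textbook conditional flow-matching objective over a continuous embedding space; this is a generic sketch, and LangFlow's Bregman-divergence formulation and information-uniform noise schedule are not reproduced here:

```python
import random

random.seed(0)

def fm_loss(velocity_field, embeddings):
    """Generic conditional flow matching: linearly interpolate noise -> embedding
    and regress the ground-truth velocity x1 - x0 at a random time t."""
    total, count = 0.0, 0
    for x1 in embeddings:
        x0 = [random.gauss(0, 1) for _ in x1]       # noise endpoint
        t = random.random()                         # uniform time (stand-in for a schedule)
        xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
        target = [b - a for a, b in zip(x0, x1)]    # ground-truth velocity
        pred = velocity_field(xt, t)
        total += sum((p - g) ** 2 for p, g in zip(pred, target))
        count += len(x1)
    return total / count

# Stand-in "token embeddings" (unit Gaussian) and a trivial baseline field.
embs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
zero_field = lambda xt, t: [0.0] * len(xt)
loss = fm_loss(zero_field, embs)
```

A learned velocity field would be trained to drive this loss toward 0; the constant-zero baseline scores roughly E[(x1 - x0)^2] = 2 for unit-Gaussian endpoints, which gives a sanity scale for the objective.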
4. Geometric Context Transformer for Streaming 3D Reconstruction (LingBot-Map)
- Authors: Lin-Zhuo Chen, Jian Gao, Yihang Chen et al.
- tldr_cn: A feed-forward 3D foundation model for streaming reconstruction at 20 FPS
- tldr_en:LingBot-Map is introduced, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture, which achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
- Selection rationale: watchlist_keyword:inference + nice_to_have:benchmark,evaluation; real-time 3D reconstruction
- Links: arXiv | HF
5. RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
- Authors: Haozhe Wang, Cong Wei, Weiming Ren et al. (Wenhu Chen)
- tldr_cn: The reward model outputs structured rationales before scoring, improving generation quality at test time alone
- tldr_en: Preference-Anchored Rationalization (PARROT) is a principled framework that recovers high-quality rationales from readily available preference data through anchored generation, consistency filtering, and distillation; the resulting model, RationalRewards, achieves state-of-the-art preference prediction among open-source reward models.
- Selection rationale: watchlist_keyword:reasoning + nice_to_have:benchmark,fine-tuning; high attention at 99 HF upvotes
- Links: arXiv | HF
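To make the "structured rationale, then score" output format concrete, here is a sketch of consuming such a signal; the JSON schema, dimension names, and score value are hypothetical, not PARROT's actual format:

```python
import json

# Hypothetical reward-model output: per-dimension rationales plus a scalar score.
raw = json.dumps({
    "rationales": [
        {"dimension": "prompt_alignment", "verdict": "pass", "note": "subject matches prompt"},
        {"dimension": "visual_quality", "verdict": "fail", "note": "artifacts on hands"},
    ],
    "score": 0.62,
})

def parse_reward(payload):
    """Return the scalar score plus the failing dimensions, so test-time
    selection can both rank candidates and explain rejections."""
    obj = json.loads(payload)
    fails = [r["dimension"] for r in obj["rationales"] if r["verdict"] == "fail"]
    return obj["score"], fails

score, fails = parse_reward(raw)
```

Downstream best-of-N selection can rank candidates by `score` while the failing dimensions make each rejection interpretable, which is the practical appeal of rationales over a single scalar.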
6. Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
- 作者:Victoria Yue Chen, Emery Pierson, Leopold Maillard, Maks Ovsjanikov
- tldr_cn: Identifies the "latent sink trap" in text-guided 3D generation and bypasses it with an unconditional prior
- tldr_en: This work identifies a critical failure mode where generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. It leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity.
- Selection rationale: hf_trending_rank:2; a new finding on the controllability of 3D generation
- Links: arXiv | HF
7. Three-Phase Transformer (3PT)
- 作者:Mohammad R. Abu Ayyash
- tldr_cn: A structural prior on the residual stream inspired by three-phase AC power, speeding convergence at near-zero parameter overhead
- tldr_en: Contributions include self-stabilization of the geometry without explicit enforcement (a novel instance of the conservation-law framework for neural networks), a U-shaped depth profile of rotation-angle drift at 12 layers, and orthogonal composition with RoPE, attention, and FFN.
- Selection rationale: hf_trending_rank:3; a novel direction in Transformer architecture
- Links: arXiv | HF
8. Mobile GUI Agents under Real-world Threats (AgentHazard)
- Authors: Guohong Liu, Jialei Ye, Jiacheng Liu et al.
- tldr_cn: Commercial GUI agents are misled by malicious third-party content 42% of the time on average
- tldr_en: The authors argue that a key pre-deployment validation is missing: checking whether agents maintain their performance under real-world threats. They introduce a scalable app-content instrumentation framework that enables flexible, targeted content modifications within existing applications.
- Selection rationale: watchlist_keyword:agent + nice_to_have:benchmark + citation_velocity:0.122; a core problem in agent safety
- Links: arXiv | HF
🏷 Watchlist category hits
reasoning
What do Language Models Learn and When? The Implicit Curriculum Hypothesis
- Authors: Emmy Liu, Kaiser Sun et al. (Graham Neubig)
- tldr_cn: Pretraining follows a predictable, compositional implicit curriculum, with skills emerging in compositional order
- Selection rationale: watchlist_keyword:reasoning + nice_to_have:scaling law; a new lens on pretraining dynamics
- Link: arXiv
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space (PreRL)
- Authors: Yuqiao Tan, Minzheng Wang, Bo Liu et al.
- tldr_cn: Runs RL on the pretraining distribution P(y); reinforcing negative samples strongly elicits reflective behavior
- Selection rationale: watchlist_keyword:reasoning; 24 HF upvotes; probes the fundamental paradigm behind RLVR
- Link: arXiv
agent
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
- Authors: Kangsan Kim, Minki Kang et al. (Sung Ju Hwang)
- tldr_cn: Cross-domain memory transfer brings a 3.7% average gain; high-level abstract memories are the most effective
- Selection rationale: watchlist_keyword:agent + nice_to_have:benchmark; 27 HF upvotes
- Link: arXiv
Narrative-Driven Paper-to-Slide Generation via ArcDeck
- Authors: Tarik Can Ozden, Sachidanand VS et al. (James Matthew Rehg)
- tldr_cn: A multi-agent framework that rebuilds a paper's logical flow into presentation slides
- Selection rationale: watchlist_keyword:agent + nice_to_have:benchmark
- Link: arXiv
Self-Sovereign Agent
- 作者:Wenjie Qu, Xuandong Zhao, Jiaheng Zhang, Dawn Song
- tldr_cn: Examines the technical barriers and governance challenges of AI agents that can sustain their own operations
- Selection rationale: watchlist_keyword:agent; a frontier discussion of agent autonomy
- Link: arXiv
SkVM: Compiling Skills for Efficient Execution Everywhere
- 作者:Le Chen, Erhu Feng, Yubin Xia, Haibo Chen
- tldr_cn: A skill-compilation runtime for portable, efficient execution across LLMs and harnesses
- Selection rationale: watchlist_keyword:agent; agent infrastructure direction
- Link: arXiv
inference_vlm
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
- Authors: Haoyi Sun, Xiaoxiao Wang et al.
- tldr_cn: A cross-modal probabilistic distillation framework; a 0.5B model gains 3.6 points from a 3B teacher
- Selection rationale: hf_trending_rank:9 + nice_to_have:benchmark; a practical VLM distillation method
- Link: arXiv
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
- Authors: Boyuan Wang, Xiaofeng Wang et al.
- tldr_cn: Feed-forward reconstruction of geometry, appearance, and physical attributes from a single monocular video, with inference under 1 second
- Selection rationale: watchlist_keyword:inference; fast asset generation for robotics/graphics
- Link: arXiv
🔗 Further reading (Semantic Scholar similar papers)
No high-confidence incremental signal in this section today (S2 returned no similar papers).
🧑🔬 Newly surfaced authors / teams
Today's discovery scan found no qualifying candidates. None of the candidate papers carry an affiliations field, so no match against the institution seed list was possible.
📉 Coverage gaps and uncertainties
- s2_similar_unavailable: Semantic Scholar returned no similar papers, leaving Further reading empty
- affiliations_sparse: the affiliations field in the candidate JSON is an empty array throughout, blocking institution matching and new-author discovery
- citation_count_zero_normal: most candidates are recent preprints, so citation_count=0 is expected and does not affect ranking
Sources and cross-validation notes
- arXiv (primary): preprint originals; conclusions are anchored here
- HuggingFace Daily Papers (curated): community trend signal; hf_upvotes / hf_trending_rank serve as secondary ranking inputs
- Semantic Scholar (metadata): provides citation metadata and tldr; citation_count for fresh preprints is generally 0 and is not used as a down-ranking signal
Conflict priority: primary > metadata > curated > other.