Paper Radar Daily | 2026-04-18
One-line takeaway: Today's papers center on RL paradigm innovations (VGF optimal transport, SD-Zero self-distillation, PreRL reinforcement learning in pre-train space); continuous diffusion language modeling (LangFlow) matches discrete methods for the first time; and a visual-generation reward model (RationalRewards, 99 HF upvotes) plus a GUI-agent safety benchmark (AgentHazard) are drawing wide attention.
Summary
28 candidate papers passed three-source cross-screening. The 8 top picks center on new RL paradigms and methodological breakthroughs in language modeling; watchlist hits cover reasoning (2 papers), agent (4), and inference/VLM (2). RL is especially dense today: VGF recasts behavior-regularized RL as an optimal transport problem, SD-Zero turns binary rewards into dense supervision via self-revision, and RationalRewards has the reward model output structured rationales rather than a single score. LangFlow matches discrete methods in continuous diffusion language modeling for the first time, a milestone result for that direction. On agent safety, AgentHazard shows that commercial GUI agents are misled by third-party content 42% of the time.
📌 Top picks (cross-source hits)
1. Reinforcement Learning via Value Gradient Flow (VGF)
- Authors: Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang
- tldr_cn: Recasts behavior-regularized RL as an optimal transport problem, guiding particles with a value gradient flow
- tldr_en: This paper proposes Value Gradient Flow (VGF), a scalable new paradigm for behavior-regularized RL that eliminates explicit policy parameterization while remaining expressive and flexible; this enables adaptive test-time scaling by adjusting the transport budget.
- Selection rationale: hf_trending_rank:1 + nice_to_have:benchmark; SOTA on both offline RL and LLM RL
- Links: arXiv | HF
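As a rough intuition for the particle view, here is a toy sketch under my own assumptions (the 1-D value function, step size, and regularizer weight are invented for illustration, not taken from the paper): action particles move along the gradient of a value function, a quadratic pull toward the behavior policy acts as the regularizer, and the number of transport steps plays the role of the adjustable test-time budget.

```python
import random

def value_grad(x):
    # Gradient of a hypothetical 1-D value function Q(x) = -(x - 3)^2.
    return -2.0 * (x - 3.0)

def transport(particles, behavior_mean=0.0, steps=200, eta=0.05, alpha=0.1):
    """Move action particles along the value gradient, with a quadratic pull
    toward the behavior policy; `steps` stands in for the transport budget
    the abstract says can be adjusted at test time."""
    out = list(particles)
    for _ in range(steps):
        out = [x + eta * (value_grad(x) - alpha * (x - behavior_mean)) for x in out]
    return out

random.seed(0)
init = [random.gauss(0.0, 0.5) for _ in range(256)]  # samples near the behavior policy
final = transport(init)
mean = sum(final) / len(final)
```

With these toy settings the particles settle at the regularized optimum 6/2.1 ≈ 2.86 rather than the unregularized peak at 3.0, illustrating how the behavior term trades off against pure value maximization.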
2. Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
- Authors: Yinghui He, Simran Kaur, Adithya Bhaskar et al. (Danqi Chen, Sanjeev Arora)
- tldr_cn: Single-model self-revision turns binary rewards into dense token-level supervision
- tldr_en: Self-Distillation Zero (SD-Zero) is substantially more sample-efficient to train than RL, requires neither an external teacher nor high-quality demonstrations, and outperforms strong baselines including Rejection Fine-Tuning (RFT), GRPO, and Self-Distillation Fine-Tuning (SDFT).
- Selection rationale: watchlist_keyword:reasoning + nice_to_have:benchmark,fine-tuning; an alternative to GRPO
- Links: arXiv | HF
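A minimal sketch of the core idea of converting a 0/1 outcome into token-level targets; the token comparison and the weighting scheme below are my own illustrative assumptions, not the paper's algorithm:

```python
def dense_supervision(draft, revision, reward):
    """Turn a binary episode reward into per-token targets (toy sketch).

    draft, revision: lists of token strings; reward: 0 or 1.
    Returns (target_sequence, per_token_weight).
    """
    if reward == 1:
        # Correct draft: reinforce it directly with uniform weight.
        return draft, [1.0] * len(draft)
    # Incorrect draft: the model's own revision becomes the target, so every
    # token carries a learning signal instead of one scalar. Tokens the
    # revision changed get full weight; unchanged tokens get a smaller weight
    # (a hypothetical choice to emphasize the corrected spans).
    weights = []
    for i, tok in enumerate(revision):
        changed = i >= len(draft) or draft[i] != tok
        weights.append(1.0 if changed else 0.2)
    return revision, weights

draft = ["2", "+", "2", "=", "5"]
revision = ["2", "+", "2", "=", "4"]
targets, weights = dense_supervision(draft, revision, reward=0)
```

Here the failed draft yields a full target sequence with the corrected final token up-weighted, which is the sense in which a single binary reward becomes dense supervision.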
3. LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
- Authors: Yuxin Chen, Chumeng Liang, Hangke Sui et al.
- tldr_cn: The first continuous diffusion language model to match discrete diffusion methods
- tldr_en:LangFlow provides the first clear evidence that continuous diffusion is a promising paradigm for language modeling, by connecting embedding-space DLMs to Flow Matching via Bregman divergence and proposing an information-uniform principle for setting the noise schedule.
- Selection rationale: watchlist_keyword:scheduler + nice_to_have:benchmark,evaluation; a methodological milestone
- Links: arXiv | HF
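For background on the Flow Matching side of the connection, here is the textbook conditional flow-matching objective over a continuous embedding space; this is a generic sketch, and LangFlow's Bregman-divergence formulation and information-uniform noise schedule are not reproduced here:

```python
import random

random.seed(0)

def fm_loss(velocity_field, embeddings):
    """Generic conditional flow matching: linearly interpolate noise -> embedding
    and regress the ground-truth velocity x1 - x0 at a random time t."""
    total, count = 0.0, 0
    for x1 in embeddings:
        x0 = [random.gauss(0, 1) for _ in x1]       # noise endpoint
        t = random.random()                         # uniform time (stand-in for a schedule)
        xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
        target = [b - a for a, b in zip(x0, x1)]    # ground-truth velocity
        pred = velocity_field(xt, t)
        total += sum((p - g) ** 2 for p, g in zip(pred, target))
        count += len(x1)
    return total / count

# Stand-in "token embeddings" (unit Gaussian) and a trivial baseline field.
embs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
zero_field = lambda xt, t: [0.0] * len(xt)
loss = fm_loss(zero_field, embs)
```

A learned velocity field would be trained to drive this loss toward 0; the constant-zero baseline scores roughly E[(x1 - x0)^2] = 2 for unit-Gaussian endpoints, which gives a sanity scale for the objective.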
4. Geometric Context Transformer for Streaming 3D Reconstruction (LingBot-Map)
- Authors: Lin-Zhuo Chen, Jian Gao, Yihang Chen et al.
- tldr_cn: A feed-forward 3D foundation model for streaming reconstruction at 20 FPS
- tldr_en:LingBot-Map is introduced, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture, which achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
- Selection rationale: watchlist_keyword:inference + nice_to_have:benchmark,evaluation; real-time 3D reconstruction
- Links: arXiv | HF
5. RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
- Authors: Haozhe Wang, Cong Wei, Weiming Ren et al. (Wenhu Chen)
- tldr_cn: The reward model outputs structured rationales before scoring, improving generation quality at test time alone
- tldr_en: Preference-Anchored Rationalization (PARROT) is a principled framework that recovers high-quality rationales from readily available preference data through anchored generation, consistency filtering, and distillation; the resulting model, RationalRewards, achieves state-of-the-art preference prediction among open-source reward models.
- Selection rationale: watchlist_keyword:reasoning + nice_to_have:benchmark,fine-tuning; high attention at 99 HF upvotes
- Links: arXiv | HF
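To make the "structured rationale, then score" output format concrete, here is a sketch of consuming such a signal; the JSON schema, dimension names, and score value are hypothetical, not PARROT's actual format:

```python
import json

# Hypothetical reward-model output: per-dimension rationales plus a scalar score.
raw = json.dumps({
    "rationales": [
        {"dimension": "prompt_alignment", "verdict": "pass", "note": "subject matches prompt"},
        {"dimension": "visual_quality", "verdict": "fail", "note": "artifacts on hands"},
    ],
    "score": 0.62,
})

def parse_reward(payload):
    """Return the scalar score plus the failing dimensions, so test-time
    selection can both rank candidates and explain rejections."""
    obj = json.loads(payload)
    fails = [r["dimension"] for r in obj["rationales"] if r["verdict"] == "fail"]
    return obj["score"], fails

score, fails = parse_reward(raw)
```

Downstream best-of-N selection can rank candidates by `score` while the failing dimensions make each rejection interpretable, which is the practical appeal of rationales over a single scalar.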
6. Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
- 作者:Victoria Yue Chen, Emery Pierson, Leopold Maillard, Maks Ovsjanikov
- tldr_cn: Identifies the "latent sink trap" in text-guided 3D generation and bypasses it with an unconditional prior
- tldr_en: This work identifies a critical failure mode where generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. It leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity.
- Selection rationale: hf_trending_rank:2; a new finding on the controllability of 3D generation
- Links: arXiv | HF
7. Three-Phase Transformer (3PT)
- 作者:Mohammad R. Abu Ayyash
- tldr_cn: A structural prior on the residual stream inspired by three-phase AC power, speeding convergence at near-zero parameter overhead
- tldr_en: Contributions include self-stabilization of the geometry without explicit enforcement (a novel instance of the conservation-law framework for neural networks), a U-shaped depth profile of rotation-angle drift at 12 layers, and orthogonal composition with RoPE, attention, and FFN.
- Selection rationale: hf_trending_rank:3; a novel direction in Transformer architecture
- Links: arXiv | HF
8. Mobile GUI Agents under Real-world Threats (AgentHazard)
- Authors: Guohong Liu, Jialei Ye, Jiacheng Liu et al.
- tldr_cn: Commercial GUI agents are misled by malicious third-party content 42% of the time on average
- tldr_en: The authors argue that a key pre-deployment validation is missing: checking whether agents maintain their performance under real-world threats. They introduce a scalable app-content instrumentation framework that enables flexible, targeted content modifications within existing applications.
- Selection rationale: watchlist_keyword:agent + nice_to_have:benchmark + citation_velocity:0.122; a core problem in agent safety
- Links: arXiv | HF
🏷 Watchlist category hits
reasoning
What do Language Models Learn and When? The Implicit Curriculum Hypothesis
- Authors: Emmy Liu, Kaiser Sun et al. (Graham Neubig)
- tldr_cn: Pretraining follows a predictable, compositional implicit curriculum, with skills emerging in compositional order
- Selection rationale: watchlist_keyword:reasoning + nice_to_have:scaling law; a new lens on pretraining dynamics
- Link: arXiv
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space (PreRL)
- Authors: Yuqiao Tan, Minzheng Wang, Bo Liu et al.
- tldr_cn: Runs RL on the pretraining distribution P(y); reinforcing negative samples strongly elicits reflective behavior
- Selection rationale: watchlist_keyword:reasoning; 24 HF upvotes; probes the fundamental paradigm behind RLVR
- Link: arXiv
agent
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
- Authors: Kangsan Kim, Minki Kang et al. (Sung Ju Hwang)
- tldr_cn: Cross-domain memory transfer brings a 3.7% average gain; high-level abstract memories are the most effective
- Selection rationale: watchlist_keyword:agent + nice_to_have:benchmark; 27 HF upvotes
- Link: arXiv
Narrative-Driven Paper-to-Slide Generation via ArcDeck
- Authors: Tarik Can Ozden, Sachidanand VS et al. (James Matthew Rehg)
- tldr_cn: A multi-agent framework that rebuilds a paper's logical flow into presentation slides
- Selection rationale: watchlist_keyword:agent + nice_to_have:benchmark
- Link: arXiv
Self-Sovereign Agent
- 作者:Wenjie Qu, Xuandong Zhao, Jiaheng Zhang, Dawn Song
- tldr_cn: Examines the technical barriers and governance challenges of AI agents that can sustain their own operations
- Selection rationale: watchlist_keyword:agent; a frontier discussion of agent autonomy
- Link: arXiv
SkVM: Compiling Skills for Efficient Execution Everywhere
- 作者:Le Chen, Erhu Feng, Yubin Xia, Haibo Chen
- tldr_cn: A skill-compilation runtime for portable, efficient execution across LLMs and harnesses
- Selection rationale: watchlist_keyword:agent; agent infrastructure direction
- Link: arXiv
inference_vlm
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
- Authors: Haoyi Sun, Xiaoxiao Wang et al.
- tldr_cn: A cross-modal probabilistic distillation framework; a 0.5B model gains 3.6 points from a 3B teacher
- Selection rationale: hf_trending_rank:9 + nice_to_have:benchmark; a practical VLM distillation method
- Link: arXiv
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
- Authors: Boyuan Wang, Xiaofeng Wang et al.
- tldr_cn: Feed-forward reconstruction of geometry, appearance, and physical attributes from a single monocular video, with inference under 1 second
- Selection rationale: watchlist_keyword:inference; fast asset generation for robotics/graphics
- Link: arXiv
🔗 Further reading (Semantic Scholar similar papers)
No high-confidence incremental signal in this section today (S2 returned no similar papers).
🧑🔬 Newly surfaced authors / teams
Today's discovery scan found no qualifying candidates. None of the candidate papers carry an affiliations field, so no match against the institution seed list was possible.
📉 Coverage gaps and uncertainties
- s2_similar_unavailable: Semantic Scholar returned no similar papers, leaving Further reading empty
- affiliations_sparse: the affiliations field in the candidate JSON is an empty array throughout, blocking institution matching and new-author discovery
- citation_count_zero_normal: most candidates are recent preprints, so citation_count=0 is expected and does not affect ranking
Sources and cross-validation notes
- arXiv (primary): preprint originals; conclusions are anchored here
- HuggingFace Daily Papers (curated): community trend signal; hf_upvotes / hf_trending_rank serve as secondary ranking inputs
- Semantic Scholar (metadata): provides citation metadata and tldr; citation_count for fresh preprints is generally 0 and is not used as a down-ranking signal
Conflict priority: primary > metadata > curated > other.