Paper Radar Daily | 2026-04-21

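One-line takeaway: Today brings a cluster of papers on agent behavior and alignment methods: LLM agents' lack of environmental curiosity is systematically documented; DPO/RLHF variants are applied across vision, negotiation, code, and other settings; and VoxMind, an end-to-end spoken agent framework, addresses both tool calling and latency optimization.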

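Summary

Agent reliability: Experiments find that LLM agents ignore even complete solutions they have discovered, showing a severe lack of environmental curiosity (2604.17609)
Spoken agents: VoxMind proposes a multi-agent dynamic tool-management architecture that adds agentic capabilities to end-to-end spoken dialogue models (2604.15710)
Preference optimization: S2H-DPO introduces progressive, hardness-aware DPO to improve multi-image reasoning; StepPO aligns multi-step agent decision-making; PRISMA brings emotion awareness into negotiation dialogues (2604.18512 / 2604.18401 / 2604.18354)
Autonomous-driving VLA: OneVL performs one-step latent reasoning with vision-language explanations to address CoT latency; 65 upvotes on HF (2604.18486)
Inference systems: HybridGen uses CPU-GPU hybrid computing to optimize long-context KV-cache inference; River-LLM achieves seamless early exit based on KV sharing (2604.18529 / 2604.18396)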

📌 Top picks (cross-source hits)

1. Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity (HF trending #9 + agent/reasoning/test-time compute hits) → LLM agents discover task solutions but fail to use them; environmental curiosity is severely lacking

tldr_cn: LLM agents fail to use information they discover themselves
reason: HF trending top 10 + three keyword hits (agent/reasoning/test-time compute); exposes a fundamental capability gap in agents
Authors: Leon Engländer, Sophia Althammer, Ahmet Üstün, Matthias Gallé, Tom Sherborne

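2. VoxMind: An End-to-End Agentic Spoken Dialogue System (HF trending #5 + agent/reasoning/inference hits) → End-to-end spoken dialogue framework that gives agents tool-calling capability

tldr_cn: Spoken dialogue models gain multi-agent tool-management capability
reason: HF trending top 5 + three keyword hits; fills the gap in end-to-end spoken agents
Authors: Tianle Liang, Yifu Chen, Shengpeng Ji, et al.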

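3. S2H-DPO: Hardness-Aware Preference Optimization for Vision-Language Models (DPO/preference optimization + reasoning hits) → Progressive hardness-aware DPO improves multi-image reasoning in VLMs

tldr_cn: Hardness-aware DPO significantly improves cross-image reasoning in VLMs
reason: dual keyword hits on DPO + preference optimization; addresses a key bottleneck in multi-image reasoning
Authors: Nitish Shukla, Surgan Jandial, Arun Ross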

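4. OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation (reasoning/agent/inference + embodied hits; 65 upvotes on HF) → One-step latent reasoning replaces CoT to address VLA latency

tldr_cn: Visual latent reasoning replaces CoT for real-time autonomous driving
reason: popular paper with 65 upvotes on HF + hits on reasoning/inference/embodied; proposes a new non-autoregressive VLA paradigm
Authors: Jinghui Lu, Jiayi Guan, Zhijian Huang, et al. (50-person team)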

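5. Training and Agentic Inference Strategies for LLM-based Manim Animation Generation (reasoning/agent/inference + SFT hits) → Systematic study of how LLM training and inference strategies interact in animation generation

tldr_cn: SFT + RLHF + agentic inference combined for animation generation
reason: multiple hits on agent/inference/SFT; systematically studies how agentic inference interacts with SFT/RLHF
Authors: Ravidu Suien Rammuni Silva, Ahmad Lotfi, Isibor Kennedy Ihianle, et al.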

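6. MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation (reasoning/agent/inference hits) → RAG method in which role-specialized agents handle noisy evidence

tldr_cn: Multi-agent division of labor and synthesis improves RAG robustness to noise
reason: three hits on agent/reasoning/inference; uses role specialization to tackle noisy evidence in RAG
Authors: Xingchen Xiao, Heyan Huang, Runheng Liu, Jincheng Xie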

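7. StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning (reasoning/agent/RLHF hits) → Step-level aligned policy optimization improves multi-step agent decision-making

tldr_cn: Step-level policy optimization strengthens multi-step agent reasoning
reason: core hits on agent/RLHF; proposes a new step-level alignment paradigm for agentic RL
Authors: Daoyu Wang, Qingchuan Li, Mingyue Cheng, et al.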

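8. PRISMA: Preference-Reinforced Self-Training Approach for Interpretable Emotionally Intelligent Negotiation Dialogues (reasoning/agent/DPO hits) → Preference-reinforced self-training enables interpretable, emotion-aware negotiation

tldr_cn: Preference-reinforced training enables interpretable emotional negotiation
reason: agent/DPO hits; applies preference optimization to emotion-aware negotiation dialogues
Authors: Prajwal Vijay Kajare, Priyanshu Priya, Bikash Santra, Asif Ekbal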

🏷 Watchlist category hits

cs.DC

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing (inference/kv cache/scheduler) → CPU-GPU hybrid computing optimizes the long-context KV cache, addressing memory bottlenecks on the order of hundreds of GB

cs.CL

River-LLM: Large Language Model Seamless Exit Based on KV Share (reasoning/inference/kv cache) → Seamless early-exit mechanism based on KV sharing that reduces inference latency
Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints (DPO/preference optimization + SFT) → Rule-constrained DPO generates musically plausible melodies

cs.CV

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments (reasoning/VLA + embodied) → Foundation model for large-scale embodied environments with enhanced geometric and physical cues
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions (reasoning/agent) → Visual agent that precisely times when evidence appears in a video stream
MultiWorld: Scalable Multi-Agent Multi-View Video World Models (agent/world model) → Scalable multi-agent, multi-view video world model

cs.AI

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering (reasoning/inference) → Inference-time error correction via residual-stream monitoring and KV-cache steering

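🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal in this section today (S2 similar papers were not returned). None of the candidate papers carried a similar_papers field.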

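🧑‍🔬 New authors / teams

Today's discovery scan found no qualifying candidates. All Top picks authors appear here for the first time, and their institutional affiliations cannot be cross-verified (candidates lack the affiliations field), so none meets the discovery rule's conditions of repeated appearance or a new face at a tracked institution.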

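📉 Coverage gaps and uncertainty

s2_similar_unavailable: Semantic Scholar similar-paper data was not returned, so the further reading section is empty
Most candidates lack the affiliations field, which limits institution hits and new-author discovery
Several Top picks papers are not yet indexed by S2 (s2_paper_id is empty), so citation_count / citation_velocity are unavailable
Category information (categories) is empty for some candidates (entries from HF Daily do not carry arXiv categories)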

Sources and cross-validation notes

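This issue relies on three sources: arXiv, HuggingFace Daily Papers, and Semantic Scholar. arXiv provides preprint metadata (primary), HF Daily provides community popularity ranking (curated), and S2 provides citation metadata (metadata). All three sources responded normally with no single-source degradation, apart from the S2 similar-paper data listed as unavailable under coverage gaps. Ranking follows ranking_score (HF trending weight 3.0 + keyword hit 2.0 + tracked author 2.5 + tracked institution 1.5 + top venue 1.5 + citation velocity 1.0). S2 indexing lag for new preprints is normal, so citation_count == 0 is not used as grounds for down-weighting.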
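
To make the ranking rule concrete, here is a minimal sketch of ranking_score as a weighted sum. Only the six weights come from the text above. The Candidate class, its field names, and the assumption that each signal is a value in [0, 1] are hypothetical, and the real pipeline may compute the signals differently.

```python
from dataclasses import dataclass

# Weights as stated in the text; all names below are hypothetical.
WEIGHTS = {
    "hf_trending": 3.0,
    "keyword_hit": 2.0,
    "tracked_author": 2.5,
    "tracked_institution": 1.5,
    "top_venue": 1.5,
    "citation_velocity": 1.0,
}


@dataclass
class Candidate:
    """One candidate paper; each signal is assumed to lie in [0, 1]."""
    title: str
    hf_trending: float = 0.0
    keyword_hit: float = 0.0
    tracked_author: float = 0.0
    tracked_institution: float = 0.0
    top_venue: float = 0.0
    citation_velocity: float = 0.0  # stays 0.0 while S2 has not indexed the paper


def ranking_score(c: Candidate) -> float:
    """Weighted sum of signals; a missing signal adds 0 and never subtracts."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates by ranking_score, highest first."""
    return sorted(candidates, key=ranking_score, reverse=True)


# Illustrative inputs only, not real pipeline data.
trending = Candidate("trending paper", hf_trending=1.0, keyword_hit=1.0)
keyword_only = Candidate("keyword-only paper", keyword_hit=1.0)
ordered = rank([keyword_only, trending])
```

Because the score is purely additive, a new preprint with no citation data contributes nothing on the citation velocity term but is never penalized, which matches the note that citation_count == 0 is not a down-weighting criterion.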

Hanzhi's BLOG

[Papers · 2026-04-21]