Paper Radar Daily | 2026-04-20

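One-line takeaway: today's two headline items are a diagnosis of output diversity collapse in post-training (HF #2, score 9.3) and the Qwen3.5-Omni technical report on a tens-of-billions-parameter omni-modal model. Around them, a fine-grained RL exploration-exploitation trade-off method (DiPO), an LLM physics research benchmark (PRL-Bench), and agent tool-use benchmarks (GTA-2, QuantCode-Bench) make this a signal-dense day.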

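Summary

Output diversity collapse from post-training has been systematically traced to training data composition rather than generation format, with direct implications for inference-time scaling
Qwen3.5-Omni scales to tens of billions of parameters with a 256k context window and is the first to demonstrate Audio-Visual Vibe Coding (writing code from audio-visual instructions)
DiPO uses a disentangling strategy in perplexity space to achieve a fine-grained RL exploration/exploitation trade-off, with gains on both math reasoning and function calling
PRL-Bench builds an end-to-end physics research benchmark from 100 PRL papers; the best model still scores below 50
Several agent benchmarks and systems (GTA-2, QuantCode-Bench, AccelOpt) continue to quantify the boundary of agent capability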

📌 Top picks (cross-source hits)

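1. Where does output diversity collapse in post-training? (HF #2 / 8 upvotes / matched: reasoning+inference+dpo) → Post-training collapses output diversity, and the root cause is the training data rather than the generation format

tldr_cn: Post-training diversity collapse is determined by data composition and cannot be repaired at inference time
reason: hf_trending_rank:2 + watchlist_keyword:reasoning,inference,dpo; direct implications for inference-time scaling methods
Authors: Constantinos Karouzos, Xingwei Tan, Nikolaos Aletras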

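2. Qwen3.5-Omni Technical Report (HF #22 / 19 upvotes / matched: reasoning+moe+inference) → Tens-of-billions-parameter omni-modal model, the first to offer Audio-Visual Vibe Coding

tldr_cn: Tens-of-billions-parameter MoE omni-modal model, first to code from audio-visual instructions
reason: hf_trending_rank:22 + watchlist_keyword:reasoning,moe,inference; a major release in the Qwen series
Authors: Qwen Team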

3. PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research (HF #21 / matched: reasoning+agent) → End-to-end physics research benchmark; the best model scores below 50

tldr_cn: Benchmark of LLM physics research ability; the best model scores under 50
reason: hf_trending_rank:21 + watchlist_keyword:reasoning,agent; quantifies the capability boundary of AI for Science
Authors: Tingjia Miao, Wenkai Jin, Muhua Zhang, et al.

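4. Hierarchical Codec Diffusion for Video-to-Speech Generation (HF #3 / matched: quantization / citation_velocity:1.0) → Hierarchical modeling of discrete speech tokens aligns generated speech with video

tldr_cn: Hierarchical codec diffusion for high-fidelity video-to-speech generation
reason: hf_trending_rank:3 + watchlist_keyword:quantization + citation_velocity:1.0; a new direction in discrete speech modeling
Authors: Jiaxin Ye, Gaoxiang Cong, Chenhui Wang, et al.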

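5. PersonaVLM: Long-Term Personalized Multimodal LLMs (HF #24 / 28 upvotes / matched: reasoning+agent) → Multimodal agent framework for long-term personalization

tldr_cn: Three-stage multimodal personalization framework: memory, reasoning, alignment
reason: hf_trending_rank:24 + watchlist_keyword:reasoning,agent + 28 HF upvotes; a paradigm for long-term personalization
Authors: Chang Nie, Chaoyou Fu, Yifan Zhang, et al.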

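6. QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies (HF #5 / matched: agent) → Benchmark for LLM generation of quantitative trading strategies

tldr_cn: Benchmark for LLM trading-strategy generation; the bottleneck is financial logic
reason: hf_trending_rank:5 + watchlist_keyword:agent; a new category of domain-specific code generation
Authors: Alexey Khoroshilov, Alexey Chernysh, et al.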

7. AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization (HF #15 / matched: agent / citation_velocity:0.138) → LLM agents autonomously optimize AI accelerator kernels

tldr_cn: LLM agents autonomously optimize accelerator kernels at 26x lower cost
reason: hf_trending_rank:15 + watchlist_keyword:agent + citation_velocity:0.138; open-source agents applied to hardware optimization
Authors: Genghan Zhang, Shaowei Zhu, Anjiang Wei, et al.

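8. DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off (HF #4 / matched: reasoning) → Disentangling in perplexity space gives a fine-grained RL exploration/exploitation trade-off

tldr_cn: Perplexity disentangling for a fine-grained RL exploration-exploitation trade-off
reason: hf_trending_rank:4 + watchlist_keyword:reasoning; improves RLVR training stability
Authors: Xiaofan Li, Ming Yang, Zhiyuan Ma, et al.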

🏷 Watchlist category hits

reasoning

Can Large Language Models Reinvent Foundational Algorithms? (HF #7 / 5 upvotes) → LLMs reinvent algorithms after unlearning them; Qwen3-4B reaches a 50% success rate
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning (HF #17 / 5 upvotes) → Adaptive selection of the visual reasoning format cuts token usage by 50-90%
Maximal Brain Damage Without Data or Optimization (HF #13 / 31 upvotes) → Flipping just 2 sign bits is enough to cripple ResNet-50 and Qwen3-30B
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning (HF #19 / 18 upvotes) → A STOP token prunes parallel reasoning paths early; GPT-OSS-20B improves from 84% to 90% on AIME25

inference

Elucidating the SNR-t Bias of Diffusion Probabilistic Models (HF #6 / 63 upvotes) → Corrects the SNR-timestep bias in diffusion models; all 8 models tested improve
(1D) Ordered Tokens Enable Efficient Test-Time Search (HF #8 / 11 upvotes) → A coarse-to-fine token structure improves the scalability of test-time search
EdgeDetect: Importance-Aware Gradient Compression with Homomorphic Aggregation for Federated Intrusion Detection (HF #9) → 32x gradient compression for federated IDS; communication volume drops 96.9%

agent

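GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows (HF #18 / 3 upvotes) → Tiered benchmark for general tool agents; the best model reaches only 14.39% on workflows
Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective RAG Systems (HF #14 / 22 upvotes) → Cuts web document chunking cost by an order of magnitude and, per the paper, eliminates hallucination risk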

🔗 Further reading (Semantic Scholar similar papers)

No high-confidence incremental signal in this section today (S2 similar papers were not returned).

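🧑‍🔬 Newly seen authors / teams

Today's discovery scan found no qualifying candidates. Every candidate author appeared only once, so none met either discovery-rule condition: repeated appearance, or a new face from a tracked lab.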
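The discovery rule above can be sketched roughly as follows. This is a minimal illustration, not the radar's actual code: the function name `discover_authors`, the paper and author dictionary shape, the `known_authors` set, and the threshold of two appearances are all assumptions made for the example.

```python
from collections import Counter

def discover_authors(papers, tracked_labs, known_authors, min_appearances=2):
    """Return authors who satisfy either (assumed) discovery condition.

    papers: list of {"authors": [{"name": str, "affiliation": str | None}]}
    tracked_labs: set of lab names being tracked (hypothetical input)
    known_authors: set of author names already on file (hypothetical input)
    """
    # Count how many times each author name appears across today's candidates.
    counts = Counter(a["name"] for p in papers for a in p["authors"])
    found = set()
    for paper in papers:
        for author in paper["authors"]:
            name = author["name"]
            # Condition 1: the author shows up repeatedly.
            repeated = counts[name] >= min_appearances
            # Condition 2: a not-yet-known author from a tracked lab.
            # An empty affiliation can never match, so this is always False
            # when the source provides no affiliation data.
            new_face = (
                author.get("affiliation") in tracked_labs
                and name not in known_authors
            )
            if repeated or new_face:
                found.add(name)
    return sorted(found)
```

Under these assumptions, a day where every author appears once and every affiliation is empty yields an empty list, which matches the outcome reported above.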

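📉 Coverage gaps and uncertainty

s2_similar_unavailable: S2 similar papers were not returned, so the further reading section is empty
affiliations_sparse: the affiliations field is empty for every entry in the candidate JSON, so institution matching is not possible (the HF source carries no affiliation data)
citation_count_zero_normal: S2 has not yet indexed citations for new preprints; citation_count=0 is the normal state and does not affect ranking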

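Sources and cross-validation notes

All three sources (arXiv + HuggingFace Daily Papers + Semantic Scholar) were fetched successfully with no degradation, apart from the S2 similar-papers gap listed under coverage gaps. Ranking relies mainly on HF trending rank (x3.0) and watchlist keyword hits (x2.0). Conclusions are anchored in the arXiv preprint text (primary source); HF trending serves as a supporting signal, and S2 supplies citation metadata and tldr. citation_count is 0 for most new preprints, which does not affect ranking. tracked_labs_seen: Qwen Team (oss-ai-labs/qwen) appears at Top #2.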
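As a rough illustration of the weighting described above, the sketch below combines the two main signals with the stated multipliers (x3.0 for HF trending rank, x2.0 for watchlist keyword hits). Only the two weights come from this digest. The `Candidate` class, the `rank_signal` normalization, the `max_rank` cutoff of 30, and counting each keyword hit as one unit are assumptions, so the numbers it produces are not expected to reproduce the scores quoted in the digest.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HF_RANK_WEIGHT = 3.0   # stated in the digest
KEYWORD_WEIGHT = 2.0   # stated in the digest

@dataclass
class Candidate:
    title: str
    hf_rank: Optional[int] = None          # None if not on HF trending
    keyword_hits: List[str] = field(default_factory=list)

def rank_signal(hf_rank: Optional[int], max_rank: int = 30) -> float:
    """Assumed normalization: rank 1 maps to 1.0, decaying linearly to 0."""
    if hf_rank is None or hf_rank < 1 or hf_rank > max_rank:
        return 0.0
    return (max_rank - hf_rank + 1) / max_rank

def score(c: Candidate) -> float:
    """Weighted sum of the two main signals (assumed to be additive)."""
    return (HF_RANK_WEIGHT * rank_signal(c.hf_rank)
            + KEYWORD_WEIGHT * len(c.keyword_hits))

# Two entries taken from today's list, ranked by the sketch score.
candidates = [
    Candidate("DiPO", hf_rank=4, keyword_hits=["reasoning"]),
    Candidate("Where does output diversity collapse in post-training?",
              hf_rank=2, keyword_hits=["reasoning", "inference", "dpo"]),
]
ranked = sorted(candidates, key=score, reverse=True)
for c in ranked:
    print(f"{score(c):.2f}  {c.title}")
```

A real pipeline would presumably also fold in citation velocity and upvotes, which appear in the reason fields but whose weights are not given here.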

Hanzhi's BLOG
