[Papers · 2026-05-03]

Paper Radar Daily | 2026-05-03

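One-line takeaway: today's 20 candidates are dominated by HF Daily, with no single high-scoring breakout paper (top score 3.5). The main threads fall into three groups: Indic TTS (Praxy Voice / PSP), agent-native research infrastructure (ARA / Last Harness / Synthetic Computers), and training systems for consumer GPUs (RoundPipe). S2 similar_papers was not returned, so the further-reading section is empty today.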

Summary

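The three sources (arXiv + HF Daily + Semantic Scholar) jointly recalled 20 raw candidates, all carrying an S2 paper_id and an HF trending rank; the seen-pool matched 0 of them (seen_before=false for every entry in the candidate JSON). ranking_score ranges from 0.0 to 3.5, and Top picks are cut off at ≥2.5, giving 8 papers: 5 driven by HF trending (rank ≤30) and 3 by watchlist keywords (agent / inference / reasoning). No Top pick matches a seed institution from frontier-labs / oss-ai-labs / robotics-labs / systems-labs (the affiliations field is empty for every candidate), and none is a conference paper with a non-empty venue field (the only candidate with a venue is the EACL 2026 user simulation survey, whose ranking_score is just 0.5). This makes it a "medium-density but long-tail" day. In Indic speech, two companion papers by the same author (Venkata Pushpak Teja Menta), the Praxy Voice system and the PSP benchmark, form a rare pair of matched engineering evidence. The agent-native research trio (ARA / Last Harness / Synthetic Computers) is worth reading as a set, since all three take "using agents to rework the research workflow itself" as their premise. No major news from frontier labs.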
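The Top-pick cutoff described above can be sketched as follows. This is a minimal illustration, assuming each candidate is a dict with `ranking_score` and `seen_before` keys (field names taken from this digest); the function name `select_top_picks` and the sample records are hypothetical and are not the real candidate JSON.

```python
from typing import Dict, List


def select_top_picks(candidates: List[Dict], threshold: float = 2.5) -> List[Dict]:
    """Keep unseen candidates scoring at or above the threshold, best first."""
    picks = [
        c for c in candidates
        if c.get("ranking_score", 0.0) >= threshold
        and not c.get("seen_before", False)
    ]
    return sorted(picks, key=lambda c: c["ranking_score"], reverse=True)


# Hypothetical sample records for illustration only.
sample = [
    {"id": "a", "ranking_score": 3.5, "seen_before": False},
    {"id": "b", "ranking_score": 2.5, "seen_before": False},
    {"id": "c", "ranking_score": 0.5, "seen_before": False},
    {"id": "d", "ranking_score": 3.0, "seen_before": True},
]
top = select_top_picks(sample)
print([c["id"] for c in top])  # ['a', 'b']
```

Entry "d" is dropped despite its score because it is marked as already seen; whether the real pipeline downgrades or drops such entries is not stated here, so dropping is an assumption of this sketch.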

📌 Top picks (cross-source hits)

  1. Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost (arxiv:2604.25441)

    • tldr_cn:BUPS + LoRA let a non-Indic base reach commercial-class Telugu/Tamil.
    • tldr_en:This work combines a Brahmic Unified Phoneme Space (BUPS), which deterministically romanises seven Indic scripts to ISO-15919 so Chatterbox’s Latin tokeniser can process them, and a voice-prompt recovery recipe that recovers commercial-class acoustic output with no acoustic-decoder training.
    • reason:watchlist_keyword:inference + nice_to_have:benchmark + citation_velocity:1.0; the only candidate with a citation-velocity signal, and it closes the loop with the same author's PSP benchmark.
    • evidence_links:https://arxiv.org/abs/2604.25441 | https://huggingface.co/papers/2604.25441
  2. Instruction-Guided Poetry Generation in Arabic and Its Dialects (arxiv:2604.27766)

    • tldr_cn:An instruction dataset for controllable poetry generation in MSA and Arabic dialects.
    • tldr_en:This work presents a large-scale, carefully curated instruction-based dataset in Modern Standard Arabic (MSA) and various Arabic dialects, and addresses the practical aspect of poetry creation in Arabic by introducing controllable generation capabilities to assist users in writing poetry.
    • reason:hf_trending_rank:7 + nice_to_have:fine-tuning,evaluation; one of two papers in today's HF trending top 10, from an MBZUAI-affiliated author group (Preslav Nakov / Fajri Koto).
    • evidence_links:https://arxiv.org/abs/2604.27766 | https://huggingface.co/papers/2604.27766
  3. The Last Human-Written Paper: Agent-Native Research Artifacts (arxiv:2604.24658)

    • tldr_cn:Replaces the narrative paper with a machine-executable ARA.
    • tldr_en:This work introduces the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs.
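    • reason:hf_trending_rank:17 + watchlist_keyword:agent; introduces two clear concepts, the "narrative tax" and the "engineering tax"; PaperBench QA accuracy rises from 72.4% to 93.7% and the RE-Bench reproduction rate from 57.4% to 64.4%; the author list has 37 names (Pentland / Mosharaf Chowdhury / Beidi Chen, among others).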
    • evidence_links:https://arxiv.org/abs/2604.24658 | https://huggingface.co/papers/2604.24658
  4. Synthetic Computers at Scale for Long-Horizon Productivity Simulation (arxiv:2604.28181)

    • tldr_cn:A thousand synthetic computers drive long-horizon productivity RL.
    • tldr_en:It is argued that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.
    • reason:hf_trending_rank:24 + watchlist_keyword:agent + nice_to_have:evaluation; 1,000 synthetic computers, single runs of >8 h and >2,000 turns; authors Tao Ge / Baolin Peng / Jianfeng Gao come from Microsoft's long-established NLP school.
    • evidence_links:https://arxiv.org/abs/2604.28181 | https://huggingface.co/papers/2604.28181
  5. Efficient Training on Multiple Consumer GPUs with RoundPipe (arxiv:2604.27085)

    • tldr_cn:A round-robin pipeline with near-zero bubbles on consumer GPUs.
    • tldr_en:RoundPipe is a novel pipeline schedule that breaks the weight binding constraint on consumer GPU servers and treats GPUs as a pool of stateless execution workers and dynamically dispatches computation stages across devices in a round-robin manner, achieving a near-zero-bubble pipeline.
    • reason:hf_trending_rank:10 + nice_to_have:fine-tuning,evaluation; 1.48–2.16× speedup on 8×RTX 4090, and LoRA fine-tuning of Qwen3-235B (31K sequence length) on a single machine; the only systems-oriented Top pick today.
    • evidence_links:https://arxiv.org/abs/2604.27085 | https://huggingface.co/papers/2604.27085
  6. RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments (arxiv:2604.26067)

    • tldr_cn:Online open-vocabulary semantic SLAM from monocular RGB.
    • tldr_en:An online semantic SLAM system that enables geometry-aware open-vocabulary grounding, associating arbitrary natural language queries with localized 3D regions and objects in dynamic environments, enabling robust open-vocabulary semantic grounding for autonomous robotics and unconstrained in-the-wild video streams.
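    • reason:watchlist_keyword:agent + nice_to_have:benchmark; hf_upvotes 64 (highest among today's candidates); needs no calibration, depth, or pose initialization, and reaches SOTA on the TUM-RGBD dynamic sequences.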
    • evidence_links:https://arxiv.org/abs/2604.26067 | https://huggingface.co/papers/2604.26067
  7. Probing Visual Planning in Image Editing Models (arxiv:2604.22868)

    • tldr_cn:EAR compresses visual planning into single-step image editing.
    • tldr_en:EAR is presented, an editing-as-reasoning paradigm that reformulates visual planning as a single-step image transformation and introduces AMAZE, a procedurally generated dataset that features the classical Maze and Queen problems, covering distinct, complementary forms of visual planning.
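    • reason:watchlist_keyword:reasoning + nice_to_have:evaluation; the first formal proposal of "editing as reasoning" as a reasoning paradigm; AMAZE is procedurally generated, and every SOTA model fails in the zero-shot setting.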
    • evidence_links:https://arxiv.org/abs/2604.22868 | https://huggingface.co/papers/2604.22868
  8. The Last Harness You’ll Ever Build (arxiv:2604.21003)

    • tldr_cn:Two-level meta-evolution builds agent harnesses automatically.
    • tldr_en:A two-level framework shifts manual harness engineering into automated harness engineering, and takes one step further: automating the design of the automation itself.
    • reason:watchlist_keyword:agent + nice_to_have:evaluation; three agents (Worker / Evaluator / Evolution) evolve the harness, and an outer loop meta-evolves the evolution protocol; pairs with ARA as a "tooling for research vs. tooling for agents" contrast.
    • evidence_links:https://arxiv.org/abs/2604.21003 | https://huggingface.co/papers/2604.21003
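As a toy illustration of the round-robin idea mentioned in the RoundPipe entry (Top pick 5), the sketch below assigns pipeline stage indices to GPUs in rotating order. It is not RoundPipe's actual scheduler, which the digest does not describe in detail; the function name `round_robin_dispatch` and the stage/GPU counts are assumptions made for illustration.

```python
from itertools import cycle
from typing import Dict, List


def round_robin_dispatch(num_stages: int, num_gpus: int) -> Dict[int, List[int]]:
    """Assign stage indices to GPU ids in plain round-robin order."""
    assignment: Dict[int, List[int]] = {gpu: [] for gpu in range(num_gpus)}
    gpu_ids = cycle(range(num_gpus))  # 0, 1, ..., num_gpus - 1, 0, 1, ...
    for stage in range(num_stages):
        assignment[next(gpu_ids)].append(stage)
    return assignment


# Hypothetical sizes: 6 stages spread over 4 GPUs.
plan = round_robin_dispatch(num_stages=6, num_gpus=4)
print(plan)  # {0: [0, 4], 1: [1, 5], 2: [2], 3: [3]}
```

The point of the toy is only that no stage is permanently bound to a device; a real schedule would also have to move weights and activations and account for timing, which this sketch ignores.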

🏷 Watchlist category hits

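  • Indic speech / multilingual TTS: besides Praxy Voice (Top pick 1), the companion benchmark PSP (arxiv:2604.25476, Phoneme Substitution Profile, six dimensions: retroflex / aspiration / vowel-length / zha / FAD / PSD) and the joint IIT-MBZUAI pairwise-evaluation leaderboard covering 10 languages and 5K sentences (arxiv:2604.21481, 120K pairwise comparisons + Bradley-Terry + SHAP) together form this week's Indic TTS "evaluation trio". Any project working on localized Indian speech should treat these three papers as Day-0 reading.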
  • Agent / agentic infra: arxiv:2604.21003 (Last Harness) and arxiv:2604.24658 (ARA) are already Top picks; arxiv:2604.28181 (Synthetic Computers, also a Top pick) and arxiv:2604.27578 (World2Minecraft, which uses Minecraft as a simulator for VLN) add the "agent simulation environment" dimension.
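  • Diffusion / visual generation: arxiv:2604.24351 (Diffusion Templates, a plugin framework unifying KV-Cache and LoRA) + arxiv:2604.28190 (FD-loss, which uses Fréchet Distance as the training objective; one-step ImageNet 256 generation at FID 0.72) + arxiv:2604.23380 (V-GRPO, an ELBO surrogate for diffusion models + GRPO; 2× relative to MixGRPO / 3× relative to DiffusionNFT) form a trio covering three layers: plugins, losses, and RLHF.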
  • Embodied / robotics: arxiv:2604.27711 (ExoActor, third-person video generation driving a humanoid) + arxiv:2604.27578 (World2Minecraft, VLN) + Top pick 6 RADIO-ViPE form today's small robotics triangle, though none comes from a frontier robotics lab.
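  • Survey / evaluation: arxiv:2604.24977 (EACL 2026 long paper, a survey of LLM-based conversational user simulation, 30 authors; the only venue hit); arxiv:2604.25032 (a survey of fairness evaluation in recommender systems).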
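The Indic TTS leaderboard entry above cites pairwise comparisons ranked with Bradley-Terry. As a rough sketch of how such a ranking can be fitted, the code below runs the standard minorization-maximization (MM) update on a win-count matrix. The function name `fit_bradley_terry` and the counts are hypothetical; this is not the leaderboard's actual code or data, and the SHAP analysis is not covered.

```python
from typing import List


def fit_bradley_terry(wins: List[List[float]], iters: int = 200) -> List[float]:
    """Estimate Bradley-Terry strengths with the standard MM update.

    wins[i][j] is the number of comparisons in which item i beat item j.
    Returns strengths normalised to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes n_ij / (p_i + p_j) to the denominator.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [x / norm for x in new_p]
    return p


# Hypothetical win counts for three systems A, B, C.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
strengths = fit_bradley_terry(wins)
print([round(s, 3) for s in strengths])
```

The fitted strengths give a ranking (here A above B above C) and a win probability for any pair via p_i / (p_i + p_j). The update assumes every system has at least one win and one loss; otherwise the estimate degenerates.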

🔗 Further reading (Semantic Scholar similar papers)

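No high-confidence incremental signal for this section today (S2 similar papers were not returned). coverage_gaps records s2_similar_unavailable; under the hard constraint, no external search is performed, and the section will be backfilled in a later iteration once the S2 API returns the similar_papers field again.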

🧑‍🔬 Newly surfaced authors / teams

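Today's discovery scan found no qualifying candidates: the affiliations field is empty for every entry in the candidate JSON, and the only single-author hit is Venkata Pushpak Teja Menta (Praxy Voice + PSP, companion papers by the same person), which does not amount to an independent "new author" signal. The other notable candidates mostly carry large joint author lists (ARA with 37 authors, the user simulation survey with 30), so under the discovery_rules.md discipline of adding at most 1–3 people per day, none is forced in. Tomorrow's run will first check whether frontier-labs / oss-ai-labs yield any affiliation hits before recording anyone.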

📉 Coverage gaps and uncertainty

  • s2_similar_unavailable: the candidate JSON contains no similar_papers field, so further reading is empty today.
  • affiliations_empty: the affiliations field is [] for all 20 candidates, so neither tracked_labs nor discovery can match affiliations from S2 metadata; the only fallback is author names, which is unreliable.
  • arxiv_categories_empty: the categories field of the candidate entries is also empty, so arxiv_categories is left blank today and the cs.LG / cs.CL category-distribution check cannot be run.
  • low_score_ceiling: today's highest ranking_score is only 3.5, and no candidate hit a high-priority reason such as "watchlist must-read venue" or "frontier lab"; overall a medium-density "long-tail day".
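The first three gap flags above could be derived from the candidate list roughly as follows. This is a sketch under assumptions: candidates are taken to be dicts with `similar_papers`, `affiliations`, and `categories` keys (names as used in this digest), and the function name `detect_coverage_gaps` is hypothetical. low_score_ceiling is left out because the digest does not state the threshold that triggers it.

```python
from typing import Dict, List


def detect_coverage_gaps(candidates: List[Dict]) -> List[str]:
    """Return gap flags for fields that are missing or empty on every candidate."""
    gaps: List[str] = []
    if not any(c.get("similar_papers") for c in candidates):
        gaps.append("s2_similar_unavailable")
    if all(not c.get("affiliations") for c in candidates):
        gaps.append("affiliations_empty")
    if all(not c.get("categories") for c in candidates):
        gaps.append("arxiv_categories_empty")
    return gaps


# Hypothetical records mirroring the situation described above.
candidates = [
    {"arxiv_id": "x1", "affiliations": [], "categories": []},
    {"arxiv_id": "x2", "affiliations": [], "categories": []},
]
gaps = detect_coverage_gaps(candidates)
print(gaps)
```

A flag is raised only when the field is empty for every candidate, matching the "all 20 entries" wording above; a partial-coverage variant would need a ratio threshold that the digest does not define.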

Sources and cross-validation notes

  • primary: arXiv (all 20 candidates carry arxiv_url + pdf_url, used as the evidence anchor for methods and results).
  • curated: HF Daily Papers (the driving signal: all 20 entries carry hf_trending_rank and hf_upvotes; best rank = 7, highest upvotes = 64).
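  • metadata: Semantic Scholar (all 20 entries carry s2_paper_id and s2_tldr; citation_count is ≥1 for only three papers, Praxy Voice / PSP / Sample Selection, consistent with the usual pattern of new preprints not yet being fully indexed by S2).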
  • other: no blogs, code repositories, or demo pages served as supplementary evidence today.
  • Conflict priority: primary > metadata > curated > other. Every Top pick's tldr_cn is condensed from s2_tldr (metadata), every reason is based on ranking_reasons (merging curated + watchlist), and conclusions are anchored back to the arXiv abstract (primary).
  • seen-pool: seen_before=false for every entry in the candidate JSON, so no paper that "appeared in the past 14 days" needs to be downgraded today.
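The conflict priority listed above (primary > metadata > curated > other) can be expressed as a simple ordered lookup. A minimal sketch, assuming conflicting values arrive keyed by source tier; `SOURCE_PRIORITY` and `resolve_claim` are hypothetical names, not part of the pipeline.

```python
from typing import Dict, Optional

# Highest-priority source first, as stated in the notes above.
SOURCE_PRIORITY = ["primary", "metadata", "curated", "other"]


def resolve_claim(values_by_source: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the value from the highest-priority source that provides one."""
    for source in SOURCE_PRIORITY:
        value = values_by_source.get(source)
        if value:
            return value
    return None


# Hypothetical conflict: primary has no value, so metadata wins over curated.
claim = resolve_claim({"curated": "from HF", "metadata": "from S2", "primary": None})
print(claim)  # from S2
```

Empty or missing values fall through to the next tier, so a lower tier is used only when every higher tier has nothing to say.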

Saved: ~/.oh-my-agent/reports/paper-digest/daily/2026-05-03.md