AI Daily | 2026-05-19

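One-line takeaway: May 19 is the day Google I/O 2026 (Gemini Intelligence, a system-level agentic assistant, plus a new Gemini at roughly GPT-5.5 tier and behind Claude Mythos) coincided with a burst of papers on the "agent evaluation credibility gap". On the macro side, compute capex passed $1 trillion for the first time but its realization is throttled by the power bottleneck; the open-weights price war pushed inference cost below one third of Opus; and at the application layer, agentic AI achieved a documented pilot-to-production breakthrough in financial back-office operations.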

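Summary

The main storyline for May 19 is the overlap of the Google I/O 2026 keynote (Gemini Intelligence plus a system-level agentic assistant in Android 17; the new Gemini is a 3.x update at roughly GPT-5.5 tier and behind Claude Mythos) with a concentrated burst of papers on the "agent evaluation credibility gap". On May 18 OpenAI launched personal finance for US ChatGPT Pro users (Plaid links 12,000 institutions) and consolidated its product lines under Brockman; Anthropic topped the CNBC Disruptor 50, and its $30–50B round at a valuation of up to about $950B is advancing. At the paper layer, 6 of paper-digest's Top 8 are new benchmarks; the hardest data points are TOBench, where the strongest model scores only ~32% on closed-loop tool use vs 94% for humans, and LongMINT, where 7 systems average 27.9% on long-horizon memory under interference. On the macro side, 2026 compute capex passed $1 trillion for the first time (~$1.04T), but its realization is held back by the power bottleneck (transformer lead times of about 5 years; 30–50% of capacity slipping into 2027–28). Four frontier-class Chinese open-weights models shipped within 12 days, with inference prices falling below one third of Opus, and Broadridge pushed agentic AI into production in the financial back office, claiming a 30% Day-1 cost reduction. The people-discovery scan added 3 candidates (Mohit Bansal / Hyunji Lee / Aditya Tanna), all from the paper-digest author layer and none in the existing 92-person pool.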

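Key points of the day:

Google I/O 2026: Gemini Intelligence, a system-level agentic assistant, plus a new Gemini (3.x, roughly GPT-5.5 tier, behind Claude Mythos). The official model card and benchmarks were unpublished at press time; recorded as a confidence_flag.
OpenAI launched personal finance for US ChatGPT Pro users on May 18 (Plaid links 12,000 institutions; defaults to GPT-5.5 Thinking) and consolidated its product lines under Greg Brockman: a push for volume at the consumer application layer.
Anthropic topped the CNBC 2026 Disruptor 50; its $30–50B raise at a valuation of up to about $950B is advancing (media-sourced, no term sheet signed; recorded as unverified).
At the paper layer, this was a breakout day for agent evaluation benchmarks: on TOBench the strongest model scores only ~32% on closed-loop tool use vs 94% for humans, and on LongMINT long-horizon memory under interference averages 27.9%. Putting unattended agents into production still requires heavy human backstopping.
2026 compute capex passed $1 trillion for the first time (~$1.04T), but power is the hard constraint: transformer lead times of about 5 years, ~50% of projects delayed, and 30–50% of planned capacity slipping into 2027–28. Capex does not equal deliverable compute.
Four frontier-class Chinese open-weights models (GLM-5.1 / MiniMax M2.7 / Kimi K2.6 / DeepSeek V4) shipped within 12 days at prices below one third of Opus; Anthropic's premium strategy is described as "running out of room".
The application layer shows a concrete pilot-to-production breakthrough: Broadridge pushed agentic AI into production in the financial back office and claims a 30% Day-1 cost reduction. However, the WRITER survey finds 79% of organizations still face adoption challenges and only about 23–29% see significant ROI.
The people candidate pool gained 3 names (Mohit Bansal, last author of LongMINT / Hyunji Lee, first author of LongMINT / Aditya Tanna, tabular FM team), all from the paper-digest 2026-05-19 author layer and none in the existing 92-person pool.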

Frontier Labs / Frontier Model Radar

May 19 is dominated by Google’s I/O 2026 keynote (Mountain View, 10am PT), where Gemini Intelligence — an agentic, proactive system-level assistant embedded in Android 17 — is the headline, alongside a new Gemini model (widely reported as a 3.x update, not 4.0) that lands roughly at GPT-5.5 level and meaningfully behind Anthropic’s Claude Mythos. OpenAI shipped a US ChatGPT Pro personal-finance experience (Plaid account linking) on May 18 and consolidated products under Greg Brockman. Anthropic’s reported $30B-$50B raise at up to ~$950B valuation continues to advance (expected to close by month-end) and it topped CNBC’s 2026 Disruptor 50; Meta (proprietary Muse Spark), xAI (Grok 4.3 GA + May 15 legacy-model retirement) and Mistral (Emmi AI acquisition, Mythos-alternative cyber model in development) round out the active set. DeepSeek and Qwen had no fresh in-window release; their V4 / 3.5 lines remain status-quo with DeepSeek’s promo pricing ending May 31.

OpenAI（product_release）：OpenAI launched a US-only ChatGPT personal-finance experience in preview for Pro users on May 18, using a Plaid integration to link 12,000+ institutions (Schwab, Fidelity, Chase, Robinhood, Amex, Capital One) with a portfolio/spending/subscriptions dashboard; defaults to GPT-5.5 Thinking. Follows the April Hiro team acquisition. [src] [src]
OpenAI（leadership_signal）：OpenAI consolidated multiple product lines under co-founder Greg Brockman as it expanded into personal finance, signalling a tighter product-org structure around its consumer applications push. [src]
Anthropic（other）：Anthropic topped CNBC’s 2026 Disruptor 50 list (published May 19), and its reported funding round of $30B-$50B at a valuation as high as ~$950B (NYT/Bloomberg/Sherwood) continues to advance, with the round expected to close as soon as end of May; that valuation would put it ahead of OpenAI. Claude Mythos remains the benchmark leader (17 of 18 measured). [src] [src] [src]
Google DeepMind（product_release）：Google opened I/O 2026 on May 19 (Shoreline Amphitheatre, 10am PT keynote). Headline is Gemini Intelligence — a proactive, system-level agentic assistant running inside Android 17 (cross-app/web autonomous tasks: locating a Gmail syllabus and auto-filling a shopping cart was demoed), plus a new Gemini model reported to be a 3.x update (not 4.0) landing roughly at GPT-5.5 level and meaningfully behind Claude Mythos; auto-browse rolls out to subscribers late June, phones this summer. Android XR glasses (Samsung/Warby Parker/Gentle Monster/XREAL) and Aluminium OS also previewed. [src] [src] [src]
Meta（product_release）：Meta’s first Superintelligence Labs model, Muse Spark (codename Avocado), continues rolling out as Meta’s most powerful model and is proprietary — a notable break from the open Llama lineage — powering the Meta AI app/site with planned integration into WhatsApp/Instagram/Facebook/Messenger and glasses. No new incremental Meta model announcement in the 48h window; 2026 AI capex guided at $115B-$135B. [src] [src]
xAI（product_release）：xAI’s Grok 4.3 (released May 6, cost-efficient frontier model: 1M-token context, native video input, reasoning, ~53 Intelligence Index, #1 CaseLaw v2/CorpFin, $1.25/$2.50 per M tokens) reached full API availability; eight legacy models (grok-4-fast, grok-4-0709, grok-3, grok-code-fast-1, grok-imagine-image-pro, etc.) were retired May 15 with auto-redirect to grok-4.3. Also entering CAISI pre-release government eval agreement. [src] [src] [src]
Mistral（other）：Mistral acquired Vienna-based Emmi AI (large engineering models / physics-based simulation) in May, and is in talks with European banks to deploy a cybersecurity-focused model positioned as an alternative to Anthropic’s Mythos for banks lacking Mythos access; release timing unconfirmed. [src] [src]
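Qwen: no high-confidence incremental signal in this window (see the sources and cross-validation notes).
DeepSeek: no high-confidence incremental signal in this window (see the sources and cross-validation notes).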

Unverified frontier signals (watch only; not part of the day's main storyline):

Leaked/AI-Studio-metadata benchmarks for the new Google I/O Gemini model (e.g. ~84.6% on ARC-AGI2, a ‘Gemini Omni’ video model card surfaced on Reddit) circulated pre-keynote; Google had not published official benchmarks at time of writing, so exact model name/version and scores are unconfirmed. [src] [src]
Reports claim Anthropic’s round could reach as high as ~$950B valuation (vs $900B in Bloomberg’s May 12 report); the higher figure and the $50B upper bound are not confirmed and no term sheet has been signed. [src]

Key People and Community Signals

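Substantive 48-hour product or research signals from tracked people are thin today. On the OpenAI side (Sam Altman / Greg Brockman), the news is mostly market and legal noise: the Musk lawsuit, the disclosure of Brockman's roughly $30B stake, and his takeover of product strategy. Under the rules these are not counted as signals. Andrej Karpathy's autoresearch loop and Simon Willison's notes on the Shopify PR were revisited by tech media on May 19, but the original events took place in March and are not new 48h signals. The genuinely fresh, high-signal activity is concentrated in agent long-horizon memory research: the LongMINT benchmark (Bansal group, UNC), released May 18, was listed by paper-digest as a Top pick with two authors flagged as new. Today's discovery scan produced 3 qualifying candidates (Mohit Bansal, Hyunji Lee, Aditya Tanna), none of whom are in the existing 92-person pool.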

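Simon Willison (simon-willison): Retrospective tech-media coverage on May 19 again cited Simon Willison's day-by-day notes and clarifications on Karpathy's autoresearch pattern and the Shopify Liquid PR (a claimed 53% speedup); his clarification was that the tool actually used was the Pi TypeScript toolkit, not Claude Code. He remains a consistently high-visibility community signal source, but the core events took place in March and are not new output within 48h, so the signal is weak and is logged for the record only. [src]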

Energy

Behind-the-meter gas remains the fastest path to AI power: CalEthos/TerraVolt Infrastructure (8-K, May 11, 2026) signed a firm natural-gas supply agreement for 55,000 MMBtu/day to feed a 200-240 MW onsite power plant for an SE Idaho AI data-center campus, paying a $3.83M reservation fee (May 8) and committing up to $56M in letters of credit. This is incremental confirmation that grid-bypass, behind-the-meter generation (vs 4-7 yr interconnect queues flagged in the prior daily) is the operative procurement model for new AI capacity in the 48h window. [src] [src] [src]
Power, not silicon, is now the binding AI constraint: analysis converging this week estimates up to ~11 GW of 2026 data-center capacity stuck in ‘announced, not under construction’, ~50% of global projects delayed by power/grid-equipment shortages, and high-power transformer lead times stretched to ~5 years. Hyperscalers are responding by relocating multi-$B builds to power-rich regions (Microsoft UAE $15.2B, Meta Louisiana $10B) and signing direct procurement deals to bypass the grid. [src] [src] [src]
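
As a rough plausibility check on the behind-the-meter gas deal above, the contracted fuel volume can be converted into an implied heat rate for the plant. The sketch below assumes the plant runs continuously at full output and consumes the entire 55,000 MMBtu/day; neither assumption comes from the filing, and the function names are illustrative.

```python
# Sanity check: does 55,000 MMBtu/day of gas plausibly feed a 200-240 MW plant?
# Assumption (not from the filing): continuous full-load operation, with all
# contracted gas burned for power generation.
BTU_PER_KWH = 3412.14  # thermal energy equivalent of 1 kWh


def implied_heat_rate(mmbtu_per_day: float, plant_mw: float) -> float:
    """Fuel energy burned per kWh generated, in Btu/kWh."""
    btu_per_hour = mmbtu_per_day * 1_000_000 / 24
    kwh_per_hour = plant_mw * 1_000
    return btu_per_hour / kwh_per_hour


def implied_efficiency(mmbtu_per_day: float, plant_mw: float) -> float:
    """Thermal efficiency implied by the heat rate (0..1)."""
    return BTU_PER_KWH / implied_heat_rate(mmbtu_per_day, plant_mw)


if __name__ == "__main__":
    for mw in (200, 240):
        hr = implied_heat_rate(55_000, mw)
        eff = implied_efficiency(55_000, mw)
        print(f"{mw} MW: {hr:,.0f} Btu/kWh, {eff:.1%} efficiency")
```

Under these assumptions the implied heat rate falls between roughly 9,500 and 11,500 Btu/kWh (about 30-36% thermal efficiency), so the reported fuel volume and plant size are mutually consistent in order of magnitude.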

Chips

HBM supply tightens further and is fully spoken for: SK hynix has effectively sold out DRAM/NAND/HBM to NVIDIA through 2026 (projected ~50% HBM bit share, down from 59% as Samsung rises to ~28%). NVIDIA has asked SK hynix, Samsung and Micron to deliver denser 16-Hi HBM4 by Q4 2026 for Rubin; Samsung+SK hynix tapped as Rubin HBM4 suppliers with shipments from ~March. HBM remains the gating component for accelerator output, reinforcing the prior daily’s supply-constraint thesis. [src] [src] [src]
Custom-ASIC erosion of NVIDIA share continues: Meta’s MTIA 400 accelerator completed testing and entered production deployment in Meta data centers (next-gen MTIA carries more HBM for genAI inference), and Google is deliberately pitting MediaTek’s cost positioning against Broadcom’s premium ASIC pricing. Analyst framing: NVIDIA accelerator share drifting toward 55-60% as Broadcom/Marvell custom silicon fills the gap — extends the Broadcom/Marvell duopoly thread from the prior daily, now with Meta’s own silicon in production. [src] [src] [src]

Infra

2026 compute capex confirmed as the first trillion-dollar year: Big Four (MSFT/AMZN/GOOGL/META) at ~$725B (up 77% from $410B in 2025), Q1 alone ~$130B (3.7x Q1-2023); adding Oracle ~$50B, Apple ~$13B, neoclouds ~$60B, China ~$80B, sovereigns ~$60B brings the total to ~$1.04T. The Anthropic-Google Cloud commitment ($200B / 5 GW over 5 years, disclosed May 5) anchors demand. This raises the upper bound vs the prior daily’s $690-755B hyperscaler-only range. [src] [src] [src]
Capex-to-realized-capacity gap widening: industry analysis at/around Data Center World 2026 projects 30-50% of planned 2026 data-center capacity slipping to 2027-2028 due to power, transformer and grid-equipment shortages — i.e. the trillion-dollar capex figure overstates near-term deliverable compute. Rack power has jumped from 30-40 kW to hundreds of kW approaching the MW range, intensifying the power/cooling bottleneck. [src] [src]
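
The capex arithmetic above can be reproduced directly from the quoted components. The sketch below uses only the figures in this section; the variable names are illustrative. Note that the itemized components sum to about $988B, so roughly $52B of the ~$1.04T headline is not itemized here.

```python
# Reproduce the 2026 capex arithmetic from the figures quoted in this section.
# All values are in billions of USD and are analyst/aggregator estimates.
capex_2026_busd = {
    "Big Four (MSFT/AMZN/GOOGL/META)": 725,
    "Oracle": 50,
    "Apple": 13,
    "neoclouds": 60,
    "China": 80,
    "sovereigns": 60,
}
HEADLINE_TOTAL_BUSD = 1040      # the ~$1.04T headline figure
BIG_FOUR_2025_BUSD = 410        # prior-year Big Four capex

itemized_total = sum(capex_2026_busd.values())
unattributed = HEADLINE_TOTAL_BUSD - itemized_total
big_four_growth = capex_2026_busd["Big Four (MSFT/AMZN/GOOGL/META)"] / BIG_FOUR_2025_BUSD - 1

# Share of planned 2026 capacity that arrives on time if 30-50% slips.
on_time_share = (1 - 0.50, 1 - 0.30)

if __name__ == "__main__":
    print(f"Itemized total: ${itemized_total}B")
    print(f"Not itemized vs headline: ${unattributed}B")
    print(f"Big Four YoY growth: {big_four_growth:.0%}")
    print(f"On-time capacity share: {on_time_share[0]:.0%} to {on_time_share[1]:.0%}")
```

The growth check matches the quoted 77%, and the slippage range implies that only about 50-70% of planned 2026 capacity would be delivered within the year.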

Model

Open-weights inference cost war intensifies: four Chinese frontier-class open-weights models dropped in a ~12-day window in early May 2026 — GLM-5.1 (Z.ai), MiniMax M2.7, Kimi K2.6 (Moonshot), DeepSeek V4 — all competitive on agentic coding benchmarks and running inference at <1/3 of Claude Opus pricing. This is the incremental escalation of the prior daily’s ‘5 open-weights drops in 30 days / falling inference marginal cost’ theme. [src] [src] [src]
Western frontier pricing under structural pressure: GPT-5.5 ($5/$30, $0.50 cached) is the new complex-coding frontier; Anthropic’s prior 67% Opus cut (Opus 4.6/4.7 at $5/$25, down from $15/$75) is now visibly the floor it can hold while open-weights run at $0.10-0.28 in/out. Commentary frames Anthropic’s premium strategy as ‘running out of room’ given the 3x-7x cost gap — pricing compression is the dominant model-layer dynamic, no major new Western frontier release in the 48h window. [src] [src] [src]
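
To make the pricing comparison above concrete, the sketch below computes per-request cost from the quoted per-million-token prices, checks the 67% Opus cut, and derives the price ceiling implied by "under one third of Opus". The request size (20,000 input and 2,000 output tokens) is a hypothetical workload, and the `Price` class is illustrative. The quoted open-weights range of $0.10-0.28 is left out because the source does not say how it splits between input and output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens, as quoted in this section."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one request."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000

    def scaled(self, factor: float) -> "Price":
        """Same price card with both rates multiplied by factor."""
        return Price(self.input_per_m * factor, self.output_per_m * factor)


GPT_5_5 = Price(5.0, 30.0)
OPUS_NOW = Price(5.0, 25.0)    # Opus 4.6/4.7
OPUS_OLD = Price(15.0, 75.0)   # before the cut

opus_cut = 1 - OPUS_NOW.input_per_m / OPUS_OLD.input_per_m
one_third_of_opus = OPUS_NOW.scaled(1 / 3)  # ceiling for "under 1/3 of Opus"

if __name__ == "__main__":
    tokens_in, tokens_out = 20_000, 2_000  # hypothetical agentic-coding request
    print(f"Opus price cut: {opus_cut:.0%}")
    for name, price in [("GPT-5.5", GPT_5_5), ("Opus", OPUS_NOW),
                        ("1/3 of Opus", one_third_of_opus)]:
        print(f"{name}: ${price.cost(tokens_in, tokens_out):.3f} per request")
```

For this hypothetical request, Opus costs $0.15 and GPT-5.5 $0.16, while anything priced under one third of Opus would come in below $0.05.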

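Paper layer (consumed from paper-digest 2026-05-19):

2605.17894 Evaluating Cognitive Age Alignment in Interactive AI Agents. The first psychometrically grounded benchmark of interactive cognitive age, quantifying the cognitive-age gap of MLLM agents; ranked first on HF trending for the day. (Market relevance: it turns "how old does the agent behave in interaction" into a measurable metric, giving consumer, education and companion agent products a compliance and positioning dimension beyond accuracy, with direct selection value for deployers that must tune interaction maturity by user group.)
2605.16909 TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents. A real-world, omni-modal, closed-loop tool-use benchmark; the strongest model reaches only about 32% task success, far below the 94% human baseline. (Market relevance: this is the day's hardest data point on the "agent credibility gap". At about 32% vs 94% for humans on closed-loop omni-modal tool use, putting tool-using agents into unattended production still requires heavy human backstopping, which makes this a benchmark anchor for assessing agent deployment economics and ROI.)
2605.16079 VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation. Internalizes agentic reasoning into instance-level video understanding, paired with a four-stage fully automated data-synthesis pipeline; reported to outperform GPT-4o. (Market relevance: the combination of native tool invocation and automated data synthesis lowers the cost of collecting training data for video-understanding agents, with direct implications for the scalable rollout and unit economics of video agent products in surveillance, media asset management and content moderation.)
2605.18663 GIM: Evaluating models via tasks that integrate multiple cognitive domains. An IRT-calibrated integrative reasoning benchmark that provides the first large-scale quantification of the trade-off between test-time compute and model capability, finding that thinking budget and quantization matter as much as model choice. (Market relevance: by elevating thinking budget and quantization to a decision dimension on par with model selection, it gives engineering teams that own inference budgets and latency SLAs a quantifiable input-output curve for test-time compute, directly affecting inference bills and SKU choice.)
2605.18621 CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark. A complete three-part package for cross-view spatial intelligence: a 1.6M dataset, an aligned model, and a benchmark. (Market relevance: cross-view spatial reasoning is a bottleneck capability for embodied and spatial products such as robotics, AR/XR and multi-camera autonomous driving; having dataset, model and benchmark together means this capability line is converging from research demos toward a reproducible engineering baseline, shortening the credibility-validation cycle for related products.)
2605.18565 LongMINT: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems. A high-interference benchmark for long-horizon memory; 7 memory systems average only 27.9% accuracy under multi-target interference, systematically exposing weaknesses in retrieval and memory construction. (Market relevance: almost every long-horizon agent or enterprise assistant relies on an external memory system, and 27.9% accuracy under interference indicates that the current memory stack is unreliable under realistic multi-task noise. That is a direct risk signal for agent products betting on "long-term memory" as a selling point and for the vector-database and memory-middleware segment.)
2605.18572 MA$^{2}$P: A Meta-Cognitive Autonomous Intelligent Agents Framework for Complex Persuasion. An autonomous multi-agent framework driven by a meta-cognitive configurator that selects strategies automatically across domains to dampen performance fluctuation and raise success rates in complex persuasion. (Market relevance: it offers a meta-cognitive scheduling approach to cross-domain performance fluctuation, an engineering pattern for the stability of sales, customer-service and negotiation agents; stronger persuasion capability also cuts both ways, inviting trust & safety and regulatory scrutiny that must be weighed at deployment.)
2605.15572 Measuring Maximum Activations in Open Large Language Models. A systematic measurement of maximum activation magnitudes in open LLMs, yielding a deployment rule of thumb that MoE peaks are about 14–23x lower than those of same-scale dense models. (Market relevance: MoE peak activations being an order of magnitude lower than dense is a rule of thumb directly usable for quantization, low-precision deployment and memory budgeting, and it bears on inference hardware selection and per-token cost; it is especially relevant to deployment decisions given the current wave of open MoE releases.)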

Application

Agentic AI crossing from pilot to production at institutional scale: Broadridge (PR May 11, coverage through May 13) put agentic AI live in production across post-trade, account management and client services (trade-fails, break resolution, valuation exceptions), claiming up to 30% Day-1 operational cost reduction and built on deployments across 40+ BPO clients since 2024. Concrete evidence the prior daily’s pilot->production bottleneck is being broken in financial operations. [src] [src] [src]
Deployment-services layer being built to attack the production gap: ServiceNow+Accenture launched a Forward Deployed Engineering program to scale agentic AI from enterprise pilot to production, and OpenAI launched a ~$4B deployment company with 19 investment firms/SIs/consultancies. Counter-signal: WRITER reports 79% of orgs still face adoption challenges (double-digit YoY rise), ~60% governance gap, only ~23-29% seeing significant genAI/agent ROI — adoption breadth is real but ROI realization remains thin. [src] [src] [src]

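Technical signals (tool/release layer not covered by paper-digest):

LightSeek Foundation released TokenSpeed, an open-source (MIT) LLM inference engine, on 2026-05-07. It is designed for long-context (>50K tokens) multi-turn agentic workloads and claims, on NVIDIA B200 versus TensorRT-LLM, about 9% lower latency at batch size 1 and about 11% higher throughput at 100 TPS/User, with its MLA kernel nearly halving decode latency in speculative-decoding scenarios with a long-prefix KV cache. It is currently in preview with no accompanying academic paper, making it a pure tool release outside paper-digest's coverage. [src]
vLLM published a blog post on 2026-05-11 claiming the top spot on the Artificial Analysis inference performance leaderboard: DeepSeek V3.2 at 230 TPS output throughput (about 4x most inference providers), Qwen 3.5 397B ranked first among 12 providers with TTFT < 1s on a 10K-token prompt, and MiniMax-M2.5 at 326 TPS at concurrency 1. The result is engineering documentation of kernel fusion, speculative decoding and model-specific optimizations merged into the main repository; there is no academic paper, so it is an infrastructure/tooling advance outside paper-digest's coverage. [src]
Google I/O 2026 is scheduled for 2026-05-19 (the same day as this report), with the keynote at 10am PT; the industry expected Gemini 4.0, Android XR glasses and Aluminium OS. This is a product/launch-event signal rather than a paper, so paper-digest does not cover it; specifics are to be followed up by other layers after the event. [src]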

Cross-Layer Effects

Energy -> Infra -> Chips: the power bottleneck (5-yr transformer lead times, ~50% of projects delayed, ~11 GW announced-but-unbuilt) is what converts the ~$1.04T 2026 compute capex into only partially-realized capacity, with 30-50% of planned 2026 capacity slipping to 2027-2028. This means accelerator/HBM demand (NVIDIA Rubin, SK hynix sold out through 2026) is paced by megawatts available behind-the-meter, not by fab/HBM output alone — driving deals like CalEthos/TerraVolt’s 200-240 MW onsite gas plant. [src] [src] [src]
Chips -> Model -> Application: custom-ASIC maturation (Meta MTIA 400 in production, Broadcom/Marvell duopoly, Google playing MediaTek vs Broadcom) plus an open-weights flood (4 Chinese frontier models in 12 days at <1/3 Opus price) collapses inference marginal cost. That cost collapse is the precondition for the application layer pushing agentic AI into production economically — e.g. Broadridge’s 30% Day-1 cost-reduction agentic deployment and the ServiceNow/Accenture and OpenAI deployment-company plays only pencil out once token costs fall. [src] [src] [src]

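🎙️ Podcast Updates

晚点聊, ep. 165: NVIDIA GEAR's 高深远 (Gao Shenyuan) on world models, self-evolving loops, and DreamDojo. In this episode of 晚点聊, I talk with Gao Shenyuan, a young researcher who has just finished his PhD at HKUST; he began interning at NVIDIA last year and will soon formally join GEAR, NVIDIA's embodied-AI lab. We discussed the direction he has focused on since 2024: world models. In the first hour or so, we laid out the big picture of world models: their taxonomy, the problems they are meant to solve, their current state, bottlenecks and future directions, and the major…
张小珺Jùn｜商业访谈录, ep. 141: Freda's Investment Notes, Episode 2: Tokenmaxxing, putting an electric motor into a steam engine, a relay race turning into a basketball game, loneliness, and human connection. Today is Episode 2 of our series "Freda's Investment Notes". Some listeners may be hearing the show for the first time, so an introduction first: Freda Duan invests in the Bay Area and is a partner at Altimeter Capital. Altimeter is a Silicon Valley technology fund spanning private and public markets. Its private-market investments include OpenAI, Anthropic and ByteDance, and its public-market investments include…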

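Candidate Pool Changes and Follow-ups

Today's discovery scan added 3 candidates (Mohit Bansal, Hyunji Lee, Aditya Tanna), all cross-checked against paper-digest 2026-05-19 and confirmed absent from the existing 92-person pool. The scan covered high-visibility X/Twitter threads, guests of the two prefetched podcasts (高深远 and Freda Duan are already in the candidate pool and were excluded), paper-digest Top picks, and GitHub trending.

Candidates added today:

Mohit Bansal (oss-ai-builders): senior/last author of LongMINT, the long-horizon memory interference benchmark released on May 18 by the UNC Chapel Hill team (15.6k QA, 138.8k tokens on average, up to 1.8M tokens; existing memory-augmented agents average only 27.9% accuracy). paper-digest listed it as today's Top pick #5 and flagged him as a new author; he represents the frontier of agent memory evaluation and is in neither the tracked nor the candidate pool. [src] [src]
Hyunji Lee (oss-ai-builders): first author of LongMINT (arXiv 2605.18565, submitted May 18), who led the design of this new benchmark for evaluating long-horizon memory under multi-target interference. It shows that current memory-augmented agents fail markedly in interference-dense settings. She is a new researcher in agent memory evaluation with a concrete 48h output. [src] [src]
Aditya Tanna (oss-ai-builders): in the same group as Vinay Kumar Sankarapu / Pratinav Seth / Mohamed Bouadi, appearing repeatedly across 4 preprints on tabular foundation models and interpretability posted on May 18-19 (arXiv 2605.18702/18696/18635/18654). paper-digest flagged him as a new author under its "appears in ≥2 different papers" rule; he represents a tabular FM / interpretability research cluster and is not in the existing pool. [src] [src]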
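
The "appears in ≥2 different papers" rule that flagged the last candidate can be sketched as a simple count over author lists. This is a minimal illustration, not paper-digest's actual implementation: the function name, data shape and placeholder authors are all hypothetical.

```python
from collections import defaultdict


def flag_new_authors(papers: dict[str, list[str]],
                     known_pool: set[str],
                     min_papers: int = 2) -> dict[str, list[str]]:
    """Return authors who appear in at least min_papers distinct papers
    and are not already in the tracked/candidate pool."""
    papers_by_author: dict[str, set[str]] = defaultdict(set)
    for paper_id, authors in papers.items():
        for author in set(authors):  # count each author once per paper
            papers_by_author[author].add(paper_id)
    return {
        author: sorted(ids)
        for author, ids in papers_by_author.items()
        if len(ids) >= min_papers and author not in known_pool
    }


if __name__ == "__main__":
    # Hypothetical data: placeholder paper IDs and author names.
    day_papers = {
        "paper-1": ["Author A", "Author B"],
        "paper-2": ["Author A", "Author C"],
        "paper-3": ["Author C", "Author D"],
    }
    pool = {"Author C"}  # already tracked, so never flagged
    print(flag_new_authors(day_papers, pool))
```

Counting distinct paper IDs per author, rather than raw mentions, keeps a duplicated name within a single author list from triggering the rule.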

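Sources and Cross-Validation Notes

Source mix: primary / official: arXiv (via paper-digest), OpenAI/xAI official migration docs, docs.x.ai, Broadridge/Accenture press releases, Goldman Sachs. company / filing: CalEthos/TerraVolt 8-K (via StockTitan/GlobeNewswire reproductions; the sec.gov original returned 403), PRNewswire. media / analysis: CNBC / Bloomberg / TechCrunch / VentureBeat / Sherwood / NotebookCheck / TweakTown / TrendForce / DataCenterKnowledge / Futurum / ghacks / theaiinsider. community / social: X.com, Substack (datacenterrichness / macromicro), marktechpost, vLLM blog, buildfastwithai, abhs.in, llm-stats, testingcatalog, Reddit, paper-digest 2026-05-19 daily JSON (consumed by this report), subscribed podcasts (小宇宙).

Cross-validation: Google I/O Gemini details come from pre-keynote previews and live blogs; the official model card and benchmarks were unpublished at press time, so version and scores are recorded as provisional (confidence_flag + coverage_gap). The Anthropic valuation range (Bloomberg $900B vs about $950B in other sources) and the $30–50B round size are media-reported with nothing signed, and have been moved down to unverified_frontier_signals. The CNBC/Bloomberg primary sources returned 403 to automated fetching, so the signals were cross-confirmed through several high-quality secondary sources. paper-digest Top picks are doubly confirmed by its own ranking plus HF trending; this report consumed them directly (without a second WebSearch pass over arXiv) and inherits its s2_tldr_sparse / s2_similar_unavailable / affiliations_empty flags. The paper-digest summary text mislabels #2 as MM-ToolBench, whereas the structured top_picks #2 is TOBench (2605.16909) and the 32% vs 94% figure belongs to TOBench; the structured data takes precedence. For the CalEthos/TerraVolt energy entry, the sec.gov 8-K original returned 403, so the entry relies on the same release as reproduced by GlobeNewswire/Yahoo/StockTitan (terms consistent across sources). The trillion-dollar capex total and the 30–50% capacity slippage are analyst estimates (high confidence on direction, medium on point values). Open-weights pricing and comparisons are a media synthesis cross-checked against price trackers (medium). The Broadridge 30% cost reduction is self-reported by the company.

Coverage gaps (coverage_gaps):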

[frontier_radar] qwen_no_incremental_signal: no new Qwen model/release in the 48h window; Qwen 3.5 (Feb 16) remains the current line, Qwen 3.7 not yet shipped
[frontier_radar] deepseek_no_incremental_signal: no new DeepSeek release in window; V4-Pro/V4-Flash still in preview (Apr 24), promo pricing ends May 31
[frontier_radar] google_io_official_model_card_pending: Gemini model name/version and official benchmarks not yet published at time of writing (keynote in progress)
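[paper_layer] Gap inherited from paper-digest, s2_similar_unavailable: the Semantic Scholar similar-papers graph returned nothing for all 135 candidates, so the further-reading / related-work dimension is empty for this issue.
[paper_layer] The Google I/O 2026 keynote (same day, May 19) had not yet been published when this section closed; technical_signals records only the schedule and expectations, and the actual product launches must be followed up by the launch-event layer.
[people_pool] Sam Altman / Greg Brockman and other OpenAI executives produced only legal/market noise within 48h (the Musk lawsuit, stake disclosure, and the product-strategy takeover) and no countable product or technical signal, so under the rules they were not included in tracked_people_signals.
[people_pool] Karpathy's autoresearch loop and the 53% speedup PR involving Shopify / Tobi Lütke / David Cortés were revisited by tech media on May 19, but the original events took place in March 2026 and do not constitute new 48h signals, so they were not included.
[people_pool] The guests of 晚点聊 #165 (高深远 / NVIDIA GEAR DreamDojo) and 张小珺 #141 (Freda Duan) are both already in the candidate pool (last_seen 2026-05-18); no collaborator with an independent 48h output who is not yet in the pool was found to nominate.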
[macro_news] Primary SEC 8-K filing for CalEthos/TerraVolt (sec.gov ex99-1.htm) returned HTTP 403 to WebFetch; energy entry relies on the GlobeNewswire/Yahoo/StockTitan secondary reproductions of the same press release (terms consistent across sources).
[macro_news] No single dated primary release in the precise May 17-19 48h window for the chips and model layers; those entries lean on early-May dated events (HBM4 supplier reports, 4-model open-weights burst) plus mid-May pricing trackers as the freshest available incremental signal.

Confidence flags (confidence_flags):

[frontier_radar] Google I/O details synthesized from pre-keynote previews and live-blog reporting; official Gemini model card/benchmarks pending — treat exact version/scores as provisional
[frontier_radar] Anthropic valuation range ($900B Bloomberg vs ~$950B other outlets) and round size ($30B-$50B) are media-reported and not finalized
[frontier_radar] CNBC and Bloomberg primary URLs returned 403 to automated fetch; signals corroborated via multiple secondary high-quality reports
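[paper_layer] inherits_paper_digest_flags: inherits paper-digest's s2_tldr_sparse (only 9 of 135 candidates carry an S2 tldr and tldr_en is empty for all Top picks, so every tldr_en in this section is left empty), s2_similar_unavailable, and affiliations_empty (neither the arXiv listing nor the HF JSON includes affiliations).
[paper_layer] The paper-digest summary text calls #2 "MM-ToolBench", but in the structured top_picks data #2 is actually TOBench (arxiv_id 2605.16909), and the 32.0% vs 94.0% data point belongs to TOBench; this section follows the structured top_picks.
[paper_layer] All technical_signals come from secondary aggregator sources (marktechpost / vLLM blog / buildfastwithai); the TokenSpeed and vLLM performance numbers are vendor self-reported benchmarks and have not been independently verified.
[people_pool] Hyunji Lee and Aditya Tanna have no verifiable X handle yet, and their affiliations are partly labeled from arXiv and public knowledge; identity verification must be completed before promotion.
[people_pool] Aditya Tanna's 4 same-day preprints follow the arXiv IDs captured by paper-digest; the full author order was not checked paper by paper against the full text this time.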
[macro_news] infra: trillion-dollar 2026 capex total (~$1.04T) and the 30-50% capacity-slippage figure are analyst/aggregator estimates (Goldman, Futurum, Omdia/DCK), not company filings — directionally high-confidence, point figures medium.
[macro_news] model: open-weights-vs-Opus ‘under 1/3 pricing’ and ‘3x-7x cost gap’ are media/analysis syntheses cross-checked against pricing trackers; benchmark-parity claims are vendor/secondary, medium confidence.
[macro_news] application: Broadridge 30% Day-1 cost-reduction figure is a company press-release claim (primary but self-reported); 79% adoption-challenge / ROI figures are WRITER survey data, medium confidence.

66 cited sources in total (full bibliography in ai.json sources[]).

Hanzhi's BLOG
