[Market · 2026-05-19] AI

AI Daily | 2026-05-19

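Bottom line: 5/19 combines Google I/O 2026 (Gemini Intelligence, a system-level agentic assistant, plus a new Gemini at roughly GPT-5.5 level that trails Claude Mythos) with a burst of papers on the "agent evaluation credibility gap". On the macro side, compute capex crosses $1 trillion for the first time, but power bottlenecks hold back delivery; an open-weights price war is pushing inference cost below 1/3 of Opus; and at the application layer, agentic AI shows concrete pilot→production evidence in financial back offices.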

Summary

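The main thread on 5/19 is the overlap of the Google I/O 2026 keynote (Gemini Intelligence plus a system-level agentic assistant in Android 17; the new Gemini is a 3.x update at roughly GPT-5.5 level, behind Claude Mythos) with a burst of papers on the "agent evaluation credibility gap". On 5/18 OpenAI launched ChatGPT Pro personal finance in the US (Plaid links 12,000 institutions) and consolidated its product lines under Brockman; Anthropic topped the CNBC Disruptor 50, and its $30–50B round at roughly $950B valuation is advancing. In the paper layer, 6 of paper-digest's Top 8 are new benchmarks. The hardest data points: on TOBench the best model reaches only ~32% on closed-loop tool use vs 94% for humans, and on LongMINT 7 long-horizon memory systems average 27.9% under interference. On the macro side, 2026 compute capex crosses $1 trillion for the first time (~$1.04T), but delivery is held back by power bottlenecks (transformer lead times of about 5 years; 30–50% of capacity slipping into 2027–28). Chinese labs released 4 frontier-class open-weights models within 12 days, pushing inference prices below 1/3 of Opus, and Broadridge moved agentic AI into financial back-office production, claiming a 30% Day-1 cost reduction. The people-discovery scan added 3 candidates (Mohit Bansal / Hyunji Lee / Aditya Tanna), all from paper-digest's author layer and none in the existing 92-person pool.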

Key points for the day:

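  • Google I/O 2026: Gemini Intelligence, a system-level agentic assistant, plus a new Gemini (3.x, roughly GPT-5.5 level, behind Claude Mythos); the official model card/benchmarks were not published by cutoff, recorded as a confidence_flag.
  • OpenAI launched US ChatGPT Pro personal finance on 5/18 (Plaid links 12,000 institutions; defaults to GPT-5.5 Thinking) and consolidated product lines under Greg Brockman: a push for volume at the consumer application layer.
  • Anthropic topped CNBC's 2026 Disruptor 50; a $30–50B raise at up to ~$950B valuation is in progress (media-sourced, no term sheet signed, recorded as unverified).
  • The paper layer is a day of agent-evaluation benchmarks: on TOBench the best model reaches only ~32% on closed-loop tool use vs 94% for humans, and LongMINT long-horizon memory under interference averages 27.9%. Putting unattended agents into production still requires heavy human backstops.
  • 2026 compute capex crosses $1 trillion for the first time (~$1.04T), but power is the hard constraint: transformer lead times of ~5 years, ~50% of projects delayed, and 30–50% of planned capacity slipping into 2027–28. Capex is not the same as deliverable compute.
  • Chinese labs released 4 frontier-class open-weights models in 12 days (GLM-5.1 / MiniMax M2.7 / Kimi K2.6 / DeepSeek V4), priced below 1/3 of Opus; Anthropic's premium pricing strategy is "running out of room".
  • Concrete pilot→production breakthrough at the application layer: Broadridge moved agentic AI into financial back-office production and claims a 30% Day-1 cost reduction; however, a WRITER survey finds 79% of organizations still face adoption obstacles and only about 23–29% see significant ROI.
  • 3 new names in the people candidate pool (Mohit Bansal, LongMINT last author / Hyunji Lee, LongMINT first author / Aditya Tanna, tabular FM team), all sourced from the paper-digest 2026-05-19 author layer and none in the existing 92-person pool.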

Frontier Labs / Frontier Model Radar

May 19 is dominated by Google's I/O 2026 keynote (Mountain View, 10am PT). The headline is Gemini Intelligence, an agentic, proactive system-level assistant embedded in Android 17, alongside a new Gemini model (widely reported as a 3.x update, not 4.0) that lands roughly at GPT-5.5 level and meaningfully behind Anthropic's Claude Mythos. OpenAI shipped a US ChatGPT Pro personal-finance experience (Plaid account linking) on May 18 and consolidated products under Greg Brockman. Anthropic's reported $30B-$50B raise at up to ~$950B valuation continues to advance (expected to close by month-end), and it topped CNBC's 2026 Disruptor 50. Meta (proprietary Muse Spark), xAI (Grok 4.3 GA plus the May 15 legacy-model retirement) and Mistral (Emmi AI acquisition, a Mythos-alternative cyber model in development) round out the active set. DeepSeek and Qwen had no new release in the window; their V4 / 3.5 lines are unchanged, and DeepSeek's promotional pricing ends May 31.

  • OpenAI(product_release):OpenAI launched a US-only ChatGPT personal-finance experience in preview for Pro users on May 18, using a Plaid integration to link 12,000+ institutions (Schwab, Fidelity, Chase, Robinhood, Amex, Capital One) with a portfolio/spending/subscriptions dashboard; defaults to GPT-5.5 Thinking. Follows the April Hiro team acquisition. [src] [src]
  • OpenAI(leadership_signal):OpenAI consolidated multiple product lines under co-founder Greg Brockman as it expanded into personal finance, signalling a tighter product-org structure around its consumer applications push. [src]
  • Anthropic(other):Anthropic topped CNBC’s 2026 Disruptor 50 list (published May 19), and its reported funding round of $30B-$50B at a valuation as high as ~$950B (NYT/Bloomberg/Sherwood) continues to advance, with the round expected to close as soon as end of May; that valuation would put it ahead of OpenAI. Claude Mythos remains the benchmark leader (17 of 18 measured). [src] [src] [src]
  • Google DeepMind(product_release):Google opened I/O 2026 on May 19 (Shoreline Amphitheatre, 10am PT keynote). The headline is Gemini Intelligence, a proactive, system-level agentic assistant running inside Android 17 that performs autonomous cross-app and web tasks (the demo found a syllabus in Gmail and auto-filled a shopping cart), plus a new Gemini model reported to be a 3.x update (not 4.0) landing roughly at GPT-5.5 level and meaningfully behind Claude Mythos. Auto-browse rolls out to subscribers in late June and to phones this summer. Android XR glasses (Samsung/Warby Parker/Gentle Monster/XREAL) and Aluminium OS were also previewed. [src] [src] [src]
  • Meta(product_release):Meta’s first Superintelligence Labs model, Muse Spark (codename Avocado), continues rolling out as Meta’s most powerful model and is proprietary — a notable break from the open Llama lineage — powering the Meta AI app/site with planned integration into WhatsApp/Instagram/Facebook/Messenger and glasses. No new incremental Meta model announcement in the 48h window; 2026 AI capex guided at $115B-$135B. [src] [src]
  • xAI(product_release):xAI's Grok 4.3 (released May 6; a cost-efficient frontier model with 1M-token context, native video input, reasoning, ~53 Intelligence Index, #1 on CaseLaw v2/CorpFin, $1.25/$2.50 per M tokens) reached full API availability; eight legacy models (grok-4-fast, grok-4-0709, grok-3, grok-code-fast-1, grok-imagine-image-pro, etc.) were retired May 15 with auto-redirect to grok-4.3. xAI is also entering a CAISI pre-release government evaluation agreement. [src] [src] [src]
  • Mistral(other):Mistral acquired Vienna-based Emmi AI (large engineering models / physics-based simulation) in May, and is in talks with European banks to deploy a cybersecurity-focused model positioned as an alternative to Anthropic’s Mythos for banks lacking Mythos access; release timing unconfirmed. [src] [src]
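  • Qwen: no high-confidence incremental signal in this window (see the sources and cross-validation notes).
  • DeepSeek: no high-confidence incremental signal in this window (see the sources and cross-validation notes).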

Unverified frontier signals (watch only, not part of the day's main thread):

  • Leaked/AI-Studio-metadata benchmarks for the new Google I/O Gemini model (e.g. ~84.6% on ARC-AGI2, a ‘Gemini Omni’ video model card surfaced on Reddit) circulated pre-keynote; Google had not published official benchmarks at time of writing, so exact model name/version and scores are unconfirmed. [src] [src]
  • Reports claim Anthropic’s round could reach as high as ~$950B valuation (vs $900B in Bloomberg’s May 12 report); the higher figure and the $50B upper bound are not confirmed and no term sheet has been signed. [src]

Key People and Community Signals

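Substantive 48-hour product/research signals from tracked people are thin today. On the OpenAI side (Sam Altman / Greg Brockman), activity is mostly market/legal noise (the Musk lawsuit, disclosure of Brockman's stake worth about $30 billion, and his takeover of product strategy), which by rule does not count as signal. Andrej Karpathy's autoresearch loop and Simon Willison's notes on the Shopify PR were revisited by tech media on 5/19, but the original events date from March and are not new 48h signals. The genuinely fresh high signal is concentrated in agent long-horizon memory research: the LongMINT benchmark (Bansal group at UNC), released 5/18, was listed by paper-digest as a Top pick and flagged two new authors. Today's discovery scan produced 3 qualifying candidates (Mohit Bansal, Hyunji Lee, Aditya Tanna), none in the existing 92-person pool.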

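  • Simon Willison (simon-willison): 5/19 tech-media retrospectives again cite Simon Willison's day-by-day notes and clarifications on Karpathy's autoresearch pattern and the Shopify Liquid PR (claimed 53% speedup), including that the actual tool was the Pi TypeScript toolkit rather than Claude Code. He remains a consistently high-exposure community signal source, but the core events happened in March and are not new 48h output, so the signal is weak and recorded for reference only. [src]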

Energy

  • Behind-the-meter gas remains the fastest path to AI power: CalEthos/TerraVolt Infrastructure (8-K, May 11, 2026) signed a firm natural-gas supply agreement for 55,000 MMBtu/day to feed a 200-240 MW onsite power plant for an SE Idaho AI data-center campus, paying a $3.83M reservation fee (May 8) and committing up to $56M in letters of credit. This adds to the evidence that grid-bypassing, behind-the-meter generation (versus the 4-7 year interconnection queues flagged in the prior daily) is the working procurement model for new AI capacity in this 48h window. [src] [src] [src]
  • Power, not silicon, is now the binding AI constraint: analysis converging this week estimates up to ~11 GW of 2026 data-center capacity stuck in ‘announced, not under construction’, ~50% of global projects delayed by power/grid-equipment shortages, and high-power transformer lead times stretched to ~5 years. Hyperscalers are responding by relocating multi-$B builds to power-rich regions (Microsoft UAE $15.2B, Meta Louisiana $10B) and signing direct procurement deals to bypass the grid. [src] [src] [src]
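
As a rough sanity check on the CalEthos/TerraVolt terms, the contracted gas volume implies a full-load heat rate for the 200-240 MW plant. The sketch below assumes the full 55,000 MMBtu/day is burned evenly over 24 hours at the stated capacity; real plants run at partial load and may not use the whole reservation, so treat the result as an illustrative bound, not a disclosed figure.

```python
# Illustrative check: heat rate implied by an onsite gas plant's fuel contract.
# 55,000 MMBtu/day and 200-240 MW come from the 8-K summary above; constant
# full-load operation over 24 h is an assumption, not a disclosed term.

BTU_PER_MMBTU = 1_000_000
KW_PER_MW = 1_000

def implied_heat_rate_btu_per_kwh(gas_mmbtu_per_day: float, plant_mw: float) -> float:
    """Btu of fuel per kWh generated if `gas_mmbtu_per_day` runs a
    `plant_mw` plant at full output for a whole day."""
    fuel_btu_per_day = gas_mmbtu_per_day * BTU_PER_MMBTU
    energy_kwh_per_day = plant_mw * KW_PER_MW * 24
    return fuel_btu_per_day / energy_kwh_per_day

for mw in (200, 240):
    rate = implied_heat_rate_btu_per_kwh(55_000, mw)
    print(f"{mw} MW -> ~{rate:,.0f} Btu/kWh")
# 200 MW -> ~11,458 Btu/kWh
# 240 MW -> ~9,549 Btu/kWh
```

Roughly 9,500-11,500 Btu/kWh is the range usually associated with simple-cycle gas generation; a more efficient plant would burn less than the contracted volume at full output. That is an inference from the stated terms, not something the filing summary discloses.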

Chips

  • HBM supply tightens further and is fully spoken for: SK hynix has effectively sold out DRAM/NAND/HBM to NVIDIA through 2026 (projected ~50% HBM bit share, down from 59% as Samsung rises to ~28%). NVIDIA has asked SK hynix, Samsung and Micron to deliver denser 16-Hi HBM4 by Q4 2026 for Rubin; Samsung+SK hynix tapped as Rubin HBM4 suppliers with shipments from ~March. HBM remains the gating component for accelerator output, reinforcing the prior daily’s supply-constraint thesis. [src] [src] [src]
  • Custom-ASIC erosion of NVIDIA share continues: Meta’s MTIA 400 accelerator completed testing and entered production deployment in Meta data centers (next-gen MTIA carries more HBM for genAI inference), and Google is deliberately pitting MediaTek’s cost positioning against Broadcom’s premium ASIC pricing. Analyst framing: NVIDIA accelerator share drifting toward 55-60% as Broadcom/Marvell custom silicon fills the gap — extends the Broadcom/Marvell duopoly thread from the prior daily, now with Meta’s own silicon in production. [src] [src] [src]

Infra

  • 2026 is estimated to be the first trillion-dollar compute-capex year: the Big Four (MSFT/AMZN/GOOGL/META) at ~$725B (up 77% from $410B in 2025), Q1 alone ~$130B (3.7x Q1-2023); adding Oracle ~$50B, Apple ~$13B, neoclouds ~$60B, China ~$80B and sovereigns ~$60B brings the total to ~$1.04T. The Anthropic-Google Cloud commitment ($200B / 5 GW over 5 years, disclosed May 5) anchors demand. This raises the upper bound versus the prior daily's $690-755B hyperscaler-only range. [src] [src] [src]
  • Capex-to-realized-capacity gap widening: industry analysis at/around Data Center World 2026 projects 30-50% of planned 2026 data-center capacity slipping to 2027-2028 due to power, transformer and grid-equipment shortages — i.e. the trillion-dollar capex figure overstates near-term deliverable compute. Rack power has jumped from 30-40 kW to hundreds of kW approaching the MW range, intensifying the power/cooling bottleneck. [src] [src]
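
The capex roll-up above is simple addition, so it is worth checking. The sketch sums the component estimates as quoted: the itemized lines come to about $988B, leaving roughly $52B of the ~$1.04T headline unitemized here (presumably rounding or categories the aggregators do not break out, which is an assumption). It also applies the 30-50% slippage range to the dollar total as a crude proxy; the slippage estimate actually refers to capacity, so those outputs are illustrative only.

```python
# Arithmetic on the 2026 capex estimates quoted above (USD billions).
# Component values are the analyst figures cited in this report.

capex_2026_bn = {
    "big_four": 725,    # MSFT/AMZN/GOOGL/META
    "oracle": 50,
    "apple": 13,
    "neoclouds": 60,
    "china": 80,
    "sovereigns": 60,
}

itemized_total = sum(capex_2026_bn.values())
reported_total = 1_040                      # ~$1.04T headline figure
unitemized = reported_total - itemized_total

big_four_2025 = 410
yoy_growth = capex_2026_bn["big_four"] / big_four_2025 - 1

print(f"itemized: ${itemized_total}B, unitemized vs headline: ${unitemized}B")
print(f"Big Four YoY growth: {yoy_growth:.0%}")

# Crude proxy (assumption): apply the 30-50% capacity slippage to dollars to
# see how much of the headline might translate into 2026-deliverable compute.
for slip in (0.30, 0.50):
    print(f"slip {slip:.0%}: ~${reported_total * (1 - slip):,.0f}B delivered in 2026")
```

The 77% growth figure checks out against $410B to $725B; the gap between the itemized sum and the headline is the part a reader cannot reconstruct from this report alone.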

Model

  • Open-weights inference cost war intensifies: four Chinese frontier-class open-weights models dropped in a ~12-day window in early May 2026 — GLM-5.1 (Z.ai), MiniMax M2.7, Kimi K2.6 (Moonshot), DeepSeek V4 — all competitive on agentic coding benchmarks and running inference at <1/3 of Claude Opus pricing. This is the incremental escalation of the prior daily’s ‘5 open-weights drops in 30 days / falling inference marginal cost’ theme. [src] [src] [src]
  • Western frontier pricing under structural pressure: GPT-5.5 ($5/$30, $0.50 cached) is the new complex-coding frontier; Anthropic’s prior 67% Opus cut (Opus 4.6/4.7 at $5/$25, down from $15/$75) is now visibly the floor it can hold while open-weights run at $0.10-0.28 in/out. Commentary frames Anthropic’s premium strategy as ‘running out of room’ given the 3x-7x cost gap — pricing compression is the dominant model-layer dynamic, no major new Western frontier release in the 48h window. [src] [src] [src]
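
Per-token list prices only become comparable once a workload mix is fixed. The sketch below computes a blended cost for a hypothetical agentic workload using the Western prices quoted in this report (Opus before and after the cut, GPT-5.5 with its cached-input rate, Grok 4.3). The workload size and cache-hit rate are assumptions; only the cached rate the report quotes (GPT-5.5) is modeled, and the open-weights $0.10-0.28 range is left out because the report does not say whether it covers input, output, or a spread across models.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Price:
    """List price in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None: no cached rate modeled

def workload_cost(p: Price, input_m: float, output_m: float, cache_hit: float = 0.0) -> float:
    """USD cost for `input_m` / `output_m` million tokens; a `cache_hit` share of
    input is billed at the cached rate when one is defined, else at the full rate."""
    cached_rate = p.input_per_m if p.cached_input_per_m is None else p.cached_input_per_m
    input_cost = input_m * ((1 - cache_hit) * p.input_per_m + cache_hit * cached_rate)
    return input_cost + output_m * p.output_per_m

prices = {
    "Opus (pre-cut)": Price(15, 75),
    "Opus 4.6/4.7": Price(5, 25),
    "GPT-5.5": Price(5, 30, cached_input_per_m=0.50),
    "Grok 4.3": Price(1.25, 2.50),
}

# Hypothetical agentic workload: 20M input tokens (60% cache hits), 2M output tokens.
for name, p in prices.items():
    print(f"{name:15s} ${workload_cost(p, 20, 2, cache_hit=0.6):,.2f}")

# Opus cut quoted in the text: $15/$75 -> $5/$25 is a ~67% reduction on both sides.
cut = 1 - workload_cost(prices["Opus 4.6/4.7"], 1, 1) / workload_cost(prices["Opus (pre-cut)"], 1, 1)
print(f"Opus price cut: {cut:.0%}")
```

Under this mix GPT-5.5 ($106) comes out below Opus 4.6/4.7 ($150) despite its higher output price, because cached input dominates; with no cache hits the ordering flips ($160 vs $150). Which model is "cheaper" depends on the workload as much as on the rate card.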

Paper layer (consumed from paper-digest 2026-05-19):

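  • 2605.17894 Evaluating Cognitive Age Alignment in Interactive AI Agents - the first psychometrically grounded interactive cognitive-age benchmark, quantifying the cognitive-age gap of MLLM agents; #1 on HF trending for the day. (Market relevance: it turns "how old an agent behaves in interaction" into a measurable metric, giving consumer/education/companion agent products a compliance and positioning evaluation dimension beyond accuracy, with direct model-selection value for deployers who must tune interaction maturity to their user group.)
  • 2605.16909 TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents - an omni-modal, closed-loop tool-use benchmark built on real-world tasks; the best model reaches only about 32% task success, far below the 94% human baseline. (Market relevance: this is the day's hardest "agent credibility gap" data point. At 32% vs 94% on closed-loop omni-modal tool use, putting tool-using agents into unattended production still requires heavy human backstops, which makes this a benchmark anchor for assessing agent deployment economics and ROI.)
  • 2605.16079 VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation - internalizes agentic reasoning into instance-level video understanding, with a four-stage fully automated data-synthesis pipeline; outperforms GPT-4o. (Market relevance: the native tool-calling plus automated data-synthesis paradigm lowers the cost of collecting training data for video-understanding agents, with direct implications for the scalability and unit economics of video agent products in surveillance, media asset management and content moderation.)
  • 2605.18663 GIM: Evaluating models via tasks that integrate multiple cognitive domains - an IRT-calibrated integrated-reasoning benchmark that, for the first time at scale, quantifies the trade-off between test-time compute and model capability, finding that thinking budget and quantization matter as much as model choice. (Market relevance: by elevating thinking budget and quantization to a decision as important as model selection, it gives engineering teams who manage inference budgets and latency SLAs a quantified return curve for test-time compute, which directly affects inference bills and SKU choice.)
  • 2605.18621 CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark - a complete cross-view spatial-intelligence trio: a 1.6M dataset, an aligned model and a benchmark. (Market relevance: cross-view spatial reasoning is a bottleneck capability for embodied/spatial products such as robotics, AR/XR and multi-camera autonomous driving; having dataset, model and benchmark together means this capability line is converging from research demos toward a reproducible engineering baseline, shortening credibility-validation cycles for related products.)
  • 2605.18565 LongMINT: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems - a high-interference benchmark for long-horizon memory; 7 memory systems average only 27.9% accuracy under multi-target interference, systematically exposing weaknesses in retrieval and memory construction. (Market relevance: nearly every long-horizon agent or enterprise assistant relies on an external memory system, and 27.9% accuracy under interference shows today's memory stacks are unreliable under real multi-task noise. That is a direct risk signal for agent products that sell "long-horizon memory" and for the vector-database/memory-middleware segment.)
  • 2605.18572 MA²P: A Meta-Cognitive Autonomous Intelligent Agents Framework for Complex Persuasion - an autonomous multi-agent framework driven by a meta-cognitive configurator that automatically selects strategies across domains to reduce performance variance and raise success in complex persuasion. (Market relevance: it offers a meta-cognitive scheduling approach to cross-domain performance variance, an engineering path to stability for sales/customer-service/negotiation agents; stronger persuasion is also double-edged, drawing trust & safety and regulatory scrutiny that deployers must weigh.)
  • 2605.15572 Measuring Maximum Activations in Open Large Language Models - systematically measures maximum activation magnitudes in open LLMs and gives a deployment rule of thumb: MoE peaks are about 14–23x lower than dense models of similar size. (Market relevance: MoE peak activations an order of magnitude below dense models is a rule of thumb directly usable for quantization/low-precision deployment and memory budgeting, bearing on inference hardware choice and per-token cost; it is especially useful for deployment decisions given the current wave of open MoE releases.)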
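
Why peak activation size matters for deployment can be seen with the simplest quantization scheme. The sketch below is a generic symmetric per-tensor absmax INT8 round trip on synthetic Gaussian data with one injected outlier (NumPy assumed available); it is not the method of the paper above, and the 20 vs 400 peaks are hypothetical values chosen 20x apart to echo the 14–23x range.

```python
import numpy as np

def absmax_int8_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize x to int8 with a single per-tensor absmax scale, then dequantize."""
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=4096).astype(np.float32)

errors = {}
for peak in (20.0, 400.0):   # hypothetical low-peak vs high-peak tensors (20x apart)
    x = base.copy()
    x[0] = peak              # a single outlier sets max|x| and therefore the scale
    recon = absmax_int8_roundtrip(x)
    # Error measured only on the ordinary (non-outlier) values.
    errors[peak] = float(np.mean(np.abs(recon[1:] - x[1:])))
    print(f"peak {peak:>5.0f}: mean abs error on non-outlier values = {errors[peak]:.4f}")
```

Because the scale is max|x|/127, the 400-peak tensor gets a step size 20x coarser, so most ordinary values round to zero and their error grows by more than an order of magnitude. Per-channel scales and explicit outlier handling (not shown) are the usual mitigations; a lower peak simply makes the plain scheme viable.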

Application

  • Agentic AI crossing from pilot to production at institutional scale: Broadridge (PR May 11, coverage through May 13) put agentic AI live in production across post-trade, account management and client services (trade-fails, break resolution, valuation exceptions), claiming up to 30% Day-1 operational cost reduction and built on deployments across 40+ BPO clients since 2024. Concrete evidence the prior daily’s pilot->production bottleneck is being broken in financial operations. [src] [src] [src]
  • Deployment-services layer being built to attack the production gap: ServiceNow+Accenture launched a Forward Deployed Engineering program to scale agentic AI from enterprise pilot to production, and OpenAI launched a ~$4B deployment company with 19 investment firms/SIs/consultancies. Counter-signal: WRITER reports 79% of orgs still face adoption challenges (double-digit YoY rise), ~60% governance gap, only ~23-29% seeing significant genAI/agent ROI — adoption breadth is real but ROI realization remains thin. [src] [src] [src]
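
To make the "heavy human backstop" point concrete, here is a back-of-envelope model of expected cost per task when an agent attempts every task and failures go to a person. All dollar figures are hypothetical, and the model assumes every failure is detected; the 32% success rate is the TOBench figure, and 94% (the TOBench human baseline) is used only as a what-if for an agent that matched humans.

```python
# Unit economics for an agent with human fallback. Assumptions (hypothetical,
# not from any cited source): the agent attempts every task, a person reviews
# every output, and each failure is redone by a person at full human cost.

def cost_per_task(agent_cost: float, human_cost: float, agent_success: float,
                  review_cost: float = 0.0) -> float:
    """Expected cost per completed task: agent attempt + review on every task
    + full human handling of the failed share."""
    return agent_cost + review_cost + (1 - agent_success) * human_cost

HUMAN_COST = 10.0   # hypothetical $ per task done entirely by a person
AGENT_COST = 0.50   # hypothetical $ of inference per attempt
REVIEW_COST = 1.00  # hypothetical $ for a person to check each agent output

for success in (0.32, 0.94):
    c = cost_per_task(AGENT_COST, HUMAN_COST, success, REVIEW_COST)
    print(f"agent success {success:.0%}: ${c:.2f}/task, "
          f"{1 - c / HUMAN_COST:.0%} saving vs all-human")
# agent success 32%: $8.30/task, 17% saving vs all-human
# agent success 94%: $2.10/task, 79% saving vs all-human
```

Under these assumptions savings track the agent's success rate almost linearly, and the fixed review cost puts a floor under the cost per task; undetected failures, which the sketch ignores, would make the low-success case worse.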

Technical signals (tools/releases not covered by paper-digest):

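  • On 2026-05-07 the LightSeek Foundation released TokenSpeed, an open-source (MIT) LLM inference engine designed for long-context (>50K token) multi-turn agentic workloads. On NVIDIA B200 versus TensorRT-LLM it claims about 9% lower latency at batch size 1 and about 11% higher throughput at 100 TPS/User, and says its MLA kernel nearly halves decode latency in speculative-decoding scenarios with a long-prefix KV cache. It is currently in preview with no accompanying paper: a pure tool release not covered by paper-digest. [src]
  • On 2026-05-11 vLLM published a blog post claiming the top spot on the Artificial Analysis inference-performance leaderboard: DeepSeek V3.2 at 230 TPS output throughput (about 4x most inference providers), Qwen 3.5 397B ranked first among 12 providers with TTFT under 1s for a 10,000-token prompt, and MiniMax-M2.5 at 326 TPS at concurrency 1. The results document engineering work (kernel fusion, speculative decoding and model-specific optimizations) merged into the main repo, with no paper; this is an infrastructure/tooling advance not covered by paper-digest. [src]
  • Google I/O 2026 is scheduled for 2026-05-19 (the same day as this report), keynote at 10am PT; pre-event industry expectations were Gemini 4.0, Android XR glasses and Aluminium OS. This is a product/event-level signal, not a paper, and not covered by paper-digest; actual announcements are to be followed up by other layers after the event. [src]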
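
The TTFT and TPS figures quoted above are per-request serving metrics. The sketch below shows one common way to derive them from streamed token timestamps; conventions differ between benchmarks (for example whether TPS includes the first token, or is per user vs aggregate), so this is an assumption and not the methodology behind the vLLM or TokenSpeed numbers. The trace is synthetic.

```python
from typing import Sequence, Tuple

def ttft_and_tps(request_start: float, token_times: Sequence[float]) -> Tuple[float, float]:
    """Return (TTFT seconds, decode TPS) for one streamed response.
    TPS here = tokens after the first / time between first and last token."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    if len(token_times) < 2:
        return ttft, 0.0
    decode_time = token_times[-1] - token_times[0]
    return ttft, (len(token_times) - 1) / decode_time

# Hypothetical trace: request at t=0, first token at 0.8 s, then 326 more
# tokens evenly spaced over the following 1.0 s.
times = [0.8 + i / 326 for i in range(327)]
ttft, tps = ttft_and_tps(0.0, times)
print(f"TTFT={ttft:.2f}s, TPS={tps:.0f}")   # TTFT=0.80s, TPS=326
```

Separating TTFT (dominated by prefill, hence sensitive to prompt length such as the 10,000-token case) from decode TPS (dominated by per-token generation, where speculative decoding helps) is what lets the two kinds of claims above be compared at all.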

层间联动影响

  • Energy -> Infra -> Chips: the power bottleneck (5-yr transformer lead times, ~50% of projects delayed, ~11 GW announced-but-unbuilt) is what converts the ~$1.04T 2026 compute capex into only partially-realized capacity, with 30-50% of planned 2026 capacity slipping to 2027-2028. This means accelerator/HBM demand (NVIDIA Rubin, SK hynix sold out through 2026) is paced by megawatts available behind-the-meter, not by fab/HBM output alone — driving deals like CalEthos/TerraVolt’s 200-240 MW onsite gas plant. [src] [src] [src]
  • Chips -> Model -> Application: custom-ASIC maturation (Meta MTIA 400 in production, Broadcom/Marvell duopoly, Google playing MediaTek vs Broadcom) plus an open-weights flood (4 Chinese frontier models in 12 days at <1/3 Opus price) collapses inference marginal cost. That cost collapse is the precondition for the application layer pushing agentic AI into production economically — e.g. Broadridge’s 30% Day-1 cost-reduction agentic deployment and the ServiceNow/Accenture and OpenAI deployment-company plays only pencil out once token costs fall. [src] [src] [src]

🎙️ Podcast Updates

Candidate Pool Changes and Follow-ups

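Today's discovery scan added 3 candidates (Mohit Bansal, Hyunji Lee, Aditya Tanna), all cross-checked against paper-digest 2026-05-19 and confirmed not to be in the existing 92-person pool. The scan covered high-exposure X/Twitter threads, guests from two prefetched podcast episodes (高深远 / Freda Duan were excluded because they are already in the candidate pool), paper-digest Top picks and GitHub trending.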

New candidates today:

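  • Mohit Bansal (oss-ai-builders): senior/last author of LongMINT, the long-horizon memory-under-interference benchmark released 5/18 by a UNC Chapel Hill team (15.6k QA items, 138.8k tokens on average, up to 1.8M tokens; existing memory-augmented agents average only 27.9% accuracy). paper-digest listed it as today's Top pick #5 and flagged him as a new author; he represents the frontier of agent memory evaluation and is in neither the tracked nor the candidate pool. [src] [src]
  • Hyunji Lee (oss-ai-builders): first author of LongMINT (arXiv 2605.18565, submitted 5/18), who led the design of this new benchmark for long-horizon memory under multi-target interference, showing that current memory-augmented agents fail markedly in interference-heavy settings; a new researcher in agent memory evaluation with a concrete 48h output. [src] [src]
  • Aditya Tanna (oss-ai-builders): in the same group as Vinay Kumar Sankarapu / Pratinav Seth / Mohamed Bouadi, appears across 4 preprints on tabular foundation models and interpretability posted on the same day (5/18–5/19; arXiv 2605.18702/18696/18635/18654). paper-digest flagged him as a new author under its "appears in ≥2 distinct papers" rule; he represents the tabular FM / interpretability research cluster and is not in the existing pool. [src] [src]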

Sources and Cross-Validation Notes

Source mix: primary / official: arXiv (via paper-digest), OpenAI/xAI official migration docs, docs.x.ai, Broadridge/Accenture press releases, Goldman Sachs. company / filing: CalEthos/TerraVolt 8-K (via StockTitan/GlobeNewswire reproductions; the sec.gov original returned 403), PRNewswire. media / analysis: CNBC / Bloomberg / TechCrunch / VentureBeat / Sherwood / NotebookCheck / TweakTown / TrendForce / DataCenterKnowledge / Futurum / ghacks / theaiinsider. community / social: X.com, Substack (datacenterrichness / macromicro), marktechpost, vLLM blog, buildfastwithai, abhs.in, llm-stats, testingcatalog, Reddit, the paper-digest 2026-05-19 daily JSON (consumed by this report), subscribed podcasts (小宇宙 / Xiaoyuzhou).

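Cross-validation: Google I/O Gemini details come from pre-keynote previews and live blogs; the official model card and benchmarks were not published by cutoff, so version and scores are provisional (confidence_flag + coverage_gap). The Anthropic valuation range (Bloomberg $900B vs about $950B from other sources) and the $30–50B round size are media reports with nothing signed, and have been moved down to unverified_frontier_signals. CNBC/Bloomberg primary sources returned 403 to automated fetching and were cross-confirmed via multiple high-quality secondary sources. paper-digest Top picks are double-confirmed by its own ranking plus HF trending; this report consumes them directly (no second WebSearch of arXiv) and inherits their s2_tldr_sparse / s2_similar_unavailable / affiliations_empty flags. The paper-digest summary text mistakenly calls #2 MM-ToolBench, but structured top_picks #2 is actually TOBench (2605.16909); the 32% vs 94% figure belongs to TOBench, and the structured data takes precedence. For the CalEthos/TerraVolt energy entry the sec.gov 8-K original returned 403, so the entry relies on identical GlobeNewswire/Yahoo/StockTitan reproductions (terms consistent across sources). The trillion-dollar capex total and the 30–50% capacity slippage are analyst estimates (high confidence in direction, medium in point values); open-weights pricing/comparisons come from media synthesis cross-checked against price trackers (medium); Broadridge's 30% cost reduction is self-reported by the company.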

Coverage gaps (coverage_gaps):

  • [frontier_radar] qwen_no_incremental_signal: no new Qwen model/release in the 48h window; Qwen 3.5 (Feb 16) remains the current line, Qwen 3.7 not yet shipped
  • [frontier_radar] deepseek_no_incremental_signal: no new DeepSeek release in window; V4-Pro/V4-Flash still in preview (Apr 24), promo pricing ends May 31
  • [frontier_radar] google_io_official_model_card_pending: Gemini model name/version and official benchmarks not yet published at time of writing (keynote in progress)
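  • [paper_layer] Inherited paper-digest gap s2_similar_unavailable: the Semantic Scholar similar-paper graph returned nothing for all 135 candidates, so the further-reading/similar-work dimension is empty this issue.
  • [paper_layer] Google I/O 2026 (same day, 5-19): keynote content had not been published when this section was written; technical_signals records only the scheduled expectations, and actual product announcements need follow-up from the event layer.
  • [people_pool] OpenAI executives such as Sam Altman / Greg Brockman had only legal/market noise in the 48h window (the Musk lawsuit, stake disclosure, takeover of product strategy) and no countable product/technical signal, so by rule they are not included in tracked_people_signals.
  • [people_pool] The Karpathy autoresearch loop and the Shopify/Tobi Lütke/David Cortés PR with a 53% speedup were revisited by tech media on 5/19, but the original events took place in March 2026 and are not new 48h signals, so they were not included.
  • [people_pool] The guests of 晚点聊 #165 (高深远 / NVIDIA GEAR DreamDojo) and 张小珺 #141 (Freda Duan) are both already in the candidate pool (last_seen 2026-05-18); no collaborator with an independent 48h output who is not yet in the pool was found for nomination.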
  • [macro_news] Primary SEC 8-K filing for CalEthos/TerraVolt (sec.gov ex99-1.htm) returned HTTP 403 to WebFetch; energy entry relies on the GlobeNewswire/Yahoo/StockTitan secondary reproductions of the same press release (terms consistent across sources).
  • [macro_news] No single dated primary release in the precise May 17-19 48h window for the chips and model layers; those entries lean on early-May dated events (HBM4 supplier reports, 4-model open-weights burst) plus mid-May pricing trackers as the freshest available incremental signal.

Confidence flags (confidence_flags):

  • [frontier_radar] Google I/O details synthesized from pre-keynote previews and live-blog reporting; official Gemini model card/benchmarks pending — treat exact version/scores as provisional
  • [frontier_radar] Anthropic valuation range ($900B Bloomberg vs ~$950B other outlets) and round size ($30B-$50B) are media-reported and not finalized
  • [frontier_radar] CNBC and Bloomberg primary URLs returned 403 to automated fetch; signals corroborated via multiple secondary high-quality reports
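  • [paper_layer] inherits_paper_digest_flags: inherits paper-digest's s2_tldr_sparse (only 9 of 135 candidates have an S2 tldr and tldr_en is empty for all Top picks, so every tldr_en field in this section is left empty), s2_similar_unavailable, and affiliations_empty (neither the arXiv listing nor the HF JSON includes affiliations).
  • [paper_layer] The paper-digest summary text calls #2 "MM-ToolBench", but in the structured top_picks data #2 is actually TOBench (arxiv_id 2605.16909), and the 32.0% vs 94.0% data point belongs to TOBench; this section follows the structured top_picks.
  • [paper_layer] All technical_signals come from secondary aggregators (marktechpost / vllm blog / buildfastwithai); the TokenSpeed and vLLM performance numbers are vendor self-reported benchmarks and have not been independently verified.
  • [people_pool] Hyunji Lee and Aditya Tanna have no verifiable X handle yet; affiliations are partly annotated from arXiv/public knowledge, and identity verification is needed before promotion.
  • [people_pool] Aditya Tanna's 4 same-day preprints are identified by the arXiv IDs scraped by paper-digest; this pass did not open each paper to check the full author order.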
  • [macro_news] infra: trillion-dollar 2026 capex total (~$1.04T) and the 30-50% capacity-slippage figure are analyst/aggregator estimates (Goldman, Futurum, Omdia/DCK), not company filings — directionally high-confidence, point figures medium.
  • [macro_news] model: open-weights-vs-Opus ‘under 1/3 pricing’ and ‘3x-7x cost gap’ are media/analysis syntheses cross-checked against pricing trackers; benchmark-parity claims are vendor/secondary, medium confidence.
  • [macro_news] application: Broadridge 30% Day-1 cost-reduction figure is a company press-release claim (primary but self-reported); 79% adoption-challenge / ROI figures are WRITER survey data, medium confidence.

66 sources cited in total (full bibliography in ai.json sources[]).


Related sub-reports