Dwarkesh Patel — What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang

Group: ai
Channel: @DwarkeshPatel
Published: 2026-05-15
Duration: 2h37m
Language: en
Evidence: youtube_subtitles

TL;DR

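Dwarkesh interviews Eric Jang about rebuilding AlphaGo. Search, self-play, and learning from experience are the three primitives of intelligence, and AlphaGo is the cleanest worked example of a "seemingly uncomputable" problem being compressed into a single forward pass. At every move, MCTS directly produces a strictly better action to use as a training target, which sidesteps the credit-assignment problem LLMs face over 100K-token trajectories; human learning is closer to MCTS than to naive policy gradient. At current GPU speeds, architecture makes essentially no difference, and Transformers and ResNets are equivalent. The V100 cluster that KataGo required can now be replaced by half of a desktop Blackwell, and the small 9x9 board acts as a compute multiplier. Auto-research experiments show that LLMs can already implement experiments and run hyperparameter searches, but "choosing the next question to ask" and "getting out of research dead ends" remain unsolved.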
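The contrast between a search-derived training target and naive policy gradient can be made concrete with a minimal sketch. It assumes an AlphaZero-style setup in which search visit counts are normalized into a target distribution; the episode does not confirm that this is exactly what Eric Jang implemented. The function names and the visit counts below are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def search_target(visit_counts):
    """Normalize per-move search visit counts into a target distribution.

    Assumption: AlphaZero-style target with temperature 1.
    """
    total = sum(visit_counts)
    return [c / total for c in visit_counts]

def search_supervised_grad(logits, visit_counts):
    """Gradient of cross-entropy(target, softmax(logits)) w.r.t. logits.

    For a softmax policy this is simply p - target, so every move in
    this state gets its own per-move correction.
    """
    p = softmax(logits)
    target = search_target(visit_counts)
    return [pi - ti for pi, ti in zip(p, target)]

def reinforce_grad(logits, action, episode_return):
    """Gradient of -episode_return * log p(action) w.r.t. logits.

    The only signal is one scalar return for the whole episode, applied
    to whichever action happened to be sampled.
    """
    p = softmax(logits)
    return [
        episode_return * (pi - (1.0 if i == action else 0.0))
        for i, pi in enumerate(p)
    ]

# Uniform policy over three candidate moves in one position.
logits = [0.0, 0.0, 0.0]

# Hypothetical visit counts: search prefers move 1.
visits = [10, 70, 20]
g_search = search_supervised_grad(logits, visits)

# Policy gradient: move 0 was sampled and the game was eventually won.
g_pg = reinforce_grad(logits, action=0, episode_return=1.0)

print("search-target gradient:", g_search)
print("REINFORCE gradient:    ", g_pg)
```

Under gradient descent, the search-target gradient raises the logit of move 1, the move that search preferred in this position. The REINFORCE gradient raises the logit of move 0 only because it was sampled in a game that ended in a win. That is the credit-assignment gap the summary refers to.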

[Podcast·2026-W20] AI · X_ZVSPcZhtw

TL;DR