youtube-to-writer

Dr. Hun Lye has a YouTube channel with 913+ videos of Buddhist teaching content: dharma talks, commentary on texts, guided practice. This system converts that video library into authored written work. It scrapes transcripts, cleans them through a learned correction process, and indexes them by topic. It then researches across the full library to write long-form content (books, articles) in Dr. Lye’s voice, using a voice profile built from his existing published work.

Status: incubation — architecture built, not actively in development.

Claude · markdown only

The pipeline

No database server, no orchestration framework. Four skills and a folder of markdown files. The value is in the research step: cross-referencing topics across the 913+ talks to assemble source material that a single person couldn’t reasonably compile by hand.

SCRAPE            YouTube API → transcript extraction
                   (913+ videos, growing)
──────────────────────────────────────────────────────
CLEAN             learned keyword list, correction rules
                   normalize names, terminology, references
──────────────────────────────────────────────────────
INDEX             categorize by topic, build keyword index
                   ← searchable across full library
──────────────────────────────────────────────────────
RESEARCH          cross-reference topics across 913+ talks
                   assemble source material for new content
                   ← this is where the value is
──────────────────────────────────────────────────────
WRITE             voice profile from published work
                   → books, articles in the speaker's voice
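
The INDEX and RESEARCH stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual skills: the function names, the talk ids, and the in-memory `transcripts` dict are all assumptions (in the real system the cleaned transcripts would be markdown files on disk), and plain substring matching stands in for whatever topic categorization is actually used.

```python
from collections import defaultdict


def build_index(transcripts, keywords):
    """Map each keyword (lowercased) to the set of talk ids that mention it.

    `transcripts` is a hypothetical dict of talk id -> cleaned transcript text.
    """
    index = defaultdict(set)
    for talk_id, text in transcripts.items():
        lowered = text.lower()
        for keyword in keywords:
            # Simplification: naive substring match, no stemming or synonyms.
            if keyword.lower() in lowered:
                index[keyword.lower()].add(talk_id)
    return index


def cross_reference(index, topics):
    """Return the sorted talk ids that mention every one of the given topics."""
    matches = [index.get(topic.lower(), set()) for topic in topics]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


# Illustrative data only; these are not real transcripts.
transcripts = {
    "talk-001": "Compassion and emptiness are taught together.",
    "talk-002": "On compassion in daily practice.",
    "talk-003": "Emptiness in the Heart Sutra.",
}
index = build_index(transcripts, ["compassion", "emptiness"])
shared = cross_reference(index, ["compassion", "emptiness"])
```

Here `shared` holds only the talks where both topics appear, which is the kind of shortlist a research pass would then read in full when assembling source material.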

Why it matters

There’s a real problem here: years of teaching content locked in video format. People who prefer reading, people searching for specific topics, people who want to study a text that was discussed across twelve different talks — video doesn’t serve them well. The transcript cleanup alone is non-trivial. Buddhist terminology, Tibetan and Sanskrit names, specific lineage references — standard speech-to-text gets a lot of it wrong. The learned correction list is what makes the transcripts actually usable.
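
As a rough sketch of what applying a learned correction list might look like, assuming the list is a simple mapping from mis-transcribed forms to canonical spellings: the entries below are invented examples of plausible speech-to-text errors, not items from the project's real list, and `clean_transcript` is a hypothetical name.

```python
import re

# Hypothetical corrections: mis-transcribed form -> canonical form.
CORRECTIONS = {
    "bodhi sattva": "bodhisattva",
    "some sara": "samsara",
    "darma": "dharma",
}


def build_pattern(corrections):
    """Compile one case-insensitive pattern matching any known error."""
    # Longest keys first so multi-word phrases win over shorter overlaps.
    keys = sorted(corrections, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def clean_transcript(text, corrections=CORRECTIONS):
    """Replace each known mis-transcription with its canonical form."""
    pattern = build_pattern(corrections)
    lookup = {key.lower(): value for key, value in corrections.items()}

    def replace(match):
        fixed = lookup[match.group(0).lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if match.group(0)[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]
        return fixed

    return pattern.sub(replace, text)


raw = "Darma talks often cover some sara and the bodhi sattva vow."
cleaned = clean_transcript(raw)
```

Keeping the corrections as plain data means the list can grow as new errors are spotted, without touching the replacement logic.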