youtube-to-writer
Dr. Hun Lye has a YouTube channel with 913+ videos of Buddhist teaching content — dharma talks, commentary on texts, guided practice. This system converts that video library into authored written work. It scrapes transcripts, cleans them through a learned correction process, indexes them by topic, then researches across the full library to write long-form content — books, articles — in Dr. Lye’s voice, built from a profile of his existing published work.
Status: incubation — architecture built, not actively in development.
Claude · markdown only
The pipeline
No database server, no orchestration framework: just four skills and a folder of markdown files. The value is in the research step, which cross-references topics across the 913+ talks to assemble source material that a single person couldn’t reasonably compile by hand.
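As a rough illustration of what cross-referencing over plain markdown could look like, here is a minimal sketch. It assumes each talk is one markdown document and that topics are matched by simple keywords. The file names, the sample text, the keyword list, and the `build_index` and `cross_reference` helpers are all hypothetical, not taken from the actual system.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for cleaned transcripts. In practice these would be
# markdown files read from a folder, one per talk.
transcripts = {
    "talk-001.md": "# Talk 1\nCompassion and emptiness in daily practice.",
    "talk-002.md": "# Talk 2\nA commentary on emptiness and dependent origination.",
    "talk-003.md": "# Talk 3\nGuided practice on compassion.",
}

# Hypothetical topic keywords; the real keyword list is not given in the text.
KEYWORDS = ["compassion", "emptiness", "dependent origination"]


def build_index(docs, keywords):
    """Map each keyword to the sorted list of documents that mention it."""
    index = defaultdict(list)
    for name, text in sorted(docs.items()):
        lowered = text.lower()
        for kw in keywords:
            # Word-boundary match so a keyword is not found inside a longer word.
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                index[kw].append(name)
    return dict(index)


def cross_reference(index, *topics):
    """Return the documents that mention every one of the given topics."""
    sets = [set(index.get(topic, [])) for topic in topics]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(transcripts, KEYWORDS)
both = cross_reference(index, "compassion", "emptiness")
print(both)
```

Keeping the index as a plain dictionary built from text files fits the "no database server" constraint: it can be rebuilt from the markdown folder at any time and written back out as another markdown file.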
SCRAPE YouTube API → transcript extraction
(913+ videos, growing)
──────────────────────────────────────────────────────
CLEAN learned keyword list, correction rules
normalize names, terminology, references
──────────────────────────────────────────────────────
INDEX categorize by topic, build keyword index
← searchable across full library
──────────────────────────────────────────────────────
RESEARCH cross-reference topics across 913 talks
assemble source material for new content
← this is where the value is
──────────────────────────────────────────────────────
WRITE voice profile from published work
→ books, articles in the speaker's voice
Why it matters
The problem is real: years of teaching content are locked in video format. People who prefer reading, people searching for specific topics, and people who want to study a text that was discussed across twelve different talks are poorly served by video.
The transcript cleanup alone is non-trivial. Standard speech-to-text gets much of the Buddhist terminology, the Tibetan and Sanskrit names, and the specific lineage references wrong. The learned correction list is what makes the transcripts usable.
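To make the idea of a correction list concrete, here is a minimal sketch of how such a list might be applied to a raw transcript. The entries in `CORRECTIONS` and the `apply_corrections` helper are illustrative assumptions only. The project's actual list and rules are not shown in this description.

```python
import re

# Hypothetical correction list: mis-transcribed form -> intended term.
# These entries are invented for illustration, not taken from the real list.
CORRECTIONS = {
    "bodhi sattva": "bodhisattva",
    "nagar juna": "Nagarjuna",
    "some sara": "samsara",
}


def apply_corrections(text, corrections):
    """Replace each known mis-transcription with its corrected form."""
    # Longest keys first so a longer phrase wins over a shorter one it contains.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup from the matched text back to its correction.
    lookup = {k.lower(): v for k, v in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


raw = "Nagar Juna explains how a bodhi sattva remains in some sara."
cleaned = apply_corrections(raw, CORRECTIONS)
print(cleaned)
```

A list like this can grow over time: each newly spotted transcription error becomes one more entry, and re-running the cleanup over the whole library applies it everywhere.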