Building a GPT YouTube Summarizer with Supabase

A practical walkthrough of the product and technical decisions behind Podcastdotai, a YouTube summarizer powered by GPT and Supabase.

June 11, 2026 · GPT · Supabase · Next.js · Side Projects

I have been working on Podcastdotai, a small app that summarizes YouTube videos with GPT. The product idea is straightforward: paste in a video URL, get back a useful summary, and decide whether the full video is worth your time.

This post is a starting point for documenting how I built it, what I learned, and where I would take the architecture next.

The shape of the app

At a high level, the app needs to do four things:

Accept a YouTube URL from the user.
Fetch or derive the video transcript.
Send the transcript to an LLM with a prompt that produces a useful summary.
Store the result so the user can revisit it without paying the full generation cost again.
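The four steps above map onto one server-side function. Here is a minimal TypeScript sketch (TypeScript because the app is built on Next.js). It assumes the transcript source, the LLM call, and the database write are passed in as dependencies. `PipelineDeps`, `summarizeVideo`, and the method names are hypothetical, not Podcastdotai's actual code.

```typescript
// Hypothetical dependency interface: each pipeline step is injected,
// so the orchestration can be exercised without YouTube, an LLM, or Supabase.
interface PipelineDeps {
  fetchTranscript(youtubeUrl: string): Promise<string>;
  generateSummary(transcript: string): Promise<string>;
  saveSummary(youtubeUrl: string, transcript: string, summary: string): Promise<void>;
}

// Runs steps 2-4 for a URL the user has already submitted (step 1).
async function summarizeVideo(youtubeUrl: string, deps: PipelineDeps): Promise<string> {
  const transcript = await deps.fetchTranscript(youtubeUrl);
  if (transcript.trim().length === 0) {
    // Fail loudly instead of asking the model to summarize nothing.
    throw new Error(`No transcript available for ${youtubeUrl}`);
  }
  const summary = await deps.generateSummary(transcript);
  // Persist the result so later requests can reuse it.
  await deps.saveSummary(youtubeUrl, transcript, summary);
  return summary;
}
```

In the app, `fetchTranscript` would wrap whatever transcript source is used, `generateSummary` would call the model, and `saveSummary` would write to Supabase. Injecting them keeps the orchestration testable with small fakes.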

That makes the architecture a good fit for a Next.js app backed by Supabase. Next.js handles the UI and API routes, while Supabase provides authentication and a hosted Postgres database for persistence, without much early operational overhead.

Why store summaries?

The simplest version of this app could be stateless: the user submits a URL, the app generates a summary, and the result disappears when they leave the page.

That works for a demo, but it breaks down quickly in a real product. Storing summaries gives you a better experience and a better cost profile:

Users can return to summaries later.
Duplicate requests can reuse existing results.
The app can show history, search, and saved items.
You can inspect failures and improve prompts over time.
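Reusing existing results amounts to a read-through cache keyed by video. The sketch below assumes a hypothetical `SummaryStore` interface that a real app would back with Supabase. `MemoryStore` and `getOrCreateSummary` are illustrative names, and the in-memory store exists only so the example runs on its own.

```typescript
// Hypothetical storage interface; a real implementation would query Supabase.
interface SummaryStore {
  find(videoId: string): Promise<string | undefined>;
  save(videoId: string, summary: string): Promise<void>;
}

// In-memory stand-in used only to make this sketch self-contained.
class MemoryStore implements SummaryStore {
  private rows = new Map<string, string>();
  async find(videoId: string) { return this.rows.get(videoId); }
  async save(videoId: string, summary: string) { this.rows.set(videoId, summary); }
}

// Return a stored summary when one exists; otherwise generate, persist, and return it.
async function getOrCreateSummary(
  videoId: string,
  store: SummaryStore,
  generate: (videoId: string) => Promise<string>,
): Promise<{ summary: string; cached: boolean }> {
  const existing = await store.find(videoId);
  if (existing !== undefined) {
    return { summary: existing, cached: true }; // no model call, no generation cost
  }
  const summary = await generate(videoId);
  await store.save(videoId, summary);
  return { summary, cached: false };
}
```

This version does not stop two simultaneous requests for the same new video from both calling the model. A unique constraint on the video key in the database, or a map of in-flight requests, would be needed to close that gap.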

Even for a side project, that product loop matters.

Data model

The core database tables can stay small:

create table videos (
  id uuid primary key default gen_random_uuid(),
  youtube_url text not null,
  title text,
  transcript text,
  created_at timestamptz not null default now()
);

create table summaries (
  id uuid primary key default gen_random_uuid(),
  video_id uuid not null references videos(id),
  summary text not null,
  model text,
  created_at timestamptz not null default now()
);

You can normalize further later, but this is enough to support the first version of the product.
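On the application side, it can help to mirror the two tables as TypeScript types. The types below are hand-written (Supabase can also generate types from a database schema, which is not shown here), and `VideoRow`, `SummaryRow`, and `latestSummaryByVideo` are hypothetical names. Because `summaries.video_id` is a plain foreign key, a video can accumulate several summaries (for example, one per `model`), so the helper keeps the newest one per video.

```typescript
// Hand-written row types mirroring the SQL tables above.
// uuid and timestamptz columns arrive as strings in JSON responses.
interface VideoRow {
  id: string;
  youtube_url: string;
  title: string | null;
  transcript: string | null;
  created_at: string;
}

interface SummaryRow {
  id: string;
  video_id: string;
  summary: string;
  model: string | null;
  created_at: string;
}

// Nothing in the schema prevents several summaries per video,
// so keep only the most recent summary for each video_id.
function latestSummaryByVideo(rows: SummaryRow[]): Map<string, SummaryRow> {
  const latest = new Map<string, SummaryRow>();
  for (const row of rows) {
    const current = latest.get(row.video_id);
    // Compare parsed timestamps rather than raw strings, since offsets may differ.
    if (!current || Date.parse(row.created_at) > Date.parse(current.created_at)) {
      latest.set(row.video_id, row);
    }
  }
  return latest;
}
```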

Prompting tradeoffs

The prompt should produce summaries that are easy to scan. I prefer asking for structured sections instead of a single paragraph:

One paragraph overview.
Key ideas.
Tactical takeaways.
Notable quotes or timestamps if available.
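One way to keep that structure consistent across requests is to build the instructions from a fixed list of section headings. This is a sketch: `buildSummaryPrompt` and the exact wording are assumptions, and the resulting string would be passed to whichever chat or completion API the app uses.

```typescript
// The sections listed above, in the order the summary should present them.
const SECTIONS = [
  "Overview (one paragraph)",
  "Key ideas",
  "Tactical takeaways",
  "Notable quotes or timestamps (only if present in the transcript)",
] as const;

// Builds a hypothetical prompt string; the wording is illustrative, not tuned.
function buildSummaryPrompt(transcript: string, title?: string): string {
  const sectionList = SECTIONS.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return [
    "Summarize the following YouTube video transcript.",
    title ? `Video title: ${title}` : "", // dropped by the filter when no title is given
    "Use exactly these sections, with each heading on its own line:",
    sectionList,
    "If the transcript is incomplete or garbled, say so in the overview instead of guessing.",
    "Transcript:",
    transcript,
  ]
    .filter((part) => part.length > 0)
    .join("\n\n");
}
```

Generating the section list from one constant means the same headings can later be parsed back out of stored summaries, which matters if summaries are ever searched or displayed section by section.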

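The important part is being honest about transcript quality. A bad transcript (incomplete, garbled, or mostly non-speech) should not produce a confident summary; the prompt should ask the model to flag weak input rather than paper over it.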
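Part of that honesty can happen before the model is called at all: a cheap heuristic can reject transcripts that are too short or dominated by bracketed non-speech markers such as `[Music]`. The thresholds, the tag pattern, and the `checkTranscript` name below are assumptions for illustration, not measured values.

```typescript
type TranscriptCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical thresholds; tune them against real failures before relying on them.
const MIN_WORDS = 150;
const MAX_NON_SPEECH_RATIO = 0.3;

function checkTranscript(transcript: string): TranscriptCheck {
  // Rough word count: whitespace-separated tokens (tags count as tokens too).
  const words = transcript.split(/\s+/).filter((w) => w.length > 0);
  if (words.length < MIN_WORDS) {
    return { ok: false, reason: `only ${words.length} words` };
  }
  // Count bracketed tags such as [Music] or [Applause] that carry no spoken content.
  const tags = transcript.match(/\[[^\]]+\]/g) ?? [];
  const ratio = tags.length / words.length;
  if (ratio > MAX_NON_SPEECH_RATIO) {
    return { ok: false, reason: `non-speech tags are ${(ratio * 100).toFixed(0)}% of tokens` };
  }
  return { ok: true };
}
```

A failed check is also worth recording, since it is exactly the kind of transcript extraction failure the post suggests tracking.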

What I would improve next

The next iteration should focus on reliability:

Cache generated summaries by video URL.
Track transcript extraction failures.
Add loading and retry states that explain what is happening.
Add tests around URL parsing and summary persistence.
Make the summary format consistent enough to support search later.
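Caching by URL only works if different URLs for the same video map to the same key: `youtube.com/watch?v=ID`, `youtu.be/ID`, and `/shorts/ID` links can all point at one video, and share links often carry extra parameters such as `si=` or a start time `t=`. Here is a sketch of a normalizer that returns the video ID as the cache key. It assumes YouTube IDs are 11 characters from `[A-Za-z0-9_-]`, and the function name and the set of supported URL shapes are also assumptions. It is a natural first target for the URL-parsing tests.

```typescript
// Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Returns the video ID for common YouTube URL shapes, or null if none is found.
function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a parseable absolute URL
  }
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  const parts = url.pathname.split("/").filter((p) => p.length > 0);
  let candidate: string | null = null;

  if (host === "youtu.be") {
    candidate = parts[0] ?? null; // https://youtu.be/ID
  } else if (host === "youtube.com") {
    if (parts[0] === "watch") {
      candidate = url.searchParams.get("v"); // https://www.youtube.com/watch?v=ID
    } else if (parts[0] === "shorts" || parts[0] === "embed" || parts[0] === "live") {
      candidate = parts[1] ?? null; // /shorts/ID, /embed/ID, /live/ID
    }
  }
  // Query parameters like si= and t= are ignored, so they never affect the key.
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

The returned ID, rather than the raw URL, is what the cache would store and look up. One option is to keep it in the `videos` table under a unique constraint so Postgres itself rejects duplicate rows.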

The lesson so far is that the LLM call is only one piece of the product. The surrounding workflow, persistence, and failure handling determine whether the app feels useful or fragile.