TheCutline

How it's built

TheCutline is a local-first Python pipeline built around DSPy — a framework that treats AI behavior as an optimizable program rather than a hardcoded prompt. The result is a triage system that gets smarter the more you use it.

The pipeline

One command, one file in, one JSON report out. The pipeline runs locally on push — no server, no cron job.

flowchart TD
    A["links/YYYY-MM-DD.txt"] --> B[read_links]
    B --> C["for each URL"]
    C --> D["scrape\nhttpx + BeautifulSoup"]
    D --> E["triage\nDSPy TriageSignature"]
    E -->|Must Read / Skim| F["summarize\nDSPy SummarizeSignature"]
    E -->|Bankruptcy| G[skip]
    F --> H[sort by priority]
    G --> H
    H --> I["synthesize_lead\n1-2 sentence theme"]
    I --> J["reports/YYYY-MM-DD.json"]
    J --> K["update manifest.json"]
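
The scrape step is ordinary Python. A minimal sketch, assuming httpx and BeautifulSoup as in the diagram (the real step likely adds retries and content-type checks):

import httpx
from bs4 import BeautifulSoup

def scrape(url: str) -> str:
    resp = httpx.get(url, follow_redirects=True, timeout=20.0)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()                       # drop non-article chrome
    return soup.get_text(" ", strip=True)     # readable text for the LLM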
      

DSPy — behavior as a program

Instead of writing prompts, DSPy lets you define signatures: typed contracts that specify what goes in and what comes out. The framework handles prompt construction, retries, and output parsing. The key advantage: signatures are optimizable — you can run a feedback loop that compiles a better version of the program from examples.

TheCutline uses two signatures. TriageSignature classifies and rates each article. SummarizeSignature extracts actionable bullets for anything that cleared the Must Read or Skim bar.

flowchart LR
    subgraph TriageSignature
        T1["url, content, profile"] --> T2["DSPy Predict"]
        T2 --> T3["title, priority, rationale, tags"]
    end
    subgraph SummarizeSignature
        S1["url, content, priority"] --> S2["DSPy Predict"]
        S2 --> S3["summary_bullets"]
    end
    TriageSignature -->|priority != Bankruptcy| SummarizeSignature
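
A sketch of what the two signatures might look like in DSPy's class-based syntax; the input and output names follow the diagram, but the docstrings and field descriptions here are assumptions:

import dspy

class TriageSignature(dspy.Signature):
    """Rate an article's priority for this reader and explain the call."""
    url: str = dspy.InputField()
    content: str = dspy.InputField(desc="scraped article text")
    profile: str = dspy.InputField(desc="contents of profile.md")
    title: str = dspy.OutputField()
    priority: str = dspy.OutputField(desc="Must Read, Skim, or Bankruptcy")
    rationale: str = dspy.OutputField()
    tags: list[str] = dspy.OutputField(desc="1-3 tags from the fixed vocabulary")

class SummarizeSignature(dspy.Signature):
    """Extract actionable bullets from an article."""
    url: str = dspy.InputField()
    content: str = dspy.InputField()
    priority: str = dspy.InputField()
    summary_bullets: list[str] = dspy.OutputField()

triage = dspy.Predict(TriageSignature)
summarize = dspy.Predict(SummarizeSignature)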
      

profile.md, a plain markdown file describing the user's context, interests, and signal framework, is injected into the triage signature at load time. Changing it changes what the system considers worth reading.
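
In code, the injection can be as simple as reading the file once and passing it on every call (a sketch reusing the triage predictor above; the path and call shape are assumptions):

from pathlib import Path

PROFILE = Path("profile.md").read_text()

def triage_article(url: str, content: str):
    # Every triage call sees the same profile text; edit profile.md and
    # the next run judges articles against the new criteria.
    return triage(url=url, content=content, profile=PROFILE)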

Tags

The triage step tags each article from a fixed vocabulary of 22 topics. Using a closed set keeps the tag cloud coherent — the LLM can't invent new tags, so every tag that appears is one you can actually browse across reports.

Tags are emitted by TriageSignature alongside priority and rationale — no extra LLM call. Each article gets 1–3 tags.
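
One way to keep the set closed (a hypothetical post-call filter; the real mechanism may differ, for instance a typed Literal output) is to validate the model's tags against the vocabulary before they reach a report:

# The fixed 22-topic vocabulary lives in one place; anything the LLM
# emits outside it is dropped, so a hallucinated tag never surfaces.
ALLOWED_TAGS: frozenset[str] = frozenset(
    # ...the 22 topics go here...
)

def clean_tags(raw: list[str]) -> list[str]:
    kept = [tag for tag in raw if tag in ALLOWED_TAGS]
    return kept[:3]   # each article carries 1-3 tags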

The feedback loop

After reading a report, you can vote on the AI's calls. Those votes accumulate as training examples. Running just optimize passes them to DSPy's optimizer, which compiles a new version of the triage program and saves it to optimized/triage.json. On the next run, the pipeline loads it automatically.

flowchart LR
    A[Read report] --> B["Vote on each item"]
    B -->|agree| C[spot_on]
    B -->|too low| D[higher_priority]
    B -->|too high| E[lower_priority]
    C --> F["feedback/YYYY-MM-DD.json"]
    D --> F
    E --> F
    F --> G["just optimize"]
    G --> H["DSPy MIPROv2 optimizer"]
    H --> I["optimized/triage.json"]
    I --> J[Pipeline gets smarter]
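
A sketch of what just optimize might run underneath, assuming DSPy's MIPROv2 API; the feedback file schema and the metric are assumptions:

import json
from pathlib import Path

import dspy
from dspy.teleprompt import MIPROv2

def vote_metric(example, prediction, trace=None) -> bool:
    # A spot_on vote means the triage call matched the reader's judgment.
    # How higher/lower_priority votes are scored here is an assumption.
    return prediction.priority == example.priority

trainset = [
    dspy.Example(**vote).with_inputs("url", "content", "profile")
    for path in Path("feedback").glob("*.json")
    for vote in json.loads(path.read_text())
]

optimizer = MIPROv2(metric=vote_metric, auto="light")
compiled = optimizer.compile(dspy.Predict(TriageSignature), trainset=trainset)
compiled.save("optimized/triage.json")   # picked up automatically on the next run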
      

v1 deployment

The v1 stack keeps infrastructure surface area at zero. Git is the queue, GitHub Actions is the compute, the repo is the database, and Cloudflare Pages is the CDN. No servers to manage, no database to maintain.

flowchart TD
    A["Commit links/YYYY-MM-DD.txt"] --> B[GitHub Actions]
    B --> C["uv run main.py"]
    C --> D["reports/YYYY-MM-DD.json\nmanifest.json updated"]
    D --> E["GHA commits reports/ to main"]
    E --> F[Cloudflare Pages]
    F --> G["Astro build"]
    G --> H["site live"]
      
  • Input: Git as the queue — links/YYYY-MM-DD.txt committed to the repo. Full link history in git, diffs readable, nothing to maintain.
  • Storage: the repo itself — reports/*.json are committed artifacts. No S3, no R2, no database. Every report is auditable via git history.
  • Compute: GitHub Actions on push. Secrets live in GHA repo secrets. Free tier is ample — ~2 min per run, well within 2000 min/month.
  • Frontend: Cloudflare Pages — static files served globally from the edge. No build step complexity. Astro reads JSON at build time.

What comes next

  • Cloudflare Worker + D1 for a real feedback API (no more copy-paste JSON)
  • Automated optimize loop — weekly GHA job pulls feedback, recompiles the DSPy program
  • Cron trigger alongside push — reports generate even without a new commit
  • Google Sheets as input — zero-friction link capture from mobile