media-tsunami
The empirical layer. Extracts brand voice as executable code — cadence, vocabulary, forbidden words, exemplar sentences — serialized as a CLAUDE.md any LLM can load.
This is the math behind the magic. Real stylometric analysis, reproducible output, deterministic file format. Every other WhyStrohm skill reads from what tsunami generates.
The Problem
Conversational voice profiles (the kind LLMs generate) are non-deterministic. Run the same prompt twice, get two different voice descriptions. That is not infrastructure; that is vibes.
What It Does
- Generates a deterministic brand-config.json from a corpus of content
- Outputs cadence statistics, vocabulary clusters, exemplar sentences, forbidden words
- Same input produces same output, every time, byte-for-byte reproducible
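Byte-for-byte reproducibility mostly comes down to canonical serialization. A minimal sketch (not tsunami's actual code) of how a profile dict can be written deterministically and verified by hash:

```python
import hashlib
import json

def serialize_profile(profile: dict) -> bytes:
    # Canonical form: sorted keys, fixed separators, ASCII-only escapes.
    # Any dict with the same contents serializes to the same bytes,
    # regardless of insertion order.
    return json.dumps(
        profile, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")

profile = {"brand": "Example Co", "axes": {"authority": 78, "cadence": "short-punchy"}}
reordered = dict(reversed(list(profile.items())))  # same content, different order

a = hashlib.sha256(serialize_profile(profile)).hexdigest()
b = hashlib.sha256(serialize_profile(reordered)).hexdigest()
assert a == b  # identical bytes, identical hash
```

Hashing two runs and comparing digests is also a cheap CI check that a pipeline has stayed deterministic.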
Why tsunami exists
The other voice tools in this package are conversational — they ask an LLM to characterize the voice. That works but isn't reproducible. Different LLM runs produce different profiles for the same content.
tsunami is empirical. It computes the voice profile from the raw text using standard NLP techniques:
- TF-IDF clusters for vocabulary signatures
- Sentence length distributions for cadence
- Centroid-based selection for exemplar sentences
- Wikitext baseline comparison for forbidden words
Run it twice on the same corpus, get byte-identical output. That is what "deterministic" means.
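The techniques above can be sketched with the standard library alone. This is an illustrative toy, not tsunami's pipeline: it computes sentence-length cadence stats, a TF-IDF-like vocabulary signature, and an exemplar sentence (using distance-to-mean-length as a stand-in for a true vector-space centroid):

```python
import math
import re
from collections import Counter

def voice_stats(corpus: str) -> dict:
    """Toy empirical voice extraction (stdlib only; hypothetical sketch)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    var = sum((n - mean) ** 2 for n in lengths) / len(lengths)

    # Term frequency across the corpus, damped by an IDF-like factor
    # over sentences, so ubiquitous words rank below distinctive ones.
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    tf = Counter(t for toks in tokens for t in toks)
    df = Counter(t for toks in tokens for t in set(toks))
    n = len(sentences)
    weight = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
    signature = sorted(weight, key=weight.get, reverse=True)[:10]

    # Exemplar: the sentence closest to the corpus's average length,
    # a crude proxy for centroid-based selection.
    exemplar = min(sentences, key=lambda s: abs(len(s.split()) - mean))
    return {
        "cadence": {"mean_len": round(mean, 2), "std_len": round(math.sqrt(var), 2)},
        "signature_vocabulary": signature,
        "exemplar": exemplar,
    }
```

Everything here is pure arithmetic over the text, so two runs on the same corpus cannot disagree. That is the design property the conversational tools lack.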
What the output looks like
{
  "brand": "Insightful Recovery Solutions",
  "version": "1.2",
  "extracted_at": "2026-05-12T14:30:00Z",
  "axes": {
    "authority": 78,
    "emotional_temperature": 64,
    "proof_density": 82,
    "cadence": "short-punchy",
    "vocabulary_range": "accessible-clinical"
  },
  "signature_vocabulary": ["..."],
  "forbidden_words": ["..."],
  "exemplar_sentences": ["..."]
}
Drop that JSON into any of the other WhyStrohm skills as the shared reference. They all read the same file.
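Consuming the file from a downstream skill is a plain JSON load. A minimal sketch (field names follow the example above; `load_brand_config` is a hypothetical helper, not part of tsunami's API):

```python
import json
import os
import tempfile
from pathlib import Path

def load_brand_config(path: str) -> dict:
    """Load the shared brand-config.json with a light sanity check."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    for key in ("brand", "axes", "signature_vocabulary", "forbidden_words"):
        if key not in config:
            raise KeyError(f"brand-config.json missing required field: {key}")
    return config

# Demo: write a minimal config to a temp file, then load it back.
sample = {
    "brand": "Example Co",
    "axes": {"authority": 78},
    "signature_vocabulary": [],
    "forbidden_words": [],
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
config = load_brand_config(f.name)
os.unlink(f.name)
```

Because every skill reads the same file through the same contract, there is exactly one place a voice definition can drift: the corpus itself.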
Install
git clone https://github.com/whystrohm/media-tsunami.git
cd media-tsunami
pip install -e .
Full docs and methodology on GitHub →
How It Composes
Sits at the ground truth layer. Every conversational skill (voice-extract, audit, voice-scorer, digital-twin) can use the tsunami-generated brand-config.json as their shared reference point. One source of truth, four operators, infinite content.
Related Skills
whystrohm-voice-extract
Extract a 6-dimension voice profile from any URL. Generate 15-20 enforceable guardrails. Outputs as CLAUDE.md.
Install →
shotkit
Pre-production for founder-led video at scale. Brief becomes storyboard, shot specs, and per-generator prompts in minutes.
Install →