media-tsunami
The empirical layer. Extracts brand voice as executable code — cadence, vocabulary, forbidden words, exemplar sentences — serialized as a CLAUDE.md any LLM can load.
This is the math behind the magic. Real stylometric analysis, reproducible output, deterministic file format. Every other WhyStrohm skill reads from what tsunami generates.
The Problem
Conversational voice profiles (the kind LLMs generate) are non-deterministic. Run the same prompt twice, get two different voice descriptions. That is not infrastructure; that is vibes.
What It Does
- Generates a deterministic brand-config.json from a corpus of content
- Outputs cadence statistics, vocabulary clusters, exemplar sentences, forbidden words
- Same input produces same output, every time, byte-for-byte reproducible
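Byte-for-byte reproducibility mostly comes down to canonical serialization. A minimal sketch (not tsunami's actual code) of how a profile dict can be written deterministically and verified by hash:

```python
import hashlib
import json

def serialize_profile(profile: dict) -> bytes:
    # Canonical form: sorted keys, fixed separators, ASCII-only escapes.
    # Any dict with the same contents serializes to the same bytes,
    # regardless of insertion order.
    return json.dumps(
        profile, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")

profile = {"brand": "Example Co", "axes": {"authority": 78, "cadence": "short-punchy"}}
reordered = dict(reversed(list(profile.items())))  # same content, different order

a = hashlib.sha256(serialize_profile(profile)).hexdigest()
b = hashlib.sha256(serialize_profile(reordered)).hexdigest()
assert a == b  # identical bytes, identical hash
```

Hashing two runs and comparing digests is also a cheap CI check that a pipeline has stayed deterministic.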
Why tsunami exists
The other voice tools in this package are conversational — they ask an LLM to characterize the voice. That works but isn't reproducible. Different LLM runs produce different profiles for the same content.
tsunami is empirical. It computes the voice profile from the raw text using standard NLP techniques:
- TF-IDF clusters for vocabulary signatures
- Sentence length distributions for cadence
- Centroid-based selection for exemplar sentences
- Wikitext baseline comparison for forbidden words
Run it twice on the same corpus, get byte-identical output. That is what "deterministic" means.
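The techniques above can be sketched with the standard library alone. This is an illustrative toy, not tsunami's pipeline: it computes sentence-length cadence stats, a TF-IDF-like vocabulary signature, and an exemplar sentence (using distance-to-mean-length as a stand-in for a true vector-space centroid):

```python
import math
import re
from collections import Counter

def voice_stats(corpus: str) -> dict:
    """Toy empirical voice extraction (stdlib only; hypothetical sketch)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    var = sum((n - mean) ** 2 for n in lengths) / len(lengths)

    # Term frequency across the corpus, damped by an IDF-like factor
    # over sentences, so ubiquitous words rank below distinctive ones.
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    tf = Counter(t for toks in tokens for t in toks)
    df = Counter(t for toks in tokens for t in set(toks))
    n = len(sentences)
    weight = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
    signature = sorted(weight, key=weight.get, reverse=True)[:10]

    # Exemplar: the sentence closest to the corpus's average length,
    # a crude proxy for centroid-based selection.
    exemplar = min(sentences, key=lambda s: abs(len(s.split()) - mean))
    return {
        "cadence": {"mean_len": round(mean, 2), "std_len": round(math.sqrt(var), 2)},
        "signature_vocabulary": signature,
        "exemplar": exemplar,
    }
```

Everything here is pure arithmetic over the text, so two runs on the same corpus cannot disagree. That is the design property the conversational tools lack.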
What the output looks like
{
  "brand": "Insightful Recovery Solutions",
  "version": "1.2",
  "extracted_at": "2026-05-12T14:30:00Z",
  "axes": {
    "authority": 78,
    "emotional_temperature": 64,
    "proof_density": 82,
    "cadence": "short-punchy",
    "vocabulary_range": "accessible-clinical"
  },
  "signature_vocabulary": ["..."],
  "forbidden_words": ["..."],
  "exemplar_sentences": ["..."]
}
Drop that JSON into any of the other WhyStrohm skills as the shared reference. They all read the same file.
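Consuming the file from a downstream skill is a plain JSON load. A minimal sketch (field names follow the example above; `load_brand_config` is a hypothetical helper, not part of tsunami's API):

```python
import json
import os
import tempfile
from pathlib import Path

def load_brand_config(path: str) -> dict:
    """Load the shared brand-config.json with a light sanity check."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    for key in ("brand", "axes", "signature_vocabulary", "forbidden_words"):
        if key not in config:
            raise KeyError(f"brand-config.json missing required field: {key}")
    return config

# Demo: write a minimal config to a temp file, then load it back.
sample = {
    "brand": "Example Co",
    "axes": {"authority": 78},
    "signature_vocabulary": [],
    "forbidden_words": [],
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
config = load_brand_config(f.name)
os.unlink(f.name)
```

Because every skill reads the same file through the same contract, there is exactly one place a voice definition can drift: the corpus itself.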
Install
git clone https://github.com/whystrohm/media-tsunami.git
cd media-tsunami
pip install -e .
Full docs and methodology on GitHub →
How It Composes
Sits at the ground truth layer. Every conversational skill (voice-extract, audit, voice-scorer, digital-twin) can use the tsunami-generated brand-config.json as their shared reference point. One source of truth, four operators, infinite content.
Related Skills
whystrohm-voice-extract
Extract a 6-dimension voice profile from any URL. Generate 15-20 enforceable guardrails. Outputs as CLAUDE.md.
Install →
shotkit
Pre-production for founder-led video at scale. Brief becomes storyboard, shot specs, and per-generator prompts in minutes.
Install →