AutomateLab - Publishing Skills

0

Three composable skills that turn an AI coding agent (Claude, Cursor, Codex, Gemini, Copilot, ...) into a long-tail SEO publishing pipeline

3 skills

blog-figure-svg

Stop using stock photos. Generate accessible, lightweight SVG figures for any blog or CMS - flow diagrams, comparison bar charts, taxonomy/Venn diagrams, annotated terminal mocks, and 1600x840 OG feature cards. Hand-authored SVG (no embedded fonts, no external assets, no AI-image latency) with a consistent palette, screen-reader metadata (title + desc + aria-labelledby), and a figcaption-required handoff to the writer. Rasterizes to compressed PNG ready for Ghost, WordPress, Webflow, or any static-site generator. Built for content marketers, indie hackers, and dev-tool blogs that want unique illustrations on every post without paying a designer or burning Midjourney credits. Trigger when the user says: 'add a figure to the post', 'illustrate this comparison', 'draw a flow diagram for X', 'make a feature/OG image', or any request to produce a chart/diagram for editorial use.

# blog-figure-svg Produces SVG figures intended for blog posts: in-line illustrations (1 per ~500 body words is the rule of thumb) and a templated OG feature card. Output is a clean SVG file (the editable source) rasterized to a compressed PNG (what the post references). Every figure carries `title` + `desc` + `role="img"` so screen readers can read it. This skill is **platform-agnostic** — the SVG and PNG it produces work in any CMS (Ghost, WordPress, Webflow, Sanity) or static-site generator (Hugo, Astro, Eleventy, Jekyll, Next-MDX). It complements `seo-blog-writer` (handles the publish step for whatever platform you're on) and `blog-topic-research` (validates the topic). Use it during the **illustration step** of writing a post — after the prose is stable so the anchor sentences are final. ``` /blog-figure-svg flow "<title>" --steps "Trigger -> Filter -> HTTP -> Slack" /blog-figure-svg compare "<title>" --bars "Zapier:0.03,Make:0.015,n8n:0.008" --unit "$ per task" /blog-figure-svg taxonomy "<title>" --groups "Workflows,Agents,RPA" --notes "see references/style-examples.md" /blog-figure-svg terminal "<title>" --lines "$ npm install\nadded 42 packages" /blog-figure-svg feature "<headline>" --accent "#4F46E5" --pill "How To" ``` All variants write to `tmp/blog-drafts/<slug>-<N>-<short-name>.svg` (editable source, gitignored), then rasterize to `<slug>-<N>-<short-name>.png` (uploaded to the blog CDN). --- ## Before you start The skill expects a working directory it can write into. Default: `tmp/blog-drafts/`. The PNG rasterizer requires one of: - **ImageMagick** (`magick` command) — preferred. `magick -density 192 -background white in.svg -resize 1600x out.png`. - **rsvg-convert** — `rsvg-convert -w 1600 -b white in.svg -o out.png`. - **inkscape** (CLI) — `inkscape --export-type=png --export-width=1600 in.svg`. - **cairosvg** (Python) — `pip install cairosvg`; `cairosvg in.svg -W 1600 -o out.png`. Plus **pngquant** (or `oxipng`) for compression — typical 60-80% size reduction with no visible quality loss. Core Web Vitals and ad-network reviews (Mediavine, Raptive) care about image weight. ```bash command -v magick || command -v rsvg-convert || command -v inkscape || python3 -c "import cairosvg" 2>/dev/null \ || echo "no SVG rasterizer found - install one of magick, rsvg-convert, inkscape, cairosvg" command -v pngquant || command -v oxipng || echo "no PNG compressor - install pngquant or oxipng" ``` --- ## The three illustration shapes Match each figure to a paragraph the reader has just finished, and to **one concrete information structure**: | Shape | Use when the post... | Variant | |---|---|---| | **Comparison** | ...cites two or more numbers (prices, latencies, accuracy, counts) | `compare` (bar chart) | | **Taxonomy** | ...introduces named categories (e.g. workflow / agent / RPA, or trigger / action / filter) | `taxonomy` (Venn, hierarchy, or labelled groups) | | **Process / flow** | ...describes a "how to" sequence, integration topology, or decision tree | `flow` (horizontal flow with named steps) | | **CLI / API mock** | ...shows command output, an error message, or a config blob | `terminal` (annotated terminal mock) | | **Title card** | ...needs an OG feature image | `feature` (1600x840 templated card) | **Never plot data the post doesn't already cite.** If you can't identify even one information structure to illustrate, skip — note in the report "no figures: post is too short / too definitional." **Hard rule for editorial pipelines:** any post >=800 words needs at least 1 figure; figure count = `max(1, body_words // 500)`. Sub-800-word definitional explainers are the only legitimate zero-figure case. --- ## Palette and typography Pick from these hex values. **No new hues** — consistency across figures is the brand: | Hex | Role | |---|---| | `#3b82f6` | accent blue — primary data series | | `#fb923c` | orange — secondary series | | `#10b981` | green — tertiary / positive | | `#0b0b11` | text — titles, primary callouts | | `#475569` / `#6b7280` / `#9ca3af` | greys — secondary labels, axis ticks | | `#cbd5e1` / `#94a3b8` | light greys — gridlines, weak series | | `#fafafa` | background fill | **Typography:** `font-family="ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif"` only. **No embedded web fonts** — they fail to load in feed readers, dark-mode previews, and AMP renders. Sizes: title 20px bold, section labels 14-16px, axis labels 11-13px. **ViewBox:** `viewBox="0 0 800 <height>"` for inline figures (a sane width for most CMS content columns, including Ghost's Casper, the WordPress block editor, and Hugo / Astro defaults); `viewBox="0 0 1600 840"` for OG cards. **Do not set root `width`/`height` attributes** — let the host theme scale. --- ## SVG skeleton (every figure) ```svg <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 360" font-family="ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif" role="img" aria-labelledby="t1 d1"> <title id="t1">Short, informative title - what the figure shows</title> <desc id="d1">Long-form description for screen readers - what the bars/circles/lines depict, including all numbers shown on screen</desc> <rect width="800" height="360" fill="#fafafa"/> <!-- bars / circles / paths / labels --> </svg> ``` **Accessibility checklist:** - `role="img"` on the root `<svg>`. - `<title>` + `<desc>` referenced via `aria-labelledby` (NOT `aria-describedby` — the former covers both). - Suffix IDs with the figure number (`t1`/`d1`, `t2`/`d2`, ...) so multiple figures on one page don't collide. - `<desc>` includes every number visible in the figure (screen readers can't OCR the chart). **Honesty:** never round towards a more dramatic gap, never anchor an axis to inflate differences. If the data is "practitioner observation, not a measured study," say so in `<desc>` and in a small grey caption inside the figure. --- ## Variant: `flow` — horizontal process flow For: "how to" sequences, integration topology, decision trees. ```python # Args: title, steps (--steps "Trigger -> Filter -> HTTP -> Slack") import sys, html, pathlib title, steps_arg, out_path = sys.argv[1], sys.argv[2], sys.argv[3] steps = [s.strip() for s in steps_arg.split('->') if s.strip()] n = len(steps) assert 2 <= n <= 7, f"flow needs 2-7 steps, got {n}" W, H = 800, 240 margin_x = 60 gap = (W - 2*margin_x) / (n - 1) if n > 1 else 0 box_w, box_h = 130, 64 cy = H // 2 + 10 nodes = [] arrows = [] for i, s in enumerate(steps): cx = margin_x + i * gap x = cx - box_w / 2 y = cy - box_h / 2 nodes.append( f'<rect x="{x:.0f}" y="{y:.0f}" width="{box_w}" height="{box_h}" ' f'rx="8" fill="#fff" stroke="#3b82f6" stroke-width="2"/>' f'<text x="{cx:.0f}" y="{cy + 5:.0f}" text-anchor="middle" ' f'font-size="14" font-weight="600" fill="#0b0b11">{html.escape(s)}</text>' ) if i < n - 1: x1 = cx + box_w / 2 x2 = margin_x + (i + 1) * gap - box_w / 2 arrows.append( f'<line x1="{x1:.0f}" y1="{cy}" x2="{x2 - 8:.0f}" y2="{cy}" ' f'stroke="#6b7280" stroke-width="2" marker-end="url(#arrow)"/>' ) desc = f"Flow diagram showing steps: {' to '.join(steps)}." svg = f'''<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {W} {H}" font-family="ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif" role="img" aria-labelledby="t1 d1"> <title id="t1">{html.escape(title)}</title> <desc id="d1">{html.escape(desc)}</desc> <defs> <marker id="arrow" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M0,0 L10,5 L0,10 z" fill="#6b7280"/> </marker> </defs> <rect width="{W}" height="{H}" fill="#fafafa"/> <text x="{W//2}" y="40" text-anchor="middle" font-size="20" font-weight="700" fill="#0b0b11">{html.escape(title)}</text> {"".join(arrows)} {"".join(nodes)} </svg>''' pathlib.Path(out_path).write_text(svg, encoding='utf-8') print(f"wrote {out_path} ({n} steps)") ``` --- ## Variant: `compare` — bar chart For: numeric comparisons (prices, latencies, accuracy, counts). 2-7 bars. ```python # Args: title, bars (--bars "Zapier:0.03,Make:0.015,n8n:0.008"), unit import sys, html, pathlib title, bars_arg, unit, out_path = sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4] pairs = [] for chunk in bars_arg.split(','): label, val = chunk.split(':') pairs.append((label.strip(), float(val.strip()))) n = len(pairs) assert 2 <= n <= 7, f"compare needs 2-7 bars, got {n}" W, H = 800, 360 margin_x, margin_top, margin_bottom = 80, 70, 70 plot_w = W - 2 * margin_x plot_h = H - margin_top - margin_bottom max_v = max(v for _, v in pairs) bar_w = plot_w / (n * 1.5) gap = bar_w * 0.5 colors = ['#3b82f6', '#fb923c', '#10b981', '#94a3b8', '#6b7280', '#cbd5e1', '#475569'] bars = [] labels = [] for i, (label, val) in enumerate(pairs): h = (val / max_v) * plot_h if max_v else 0 x = margin_x + i * (bar_w + gap) y = margin_top + (plot_h - h) bars.append( f'<rect x="{x:.0f}" y="{y:.0f}" width="{bar_w:.0f}" height="{h:.0f}" fill="{colors[i % len(colors)]}"/>' f'<text x="{x + bar_w/2:.0f}" y="{y - 8:.0f}" text-anchor="middle" font-size="13" font-weight="600" fill="#0b0b11">{val:g}</text>' ) labels.append( f'<text x="{x + bar_w/2:.0f}" y="{H - margin_bottom + 24:.0f}" text-anchor="middle" font-size="13" fill="#475569">{html.escape(label)}</text>' ) desc = f"Bar chart comparing {unit}: " + ", ".join(f"{label} {val:g}" for label, val in pairs) + "." svg = f'''<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {W} {H}" font-family="ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif" role="img" aria-labelledby="t1 d1"> <title id="t1">{html.escape(title)}</title> <desc id="d1">{html.escape(desc)}</desc> <rect width="{W}" height="{H}" fill="#fafafa"/> <text x="{W//2}" y="36" text-anchor="middle" font-size="20" font-weight="700" fill="#0b0b11">{html.escape(title)}</text> <text x="{W//2}" y="56" text-anchor="middle" font-size="12" fill="#6b7280">{html.escape(unit)}</text> <line x1="{margin_x}" y1="{H - margin_bottom}" x2="{W - margin_x}" y2="{H - margin_bottom}" stroke="#cbd5e1" stroke-width="1"/> {"".join(bars)} {"".join(labels)} </svg>''' pathlib.Path(out_path).write_text(svg, encoding='utf-8') print(f"wrote {out_path} ({n} bars)") ``` --- ## Variant: `taxonomy` — labelled groups (Venn-lite) For: introducing named categories. 2-4 groups. ```python # Args: title, groups (--groups "Workflows,Agents,RPA"), notes import sys, html, pathlib, math title, groups_arg, out_path = sys.argv[1], sys.argv[2], sys.argv[3] groups = [g.strip() for g in groups_arg.split(',') if g.strip()] n = len(groups) assert 2 <= n <= 4, f"taxonomy needs 2-4 groups, got {n}" W, H = 800, 400 cx, cy = W // 2, H // 2 + 20 r = 110 colors = ['#3b82f6', '#fb923c', '#10b981', '#94a3b8'] opacity = 0.4 circles = [] labels = [] if n == 2: positions = [(cx - 60, cy), (cx + 60, cy)] elif n == 3: positions = [(cx, cy - 50), (cx - 70, cy + 40), (cx + 70, cy + 40)] else: # 4 positions = [(cx - 70, cy - 50), (cx + 70, cy - 50), (cx - 70, cy + 50), (cx + 70, cy + 50)] for i, ((x, y), label) in enumerate(zip(positions, groups)): circles.append( f'<circle cx="{x}" cy="{y}" r="{r}" fill="{colors[i]}" fill-opacity="{opacity}" stroke="{colors[i]}" stroke-width="2"/>' ) # Label outside the circle, away from center dx, dy = x - cx, y - cy mag = math.sqrt(dx*dx + dy*dy) or 1 lx = x + (dx / mag) * (r + 30) ly = y + (dy / mag) * (r + 30) labels.append( f'<text x="{lx:.0f}" y="{ly:.0f}" text-anchor="middle" font-size="15" font-weight="600" fill="#0b0b11">{html.escape(label)}</text>' ) desc = f"Taxonomy diagram showing groups: {', '.join(groups)}, with overlapping regions indicating shared concepts." svg = f'''<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {W} {H}" font-family="ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif" role="img" aria-labelledby="t1 d1"> <title id="t1">{html.escape(title)}</title> <desc id="d1">{html.escape(desc)}</desc> <rect width="{W}" height="{H}" fill="#fafafa"/> <text x="{W//2}" y="40" text-anchor="middle" font-size="20" font-weight="700" fill="#0b0b11">{html.escape(title)}</text> {"".join(circles)} {"".join(labels)} </svg>''' pathlib.Path(out_path).write_text(svg, encoding='utf-8') print(f"wrote {out_path} ({n} groups)") ``` --- ## Variant: `terminal` — annotated terminal mock For: command output, error messages, config blobs. ```python # Args: title, lines (newline-separated), out_path import sys, html, pathlib title, lines_arg, out_path = sys.argv[1], sys.argv[2], sys.argv[3] lines = lines_arg.split('\n') assert 1 <= len(lines) <= 16, f"terminal needs 1-16 lines, got {len(lines)}" W = 800 line_h = 22 H = 80 + line_h * len(lines) + 30 chrome_h = 36 rows = [] for i, ln in enumerate(lines): y = 80 + chrome_h + i * line_h # Highlight error lines red, prompt lines green color = '#fb923c' if 'error' in ln.lower() or 'fail' in ln.lower() else '#10b981' if ln.startswith('$') else '#cbd5e1' rows.append( f'<text x="32" y="{y}" font-family="ui-monospace, Menlo, Consolas, monospace" ' f'font-size="14" fill="{color}" xml:space="preserve">{html.escape(ln)}</text>' ) desc = "Terminal mock showing: " + " | ".join(ln for ln in lines if ln.strip())[:200] svg = f'''<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {W} {H}" font-family="ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif" role="img" aria-labelledby="t1 d1"> <title id="t1">{html.escape(title)}</title> <desc id="d1">{html.escape(desc)}</desc> <rect width="{W}" height="{H}" fill="#fafafa"/> <text x="{W//2}" y="36" text-anchor="middle" font-size="18" font-weight="700" fill="#0b0b11">{html.escape(title)}</text> <rect x="20" y="60" width="{W - 40}" height="{H - 80}" rx="8" fill="#0b0b11"/> <circle cx="40" cy="78" r="6" fill="#fb923c"/> <circle cx="58" cy="78" r="6" fill="#10b981"/> <circle cx="76" cy="78" r="6" fill="#94a3b8"/> {"".join(rows)} </svg>''' pathlib.Path(out_path).write_text(svg, encoding='utf-8') print(f"wrote {out_path} ({len(lines)} lines)") ``` --- ## Variant: `feature` — OG / feature card (1600x840) For: the post's hero image (Ghost `feature_image`, WordPress `featured_media`, the static-site front-matter `feature_image` field, OG previews, social cards). One per post. The card uses a tinted gradient background, a 24px grid pattern at 7% opacity, a soft radial highlight, and either a giant accent number (when the headline contains a 1-3 digit number) or a placeholder icon slot. Brand text (your wordmark, pill label) is configurable. ```python # Args: headline, accent (hex), pill (short tag like "How To"), brand_wordmark, out_path import sys, html, textwrap, re, pathlib headline, accent, pill, brand, out_path = sys.argv[1:6] # Auto-fit headline: 3-line cap on common tiers (longest tier may use 4). n = len(headline) if n <= 32: size, wrap, max_lines = 120, 14, 2 elif n <= 60: size, wrap, max_lines = 92, 20, 3 elif n <= 90: size, wrap, max_lines = 76, 26, 3 else: size, wrap, max_lines = 60, 32, 4 lines = textwrap.wrap(headline, wrap)[:max_lines] line_h = int(size * 1.15) total_h = line_h * (len(lines) - 1) + size y0 = 420 - total_h // 2 + size # vertical center inside 1600x840 tspans = "".join( f'<tspan x="120" dy="{0 if i==0 else line_h}">{html.escape(line)}</tspan>' for i, line in enumerate(lines) ) # Hero element: number-as-hero when the headline has a 1-3 digit number, # otherwise a clean geometric placeholder. Skips 4-digit matches (years). m = re.search(r'\b(\d{1,3})\b', headline) if m: hero = ( f'<text x="1500" y="640" text-anchor="end" font-family="ui-sans-serif, system-ui, sans-serif" ' f'font-weight="800" font-size="500" fill="{accent}" ' f'opacity="0.20" letter-spacing="-20">{m.group(1)}</text>' ) else: # Default placeholder icon: stacked geometric shapes hero = ( f'<g transform="translate(1190,260) scale(1.0)" fill="none" stroke="{accent}" stroke-width="7" stroke-linecap="round">' f'<circle cx="140" cy="140" r="100" opacity="0.4"/>' f'<circle cx="140" cy="140" r="60" opacity="0.6"/>' f'<circle cx="140" cy="140" r="20" fill="{accent}" opacity="1"/>' f'</g>' ) svg = f'''<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1600 840" role="img" aria-labelledby="t1 d1"> <title id="t1">{html.escape(headline)}</title> <desc id="d1">Feature card for blog post: {html.escape(headline)}. Pill label: {html.escape(pill)}.</desc> <defs> <linearGradient id="bg" x1="0" y1="0" x2="1" y2="1"> <stop offset="0%" stop-color="#F8FAFC"/> <stop offset="100%" stop-color="#E2E8F0"/> </linearGradient> <radialGradient id="hi" cx="0.15" cy="0.1" r="0.7"> <stop offset="0%" stop-color="{accent}" stop-opacity="0.18"/> <stop offset="100%" stop-color="{accent}" stop-opacity="0"/> </radialGradient> <pattern id="grid" x="0" y="0" width="24" height="24" patternUnits="userSpaceOnUse"> <path d="M 24 0 L 0 0 0 24" fill="none" stroke="{accent}" stroke-width="1" opacity="0.07"/> </pattern> </defs> <rect width="1600" height="840" fill="url(#bg)"/> <rect width="1600" height="840" fill="url(#grid)"/> <rect width="1600" height="840" fill="url(#hi)"/> <rect x="0" y="0" width="14" height="840" fill="{accent}"/> {hero} <text x="120" y="{y0}" font-family="ui-sans-serif, system-ui, sans-serif" font-size="{size}" font-weight="800" fill="#0F172A" letter-spacing="-2">{tspans}</text> <text x="120" y="760" font-family="ui-sans-serif, system-ui, sans-serif" font-size="28" font-weight="700" fill="{accent}" letter-spacing="2">{html.escape(brand.upper())}</text> <text x="1480" y="760" text-anchor="end" font-family="ui-sans-serif, system-ui, sans-serif" font-size="24" font-weight="600" fill="#475569" letter-spacing="1">{html.escape(pill)}</text> </svg>''' pathlib.Path(out_path).write_text(svg, encoding='utf-8') print(f"wrote {out_path} ({len(lines)} lines, hero={'number' if m else 'icon'})") ``` **Customising the hero icon:** replace the placeholder `<g>` block with cluster-specific iconography from your project. Keep stroke width 5-9, viewBox-relative coordinates (drawn for a 280x280 box), and stroke-only fills so the icon reads at thumbnail size in social previews. Examples (n8n nodes, code brackets, agent graph, RPA grid) are easy to author — see the `feature` script's structure. --- ## Rasterize SVG to PNG The SVG is the editable source. The blog references PNG only — most CMSes deliver PNG more reliably through their CDN than SVG. ```bash # Preferred: ImageMagick at 192 DPI (renders text at 2x for sharpness) for svg in tmp/blog-drafts/<slug>-*.svg; do png="${svg%.svg}.png" magick -density 192 -background white "$svg" -resize 1600x "$png" done # Or one of the fallbacks: rsvg-convert -w 1600 -b white in.svg -o out.png inkscape --export-type=png --export-width=1600 in.svg python3 -c "import cairosvg; cairosvg.svg2png(url='in.svg', write_to='out.png', output_width=1600)" ``` `-density 192` renders text at 2x before resize (sharpness). `-background white` prevents black halos around antialiased edges. `-resize 1600x` is the practical ceiling for a CMS content column. ### Compress before upload ImageMagick output is 200-400 KB per figure; `pngquant` typically cuts that 60-80% with no visible quality loss. ```bash for png in tmp/blog-drafts/<slug>-*.png; do pngquant --skip-if-larger --strip --output "$png" --force 256 "$png" || true done ls -lh tmp/blog-drafts/<slug>-*.png ``` If `pngquant` isn't installed, `oxipng -o 4 tmp/blog-drafts/<slug>-*.png` is a slower fallback. If neither is available, surface to the user and proceed — don't block the post on compression. ### Verify the PNG ```bash # Confirm dimensions and bit depth magick identify tmp/blog-drafts/<slug>-*.png 2>/dev/null \ || python3 -c "from PIL import Image; import sys; [print(p, Image.open(p).size) for p in sys.argv[1:]]" tmp/blog-drafts/<slug>-*.png ``` Open each PNG locally and confirm: text is sharp at 100% zoom, no missing glyphs, no black halos. --- ## Embed in the post For each figure, identify the **anchor sentence** in the draft — the closing `</p>` of the paragraph the figure should appear after. Pick a phrase distinctive enough that `str.replace` finds exactly one match. Insert with a generic `<figure>` block (renders cleanly in every major CMS theme and every static-site generator's default Markdown→HTML pipeline): ```html <figure> <img src="<uploaded-png-url-or-relative-path>" alt="<full description with all numbers and labels>" loading="lazy"> <figcaption>One sentence restating the takeaway in plain English (15-30 words).</figcaption> </figure> ``` **Caption rules:** - **Required on every figure.** No bare `<img>` and no `<figure>` without a `<figcaption>`. The `seo-blog-writer` skill's bundle validation refuses figures without captions. - One sentence, 15-30 words, restating the takeaway in plain English (not "Figure showing X" — say what the reader should conclude). - Allowed tags inside `<figcaption>`: `<a>` (with `rel="nofollow noopener"` for external), `<em>`. Nothing else. - No "Figure 1." numbering. **Alt text rules:** - Restate every label and number visible in the figure. Screen readers read alt, not the figure. - 50-200 chars. Longer than the caption. Verify each PNG URL appears exactly once in the draft: ```bash python3 -c " import pathlib, re, sys html = pathlib.Path(sys.argv[1]).read_text(encoding='utf-8') for m in re.finditer(r'src=\"([^\"]+\.png)\"', html): print(m.group(1)) " tmp/blog-drafts/<slug>.draft.html | sort | uniq -c ``` Each URL should print `1`. Zero = anchor missed; >1 = anchor matched multiple paragraphs (extend the anchor). --- ## Upload to your CMS This skill doesn't ship a CMS uploader — the `seo-blog-writer` skill handles auth and the upload endpoint for each platform it targets. After generating PNGs: - **For Ghost:** `seo-blog-writer`'s Ghost adapter exposes an image-upload snippet (POST to `/ghost/api/admin/images/upload/` with the Admin API JWT). - **For WordPress:** `seo-blog-writer`'s WordPress adapter posts to `/wp-json/wp/v2/media` with application-password auth. - **For static-site generators (Hugo, Astro, Eleventy, Jekyll, Next-MDX):** drop the PNGs into the project's static / public / assets directory and reference relative paths in the figure tag. - **For other CMSes (Webflow, Sanity, Strapi, Contentful):** write a 20-line adapter that POSTs the PNG to the platform's media endpoint, then splice the returned URL. --- ## Failure modes | Symptom | Cause | Fix | |---|---|---| | `magick: no decode delegate` on `.svg` | ImageMagick built without rsvg | Fallback: `rsvg-convert`, `inkscape`, or `cairosvg` | | Text rendered as boxes / missing glyphs in PNG | Embedded font referenced but not installed | Use only generic `ui-sans-serif, system-ui` font families; no `@font-face` | | Black halos around shapes in PNG | Antialiased SVG rendered against a transparent background | Pass `-background white` to ImageMagick | | PNG looks blurry | Rasterized at 96 DPI | Use `-density 192` (or `-w 1600` with rsvg/cairosvg) | | `aria-labelledby` ignored by screen readers | Missing `role="img"` on the root `<svg>` | Add `role="img"` — without it, the SVG is treated as a graphic group | | Feature card text overflows the 1600x840 canvas | Headline longer than ~120 chars | Truncate headline or use the longest tier (60pt, 4 lines, 32 chars/line) | | Figcaption missing on a `<figure>` | Manually pasted `<img>` not wrapped in `<figure>` | Wrap in `<figure>...<figcaption>...</figcaption></figure>` — every figure needs a caption | --- ## Companion skills - **`blog-topic-research`** — validates that a long-tail topic has real demand signals before drafting. - **`seo-blog-writer`** — drafts, scrubs, AI-SEO-audits, and publishes the post to your CMS (Ghost, WordPress, or static-site) via the platform adapter. Together, the three form a complete long-tail SEO publishing pipeline: research the topic, write the post, illustrate it, publish.

blog-topic-research

Stop writing blog posts nobody searches for. This skill builds your editorial backlog from real, verifiable user demand - never from AI vibes. It mines candidates from Google Suggest, People Also Ask, Reddit, Stack Overflow, GitHub issues, vendor forums, and changelogs; captures every signal as a citable URL with verbatim evidence; classifies each topic by post format (how-to-fix, x-vs-y, listicle, migration, release-recap, ...); checks against your existing backlog so you don't cannibalize what you already published; and outputs a writer-ready scaffold with primary sources, problem summary, confirmed fixes, version context, and FAQ variants. Built for content marketers, founders, indie hackers, and dev-tool teams who want a long-tail SEO pipeline backed by evidence instead of hallucinated keyword volumes. Trigger when the user says: 'research blog topics', 'find topics with real demand', 'expand the editorial backlog', 'research N long-tail topics', or any variant of growing a content pipeline with verified candidates.

# blog-topic-research Generates topic candidates for a blog with documented user demand. The skill exists to fight hallucinated SEO ideas: every topic it proposes must point to a URL that proves someone is asking about it. ``` research <N> topics [for cluster <C>] [--append-to <path>] ``` - `N` - number of topics to return (default 50; cap 100) - `cluster` - if the blog has cluster taxonomy, restrict to one cluster the user names - `--append-to <path>` - after presenting results, ask the user before appending accepted topics as JSON to the given path (a backlog file, a CSV, whatever the blog uses) The skill is content-only: it does no scraping of its own. It drives the agent's `WebFetch` and `WebSearch` tools to fetch sources, and (optionally) shells out to a Python similarity script for the cannibalization step. --- ## The contract For every topic the skill emits, it captures: | Field | What it is | |---|---| | `topic` | Full title shaped like a long-tail query | | `cluster` | A bucket the user defines for their blog (e.g. `n8n`, `databases`, `react-hooks`) | | `format` | One of `how-to-fix`, `how-to-connect`, `how-to-automate`, `x-vs-y`, `what-is`, `use-case`, `listicle`, `migration`, `release-recap` | | `demand_signals[]` | One or more, each with `type`, `url`, `evidence` (verbatim text), `strength` (1-3) | | `signal_score` | Sum of `strength` across all signals; topic accepted only if **>=3** | | `primary_sources[]` | At least **1** vendor doc / GitHub issue / official changelog URL | | `keywords[]` | Primary keyword + 3-5 LSI variants extracted from source text | | `commentary` | 1-2 sentences on what makes this topic specific (no fluff) | | `problem_summary` | 1-2 sentences distilling the symptom + trigger from the highest-engagement signal's body, in factual writer-voice (no marketing). Lets the writer skip re-fetching to figure out what the problem actually is. | | `confirmed_fixes[]` | Each `{kernel, source}`: a short fix kernel (one phrase, e.g. `"set N8N_PAYLOAD_SIZE_MAX=16000000"`, `"downgrade crewai to 0.113"`) plus the source URL where that fix is reported. Empty list if no fix is documented yet (still-open issues count). The writer expands kernels into prose and re-verifies. | | `version_context` | String like `"n8n 1.65+"`, `"Cursor 0.42 only"`, `"introduced in CrewAI 0.114"`, or `null` if no version qualifier applies. | | `question_variants[]` | 2-4 paraphrases of the topic that real users actually post (lifted from PAA, autocomplete depth-2, sibling forum titles). Feed the writer's FAQ block + LSI keyword spread directly. **Not invented** - every variant must trace to a captured signal or autocomplete completion. | **Hard rules:** - **No URL, no topic.** If the skill cannot cite a verifiable demand signal, the topic is dropped. No "this seems like a good topic" reasoning. - **No paraphrased evidence.** The `evidence` field is copied byte-for-byte from the source (PAA question text, GitHub issue title, Reddit post title, forum thread title). - **No invented version numbers, prices, or stats.** All numbers in the topic title must come from a source URL or be omitted entirely. - **Signal score >=3.** Each signal scores 1 (exists), 2 (engaged), or 3 (heavy engagement) per the strength tiers below. One 3-star signal is enough; three 1-stars is enough. - **Specificity floor.** Title must be either >=7 words OR contain a concrete qualifier: a version number, an error code/string, a named integration pair (`X to Y`), or a named edge case. Reject vague titles like "n8n tutorial" or "what is automation". - **Cannibalization check.** Three layers when the user supplies an existing-titles cache: (1) Jaccard token overlap >=0.6 against any cached title = drop. (2) Cosine similarity >=0.85 = drop. (3) Token-dupe (shared distinctive numeric/error-code + shared tool keyword, regardless of cosine rank) = drop. Cosine 0.75-0.85 with no token-dupe = REVIEW (kept but flagged). If the user has no cache, run only Jaccard and skip the rest with a one-line note in the summary footer. - **Substance distilled, not invented.** `problem_summary`, `confirmed_fixes[]`, `version_context`, and `question_variants[]` are derived from fetched bodies - never hallucinated. If a body doesn't mention a fix, `confirmed_fixes[]` stays empty. If no version is named, `version_context` is `null`. The writer treats these as a verified scaffold and still re-fetches at least one primary source for currency. - **No padding.** If the skill cannot reach `N` validated topics, it returns what it has and reports the shortfall. --- ## Demand-signal taxonomy A signal is valid only if its type is one of these, and the linked URL contains the verbatim evidence text: | Type | What counts | |---|---| | `paa` | "People Also Ask" question on a Google SERP. URL = the parent query SERP. Evidence = the PAA question text. | | `autocomplete` | Google Suggest entry triggered by typing a partial query. Evidence = the suggested completion. | | `reddit` | A Reddit post asking the question or a close variant on a relevant subreddit. Evidence = the post title. | | `stackoverflow` | A Stack Overflow question with the same intent. Evidence = the question title. | | `github_issue` | A GitHub issue on the tool's repo describing the problem (open or closed). Evidence = the issue title. | | `forum` | A community-forum thread (vendor or third-party). Evidence = the thread title. | | `vendor_doc` | A vendor docs page that exists because users ask the question. Evidence = the page heading. | | `trends` | A Google Trends rising query. Evidence = the query string + the rising-percent label from Trends. | **Strength tiers** (used to compute `signal_score`): | Type | 1 (exists) | 2 (engaged) | 3 (heavy) | |---|---|---|---| | `paa` | appears once | appears across >=2 parent SERPs | appears + "More questions" expansion shows >=4 follow-ups on same intent | | `autocomplete` | direct suggestion | suggestion at >=4-word depth | suggestion at >=6-word depth | | `reddit` | post exists | >=10 comments OR >=20 upvotes | >=50 comments OR >=100 upvotes | | `stackoverflow` | question exists | score >=3 OR views >=500 | score >=10 OR views >=2000 | | `github_issue` | issue exists | >=3 reactions OR >=5 comments | >=10 reactions OR >=20 comments OR linked from changelog | | `forum` | thread exists | >=5 replies | >=20 replies OR pinned / marked solution | | `vendor_doc` | page exists | page in main nav | dedicated FAQ entry or "Common errors" section | | `trends` | rising query | rising >=100% | rising >=500% or labelled "Breakout" | Engagement counts are read from the source page at fetch time. Record the count in the signal entry so the user can audit (e.g. `[strength=3, 14 reactions]`). **Does NOT count:** - The skill's own intuition. - "Common pain point" with no link. - "I've seen this on Twitter" without a specific URL. - Doc paraphrases without a source. - Topics extracted from another SEO blog without an underlying user-demand URL. --- ## Sources to mine The user supplies the source list for their blog (or asks the skill to suggest one). Generic source templates per cluster type: ### Developer-tool / SaaS clusters For each tool the blog covers, mine: - The tool's official community forum (filter to "Questions" / "Bug reports", sort by reply count). - The tool's GitHub issues, sorted by reactions desc, last 90 days. - The tool's subreddit (top of week / top of month). - Stack Overflow's tag page for the tool, sorted by views. - The vendor's docs site (changelog, recently updated pages, "Common errors" if it exists). - The vendor's blog (feature announcements - every new feature seeds new questions). ### Code-language / framework clusters - Stack Overflow tag page for the language / framework, sorted by views. - The framework's GitHub issues + discussions. - Reddit `r/<language>` and `r/<framework>`. - Recent docs changes (a doc page that was edited last week often answers a recurring question). ### Vertical clusters (e.g. `ecommerce`, `customer-support`, `marketing-ops`) - Subreddits dedicated to the vertical (`r/ecommerce`, `r/customerservice`, etc.). - Vendor template galleries (e.g. n8n's workflow gallery, Make's scenario library, Zapier's app directory) - each gallery entry is a verified user-demand signal because people search the named outcome. - Forum threads asking "how do I [outcome]" with high engagement. ### Always also mine - **Google Suggest autocomplete** for the tool / language / vertical names. - **People Also Ask** boxes on SERPs for the same. - **Google Trends** rising queries for the cluster's main terms. If the user has a specific cluster list, ask them for the source URLs before mining; if they don't, suggest a list and let them edit it. --- ## Process ### Step 1 - Inventory existing coverage (cannibalization prep) If the user has an existing backlog file (`backlog.json`, `posts.csv`, an RSS feed of published posts, whatever), ask for the path and a way to enumerate titles. The cannibalization step needs an embedding cache built from these existing titles. If a Python embedding script is available, the user can build a cache (a JSON file mapping each existing title to its OpenAI `text-embedding-3-small` vector) and point the cannibalization step at it. Typical cost is ~$0.02 per 1M tokens. If no Python / no cache, fall back to Jaccard-only cannibalization (cheap, less accurate; surfaces near-identical titles but misses synonym dupes). Also build a token-set per existing title (lowercase, alphanumeric, stopwords stripped) for the Jaccard prefilter used in Step 4. ### Step 2 - Compute cluster targets If the user has cluster weights (e.g. "30% n8n, 20% AI coding, ..."), apply them: subtract current backlog counts per cluster, then distribute `N` proportionally to the largest deficits. If `cluster <C>` was specified, allocate all `N` to that cluster. If no cluster taxonomy at all, treat the whole blog as one cluster and aim for format diversity instead (see Step 3 format matrix). ### Step 3 - Mine candidates per cluster For each cluster, walk its source list. For each source URL: 1. Fetch via `WebFetch` (or `WebSearch` for SERP-derived signals). 2. Extract candidate query strings: GitHub issue titles, PAA questions, Reddit post titles, forum thread titles, autocomplete suggestions. 3. Record the source URL + the verbatim title text + the engagement count (reactions, comments, upvotes, replies) so a strength tier can be assigned later. 4. **Mine the body, not just the title.** For each high-engagement issue / thread / SO question, fetch the body and extract: - Error strings (lines like `Error:`, `TypeError:`, `Traceback`, exception class names, stack-frame headers). - Code blocks between triple-backticks (often contain the failing snippet that names the real edge case). - Version-qualified phrases ("after upgrade to 1.65", "on Cursor 0.42", "since the v3 release"). Each extracted string becomes a candidate seed in its own right. Long-tail troubleshooting queries are usually the literal error message, which never appears in the post title. **Google-side seeds** - use `WebSearch` with these patterns and parse for autocomplete + PAA. Run the full set per cluster, not just the troubleshooting one - long-tail diversity is what stops the corpus drifting into all-`how-to-fix`: | Pattern | Format target | |---|---| | `<tool> <error string>`, `<tool> not working`, `<tool> stuck`, `<tool> fails` | `how-to-fix` | | `<tool> how to <verb>`, `<tool> connect to <other tool>` | `how-to-connect` / `how-to-automate` | | `<tool> vs <competitor>`, `<tool> or <competitor>` | `x-vs-y` | | `what is <feature>`, `<feature> explained`, `how does <feature> work` | `what-is` | | `build <outcome> with <tool>`, `<tool> for <vertical>` (`for ecommerce`, `for SaaS`, `for marketing`, `for customer support`), `<tool> agent for <task>` | `use-case` | | `best <tool category>`, `top <N> <tool category>`, `free <tool>`, `<tool> alternatives`, `<tool> templates for <vertical>` | `listicle` | | `migrate from <tool A> to <tool B>`, `switch from <tool A> to <tool B>`, `<tool A> to <tool B> migration` | `migration` | | `what's new in <tool>`, `<tool> changelog`, `<tool> <recent-version> features`, `<tool> release notes` | `release-recap` | Repeat the matrix per tool / framework / vertical in the cluster. **Per-format source pointers** (in addition to the per-cluster source list above): - `use-case` - vendor template galleries are the highest-signal source. Each gallery entry is a verified user-demand signal (people search the named outcome). Reddit threads like *"how do I [outcome] with [tool]"* with many comments count too. Drop any seed whose only signal is an existing SEO blog's listicle - that's a competitor signal, not a user-demand signal. - `listicle` - Google autocomplete depth-2 on `best <category>` and `top <N> <category>`; competitor roundup SERPs (look at what the top 3 results list). Reddit posts asking *"what's the best X for Y"* with many comments score high. - `migration` - Reddit threads with titles starting `moving from` / `switching from`; Stack Overflow questions about exporting + re-importing between two tools; vendor migration docs (the doc exists because users ask). - `release-recap` - the vendor changelogs already in the source list, but mine for **specific versions shipped in the last 90 days**. A release-recap post for `<tool> <version>` is only worth writing if (a) that version is current or one minor behind, and (b) the changelog has at least one user-facing entry, not just internal refactors. Older recaps go stale fast. Each extracted query becomes a candidate. The source is its first demand signal. ### Step 3b - Recursive autocomplete pass For each candidate that survived Step 3, append one more qualifier word and re-query Google Suggest. Keep any deeper completion that returns a new variant. This is what gets you actual long-tail vs mid-tail - one autocomplete pass surfaces "how to fix n8n webhook error", a second surfaces "how to fix n8n webhook error 404 after restart". Common qualifiers to try (try several, keep what returns a real completion): `error`, `not working`, `after update`, `stuck`, `slow`, `timeout`, `free`, `limit`, `vs <known competitor>`, `<current year>`, `self-hosted`, `cloud`, `docker` Cap at **2 recursive passes** to bound runtime. Each new completion inherits the parent's first signal and must still pass Step 4 on its own. ### Step 3c - Distill substance from the source body For each candidate that survived Step 3b, build the writer-facing scaffold by re-reading the body of the **highest-engagement** demand signal (the one that scored 2 or 3 in the strength table - usually a high-reaction GitHub issue, pinned forum thread, or popular SO question). If you already fetched the body in Step 3.4, reuse it; otherwise fetch now. Extract four fields: 1. **`problem_summary`** - 1-2 sentences in writer-voice describing the symptom + trigger. Factual, no marketing copy. Example: *"n8n's HTTP Request node returns 401 when an OAuth2 credential's access token has expired and the refresh-token grant is missing the `offline_access` scope."* Pull verbs and nouns from the body; don't paraphrase the title. 2. **`confirmed_fixes[]`** - list of `{kernel, source}` entries. Each kernel is one short phrase capturing the action (env var to set, version to downgrade to, setting toggle, code edit). The source URL is where the fix is reported (forum reply, vendor doc, GitHub commit, changelog entry). Walk the thread replies, accepted-answer block, vendor "common errors" page. **Skip noise**: ignore "have you tried restarting" or fixes contradicted by later replies. If the issue is genuinely still open with no working fix, leave `confirmed_fixes[]` empty - the writer will frame the post around "what's known so far" rather than fabricating a fix. 3. **`version_context`** - extract any version-qualified phrase the body or thread uses to scope the problem (`"after upgrade to n8n 1.65"`, `"Cursor 0.42 only"`, `"introduced in CrewAI 0.114"`, `"Power Automate desktop V2"`). If multiple versions are named, pick the one most closely tied to the failure. If the body is version-agnostic, set `null`. 4. **`question_variants[]`** - 2-4 near-paraphrases of the topic that surface in PAA boxes, autocomplete depth-2 completions, sibling forum thread titles, or SO related-questions. Each variant must be a real captured string from a source - **do not invent paraphrases**. These feed the writer's FAQ block and LSI keyword spread. If any of these can't be filled honestly from the body, leave the field at its empty default (`""`, `[]`, `null`) rather than fabricating. A sparsely-filled scaffold is more useful than a hallucinated one - the writer can re-research, but cannot un-trust a polluted blob. ### Step 4 - Classify, score, and validate For each candidate: 1. **Cluster** - derived from the tool / framework / vertical named in the topic. If the topic spans two clusters, pick the one with the higher-strength demand signal. 2. **Format** - match phrasing (evaluate in order; first match wins): - `migrate from X to Y` / `switch from X to Y` / `move from X to Y` -> `migration` - `what's new in <tool> <version>` / `<tool> changelog` / `<tool> release notes` / `<tool> v<N> features` -> `release-recap` - `best X for Y` / `top N X` / `free X` / `X alternatives` / `most popular X` -> `listicle` - `X vs Y` / `X or Y` -> `x-vs-y` - `<error string>` / `not working` / `fix` / `fails` / `broken` -> `how-to-fix` - `connect X to Y` / `integration` / `integrate X with Y` -> `how-to-connect` - `build <outcome> with <tool>` / `<outcome> using <tool>` / `<tool> for <vertical/use-case>` -> `use-case` - `automate X` / `<tool> workflow for Y` -> `how-to-automate` - `what is X` / `X explained` / `how does X work` -> `what-is` Ambiguity rules: prefer `use-case` over `how-to-automate` when the title names a concrete artefact ("daily Slack digest", "invoice extraction agent"); prefer `how-to-automate` for generic processes ("automate email triage"). Prefer `listicle` for >=3 options and `x-vs-y` for exactly 2. 3. **Cannibalization** - three-layer check against the existing-titles cache built in Step 1: - **Jaccard prefilter:** token-set overlap against existing titles; drop if >=0.6 (cheap kill on obvious dupes). - **Semantic + token-dupe check** (only if an embedding cache was built): - Cosine similarity >=0.85 against any existing title -> drop, log under `cannibalization-semantic`. - Token-dupe: a shared distinctive numeric / error-code AND a shared tool keyword between candidate and any existing title anywhere in the cache, regardless of cosine rank -> drop. (Catches substance-dupes whose surface wording differs enough that cosine ranks them low. Example: "Make AI Agent 40-second timeout" vs "Make HTTP module 40-second timeout error" - cosine 0.51, but shared `[40-second, make]` flags it.) - Cosine 0.75-0.85 with no token-dupe -> REVIEW (kept but flagged). The user decides at append time. - Cosine <0.75 with no token-dupe -> accept. - If no embedding cache, run only the Jaccard prefilter and add a one-line `cannibalization: jaccard-only` note to the summary footer so the user knows the check was reduced. 4. **Score signals** - assign each signal 1-3 per the strength tier table. If `signal_score < 3`, fetch one more SERP and look for additional signals (PAA, second forum thread, SO question, vendor doc). If still `< 3`, drop under `low-signal-score`. 5. **Specificity floor** - title must be >=7 words OR contain a concrete qualifier (version number, error code/string, named integration pair, named edge case). If neither holds, try to rewrite the title using a qualifier from the source text; if that's not possible, drop under `low-specificity`. 6. **Primary source** - find at least one vendor doc / GitHub issue / official changelog URL the writer can cite. If none, drop. 7. **Keywords** - extract the primary keyword (the topic's core noun phrase) plus 3-5 LSI variants from the source text. Do not invent variants. ### Step 5 - Output Print one block per accepted topic in this exact shape so the user can scan or pipe it: ``` [01/50] cluster=<cluster> format=how-to-fix priority=2 TOPIC: How to fix the n8n "Cannot read properties of undefined" error in Code node slug: n8n-cannot-read-properties-undefined-code-node keywords: n8n Code node error, undefined property, JavaScript error, item access, $json commentary: Specific to mistakes when accessing $json on the wrong item; vendor docs do not show the failure mode. problem: The n8n Code node throws "Cannot read properties of undefined" when the script accesses $json on an item index that doesn't exist (typically the second iteration of a for-loop reading $items[1].json with only one input item). fixes: - guard with $items[i]?.json before accessing (https://github.com/n8n-io/n8n/issues/<id>#issuecomment-<n>) - use $input.item.json instead of $items[i] when iterating (https://docs.n8n.io/code/builtin/data/) version_context: n/a question_variants: - "n8n Code node Cannot read properties of undefined reading 'json'" - "n8n JavaScript error TypeError undefined json" - "Code node loop fails when item missing" demand: (signal_score=4) - github_issue: https://github.com/n8n-io/n8n/issues/<id> ("Code node throws Cannot read properties of undefined") [strength=3, 14 reactions] - reddit: https://www.reddit.com/r/n8n/comments/<id> ("Help: Code node error when looping over items") [strength=1] sources: - https://docs.n8n.io/code/builtin/data/ (n8n docs - Built-in data variables) ``` If `confirmed_fixes` is empty (open issue with no working fix), print `fixes: (none documented yet - frame as 'what's known so far')` instead of an empty list. If `version_context` is null, print `version_context: n/a`. If `question_variants` is empty, omit the line entirely (rare but valid). If the topic landed in the REVIEW band of the cannibalization check, append a `dedup_review` line so the user can decide knowingly: ``` dedup_review: cosine=0.81 vs "n8n Code node returns undefined when reading $json" (backlog-queued) ``` Then a summary footer: ``` Requested: 50 Validated: <X> Dropped: <Y> (cannibalization-jaccard: <a1>, cannibalization-semantic: <a2>, cannibalization-token-dupe: <a3>, low-signal-score: <b>, no primary source: <c>, off-cluster: <d>, low-specificity: <e>) Cluster mix: <cluster1>=__ <cluster2>=__ ... Format mix: how-to-fix=__ how-to-connect=__ how-to-automate=__ x-vs-y=__ what-is=__ use-case=__ listicle=__ migration=__ release-recap=__ ``` If `--append-to <path>` was passed: print the diff (count + cluster mix changes), ask `Append <X> topics to <path>? [y/N]`. On `y`, write each topic as JSON to the target path with: ```json { "id": "<slug>", "topic": "<title>", "cluster": "<cluster>", "format": "<format>", "priority": 100, "status": "queued", "tags": ["<format-tag>", "<tool-tag>"], "notes": "<commentary>", "added_at": "<YYYY-MM-DD>", "research_proof": { "demand_signals": [...], "primary_sources": [...], "keywords": [...], "problem_summary": "<1-2 factual sentences from the highest-engagement signal's body>", "confirmed_fixes": [ {"kernel": "<short fix phrase>", "source": "<url>"}, {"kernel": "<short fix phrase>", "source": "<url>"} ], "version_context": "<version qualifier or null>", "question_variants": ["<variant 1>", "<variant 2>", "<variant 3>"] }, "published_slug": null, "published_at": null } ``` The `research_proof` blob is preserved on the backlog entry while `status = "queued"` so a downstream writer skill can read it later (saves re-research). When the topic ships, strip the proof blob to keep the backlog file from ballooning at scale; the published post carries the citations. --- ## Anti-hallucination guardrails - **WebFetch / WebSearch only.** Never fabricate a URL. If a fetch fails, drop the candidate - do not invent the response. - **Verbatim evidence.** Each `evidence` string is copied byte-for-byte. No paraphrase. Length cap 120 chars; truncate with `...` if longer. - **No invented numbers.** Version numbers, prices, error codes in the topic title must appear in at least one source URL. If unsure, drop them from the title. - **Date-bound the run.** Every source URL must resolve on the run date. If 404 / removed, the signal is invalid even if the URL was good last week. - **Signal score >=3, strictly enforced.** Cumulative strength below 3 goes to `low-signal-score`. The strength tier table is the only authority - do not invent your own scoring. - **Specificity floor enforced.** Vague titles without a concrete qualifier go to `low-specificity`. Long-tail is proven by *what's in the title*, not by hand-waving "this is specific". - **Cluster discipline.** Stick to the user's cluster taxonomy. Never invent a new cluster mid-run; if a topic doesn't fit, drop it or ask the user. - **Allowed-tag check.** If the blog has a tag allowlist, surface any new implied tag to the user before accepting - do not silently expand the tag set. - **No commercial-intent topics by default.** Reject SaaS reviews, lead-gen queries, affiliate-driven comparisons unless the user explicitly says they want them. Display-ad RPM is poor for these and they get out-bid by affiliate sites anyway. --- ## What this skill does NOT do - **Does not draft posts.** Pair with a writer skill downstream. - **Does not estimate search volume in absolute numbers** (no Ahrefs / SEMrush / Keyword Planner). Demand is qualitative, proven by source URLs - not by a hallucinated `5400 / mo` figure. - **Does not modify the blog's strategy docs.** Cluster targets and weights are the user's call. - **Does not auto-publish.** With `--append-to <path>` it adds to a backlog as `queued`; the user picks what gets written next. - **Does not run on a schedule by itself.** Pair with a scheduling skill if you want a weekly or monthly research run. --- ## One-command summary ``` research <N> topics [for cluster <C>] [--append-to <path>] ``` 1. Refresh / build an embedding cache from existing titles, if the user has one. 2. Compute per-cluster target counts. 3. Mine candidates from the cluster source list, capturing verbatim text + URL + engagement count per signal. Mine issue / thread bodies for error strings and version-qualified phrases, not just titles. 4. Recursively expand surviving candidates through Google Suggest (up to 2 passes) to push from mid-tail into long-tail. 5. Distill substance per surviving candidate from the highest-engagement signal's body: `problem_summary`, `confirmed_fixes[]`, `version_context`, `question_variants[]`. Empty defaults are valid; never fabricate. 6. Validate each: format match, cluster fit, three-layer cannibalization (Jaccard >=0.6 OR cosine >=0.85 OR token-dupe = drop; cosine 0.75-0.85 = REVIEW), `signal_score >= 3`, specificity floor, >=1 primary source, real keywords. 7. Print structured per-topic blocks + summary footer. 8. If `--append-to <path>`, confirm with user, then write to the backlog file with the proof blob preserved.

seo-blog-writer

Turn a single long-tail query into a publish-ready blog post that ranks in search and gets quoted by AI assistants. Runs the full pipeline: classify the topic, research it against real sources, draft clean HTML, scrub LLM-tell vocabulary and typography, audit for AI-SEO (TL;DR block, query-phrased H2s, FAQ section, FAQPage + BreadcrumbList + HowTo JSON-LD), then publish through a platform adapter (Ghost Admin API, WordPress REST, or static-site file output). Platform-agnostic core; swap the publish step without rewriting the writing pipeline. Built for indie hackers, founders, and content marketers who want AI to draft posts that are actually citable - not paraphrased docs, not hallucinated benchmarks. Trigger when the user says: 'write a blog post on X', 'draft an article about X', 'publish a post on X to Ghost / WordPress / the static site', or any request to ship editorial content for a long-tail query.

# seo-blog-writer End-to-end pipeline for shipping a single long-tail blog post: **topic -> research -> draft -> scrub -> AI-SEO audit -> publish**. Designed for SEO and AI-citation extractability (FAQ blocks, BreadcrumbList + FAQPage + HowTo schema, query-phrased headings). The **writing pipeline is platform-agnostic** — it produces a publish-ready bundle (clean HTML, slug, meta, JSON-LD blocks, feature-image alt). The **publish step is pluggable**: out-of-the-box adapters for Ghost Admin API, WordPress REST, and static-site file output. Adding another CMS (Webflow, Sanity, Strapi, Contentful, Hugo, Astro) is a matter of writing a 20-line POST snippet. The skill takes **one required argument**: the topic. Optional flags control the publish target and state. ``` /seo-blog-writer <topic> /seo-blog-writer <topic> --target ghost # publish via Ghost adapter /seo-blog-writer <topic> --target wordpress # publish via WordPress REST /seo-blog-writer <topic> --target static --out posts/ # write files into a static-site repo /seo-blog-writer <topic> --target ghost --publish # actually publish (default: draft) /seo-blog-writer <topic> --target ghost --publish-at <ISO> # schedule for future publish /seo-blog-writer <topic> --angle "<angle>" # narrow the angle ``` Default state is **draft** — the post lands in the platform's editor for human review before going live, unless `--publish` or `--publish-at` is passed. `--publish-at` accepts an ISO 8601 UTC timestamp (e.g. `2026-05-10T07:42:00Z`) and is mutually exclusive with `--publish`. Default `--target` is `static` — writes a self-contained HTML file + a `metadata.json` next to it so you can wire any platform yourself. --- ## Before you start — preflight The platform-agnostic checks: ```bash # 1. Python available (rasterizer, scrubber, schema builder) command -v python3 # 2. Working directory writable mkdir -p tmp/blog-drafts && touch tmp/blog-drafts/.touch && rm tmp/blog-drafts/.touch ``` **3. (Optional) ai-seo MCP — check before continuing** Check whether the current agent session has access to a tool named `audit_page` from the ai-seo-mcp server (`@automatelab/ai-seo-mcp`). That MCP provides a programmatic citation-worthiness and schema score that Step 5 uses automatically when available. - **If the MCP is connected:** nothing to do — Step 5 will call `audit_page` automatically. - **If the MCP is not connected:** ask the user: > "The **ai-seo MCP** (`@automatelab/ai-seo-mcp`) is not connected. Step 5 can run a programmatic citation-worthiness and schema score on your draft in addition to the manual audit. To install it: > ``` > npx -y @automatelab/ai-seo-mcp > ``` > then register it in your MCP config. See the [ai-seo-mcp README](https://github.com/AutomateLab-tech/ai-seo-mcp) for one-line configs for Claude Code, Cursor, and Cline. Type **skip** to continue with the manual-only audit." Wait for the user's response before continuing to Step 0. Any response other than a config/install action counts as skip — proceed without the MCP. Platform-specific credential checks live in the per-adapter sections at the end of this skill. The writing pipeline (Steps 0-7) runs without any platform credentials — credentials are only needed at Step 8. --- ## Step 0 — Parse and classify the topic The topic is the one thing the skill cannot invent. It must arrive as an argument. | Shape | Example | Treatment | |---|---|---| | **Long-tail how-to** | `"how to fix n8n HTTP Request 401 error"` | Ideal. Format = troubleshooting (template 1). | | **Integration walk-through** | `"how to connect Airtable to Slack with Zapier"` | Format = integration (template 2). | | **Workflow tutorial** | `"automate invoice processing with Make"` | Format = workflow tutorial (template 3). | | **Comparison** | `"Zapier vs Make vs n8n"` | Format = comparison (template 4). | | **Definition / explainer** | `"what is an AI agent"` | Format = explainer (template 5). | | **Use case / outcome** | `"build a daily Slack digest from RSS with n8n"` | Format = use-case (template 6). | | **Listicle / roundup** | `"12 best n8n templates for marketing teams"` | Format = listicle (template 7). | | **Migration guide** | `"migrate from Zapier to n8n"` | Format = migration (template 8). | | **Release recap** | `"what's new in n8n 1.80"` | Format = release-recap (template 9). | | **Too vague** | `"AI"`, `"automation"` | **Stop.** Ask the user to narrow it. Suggest 2-3 candidate long-tail variants. | If `--angle` was passed, append it to the topic. The classification picks the structural template used in Step 3. --- ## Step 1 — Research The piece must be specific. Real version numbers, real error messages, real screenshots — not generic "best practices." ### 1a. Identify the search intent What does someone typing this query want? One sentence — the implicit desire behind the words. - `"how to fix n8n HTTP 401"` -> wants the exact change to make in the UI to stop the error - `"Zapier vs Make"` -> wants a quick decision, then a longer breakdown - `"what is an AI agent"` -> wants a one-paragraph explanation, then how it differs from a workflow If you can't write one sentence describing the intent, the topic is too vague — go back to Step 0. ### 1b. Seed search and SERP teardown ``` WebSearch("<topic>") WebSearch("<topic> <current-year>") # force a fresh lens ``` Extract three structured signals from the page-1 results: 1. **Word count distribution** — eyeball the top 5 results' length. Target 1.1–1.3x the median, not the longest. If the median is 600 words, don't write 1500 — that's padding. 2. **People Also Ask boxes** — Google surfaces 4-8 PAA questions for most queries. These are free FAQ content. Capture verbatim into the FAQ-variant list. 3. **Currently-winning featured snippet** — if there is one, note its format (paragraph, list, table). Write the lead paragraph in that exact shape; that's how you challenge for the snippet. Goal: write something **more specific or more current** than the existing top results, not a paraphrase. ### 1c. Deep fetch Pick **2-4 URLs** from the SERP. Prioritize: - **Vendor docs** — primary sources for the tool being discussed. - **GitHub issues / changelogs** — for "fix X error" topics, the actual issue thread is gold. - **Reddit / community forums** — for confirming a workaround actually works in the wild. - **Existing top-ranked posts** — to see the bar you're clearing. ``` WebFetch(url, "Return the full article body as clean prose. Include code snippets, error messages, and screenshot references verbatim. Do NOT summarize.") ``` Skip SEO-farm rewrites and listicles with no specifics. ### 1d. Five-question gate before drafting Before writing, you must be able to answer all five. 1. **What is the exact query intent?** (one sentence from 1a) 2. **What is the direct answer?** (one to two sentences — the lead paragraph in compressed form) 3. **What's the canonical primary source?** (vendor doc, GitHub issue, official changelog — at least one URL) 4. **What's the gotcha most existing posts miss?** (the specific detail that makes this post worth writing). **Hard rule:** if the honest answer is "nothing, I'm summarizing the docs," **abort and tell the user**. A doc paraphrase will rank below the actual docs. 5. **What 3-6 follow-on questions belong in the FAQ?** (long-tail variations of the main query, ideally lifted from the PAA boxes captured in 1b) If any answer is `?`, keep researching or ask the user for a specific source. ### 1e. Save research artifacts ```bash mkdir -p tmp/blog-drafts # <slug> = kebab-case of the topic, e.g. n8n-http-401-fix ``` Files (gitignored): - `tmp/blog-drafts/<slug>.research.md` — 5-question answers, source list, key quotes - `tmp/blog-drafts/<slug>.interlinks.json` — written in Step 1f (outbound interlink targets) - `tmp/blog-drafts/<slug>.draft.html` — written in Step 3 - `tmp/blog-drafts/<slug>.schema.html` — written in Step 7b (JSON-LD `<script>` blocks) - `tmp/blog-drafts/<slug>.metadata.json` — written in Step 7f (title, slug, tags, meta, etc.) - `tmp/blog-drafts/<slug>.refresh.json` — written in Step 7h (versions, prices, years cited; for future refresh runs) ### 1f. Outbound interlinks (recommended; required for >800-word posts) Pick **2-3 prior posts** on the same site whose topic genuinely overlaps with this one. Bake the links into the draft in Step 3 on topical noun phrases (not "see this post"). Internal links don't carry `nofollow`; outbound links to other domains do (see Step 3 link policy). Where the candidate list comes from depends on the platform: - **Ghost** — `GET /ghost/api/admin/posts/?limit=all&filter=status:published&fields=id,slug,title,published_at,custom_excerpt&order=published_at%20desc` (same `GHOST_ADMIN_KEY` Step 8 uses). - **WordPress** — `GET /wp-json/wp/v2/posts?per_page=100&_fields=id,slug,title,date,excerpt&orderby=date&order=desc` (same `WP_APP_PASSWORD` Step 8 uses). - **Static-site** — read the SSG's content directory directly (`ls content/posts/*.md`) or maintain a hand-curated `posts-inventory.json` in the repo. Save the chosen targets so Step 3 can splice them in and Step 7g can verify they survived the audit: ```bash cat > tmp/blog-drafts/<slug>.interlinks.json <<'EOF' { "outbound": [ {"slug": "<prior-slug-1>", "url": "https://<your-host>/<prior-slug-1>/", "anchor_phrase": "<noun phrase>"}, {"slug": "<prior-slug-2>", "url": "https://<your-host>/<prior-slug-2>/", "anchor_phrase": "<noun phrase>"} ] } EOF ``` Step 7g verifies that every `outbound[].url` appears at least once as an `href` in the final draft. If you decided mid-draft to drop a link, edit the file before re-running 7g. Posts under 800 words can skip this step; long posts ship with outbound links or they look orphaned to both the reader and the site graph. > **Note on inbound links.** Editing prior posts after publish to add a forward link back to the new one (inbound splicing) is a separate concern that depends on having write access to historical posts and a state file to keep the operation idempotent. This skill does not handle it — too platform-specific to generalize. If you want it, run it as a cron against your platform's API after publish. --- ## Step 2 — Pick the format and length band Each query type maps to a structural template: | Format | Length band | |---|---| | `how-to-fix` (troubleshooting) | 600-1200 | | `how-to-connect` (integration) | 1000-1500 | | `how-to-automate` (workflow) | 1000-1500 | | `x-vs-y` (comparison) | 1200-1500 | | `what-is` (explainer) | 600-1200 | | `use-case` (outcome) | 1000-1500 | | `listicle` (roundup) | 1500-2500 | | `migration` | 1200-1800 | | `release-recap` | 800-1400 | **Hard length range: 600-1500 words for most formats.** Word count = prose inside `<p>` tags + heading text. Excludes code blocks, table cells, figcaptions. Use the SERP word-count signal from Step 1b to pick a target inside the band (1.1–1.3x the SERP median). Under the floor means the answer is genuinely too thin — add an FAQ expansion, a "common errors" section, or a "how to verify" section. Over the ceiling means the post is sprawling — cut the weakest section. **Never pad to hit a floor.** Google rewards directness; AI Overviews preferentially extract from concise answers. --- ## Step 3 — Draft the post Write directly in HTML. Allowed tags: `<p>`, `<h2>`, `<h3>`, `<a>`, `<strong>`, `<em>`, `<code>`, `<pre>`, `<blockquote>`, `<ul>`, `<ol>`, `<li>`, `<table>`, `<thead>`, `<tbody>`, `<tr>`, `<th>`, `<td>`, `<figure>`, `<figcaption>`, `<img>`. No inline styles. No `<div>`, no `<span>`, no `<br>`. No H1 (most platforms emit the post title as H1; emitting your own creates a duplicate). ### Link policy — internal vs. outbound, follow vs. nofollow | Destination | `rel` attribute | |---|---| | Your own blog (other posts on the same host) | none — internal, follow | | Anything else (vendor docs, GitHub, news, social, all third-party) | `rel="nofollow noopener"` | Do not use `target="_blank"` — most blog themes handle outbound link UX themselves. Set `CANONICAL_HOST=blog.example.com` in the shell before running the audit in Step 5 so the validator knows which links are internal. ### Voice checks while drafting - **Open with a TL;DR block.** First child of the body is `<p><strong>TL;DR:</strong> ...</p>` — a single sentence, 8-40 words, that answers the query directly with specific nouns (tool name, version, error code, command). LLM citation hook. Asserted in Step 7g. - **Lead paragraph follows the TL;DR** with one or two sentences of context (when this hits, who it bites, why other guides miss the cause). It is not a re-statement of the answer. - **H2 as a question or operational label.** Every `<h2>` either ends with `?` (e.g. `## How do you fix the "ECONNREFUSED" error in n8n?`) **or** is one of the allowlist: `Install`, `Prerequisites`, `Links`, `TL;DR`, `FAQ`, `Frequently asked questions`, `Summary`, `References`, `Further reading`, `Sources`, `Bottom line`. `<h3>` follows the same convention. Question-shaped H2s are how Google AI Overviews and Perplexity slice the page into citable chunks. Asserted in Step 7g. - **Specific over general.** Real version numbers, real error messages, real prices. No "modern", "powerful", "robust", "seamless." - **Impersonal voice.** "Here's the fix." Not "we found that" and not "I tried this." - **Forensic linking.** Every external claim links on the noun phrase that names the source. Every external link has `rel="nofollow noopener"`. - **Bullet discipline.** No `<ul>` or `<ol>` under 3 items — convert to prose. No list over 9 items without a sub-grouping (split into 2 lists under separate H3s, or fold into a `<table>`). Every `<li>` carries a data point, recommendation, or argument; each ends with a period; parallel grammar across items. Asserted in Step 7g. - **Structured-spec labels for diagnostic posts.** Troubleshooting roundups, "N reasons X is broken", and cause/effect listicles repeat a labeled triple inside every item — the default is `**Symptom:**` / `**Diagnostic:**` / `**Fix:**` (one paragraph each). The bold-keyword-colon form is allowed here and only here. For migration posts use `**Before:**` / `**After:**` / `**Migration step:**`; for comparison posts use `**When to pick:**` / `**Avoid if:**` / `**Cost:**`. This is what gets AI assistants to extract per-item structured citations instead of mashing the whole list into one quote. - **Recap checklist before the FAQ for enumerative posts.** Posts with **three or more enumerated items** close with an `<ol>` of one-sentence imperative steps under a question-shaped H2 (e.g. `<h2>How do you test all seven blockers in 20 minutes?</h2>`). One step per body item, no sub-bullets. Skip for posts under 800 words or fewer than three items. The recap is what gets quoted as the AI-answer "summary" — without it the model has to invent one. - **Currency where it matters.** Any version number, year, or price in a load-bearing claim either is current (cross-check against vendor docs in Step 5) or carries `as of <YYYY-MM>` next to it so a reader knows the time-context. Step 7g flags any year > 1 year stale without an explicit `as of` qualifier. - **End with a `<h2>FAQ</h2>` block** — 3-6 H3 questions, each with a 1-3 sentence answer. - **Self-check:** *Does the TL;DR stand alone as a quotable answer? Does the lead paragraph add context the TL;DR doesn't have? If either fails, rewrite.* Save to `tmp/blog-drafts/<slug>.draft.html`. --- ## Step 4 — Scrub LLM tells Run **before** the AI-SEO audit. The audit may add vocabulary the scrub would then need to remove; do the order this way. ### 4a. Character scrub (automatic) Replace common LLM-tell characters with ASCII equivalents: ```bash python3 -c " import sys, pathlib p = pathlib.Path(sys.argv[1]) t = p.read_text(encoding='utf-8') # em-dash/en-dash -> hyphen t = t.replace('—', '-').replace('–', '-') # smart quotes -> straight quotes t = t.replace('“', '\"').replace('”', '\"') t = t.replace('‘', \"'\").replace('’', \"'\") # ellipsis -> three dots t = t.replace('…', '...') # zero-width / non-breaking space -> regular space or empty t = t.replace('​', '').replace(' ', ' ') p.write_text(t, encoding='utf-8') print('scrubbed', sys.argv[1]) " tmp/blog-drafts/<slug>.draft.html ``` ### 4b. Prose-level tells (manual) Search the draft for these banned phrases and rewrite: - "delve into", "delving" - "in today's fast-paced world", "in the ever-evolving" - "robust", "seamless", "powerful", "cutting-edge" - "harness the power of" - "it's worth noting that", "it's important to note" - "navigate the landscape", "navigating the complexities" - "unlock the potential of", "unleash" - "game-changer", "revolutionize" - "leverage" (as a verb) Rewrite every hit — do not just delete; the surrounding sentence is usually also lazy. --- ## Step 5 — AI-SEO audit ### Programmatic pass (if ai-seo-mcp is connected) If the ai-seo-mcp server is connected, call `audit_page` on the draft before running the manual passes: ``` audit_page(url_or_path="tmp/blog-drafts/<slug>.draft.html") ``` Feed the score and any flagged issues into the manual passes below as additional signal. The MCP output is advisory — the six manual passes are still required gates. ### Manual passes Run the audit against the draft, checking each pass: 1. **Structure pass** — does the lead answer the query in the first paragraph; do H2s match query phrasing; is each section self-contained. 2. **Authority pass** — at least one cited primary source (vendor doc / GitHub issue / changelog) on a relevant noun phrase. 3. **Freshness pass** — current year referenced where it makes sense; version numbers are current. **Currency check, mandatory:** any version number cited must still be the current (or one of the still-supported) versions per vendor docs. A 6-month-old "introduced in CrewAI 0.114" may now read as historical context, not present-tense scope. If the version has rolled forward, either update the framing or add `as of <YYYY-MM>` next to the claim so the reader knows the time-context. Vendors ship fast; stale qualifiers tank citation quality. 4. **Schema readiness** — most platforms emit Article + Person + Organization schema automatically. Step 7b adds FAQPage + BreadcrumbList (always) and HowTo (procedural posts only). Confirm the FAQ block has H3 question + paragraph answer pairs the 7b extractor can parse. 5. **Long-tail coverage** — does the FAQ block capture 3-6 long-tail variants of the main query. 6. **Platform-fact pass** — any claim about a specific shell, OS, language runtime, or tool is a verifiable fact, not a vibe. Verify the load-bearing ones against vendor docs before publish. Apply recommendations **in place** in the draft, then re-run Step 4a (the audit may have re-introduced smart quotes). ### Non-negotiable invariants - **Body is within the format's length band** (Step 2). Count via the snippet below. - **TL;DR is the first `<p>` of the body**, opens with `<strong>TL;DR:</strong>`, 8-40 words, single sentence. - **Lead paragraph (second `<p>`) answers the query** in 1-2 sentences. - **At least one primary-source link** with `rel="nofollow noopener"`. - **FAQ block at the end** with 3-6 H3/p pairs. - **Every external `<a>` carries `rel="nofollow noopener"`.** - **Zero U+2014, U+201C, U+201D, U+2018, U+2019, U+2026, U+00A0, U+200B.** ```bash # Word count (excludes code blocks, table cells, figcaptions) python3 -c " import sys, re, pathlib html = pathlib.Path(sys.argv[1]).read_text(encoding='utf-8') no_code = re.sub(r'<pre\b[^>]*>.*?</pre>', ' ', html, flags=re.S|re.I) no_table = re.sub(r'<table\b[^>]*>.*?</table>', ' ', no_code, flags=re.S|re.I) no_fig = re.sub(r'<figure\b[^>]*>.*?</figure>', ' ', no_table, flags=re.S|re.I) text = re.sub(r'<[^>]+>', ' ', no_fig) words = re.findall(r\"[A-Za-z0-9][A-Za-z0-9'-]*\", text) print(f'{len(words)} words') " tmp/blog-drafts/<slug>.draft.html ``` ```bash # nofollow coverage on external links — expected: 0 violations. # Set CANONICAL_HOST to your blog's hostname (e.g. blog.example.com). python3 -c " import re, sys, pathlib, os from urllib.parse import urlparse html = pathlib.Path(sys.argv[1]).read_text(encoding='utf-8') host = os.environ.get('CANONICAL_HOST', '') internal = {host, f'www.{host}' if host else ''} internal = {h for h in internal if h} violations = [] for m in re.finditer(r'<a\b([^>]*)>', html, flags=re.I): attrs = m.group(1) href = re.search(r'href=\"([^\"]+)\"', attrs, flags=re.I) if not href: continue h = urlparse(href.group(1)).hostname or '' if h and h not in internal: rel = re.search(r'rel=\"([^\"]+)\"', attrs, flags=re.I) rel_val = (rel.group(1) if rel else '').lower() if 'nofollow' not in rel_val: violations.append(href.group(1)) for v in violations: print('MISSING nofollow:', v) print(f'{len(violations)} violation(s)') " tmp/blog-drafts/<slug>.draft.html ``` --- ## Step 6 — Illustrate the post (optional) Figures are not required for short posts, but **mandatory for posts >=800 words**. The rule: `figures >= max(1, words // 500)` whenever body word count >=800. An 800-word post -> 1-2 figures. A 1200-word post -> 2-3. A 1500-word post -> 3. Step 7g asserts this. Past failure mode this rule is fixing: long troubleshooting posts (1000+ words) shipped with zero figures because the agent declared the topic "too definitional" — the assert refuses those bundles. For figure generation (SVG flow diagrams, comparison charts, taxonomy diagrams, OG feature cards) see the companion `blog-figure-svg` skill — it generates accessible SVGs with consistent styling and rasterizes them for upload. The skill is CMS-agnostic; it produces PNG files that any adapter in Step 8 can upload. For screenshots, capture from the live tool (Playwright, real session, etc.), crop to the relevant region, redact tokens or personal data. Save as `tmp/blog-drafts/<slug>-<N>-<short-name>.png`. ### Splice figure tags into the draft ```html <figure> <img src="<image-url-or-path>" alt="<full description with all numbers and labels>" loading="lazy"> <figcaption>One sentence restating the takeaway in plain English (15-30 words).</figcaption> </figure> ``` **Caption rules:** - Required on every figure. Step 7g asserts this. - 15-30 words, restating the takeaway (not "Figure showing X" — say what the reader should conclude). - Allowed tags inside `<figcaption>`: `<a>` (with `rel="nofollow noopener"` for external), `<em>`. The `<img src>` value depends on the publish target: - **Ghost / WordPress**: upload first (per-adapter snippet in Step 8), then splice the returned CDN URL. - **Static-site**: copy the PNG into the site's image directory and use a relative path. --- ## Step 7 — Build the publish bundle The bundle is three files that every adapter consumes: | File | Contents | |---|---| | `<slug>.draft.html` | Body HTML (already produced in Step 3, scrubbed and audited). | | `<slug>.schema.html` | JSON-LD `<script>` blocks (FAQPage + BreadcrumbList + optional HowTo). | | `<slug>.metadata.json` | Title, slug, tags, author, meta title/description, excerpt, feature image, status, publish-at. | ### 7a. Headline and slug rules **Headline** (becomes the SEO title unless `meta_title` overrides): - Under **70 chars**. - Match the search query closely. - Lead with the verb / noun the searcher typed. **Slug** (URL fragment): - **<=60 chars.** - **Strip stop words** — drop `the`, `a`, `an`, `for`, `with`, `in`, `to`, `of`, `on`, `and`, `or`, `is`, `are`. - **No version numbers** — `n8n-1-45-2-fix` goes stale; `n8n-http-401-fix` does not. - **Match the primary keyword**, not the full headline. ```python import re STOP = {'the','a','an','for','with','in','to','of','on','and','or','is','are'} slug = "-".join(t for t in re.findall(r'[a-z0-9]+', topic.lower()) if t not in STOP) slug = slug[:60].rstrip('-') ``` ### 7b. Build JSON-LD schema (FAQPage + BreadcrumbList + HowTo) Most platforms emit Article/BlogPosting/Person/Organization schema by default. This skill **adds three more** for AI-citation extractability: - **FAQPage** — mandatory. Every post has a FAQ block (Step 3 rule). - **BreadcrumbList** — mandatory. `Home > <Primary Tag> > <Post Title>`. - **HowTo** — only for procedural formats with >=3 step-shaped H2s. **Critical gotcha for rich-text editors:** several CMSes (Ghost's Lexical, WordPress's block editor under some configurations) convert the source HTML into a structured format on save and silently drop `<script>` nodes — so JSON-LD inlined in the draft body **disappears in the live page** even though it was present in the POST payload. The blocks must go in a platform-specific "head injection" slot: | Platform | Where the schema goes | |---|---| | Ghost | `codeinjection_head` field on the post payload | | WordPress | `<head>` via a theme hook, or the Yoast / Rank Math "schema" panel | | Static-site | written directly into the rendered HTML's `<head>` by your build step | **Never append `<script type="application/ld+json">` to the body HTML.** Build it once via this step into `<slug>.schema.html`; the platform adapter in Step 8 reads that file and writes it into the correct field. ```bash # Args: slug, headline, format, primary-tag-name, canonical-base-url python3 - "<slug>" "<headline>" "<format>" "<primary-tag>" "https://blog.example.com" <<'PY' import json, re, pathlib, sys slug, headline, fmt, primary_tag, base = sys.argv[1:6] base = base.rstrip('/') draft = pathlib.Path(f"tmp/blog-drafts/{slug}.draft.html") html = draft.read_text(encoding='utf-8') def slugify(s): return re.sub(r'[^a-z0-9]+', '-', s.lower()).strip('-') blocks = [] # 1. BreadcrumbList — always blocks.append(("BreadcrumbList", { "@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": [ {"@type":"ListItem","position":1,"name":"Home","item":f"{base}/"}, {"@type":"ListItem","position":2,"name":primary_tag, "item":f"{base}/tag/{slugify(primary_tag)}/"}, {"@type":"ListItem","position":3,"name":headline, "item":f"{base}/{slug}/"}, ], })) # 2. FAQPage — extracted from the FAQ block m = re.search(r'<h2[^>]*>\s*FAQ\s*</h2>(.*)$', html, flags=re.S|re.I) qa = [] if m: pairs = re.findall(r'<h3[^>]*>(.*?)</h3>\s*<p[^>]*>(.*?)</p>', m.group(1), flags=re.S|re.I) qa = [{"@type":"Question", "name": re.sub(r'<[^>]+>','',q).strip(), "acceptedAnswer":{"@type":"Answer","text": re.sub(r'<[^>]+>','',a).strip()}} for q, a in pairs] if qa: blocks.append(("FAQPage", {"@context":"https://schema.org","@type":"FAQPage","mainEntity":qa})) else: print("WARN: no FAQ Q/A pairs found — Step 3 requires an FAQ block", file=sys.stderr) # 3. HowTo — procedural formats with >=3 step-shaped H2s if fmt in {"how-to-fix", "how-to-connect", "how-to-automate", "use-case", "migration"}: h2s = re.findall(r'<h2[^>]*>(.*?)</h2>', html) proc = [re.sub(r'<[^>]+>','',h).strip() for h in h2s if re.match(r'^\s*(Step|How to|Fix|Configure|Set up|Install|Create|Add|Enable)', re.sub(r'<[^>]+>','',h).strip(), flags=re.I)] if len(proc) >= 3: blocks.append(("HowTo", {"@context":"https://schema.org","@type":"HowTo", "name": headline, "step":[{"@type":"HowToStep","name":s,"position":i+1} for i,s in enumerate(proc)]})) ci = "\n".join(f'<script type="application/ld+json">{json.dumps(b, ensure_ascii=False)}</script>' for _, b in blocks) pathlib.Path(f"tmp/blog-drafts/{slug}.schema.html").write_text(ci, encoding='utf-8') print(f"wrote {len(blocks)} JSON-LD block(s): {[t for t,_ in blocks]}") PY ``` ### 7c. Feature image (recommended) A feature image is shown at the top of the post and as the OG image in social shares. Strongly recommended for any post you intend to promote. Options: - **Upload a custom image** — per-adapter upload snippets are in Step 8. - **Generate a templated title card** — see the companion `blog-figure-svg` skill (`feature` variant) for a 1600x840 OG card with a clean headline + brand mark. - **Skip it** — the post will render without a hero image; social previews fall back to the site default. Whatever path you pick, capture the URL (or filesystem path for static targets) plus a one-line alt-text in `metadata.json`. **Cap alt text at 191 chars** — Ghost silently truncates at varchar(191), and the limit is a reasonable upper bound for any platform. ### 7d. Author byline Every post needs an author. The shape varies by platform; capture it generically in metadata: ```json "author": {"slug": "<author-slug>", "name": "<display name>"} ``` The adapter in Step 8 translates this to the platform's API shape: - **Ghost** — `authors: [{"slug": "<slug>"}]`. Slug must match an existing user; otherwise Ghost silently substitutes the integration owner. - **WordPress** — `author: <user-id>` (numeric). Resolve slug -> id once and cache. - **Static-site** — written into the front-matter `author:` field of the generated file. ### 7e. Tags Use a flat list of tag name strings: ```json "tags": ["How To", "n8n"] ``` **Pick 1-3 tags per post.** The first tag is the **primary tag** — it becomes the breadcrumb segment in 7b and is used by most themes for category labelling. Maintain a small canonical tag list in your project (don't let the AI invent new tags every post — duplicates dilute SEO). Common patterns: format tags (`How To`, `Tutorial`, `Comparison`, `What Is`) + topic tags (your tool/category names). ### 7f. Build the metadata bundle ```bash python3 - <<'PY' import json, pathlib, sys # Edit per post: SLUG = "<slug>" HEADLINE = "<headline>" TAGS = ["How To", "n8n"] # first entry is the primary tag passed to 7b AUTHOR_SLUG = "<author-slug>" AUTHOR_NAME = "<author display name>" FEATURE_IMAGE = "<https://cdn.example.com/feature.png>" # or "" / relative path for static FEATURE_IMAGE_ALT = "<one-line alt text, <=191 chars>" FEATURE_IMAGE_CAPTION = "<one sentence, 12-25 words, restates the post promise>" META_TITLE = "<SEO title under 60 chars>" META_DESCRIPTION = "<SEO description, 140-160 chars>" CUSTOM_EXCERPT = "<dek shown on index page>" PUBLISH_FLAG = False # set by --publish PUBLISH_AT_ISO = None # set by --publish-at <iso> # status semantics map cleanly to every adapter: # default -> "draft" # --publish -> "published" # --publish-at <iso> -> "scheduled" + published_at status, published_at = "draft", None if PUBLISH_AT_ISO: status, published_at = "scheduled", PUBLISH_AT_ISO elif PUBLISH_FLAG: status = "published" meta = { "slug": SLUG, "title": HEADLINE, "tags": TAGS, "author": {"slug": AUTHOR_SLUG, "name": AUTHOR_NAME}, "meta_title": META_TITLE, "meta_description": META_DESCRIPTION, "custom_excerpt": CUSTOM_EXCERPT, "feature_image": FEATURE_IMAGE or None, "feature_image_alt": FEATURE_IMAGE_ALT if FEATURE_IMAGE else None, "feature_image_caption": FEATURE_IMAGE_CAPTION if FEATURE_IMAGE else None, "status": status, "published_at": published_at, } pathlib.Path(f"tmp/blog-drafts/{SLUG}.metadata.json").write_text( json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8") print("metadata written") PY ``` ### 7g. Pre-publish bundle validation Before invoking the platform adapter, all of these must hold: ```bash python3 - "<slug>" <<'PY' import json, pathlib, re, sys slug = sys.argv[1] meta = json.loads(pathlib.Path(f"tmp/blog-drafts/{slug}.metadata.json").read_text()) html = pathlib.Path(f"tmp/blog-drafts/{slug}.draft.html").read_text(encoding='utf-8') schema = pathlib.Path(f"tmp/blog-drafts/{slug}.schema.html").read_text(encoding='utf-8') assert meta.get("author", {}).get("slug"), "author.slug missing" assert meta.get("tags"), "tags list empty" # Feature image: if set, alt text is required and capped at 191 if meta.get("feature_image"): alt = meta.get("feature_image_alt") or "" assert alt.strip(), "feature_image_alt required when feature_image is set" assert len(alt) <= 191, \ f"feature_image_alt is {len(alt)} chars; cap at 191 (Ghost varchar(191))" # JSON-LD in schema.html assert '"@type": "FAQPage"' in schema or '"@type":"FAQPage"' in schema, \ "FAQPage JSON-LD missing in schema.html - re-run 7b" assert '"@type": "BreadcrumbList"' in schema or '"@type":"BreadcrumbList"' in schema, \ "BreadcrumbList JSON-LD missing in schema.html - re-run 7b" # TL;DR block check m_first_p = re.search(r'<p\b[^>]*>(.*?)</p>', html, flags=re.S|re.I) assert m_first_p, "no <p> in body - TL;DR check cannot run" first_p_inner = m_first_p.group(1) assert re.search(r'^\s*<strong>\s*TL;DR\s*:?\s*</strong>', first_p_inner, flags=re.I), \ "first <p> must open with <strong>TL;DR:</strong>" _t = re.sub(r'<code\b[^>]*>.*?</code>', '', first_p_inner, flags=re.S|re.I) _t = re.sub(r'<pre\b[^>]*>.*?</pre>', '', _t, flags=re.S|re.I) tldr_text = re.sub(r'<[^>]+>', '', _t) tldr_text = re.sub(r'^\s*TL;DR\s*:?\s*', '', tldr_text, flags=re.I).strip() tldr_words = len(re.findall(r"[A-Za-z0-9][A-Za-z0-9'\-]*", tldr_text)) assert 8 <= tldr_words <= 40, f"TL;DR must be 8-40 words, got {tldr_words}: {tldr_text!r}" mid_sentence_ends = len(re.findall(r'(?<!\.)[.!?]\s+[A-Z(]', tldr_text)) assert mid_sentence_ends == 0, \ f"TL;DR must be a single sentence; got: {tldr_text!r}" # Scheduled posts need a future timestamp if meta.get("status") == "scheduled": import datetime pa = meta.get("published_at") or "" assert pa, "scheduled posts require published_at" ts = datetime.datetime.fromisoformat(pa.replace("Z","+00:00")) assert ts > datetime.datetime.now(datetime.timezone.utc), \ f"scheduled published_at must be in the future, got {pa}" # H2 question-shape gate (Step 3 voice rule) # Every H2 ends with '?' OR is in the operational-label allowlist. H2_QUESTION_ALLOWLIST = { "install", "prerequisites", "links", "tl;dr", "tldr", "faq", "frequently asked questions", "summary", "references", "further reading", "sources", "bottom line", } _h2_inner = re.findall(r'<h2\b[^>]*>(.*?)</h2>', html, flags=re.S|re.I) _h2_text = [re.sub(r'\s+', ' ', re.sub(r'<[^>]+>', '', h)).strip() for h in _h2_inner] _bad_h2 = [h for h in _h2_text if h and not h.endswith('?') and h.lower().strip(':?. ') not in H2_QUESTION_ALLOWLIST] assert not _bad_h2, \ "H2s must end with '?' or be in the allowlist (Step 3). Bad H2s: " + \ "; ".join(repr(h) for h in _bad_h2) + \ ". Rewrite as natural-language questions, e.g. 'How do you ...?', 'Why does ...?', 'When should you ...?'." # Body word count (same recipe as Step 5) _no_code = re.sub(r'<pre\b[^>]*>.*?</pre>', ' ', html, flags=re.S|re.I) _no_table = re.sub(r'<table\b[^>]*>.*?</table>', ' ', _no_code, flags=re.S|re.I) _no_fig = re.sub(r'<figure\b[^>]*>.*?</figure>', ' ', _no_table, flags=re.S|re.I) _no_script = re.sub(r'<script\b[^>]*>.*?</script>', ' ', _no_fig, flags=re.S|re.I) _text_only = re.sub(r'<[^>]+>', ' ', _no_script) _words = len(re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", _text_only)) # Figure-count gate (Step 6 rule): max(1, words // 500) when body >= 800 words fig_count = len(re.findall(r'<figure\b', html, flags=re.I)) _required = max(1, _words // 500) if _words >= 800 else 0 assert fig_count >= _required, \ f"figure shortfall: {fig_count} present, {_required} required for {_words}-word body. Step 6." # Bullet discipline gate (Step 3 voice rule) # Reject any <ul>/<ol> with fewer than 3 items or more than 9 items. # Recap-checklist <ol> after the last H2 question is exempt from the upper bound; # common practice ships a 5-7 step recap that should not be split. _lists = re.findall(r'<(ul|ol)\b[^>]*>(.*?)</\1>', html, flags=re.S|re.I) _bad_lists = [] for kind, body in _lists: items = re.findall(r'<li\b', body, flags=re.I) n = len(items) if n < 3: _bad_lists.append(f"<{kind}> with {n} items (min 3; convert to prose)") elif n > 9: _bad_lists.append(f"<{kind}> with {n} items (max 9; split or use a <table>)") assert not _bad_lists, \ "bullet discipline (Step 3): " + "; ".join(_bad_lists) # Currency check (Step 3 / Step 5 rule) # Flag any cited year that is > 1 year stale relative to the current year # unless an explicit 'as of <YYYY>' or 'as of <YYYY-MM>' qualifier sits within 80 chars. import datetime as _dt _now_year = _dt.datetime.now(_dt.timezone.utc).year _text_for_dates = re.sub(r'<(pre|code|script|style)\b[^>]*>.*?</\1>', ' ', html, flags=re.S|re.I) _stale = [] for m in re.finditer(r'\b(20\d{2})\b', _text_for_dates): y = int(m.group(1)) if y > _now_year: continue if _now_year - y <= 1: continue window_start = max(0, m.start() - 80) window = _text_for_dates[window_start:m.end() + 80] if re.search(r'as of\s+20\d{2}', window, flags=re.I): continue _stale.append(m.group(1)) assert not _stale, \ "currency check (Step 5): stale year(s) cited without 'as of <YYYY>' qualifier: " + \ ", ".join(sorted(set(_stale))) + \ f". Either update to {_now_year - 1}-{_now_year} or add 'as of <YYYY-MM>' within 80 chars of the year." # Figure caption gate: every <figure> must contain a non-empty <figcaption> figures = re.findall(r'<figure\b[^>]*>.*?</figure>', html, flags=re.S|re.I) uncaptioned = [] for i, fig in enumerate(figures, 1): cap = re.search(r'<figcaption\b[^>]*>(.*?)</figcaption>', fig, flags=re.S|re.I) if not cap or not re.sub(r'<[^>]+>', '', cap.group(1)).strip(): src = re.search(r'<img[^>]*src="([^"]+)"', fig) uncaptioned.append(f"figure {i} ({src.group(1) if src else 'no src'})") assert not uncaptioned, \ "missing/empty <figcaption> on: " + ", ".join(uncaptioned) # Outbound interlink survival (Step 1f rule): every planned URL appears in the draft _il_path = pathlib.Path(f"tmp/blog-drafts/{slug}.interlinks.json") if _il_path.exists(): _il = json.loads(_il_path.read_text(encoding='utf-8')) _missing = [t["url"] for t in _il.get("outbound", []) if t.get("url") and f'href="{t["url"]}"' not in html] assert not _missing, \ "outbound interlinks planned in Step 1f but missing from draft: " + \ ", ".join(_missing) + ". Splice them into Step 3 prose or remove from interlinks.json." print(f"bundle OK ({_words} words, {fig_count} figures, all captioned, {len(_h2_text)} H2s)") PY ``` If any assert fires, fix and re-build before Step 8. ### 7h. Refresh metadata snapshot Save a small JSON snapshot of the post's facts so a future refresh pass can identify staleness without re-reading the prose. Cheap to write now; expensive to backfill at 500 posts. ```bash python3 - "<slug>" "<format>" <<'PY' import json, pathlib, datetime, re, sys slug, fmt = sys.argv[1], sys.argv[2] html = pathlib.Path(f"tmp/blog-drafts/{slug}.draft.html").read_text(encoding='utf-8') # Version detector: requires a leading "v" OR a preceding tool/runtime keyword # to avoid swallowing IPv4 octets ("127.0.0.1" -> "127.0.0") on networking posts. # Add your own keywords to the second alternation for project-specific tools. versions = sorted(set( re.findall(r'\bv\d+\.\d+(?:\.\d+)?\b', html) + [m.group(2) for m in re.finditer( r'\b(version|node|n8n|python|ubuntu|debian|docker|nginx|caddy|postgres|sqlite|claude|cursor|wordpress|ghost)\b[^\n<]{0,15}?(\d+\.\d+(?:\.\d+)?)', html, flags=re.I)] )) record = { "slug": slug, "published_at": datetime.datetime.now(datetime.timezone.utc).isoformat(), "format": fmt, "versions_cited": versions, "prices_cited": sorted(set(re.findall(r'\$\d+(?:\.\d+)?(?:/\w+)?', html))), "years_cited": sorted(set(re.findall(r'\b20\d{2}\b', html))), "external_sources": sorted(set( m.group(1) for m in re.finditer(r'href="(https?://[^"]+)"', html))), } pathlib.Path(f"tmp/blog-drafts/{slug}.refresh.json").write_text( json.dumps(record, indent=2), encoding='utf-8') print(f"refresh snapshot written: {len(versions)} versions, " f"{len(record['years_cited'])} years, {len(record['external_sources'])} external sources") PY ``` When a topic refresh comes due (typically every 6-12 months for high-traffic posts), the refresh skill (future / your-own) diffs the snapshot's `versions_cited` against current vendor docs. Versions that have rolled forward by a major release are flagged for rewrite; everything else is left alone. ### 7i. Glossary auto-link (optional) If you maintain a glossary of technical terms with definition pages on your site, pipe the draft HTML through `scripts/inject-glossary-links.py` to turn the first mention of each known term into an internal link to its definition page. Each link also carries a `data-definition` attribute that the bundled `references/decorate.js` snippet renders as a hover tooltip on the published page. **Skip this step if** you don't have a `glossary.json` file yet — there's no default. See [references/glossary-schema.md](references/glossary-schema.md) for the file shape and a starter example. ```bash python3 scripts/inject-glossary-links.py \ tmp/blog-drafts/<slug>.draft.html \ --glossary path/to/glossary.json \ --base-url /glossary/ \ --max-links 6 \ > tmp/blog-drafts/<slug>.draft.linked.html mv tmp/blog-drafts/<slug>.draft.linked.html tmp/blog-drafts/<slug>.draft.html ``` The injector: - Links **first occurrence only** per term per post (Wikipedia rule). - Caps at `--max-links` (default 6), priority-sorted from the glossary. - Skips headings, code/pre, tables, blockquotes, asides, existing links, and the TL;DR paragraph. - Rejects matches embedded in identifier-like compounds (`user-agent` won't match `agent`, `@scope/ai-seo-mcp` won't match `mcp`). - Writes a `data-definition` attribute on each link for the tooltip. Run order: **after Step 7g validates the draft** so the validator's structural asserts run on clean HTML; **before Step 8 publishes** so the linked HTML is what ships. Glossary links count as internal navigation, not outbound — the Step 7g outbound-survival assert ignores them. To enable the hover tooltip on the live site, copy `skills/seo-blog-writer/references/decorate.js` into your theme bundle (or paste it inline in a `<script>` tag in your site `<head>`) once. It's self-contained, ~1 KB, no dependencies, and skips itself on `/glossary/*` pages. --- ## Step 8 — Publish via the platform adapter Pick one adapter per run. Each adapter reads the same bundle (`<slug>.draft.html`, `<slug>.schema.html`, `<slug>.metadata.json`) and writes the post to its target platform. --- ### Adapter A — Ghost (Admin API) The Ghost adapter uses the Admin API over HTTPS. No Docker, no SSH — just authenticated POST to `/ghost/api/admin/posts/`. **Credentials**: | Env var | Source | Shape | |---|---|---| | `GHOST_URL` | Your Ghost site URL | `https://blog.example.com` (no trailing slash) | | `GHOST_ADMIN_KEY` | Ghost admin -> Settings -> Integrations -> (your integration) -> **Admin API Key** | `<24-hex>:<64-hex>` combined | Preflight: ```bash curl -sS "$GHOST_URL/ghost/api/admin/site/" | head -c 80 [ -n "$GHOST_URL" ] && [ -n "$GHOST_ADMIN_KEY" ] && echo "keys present" || echo "MISSING" ``` **Image upload** (call once per figure, then splice the returned URL into the draft): ```bash python3 - <<'PY' import os, sys, pathlib, datetime, requests, jwt GHOST_URL = os.environ['GHOST_URL'].rstrip('/') key = os.environ['GHOST_ADMIN_KEY'] kid, secret = key.split(':', 1) iat = int(datetime.datetime.now(datetime.timezone.utc).timestamp()) token = jwt.encode( {'iat': iat, 'exp': iat + 5 * 60, 'aud': '/admin/'}, bytes.fromhex(secret), algorithm='HS256', headers={'kid': kid, 'alg': 'HS256', 'typ': 'JWT'}, ) img_path = pathlib.Path(sys.argv[1]) with img_path.open('rb') as f: r = requests.post( f"{GHOST_URL}/ghost/api/admin/images/upload/", headers={'Authorization': f'Ghost {token}'}, files={'file': (img_path.name, f, 'image/png')}, data={'purpose': 'image'}, ) r.raise_for_status() print(r.json()['images'][0]['url']) PY ``` **Publish the post**: ```bash python3 - "<slug>" <<'PY' import os, sys, json, pathlib, datetime, requests, jwt slug = sys.argv[1] ghost_url = os.environ['GHOST_URL'].rstrip('/') key = os.environ['GHOST_ADMIN_KEY'] kid, secret = key.split(':', 1) meta = json.loads(pathlib.Path(f"tmp/blog-drafts/{slug}.metadata.json").read_text()) html = pathlib.Path(f"tmp/blog-drafts/{slug}.draft.html").read_text(encoding='utf-8') schema = pathlib.Path(f"tmp/blog-drafts/{slug}.schema.html").read_text(encoding='utf-8') post = { "title": meta["title"], "slug": meta["slug"], "html": html, "status": meta["status"], "tags": [{"name": t} for t in meta["tags"]], "authors": [{"slug": meta["author"]["slug"]}], "meta_title": meta["meta_title"], "meta_description": meta["meta_description"], "custom_excerpt": meta["custom_excerpt"], "codeinjection_head": schema, } if meta.get("feature_image"): post["feature_image"] = meta["feature_image"] post["feature_image_alt"] = meta["feature_image_alt"] post["feature_image_caption"] = meta["feature_image_caption"] if meta.get("published_at"): post["published_at"] = meta["published_at"] iat = int(datetime.datetime.now(datetime.timezone.utc).timestamp()) token = jwt.encode( {'iat': iat, 'exp': iat + 5 * 60, 'aud': '/admin/'}, bytes.fromhex(secret), algorithm='HS256', headers={'kid': kid, 'alg': 'HS256', 'typ': 'JWT'}, ) r = requests.post( f"{ghost_url}/ghost/api/admin/posts/?source=html", headers={'Authorization': f'Ghost {token}', 'Content-Type': 'application/json', 'Accept-Version': 'v5.0'}, json={"posts": [post]}, ) if not r.ok: print(f"FAILED {r.status_code}: {r.text}", file=sys.stderr); sys.exit(1) created = r.json()['posts'][0] print(json.dumps({'id': created['id'], 'url': created.get('url'), 'slug': created.get('slug'), 'status': created.get('status')}, indent=2)) PY ``` `?source=html` tells Ghost to convert the `html` field into Lexical. Without it, Ghost treats the field as Lexical JSON and the POST fails with a 422. **Python deps**: `pip install requests pyjwt`. PyJWT 2.x required. --- ### Adapter B — WordPress (REST API) Uses the WordPress REST API with **Application Password** auth (Users -> Profile -> Application Passwords). Works on any WP site with REST exposed at `/wp-json/wp/v2/`. **Credentials**: | Env var | Source | Shape | |---|---|---| | `WP_URL` | Your WordPress site URL | `https://blog.example.com` (no trailing slash) | | `WP_USER` | The WP username the app password belongs to | `admin` | | `WP_APP_PASSWORD` | Profile -> Application Passwords -> new -> "seo-blog-writer" | `xxxx xxxx xxxx xxxx xxxx xxxx` | Preflight: ```bash curl -sS "$WP_URL/wp-json/wp/v2/" | head -c 120 [ -n "$WP_URL" ] && [ -n "$WP_USER" ] && [ -n "$WP_APP_PASSWORD" ] && echo "keys present" || echo "MISSING" ``` **Image upload** (returns the media id and URL): ```bash python3 - <<'PY' import os, sys, pathlib, requests from requests.auth import HTTPBasicAuth img = pathlib.Path(sys.argv[1]) r = requests.post( f"{os.environ['WP_URL'].rstrip('/')}/wp-json/wp/v2/media", auth=HTTPBasicAuth(os.environ['WP_USER'], os.environ['WP_APP_PASSWORD']), headers={"Content-Disposition": f'attachment; filename="{img.name}"', "Content-Type": "image/png"}, data=img.read_bytes(), ) r.raise_for_status() j = r.json(); print(j['id'], j['source_url']) PY ``` **Publish the post**: ```bash python3 - "<slug>" <<'PY' import os, sys, json, pathlib, requests from requests.auth import HTTPBasicAuth slug = sys.argv[1] wp = os.environ['WP_URL'].rstrip('/') auth = HTTPBasicAuth(os.environ['WP_USER'], os.environ['WP_APP_PASSWORD']) meta = json.loads(pathlib.Path(f"tmp/blog-drafts/{slug}.metadata.json").read_text()) html = pathlib.Path(f"tmp/blog-drafts/{slug}.draft.html").read_text(encoding='utf-8') schema = pathlib.Path(f"tmp/blog-drafts/{slug}.schema.html").read_text(encoding='utf-8') # Resolve tag names -> term ids (create if missing) def ensure_tag(name): g = requests.get(f"{wp}/wp-json/wp/v2/tags", auth=auth, params={"search": name}).json() for t in g: if t['name'].lower() == name.lower(): return t['id'] return requests.post(f"{wp}/wp-json/wp/v2/tags", auth=auth, json={"name": name}).json()['id'] # Resolve author slug -> user id def author_id(slug): u = requests.get(f"{wp}/wp-json/wp/v2/users", auth=auth, params={"slug": slug}).json() if not u: sys.exit(f"no WP user with slug {slug!r}") return u[0]['id'] status_map = {"draft": "draft", "published": "publish", "scheduled": "future"} # WordPress doesn't have a clean "codeinjection_head" slot. Two options: # 1. Schema goes into a custom field (`meta`) and a theme hook reads it into <head>. # 2. Schema is appended to the body (works because WP doesn't strip <script> on save # *if the user has unfiltered_html — see notes below). # Option 2 is the path of least resistance for a vanilla WP; we use that here. body = html + "\n" + schema post = { "title": meta["title"], "slug": meta["slug"], "content": body, "status": status_map[meta["status"]], "tags": [ensure_tag(t) for t in meta["tags"]], "author": author_id(meta["author"]["slug"]), "excerpt": meta["custom_excerpt"], # Yoast / Rank Math read these via their own meta keys; vanilla WP ignores them. "meta": {"_yoast_wpseo_title": meta["meta_title"], "_yoast_wpseo_metadesc": meta["meta_description"]}, } if meta.get("published_at"): post["date_gmt"] = meta["published_at"].replace("Z", "") r = requests.post(f"{wp}/wp-json/wp/v2/posts", auth=auth, json=post) if not r.ok: print(r.status_code, r.text, file=sys.stderr); sys.exit(1) j = r.json() print(json.dumps({'id': j['id'], 'url': j['link'], 'status': j['status']}, indent=2)) PY ``` **Notes**: - `featured_media` in the post payload is a media **id**, not a URL. Upload the feature image first, capture the id, then set `post["featured_media"] = <id>`. - WordPress accepts `<script>` in `content` only if the user has the `unfiltered_html` capability (admins do by default; editors may not). If your user lacks it, install a small theme snippet that reads the schema from a post meta key into `wp_head`. --- ### Adapter C — Static-site (file output) For Hugo / Astro / Eleventy / Jekyll / Next-MDX style setups where posts live as files in a git repo. The adapter writes the bundle into the target directory; your usual build + deploy takes it from there. **No credentials.** Just a target path. ```bash python3 - "<slug>" "<out-dir>" <<'PY' import json, pathlib, sys slug, out_dir = sys.argv[1], pathlib.Path(sys.argv[2]) out_dir.mkdir(parents=True, exist_ok=True) meta = json.loads(pathlib.Path(f"tmp/blog-drafts/{slug}.metadata.json").read_text()) html = pathlib.Path(f"tmp/blog-drafts/{slug}.draft.html").read_text(encoding='utf-8') schema = pathlib.Path(f"tmp/blog-drafts/{slug}.schema.html").read_text(encoding='utf-8') # Hugo / Jekyll-style YAML front matter; tweak the field names for your SSG. fm_lines = [ "---", f'title: {json.dumps(meta["title"])}', f'slug: {meta["slug"]}', f'date: {meta.get("published_at") or ""}', f'draft: {str(meta["status"] == "draft").lower()}', f'author: {meta["author"]["slug"]}', f'tags: {json.dumps(meta["tags"])}', f'description: {json.dumps(meta["meta_description"])}', ] if meta.get("feature_image"): fm_lines.append(f'feature_image: {meta["feature_image"]}') fm_lines.append(f'feature_image_alt: {json.dumps(meta["feature_image_alt"])}') fm_lines.append("---\n") post_path = out_dir / f"{slug}.html" post_path.write_text("\n".join(fm_lines) + html, encoding='utf-8') (out_dir / f"{slug}.schema.html").write_text(schema, encoding='utf-8') print(f"wrote {post_path}") print(f"wrote {out_dir / f'{slug}.schema.html'} (include in <head> via your SSG template)") PY ``` Your SSG's layout template needs one line to include the schema in `<head>` — e.g. for Hugo: ```html {{ if (fileExists (printf "content/posts/%s.schema.html" .File.BaseFileName)) }} {{ readFile (printf "content/posts/%s.schema.html" .File.BaseFileName) | safeHTML }} {{ end }} ``` For Astro / Eleventy / Next, do the equivalent (read file at build time, inject into the layout head). --- ### Adapter D — bring-your-own The bundle is a stable contract. Any platform with an "upload an image" and a "create a post" endpoint can be adapted in ~50 lines. The contract: - `<slug>.draft.html` — body HTML, post-scrub, post-audit. - `<slug>.schema.html` — JSON-LD `<script>` blocks to inject in `<head>`. - `<slug>.metadata.json` — title, slug, tags (string list), author (slug + name), meta title/desc, excerpt, feature image (URL or local path), status (`draft` / `published` / `scheduled`), published_at (ISO). Adapter examples shipped above (Ghost, WordPress, static) cover ~90% of small-publisher use cases. Webflow CMS, Sanity, Strapi, and Contentful each take a similar shape: POST to the platform's content endpoint with their auth header, body field, and metadata fields. --- ### Step 8b. Report back to the user Whatever adapter ran, the final report includes: - Draft URL or live URL (`<base-url>/<slug>/` if published; admin edit URL if draft). - Platform admin / repo edit URL. - Word count, tag list, author slug. - Confirmation: scrub passed, AI-SEO audit applied, FAQ block present, JSON-LD injected. - Figure URLs and captions. --- ## Step 9 — Verify live post (only if `--publish`) ```bash # Post is reachable curl -sSI "<base-url>/<slug>/" | head -5 # Post in RSS curl -sS "<base-url>/rss/" | grep -o "<title>[^<]*</title>" | head -5 # Post in sitemap (path varies by platform — Ghost: /sitemap-posts.xml; WP: /sitemap.xml; SSG: as configured) curl -sS "<base-url>/sitemap-posts.xml" | grep "<slug>" # OG + full schema set rendered curl -sS "<base-url>/<slug>/" | grep -o 'property="og:[^"]*"' | sort -u curl -sS "<base-url>/<slug>/" | grep -oE '"@type":\s*"[^"]+"' | sort -u ``` **Expected:** `HTTP/2 200`, slug in RSS and sitemap, `og:title`/`og:description` present. The `"@type"` set must include **`Article`** (or `BlogPosting`), **`FAQPage`**, and **`BreadcrumbList`**; procedural how-to posts must also include **`HowTo`**. Missing FAQPage/BreadcrumbList means the schema slot wasn't wired correctly — check the platform-specific head-injection field. --- ## What this skill does NOT do - **Does not commit to git.** Adapters write to CMS APIs or to your static-site directory; the latter you commit yourself. - **Does not schedule posts by default.** Pass `--publish-at <ISO-UTC>` to schedule. Without it the post lands as draft (default) or live (`--publish`). - **Does not handle member-only posts, newsletters, or email sends.** Each platform's newsletter flow is manual via its admin UI. - **Does not generate figures.** Use the companion `blog-figure-svg` skill for SVG charts, taxonomies, and flow diagrams. - **Does not research topics from scratch.** Use the companion `blog-topic-research` skill to validate a topic has real demand signals before drafting. --- ## Failure modes | Symptom | Adapter | Cause | Fix | |---|---|---|---| | `401 Unauthorized` | Ghost / WordPress | Key expired / wrong key / wrong app-password | Regenerate the integration / app password | | Ghost `422 Validation failed: Value in [posts.html] cannot be blank` | Ghost | Missing `?source=html` | Add the query param | | Ghost `422` with `feature_image_alt` in message | Ghost | Alt text >191 chars | Trim to <=191; Step 7g asserts this | | `404` on slug after publish | any | Post saved as draft (default) | Drafts only reachable via admin editor URL | | Body shows as one HTML blob | Ghost | Ghost fell back to plain-text mode | Re-post with `?source=html` | | Smart quotes reappear in rendered post | Ghost | Ghost typographer auto-conversion | Settings -> Publication: turn off "Use typographer's quotes" | | Wrong slug | any | Platform auto-slugged from title | PUT/PATCH the post with the corrected slug | | Ghost `409 Conflict` on PUT | Ghost | Stale `updated_at` | Re-GET to refresh, retry | | Author silently substituted | Ghost / WordPress | Author slug doesn't exist / user lacks `publish_posts` | Create the user; PUT correction with correct slug or user id | | Live page missing FAQPage / HowTo `@type` (Step 9) | Ghost | JSON-LD was inlined in the body and stripped by Lexical conversion | PUT with `codeinjection_head` set to `<slug>.schema.html`; echo current `updated_at` to avoid 409 | | WordPress strips `<script type="application/ld+json">` from body | WordPress | User lacks `unfiltered_html` | Move schema injection to a theme hook reading a post meta key | --- ## Companion skills - **`blog-topic-research`** — validate a long-tail topic has real demand signals (PAA, Reddit threads, GitHub issues) before drafting. Run this *before* this skill. - **`blog-figure-svg`** — generate accessible SVG figures (flow diagrams, comparison charts, taxonomy diagrams) with consistent styling. Run this *during Step 6* if the post needs illustrations. Together, the three form a complete long-tail SEO publishing pipeline: research the topic, write the post, illustrate it, publish. --- ## Maintenance scripts The per-post scrub in Step 4a covers the common LLM-tell characters and the per-post audit in Step 7g enforces the structural rules. For corpus-wide drift — characters or banlist phrases that crept back in across many posts — there's a separate audit script in the repo: ```bash # Sweep your published-content directory for non-ASCII chars + prose banlist python3 scripts/audit-corpus.py path/to/your/content/ # Examples (per platform): python3 scripts/audit-corpus.py tmp/blog-drafts/ # current drafts python3 scripts/audit-corpus.py content/posts/ # Hugo / Astro / 11ty python3 scripts/audit-corpus.py site/source/_posts/ # Jekyll # Add domain-specific terms you want flagged (comma-separated): python3 scripts/audit-corpus.py content/posts/ --extra "synergy,best-in-class" # CI mode: exit 1 on any hit, pipe to your notifier or fail the build python3 scripts/audit-corpus.py content/posts/ >/dev/null || echo "drift detected" ``` Default scan covers `*.html` and `*.md`. The script exits `0` clean / `1` on hits / `2` on bad invocation, so it composes with CI. Run it weekly (or as a pre-deploy step) — much cheaper than re-reading every post by hand. Don't point it at the publishing-skills repo itself or at the seo-blog-writer SKILL.md: both contain the banlist literals as data and will self-flag. Target your *content* directory, not your *tooling* directory.