prompt-to-asset

Reliably generate production-grade software assets (logos, app icons, favicons, OG images, illustrations, vectors, transparent marks) from simple briefs.

8 skills

app-icon

Generate an app icon from a single 1024² master and fan out to iOS (AppIcon.appiconset with squircle-ready 1024 opaque), Android (adaptive foreground + background + Android 13 monochrome), PWA (192/512 + 512 maskable with 80% safe zone), and visionOS (3-layer parallax).

# App icon generation

## Platform requirements (non-negotiable)

| Platform | Size/format | Transparency | Safe zone |
|---|---|---|---|
| **iOS App Store** | 1024×1024 PNG, **no alpha** | **opaque** | squircle mask applied by OS; keep subject in 824px center |
| iOS device | 180, 167, 152, 120, 87, 80, 76, 60, 58, 40, 29, 20 (@1x, @2x, @3x) | opaque | same |
| iOS 18 dark / tinted | Icon Composer layered source | per-layer | same |
| **Android adaptive** | 108 dp foreground + 108 dp background; 72 dp visible safe zone | FG yes / BG no | 72 dp of 108 dp |
| Android 13 monochrome | themed drawable | yes | same |
| Google Play | 512×512 PNG, no alpha | opaque | — |
| **PWA** | 192, 512 `any`; 512 `maskable` | `maskable`: opaque + 80% safe zone | 80% of 512 for maskable |
| **visionOS** | 3× 1024² PNGs (parallax layers) | per-layer | — |

## Generation — mark, then pack

**Never generate per-platform. Always: one 1024² RGBA master → deterministic fan-out.**

1. Route the *mark* generation (see `skills/logo/SKILL.md`) — no text; subject-only.
2. Matte with BiRefNet if the chosen provider returned opaque.
3. Deterministic export via `pipeline/export.ts::exportAppIconBundle` (pure `sharp` — no external CLI dependency):
   - **iOS:** `AppIcon.appiconset/Contents.json` + every required PNG (iPhone/iPad/Mac/Watch/marketing). 1024 marketing variant flattened onto brand primary color (App Store rejects alpha). iOS 18 dark/tinted `appearances` emitted when `ios_18_appearances: true`.
   - **Android:** `mipmap-{mdpi,hdpi,xhdpi,xxhdpi,xxxhdpi}/ic_launcher.png` + `mipmap-anydpi-v26/ic_launcher.xml` pointing to `ic_launcher_foreground` + `ic_launcher_background` + `ic_launcher_monochrome` (foreground-derived via greyscale+threshold; the final themed-icon result should still be audited manually for complex marks).
   - **PWA:** `192.png`, `512.png`, `512-maskable.png` (80% safe-zone padding) + `manifest-snippet.json`.
   - **visionOS:** 1024² `master.png` + a README describing the 3-layer parallax split that Xcode's Reality Composer Pro or a manual Photoshop step produces.
   - **Favicon bonus:** `.ico` (16/32/48 multi-res) via optional `png-to-ico`; falls back to separate PNGs with a warning if the dep is absent.

## Prompt scaffold

```
A [flat vector | isometric 3D | glyph | soft gradient] app icon representing
[SUBJECT, concrete noun phrase]. Bold, memorable silhouette. High contrast.
Subject fills 70-80% of frame, centered. No text, no labels, no wordmark.
Palette: [#primary, #secondary, #accent]. Solid pure white background.
1:1 square, 1024x1024.
```

For platform styling: `iOS-style rounded square backdrop` or `Android-style adaptive foreground on transparent` can be added to steer, but the **mark should be subject-only** — backdrop is applied deterministically in export.

## Android 13 monochrome derivation

Heuristic (see Open Question G11 in SYNTHESIS.md): `sharp(foreground).greyscale().threshold(128).tint('#000')`. Works for bold single-subject marks; audit visually for complex marks.

## Validation

- 1024 marketing variant: opaque, dimensions exact, file <1MB.
- Android foreground: tight-bbox inside 72dp of 108dp.
- PWA maskable: subject inside 80% center circle.
- Contrast at 16×16 (renders to "favicon-ish" size): WCAG AA vs both white and dark card.
- No text detected by OCR (app icons must be text-free).

## Output

```
app-icon/
├── master.png                   # 1024² RGBA
├── ios/AppIcon.appiconset/      # Contents.json + all sizes
├── ios/AppIcon-1024-opaque.png  # App Store marketing
├── android/mipmap-*/ic_launcher.png
├── android/mipmap-anydpi-v26/ic_launcher.xml
├── android/drawable/ic_launcher_{foreground,background,monochrome}.png
├── pwa/{192,512,512-maskable}.png
├── pwa/manifest-snippet.json
├── visionos/{front,middle,back}.png  # if parallax
└── meta.json
```
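The iOS half of the deterministic fan-out reduces to a size table mapped into `Contents.json` entries. A minimal sketch of that mapping — not the actual `pipeline/export.ts` code; the slot list here is illustrative and truncated to four entries, and the `AppIcon-<px>.png` naming is an assumption:

```typescript
// Sketch: build AppIcon.appiconset/Contents.json from a slot table.
// The real exporter covers all iPhone/iPad/Mac/Watch slots; this list is truncated.
type IconSlot = { size: number; scale: 1 | 2 | 3; idiom: string };

const SLOTS: IconSlot[] = [
  { size: 60, scale: 2, idiom: "iphone" },          // 120px
  { size: 60, scale: 3, idiom: "iphone" },          // 180px
  { size: 76, scale: 2, idiom: "ipad" },            // 152px
  { size: 1024, scale: 1, idiom: "ios-marketing" }, // opaque marketing
];

function contentsJson(slots: IconSlot[]): string {
  const images = slots.map((s) => ({
    size: `${s.size}x${s.size}`,              // point size, e.g. "60x60"
    idiom: s.idiom,
    scale: `${s.scale}x`,
    filename: `AppIcon-${s.size * s.scale}.png`, // hypothetical pixel-size naming
  }));
  return JSON.stringify({ images, info: { version: 1, author: "xcode" } }, null, 2);
}
```

Because the slot table is data, the same table can drive both the `sharp` resize loop and the manifest, so the two can never disagree.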

asset-enhancer

Classify a software-asset brief (logo, app icon, favicon, OG image, illustration, splash, icon pack, transparent mark), route to the right image model, rewrite the prompt in the target model's dialect, pick an execution mode (inline_svg / external_prompt_only / api) based on what's actually available, and run the pipeline. Use whenever the user asks for any visual asset for a software product.

# asset-enhancer skill

Comprehensive behavior spec for producing production-grade software-development assets. Drives every `asset_*` MCP tool.

## Cardinal rules

> 1. Producing production-grade software assets is a **routing + post-processing** problem, not a prompt-engineering problem.
> 2. The user may not have an image-model API key. The plugin must work anyway.
> 3. There are real zero-key paths — always surface them first before asking the user to pay for anything. The `free_api.routes` block in `asset_capabilities()` enumerates Pollinations (zero-signup HTTP), Stable Horde (anonymous queue), HF Inference (free token), Google AI Studio free tier (best quality-to-$0 ratio), and local ComfyUI.

See `rules/asset-enhancer-activate.md` for the condensed always-on version. This file is the long-form spec.

## Three execution modes

Before generating anything, decide (with the user) which mode to use. Call `asset_capabilities()` to learn what's available in the current environment.

### 1. `inline_svg` — zero key, Claude writes the SVG, server persists

Two-step round trip:

1. The server returns an `svg_brief` (viewBox, palette, path budget, style hints, skeleton) and `instructions_to_host_llm`. You emit the `<svg>…</svg>` inline in your reply as a ```svg code block so the user can see it.
2. **Immediately after emitting, call `asset_save_inline_svg({ svg, asset_type })`.** The server validates the SVG against the brief (viewBox, `<path>` count, palette), writes `master.svg` to disk, and — for favicon / app_icon — runs the full platform export (favicon.ico + apple-touch-icon + PWA 192/512/512-maskable; or AppIconSet + Android adaptive + PWA). Returns an `AssetBundle` with `variants[].path`. Show those paths to the user.

Skip step 2 and the user gets a code block with no file. Do not skip step 2.

**Good for:** `logo`, `favicon`, `icon_pack`, `sticker`, `transparent_mark`, simple `app_icon` masters. Flat, geometric, ≤40 paths.
**Not good for:** illustrations, hero art, photoreal, OG cards with web-font typography.

### 2. `external_prompt_only` — zero key, user pastes into a web UI

The server returns the dialect-correct prompt + a list of paste targets with URLs in **free-first order**: Pollinations (zero-signup HTTP), HF Inference (free `HF_TOKEN`), Stable Horde (anonymous queue), Google AI Studio "Nano Banana" (free `GEMINI_API_KEY` — ~1,500 images/day), then paid UIs (Ideogram web, Recraft web, Midjourney, fal.ai Flux, BFL Playground, OpenAI Platform Playground, etc.). The user generates elsewhere, saves the image locally, and calls `asset_ingest_external({ image_path, asset_type })` to run the matte / vectorize / validate pipeline.

**Good for:** any asset type. Best for `illustration`, `hero`, text-heavy logos, or when the user has a Midjourney / Ideogram subscription but no API key.

### 3. `api` — requires a provider key

The server calls the routed provider directly, mattes, vectorizes, exports, validates. Writes a content-addressed `AssetBundle` with `variants[].path` for every artifact. **Requires** at least one of: `OPENAI_API_KEY`, `IDEOGRAM_API_KEY`, `RECRAFT_API_KEY`, `BFL_API_KEY`, `GEMINI_API_KEY` (or `GOOGLE_API_KEY`). The router falls back among configured providers automatically.

## Recommended flow

```
user brief
  ↓ asset_capabilities()            → what's available RIGHT NOW
  ↓ asset_enhance_prompt({ brief }) → AssetSpec + modes_available
  ↓ ask the user which mode they want
  ↓ asset_generate_<type>({ brief, mode }) → InlineSvgPlan | ExternalPromptPlan | AssetBundle
  ↓ if inline_svg:
      1. emit <svg> in reply (user sees the shape)
      2. call asset_save_inline_svg({ svg, asset_type })
      3. show the returned variants[].path list (user sees files on disk)
  ↓ if external: show prompt + paste_targets; wait for asset_ingest_external
  ↓ if api: show variants[].path and validation warnings
```

## Asset taxonomy — closed enum

| `asset_type` | Transparency default | Vector? | Text? | inline_svg? | external_prompt_only? | api? |
|---|---|---|---|---|---|---|
| `logo` | yes (RGBA PNG **and** SVG) | preferred | wordmark optional (composite) | ✅ | ✅ | ✅ |
| `app_icon` | **no** on iOS 1024 marketing | no | no | ✅ master only | ✅ master only | ✅ full fan-out |
| `favicon` | mixed | prefer | rare | ✅ | ✅ | ✅ |
| `og_image` | no | no | yes (headline) | ❌ (web-font layout beyond LLM reach) | ✅ bg only | ✅ |
| `splash_screen` | vector icon + solid bg | no | no | ❌ | ✅ | ✅ |
| `illustration` | often | SVG where geometry allows | avoid | ❌ path budget | ✅ | ✅ |
| `icon_pack` | yes | **mandatory SVG** | no | ✅ | ✅ | ✅ |
| `hero` | optional | no | sometimes | ❌ | ✅ | ✅ |
| `sticker` | yes | no | rare | ✅ | ✅ | ✅ |
| `transparent_mark` | yes | no | avoid | ✅ | ✅ | ✅ |

## Pipeline (api mode)

```
brief
  ↓ classify()                → asset_type ∈ enum
  ↓ parse_brand_bundle()?     → BrandBundle | null
  ↓ compute_params(asset_type, brand) → dimensions, transparency, safe_zone
  ↓ route(asset_type, ...)    → primary_model, fallback_model, postprocess[]
  ↓ rewrite(brief, model, brand) → dialect-appropriate final prompt
  ── call provider with seed pinned ──
  ↓ validate_0(image) → reject checkerboard / bad alpha / moderation
  ↓ matte?     (birefnet | rmbg via PROMPT_TO_BUNDLE_RMBG_URL | difference fallback | native)
  ↓ vectorize? (recraft | vtracer-on-PATH | potrace-on-PATH | llm-svg | posterize fallback)
  ↓ upscale?   (dat2 | supir | img2img | never)
  ↓ export     (sharp platform fan-out + @resvg/resvg-js + satori + png-to-ico + svgo)
  ↓ validate_final (tile-luma alpha check + bbox + palette ΔE2000 + OCR Levenshtein + WCAG contrast)
  ↓ content-address cache (prompt_hash, model, seed, params_hash) → AssetBundle
```

## Model routing (data-driven)

The routing table lives in `data/routing-table.json`.
Summary:

| Need | Primary | Fallback | Never |
|---|---|---|---|
| Transparent PNG mark | `gpt-image-1` (`background:"transparent"`) | Ideogram 3 Turbo `style:"transparent"` → Recraft V3 | Imagen any, Gemini any, SD 1.5 |
| Logo with 1–3 word text | Ideogram 3 → `gpt-image-1` → Recraft V3 | Composite real SVG type over mark | Imagen, SD 1.5, Flux schnell |
| Logo with >3 word text | Text-free mark + composite SVG type | — | Any diffusion sampler for the text |
| Native SVG | Recraft V3 | LLM-author SVG (simple geometry) — **this is `inline_svg` mode** | Everyone else |
| Photoreal hero | Flux Pro / Flux.2 → `gpt-image-1` → Imagen 4 | SDXL + brand LoRA | DALL·E 3 |
| Empty-state illustration | Flux Pro + brand LoRA/IP-Adapter | SDXL + LoRA → Recraft brand style | One-off Ideogram/MJ (style drift) |
| App icon | Recraft V3 or Ideogram 3 for mark → packaging pipeline | `gpt-image-1` mark → packaging | Full-bleed Imagen 4 as final |
| Favicon | SVG first (LLM-author or Recraft V3) | Raster 512 → vectorize | — |
| OG image | Satori + `@resvg/resvg-js` template (no diffusion) | Diffusion **only** for hero image inside OG template | — |

## Dialect rules per provider

Prompts are **never** forwarded verbatim. Rewrite per target family. See the per-angle research under `docs/research/` for derivations; implementation is in `packages/mcp-server/src/rewriter.ts`.

### `gpt-image-1` / `gpt-image-1.5` (OpenAI)

- Prose sentences. Subject → Context → Style → Constraints.
- For transparency: **set API param `background: "transparent"`**; do not rely on the prompt.
- For text: `"Acme"` in double quotes. Ceiling ~30–50 chars.
- Never rely on the `negative_prompt` field — it is silently ignored.

### Imagen 3 / 4 (Google) & Gemini 2.5/3 Flash Image ("Nano Banana")

- Narrative prose. Imagen has a default rewriter for prompts <30 words, so write ≥30 words of concrete description to suppress it.
- **Do not ask for transparency in the prompt** — the model renders a checkerboard visual. Ask for `"solid pure white background"` and matte externally.
- Text ceiling ~10–20 chars; anything longer → composite.
- Gemini supports multimodal edit (in-context image refs); Imagen does not.

### Stable Diffusion 1.5 / SDXL

- Tag-soup: comma-separated descriptors. `"masterpiece, 8k, studio lighting"`.
- 77-token CLIP limit — front-load important tags.
- `negative_prompt` **is** a real CFG sampler feature; use it.
- For transparency: LayerDiffuse adapter, or matte after.

### Flux.1 / Flux.2 / Flux Pro / Kontext

- Prose narrative. T5 + VLM text encoder — long-form sentences work.
- **Never send `negative_prompt` — it errors out.** Use positive anchors.
- Flux.2 accepts up to 8 brand reference images.

### Midjourney v6 / v7

- Prose + `--` flags. `--sref`, `--cref`, `--mref` for consistency.
- **No official API** — only reachable via `external_prompt_only` mode.
- `--ar 1:1`, `--style raw`, `--q 2`.

### Ideogram 2 / 3 / 3 Turbo

- Prose + quoted text strings. Best-in-class typography.
- `style: "transparent"` for RGBA (v3 only).
- "Magic Prompt" expands literal prompts — toggle it off when you want exact control.

### Recraft V2 / V3 / V4

- Prose. `style_id` = brand lock. `controls.colors` = hex palette enforcement.
- **Only production model with native SVG output.**
- Path count correlates with quality; reject >200 paths for a clean mark.

## Brand bundle

If a `BrandBundle` is provided, every generation injects it. Shape:

```yaml
palette: [#hex, #hex, ...]          # DTCG color tokens
style_refs: [path.png, ...]         # IP-Adapter / --sref / Recraft style_id
lora: path.safetensors?             # SDXL/Flux subject/style LoRA
sref_code: "--sref 123456"?         # Midjourney
style_id: "uuid"?                   # Recraft
do_not: ["drop shadows", ...]       # Rewritten as positive anchors
logo_mark: path.svg?                # Canonical mark for composition
typography: { primary, secondary }  # Fonts for composited type
```

Use `asset_brand_bundle_parse` to canonicalize a `brand.md` / `brand.json` / DTCG `tokens.json` / AdCP spec into this shape.
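The DTCG half of that canonicalization can be sketched with a small recursive walk. This is an illustrative stand-in for `asset_brand_bundle_parse`, not its implementation — it assumes DTCG's `$value` / `$type` keys, handles only 6-digit hex values, and ignores token aliases like `"{color.primary}"`:

```typescript
// Sketch: collect a flat hex palette from a DTCG tokens.json object.
// Assumption: color tokens carry a "#rrggbb" string in `$value`.
function dtcgPalette(node: unknown, out: string[] = []): string[] {
  if (node === null || typeof node !== "object") return out;
  const obj = node as Record<string, unknown>;
  const v = obj["$value"];
  if (typeof v === "string" && /^#[0-9a-fA-F]{6}$/.test(v)) {
    out.push(v.toLowerCase()); // normalize case for ΔE / dedup downstream
  } else {
    for (const key of Object.keys(obj)) {
      // keys starting with "$" are DTCG metadata, not nested groups
      if (!key.startsWith("$")) dtcgPalette(obj[key], out);
    }
  }
  return out;
}
```

Usage: `dtcgPalette(JSON.parse(tokensJson))` yields the `palette:` list for the bundle shape above.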
## Validation

Three tiers. Tier-0 always. Tier-1 on first asset in a set. Tier-2 on any failure or on user request.

- **Tier 0 — deterministic:** dimensions, alpha, checkerboard FFT, safe-zone bbox, file-size budget, DCT entropy.
- **Tier 1 — metric:** VQAScore, palette ΔE2000, OCR Levenshtein, WCAG AA at 16×16.
- **Tier 2 — VLM-as-judge:** Claude Sonnet / GPT vision against asset-type rubrics (not wired by default; set `PROMPT_ENHANCER_VLM_URL` to enable).

## Regenerate vs. repair

| Failure | Action |
|---|---|
| Checkerboard pattern | **Regenerate with route change** — architectural, not sampling |
| Alpha missing on transparency-required | Matte with BiRefNet / RMBG |
| Wordmark misspelled | **Drop text from prompt, composite SVG type** |
| Palette drift | Regenerate with `controls.colors` (Recraft) or recolor post |
| Safe zone violation | Regenerate with explicit center-framing + padding |
| Wrong aspect | Inpaint/outpaint via edit endpoint if available; else regenerate |
| Watermark / stock photo vibe | Regenerate with positive-anchor rewrite |
| Low contrast at 16² (favicon) | Regenerate with explicit contrast instruction |

## Caching

Key: `(model, version, seed, prompt_hash, params_hash)`. Storage: content-addressed path `assets/<hash[0:2]>/<hash>/<variant>.<ext>`. The MCP server is synchronous — `prompt_hash` is emitted on every `AssetBundle` so a future hosted tier (BullMQ / SQS / Cloudflare Queues) can use it as a deduplication key.

## Output contracts

### `InlineSvgPlan` (mode: "inline_svg")

```json
{
  "mode": "inline_svg",
  "asset_type": "logo",
  "brief": "…",
  "svg_brief": { "viewBox": "…", "palette": {...}, "path_budget": 40, "require": [...], "do_not": [...], "skeleton": "…" },
  "instructions_to_host_llm": "…"
}
```

Read `svg_brief`. Emit `<svg>…</svg>` as a ```svg code block in your reply **and then call `asset_save_inline_svg({ svg, asset_type })`** to persist the file. That tool returns an `AssetBundle` with `variants[].path` — show those paths to the user so they know the files exist on disk.

### `ExternalPromptPlan` (mode: "external_prompt_only")

```json
{
  "mode": "external_prompt_only",
  "asset_type": "logo",
  "enhanced_prompt": "…",
  "target_model": "ideogram-3-turbo",
  "paste_targets": [{ "name": "Ideogram web", "url": "https://ideogram.ai", "notes": "…" }],
  "ingest_hint": { "tool": "asset_ingest_external", "args": { "image_path": "<path>", "asset_type": "logo" } }
}
```

Show the prompt + the top paste target(s). After the user saves the result, call `asset_ingest_external`.

### `AssetBundle` (mode: "api")

```json
{
  "mode": "api",
  "asset_type": "logo",
  "variants": [
    { "path": "…/master.png", "format": "png", "width": 1024, "height": 1024, "rgba": true },
    { "path": "…/mark.svg", "format": "svg", "paths": 18 }
  ],
  "provenance": { "model": "recraft-v3", "seed": 1234, "prompt_hash": "…", "params_hash": "…" },
  "validations": { "tier0": { "...all pass": true } },
  "warnings": []
}
```

## Research

Every decision in this skill is traceable to a research angle. See `docs/RESEARCH_MAP.md` for the full file-level mapping. The load-bearing angles most relevant to day-to-day behavior:

- `docs/research/04-gemini-imagen-prompting/4c-transparent-background-checker-problem.md` — why Imagen/Gemini never gets transparency
- `docs/research/07-midjourney-ideogram-recraft/7b-ideogram-text-rendering-for-logos.md` — text ≤3 words rule
- `docs/research/08-logo-generation/8e-svg-vector-logo-pipeline.md` — three SVG paths (Recraft / LLM-author / raster-vectorize)
- `docs/research/09-app-icon-generation/9a-ios-app-icon-hig-specs.md` — 824² safe zone
- `docs/research/19-agentic-mcp-skills-architectures/` — why 13 small tools and three modes
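The cache key described under "Caching" is straightforward to make deterministic; a minimal sketch (the field order and `\n` separator here are assumptions — the real key derivation lives in the server):

```typescript
import { createHash } from "node:crypto";

// Sketch: content-addressed cache directory from the (model, version, seed,
// prompt_hash, params_hash) tuple. Fixed field order + separator means the
// same inputs always produce the same path — the dedup property the hosted
// tier relies on.
function cacheDir(
  model: string,
  version: string,
  seed: number,
  promptHash: string,
  paramsHash: string,
): string {
  const key = [model, version, String(seed), promptHash, paramsHash].join("\n");
  const hash = createHash("sha256").update(key).digest("hex");
  return `assets/${hash.slice(0, 2)}/${hash}`; // two-char fan-out shard
}
```

The two-character prefix shard keeps any single directory from accumulating every asset.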

favicon

Generate a full favicon bundle — favicon.ico (16/32/48 multi-res), icon.svg with prefers-color-scheme dark support, apple-touch-icon.png 180×180 opaque, and PWA 192/512/512-maskable — from a brand mark or a mark prompt.

# Favicon generation

## What modern browsers actually want (see research 11)

```html
<link rel="icon" href="/favicon.ico" sizes="any">
<link rel="icon" type="image/svg+xml" href="/icon.svg">
<link rel="icon" type="image/svg+xml" href="/icon-dark.svg" media="(prefers-color-scheme: dark)">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="manifest" href="/manifest.webmanifest">
```

| File | Size | Alpha | Notes |
|---|---|---|---|
| `favicon.ico` | 16, 32, 48 packed | yes | legacy compatibility; <15KB |
| `icon.svg` | vector | yes | primary modern icon |
| `icon-dark.svg` | vector | yes | `prefers-color-scheme: dark` |
| `apple-touch-icon.png` | 180×180 | **no** — opaque | iOS home-screen; transparent = black background on iOS |
| `pwa/192.png`, `pwa/512.png` | 192, 512 | yes | manifest `any` purpose |
| `pwa/512-maskable.png` | 512, 80% safe zone padding | opaque | manifest `maskable` purpose |

## Generation strategy

**SVG-first.** Three paths, in order of preference:

1. **Reuse existing mark.** If a brand `logo-mark.svg` exists in the brand bundle, use it directly. Optimize with SVGO. Generate color variants by re-coloring SVG paths.
2. **LLM-author SVG.** For simple geometric marks (letter, glyph, shape), ask Claude/GPT to emit raw SVG with fixed viewBox, ≤40 paths, palette as hex list. Validate with `@resvg/resvg-js` rasterization.
3. **Generate raster + vectorize.** Prompt at 1024² (see `skills/logo/SKILL.md`), BiRefNet matte, K-means 4-color (favicons benefit from low color count), `potrace --color` or `vtracer --mode polygon`, SVGO.

## Prompt scaffold for raster fallback

```
A simple, memorable [letter | glyph | shape] representing [SUBJECT].
Bold silhouette. Two or three colors maximum: [#primary, #bg].
Subject fills 70% of frame, centered. Solid pure white background.
No text, no details that vanish at 16x16. 1:1 square, 1024x1024.
```

## Dark-mode variant

Option A (recommended): generate two separate SVGs — light on white, dark on black. Never algorithmically invert (loses WCAG contrast). Option B (compromise): re-color SVG paths via data-driven palette swap from `brand.light` and `brand.dark`.

## Validation (critical for favicons)

- Render SVG at **16×16** with `@resvg/resvg-js`.
- WCAG AA contrast ratio ≥ 4.5:1 between foreground and both white and dark browser chrome.
- No detail <2px stroke at 16×16 size.
- `.ico` file size <15KB.
- `apple-touch-icon.png` is **opaque** — reject if alpha channel present.
- `pwa/512-maskable.png` subject fits inside 80% center circle.

## Output

```
favicon/
├── favicon.ico            # 16/32/48 packed
├── icon.svg               # primary modern icon
├── icon-dark.svg          # prefers-color-scheme dark
├── apple-touch-icon.png   # 180² opaque
├── pwa/192.png
├── pwa/512.png
├── pwa/512-maskable.png
├── head-snippet.html      # <link> tags ready to paste
└── meta.json
```
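The WCAG contrast gate in the validation list is pure arithmetic and worth pinning down: relative luminance per WCAG 2.x, then the `(L1 + 0.05) / (L2 + 0.05)` ratio. A minimal sketch — the `#1e1e1e` dark-chrome stand-in is an assumption, not a value from this spec:

```typescript
// WCAG 2.x relative luminance from a "#rrggbb" hex color.
function luminance(hex: string): number {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    // sRGB gamma expansion per the WCAG definition
    return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05); range 1..21.
function contrastRatio(a: string, b: string): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// AA gate against both light and (assumed) dark browser chrome.
const passesAA = (fg: string) =>
  contrastRatio(fg, "#ffffff") >= 4.5 && contrastRatio(fg, "#1e1e1e") >= 4.5;
```

Requiring the pass against both backgrounds is exactly why a single-color mark often needs the `icon-dark.svg` variant: few hues clear 4.5:1 on white and dark chrome simultaneously.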

illustration

Generate an in-product illustration (empty state, onboarding, hero, spot art) that obeys a brand bundle. Uses IP-Adapter / LoRA / Recraft style_id / Flux.2 brand refs for consistency across a set; validates palette ΔE2000 and style adherence.

# Illustration generation

## The consistency problem

A single illustration is easy. **A set of twelve in the same style** is the hard problem. Research 10 + 15: prompt words drift; reference images do not. Any illustration skill must inject a brand style reference at every call.

## Routing

| Model | Brand lock mechanism | Best for |
|---|---|---|
| **Flux Pro / Flux.2** | `reference_images[]` (up to 8 in Flux.2) + brand LoRA | Photoreal, stylized 3D, brand illustration sets |
| **SDXL + brand LoRA** | trained LoRA (6d recipe, ~5k steps on 20 images) | Bespoke brand style, open-weight |
| **Recraft V3** | `style_id` (brand magic) | Flat vector, editorial illustration |
| **Ideogram 3** | style codes | Loose "same vibe" — not strict lock |
| **Midjourney v6/v7** | `--sref` / `--cref` / `--mref` | Concept work; no API, community wrappers only |
| **`gpt-image-1`** | `input_image[]` | Edit / composite flows |

First illustration in a set is **human-gated**. Once approved, its style becomes the reference injected into all subsequent generations.

## Brand bundle injection

```
illustration prompt =
  [SUBJECT + SCENE from brief]
+ [style anchor: "in the style of the provided reference images"]
+ [palette: exact hex list from brand]
+ [do_not list as positive anchors: "flat matte surfaces" not "no glossy plastic"]
+ [typography reminder: "no text, no labels"]
+ [technical constraints: aspect, resolution, composition]
+ reference_images[]: [style_ref_01.png, style_ref_02.png, (prior approved illustration).png]
+ LoRA handle / style_id / --sref
```

## Prompt scaffold

```
An illustration of [SUBJECT: concrete noun phrase] in a [SCENE: clear action/context].
Composition: [centered | rule-of-thirds | off-center-left]. Subject occupies ~60% of frame.
Style: in the style of the provided reference images. Flat vector with soft gradients.
Line weight consistent with references.
Palette strictly limited to: [#hex, #hex, #hex, #hex, #hex].
Materials: matte surfaces, soft ambient lighting, no rim lights, no lens flare.
No text, no labels, no UI elements.
[aspect ratio]. 2048x1280 resolution.
```

Drop quality modifiers on Flux (no `masterpiece, 8k` — hurts adherence). Keep them on SD.

## Post-processing

1. Background removal only if asset is spot art (use BiRefNet for soft edges).
2. Palette validation: K-means 8-color in LAB, ΔE2000 against brand palette, regenerate if max ΔE > 10.
3. Composition validation: VLM rubric check against style references.
4. Resize to target sizes (sharp premultiplied-alpha resize).

## Full-set propagation

Workflow for generating N illustrations:

1. Generate illustration #1 with the brand bundle and the brief.
2. Human gates: accept / regenerate / tweak.
3. On accept: **add illustration #1 to the style reference set.**
4. Subsequent illustrations pull the whole augmented reference set → style locks progressively tighter.
5. After 3–4 accepted illustrations, train a LoRA for even tighter lock on 20+ asset sets.

## Output

```
illustrations/
├── empty-state-projects.png
├── empty-state-tasks.png
├── onboarding-welcome.png
└── meta.json   # includes provenance + "set coherence score" across the batch
```
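The palette gate in post-processing step 2 can be sketched end-to-end. Note this sketch uses the simpler ΔE76 Euclidean distance rather than the ΔE2000 the pipeline specifies — the color plumbing (sRGB → linear → XYZ → CIELAB) and the gating logic (max ΔE > 10 → regenerate) are the same:

```typescript
// sRGB "#rrggbb" → CIELAB under D65. Standard matrices/constants.
function srgbToLab(hex: string): [number, number, number] {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4; // linearize
  });
  const xyz = [
    0.4124 * r + 0.3576 * g + 0.1805 * b,
    0.2126 * r + 0.7152 * g + 0.0722 * b,
    0.0193 * r + 0.1192 * g + 0.9505 * b,
  ];
  const white = [0.95047, 1.0, 1.08883]; // D65 reference white
  const f = (t: number) =>
    t > 216 / 24389 ? Math.cbrt(t) : (t * (24389 / 27)) / 116 + 16 / 116;
  const [fx, fy, fz] = xyz.map((v, i) => f(v / white[i]));
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}

// ΔE76: plain Euclidean distance in LAB (stand-in for ΔE2000).
function deltaE76(a: string, b: string): number {
  const [l1, a1, b1] = srgbToLab(a);
  const [l2, a2, b2] = srgbToLab(b);
  return Math.hypot(l1 - l2, a1 - a2, b1 - b2);
}

// Gate: every extracted cluster color must sit near *some* brand color.
const paletteOk = (clusters: string[], brand: string[]) =>
  clusters.every((c) => Math.min(...brand.map((bc) => deltaE76(c, bc))) <= 10);
```

In the real pipeline `clusters` comes from the K-means-8 step over the rendered image; here it is whatever hex list you pass in.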

logo

Generate a production-grade logo (primary brand mark). Returns RGBA PNG master + SVG vector + monochrome variant. Route by text-length; never render wordmarks past 3 words in a diffusion sampler.

# Logo generation

## Spec

- **Master:** 1024×1024 RGBA PNG, transparent background, subject centered in 80% safe zone
- **Vector:** SVG, ≤200 paths, fixed viewBox, SVGO optimized
- **Monochrome:** 1-color mono variant (auto-derived via luminance threshold or separate generation)
- **No wordmark past 3 words in diffusion output** — composite SVG type

## Routing by intent

| Text in mark? | Vector needed? | Primary | Fallback |
|---|---|---|---|
| none | yes | Recraft V3 (native SVG) | `gpt-image-1` → BiRefNet → vtracer |
| none | no | `gpt-image-1` (`background:"transparent"`) | Ideogram 3 Turbo (`style:"transparent"`) |
| 1–3 words | yes | Ideogram 3 → raster 1024 → BiRefNet → K-means(6) → vtracer → SVGO | Recraft V3 |
| 1–3 words | no | Ideogram 3 | `gpt-image-1` |
| >3 words | any | Generate text-free mark, composite SVG type in app layer | — |

## Dialect-ready prompt scaffold

```
A [flat vector | isometric | brutalist | soft gradient] logo mark representing
[SUBJECT, concrete noun phrase]. Centered composition. Clean silhouette,
strong tight outline, high contrast. Palette: [#hex, #hex, #hex] strictly.
["WORDMARK TEXT" in sans-serif, tight tracking | text-free mark].
Solid pure white background. Square 1:1 aspect. 1024x1024.
```

For SD: prepend `masterpiece, vector logo, flat design, minimalist, 8k, `. For Flux: drop all quality modifiers; be concrete about subject and palette.

## Post-process

1. BiRefNet matte (or Recraft SVG → rasterize at 1024).
2. Checkerboard-pattern probe: reject if failed.
3. If SVG needed: K-means 6-color → `vtracer --mode polygon` → SVGO (conservative preset).
4. Monochrome variant: greyscale → Otsu threshold → tint brand black.
5. Validate: tight-bbox ≤ 80% safe zone, palette ΔE2000 ≤ 10, OCR matches wordmark if present.

## Failure → action

- Checkerboard → route to `gpt-image-1` (different provider).
- Wordmark garbled → regenerate mark text-free, composite SVG type.
- Palette drift → Recraft `controls.colors` or recolor post.
- >200 SVG paths → regenerate with "simpler geometry" or reduce the K-means color count.

## Output

```
logo/
├── master.png       # 1024² RGBA
├── mark.svg         # vector
├── mark-mono.svg    # monochrome
├── mark-inverse.png # dark-mode variant
└── meta.json        # provenance + validations
```
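The Otsu threshold used in post-process step 4 is a histogram-only computation: pick the greyscale cut that maximizes between-class variance. A minimal sketch of that step in isolation (the actual pipeline runs it on the 256-bin luminance histogram of the matted mark):

```typescript
// Otsu's method over a 256-bin greyscale histogram.
// Returns the threshold t maximizing between-class variance; pixels above t
// become brand black in the mono variant, the rest become transparent.
function otsuThreshold(hist: number[]): number {
  const total = hist.reduce((a, b) => a + b, 0);
  const sumAll = hist.reduce((a, b, i) => a + b * i, 0);
  let sumBg = 0, wBg = 0, best = 0, bestVar = -1;
  for (let t = 0; t < 256; t++) {
    wBg += hist[t];                 // background weight up to t
    if (wBg === 0) continue;
    const wFg = total - wBg;        // foreground weight
    if (wFg === 0) break;
    sumBg += t * hist[t];
    const mBg = sumBg / wBg;        // class means
    const mFg = (sumAll - sumBg) / wFg;
    const between = wBg * wFg * (mBg - mFg) ** 2;
    if (between > bestVar) { bestVar = between; best = t; }
  }
  return best;
}
```

Because it adapts to the actual luminance distribution, Otsu handles both light-on-dark and dark-on-light marks without a hand-tuned 128 cutoff.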

og-image

Generate an Open Graph / Twitter Card social unfurler image — 1200×630 JPEG/PNG, <5MB, with deterministic typography. Defaults to Satori + @resvg/resvg-js template rendering (no diffusion). Diffusion is optional and only for the hero art layer composited behind the template.

# OG image generation

## Spec

- **Dimensions:** 1200×630 (OG 1.91:1) primary. Optional: 1080×1080 square (Instagram), 2:1 Twitter Card `summary_large_image`.
- **Format:** PNG or JPEG, <5MB, served over absolute HTTPS URL, no auth headers.
- **Text:** Deterministic — use Satori + real fonts. **Never sample typography from a diffusion model** for OG cards; headlines get garbled (see logo skill § 3-word rule).
- **Headline:** ≤8 words, 64–96pt. Subheadline: ≤15 words, 28–36pt.
- **Contrast:** WCAG AA min 4.5:1 (Slack/Discord render inside their own color backgrounds).

## Architecture

**Default: template-based, no diffusion.** The OG engine is:

1. **Satori** — converts JSX or HTML+CSS to SVG with real font shaping.
2. **`@resvg/resvg-js`** — rasterizes the SVG to PNG deterministically, no system fonts leaked.
3. **Optional hero layer** — if brief says "with illustration," diffusion-generate a 1200×630 background and composite the Satori-rendered text layer on top.
4. **`@vercel/og`** — reference Satori wrapper if running inside Next.js/Vercel.

```
brief + brand bundle
  ↓ pick template (centered hero, left-aligned, minimal)
  ↓ fill template slots:
      - title (required)
      - subtitle (optional)
      - logo.svg (optional, brand bundle)
      - accent color (brand palette)
      - background: solid | gradient | image
  ↓ [optional] generate background image (diffusion, see skills/hero or skills/illustration)
  ↓ satori(jsx)       → SVG
  ↓ resvg.render(svg) → 1200×630 PNG
  ↓ validate: dimensions exact, <5MB, contrast, no-text-cropped
```

## Templates (canonical)

| Template id | Layout | Use for |
|---|---|---|
| `centered_hero` | title + subtitle centered; logo bottom-left | launch posts, generic pages |
| `left_title` | title left-aligned; hero image right | feature pages, blog posts |
| `minimal` | title + brand color block | landing pages |
| `quote` | pull-quote + attribution | blog, documentation |
| `product_card` | logo + product name + tagline + accent bar | product pages |

Templates live as JSX/TSX strings in `packages/mcp-server/src/templates/og/*.tsx`.

## Validation

- Width = 1200, height = 630 exactly.
- File <5MB.
- No text in outer 60px gutter (Slack/Twitter crop aggressively).
- WCAG AA contrast between headline and background (measure average luminance in the text-bbox region).
- If brand logo included: logo has a minimum size of 60×60px rendered.

## Output

```
og/
├── og.png     # 1200×630 PNG
├── og.jpg     # 1200×630 JPEG (higher compression)
├── og-2x.png  # 2400×1260 retina (optional)
└── meta.json
```
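The geometric half of the validation list — exact canvas plus the gutter rule — is easy to make mechanical. A minimal sketch, assuming text bounding boxes are already known from the template layout (Satori can report them; this sketch just takes them as input):

```typescript
// Sketch: OG geometry checks — exact 1200×630 canvas, and no text box
// intruding into the 60px gutter that unfurl previews crop into.
type Box = { x: number; y: number; w: number; h: number };

function validateOgGeometry(width: number, height: number, textBoxes: Box[]): string[] {
  const problems: string[] = [];
  if (width !== 1200 || height !== 630) {
    problems.push(`canvas is ${width}×${height}, must be exactly 1200×630`);
  }
  const gutter = 60;
  for (const b of textBoxes) {
    const inside =
      b.x >= gutter &&
      b.y >= gutter &&
      b.x + b.w <= width - gutter &&
      b.y + b.h <= height - gutter;
    if (!inside) problems.push(`text box at (${b.x},${b.y}) enters the ${gutter}px gutter`);
  }
  return problems; // empty array = pass
}
```

Running this before rasterization (on the layout, not the pixels) catches cropped headlines without rendering anything.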

transparent-bg

Produce a truly RGBA-transparent asset from a brief. Handles the #1 pain (Gemini/Imagen rendering checkerboard pixels instead of real alpha). Routes to native-RGBA providers; falls back to BiRefNet / BRIA RMBG / LayerDiffuse / difference matting.

# Transparent background

## The problem (see research 13 + 04)

Almost every modern T2I VAE is **RGB-only**. Asking Imagen 3/4 or Gemini 2.5/3 Flash Image for a "transparent logo" triggers one of two failures:

1. **Flat background** — DALL·E 3 and the Stable Diffusion families render a white/gray background that must be matted.
2. **Checkerboard as RGB pixels** — Gemini/Imagen literally render the gray-and-white 8×8 tile pattern (because Photoshop screenshots in their training data show transparency that way). This is the "weird boxes" users report. **It cannot be fixed by prompting.**

## Fix hierarchy (apply the first that fits)

### 1. Route to a native-RGBA provider

| Provider | Mechanism |
|---|---|
| **`gpt-image-1` / `gpt-image-1.5`** | API param `background: "transparent"` + `output_format: "png"` or `"webp"` |
| **Ideogram 3 Turbo** | API param `style: "transparent"` |
| **Recraft V3** | native SVG output (alpha is trivial); rasterize if raster needed |
| **LayerDiffuse on SDXL / Flux** | in-diffusion-loop transparency adapter; better edges than post-matte |

### 2. Post-process matte (for everything else)

| Matting model | License | Strength |
|---|---|---|
| **BiRefNet** | MIT | default choice 2026; best soft-edge handling |
| **BRIA RMBG-2.0** | CC-BY-NC-4.0 (hosted API for commercial) | best overall quality |
| **U²-Net** | Apache-2.0 | legacy fallback |
| **rembg** | wrapper around U²-Net / BiRefNet | easy CLI integration |
| **SAM 2** | Apache-2.0 | for complex/multi-object scenes; 2-stage (segment → matte) |

### 3. Difference matting (for semi-transparency: glass, hair, smoke)

1. Generate the subject on a pure-white background.
2. Generate again with the same seed on a pure-black background.
3. Recover alpha per pixel in luminance: `alpha = 255 - (white_version - black_version)`. (Over white, `P_w = αF + (1−α)·255`; over black, `P_b = αF`; subtracting gives `α = 1 − (P_w − P_b)/255`.)
4. Works for true semi-transparent objects where `rembg`/BiRefNet produce hard-cutout artifacts.

### 4. Vectorize-and-drop (for flat marks)

1. Generate raster at 1024² on a white background.
2. BiRefNet matte → K-means quantize to N colors → `vtracer` or `potrace`.
3. The SVG has alpha trivially. Rasterize at target sizes with `@resvg/resvg-js` for RGBA PNGs.

## Validation (never ship without)

Tier-0 alpha validator (see `packages/mcp-server/src/pipeline/validate.ts`):

```
1. Check PNG/WebP header has alpha channel type (RGBA, not RGB).
2. Reject if no pixels have alpha < 255 (it's opaque).
3. Reject if >5% of pixels have alpha ∈ [0.05, 0.95] AND FFT signature shows gray-tile band frequency (Gemini fake checkerboard).
4. Alpha coverage: subject pixels (alpha > 0.5) should occupy 30–85% of frame (sanity check).
5. Premultiplied-alpha check: no RGB values where alpha == 0 (cleanup artifacts).
```

## Prompt scaffold for the fallback matting flow

**Do not ask for transparency in the prompt.** Ask for pure white:

```
[SUBJECT, concrete]. Centered, isolated, no surrounding context or props.
Solid pure white #FFFFFF background. Clean silhouette with distinct outline.
No drop shadow, no ground plane, no reflection. 1:1 square, 1024x1024.
```

Then post-process with BiRefNet / BRIA.

## Never do this

- `"transparent background"` as a prompt to Imagen or Gemini (checkerboard result).
- Trusting `output_format: "png"` alone to produce alpha (PNG supports alpha, but the model has to emit it).
- Algorithmic `#FFFFFF → alpha=0` thresholding (loses anti-aliasing, creates jagged edges).
- Resizing RGBA with non-premultiplied alpha (dark fringes).

## Output

```
transparent/
├── mark.png      # RGBA, validated
├── mark@2x.png   # optional high-DPI
├── mark.svg      # if vector path chosen
└── meta.json     # alpha_coverage, matting_method, validation_hash
```
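The difference-matting recovery above can be sketched in a few lines of TypeScript (pure math on grayscale buffers, no image library; `differenceMatte` is an illustrative name, not part of the pipeline):

```typescript
// Difference matting: the same subject rendered over white and over black
// (same seed). Per pixel: over white P_w = a*F + (1-a)*255, over black
// P_b = a*F, so a = 1 - (P_w - P_b)/255, i.e. 255 - (P_w - P_b) in 8-bit.
export function differenceMatte(
  white: Uint8Array, // grayscale pixels, subject on #FFFFFF
  black: Uint8Array, // grayscale pixels, subject on #000000 (same seed)
): { alpha: Uint8Array; fg: Uint8Array } {
  if (white.length !== black.length) throw new Error("size mismatch");
  const alpha = new Uint8Array(white.length);
  const fg = new Uint8Array(white.length);
  for (let i = 0; i < white.length; i++) {
    // clamp the difference: generation noise can make black > white
    const a = 255 - Math.max(0, white[i] - black[i]);
    alpha[i] = a;
    // un-premultiply: P_b = a*F  =>  F = P_b * 255 / a (leave 0 where a == 0)
    fg[i] = a > 0 ? Math.min(255, Math.round((black[i] * 255) / a)) : 0;
  }
  return { alpha, fg };
}
```

An opaque pixel (same value in both renders) recovers alpha 255; a pixel that took the full background color in each render recovers alpha 0.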
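Checks 2, 4, and 5 of the Tier-0 validator above can be sketched directly on a raw RGBA buffer (the FFT checkerboard test in check 3 is omitted; `validateAlpha` and the report shape are illustrative, not the actual `validate.ts` API):

```typescript
export interface AlphaReport {
  ok: boolean;
  reasons: string[];
  subjectCoverage: number; // fraction of pixels with alpha > 0.5
}

export function validateAlpha(rgba: Uint8Array): AlphaReport {
  const reasons: string[] = [];
  const pixels = rgba.length / 4;
  let transparent = 0; // pixels with alpha < 255
  let subject = 0;     // pixels with alpha > 0.5 (i.e. > 127 in 8-bit)
  let dirtyZero = 0;   // RGB residue where alpha == 0
  for (let i = 0; i < rgba.length; i += 4) {
    const a = rgba[i + 3];
    if (a < 255) transparent++;
    if (a > 127) subject++;
    if (a === 0 && (rgba[i] | rgba[i + 1] | rgba[i + 2]) !== 0) dirtyZero++;
  }
  if (transparent === 0) reasons.push("opaque: no pixel has alpha < 255");
  const coverage = subject / pixels;
  if (coverage < 0.3 || coverage > 0.85)
    reasons.push(`subject coverage ${coverage.toFixed(2)} outside 30-85%`);
  if (dirtyZero > 0)
    reasons.push("RGB residue under alpha == 0 (premultiply cleanup needed)");
  return { ok: reasons.length === 0, reasons, subjectCoverage: coverage };
}
```

A fully opaque buffer fails twice (no transparency, coverage 100%), which is exactly the signature of a provider that ignored the transparency request.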

vectorize

Convert a raster image to SVG. Three paths — Recraft hosted vectorization, vtracer (multi-color polygon), potrace (1-bit). Optimizes with SVGO; validates path count as a quality signal.

# Vectorize

## Three paths

### 1. Recraft `/vectorize` (hosted, highest quality)

Input: any raster. Output: clean SVG, typically 20–100 paths for a mark, 200–500 for an illustration. Commercial tier only.

```ts
const svg = await recraft.vectorize({ image: pngBuffer });
```

### 2. `vtracer` (local, multi-color polygon)

- Rust binary; `npm install vtracer` for the Node wrapper.
- Modes: `polygon` (recommended for logos), `spline` (smoother curves), `pixel` (blocky).
- Color count is controlled upstream via K-means quantization before vectorization.

```
pipeline:
  raster 1024²
  → BiRefNet matte (alpha the background)
  → K-means LAB 6-color palette
  → vtracer --mode polygon --filter-speckle 4 --color-precision 6
  → SVGO (conservative preset)
```

### 3. `potrace` (local, 1-bit)

- Classic bitmap tracer; binary output only (no color).
- Best for: icon packs (single-color), typography work, line art.
- Multi-color workaround: separate each color into its own 1-bit layer, vectorize each, stack SVG `<g>`s.

```
pipeline:
  raster 1024²
  → BiRefNet matte
  → per-color mask (binary threshold per palette entry)
  → potrace per mask
  → <g> wrapper → combine into single SVG
  → SVGO
```

## Choosing the path

| Use case | Path |
|---|---|
| Budget available, one-shot quality | Recraft vectorize |
| Multi-color logo, local | vtracer (polygon mode) |
| Single-color icon pack | potrace |
| Photorealistic illustration → vector | don't — keep as PNG/WebP; vectorizing photoreal produces lossy noise |

## Path-count quality signal

After vectorization, count `<path>` elements:

- **≤40 paths** → clean, production-ready mark.
- **40–200 paths** → likely acceptable; scan for overlapping slivers.
- **>200 paths** → probably bad. Either the input has noise (regenerate), there are too many colors (reduce K-means), or the subject is inappropriate for vectorization (keep as raster).

## SVGO configuration

Conservative preset — **preserve:** viewBox, IDs, classes. **Strip:** metadata, editor comments, hidden elements.
```js
{
  multipass: true,
  plugins: [
    {
      name: 'preset-default',
      params: {
        overrides: {
          removeViewBox: false,
          cleanupIds: { minify: false }
        }
      }
    },
    'removeDimensions',   // prefer viewBox-only
    'sortAttrs',
    'removeScriptElement' // security
  ]
}
```

Never enable `convertPathData` with the default `floatPrecision` — it collapses small strokes.

## Validation

- SVG parses with `@resvg/resvg-js` without errors.
- `<path>` count ≤ 200 for logos, ≤ 500 for illustrations.
- No `<image>` tags (sneaking raster into SVG).
- No `<script>` tags.
- viewBox present.
- Rasterized at 1024×1024, it should be visually identical (SSIM > 0.95) to the source raster.

## Output

```
vector/
├── mark.svg        # primary vectorization
├── mark.svg.orig   # pre-SVGO original (for debugging)
└── meta.json       # paths_count, colors_used, svgo_savings_bytes
```
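The path-count quality signal above can be sketched as a small gate (a tag scan is enough for a quality signal; a real pipeline would use an XML parser, and `pathVerdict` is an illustrative name):

```typescript
export type PathVerdict = "clean" | "review" | "reject";

// Count <path> elements in an SVG string and bucket them per the
// thresholds in the doc: <=40 clean, 41-200 review, >200 reject.
export function pathVerdict(svg: string): { paths: number; verdict: PathVerdict } {
  const paths = (svg.match(/<path[\s>]/g) ?? []).length;
  const verdict: PathVerdict =
    paths <= 40 ? "clean" : paths <= 200 ? "review" : "reject";
  return { paths, verdict };
}
```

A `"reject"` verdict usually means the input should be regenerated or kept as raster, not that the vectorizer needs different flags.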
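The multi-color potrace workaround needs one binary mask per palette entry before tracing. A minimal sketch, assuming exact palette matches on packed RGB (a real pipeline would match in LAB after K-means quantization; `colorMasks` is an illustrative name):

```typescript
// Split a packed-RGB image into one 1-bit mask per palette color.
// potrace traces each mask separately; the resulting SVGs are then
// stacked as <g> layers in palette order.
export function colorMasks(
  rgb: Uint8Array,                     // packed RGB pixels (3 bytes each)
  palette: [number, number, number][], // quantized palette entries
): Uint8Array[] {
  const pixels = rgb.length / 3;
  return palette.map(([r, g, b]) => {
    const mask = new Uint8Array(pixels); // 1 = trace this pixel
    for (let p = 0; p < pixels; p++) {
      if (rgb[p * 3] === r && rgb[p * 3 + 1] === g && rgb[p * 3 + 2] === b) {
        mask[p] = 1;
      }
    }
    return mask;
  });
}
```

Each mask is disjoint by construction, so the stacked `<g>` layers never fight over a pixel.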