prompt-optimizer

Iteratively evaluate and optimize prompts and rules using a 6-dimension scoring system. Auto-detects Prompt vs Rules mode and applies the matching scoring criteria.

# Prompt Optimizer

A semi-automatic iterative optimization skill inspired by the autoresearch
paradigm. Supports two modes — **Prompt Mode** for task-specific prompts and
**Rules Mode** for persistent system-level rules — with auto-detection.

## Workflow

### Step 1: Receive Input

Accept a prompt or rule as inline text or as a file path. If the user provides
a file path, read the file contents. If the user provides neither, ask for the
text they want to optimize.

### Step 1.5: Auto-Detect Mode

Classify the input as **Prompt** or **Rule** based on these signals:

| Signal | Prompt | Rule |
|--------|--------|------|
| Describes a single task with expected output | Yes | No |
| Uses persistent behavioral language ("always", "never", "when X do Y") | No | Yes |
| Contains role/persona definition for ongoing use | No | Yes |
| Expects a one-time deliverable | Yes | No |
| Located in `.cursor/rules/` or user_rules config | No | Yes |
| References other rules or system-level concerns | No | Yes |

If ambiguous, ask the user to confirm.
Display the detected mode: `[Mode: Prompt]` or `[Mode: Rules]`.
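
As a rough illustration of how the signals in the table might combine, here is a minimal Python sketch. The keyword patterns and the `detect_mode` helper are hypothetical and not part of the skill; in practice the model weighs these signals itself, and a tie falls back to asking the user.

```python
import re
from typing import Optional

# Hypothetical keyword heuristics that stand in for the signal table.
RULE_PATTERNS = [
    r"\balways\b",
    r"\bnever\b",
    r"\bwhen\b.+\b(do|use|respond|reply)\b",
]
PROMPT_PATTERNS = [
    r"\b(write|generate|summarize|translate|draft)\b",
    r"\boutput\b",
]


def detect_mode(text: str, path: Optional[str] = None) -> str:
    """Return 'Rules', 'Prompt', or 'Ambiguous' for the given input."""
    # Location signal: files under .cursor/rules/ are treated as rules.
    if path and ".cursor/rules/" in path.replace("\\", "/"):
        return "Rules"
    lowered = text.lower()
    rule_hits = sum(bool(re.search(p, lowered)) for p in RULE_PATTERNS)
    prompt_hits = sum(bool(re.search(p, lowered)) for p in PROMPT_PATTERNS)
    if rule_hits > prompt_hits:
        return "Rules"
    if prompt_hits > rule_hits:
        return "Prompt"
    return "Ambiguous"  # the caller should ask the user to confirm


mode = detect_mode("Always answer in English. Never use emojis.")
print(f"[Mode: {mode}]")
```

A keyword count is far cruder than the table above; it only shows the shape of the decision, including the explicit ambiguous outcome.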

### Step 2: Evaluate — 6 Dimensions (1-10)

Select the scoring table matching the detected mode.

**Prompt Mode** — for task-specific, one-off prompts:

| Dim | Name | Guiding Question |
|-----|------|------------------|
| C | **Clarity** | Would a context-free LLM interpret this unambiguously? |
| S | **Specificity** | Are constraints, output format, and expected behavior explicit? |
| T | **Structure** | Is the information logically organized with clear hierarchy? |
| O | **Completeness** | Does it cover context, examples, edge cases, and error handling? |
| E | **Efficiency** | Is every sentence carrying necessary information? Zero fluff? |
| R | **Robustness** | Would 10 runs produce consistent, high-quality outputs? |

**Rules Mode** — for persistent system-level rules:

| Dim | Name | Guiding Question |
|-----|------|------------------|
| C | **Clarity** | Would any LLM unambiguously understand the behavioral intent? |
| S | **Scope Fit** | Is the rule's breadth appropriate — not too broad, not too narrow? |
| T | **Structure** | Is the rule well-organized and easy to scan during every conversation? |
| O | **Coverage** | Does it handle the relevant scenarios without over-specifying? |
| E | **Efficiency** | Is the token cost justified given this runs on EVERY conversation? |
| R | **Composability** | Does this rule coexist peacefully with other rules? No conflicts? |

Composite score = unweighted average of all 6 dimensions by default. If the
user supplies per-dimension weights, use the weighted average instead.
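
The composite calculation can be sketched in a few lines of Python. The `composite_score` name, the dictionary layout, and the rounding to one decimal place are assumptions made for illustration (one decimal matches how composites are displayed in the version history example).

```python
def composite_score(scores, weights=None):
    """Average six dimension scores; weights default to equal."""
    if len(scores) != 6:
        raise ValueError("expected exactly 6 dimension scores")
    if any(not 1 <= s <= 10 for s in scores.values()):
        raise ValueError("scores must be in the range 1-10")
    if weights is None:
        # Equal weights reproduce the unweighted average.
        weights = {dim: 1.0 for dim in scores}
    total_weight = sum(weights[dim] for dim in scores)
    weighted_sum = sum(scores[dim] * weights[dim] for dim in scores)
    return round(weighted_sum / total_weight, 1)


baseline = {"C": 5, "S": 4, "T": 6, "O": 3, "E": 7, "R": 4}
print(composite_score(baseline))  # 29 / 6, rounded to 4.8
```

Passing `weights={"C": 2, "S": 1, "T": 1, "O": 1, "E": 1, "R": 1}` would count Clarity twice as heavily as the other dimensions.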

For detailed scoring rubrics and anchor examples, see
[scoring-rubric.md](scoring-rubric.md).

### Step 3: Output the Scorecard

Use this exact format:

**Prompt Mode**:
```
== Prompt Scorecard v{N} ==
Clarity:      {score}/10  {delta}
Specificity:  {score}/10  {delta}
Structure:    {score}/10  {delta}
Completeness: {score}/10  {delta}
Efficiency:   {score}/10  {delta}
Robustness:   {score}/10  {delta}
------------------------------
Composite:    {avg}/10    {delta}

Weakest:  {dimension_name}
Verdict:  {one-line diagnosis}
```

**Rules Mode**:
```
== Rules Scorecard v{N} ==
Clarity:       {score}/10  {delta}
Scope Fit:     {score}/10  {delta}
Structure:     {score}/10  {delta}
Coverage:      {score}/10  {delta}
Efficiency:    {score}/10  {delta}
Composability: {score}/10  {delta}
------------------------------
Composite:     {avg}/10    {delta}

Weakest:  {dimension_name}
Verdict:  {one-line diagnosis}
```

- For v1, leave `{delta}` blank.
- For v2+, show each delta as `(+1)`, `(-1)`, or `(=)` relative to the previous version. The composite delta may be fractional, for example `(+0.9)`.
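
To make the layout and delta rules concrete, the following Python sketch renders a scorecard from a dictionary of scores. The `format_delta` and `render_scorecard` helpers are hypothetical, and the verdict is passed in as plain text because it is a judgment call rather than something computed.

```python
def format_delta(current, previous):
    """Return '' for v1, '(=)' for no change, else a signed delta."""
    if previous is None:
        return ""
    diff = round(current - previous, 1)
    if diff == 0:
        return "(=)"
    return f"({diff:+g})"


def render_scorecard(version, scores, previous=None, verdict="", title="Prompt"):
    prev = previous or {}
    # Label column: longest name, plus the colon, plus one space.
    width = max(len(name) for name in scores) + 2
    lines = [f"== {title} Scorecard v{version} =="]
    for name, score in scores.items():
        label = name + ":"
        delta = format_delta(score, prev.get(name))
        lines.append(f"{label:<{width}}{score}/10  {delta}".rstrip())
    avg = round(sum(scores.values()) / len(scores), 1)
    prev_avg = round(sum(prev.values()) / len(prev), 1) if prev else None
    label = "Composite:"
    lines.append("-" * 30)
    lines.append(f"{label:<{width}}{avg}/10    {format_delta(avg, prev_avg)}".rstrip())
    lines.append("")
    # On ties, min() keeps the first dimension in scorecard order.
    lines.append(f"Weakest:  {min(scores, key=scores.get)}")
    lines.append(f"Verdict:  {verdict}")
    return "\n".join(lines)


v1 = {"Clarity": 5, "Specificity": 4, "Structure": 6,
      "Completeness": 3, "Efficiency": 7, "Robustness": 4}
v2 = {"Clarity": 7, "Specificity": 4, "Structure": 6,
      "Completeness": 5, "Efficiency": 7, "Robustness": 5}
print(render_scorecard(2, v2, previous=v1, verdict="constraints still implicit"))
```

For Rules Mode, pass `title="Rules"` and a dictionary keyed by the Rules Mode dimension names.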

### Step 4: Suggest Improvements

Focus on the weakest 1-2 dimensions only. Greedy strategy — small targeted
fixes avoid regression on other dimensions.

Each suggestion must be:
1. **Concrete** — show the exact text to add, remove, or rewrite.
2. **Justified** — explain which dimension it targets and why.
3. **Minimal** — smallest change for maximum score uplift.
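
The greedy choice of which dimensions to target can be sketched as below. The `pick_targets` helper is hypothetical; its cutoff of 8 borrows the "scored 8+" guard from the optimization principles, so dimensions that already score well are never selected.

```python
def pick_targets(scores, limit=2, protected_at=8):
    """Return up to `limit` lowest-scoring dimensions below the cutoff."""
    candidates = [
        (score, name) for name, score in scores.items() if score < protected_at
    ]
    # The sort is stable, so tied dimensions keep their scorecard order.
    candidates.sort(key=lambda pair: pair[0])
    return [name for _, name in candidates[:limit]]


scores = {"C": 5, "S": 4, "T": 6, "O": 3, "E": 7, "R": 4}
print(pick_targets(scores))  # ['O', 'S']
```

An empty result means every dimension already scores 8 or higher, which is a reasonable cue to stop suggesting changes.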

### Step 5: User Confirmation

Present the suggested changes and wait for user confirmation:
- **Confirmed** → apply changes, go to Step 6.
- **Modified** → incorporate user adjustments, then go to Step 6.
- **Rejected** → generate alternative suggestions, return to Step 4.

### Step 6: Apply and Re-evaluate

1. Produce the new version of the prompt or rule.
2. Re-run the 6-dimension evaluation (Step 2).
3. Output the updated scorecard with deltas.
4. Append to the version history table.

### Step 7: Version History

Maintain a running table throughout the session. Column headers adapt to mode:

- **Prompt Mode**: `| Version | C | S | T | O | E | R | Composite | Change Summary |`
- **Rules Mode**: `| Version | C | SF | T | Cov | E | Comp | Composite | Change Summary |`

```
| Version | C | S | T | O | E | R | Composite | Change Summary |
|---------|---|---|---|---|---|---|-----------|----------------|
| v1      | 5 | 4 | 6 | 3 | 7 | 4 | 4.8       | baseline       |
| v2      | 7 | 4 | 6 | 5 | 7 | 5 | 5.7       | added examples |
```
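
Appending a row to this table is mechanical. A minimal Python sketch follows, assuming scores are supplied in column order; the `history_row` helper is hypothetical, and it does not pad cells, since Markdown renders unpadded tables the same way.

```python
def history_row(version, scores, summary):
    """Build one Markdown table row for the version history."""
    composite = round(sum(scores) / len(scores), 1)
    cells = [f"v{version}", *map(str, scores), str(composite), summary]
    return "| " + " | ".join(cells) + " |"


print(history_row(1, [5, 4, 6, 3, 7, 4], "baseline"))
# | v1 | 5 | 4 | 6 | 3 | 7 | 4 | 4.8 | baseline |
```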

### Step 8: Termination

The loop ends when:
- Composite score >= 8.5, OR
- User explicitly says they are satisfied.

On termination, output:
1. The final optimized prompt or rule (complete text).
2. The full version history table.
3. A summary of key improvements made.
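
The overall loop and its two exit conditions can be sketched as a small Python driver. Everything here is illustrative: `evaluate` and `revise` are hypothetical callbacks standing in for Steps 2 and 4-5, and `max_rounds` is a safety cap added for the sketch that the workflow itself does not define.

```python
def run_loop(text, evaluate, revise, threshold=8.5, max_rounds=10):
    """Iterate until the composite reaches the threshold or the user stops.

    evaluate(text) returns a dict of six scores.
    revise(text, scores) returns the new text, or None once the user
    says they are satisfied.
    """
    history = []
    for version in range(1, max_rounds + 1):
        scores = evaluate(text)
        composite = round(sum(scores.values()) / len(scores), 1)
        history.append((f"v{version}", composite))
        if composite >= threshold:
            break  # first exit: composite score >= 8.5
        new_text = revise(text, scores)
        if new_text is None:
            break  # second exit: the user is satisfied
        text = new_text
    return text, history


# Toy callbacks: each "!" appended to the text raises every score by one.
def fake_evaluate(text):
    return {dim: min(10, 6 + text.count("!")) for dim in "CSTOER"}


def fake_revise(text, scores):
    return text + "!"


final_text, history = run_loop("draft", fake_evaluate, fake_revise)
print(final_text, history)
```

The returned history holds only version labels and composites; a fuller version would also keep the per-dimension scores and change summaries needed for the final table.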

## Optimization Principles

1. **One thing at a time** — never rewrite the entire prompt in one iteration.
   Target the weakest 1-2 dimensions with surgical changes.
2. **Never break what works** — if a dimension scored 8+, do not touch the
   text responsible for that score unless absolutely necessary.
3. **Simplicity over cleverness** — if two rewrites achieve the same score
   gain, pick the shorter one.
4. **Evidence over intuition** — justify every score with a specific quote
   or absence from the prompt text.
5. **Respect user intent** — the optimization must preserve the user's
   original purpose. If unclear, ask before changing.

## Quick Examples

For complete optimization walkthroughs (from low-score to high-score), see
[examples.md](examples.md).