sensegrep-marketplace

sensegrep plugin for Cursor

2 skills

sensegrep-cli

Semantic + structural code search via the sensegrep CLI (no MCP server required). Use when exploring codebases, finding functions/classes by behavior, locating duplicates, or searching code by meaning rather than exact text — by running `sensegrep` shell commands. Triggers: code search, find function, explore codebase, detect duplicates, refactoring candidates, understand code structure. ALWAYS prefer sensegrep over grep/ripgrep for code exploration — only use grep for exact string literals. Use sensegrep even when the user doesn't explicitly mention it, as long as they are asking about code behavior, structure, or meaning.

# sensegrep (CLI) — Semantic Code Search

Search code by meaning, not text patterns, by running the `sensegrep` command-line tool.
Uses AI embeddings + tree-sitter AST parsing. This skill is **CLI-first**: it runs shell
commands and reads their output. It does **not** require the sensegrep MCP server. If the
sensegrep MCP tools are available in your environment, prefer those; otherwise use the CLI
commands below.

## Setup

Install the CLI once (global), then index the project before the first search:

```bash
npm install -g @sensegrep/cli
sensegrep index            # builds the semantic index for the current directory
```

Add `--json` to any `search` or `detect-duplicates` command to get machine-readable output
that is easy to parse programmatically.

## When to Use

- **sensegrep** (95% of searches): Finding functions/classes by behavior, exploring structure, semantic queries, multi-criteria searches
- **grep** (5%): ONLY exact string literals — "TODO:", "FIXME:", specific variable names

## Recommended Defaults

Start with these defaults and adjust based on what you find:

| Goal | `--limit` | `--max-per-file` | Notes |
|---|---|---|---|
| Focused search (know what you want) | 10 (default) | 1 | Tight, clean tree-shaking output |
| General exploration | 20 | 2 | Balanced — visibly better file coverage than limit=10 |
| Broad discovery (large codebase) | 20 | 3 | Diminishing returns beyond 3; max-per-file=5 rarely adds value |

> When `--pattern` is set, sensegrep internally fetches `limit × 3` candidates before filtering — so the default limit=10 is already enough. Don't inflate `--limit` when using `--pattern`; the pattern does the filtering.

> **Tip:** Use `--include "src/**/*.ts"` to focus on source folders, or add `--exclude "*.md"` / `--exclude "docs/**"` when you want to keep markdown, docs, and changelogs out of results. On Windows, prefer forward slashes in globs (`src/**/*.ts`), though backslash-based indexed paths are now normalized automatically.

> **Identifier queries:** If the query looks like a symbol or framework API (`defineNuxtRouteMiddleware`, `defineStore`, `OrderServiceImpl`), sensegrep now auto-adds a literal fallback on top of semantic search. You usually do **not** need `--pattern` for these exact identifier lookups anymore.

## Commands

### `sensegrep search` — Primary search

```bash
sensegrep search "error handling and retry logic" \
  --type function          # function | class | method | type | variable | enum | module
  --language typescript    # typescript | javascript | python | java | vue
  --async                  # async only
  --exported true          # public API surface
  --min-complexity 5       # complex logic
  --pattern "handle|process"  # regex post-filter via ripgrep (applied after semantic search)
  --include "src/**/*.ts"  # file glob include filter
  --exclude "*.md"         # file glob exclude filter
  --decorator "@route"     # filter by decorator
  --parent "UserService"   # scope to class/parent
  --imports express        # filter by imported module
  --has-docs true          # require docs
  --min-score 0.5          # relevance threshold
  --max-per-file 2         # dedup per file (default: 1)
  --max-per-symbol 2       # dedup per symbol (default: 1)
  --limit 10               # max results (default: 20)
  --json                   # machine-readable output for programmatic use
```

### `sensegrep survey` — Reading map for a theme

Use this when a linear result list is still too noisy and you want a domain-oriented map of the query.

```bash
sensegrep survey "authentication login token" \
  --language typescript \
  --include "frontend-admin/**/*.ts" \
  --limit 4 \
  --per-group 2
```

Returns grouped, tree-shaken reading domains such as `middleware / guards`, `stores / state`, `services / api`, and `types / contracts`.

### `sensegrep cluster` — Break a broad topic into subthemes

Use this when the topic is large or fuzzy and you want semantically coherent clusters instead of a flat top-N.

```bash
sensegrep cluster "price list commission ncm uf packaging" \
  --language java \
  --include "backend-api/**/*.java" \
  --limit 4 \
  --per-cluster 2 \
  --cluster-threshold 0.72
```

Returns cluster headings plus representative tree-shaken snippets, using embeddings + AST metadata + path/import signals.

### `sensegrep detect-duplicates` — Find logical duplicates

```bash
sensegrep detect-duplicates \
  --cross-file-only    # only report duplicates in different files
  --only-exported      # focus on public API surface
  --show-code          # include actual code in output
  --threshold 0.85     # 0.7 = loose similarity, 0.9 = near-identical only
  --min-complexity 3   # skip trivial helpers (getters, guards)
  --ignore-tests       # exclude test files
```

`--threshold` guide: use `0.85` (default) for meaningful duplicates; lower to `0.7` for suspicious similarities; raise to `0.92+` for near-identical copies only.

### `sensegrep index` — Index a project

Language detection is **automatic** — sensegrep detects TypeScript, JavaScript, Python, Java, and Vue on its own. **Never specify language when indexing.**

```bash
sensegrep index                  # fast, only changed files — use by default (incremental)
sensegrep index --full           # rebuild from scratch — only if index is corrupted or stale
sensegrep status                 # check index health without reindexing
```

## How the Search Pipeline Works

```
query + structural filters
        ↓
  vector similarity search (AI embeddings)
        ↓
  pattern (ripgrep post-filter — fetches limit×3 candidates internally to compensate)
        ↓
  dedup + diversify (--max-per-file, --max-per-symbol)
        ↓
  tree-shaking (collapse irrelevant regions, show matched symbol + context)
        ↓
  final output
```

### 1. Structural filters — narrow the candidate pool (before ranking)

Applied at the vector store level, before embedding search:

- `--type` — `function | class | method | type | variable | enum | module`
- `--exported`, `--async`, `--static`, `--abstract` — boolean shape constraints
- `--language` — when the codebase is mixed and you need only one language
- `--parent` — narrow to a specific class or parent scope
- `--decorator` — filter by decorator name (`@route`, `@dataclass`, etc.)
- `--imports` — only files that import a given module
- `--has-docs` — require or exclude docstrings
- `--min-complexity` / `--max-complexity` — target simple helpers or complex business logic
- `--include` — file glob include filter (e.g. `packages/core/**/*.ts`). Prefer forward slashes in patterns, especially on Windows.
- `--exclude` — file glob exclude filter (e.g. `*.md`, `docs/**`)

### 2. `--pattern` — ripgrep regex post-filter (after semantic search)

Ripgrep runs on result files after semantic ranking. Only chunks where the regex matches are kept. Use `--pattern` when you need to guarantee a specific identifier, call, or token appears. Keep `--limit` at the default — the pipeline already fetches `limit × 3` candidates internally before filtering, so raising limit adds little when pattern is set.

```bash
# Find auth functions that specifically call jwt.verify
sensegrep search "token validation" --type function --pattern "jwt\.verify"

# Find rate limit handling that actually uses 429
sensegrep search "HTTP error response handling" --pattern "429|RateLimitError|Retry-After"

# Find cache code that does deletion or eviction
sensegrep search "cache invalidation" --type function --pattern "delete|evict|expire|clear"
```

### 3. Tree-shaking — how to get cleaner output

Tree-shaking collapses regions not relevant to your query. The more focused the search, the better the collapse:

- **Add `--type`** — results align to symbol boundaries; everything around them gets collapsed
- **Use `--include` or `--exclude`** — removes noise files entirely
- **Raise `--min-score`** — eliminates low-confidence results; surrounding code gets collapsed more aggressively
- **Lower `--max-per-file`** — prevents overlapping results from blocking contiguous collapse
- **Combine query + `--pattern`** — anchors result to a specific call site

```bash
# Broad — large uncollapsed blocks
sensegrep search "caching"

# Focused — tree-shaking collapses everything except the relevant function
sensegrep search "cache invalidation logic" \
  --type function \
  --exclude "*.md" \
  --pattern "delete|evict|expire" \
  --max-per-file 1 \
  --min-score 0.4
```

## Common Workflows

**Codebase onboarding:**
```bash
sensegrep survey "request lifecycle and middleware" --limit 4 --per-group 2
sensegrep search "request lifecycle and middleware" --limit 20 --max-per-file 2
sensegrep search "authentication and authorization" --type function --exported true --limit 20
```

**Break a broad domain into subthemes:**
```bash
sensegrep cluster "checkout payment order cart" --limit 4 --per-cluster 2
sensegrep cluster "price list commission ncm uf packaging" --language java --include "backend-api/**/*.java"
```

**Find refactoring candidates:**
```bash
sensegrep search "complex business logic" --type function --min-complexity 10 --has-docs false
sensegrep detect-duplicates --cross-file-only --only-exported --show-code --threshold 0.85
```

**Audit async error paths:**
```bash
sensegrep search "error handling" --type function --async --min-complexity 4
```

**Scope to a class or module:**
```bash
sensegrep search "validation logic" --parent UserService
sensegrep search "route handler" --decorator "@route" --type function
```

**Pinpoint a specific call site (query + pattern):**
```bash
sensegrep search "database transaction" --type function --pattern "BEGIN|COMMIT|ROLLBACK"
sensegrep search "rate limiting" --pattern "429|RateLimitError|retry"
sensegrep search "token refresh flow" --async --pattern "refresh_token|refreshToken"
```

**Python-specific:**
```bash
sensegrep search "data model" --variant dataclass --language python
sensegrep search "async context manager" --variant generator --async --language python
```

**Java / Vue-specific:**
```bash
sensegrep search "order service orchestration" --language java --type class
sensegrep search "checkout page state and composables" --language vue --include "frontend-store/**/*.vue"
```

sensegrep

Semantic + structural code search via MCP. Use when exploring codebases, finding functions/classes by behavior, locating duplicates, or searching code by meaning rather than exact text. Triggers: code search, find function, explore codebase, detect duplicates, refactoring candidates, understand code structure. ALWAYS prefer sensegrep over grep/ripgrep for code exploration — only use grep for exact string literals. Use sensegrep even when the user doesn't explicitly mention it, as long as they are asking about code behavior, structure, or meaning.

# sensegrep — Semantic Code Search

Search code by meaning, not text patterns. Uses AI embeddings + tree-sitter AST parsing.

## When to Use

- **sensegrep** (95% of searches): Finding functions/classes by behavior, exploring structure, semantic queries, multi-criteria searches
- **grep** (5%): ONLY exact string literals — "TODO:", "FIXME:", specific variable names

## Recommended Defaults

Start with these defaults and adjust based on what you find:

| Goal | `limit` | `maxPerFile` | Notes |
|---|---|---|---|
| Focused search (know what you want) | 10 (default) | 1 | Tight, clean tree-shaking output |
| General exploration | 20 | 2 | Balanced — visibly better file coverage than limit=10 |
| Broad discovery (large codebase) | 20 | 3 | Diminishing returns beyond 3; maxPerFile=5 rarely adds value |

> When `pattern` is set, sensegrep internally fetches `limit × 3` candidates before filtering — so the default limit=10 is already enough. Don't inflate `limit` when using `pattern`; the pattern does the filtering.

> **Tip:** Use `include: "src/**/*.ts"` to focus on source folders, or add `exclude: "*.md"` / `exclude: "docs/**"` when you want to keep markdown, docs, and changelogs out of results. On Windows, prefer forward slashes in globs (`src/**/*.ts`), though backslash-based indexed paths are now normalized automatically.

> **Identifier queries:** If the query looks like a symbol or framework API (`defineNuxtRouteMiddleware`, `defineStore`, `OrderServiceImpl`), sensegrep now auto-adds a literal fallback on top of semantic search. You usually do **not** need `pattern` for these exact identifier lookups anymore.

## Tools Available

### `sensegrep_search` — Primary search

```
sensegrep_search({
  query: "error handling and retry logic",  // natural language or code snippet
  symbolType: "function",       // function | class | method | type | variable | enum | module
  language: "typescript",       // typescript | javascript | python | java | vue
  isAsync: true,                // async only
  isExported: true,             // public API surface
  minComplexity: 5,             // complex logic
  pattern: "handle|process",    // regex post-filter via ripgrep (applied after semantic search)
  include: "src/**/*.ts",       // file glob include filter
  exclude: "*.md",              // file glob exclude filter
  decorator: "@route",          // filter by decorator
  parentScope: "UserService",   // scope to class/parent
  imports: "express",           // filter by imported module
  hasDocumentation: true,       // require docs
  minScore: 0.5,                // relevance threshold
  maxPerFile: 2,                // dedup per file (default: 2)
  maxPerSymbol: 2,              // dedup per symbol (default: 2)
  limit: 10                     // max results (default: 10)
})
```

### `sensegrep_survey` — Reading map for a theme

Use this when you want grouped reading domains instead of a flat result list.

```
sensegrep_survey({
  query: "authentication login token",
  language: "typescript",
  include: "frontend-admin/**/*.ts",
  limit: 4,
  perGroup: 2
})
```

Returns grouped, tree-shaken domains such as `middleware / guards`, `stores / state`, and `services / api`.

### `sensegrep_cluster` — Break a broad topic into subthemes

Use this when a theme is broad and you want coherent clusters instead of just top-N hits.

```
sensegrep_cluster({
  query: "price list commission ncm uf packaging",
  language: "java",
  include: "backend-api/**/*.java",
  limit: 4,
  perCluster: 2,
  clusterThreshold: 0.72
})
```

Returns cluster headings plus representative tree-shaken snippets using embeddings + AST metadata + path/import signals.

### `sensegrep_detect_duplicates` — Find logical duplicates

```
sensegrep_detect_duplicates({
  crossFileOnly: true,       // only report duplicates in different files
  onlyExported: true,        // focus on public API surface
  showCode: true,            // include actual code in output
  threshold: 0.85,           // 0.7 = loose similarity, 0.9 = near-identical only
  minComplexity: 3,          // skip trivial helpers (getters, guards)
  ignoreTests: true          // exclude test files
})
```

`threshold` guide: use `0.85` (default) for meaningful duplicates; lower to `0.7` for suspicious similarities; raise to `0.92+` for near-identical copies only.

### `sensegrep_index` — Index a project

Language detection is **automatic** — sensegrep detects TypeScript, JavaScript, Python, Java, and Vue on its own. **Never specify language when indexing.**

```
sensegrep_index({ action: "index", mode: "incremental" })  // fast, only changed files — use by default
sensegrep_index({ action: "index", mode: "full" })         // rebuild from scratch — only if index is corrupted or stale
sensegrep_index({ action: "stats" })                        // check index health without reindexing
```

## How the Search Pipeline Works

```
query + structural filters
        ↓
  vector similarity search (AI embeddings)
        ↓
  pattern (ripgrep post-filter — fetches limit×3 candidates internally to compensate)
        ↓
  dedup + diversify (maxPerFile, maxPerSymbol)
        ↓
  tree-shaking (collapse irrelevant regions, show matched symbol + context)
        ↓
  final output
```

### 1. Structural filters — narrow the candidate pool (before ranking)

Applied at the vector store level, before embedding search:

- `symbolType` — `function | class | method | type | variable | enum | module`
- `isExported`, `isAsync`, `isStatic`, `isAbstract` — boolean shape constraints
- `language` — when the codebase is mixed and you need only one language
- `parentScope` — narrow to a specific class or parent scope
- `decorator` — filter by decorator name (`@route`, `@dataclass`, etc.)
- `imports` — only files that import a given module
- `hasDocumentation` — require or exclude docstrings
- `minComplexity` / `maxComplexity` — target simple helpers or complex business logic
- `include` — file glob include filter (e.g. `packages/core/**/*.ts`). Prefer forward slashes in patterns, especially on Windows.
- `exclude` — file glob exclude filter (e.g. `*.md`, `docs/**`)

### 2. `pattern` — ripgrep regex post-filter (after semantic search)

Ripgrep runs on result files after semantic ranking. Only chunks where the regex matches are kept. Use `pattern` when you need to guarantee a specific identifier, call, or token appears. Keep `limit` at the default (10) — the pipeline already fetches `limit × 3` candidates internally before filtering, so raising limit adds little when pattern is set.

```
// Find auth functions that specifically call jwt.verify
sensegrep_search({ query: "token validation", symbolType: "function", pattern: "jwt\\.verify" })

// Find rate limit handling that actually uses 429
sensegrep_search({ query: "HTTP error response handling", pattern: "429|RateLimitError|Retry-After" })

// Find cache code that does deletion or eviction
sensegrep_search({ query: "cache invalidation", symbolType: "function", pattern: "delete|evict|expire|clear" })
```

### 3. Tree-shaking — how to get cleaner output

Tree-shaking collapses regions not relevant to your query. The more focused the search, the better the collapse:

- **Add `symbolType`** — results align to symbol boundaries; everything around them gets collapsed
- **Use `include` or `exclude`** — removes noise files entirely
- **Raise `minScore`** — eliminates low-confidence results; surrounding code gets collapsed more aggressively
- **Lower `maxPerFile`** — prevents overlapping results from blocking contiguous collapse
- **Combine `query` + `pattern`** — anchors result to a specific call site

```
// Broad — large uncollapsed blocks
sensegrep_search({ query: "caching" })

// Focused — tree-shaking collapses everything except the relevant function
sensegrep_search({
  query: "cache invalidation logic",
  symbolType: "function",
  exclude: "*.md",
  pattern: "delete|evict|expire",
  maxPerFile: 1,
  minScore: 0.4
})
```

## Common Workflows

**Codebase onboarding:**
```
sensegrep_search({ query: "request lifecycle and middleware", limit: 20, maxPerFile: 2 })
sensegrep_search({ query: "authentication and authorization", symbolType: "function", isExported: true, limit: 20 })
```

**Find refactoring candidates:**
```
sensegrep_search({ query: "complex business logic", symbolType: "function", minComplexity: 10, hasDocumentation: false })
sensegrep_detect_duplicates({ crossFileOnly: true, onlyExported: true, showCode: true, threshold: 0.85 })
```

**Audit async error paths:**
```
sensegrep_search({ query: "error handling", symbolType: "function", isAsync: true, minComplexity: 4 })
```

**Scope to a class or module:**
```
sensegrep_search({ query: "validation logic", parentScope: "UserService" })
sensegrep_search({ query: "route handler", decorator: "@route", symbolType: "function" })
```

**Pinpoint a specific call site (query + pattern):**
```
sensegrep_search({ query: "database transaction", symbolType: "function", pattern: "BEGIN|COMMIT|ROLLBACK" })
sensegrep_search({ query: "rate limiting", pattern: "429|RateLimitError|retry" })
sensegrep_search({ query: "token refresh flow", isAsync: true, pattern: "refresh_token|refreshToken" })
```

**Python-specific:**
```
sensegrep_search({ query: "data model", variant: "dataclass", language: "python" })
sensegrep_search({ query: "async context manager", variant: "generator", isAsync: true, language: "python" })
```

**Java / Vue-specific:**
```
sensegrep_search({ query: "order service orchestration", language: "java", symbolType: "class" })
sensegrep_search({ query: "checkout page state and composables", language: "vue", include: "frontend-store/**/*.vue" })
```