sensegrep plugin for Cursor
Semantic + structural code search via the sensegrep CLI (no MCP server required). Use when exploring codebases, finding functions/classes by behavior, locating duplicates, or searching code by meaning rather than exact text — by running `sensegrep` shell commands. Triggers: code search, find function, explore codebase, detect duplicates, refactoring candidates, understand code structure. ALWAYS prefer sensegrep over grep/ripgrep for code exploration — only use grep for exact string literals. Use sensegrep even when the user doesn't explicitly mention it, as long as they are asking about code behavior, structure, or meaning.
# sensegrep (CLI) — Semantic Code Search
Search code by meaning, not text patterns, by running the `sensegrep` command-line tool.
Uses AI embeddings + tree-sitter AST parsing. This skill is **CLI-first**: it runs shell
commands and reads their output. It does **not** require the sensegrep MCP server. If the
sensegrep MCP tools are available in your environment, prefer those; otherwise use the CLI
commands below.
## Setup
Install the CLI once (global), then index the project before the first search:
```bash
npm install -g @sensegrep/cli
sensegrep index # builds the semantic index for the current directory
```
Add `--json` to any `search` or `detect-duplicates` command to get machine-readable output
that is easy to parse programmatically.
## When to Use
- **sensegrep** (95% of searches): Finding functions/classes by behavior, exploring structure, semantic queries, multi-criteria searches
- **grep** (5%): ONLY exact string literals — "TODO:", "FIXME:", specific variable names
## Recommended Defaults
Start with these defaults and adjust based on what you find:
| Goal | `--limit` | `--max-per-file` | Notes |
|---|---|---|---|
| Focused search (know what you want) | 10 (default) | 1 | Tight, clean tree-shaking output |
| General exploration | 20 | 2 | Balanced — visibly better file coverage than limit=10 |
| Broad discovery (large codebase) | 20 | 3 | Diminishing returns beyond 3; max-per-file=5 rarely adds value |
> When `--pattern` is set, sensegrep internally fetches `limit × 3` candidates before filtering — so the default limit=10 is already enough. Don't inflate `--limit` when using `--pattern`; the pattern does the filtering.
> **Tip:** Use `--include "src/**/*.ts"` to focus on source folders, or add `--exclude "*.md"` / `--exclude "docs/**"` when you want to keep markdown, docs, and changelogs out of results. On Windows, prefer forward slashes in globs (`src/**/*.ts`), though backslash-based indexed paths are now normalized automatically.
> **Identifier queries:** If the query looks like a symbol or framework API (`defineNuxtRouteMiddleware`, `defineStore`, `OrderServiceImpl`), sensegrep now auto-adds a literal fallback on top of semantic search. You usually do **not** need `--pattern` for these exact identifier lookups anymore.
## Commands
### `sensegrep search` — Primary search
```bash
sensegrep search "error handling and retry logic" \
--type function # function | class | method | type | variable | enum | module
--language typescript # typescript | javascript | python | java | vue
--async # async only
--exported true # public API surface
--min-complexity 5 # complex logic
--pattern "handle|process" # regex post-filter via ripgrep (applied after semantic search)
--include "src/**/*.ts" # file glob include filter
--exclude "*.md" # file glob exclude filter
--decorator "@route" # filter by decorator
--parent "UserService" # scope to class/parent
--imports express # filter by imported module
--has-docs true # require docs
--min-score 0.5 # relevance threshold
--max-per-file 2 # dedup per file (default: 1)
--max-per-symbol 2 # dedup per symbol (default: 1)
--limit 10 # max results (default: 20)
--json # machine-readable output for programmatic use
```
### `sensegrep survey` — Reading map for a theme
Use this when a linear result list is still too noisy and you want a domain-oriented map of the query.
```bash
sensegrep survey "authentication login token" \
--language typescript \
--include "frontend-admin/**/*.ts" \
--limit 4 \
--per-group 2
```
Returns grouped, tree-shaken reading domains such as `middleware / guards`, `stores / state`, `services / api`, and `types / contracts`.
### `sensegrep cluster` — Break a broad topic into subthemes
Use this when the topic is large or fuzzy and you want semantically coherent clusters instead of a flat top-N.
```bash
sensegrep cluster "price list commission ncm uf packaging" \
--language java \
--include "backend-api/**/*.java" \
--limit 4 \
--per-cluster 2 \
--cluster-threshold 0.72
```
Returns cluster headings plus representative tree-shaken snippets, using embeddings + AST metadata + path/import signals.
### `sensegrep detect-duplicates` — Find logical duplicates
```bash
sensegrep detect-duplicates \
--cross-file-only # only report duplicates in different files
--only-exported # focus on public API surface
--show-code # include actual code in output
--threshold 0.85 # 0.7 = loose similarity, 0.9 = near-identical only
--min-complexity 3 # skip trivial helpers (getters, guards)
--ignore-tests # exclude test files
```
`--threshold` guide: use `0.85` (default) for meaningful duplicates; lower to `0.7` for suspicious similarities; raise to `0.92+` for near-identical copies only.
### `sensegrep index` — Index a project
Language detection is **automatic** — sensegrep detects TypeScript, JavaScript, Python, Java, and Vue on its own. **Never specify language when indexing.**
```bash
sensegrep index # fast, only changed files — use by default (incremental)
sensegrep index --full # rebuild from scratch — only if index is corrupted or stale
sensegrep status # check index health without reindexing
```
## How the Search Pipeline Works
```
query + structural filters
↓
vector similarity search (AI embeddings)
↓
pattern (ripgrep post-filter — fetches limit×3 candidates internally to compensate)
↓
dedup + diversify (--max-per-file, --max-per-symbol)
↓
tree-shaking (collapse irrelevant regions, show matched symbol + context)
↓
final output
```
### 1. Structural filters — narrow the candidate pool (before ranking)
Applied at the vector store level, before embedding search:
- `--type` — `function | class | method | type | variable | enum | module`
- `--exported`, `--async`, `--static`, `--abstract` — boolean shape constraints
- `--language` — when the codebase is mixed and you need only one language
- `--parent` — narrow to a specific class or parent scope
- `--decorator` — filter by decorator name (`@route`, `@dataclass`, etc.)
- `--imports` — only files that import a given module
- `--has-docs` — require or exclude docstrings
- `--min-complexity` / `--max-complexity` — target simple helpers or complex business logic
- `--include` — file glob include filter (e.g. `packages/core/**/*.ts`). Prefer forward slashes in patterns, especially on Windows.
- `--exclude` — file glob exclude filter (e.g. `*.md`, `docs/**`)
### 2. `--pattern` — ripgrep regex post-filter (after semantic search)
Ripgrep runs on result files after semantic ranking. Only chunks where the regex matches are kept. Use `--pattern` when you need to guarantee a specific identifier, call, or token appears. Keep `--limit` at the default — the pipeline already fetches `limit × 3` candidates internally before filtering, so raising limit adds little when pattern is set.
```bash
# Find auth functions that specifically call jwt.verify
sensegrep search "token validation" --type function --pattern "jwt\.verify"
# Find rate limit handling that actually uses 429
sensegrep search "HTTP error response handling" --pattern "429|RateLimitError|Retry-After"
# Find cache code that does deletion or eviction
sensegrep search "cache invalidation" --type function --pattern "delete|evict|expire|clear"
```
### 3. Tree-shaking — how to get cleaner output
Tree-shaking collapses regions not relevant to your query. The more focused the search, the better the collapse:
- **Add `--type`** — results align to symbol boundaries; everything around them gets collapsed
- **Use `--include` or `--exclude`** — removes noise files entirely
- **Raise `--min-score`** — eliminates low-confidence results; surrounding code gets collapsed more aggressively
- **Lower `--max-per-file`** — prevents overlapping results from blocking contiguous collapse
- **Combine query + `--pattern`** — anchors result to a specific call site
```bash
# Broad — large uncollapsed blocks
sensegrep search "caching"
# Focused — tree-shaking collapses everything except the relevant function
sensegrep search "cache invalidation logic" \
--type function \
--exclude "*.md" \
--pattern "delete|evict|expire" \
--max-per-file 1 \
--min-score 0.4
```
## Common Workflows
**Codebase onboarding:**
```bash
sensegrep survey "request lifecycle and middleware" --limit 4 --per-group 2
sensegrep search "request lifecycle and middleware" --limit 20 --max-per-file 2
sensegrep search "authentication and authorization" --type function --exported true --limit 20
```
**Break a broad domain into subthemes:**
```bash
sensegrep cluster "checkout payment order cart" --limit 4 --per-cluster 2
sensegrep cluster "price list commission ncm uf packaging" --language java --include "backend-api/**/*.java"
```
**Find refactoring candidates:**
```bash
sensegrep search "complex business logic" --type function --min-complexity 10 --has-docs false
sensegrep detect-duplicates --cross-file-only --only-exported --show-code --threshold 0.85
```
**Audit async error paths:**
```bash
sensegrep search "error handling" --type function --async --min-complexity 4
```
**Scope to a class or module:**
```bash
sensegrep search "validation logic" --parent UserService
sensegrep search "route handler" --decorator "@route" --type function
```
**Pinpoint a specific call site (query + pattern):**
```bash
sensegrep search "database transaction" --type function --pattern "BEGIN|COMMIT|ROLLBACK"
sensegrep search "rate limiting" --pattern "429|RateLimitError|retry"
sensegrep search "token refresh flow" --async --pattern "refresh_token|refreshToken"
```
**Python-specific:**
```bash
sensegrep search "data model" --variant dataclass --language python
sensegrep search "async context manager" --variant generator --async --language python
```
**Java / Vue-specific:**
```bash
sensegrep search "order service orchestration" --language java --type class
sensegrep search "checkout page state and composables" --language vue --include "frontend-store/**/*.vue"
```Semantic + structural code search via MCP. Use when exploring codebases, finding functions/classes by behavior, locating duplicates, or searching code by meaning rather than exact text. Triggers: code search, find function, explore codebase, detect duplicates, refactoring candidates, understand code structure. ALWAYS prefer sensegrep over grep/ripgrep for code exploration — only use grep for exact string literals. Use sensegrep even when the user doesn't explicitly mention it, as long as they are asking about code behavior, structure, or meaning.
# sensegrep — Semantic Code Search
Search code by meaning, not text patterns. Uses AI embeddings + tree-sitter AST parsing.
## When to Use
- **sensegrep** (95% of searches): Finding functions/classes by behavior, exploring structure, semantic queries, multi-criteria searches
- **grep** (5%): ONLY exact string literals — "TODO:", "FIXME:", specific variable names
## Recommended Defaults
Start with these defaults and adjust based on what you find:
| Goal | `limit` | `maxPerFile` | Notes |
|---|---|---|---|
| Focused search (know what you want) | 10 (default) | 1 | Tight, clean tree-shaking output |
| General exploration | 20 | 2 | Balanced — visibly better file coverage than limit=10 |
| Broad discovery (large codebase) | 20 | 3 | Diminishing returns beyond 3; maxPerFile=5 rarely adds value |
> When `pattern` is set, sensegrep internally fetches `limit × 3` candidates before filtering — so the default limit=10 is already enough. Don't inflate `limit` when using `pattern`; the pattern does the filtering.
> **Tip:** Use `include: "src/**/*.ts"` to focus on source folders, or add `exclude: "*.md"` / `exclude: "docs/**"` when you want to keep markdown, docs, and changelogs out of results. On Windows, prefer forward slashes in globs (`src/**/*.ts`), though backslash-based indexed paths are now normalized automatically.
> **Identifier queries:** If the query looks like a symbol or framework API (`defineNuxtRouteMiddleware`, `defineStore`, `OrderServiceImpl`), sensegrep now auto-adds a literal fallback on top of semantic search. You usually do **not** need `pattern` for these exact identifier lookups anymore.
## Tools Available
### `sensegrep_search` — Primary search
```
sensegrep_search({
query: "error handling and retry logic", // natural language or code snippet
symbolType: "function", // function | class | method | type | variable | enum | module
language: "typescript", // typescript | javascript | python | java | vue
isAsync: true, // async only
isExported: true, // public API surface
minComplexity: 5, // complex logic
pattern: "handle|process", // regex post-filter via ripgrep (applied after semantic search)
include: "src/**/*.ts", // file glob include filter
exclude: "*.md", // file glob exclude filter
decorator: "@route", // filter by decorator
parentScope: "UserService", // scope to class/parent
imports: "express", // filter by imported module
hasDocumentation: true, // require docs
minScore: 0.5, // relevance threshold
maxPerFile: 2, // dedup per file (default: 2)
maxPerSymbol: 2, // dedup per symbol (default: 2)
limit: 10 // max results (default: 10)
})
```
### `sensegrep_survey` — Reading map for a theme
Use this when you want grouped reading domains instead of a flat result list.
```
sensegrep_survey({
query: "authentication login token",
language: "typescript",
include: "frontend-admin/**/*.ts",
limit: 4,
perGroup: 2
})
```
Returns grouped, tree-shaken domains such as `middleware / guards`, `stores / state`, and `services / api`.
### `sensegrep_cluster` — Break a broad topic into subthemes
Use this when a theme is broad and you want coherent clusters instead of just top-N hits.
```
sensegrep_cluster({
query: "price list commission ncm uf packaging",
language: "java",
include: "backend-api/**/*.java",
limit: 4,
perCluster: 2,
clusterThreshold: 0.72
})
```
Returns cluster headings plus representative tree-shaken snippets using embeddings + AST metadata + path/import signals.
### `sensegrep_detect_duplicates` — Find logical duplicates
```
sensegrep_detect_duplicates({
crossFileOnly: true, // only report duplicates in different files
onlyExported: true, // focus on public API surface
showCode: true, // include actual code in output
threshold: 0.85, // 0.7 = loose similarity, 0.9 = near-identical only
minComplexity: 3, // skip trivial helpers (getters, guards)
ignoreTests: true // exclude test files
})
```
`threshold` guide: use `0.85` (default) for meaningful duplicates; lower to `0.7` for suspicious similarities; raise to `0.92+` for near-identical copies only.
### `sensegrep_index` — Index a project
Language detection is **automatic** — sensegrep detects TypeScript, JavaScript, Python, Java, and Vue on its own. **Never specify language when indexing.**
```
sensegrep_index({ action: "index", mode: "incremental" }) // fast, only changed files — use by default
sensegrep_index({ action: "index", mode: "full" }) // rebuild from scratch — only if index is corrupted or stale
sensegrep_index({ action: "stats" }) // check index health without reindexing
```
## How the Search Pipeline Works
```
query + structural filters
↓
vector similarity search (AI embeddings)
↓
pattern (ripgrep post-filter — fetches limit×3 candidates internally to compensate)
↓
dedup + diversify (maxPerFile, maxPerSymbol)
↓
tree-shaking (collapse irrelevant regions, show matched symbol + context)
↓
final output
```
### 1. Structural filters — narrow the candidate pool (before ranking)
Applied at the vector store level, before embedding search:
- `symbolType` — `function | class | method | type | variable | enum | module`
- `isExported`, `isAsync`, `isStatic`, `isAbstract` — boolean shape constraints
- `language` — when the codebase is mixed and you need only one language
- `parentScope` — narrow to a specific class or parent scope
- `decorator` — filter by decorator name (`@route`, `@dataclass`, etc.)
- `imports` — only files that import a given module
- `hasDocumentation` — require or exclude docstrings
- `minComplexity` / `maxComplexity` — target simple helpers or complex business logic
- `include` — file glob include filter (e.g. `packages/core/**/*.ts`). Prefer forward slashes in patterns, especially on Windows.
- `exclude` — file glob exclude filter (e.g. `*.md`, `docs/**`)
### 2. `pattern` — ripgrep regex post-filter (after semantic search)
Ripgrep runs on result files after semantic ranking. Only chunks where the regex matches are kept. Use `pattern` when you need to guarantee a specific identifier, call, or token appears. Keep `limit` at the default (10) — the pipeline already fetches `limit × 3` candidates internally before filtering, so raising limit adds little when pattern is set.
```
// Find auth functions that specifically call jwt.verify
sensegrep_search({ query: "token validation", symbolType: "function", pattern: "jwt\\.verify" })
// Find rate limit handling that actually uses 429
sensegrep_search({ query: "HTTP error response handling", pattern: "429|RateLimitError|Retry-After" })
// Find cache code that does deletion or eviction
sensegrep_search({ query: "cache invalidation", symbolType: "function", pattern: "delete|evict|expire|clear" })
```
### 3. Tree-shaking — how to get cleaner output
Tree-shaking collapses regions not relevant to your query. The more focused the search, the better the collapse:
- **Add `symbolType`** — results align to symbol boundaries; everything around them gets collapsed
- **Use `include` or `exclude`** — removes noise files entirely
- **Raise `minScore`** — eliminates low-confidence results; surrounding code gets collapsed more aggressively
- **Lower `maxPerFile`** — prevents overlapping results from blocking contiguous collapse
- **Combine `query` + `pattern`** — anchors result to a specific call site
```
// Broad — large uncollapsed blocks
sensegrep_search({ query: "caching" })
// Focused — tree-shaking collapses everything except the relevant function
sensegrep_search({
query: "cache invalidation logic",
symbolType: "function",
exclude: "*.md",
pattern: "delete|evict|expire",
maxPerFile: 1,
minScore: 0.4
})
```
## Common Workflows
**Codebase onboarding:**
```
sensegrep_search({ query: "request lifecycle and middleware", limit: 20, maxPerFile: 2 })
sensegrep_search({ query: "authentication and authorization", symbolType: "function", isExported: true, limit: 20 })
```
**Find refactoring candidates:**
```
sensegrep_search({ query: "complex business logic", symbolType: "function", minComplexity: 10, hasDocumentation: false })
sensegrep_detect_duplicates({ crossFileOnly: true, onlyExported: true, showCode: true, threshold: 0.85 })
```
**Audit async error paths:**
```
sensegrep_search({ query: "error handling", symbolType: "function", isAsync: true, minComplexity: 4 })
```
**Scope to a class or module:**
```
sensegrep_search({ query: "validation logic", parentScope: "UserService" })
sensegrep_search({ query: "route handler", decorator: "@route", symbolType: "function" })
```
**Pinpoint a specific call site (query + pattern):**
```
sensegrep_search({ query: "database transaction", symbolType: "function", pattern: "BEGIN|COMMIT|ROLLBACK" })
sensegrep_search({ query: "rate limiting", pattern: "429|RateLimitError|retry" })
sensegrep_search({ query: "token refresh flow", isAsync: true, pattern: "refresh_token|refreshToken" })
```
**Python-specific:**
```
sensegrep_search({ query: "data model", variant: "dataclass", language: "python" })
sensegrep_search({ query: "async context manager", variant: "generator", isAsync: true, language: "python" })
```
**Java / Vue-specific:**
```
sensegrep_search({ query: "order service orchestration", language: "java", symbolType: "class" })
sensegrep_search({ query: "checkout page state and composables", language: "vue", include: "frontend-store/**/*.vue" })
```