sensegrep-marketplace

0

sensegrep plugin for Cursor

2 skills

sensegrep-cli

Semantic + structural code search via the sensegrep CLI (no MCP server required). Use when exploring codebases, finding functions/classes by behavior, locating duplicates, or searching code by meaning rather than exact text — by running `sensegrep` shell commands. Triggers: code search, find function, explore codebase, detect duplicates, refactoring candidates, understand code structure. ALWAYS prefer sensegrep over grep/ripgrep for code exploration — only use grep for exact string literals. Use sensegrep even when the user doesn't explicitly mention it, as long as they are asking about code behavior, structure, or meaning.

# sensegrep (CLI) — Semantic Code Search Search code by meaning, not text patterns, by running the `sensegrep` command-line tool. Uses AI embeddings + tree-sitter AST parsing. This skill is **CLI-first**: it runs shell commands and reads their output. It does **not** require the sensegrep MCP server. If the sensegrep MCP tools are available in your environment, prefer those; otherwise use the CLI commands below. ## Setup Install the CLI once (global), then index the project before the first search: ```bash npm install -g @sensegrep/cli sensegrep index # builds the semantic index for the current directory ``` Add `--json` to any `search` or `detect-duplicates` command to get machine-readable output that is easy to parse programmatically. ## When to Use - **sensegrep** (95% of searches): Finding functions/classes by behavior, exploring structure, semantic queries, multi-criteria searches - **grep** (5%): ONLY exact string literals — "TODO:", "FIXME:", specific variable names ## Recommended Defaults Start with these defaults and adjust based on what you find: | Goal | `--limit` | `--max-per-file` | Notes | |---|---|---|---| | Focused search (know what you want) | 10 (default) | 1 | Tight, clean tree-shaking output | | General exploration | 20 | 2 | Balanced — visibly better file coverage than limit=10 | | Broad discovery (large codebase) | 20 | 3 | Diminishing returns beyond 3; max-per-file=5 rarely adds value | > When `--pattern` is set, sensegrep internally fetches `limit × 3` candidates before filtering — so the default limit=10 is already enough. Don't inflate `--limit` when using `--pattern`; the pattern does the filtering. > **Tip:** Use `--include "src/**/*.ts"` to focus on source folders, or add `--exclude "*.md"` / `--exclude "docs/**"` when you want to keep markdown, docs, and changelogs out of results. On Windows, prefer forward slashes in globs (`src/**/*.ts`), though backslash-based indexed paths are now normalized automatically. > **Identifier queries:** If the query looks like a symbol or framework API (`defineNuxtRouteMiddleware`, `defineStore`, `OrderServiceImpl`), sensegrep now auto-adds a literal fallback on top of semantic search. You usually do **not** need `--pattern` for these exact identifier lookups anymore. ## Commands ### `sensegrep search` — Primary search ```bash sensegrep search "error handling and retry logic" \ --type function # function | class | method | type | variable | enum | module --language typescript # typescript | javascript | python | java | vue --async # async only --exported true # public API surface --min-complexity 5 # complex logic --pattern "handle|process" # regex post-filter via ripgrep (applied after semantic search) --include "src/**/*.ts" # file glob include filter --exclude "*.md" # file glob exclude filter --decorator "@route" # filter by decorator --parent "UserService" # scope to class/parent --imports express # filter by imported module --has-docs true # require docs --min-score 0.5 # relevance threshold --max-per-file 2 # dedup per file (default: 1) --max-per-symbol 2 # dedup per symbol (default: 1) --limit 10 # max results (default: 20) --json # machine-readable output for programmatic use ``` ### `sensegrep survey` — Reading map for a theme Use this when a linear result list is still too noisy and you want a domain-oriented map of the query. ```bash sensegrep survey "authentication login token" \ --language typescript \ --include "frontend-admin/**/*.ts" \ --limit 4 \ --per-group 2 ``` Returns grouped, tree-shaken reading domains such as `middleware / guards`, `stores / state`, `services / api`, and `types / contracts`. ### `sensegrep cluster` — Break a broad topic into subthemes Use this when the topic is large or fuzzy and you want semantically coherent clusters instead of a flat top-N. ```bash sensegrep cluster "price list commission ncm uf packaging" \ --language java \ --include "backend-api/**/*.java" \ --limit 4 \ --per-cluster 2 \ --cluster-threshold 0.72 ``` Returns cluster headings plus representative tree-shaken snippets, using embeddings + AST metadata + path/import signals. ### `sensegrep detect-duplicates` — Find logical duplicates ```bash sensegrep detect-duplicates \ --cross-file-only # only report duplicates in different files --only-exported # focus on public API surface --show-code # include actual code in output --threshold 0.85 # 0.7 = loose similarity, 0.9 = near-identical only --min-complexity 3 # skip trivial helpers (getters, guards) --ignore-tests # exclude test files ``` `--threshold` guide: use `0.85` (default) for meaningful duplicates; lower to `0.7` for suspicious similarities; raise to `0.92+` for near-identical copies only. ### `sensegrep index` — Index a project Language detection is **automatic** — sensegrep detects TypeScript, JavaScript, Python, Java, and Vue on its own. **Never specify language when indexing.** ```bash sensegrep index # fast, only changed files — use by default (incremental) sensegrep index --full # rebuild from scratch — only if index is corrupted or stale sensegrep status # check index health without reindexing ``` ## How the Search Pipeline Works ``` query + structural filters ↓ vector similarity search (AI embeddings) ↓ pattern (ripgrep post-filter — fetches limit×3 candidates internally to compensate) ↓ dedup + diversify (--max-per-file, --max-per-symbol) ↓ tree-shaking (collapse irrelevant regions, show matched symbol + context) ↓ final output ``` ### 1. Structural filters — narrow the candidate pool (before ranking) Applied at the vector store level, before embedding search: - `--type` — `function | class | method | type | variable | enum | module` - `--exported`, `--async`, `--static`, `--abstract` — boolean shape constraints - `--language` — when the codebase is mixed and you need only one language - `--parent` — narrow to a specific class or parent scope - `--decorator` — filter by decorator name (`@route`, `@dataclass`, etc.) - `--imports` — only files that import a given module - `--has-docs` — require or exclude docstrings - `--min-complexity` / `--max-complexity` — target simple helpers or complex business logic - `--include` — file glob include filter (e.g. `packages/core/**/*.ts`). Prefer forward slashes in patterns, especially on Windows. - `--exclude` — file glob exclude filter (e.g. `*.md`, `docs/**`) ### 2. `--pattern` — ripgrep regex post-filter (after semantic search) Ripgrep runs on result files after semantic ranking. Only chunks where the regex matches are kept. Use `--pattern` when you need to guarantee a specific identifier, call, or token appears. Keep `--limit` at the default — the pipeline already fetches `limit × 3` candidates internally before filtering, so raising limit adds little when pattern is set. ```bash # Find auth functions that specifically call jwt.verify sensegrep search "token validation" --type function --pattern "jwt\.verify" # Find rate limit handling that actually uses 429 sensegrep search "HTTP error response handling" --pattern "429|RateLimitError|Retry-After" # Find cache code that does deletion or eviction sensegrep search "cache invalidation" --type function --pattern "delete|evict|expire|clear" ``` ### 3. Tree-shaking — how to get cleaner output Tree-shaking collapses regions not relevant to your query. The more focused the search, the better the collapse: - **Add `--type`** — results align to symbol boundaries; everything around them gets collapsed - **Use `--include` or `--exclude`** — removes noise files entirely - **Raise `--min-score`** — eliminates low-confidence results; surrounding code gets collapsed more aggressively - **Lower `--max-per-file`** — prevents overlapping results from blocking contiguous collapse - **Combine query + `--pattern`** — anchors result to a specific call site ```bash # Broad — large uncollapsed blocks sensegrep search "caching" # Focused — tree-shaking collapses everything except the relevant function sensegrep search "cache invalidation logic" \ --type function \ --exclude "*.md" \ --pattern "delete|evict|expire" \ --max-per-file 1 \ --min-score 0.4 ``` ## Common Workflows **Codebase onboarding:** ```bash sensegrep survey "request lifecycle and middleware" --limit 4 --per-group 2 sensegrep search "request lifecycle and middleware" --limit 20 --max-per-file 2 sensegrep search "authentication and authorization" --type function --exported true --limit 20 ``` **Break a broad domain into subthemes:** ```bash sensegrep cluster "checkout payment order cart" --limit 4 --per-cluster 2 sensegrep cluster "price list commission ncm uf packaging" --language java --include "backend-api/**/*.java" ``` **Find refactoring candidates:** ```bash sensegrep search "complex business logic" --type function --min-complexity 10 --has-docs false sensegrep detect-duplicates --cross-file-only --only-exported --show-code --threshold 0.85 ``` **Audit async error paths:** ```bash sensegrep search "error handling" --type function --async --min-complexity 4 ``` **Scope to a class or module:** ```bash sensegrep search "validation logic" --parent UserService sensegrep search "route handler" --decorator "@route" --type function ``` **Pinpoint a specific call site (query + pattern):** ```bash sensegrep search "database transaction" --type function --pattern "BEGIN|COMMIT|ROLLBACK" sensegrep search "rate limiting" --pattern "429|RateLimitError|retry" sensegrep search "token refresh flow" --async --pattern "refresh_token|refreshToken" ``` **Python-specific:** ```bash sensegrep search "data model" --variant dataclass --language python sensegrep search "async context manager" --variant generator --async --language python ``` **Java / Vue-specific:** ```bash sensegrep search "order service orchestration" --language java --type class sensegrep search "checkout page state and composables" --language vue --include "frontend-store/**/*.vue" ```

sensegrep

Semantic + structural code search via MCP. Use when exploring codebases, finding functions/classes by behavior, locating duplicates, or searching code by meaning rather than exact text. Triggers: code search, find function, explore codebase, detect duplicates, refactoring candidates, understand code structure. ALWAYS prefer sensegrep over grep/ripgrep for code exploration — only use grep for exact string literals. Use sensegrep even when the user doesn't explicitly mention it, as long as they are asking about code behavior, structure, or meaning.

# sensegrep — Semantic Code Search Search code by meaning, not text patterns. Uses AI embeddings + tree-sitter AST parsing. ## When to Use - **sensegrep** (95% of searches): Finding functions/classes by behavior, exploring structure, semantic queries, multi-criteria searches - **grep** (5%): ONLY exact string literals — "TODO:", "FIXME:", specific variable names ## Recommended Defaults Start with these defaults and adjust based on what you find: | Goal | `limit` | `maxPerFile` | Notes | |---|---|---|---| | Focused search (know what you want) | 10 (default) | 1 | Tight, clean tree-shaking output | | General exploration | 20 | 2 | Balanced — visibly better file coverage than limit=10 | | Broad discovery (large codebase) | 20 | 3 | Diminishing returns beyond 3; maxPerFile=5 rarely adds value | > When `pattern` is set, sensegrep internally fetches `limit × 3` candidates before filtering — so the default limit=10 is already enough. Don't inflate `limit` when using `pattern`; the pattern does the filtering. > **Tip:** Use `include: "src/**/*.ts"` to focus on source folders, or add `exclude: "*.md"` / `exclude: "docs/**"` when you want to keep markdown, docs, and changelogs out of results. On Windows, prefer forward slashes in globs (`src/**/*.ts`), though backslash-based indexed paths are now normalized automatically. > **Identifier queries:** If the query looks like a symbol or framework API (`defineNuxtRouteMiddleware`, `defineStore`, `OrderServiceImpl`), sensegrep now auto-adds a literal fallback on top of semantic search. You usually do **not** need `pattern` for these exact identifier lookups anymore. ## Tools Available ### `sensegrep_search` — Primary search ``` sensegrep_search({ query: "error handling and retry logic", // natural language or code snippet symbolType: "function", // function | class | method | type | variable | enum | module language: "typescript", // typescript | javascript | python | java | vue isAsync: true, // async only isExported: true, // public API surface minComplexity: 5, // complex logic pattern: "handle|process", // regex post-filter via ripgrep (applied after semantic search) include: "src/**/*.ts", // file glob include filter exclude: "*.md", // file glob exclude filter decorator: "@route", // filter by decorator parentScope: "UserService", // scope to class/parent imports: "express", // filter by imported module hasDocumentation: true, // require docs minScore: 0.5, // relevance threshold maxPerFile: 2, // dedup per file (default: 2) maxPerSymbol: 2, // dedup per symbol (default: 2) limit: 10 // max results (default: 10) }) ``` ### `sensegrep_survey` — Reading map for a theme Use this when you want grouped reading domains instead of a flat result list. ``` sensegrep_survey({ query: "authentication login token", language: "typescript", include: "frontend-admin/**/*.ts", limit: 4, perGroup: 2 }) ``` Returns grouped, tree-shaken domains such as `middleware / guards`, `stores / state`, and `services / api`. ### `sensegrep_cluster` — Break a broad topic into subthemes Use this when a theme is broad and you want coherent clusters instead of just top-N hits. ``` sensegrep_cluster({ query: "price list commission ncm uf packaging", language: "java", include: "backend-api/**/*.java", limit: 4, perCluster: 2, clusterThreshold: 0.72 }) ``` Returns cluster headings plus representative tree-shaken snippets using embeddings + AST metadata + path/import signals. ### `sensegrep_detect_duplicates` — Find logical duplicates ``` sensegrep_detect_duplicates({ crossFileOnly: true, // only report duplicates in different files onlyExported: true, // focus on public API surface showCode: true, // include actual code in output threshold: 0.85, // 0.7 = loose similarity, 0.9 = near-identical only minComplexity: 3, // skip trivial helpers (getters, guards) ignoreTests: true // exclude test files }) ``` `threshold` guide: use `0.85` (default) for meaningful duplicates; lower to `0.7` for suspicious similarities; raise to `0.92+` for near-identical copies only. ### `sensegrep_index` — Index a project Language detection is **automatic** — sensegrep detects TypeScript, JavaScript, Python, Java, and Vue on its own. **Never specify language when indexing.** ``` sensegrep_index({ action: "index", mode: "incremental" }) // fast, only changed files — use by default sensegrep_index({ action: "index", mode: "full" }) // rebuild from scratch — only if index is corrupted or stale sensegrep_index({ action: "stats" }) // check index health without reindexing ``` ## How the Search Pipeline Works ``` query + structural filters ↓ vector similarity search (AI embeddings) ↓ pattern (ripgrep post-filter — fetches limit×3 candidates internally to compensate) ↓ dedup + diversify (maxPerFile, maxPerSymbol) ↓ tree-shaking (collapse irrelevant regions, show matched symbol + context) ↓ final output ``` ### 1. Structural filters — narrow the candidate pool (before ranking) Applied at the vector store level, before embedding search: - `symbolType` — `function | class | method | type | variable | enum | module` - `isExported`, `isAsync`, `isStatic`, `isAbstract` — boolean shape constraints - `language` — when the codebase is mixed and you need only one language - `parentScope` — narrow to a specific class or parent scope - `decorator` — filter by decorator name (`@route`, `@dataclass`, etc.) - `imports` — only files that import a given module - `hasDocumentation` — require or exclude docstrings - `minComplexity` / `maxComplexity` — target simple helpers or complex business logic - `include` — file glob include filter (e.g. `packages/core/**/*.ts`). Prefer forward slashes in patterns, especially on Windows. - `exclude` — file glob exclude filter (e.g. `*.md`, `docs/**`) ### 2. `pattern` — ripgrep regex post-filter (after semantic search) Ripgrep runs on result files after semantic ranking. Only chunks where the regex matches are kept. Use `pattern` when you need to guarantee a specific identifier, call, or token appears. Keep `limit` at the default (10) — the pipeline already fetches `limit × 3` candidates internally before filtering, so raising limit adds little when pattern is set. ``` // Find auth functions that specifically call jwt.verify sensegrep_search({ query: "token validation", symbolType: "function", pattern: "jwt\\.verify" }) // Find rate limit handling that actually uses 429 sensegrep_search({ query: "HTTP error response handling", pattern: "429|RateLimitError|Retry-After" }) // Find cache code that does deletion or eviction sensegrep_search({ query: "cache invalidation", symbolType: "function", pattern: "delete|evict|expire|clear" }) ``` ### 3. Tree-shaking — how to get cleaner output Tree-shaking collapses regions not relevant to your query. The more focused the search, the better the collapse: - **Add `symbolType`** — results align to symbol boundaries; everything around them gets collapsed - **Use `include` or `exclude`** — removes noise files entirely - **Raise `minScore`** — eliminates low-confidence results; surrounding code gets collapsed more aggressively - **Lower `maxPerFile`** — prevents overlapping results from blocking contiguous collapse - **Combine `query` + `pattern`** — anchors result to a specific call site ``` // Broad — large uncollapsed blocks sensegrep_search({ query: "caching" }) // Focused — tree-shaking collapses everything except the relevant function sensegrep_search({ query: "cache invalidation logic", symbolType: "function", exclude: "*.md", pattern: "delete|evict|expire", maxPerFile: 1, minScore: 0.4 }) ``` ## Common Workflows **Codebase onboarding:** ``` sensegrep_search({ query: "request lifecycle and middleware", limit: 20, maxPerFile: 2 }) sensegrep_search({ query: "authentication and authorization", symbolType: "function", isExported: true, limit: 20 }) ``` **Find refactoring candidates:** ``` sensegrep_search({ query: "complex business logic", symbolType: "function", minComplexity: 10, hasDocumentation: false }) sensegrep_detect_duplicates({ crossFileOnly: true, onlyExported: true, showCode: true, threshold: 0.85 }) ``` **Audit async error paths:** ``` sensegrep_search({ query: "error handling", symbolType: "function", isAsync: true, minComplexity: 4 }) ``` **Scope to a class or module:** ``` sensegrep_search({ query: "validation logic", parentScope: "UserService" }) sensegrep_search({ query: "route handler", decorator: "@route", symbolType: "function" }) ``` **Pinpoint a specific call site (query + pattern):** ``` sensegrep_search({ query: "database transaction", symbolType: "function", pattern: "BEGIN|COMMIT|ROLLBACK" }) sensegrep_search({ query: "rate limiting", pattern: "429|RateLimitError|retry" }) sensegrep_search({ query: "token refresh flow", isAsync: true, pattern: "refresh_token|refreshToken" }) ``` **Python-specific:** ``` sensegrep_search({ query: "data model", variant: "dataclass", language: "python" }) sensegrep_search({ query: "async context manager", variant: "generator", isAsync: true, language: "python" }) ``` **Java / Vue-specific:** ``` sensegrep_search({ query: "order service orchestration", language: "java", symbolType: "class" }) sensegrep_search({ query: "checkout page state and composables", language: "vue", include: "frontend-store/**/*.vue" }) ```