mixedbread-skills


Agent skills for search, RAG, and document parsing with Mixedbread.

3 skills

mixedbread-parsing


# Mixedbread Parsing

Parse documents, extract structured content, and run OCR using the Parsing API. Supports PDFs, Word documents, PowerPoint presentations, and images.

- Docs: https://www.mixedbread.com/docs/parsing/overview.md
- Agent-readable docs: https://www.mixedbread.com/docs/llms.txt
- Latest docs search: https://www.mixedbread.com/question?q=parsing&section=docs

## Setup

```bash
pip install mixedbread        # Python
npm install @mixedbread/sdk   # TypeScript
```

```bash
export MXBAI_API_KEY=your_api_key
```

## Quick Start

**Python:**

```python
from mixedbread import Mixedbread

mxbai = Mixedbread()

# Upload and parse a document (waits for completion)
job = mxbai.parsing.jobs.upload_and_poll(
    file=open("report.pdf", "rb"),
    return_format="markdown",
)

for chunk in job.result.chunks:
    print(chunk.content)
```

**TypeScript:**

```typescript
import Mixedbread from '@mixedbread/sdk';
import fs from 'fs';

const mxbai = new Mixedbread();

const job = await mxbai.parsing.jobs.uploadAndPoll(
  fs.createReadStream('report.pdf'),
  { return_format: 'markdown' },
);

for (const chunk of job.result.chunks) {
  console.log(chunk.content);
}
```

## Decision Tree

- **Which convenience method?**
  - File on disk → `upload_and_poll()` (uploads, creates the job, and polls)
  - File already uploaded via the Files API → `create_and_poll()` (creates the job and polls)
  - Need to start jobs without blocking → `upload()` or `create()`, then `poll()` separately
- **Which parsing mode?**
  - Born-digital PDF (selectable text) → `fast` mode. Fastest and lowest cost. Extracts text, structure, and layout.
  - Scanned document, image, or complex layout → `high_quality` mode. Uses OCR. Extracts text with confidence scores and handles rotated or skewed pages and multi-column layouts.
- **Need specific elements only?** → Set `element_types` to reduce processing time.

## Supported File Types

- PDF: `.pdf`
- Word: `.doc`, `.docx`, `.dotx`, `.docm`, `.dotm`, `.odt`, `.rtf`
- Slides: `.ppt`, `.pptx`, `.ppsx`, `.pptm`, `.potm`, `.ppsm`, `.odp`
- Images: `.jpeg`, `.png`, `.webp`, `.avif`

Element types: `text`, `title`, `section-header`, `header`, `footer`, `page-number`, `list-item`, `figure`, `picture`, `table`, `form`, `footnote`, `caption`, `formula`.

## Workflows

### Extract Tables from Documents

Filter for table elements to pull structured data from reports.

**Python:**

```python
job = mxbai.parsing.jobs.upload_and_poll(
    file=open("financial-report.pdf", "rb"),
    element_types=["table"],
    return_format="html",
    mode="high_quality",
)

for chunk in job.result.chunks:
    for element in chunk.elements:
        if element.type == "table":
            print(f"Page {element.page}, confidence {element.confidence:.2f}")
            print(element.content)
```

**TypeScript:**

```typescript
const job = await mxbai.parsing.jobs.uploadAndPoll(
  fs.createReadStream('financial-report.pdf'),
  { element_types: ['table'], return_format: 'html', mode: 'high_quality' },
);

for (const chunk of job.result.chunks) {
  for (const element of chunk.elements) {
    if (element.type === 'table') {
      console.log(`Page ${element.page}, confidence ${element.confidence.toFixed(2)}`);
      console.log(element.content);
    }
  }
}
```

### Batch Parse Multiple Files

Start a parsing job for each file without waiting, then poll all the jobs:

**Python:**

```python
import os

jobs = []
for filename in os.listdir("./documents"):
    if filename.endswith(".pdf"):
        # upload() creates the job and returns without waiting for it to finish
        with open(os.path.join("./documents", filename), "rb") as f:
            job = mxbai.parsing.jobs.upload(file=f, return_format="markdown")
        jobs.append(job)

# Poll all jobs
for job in jobs:
    completed = mxbai.parsing.jobs.poll(job_id=job.id)
    print(f"{completed.filename}: {len(completed.result.chunks)} chunks")
```

**TypeScript:**

```typescript
import { readdirSync, createReadStream } from 'fs';
import path from 'path';

const files = readdirSync('./documents').filter((f) => f.endsWith('.pdf'));

// Start all jobs concurrently; upload() does not wait for parsing to finish
const jobs = await Promise.all(
  files.map((f) =>
    mxbai.parsing.jobs.upload(
      createReadStream(path.join('./documents', f)),
      { return_format: 'markdown' },
    ),
  ),
);

// Poll all jobs
for (const job of jobs) {
  const completed = await mxbai.parsing.jobs.poll(job.id);
  console.log(`${completed.filename}: ${completed.result.chunks.length} chunks`);
}
```

## Rules

### CRITICAL

- **Don't double-parse.** Store uploads parse documents automatically. Files uploaded with `parsing_strategy: "high_quality"` also get OCR text (images), summaries (images), and transcriptions (audio and video) extracted. These are available as fields on search result chunks, so running the Parsing API on the same file adds nothing. Use the Parsing API only for standalone document extraction outside of stores.
- **Use `upload_and_poll()` / `create_and_poll()` instead of manual polling loops.** These methods handle backoff automatically. Manual `while` loops with `retrieve()` are fragile and waste API calls.

### HIGH

- **Specify `element_types` when you only need certain elements.** Requesting all types increases processing time and response size. If you only need tables, set `element_types=["table"]`.
- **Use `fast` mode for born-digital PDFs.** `high_quality` mode adds OCR overhead that provides no benefit when the text is already selectable.
- **Check `confidence` scores on OCR output.** Low-confidence elements (< 0.5) may contain errors. Filter or flag them.

### MEDIUM

- **Check `job.error` before retrying failed jobs.** Common causes are an unsupported file type, a corrupt file, or a file that is too large. Blindly retrying wastes quota.
- **Use `content_to_embed` for embedding pipelines.** Each chunk provides both `content` (full text) and `content_to_embed` (optimized for embedding). Use the latter when feeding vector stores outside Mixedbread.
- **Verify the file format before parsing.** Only PDF, Word, PowerPoint, and image files are supported. Convert other formats first.

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Job stuck in `pending` | Queue is busy | Use `poll()` with a longer `poll_timeout_ms`. Check job status with `retrieve()`. |
| Job status `failed` | Unsupported file type, corrupt file, or file too large | Check `job.error` for details. Verify the file format is supported. |
| Empty chunks in result | No extractable text (blank pages, or a scanned document parsed in `fast` mode) | Verify the file has content. Use `high_quality` mode for scanned documents. |
| Low confidence scores | Scanned or low-resolution source | Use `high_quality` mode for better OCR accuracy; if you already do, supply a higher-resolution source. |
| Missing tables or figures | Element types not requested | Set `element_types` to include `table` and `figure` explicitly. |
| `upload_and_poll()` timeout | Very large document or slow processing | Increase `poll_timeout_ms`, or use `upload()` + `poll()` separately for more control. |
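The Rules above recommend flagging OCR elements with confidence below 0.5. A minimal sketch of that post-processing step, assuming elements expose `type`, `page`, `confidence`, and `content` as in the table-extraction workflow. The `Element` dataclass and `split_by_confidence` helper are hypothetical stand-ins so the snippet runs without an API call, and treating a missing confidence (`None`) as "flag for review" is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Element:
    """Hypothetical stand-in for a parsed element (type, page, confidence, content)."""
    type: str
    page: int
    confidence: Optional[float]
    content: str


def split_by_confidence(
    elements: Iterable[Element], threshold: float = 0.5
) -> Tuple[List[Element], List[Element]]:
    """Split elements into (trusted, flagged) by OCR confidence.

    Elements without a confidence score are flagged (an assumption, not API behavior).
    """
    trusted: List[Element] = []
    flagged: List[Element] = []
    for element in elements:
        if element.confidence is not None and element.confidence >= threshold:
            trusted.append(element)
        else:
            flagged.append(element)
    return trusted, flagged


elements = [
    Element("table", 1, 0.93, "<table>...</table>"),
    Element("text", 2, 0.41, "Q3 revnue gr0wth"),
    Element("text", 3, None, "Footer"),
]
trusted, flagged = split_by_confidence(elements)
for element in flagged:
    print(f"Review page {element.page}: confidence={element.confidence}")
```

With a real job, pass `[e for chunk in job.result.chunks for e in chunk.elements]` instead of the sample list; the helper only reads attributes, so the SDK's element objects should work unchanged.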

mixedbread-search


# Mixedbread Search

Create and search managed knowledge bases using the Stores API. Stores are multimodal search indexes that handle text, images, tables, audio, and video across 100+ languages.

- Docs: https://www.mixedbread.com/docs/stores/overview.md
- Agent-readable docs: https://www.mixedbread.com/docs/llms.txt
- Latest docs search: https://www.mixedbread.com/question?q=stores&section=docs

## Setup

```bash
pip install mixedbread        # Python
npm install @mixedbread/sdk   # TypeScript
```

```bash
export MXBAI_API_KEY=your_api_key
```

## Quick Start

**Python:**

```python
import os
from mixedbread import Mixedbread

mxbai = Mixedbread(api_key=os.environ["MXBAI_API_KEY"])

store = mxbai.stores.create(name="my-docs", description="Product documentation")

mxbai.stores.files.upload(
    store_identifier=store.id,
    file=open("guide.pdf", "rb"),
    metadata={"category": "guides", "version": "2.0"},
)

results = mxbai.stores.search(
    query="How does authentication work?",
    store_identifiers=["my-docs"],
    top_k=5,
)

for chunk in results.data:
    print(f"{chunk.score:.3f} | {chunk.filename}: {chunk.text[:100]}")
```

**TypeScript:**

```typescript
import { Mixedbread } from '@mixedbread/sdk';
import fs from 'fs';

const mxbai = new Mixedbread({
  apiKey: process.env.MXBAI_API_KEY!,
});

const store = await mxbai.stores.create({
  name: 'my-docs',
  description: 'Product documentation',
});

await mxbai.stores.files.upload({
  storeIdentifier: store.id,
  file: fs.createReadStream('guide.pdf'),
  body: { metadata: { category: 'guides', version: '2.0' } },
});

const results = await mxbai.stores.search({
  query: 'How does authentication work?',
  store_identifiers: ['my-docs'],
  top_k: 5,
});
```

## Decision Tree

- **What kind of retrieval do you need?**
  - Simple keyword/semantic lookup → Standard `search()` with `top_k`
  - Natural-language answer with citations → `question_answering()` with `cite` enabled
  - Complex multi-hop question → `search()` with `agentic` enabled
  - Combine internal docs with the live web → Add `"mixedbread/web"` to `store_identifiers`
- **Do you need metadata filtering?**
  - Don't know what metadata exists → Call `metadata_facets()` first
  - Know the fields → Build `filters` with `all`/`any`/`none` combinators
- **Do you need higher relevance?**
  - Yes → Set `"rerank": true` in `search_options`, or use `{"rerank": {"model": "mixedbread-ai/mxbai-rerank-large-v2"}}` to choose a model
- **Do you need OCR, summaries, or transcriptions from files?**
  - Yes → Upload files with `config: {"parsing_strategy": "high_quality"}`. Stores extract OCR text, summaries, and transcriptions automatically, so no separate parsing is needed.
  - No / text-only documents → The default `parsing_strategy` (`"fast"`) is sufficient.
- **Is the store temporary (e.g., for a PR review)?**
  - Yes → Set `expires_after` with a day limit at creation

## Workflows

### Build a Searchable Knowledge Base

Create a store, upload documents, and search. In most cases you don't need to wait for files to finish processing, because completed files become searchable as they finish. Only gate on processing when the workflow depends on complete batch coverage, such as benchmarks or recall evaluation.

**Python:**

```python
store = mxbai.stores.create(
    name="product-docs",
    description="Product documentation",
    config={"contextualization": {"with_metadata": ["title", "category"]}},
)

mxbai.stores.files.upload(
    store_identifier=store.id,
    file=open("guide.pdf", "rb"),
    metadata={"title": "Setup Guide", "category": "guides"},
)
mxbai.stores.files.upload(
    store_identifier=store.id,
    file=open("faq.md", "rb"),
    metadata={"title": "FAQ", "category": "support"},
)

results = mxbai.stores.search(
    query="How do I reset my password?",
    store_identifiers=["product-docs"],
    top_k=5,
    search_options={"rerank": True, "return_metadata": True},
)

for chunk in results.data:
    print(f"{chunk.score:.3f} | {chunk.filename}: {chunk.text[:100]}")

# Optional: for deterministic full-batch coverage (benchmarks, migrations),
# re-fetch the store with mxbai.stores.retrieve() and check its file_counts.
```

**TypeScript:**

```typescript
const store = await mxbai.stores.create({
  name: 'product-docs',
  description: 'Product documentation',
  config: { contextualization: { with_metadata: ['title', 'category'] } },
});

await mxbai.stores.files.upload({
  storeIdentifier: store.id,
  file: fs.createReadStream('guide.pdf'),
  body: { metadata: { title: 'Setup Guide', category: 'guides' } },
});
await mxbai.stores.files.upload({
  storeIdentifier: store.id,
  file: fs.createReadStream('faq.md'),
  body: { metadata: { title: 'FAQ', category: 'support' } },
});

const results = await mxbai.stores.search({
  query: 'How do I reset my password?',
  store_identifiers: ['product-docs'],
  top_k: 5,
  search_options: { rerank: true, return_metadata: true },
});

// Optional: for deterministic full-batch coverage (benchmarks, migrations),
// re-fetch the store and check its file_counts.
```

### Filter-Driven Search

Discover the available metadata, then build targeted filters.

**Python:**

```python
facets = mxbai.stores.metadata_facets(store_identifiers=["product-docs"])
for key, values in facets.facets.items():
    print(f"{key}: {values}")

results = mxbai.stores.search(
    query="deployment guide",
    store_identifiers=["product-docs"],
    top_k=10,
    filters={
        "all": [
            {"key": "category", "operator": "eq", "value": "guides"},
            {"key": "status", "operator": "not_eq", "value": "archived"},
        ]
    },
    search_options={"rerank": True, "return_metadata": True},
)
```

**TypeScript:**

```typescript
const facets = await mxbai.stores.metadataFacets({
  store_identifiers: ['product-docs'],
});
for (const [key, values] of Object.entries(facets.facets ?? {})) {
  console.log(`${key}: ${JSON.stringify(values)}`);
}

const results = await mxbai.stores.search({
  query: 'deployment guide',
  store_identifiers: ['product-docs'],
  top_k: 10,
  filters: {
    all: [
      { key: 'category', operator: 'eq', value: 'guides' },
      { key: 'status', operator: 'not_eq', value: 'archived' },
    ],
  },
  search_options: { rerank: true, return_metadata: true },
});
```

Filter operators: `eq`, `not_eq`, `gt`, `gte`, `lt`, `lte`, `in`, `not_in`, `like`, `starts_with`, `not_like`, `regex`. Combine conditions with `all` (AND), `any` (OR), and `none` (NOT: none of the conditions may match).

### Web-Augmented Search

Include `"mixedbread/web"` in `store_identifiers` to combine store search with live web results. It is a reserved store identifier, so no setup is required. To search the web alone, pass `"mixedbread/web"` as the only identifier.

**Python:**

```python
results = mxbai.stores.search(
    query="latest best practices",
    store_identifiers=["my-docs", "mixedbread/web"],
)
```

**TypeScript:**

```typescript
const results = await mxbai.stores.search({
  query: 'latest best practices',
  store_identifiers: ['my-docs', 'mixedbread/web'],
});
```

### Question Answering

Get a generated answer with cited sources. The answer may contain `<cite i="n"/>` tags that reference entries in the sources list.
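Those `<cite i="n"/>` tags can be rewritten into numbered footnotes before display. A minimal sketch, assuming `n` is a zero-based index into `result.sources` (verify this against the QA docs; it may be one-based) and that tags appear exactly in the `<cite i="n"/>` form. `resolve_citations` is a hypothetical helper that works on plain strings, so it runs without an API call.

```python
import re
from typing import List, Tuple

# Matches tags shaped like <cite i="0"/>, tolerating extra whitespace
CITE_TAG = re.compile(r'<cite\s+i="(\d+)"\s*/>')


def resolve_citations(answer: str, source_names: List[str]) -> Tuple[str, List[str]]:
    """Replace cite tags with [k] markers numbered by first appearance.

    Assumes i is a zero-based index into source_names (an assumption). Tags that
    point outside the list are dropped. Returns the text and cited names in order.
    """
    order: List[int] = []  # source indices in the order they are first cited

    def replace(match: re.Match) -> str:
        index = int(match.group(1))
        if not 0 <= index < len(source_names):
            return ""
        if index not in order:
            order.append(index)
        return f"[{order.index(index) + 1}]"

    text = CITE_TAG.sub(replace, answer)
    return text, [source_names[i] for i in order]


text, cited = resolve_citations(
    'Stores are multimodal<cite i="1"/> and support 100+ languages<cite i="0"/>.',
    ["overview.md", "stores.md"],
)
print(text)
for k, name in enumerate(cited, start=1):
    print(f"[{k}] {name}")
```

With a live result, call `resolve_citations(result.answer, [s.filename for s in result.sources])`. If the tags turn out to be one-based, subtract 1 from `index` before the bounds check.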
**Python:**

```python
result = mxbai.stores.question_answering(
    query="What are the rate limits?",
    store_identifiers=["my-docs"],
    top_k=10,
    qa_options={"cite": True},
    search_options={"rerank": True},
)

print(result.answer)
for source in result.sources:
    print(f"  {source.filename} (score: {source.score:.3f})")
```

**TypeScript:**

```typescript
const result = await mxbai.stores.questionAnswering({
  query: 'What are the rate limits?',
  store_identifiers: ['my-docs'],
  top_k: 10,
  qa_options: { cite: true },
  search_options: { rerank: true },
});

console.log(result.answer);
for (const source of result.sources) {
  console.log(`  ${source.filename} (score: ${source.score.toFixed(3)})`);
}
```

### Question Answering with Agentic Fallback

When QA returns no sources, retry with agentic search for deeper retrieval. Always call `question_answering()` again; don't fall back to raw `search()`, which loses the generated answer.

**Python:**

```python
query = "Compare the pricing tiers and their feature differences"

result = mxbai.stores.question_answering(
    query=query,
    store_identifiers=["my-docs"],
    top_k=10,
    qa_options={"cite": True},
    search_options={"rerank": True},
)

# No sources: retry the same QA call with agentic retrieval enabled
if not result.sources:
    result = mxbai.stores.question_answering(
        query=query,
        store_identifiers=["my-docs"],
        top_k=10,
        qa_options={"cite": True},
        search_options={"rerank": True, "agentic": {"max_rounds": 3}},
    )

print(result.answer)
for source in result.sources:
    print(f"  {source.filename} (score: {source.score:.3f})")
```

### Agentic Search

Use agentic search for complex questions that require multi-step retrieval. The system decomposes your query into sub-queries and runs multiple retrieval rounds. It works in both `search()` and `question_answering()`.
**Python:**

```python
results = mxbai.stores.search(
    query="Compare the pricing tiers and their feature differences",
    store_identifiers=["product-docs"],
    search_options={
        "agentic": {
            "max_rounds": 3,
            "queries_per_round": 2,
        }
    },
)
```

**TypeScript:**

```typescript
const results = await mxbai.stores.search({
  query: 'Compare the pricing tiers and their feature differences',
  store_identifiers: ['product-docs'],
  search_options: {
    agentic: {
      max_rounds: 3,
      queries_per_round: 2,
    },
  },
});
```

Set `agentic` to `true` for default settings, or pass an object to control `max_rounds` and `queries_per_round`.

## Response Shapes

**Search results** (returned by `search()`):

```python
response.data        # list of chunks
chunk.text           # str: the matched text
chunk.score          # float: relevance score (0–1)
chunk.filename       # str: source file name
chunk.file_id        # str: source file ID
chunk.store_id       # str: store the chunk belongs to
chunk.metadata       # dict: attached metadata (when return_metadata is enabled)
chunk.type           # str: chunk type (e.g. "text", "image_url")
chunk.image_url      # dict | None: image payload for image chunks
chunk.ocr_text       # str | None: OCR text for image-heavy chunks
chunk.summary        # str | None: auto-generated summary for image chunks (high_quality mode)
chunk.transcription  # str | None: transcription for audio/video chunks (high_quality mode)
```

**QA results** (returned by `question_answering()`):

```python
result.answer        # str: generated answer, may contain <cite i="n"/> tags
result.sources       # list of source objects
source.filename      # str
source.score         # float
source.file_id       # str
source.text          # str: the source chunk text
source.image_url     # dict | None: image payload with url/format for image chunks
```

## Store Management

```python
# List stores
stores = mxbai.stores.list(limit=20)
for store in stores.data:
    print(store.name)

# Inspect ingestion progress
store = mxbai.stores.retrieve(store_identifier="my-docs")
print(store.file_counts)  # e.g. {"completed": 5, "in_progress": 2, "failed": 0}

# List the files in a store
files = mxbai.stores.files.list(store_identifier="my-docs", limit=20)
for store_file in files.data:
    print(store_file.filename, store_file.status)

# Delete the store last, once you no longer need it
mxbai.stores.delete(store_identifier="my-docs")
```

## Rules

### CRITICAL

- **Store names must use lowercase letters, numbers, hyphens, and periods only.** Invalid names cause creation to fail. No spaces, underscores, or uppercase.
- **For field-level contextualization, use the documented `{"with_metadata": [...]}` form.** The other documented modes are `true` (all metadata) and `false` (none). Dot notation is supported for nested fields.

### HIGH

- **Do not block on full ingestion unless completeness matters.** Stores process files asynchronously, and files become searchable as they finish. For most workflows, especially interactive ones, upload and search immediately without polling. Poll file status or `file_counts` only when the workflow depends on complete batch coverage, such as benchmarks, migrations, or sync verification.
- **Use `metadata_facets()` before building filters.** Don't guess metadata keys; discover them. Typos in filter keys silently return no results.
- **Enable `rerank` for production search.** Reranking significantly improves relevance. Skip it only for quick prototypes or latency-critical paths.
- **Use `parsing_strategy: "high_quality"` to enable automatic content extraction.** When set in the per-file config at upload time, high-quality mode extracts OCR text and summaries for images, and transcriptions for audio and video. These fields are directly usable as LLM context. The default `"fast"` strategy indexes content without these extra extractions.
- **Use standard search for simple lookups.** Agentic search adds latency from multiple retrieval rounds. Use it only for complex, multi-hop questions.

### MEDIUM

- **Set `expires_after` for temporary stores.** PR review stores, demo stores, and test stores should auto-expire to avoid accumulating unused indexes.
- **Use one store per knowledge domain, not per query.** Stores are persistent indexes meant to be reused. Create once, search many times.
- **Use chunk scores to filter low-relevance noise.** If you need a minimum relevance cutoff, post-filter on `chunk.score` (for example `>= 0.3`) after retrieval.
- **Start with the default `agentic` settings.** Increase `max_rounds` only if results are insufficient.

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| No results returned | Newly uploaded files are still processing, or the store name/query is wrong | Retry once at least one file has finished processing. For completeness-sensitive runs, verify the expected files are `completed` before evaluating results. |
| No results returned | Score cutoff too high | Lower or remove your post-filter threshold. |
| No results returned | Wrong `store_identifiers` | Verify that the store name or ID matches exactly. |
| Metadata filters return nothing | Wrong key name or value | Use `metadata_facets()` to discover the actual keys and values. |
| Slow agentic search | Too many rounds or queries | Reduce `max_rounds` or `queries_per_round`. Use standard search if the query is simple. |
| API key error | Invalid or missing key | Verify that `MXBAI_API_KEY` is set. Get a key at https://platform.mixedbread.com/platform?next=api-keys |
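The MEDIUM rule on chunk scores suggests a client-side cutoff such as `>= 0.3`, and the troubleshooting table warns that too high a cutoff can empty the results. A minimal sketch that applies the cutoff with an optional floor, so a strict threshold never returns nothing. `Chunk` and `filter_by_score` are hypothetical stand-ins (real chunks come from `results.data`), and the `keep_at_least` fallback is a design choice of this sketch, not an API feature.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a search result chunk (see Response Shapes)."""
    filename: str
    score: float
    text: str


def filter_by_score(
    chunks: Iterable[Chunk], min_score: float = 0.3, keep_at_least: int = 0
) -> List[Chunk]:
    """Keep chunks with score >= min_score, highest score first.

    If fewer than keep_at_least chunks pass, fall back to the top keep_at_least
    chunks regardless of score.
    """
    ranked = sorted(chunks, key=lambda chunk: chunk.score, reverse=True)
    kept = [chunk for chunk in ranked if chunk.score >= min_score]
    if len(kept) < keep_at_least:
        kept = ranked[:keep_at_least]
    return kept


chunks = [
    Chunk("guide.pdf", 0.82, "Reset your password from the account page..."),
    Chunk("faq.md", 0.12, "Billing questions..."),
    Chunk("faq.md", 0.35, "If you forgot your password..."),
]
for chunk in filter_by_score(chunks):
    print(f"{chunk.score:.3f} | {chunk.filename}")
```

Against live results, call `filter_by_score(results.data, min_score=0.3)`; since the helper only reads `.score`, it should accept the SDK's chunk objects (type hints aside). Start with `keep_at_least=1` if an empty result is worse for your UI than a weak match.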

mxbai-cli


# mxbai CLI

The `mxbai` CLI manages stores, uploads files, performs semantic search, and syncs directories with Mixedbread from the terminal.

- Docs: https://www.mixedbread.com/cli.md
- Agent-readable docs: https://www.mixedbread.com/docs/llms.txt
- Latest docs search: https://www.mixedbread.com/question?q=cli&section=cli

## Installation

```bash
npm install -g @mixedbread/cli            # global
npm install --save-dev @mixedbread/cli    # project-local (run with npx mxbai)
```

Requires Node.js >= 20.0. Verify the installation with `mxbai --version`.

## Authentication

The API key is resolved in this priority order:

1. **Flag:** `--api-key mxb_xxxxx` or `--saved-key <name>`
2. **Environment variable:** `export MXBAI_API_KEY=mxb_xxxxx`
3. **Config file:** `mxbai config set api_key mxb_xxxxx`

Get your API key at https://platform.mixedbread.com/platform?next=api-keys

## Quick Start

```bash
# Create a store and upload docs
mxbai store create "my-docs" --description "Product documentation"
mxbai store upload "my-docs" "docs/**/*.md"

# Search
mxbai store search "my-docs" "How does authentication work?"

# Sync changed files (hash-based detection by default)
mxbai store sync "my-docs" "docs/**"
```

## Decision Tree

- **Upload or sync?**
  - One-time or manual upload → `mxbai store upload`
  - Ongoing updates (especially CI/CD) → `mxbai store sync`
- **Which change detection for sync?**
  - In a git repo with a known base commit → `--from-git HEAD~1` (fastest)
  - Outside git, or when you need an exact comparison → hash-based detection (the default; compares content hashes)
- **CLI or SDK?**
  - Shell scripts, CI/CD, one-off tasks → CLI
  - Application code, custom logic, programmatic access → Python/TypeScript SDK

## Workflows

### CI/CD Documentation Sync

Sync documentation to a store on every push using the default hash-based change detection.
**GitHub Actions:**

```yaml
name: Sync Documentation

on:
  push:
    branches: [main]
    paths:
      - 'docs/**'

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # The CLI requires Node.js >= 20
      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install mxbai CLI
        run: npm install -g @mixedbread/cli

      - name: Sync docs to store
        env:
          MXBAI_API_KEY: ${{ secrets.MXBAI_API_KEY }}
        run: |
          mxbai store sync my-docs "docs/**/*.md" \
            --strategy high_quality \
            --yes
```

For faster change detection in git repos, add `--from-git HEAD~1` (requires `fetch-depth: 2` on `actions/checkout`) or `--from-git origin/main` (requires `fetch-depth: 0`).

**Key points:**

- Always pass `--yes`: CI environments are non-interactive, and commands hang without it
- Use `--from-git` for faster change detection in git repos
- Store the API key as a repository secret and expose it through `MXBAI_API_KEY`
- Use `--dry-run` in a separate step to preview changes before applying them

**Preview before syncing:**

```bash
mxbai store sync "my-docs" "docs/**" --dry-run
```

### Multi-Environment Setup

Manage separate API keys for staging and production.

```bash
# Add keys for different environments
mxbai config keys add mxb_xxxxx production
mxbai config keys add mxb_xxxxx staging

# Set production as the default
mxbai config keys set-default production

# Use staging for a specific command
mxbai store search staging-docs "query" --saved-key staging
```

### Upload with Manifest

Use a manifest file for complex uploads with per-file metadata and strategy overrides.
```yaml
# upload-manifest.yaml
version: "1"
defaults:
  strategy: fast
  metadata:
    team: engineering
files:
  - path: docs/getting-started.md
    metadata:
      title: Getting Started Guide
      priority: high
  - path: docs/api-reference.md
    strategy: high_quality
    metadata:
      title: API Reference
  - path: "reports/*.pdf"
    metadata:
      category: reports
```

```bash
mxbai store upload "my-docs" --manifest upload-manifest.yaml
```

### Store Aliases

Create short aliases for frequently used stores:

```bash
mxbai config set aliases.docs "my-documentation-store"
mxbai config set aliases.prod "str_abc123"

# Use aliases in any command
mxbai store search docs "how to deploy"
mxbai store upload prod "files/**/*.md"
```

## Rules

### CRITICAL

- **Always pass `--yes` in CI/CD.** Without it, sync and delete commands hang waiting for an interactive confirmation that never comes, because CI environments don't have a TTY.
- **`--contextualization` on upload/sync is deprecated since v2.2.0.** Configure contextualization at store creation with `mxbai store create --contextualization`. On upload/sync, the flag prints a warning and is ignored.

### HIGH

- **`--parallel` maxes out at 200.** The CLI rejects values above 200. The default is 100.
- **`--force` sync re-uploads everything.** It bypasses change detection entirely. Use it sparingly, typically only for periodic full re-syncs (e.g., a weekly cron job).
- **Store names: lowercase letters, numbers, hyphens, and periods only.** Invalid names cause creation to fail.

### MEDIUM

- **Use `--dry-run` before the first sync.** Preview what would be uploaded, changed, or deleted before committing.
- **Use store aliases for frequently used stores.** Aliases avoid typos and long store names in commands.
- **Use `--unique` on upload to skip duplicates.** It prevents re-uploading files that already exist (based on content hash).

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| "Command not found" | Node.js < 20, or the CLI is not installed globally | Verify Node.js >= 20. Use `npx mxbai` for project-local installs. |
| "No API key" | No key configured | Run `mxbai config keys add <key>` or set the `MXBAI_API_KEY` environment variable. |
| Sync hangs in CI | Missing `--yes` flag | Pass `--yes` for non-interactive mode. |
| Upload timeout for large files | Default multipart settings are insufficient | Tune `--multipart-threshold`, `--multipart-part-size`, and `--multipart-concurrency`. |
| Store not found | Wrong name or alias | Check aliases with `mxbai config get aliases`. Verify that the name uses valid characters. |
| Contextualization warning | Deprecated flag on upload/sync | Set contextualization at store creation instead. |
| Sync detects no changes | Hash-based detection, but only metadata changed | Use `--force` to re-upload, or `--from-git` to detect changes via git. |
| `--from-git` misses files | `fetch-depth` too shallow in CI | Set `fetch-depth: 0` for full history, or at least `fetch-depth: 2` for `HEAD~1`. |
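Both the CLI and the Stores API reject store names that break the lowercase-letters, numbers, hyphens, and periods rule. A minimal sketch for checking names before running `mxbai store create`, for example in a script that derives store names from project titles. `is_valid_store_name` and `to_store_name` are hypothetical helpers; they check only the character rule stated above, not any length limit or other constraint the service may enforce.

```python
import re

# Character rule from the Rules section: lowercase letters, digits, hyphens, periods
VALID_STORE_NAME = re.compile(r"[a-z0-9.-]+")


def is_valid_store_name(name: str) -> bool:
    """Return True if name uses only allowed characters (other limits are not checked)."""
    return VALID_STORE_NAME.fullmatch(name) is not None


def to_store_name(raw: str) -> str:
    """Hypothetical slugifier: lowercase, map spaces/underscores to hyphens, drop the rest."""
    name = raw.strip().lower()
    name = re.sub(r"[\s_]+", "-", name)      # spaces and underscores become hyphens
    name = re.sub(r"[^a-z0-9.-]", "", name)  # drop any other disallowed character
    return name


for raw in ["my-docs", "My Docs", "docs_v2.1"]:
    print(f"{raw!r}: valid={is_valid_store_name(raw)}, suggested={to_store_name(raw)!r}")
```

`to_store_name` can return an empty string for input with no usable characters, so check its result with `is_valid_store_name` before passing it to the CLI.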