Parameters
- `query` (str, optional): Search query text. Mutually exclusive with `query_image`.
- `filters` (Dict[str, Any], optional): Optional metadata filters.
- `k` (int, optional): Number of results. Defaults to 4.
- `min_score` (float, optional): Minimum similarity threshold. Defaults to 0.0.
- `use_colpali` (bool, optional): Whether to use the ColPali-style embedding model to retrieve chunks (only works for documents ingested with `use_colpali=True`). Defaults to True.
- `folder_name` (str | List[str], optional): Optional folder scope. Accepts canonical paths (e.g., `/projects/alpha/specs`) or a list of paths/names.
- `folder_depth` (int, optional): Folder scope depth. `None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep.
- `padding` (int, optional): Number of additional chunks/pages to retrieve before and after matched chunks (ColPali only). Defaults to 0.
- `output_format` (str, optional): Controls how image chunks are returned:
  - `"base64"` (default): Returns base64-encoded image data
  - `"url"`: Returns presigned HTTPS URLs
  - `"text"`: Converts images to markdown text via OCR
- `query_image` (str, optional): Base64-encoded image for reverse image search. Mutually exclusive with `query`. Requires `use_colpali=True`.
Metadata Filters
Filters follow the same JSON syntax across the API. See the Metadata Filtering guide for supported operators and typed comparisons.
Returns
`List[FinalChunkResult]`: List of chunk results
Examples
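A hedged sketch of how the documented parameters might be combined in a call. The `Client` class below is a stand-in stub that only records the call, not the real SDK class; the filter field `department` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Union


class Client:
    """Stub mirroring the documented retrieve_chunks signature and defaults."""

    def retrieve_chunks(
        self,
        query: Optional[str] = None,
        query_image: Optional[str] = None,
        filters: Optional[Dict[str, Any]] = None,
        k: int = 4,
        min_score: float = 0.0,
        use_colpali: bool = True,
        folder_name: Optional[Union[str, List[str]]] = None,
        folder_depth: Optional[int] = None,
        padding: int = 0,
        output_format: str = "base64",
    ) -> Dict[str, Any]:
        # query and query_image are mutually exclusive; exactly one is required
        if (query is None) == (query_image is None):
            raise ValueError("Provide exactly one of query or query_image")
        return {"query": query, "k": k, "output_format": output_format}


db = Client()
result = db.retrieve_chunks(
    query="quarterly revenue figures",
    filters={"department": "finance"},   # metadata filter (hypothetical field)
    k=6,
    min_score=0.3,
    folder_name="/projects/alpha/specs",
    folder_depth=-1,                     # include all descendants of the folder
    output_format="url",
)
```

The same keyword arguments would apply to the async variant, awaited instead of called directly.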
FinalChunkResult Properties
The `FinalChunkResult` objects returned by this method have the following properties:
- `content` (str | PILImage): Chunk content (text or image)
- `score` (float): Relevance score
- `document_id` (str): Parent document ID
- `chunk_number` (int): Chunk sequence number
- `metadata` (Dict[str, Any]): Document metadata
- `content_type` (str): Content type
- `filename` (Optional[str]): Original filename
- `download_url` (Optional[str]): URL to download full document
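A minimal sketch of consuming these results. The dataclass below is a stand-in mirroring the documented properties (the real class lives in the SDK); it shows one way to separate text chunks from image chunks and order them by relevance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FinalChunkResult:
    """Stand-in with the documented fields; content is str or PIL.Image."""
    content: Any
    score: float
    document_id: str
    chunk_number: int
    metadata: Dict[str, Any] = field(default_factory=dict)
    content_type: str = "text/plain"
    filename: Optional[str] = None
    download_url: Optional[str] = None


def summarize(chunks: List[FinalChunkResult]) -> str:
    """Join text chunk contents, highest score first; skip image chunks."""
    texts = [
        c.content
        for c in sorted(chunks, key=lambda c: c.score, reverse=True)
        if isinstance(c.content, str)
    ]
    return "\n\n".join(texts)


chunks = [
    FinalChunkResult("Second paragraph.", 0.71, "doc1", 2),
    FinalChunkResult("First paragraph.", 0.93, "doc1", 1),
]
context = summarize(chunks)  # "First paragraph.\n\nSecond paragraph."
```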
Output Format Options
- `"base64"` (default): Image chunks are returned as base64 data (the SDK attempts to decode these into a `PIL.Image` for `FinalChunkResult.content`).
- `"url"`: Image chunks are returned as presigned HTTPS URLs in `content`. This is convenient for UIs and LLMs that accept remote image URLs (e.g., via `image_url`).
- `"text"`: Image chunks are converted to markdown text via OCR. Use this when you need faster inference or when documents are mostly text-based.
- Text chunks are unaffected by `output_format` and are always returned as strings.
- The `download_url` field may be populated for image chunks. When using `output_format="url"`, it will typically match `content` for those chunks.
When to Use Each Format
| Format | Best For |
|---|---|
| `base64` | Direct image processing, local applications |
| `url` | Web UIs, LLMs with vision capabilities (lighter on network) |
| `text` | Faster inference, text-heavy documents, context length concerns |
**base64 vs url:** Both formats pass images to LLMs for visual understanding and produce similar results. However, `url` is lighter on network transfer since only the URL is sent to your application (the LLM fetches the image directly). This can result in faster response times, especially with multiple images.

**When to use text:** Passing images to LLMs for inference can be slow and consume significant context tokens. Use `output_format="text"` when you need faster inference speeds or when your documents are primarily text-based.

If you're hitting context limits with images, it may be because they aren't being passed correctly to the model. See Generating Completions with Retrieved Chunks for examples of properly passing images (both base64 and URLs) to vision-capable models like GPT-4o. To obtain a download link for the full document, see `get_document_download_url`.
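The distinction above can be sketched as follows: with `output_format="url"` the presigned URL is passed to the model directly, while base64 content is wrapped as a data URL. This builds an OpenAI-style chat message; the helper names and example URL are illustrative, not part of the SDK.

```python
from typing import Dict, List


def image_part(content: str) -> Dict:
    """Wrap chunk content as an image_url message part: pass presigned URLs
    through unchanged, wrap base64 payloads as data URLs."""
    if content.startswith("https://"):
        url = content                              # output_format="url"
    else:
        url = f"data:image/png;base64,{content}"   # output_format="base64"
    return {"type": "image_url", "image_url": {"url": url}}


def build_message(question: str, image_chunks: List[str]) -> Dict:
    """Combine a text question with retrieved image chunks into one user message."""
    parts: List[Dict] = [{"type": "text", "text": question}]
    parts.extend(image_part(c) for c in image_chunks)
    return {"role": "user", "content": parts}


msg = build_message(
    "What does this diagram show?",
    ["https://example.com/presigned/page-3.png"],  # hypothetical presigned URL
)
```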
Reverse Image Search
You can search using an image instead of text by providing `query_image` with a base64-encoded image. This enables finding visually similar content in your documents.
Reverse image search requires documents to be ingested with `use_colpali=True`. You must provide either `query` or `query_image`, but not both.
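A short sketch of preparing a local image for `query_image`. The base64 encoding is the documented requirement; the commented-out client call and filename are illustrative.

```python
import base64


def load_query_image(path: str) -> str:
    """Read an image file and return it base64-encoded, as query_image expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


# Hypothetical usage against a client instance `db`:
# chunks = db.retrieve_chunks(
#     query_image=load_query_image("logo.png"),
#     use_colpali=True,
# )
```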
