Reprocess Source

The Reprocess endpoint lets you re-run the ingestion pipeline on an existing source using a different partition method. Processing runs asynchronously: the API returns a build_id immediately; you then poll Get build status until the job completes.

Endpoint overview

HTTP Method

POST

Endpoint URL

https://sources.graphorlm.com/reprocess

Authentication

This endpoint requires authentication using an API token. Include your API token as a Bearer token in the Authorization header.

Learn how to create and manage API tokens in the API Tokens guide.

Async flow

POST /reprocess with file_id and an optional method. The response returns immediately with a build_id.
Poll Get build status: GET https://sources.graphorlm.com/builds/{build_id} until status is Completed or indicates failure.
Reprocessing does not change the file_id; use the file_id returned in the build status response for subsequent API calls.

Request format

Headers

Header	Value	Required
`Authorization`	`Bearer YOUR_API_TOKEN`	Yes
`Content-Type`	`application/json`	Yes

Request body

Send a JSON body with the following fields:

Field	Type	Description	Required
`file_id`	string	Unique identifier of the source to re-process	Yes
`method`	string	Partitioning strategy. One of: `fast`, `balanced`, `accurate`, `vlm`, `agentic`. Default: `fast`	No

Partition method values (v2)

Use these values for the method field:

Value	Name	Description
`fast`	Fast	Fast processing with heuristic classification. No OCR.
`balanced`	Balanced	OCR-based extraction with structure classification.
`accurate`	Accurate	Fine-tuned model for highest accuracy (Premium).
`vlm`	VLM	Best for manuscripts and handwritten content.
`agentic`	Agentic	Highest accuracy for complex layouts, tables, and diagrams.
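
As a rough sketch of how a client might assemble and check the request body before sending it, the helper below validates `method` against the five values listed above. The name `build_reprocess_payload` and the local validation are illustrative assumptions, not part of the API; the server remains the authority on what it accepts.

```python
# Values transcribed from the partition method table above.
ALLOWED_METHODS = ("fast", "balanced", "accurate", "vlm", "agentic")


def build_reprocess_payload(file_id, method=None):
    """Return the JSON body for POST /reprocess (hypothetical client-side helper)."""
    if not isinstance(file_id, str) or not file_id.strip():
        raise ValueError("file_id must be a non-empty string")

    payload = {"file_id": file_id}

    # `method` is optional; when omitted, the API applies its default.
    if method is not None:
        if method not in ALLOWED_METHODS:
            raise ValueError(
                f"method must be one of {ALLOWED_METHODS}, got {method!r}"
            )
        payload["method"] = method

    return payload
```

Omitting `method` leaves the key out of the body entirely, so the default documented above applies.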

Available processing methods

Fast

Best for: Simple text documents, quick processing

Fast processing with heuristic classification
No OCR processing
Suitable for plain text files and well-structured documents
Recommended for testing and development

Balanced

Best for: Complex documents with varied layouts

OCR-based text extraction
AI-powered document structure classification
Better recognition of tables, figures, and document elements
Enhanced accuracy for complex layouts

Accurate

Best for: Premium accuracy, specialized documents

OCR-based text extraction
Fine-tuned AI model for document classification
Highest accuracy for document structure recognition
Note: Premium feature

VLM

Best for: Text-first parsing, manuscripts, and handwritten documents

Best text-first parsing; no bounding boxes or page layout
Best for manuscript and handwritten documents
Performs page and document annotation
Best-in-class text parsing quality

Agentic

Best for: Complex layouts, multi-page tables, diagrams, and images

Highest parsing setting for complex layouts
Rich annotations for images and complex elements
Agentic processing for enhanced understanding

Method comparison

Method	Speed	Text parsing	Element classification	Bounding boxes	Best use cases	OCR
Fast	High	Good	Good	Yes (limited)	Simple text files, testing	No
Balanced	Medium	Very good	Very good	Yes	Complex layouts, mixed content	Yes
Accurate	Medium	Excellent	Excellent	Yes	Premium accuracy needed	Yes
VLM	High	Excellent	Good	No	Manuscripts, handwritten	Yes
Agentic	Medium	Excellent	Excellent	Yes	Complex layouts, multi-page tables, diagrams	Yes
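
The comparison table can be encoded as data to shortlist methods by requirement. This is a minimal sketch: the dictionary is transcribed from the table above, the helper `candidate_methods` is hypothetical, and the "Yes (limited)" bounding-box support of Fast is treated as available.

```python
# Capabilities transcribed from the method comparison table.
# Assumption: "Yes (limited)" bounding boxes for `fast` counts as supported.
METHOD_CAPABILITIES = {
    "fast": {"speed": "high", "bounding_boxes": True, "ocr": False},
    "balanced": {"speed": "medium", "bounding_boxes": True, "ocr": True},
    "accurate": {"speed": "medium", "bounding_boxes": True, "ocr": True},
    "vlm": {"speed": "high", "bounding_boxes": False, "ocr": True},
    "agentic": {"speed": "medium", "bounding_boxes": True, "ocr": True},
}


def candidate_methods(need_ocr=False, need_bounding_boxes=False):
    """Return the methods that satisfy the requested capabilities, in table order."""
    return [
        name
        for name, caps in METHOD_CAPABILITIES.items()
        if (caps["ocr"] or not need_ocr)
        and (caps["bounding_boxes"] or not need_bounding_boxes)
    ]


# Scanned documents where element positions matter:
print(candidate_methods(need_ocr=True, need_bounding_boxes=True))
```

The shortlist only filters on capabilities; accuracy and plan availability (for example, Accurate being a Premium feature) still need to be weighed separately.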

Request example

{
  "file_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "method": "balanced"
}

With default method (optional field omitted):

{
  "file_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Re-processing runs in the background and can take several minutes depending on document size and the selected method. Use the returned build_id to poll Get build status until completion.

Response format

Success response (200 OK)

The endpoint returns immediately with a build identifier. It does not wait for processing to finish.

{
  "build_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "success": true,
  "error": null
}

Response fields

Field	Type	Description
`build_id`	string	Identifier of the scheduled build; use it to poll Get build status
`success`	boolean	Whether the re-process job was successfully scheduled
`error`	string \| null	Error message if the job was not scheduled successfully

To get the final source metadata (file_id, file_name, status, etc.) and optional parsed elements, call GET /builds/{build_id} (see Upload sources – Get build status).
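
The response fields above suggest checking `success` before trusting `build_id`. The sketch below shows one way to do that on an already-decoded response body; `extract_build_id` is a hypothetical helper, and raising `RuntimeError` is a design choice for illustration rather than documented behavior.

```python
def extract_build_id(body):
    """Return build_id from a decoded /reprocess response, or raise if scheduling failed."""
    # `success` reports whether the job was scheduled; `error` carries the reason if not.
    if not body.get("success", False):
        raise RuntimeError(body.get("error") or "Reprocess job was not scheduled")

    build_id = body.get("build_id")
    if not build_id:
        raise RuntimeError("Response did not include a build_id")
    return build_id


# Using the sample success response shown above:
sample = {
    "build_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
    "success": True,
    "error": None,
}
print(extract_build_id(sample))
```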

Code examples

JavaScript/Node.js

const reprocessSource = async (apiToken, fileId, partitionMethod = "fast") => {
  const response = await fetch("https://sources.graphorlm.com/reprocess", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ file_id: fileId, method: partitionMethod }),
  });

  if (!response.ok) {
    const err = await response.json().catch(() => ({}));
    throw new Error(err.detail || `Reprocess failed: ${response.status}`);
  }

  const { build_id } = await response.json();
  return build_id;
};

// Usage: get build_id, then poll Get build status until success
const buildId = await reprocessSource(
  "grlm_your_api_token_here",
  "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "balanced"
);
console.log("Build ID:", buildId);

Python

import requests

def reprocess_source(api_token, file_id, partition_method="fast"):
    url = "https://sources.graphorlm.com/reprocess"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    payload = {"file_id": file_id, "method": partition_method}
    response = requests.post(url, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["build_id"]

# Usage
build_id = reprocess_source(
    "grlm_your_api_token_here",
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "balanced",
)
print("Build ID:", build_id)

cURL

curl -X POST https://sources.graphorlm.com/reprocess \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"file_id":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","method":"balanced"}'

Reprocess and poll until complete

import time
import requests

def reprocess_and_wait(api_token, file_id, partition_method="balanced", poll_interval=3, max_wait=600):
    # Start reprocess
    r = requests.post(
        "https://sources.graphorlm.com/reprocess",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json={"file_id": file_id, "method": partition_method},
        timeout=60,
    )
    r.raise_for_status()
    build_id = r.json()["build_id"]

    # Poll until complete
    url = f"https://sources.graphorlm.com/builds/{build_id}"
    headers = {"Authorization": f"Bearer {api_token}"}
    start = time.time()
    while time.time() - start < max_wait:
        status_r = requests.get(url, headers=headers, timeout=30)  # avoid hanging on a stalled connection
        status_r.raise_for_status()
        data = status_r.json()
        if data.get("success"):
            return data
        if data.get("status") == "Processing failed" or (data.get("error") and data.get("status") != "not_found"):
            raise RuntimeError(data.get("error") or data.get("message", "Reprocess failed"))
        time.sleep(poll_interval)
    raise TimeoutError("Reprocess did not complete in time")

Error responses

Common error codes

Status code	Description
`404`	Source not found for the given `file_id`
`500`	Processing or unexpected internal error

Error response format

{
  "detail": "Source node not found"
}

{
  "detail": "Failed to process source"
}

Error examples

Source not found (404)

{ "detail": "Source node not found" }

Cause: The given file_id does not exist in your project.
Solution: Verify the file_id (e.g. from List sources or a previous upload/build status).

Processing failed (500)

{ "detail": "Failed to process source" }

Cause: Internal error during re-processing.
Solution: Retry later or try a different method; check file integrity.
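
A client can turn the two documented error cases into actionable hints. The following is a sketch under the assumption that error bodies carry a `detail` string as shown above; `explain_reprocess_error` and the `retryable` flag are hypothetical names introduced for illustration.

```python
def explain_reprocess_error(status_code, body):
    """Map a failed /reprocess response to a message, a hint, and a retry flag."""
    detail = body.get("detail") if isinstance(body, dict) else None

    if status_code == 404:
        # Documented cause: the file_id does not exist in the project.
        hint = "Verify the file_id, for example via List sources."
        retryable = False
    elif status_code == 500:
        # Documented cause: internal error during re-processing.
        hint = "Retry later or try a different method; check file integrity."
        retryable = True
    else:
        # Status codes not listed in this page are handled conservatively.
        hint = "Unexpected status; inspect the response body."
        retryable = False

    return {
        "detail": detail or f"HTTP {status_code}",
        "hint": hint,
        "retryable": retryable,
    }


print(explain_reprocess_error(404, {"detail": "Source node not found"}))
```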

When to reprocess

Poor text extraction

Symptoms: Missing text, garbled characters, incomplete content
Recommended: balanced or accurate for complex layouts; vlm for text-only when bounding boxes are not needed.

Table detection issues

Symptoms: Tables not recognized, merged cells, structure lost
Recommended: balanced, accurate, or agentic for multi-page tables.

Image and figure handling

Symptoms: Missing captions, poor figure recognition
Recommended: balanced, accurate, or agentic for rich image annotations.

Document structure problems

Symptoms: Headers/footers mixed with content, poor section detection
Recommended: balanced, accurate, or agentic for better structure and semantics.

Best practices

Use file_id: Always identify a source by its file_id (from List sources or Get build status); do not rely on the file name.
Poll build status: After calling reprocess, poll Get build status with a reasonable interval (e.g. 2–5 seconds) and timeout.
Choose method by need: Start with fast for testing; use balanced or accurate for better quality; use vlm for manuscripts; use agentic for complex layouts and tables.
Timeout: Allow sufficient time for large documents and heavier methods when polling.
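
The polling and timeout advice can be sketched as a loop with a fixed interval and an overall deadline. The function below is illustrative: `poll_build` and its parameters are hypothetical, `fetch_status` stands for any callable that returns the decoded Get build status response, and the completion and failure checks mirror the ones used in the polling example earlier on this page rather than a documented schema.

```python
import time


def poll_build(fetch_status, poll_interval=3.0, max_wait=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the build finishes, fails, or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        data = fetch_status()

        # Assumption: a truthy "success" marks a completed build.
        if data.get("success"):
            return data
        # Assumption: this status string marks a failed build.
        if data.get("status") == "Processing failed":
            raise RuntimeError(data.get("error") or "Reprocess failed")

        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("Build did not complete in time")
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; in production code the defaults can be left as they are. For large documents or heavier methods, raise `max_wait` rather than shortening `poll_interval`.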

Next steps

After re-processing completes (build status Completed), continue with these related endpoints:

Get build status

Poll status and optionally retrieve parsed elements for a build

List sources

View all sources and their status in your project

Upload sources

Upload new files, URLs, GitHub repos, or YouTube videos (async)

Delete source

Remove a source from your project
