The Process Source endpoint allows you to reprocess previously uploaded documents using different parsing and classification methods. This enables you to optimize document processing for better text extraction, structure recognition, and retrieval performance without re-uploading the file.

Endpoint Overview

HTTP Method

POST https://sources.graphorlm.com/process

Authentication

This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.

Request Format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | application/json | ✅ Yes |

Request Body

The request must be sent as JSON with the following fields:
| Field | Type | Description | Required |
| --- | --- | --- | --- |
| file_name | string | Name of the previously uploaded file to reprocess | ✅ Yes |
| partition_method | string | Processing method to use (see available methods below) | ✅ Yes |
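The headers and body described above can be assembled in one place before sending. The following is a minimal sketch, not an official client: `build_process_request` is a hypothetical helper name, and the URL is the one used in the code examples in this guide.

```python
from typing import Dict, Tuple

# URL as it appears in the code examples of this guide.
PROCESS_URL = "https://sources.graphorlm.com/process"


def build_process_request(
    api_token: str, file_name: str, partition_method: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, json_body) for a Process Source call.

    Hypothetical helper: it only assembles the documented headers and the
    two required body fields, and rejects empty values early.
    """
    if not api_token:
        raise ValueError("api_token is required")
    if not file_name:
        raise ValueError("file_name is required")
    if not partition_method:
        raise ValueError("partition_method is required")

    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = {"file_name": file_name, "partition_method": partition_method}
    return headers, body


headers, body = build_process_request(
    "grlm_your_api_token_here", "document.pdf", "hi_res"
)
print(PROCESS_URL)
print(body)
```

The returned pair can be passed to any HTTP client, for example as the `headers` and `json` arguments of `requests.post`.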

Available Processing Methods

Basic
Best for: Simple text documents, quick processing
  • Fast processing with heuristic classification
  • No OCR processing
  • Suitable for plain text files and well-structured documents
  • Recommended for testing and development

OCR
Best for: Scanned documents, images with text
  • Utilizes OCR for text extraction and parsing
  • Heuristic-based document element classification
  • Ideal for scanned PDFs and image files
  • Balances processing speed and accuracy

Hi-Res
Best for: Complex documents with varied layouts
  • OCR-based text extraction
  • AI-powered document structure classification using Hi-Res model
  • Better recognition of tables, figures, and document elements
  • Enhanced accuracy for complex layouts

Hi-Res FT
Best for: Premium accuracy, specialized documents
  • OCR-based text extraction
  • Fine-tuned AI model for document classification
  • Highest accuracy for document structure recognition
  • Optimized for specialized and complex document types
  • Note: Premium feature

Graphor
Best for: Custom processing workflows
  • Specialized processing method
  • Custom document analysis pipeline
  • Advanced document understanding capabilities

MAI
Best for: Fast text-focused processing without layout metadata
  • Model-assisted partitioning focused on textual content
  • Does not output bounding boxes or page layout (no bbox)
  • Lightweight and faster when you only need clean text and element types
  • Performs page annotation (page-level labels and context)
  • Performs document annotation (document-level labels and summaries)
  • Performs image annotation when images are present in the document
  • Best-in-class text parsing quality; element classification is limited

partition_method values

Use these values for the partition_method field when calling the endpoint:
| Method | partition_method |
| --- | --- |
| Basic | basic |
| OCR | ocr |
| Hi-Res | hi_res |
| Hi-Res FT | hi_res_ft |
| GraphorLM | graphorlm |
| MAI | mai |
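Because `partition_method` only accepts the six values listed above, it can help to validate the value on the client before making a call that may take minutes. This is a sketch under assumptions: the `PartitionMethod` enum and `parse_partition_method` are hypothetical names, and trimming and lowercasing the input is a client-side convenience, not documented API behavior.

```python
from enum import Enum


class PartitionMethod(str, Enum):
    """The partition_method values listed in this guide."""

    BASIC = "basic"
    OCR = "ocr"
    HI_RES = "hi_res"
    HI_RES_FT = "hi_res_ft"
    GRAPHORLM = "graphorlm"
    MAI = "mai"


def parse_partition_method(value: str) -> PartitionMethod:
    """Convert user input to a PartitionMethod or raise ValueError.

    Assumption: whitespace and letter case are normalized locally so the
    API always receives one of the exact lowercase values.
    """
    normalized = value.strip().lower()
    try:
        return PartitionMethod(normalized)
    except ValueError:
        allowed = ", ".join(m.value for m in PartitionMethod)
        raise ValueError(
            f"Unknown partition_method {value!r}; use one of: {allowed}"
        ) from None


method = parse_partition_method("hi_res")
print(method.value)
```

Send `method.value` in the request body so the payload contains the plain string.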

Processing Method Selection Guide

Method Comparison

MethodSpeedText ParsingElement ClassificationBounding BoxesBest Use CasesOCR
Basic⚡⚡⚡⭐⭐⭐⭐✅ (limited)Simple text files, testing
OCR⚡⚡⭐⭐⭐⭐⭐✅ (images)Scanned documents, images
Hi-Res⭐⭐⭐⭐⭐⭐⭐⭐Complex layouts, mixed content
Hi-Res FT⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐Premium accuracy needed
Graphor⭐⭐⭐⭐⭐⭐⭐⭐Custom workflows
MAI⚡⚡⚡⭐⭐⭐⭐⭐⭐⭐⭐Text precision without layout metadata

Request Example

{
  "file_name": "document.pdf",
  "partition_method": "hi_res"
}
Processing can take several minutes depending on document size, complexity, and the selected processing method. Advanced methods like Hi-Res, Hi-Res FT, Graphor and MAI typically require more time for analysis.

Response Format

Success Response (200 OK)

{
  "status": "success",
  "message": "Source processed successfully",
  "file_name": "document.pdf",
  "file_size": 2048576,
  "file_type": "pdf",
  "file_source": "local file",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "hi_res"
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | Processing result (typically “success”) |
| message | string | Human-readable success message |
| file_name | string | Name of the processed file |
| file_size | integer | Size of the file in bytes |
| file_type | string | File extension/type |
| file_source | string | Source type of the original file |
| project_id | string | UUID of the project containing the file |
| project_name | string | Name of the project |
| partition_method | string | Processing method that was applied |
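If you prefer typed access over raw dictionaries, the response fields above map naturally onto a small data class. The sketch below is illustrative: `ProcessResult` and `from_json` are hypothetical names, and the sample payload is the success response shown earlier on this page.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class ProcessResult:
    """Typed view of the documented success response."""

    status: str
    message: str
    file_name: str
    file_size: int
    file_type: str
    file_source: str
    project_id: str
    project_name: str
    partition_method: str

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "ProcessResult":
        """Build a ProcessResult, failing loudly if a documented field is absent.

        Unknown extra keys are ignored so newer response fields do not break parsing.
        """
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in data]
        if missing:
            raise ValueError(f"Response is missing fields: {', '.join(missing)}")
        return cls(**{name: data[name] for name in names})


sample = {
    "status": "success",
    "message": "Source processed successfully",
    "file_name": "document.pdf",
    "file_size": 2048576,
    "file_type": "pdf",
    "file_source": "local file",
    "project_id": "550e8400-e29b-41d4-a716-446655440000",
    "project_name": "My Project",
    "partition_method": "hi_res",
}
result = ProcessResult.from_json(sample)
print(result.file_name, result.partition_method)
```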

Code Examples

JavaScript/Node.js

const processDocument = async (apiToken, fileName, partitionMethod) => {
  const response = await fetch('https://sources.graphorlm.com/process', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      file_name: fileName,
      partition_method: partitionMethod
    })
  });

  if (response.ok) {
    const result = await response.json();
    console.log('Processing successful:', result);
    return result;
  } else {
    const error = await response.text();
    throw new Error(`Processing failed: ${response.status} ${error}`);
  }
};

// Usage
processDocument('grlm_your_api_token_here', 'document.pdf', 'hi_res')
  .then(result => console.log('Document processed:', result.file_name))
  .catch(error => console.error('Error:', error));

Python

import requests

def process_document(api_token, file_name, partition_method):
    """Reprocess a previously uploaded file with the given partition method."""
    url = "https://sources.graphorlm.com/process"

    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }

    payload = {
        "file_name": file_name,
        "partition_method": partition_method
    }

    # Increased timeout for processing complex documents
    response = requests.post(
        url,
        headers=headers,
        json=payload,
        timeout=300  # 5 minutes
    )

    # Raise requests.exceptions.HTTPError for any 4xx/5xx response,
    # so the function never silently returns None
    response.raise_for_status()

    result = response.json()
    print(f"Processing successful: {result['file_name']}")
    return result

# Usage
try:
    result = process_document(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "hi_res"
    )
    print(f"Document processed with method: {result['partition_method']}")
except requests.exceptions.RequestException as e:
    print(f"Error processing document: {e}")

cURL

curl -X POST https://sources.graphorlm.com/process \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "document.pdf",
    "partition_method": "hi_res"
  }'

PHP

<?php
function processDocument($apiToken, $fileName, $partitionMethod) {
    $url = "https://sources.graphorlm.com/process";
    
    $headers = [
        "Authorization: Bearer " . $apiToken,
        "Content-Type: application/json"
    ];
    
    $payload = json_encode([
        'file_name' => $fileName,
        'partition_method' => $partitionMethod
    ]);
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300); // 5 minutes
    
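    $response = curl_exec($ch);
    if ($response === false) {
        // Transport-level failure (DNS, connection, timeout): no HTTP status is available
        $error = curl_error($ch);
        curl_close($ch);
        throw new Exception("Request failed: " . $error);
    }
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    
    if ($httpCode === 200) {
        return json_decode($response, true);
    } else {
        // Include the response body so the API's "detail" message is not lost
        throw new Exception("Processing failed with HTTP code: " . $httpCode . " " . $response);
    }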
}

// Usage
try {
    $result = processDocument(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "hi_res"
    );
    echo "Document processed: " . $result['file_name'] . "\n";
    echo "Method used: " . $result['partition_method'] . "\n";
} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid request format or missing required fields |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Access denied to the specified project |
| 404 | Not Found | File not found in the project |
| 500 | Internal Server Error | Processing failure or server error |

Error Response Format

{
  "detail": "Source node not found"
}

Error Examples

{
  "detail": "Source node not found"
}
Cause: The specified file name doesn’t exist in your project
Solution: Verify the file name and ensure it was previously uploaded
{
  "detail": "Invalid authentication credentials"
}
Cause: API token is invalid, expired, or malformed
Solution: Check your API token and ensure it hasn’t been revoked
{
  "detail": "Failed to process file document.pdf"
}
Cause: Internal processing error with the specified method
Solution: Try a different processing method or check file integrity
{
  "detail": "Invalid partition method specified"
}
Cause: Unsupported or invalid partition method
Solution: Use one of: basic, ocr, hi_res, hi_res_ft, graphorlm, mai
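The error examples above share one shape: an HTTP status code plus a JSON body with a `detail` string. A client can turn that into a single exception type that carries a suggested next step. This is a sketch, not part of any official SDK: `ProcessSourceError`, `ERROR_HINTS`, and `error_from_response` are hypothetical names, and the hints paraphrase the causes and solutions listed on this page.

```python
import json

# Hints paraphrased from the error table and examples in this guide.
ERROR_HINTS = {
    400: "Check the request body: file_name and partition_method are both required.",
    401: "Check your API token and make sure it has not been revoked.",
    403: "The token does not have access to the specified project.",
    404: "Verify the exact file name and confirm the file was previously uploaded.",
    500: "Try a different processing method or check file integrity.",
}


class ProcessSourceError(Exception):
    """Hypothetical exception carrying the status code, API detail, and a hint."""

    def __init__(self, status_code: int, detail: str, hint: str) -> None:
        super().__init__(f"{status_code}: {detail} ({hint})")
        self.status_code = status_code
        self.detail = detail
        self.hint = hint


def error_from_response(status_code: int, body_text: str) -> ProcessSourceError:
    """Build a ProcessSourceError from a non-2xx response.

    Falls back to the raw body when it is not JSON or has no "detail" key.
    """
    detail = body_text
    try:
        parsed = json.loads(body_text)
        if isinstance(parsed, dict) and "detail" in parsed:
            detail = str(parsed["detail"])
    except ValueError:
        pass  # Not JSON: keep the raw text as the detail
    hint = ERROR_HINTS.get(status_code, "Unexpected status code; inspect the response body.")
    return ProcessSourceError(status_code, detail, hint)


err = error_from_response(404, '{"detail": "Source node not found"}')
print(err)
```

With `requests`, this could be called as `error_from_response(response.status_code, response.text)` whenever `response.ok` is false.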

When to Reprocess

Symptoms: Missing text, garbled characters, incomplete content
Recommended methods:
  • OCR for scanned documents
  • Hi-Res or Hi-Res FT for complex layouts
  • MAI for text-only documents when bounding boxes are not required
Symptoms: Tables not properly recognized, merged cells, structure lost
Recommended methods:
  • Hi-Res for better table detection
  • Hi-Res FT for complex table structures
Symptoms: Missing captions, poor figure recognition
Recommended methods:
  • Hi-Res for figure detection
  • Hi-Res FT for comprehensive image analysis
Symptoms: Headers/footers mixed with content, poor section detection
Recommended methods:
  • Hi-Res for structure recognition
  • Hi-Res FT for complex document hierarchies
  • Graphor for enhanced semantic structure and relationships
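The symptom-to-method recommendations above can be encoded as a lookup table, which makes it easy to decide what to try next for a given file. This is a sketch: the symptom keys (`text_extraction`, `tables`, `figures`, `structure`) are hypothetical labels chosen for this example, while the method values come from the lists above.

```python
from typing import Dict, List, Optional

# Keys are hypothetical labels; values follow the recommendations in this guide.
RECOMMENDED_METHODS: Dict[str, List[str]] = {
    "text_extraction": ["ocr", "hi_res", "hi_res_ft", "mai"],
    "tables": ["hi_res", "hi_res_ft"],
    "figures": ["hi_res", "hi_res_ft"],
    "structure": ["hi_res", "hi_res_ft", "graphorlm"],
}


def methods_to_try(symptom: str, current_method: Optional[str] = None) -> List[str]:
    """Return the recommended methods for a symptom, in the order listed.

    The method the file was last processed with is skipped, since rerunning
    it would not be expected to change the result.
    """
    if symptom not in RECOMMENDED_METHODS:
        known = ", ".join(RECOMMENDED_METHODS)
        raise ValueError(f"Unknown symptom {symptom!r}; expected one of: {known}")
    return [m for m in RECOMMENDED_METHODS[symptom] if m != current_method]


print(methods_to_try("tables", current_method="hi_res"))
```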

Best Practices

Processing Strategy

  • Start with Basic: For testing and simple documents
  • Upgrade gradually: Move to OCR → Hi-Res → Hi-Res FT → Graphor → MAI based on needs
  • Monitor results: Use document preview to evaluate processing quality
  • Consider efficiency vs. quality: Advanced methods take longer but provide better results

Performance Optimization

  • Batch processing: Process multiple files sequentially rather than simultaneously
  • Method selection: Choose the appropriate method for your document types
  • Timeout handling: Allow sufficient time for complex processing methods
  • Error recovery: Implement retry logic for transient failures
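One way to implement the retry recommendation is a small wrapper with exponential backoff around whatever function performs the request. This is a sketch under assumptions: `call_with_retries` is a hypothetical helper, the attempt count and delays are illustrative values rather than documented limits, and which exceptions count as transient is left to the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(), retrying on the given exception types.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    Exceptions not listed in retry_on propagate immediately, and the last
    retryable exception is re-raised once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demonstration with a stand-in operation that fails twice, then succeeds.
attempts = []


def flaky_operation() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated transient failure")
    return "processed"


print(call_with_retries(flaky_operation, base_delay=0.01))
```

When the request is made with `requests`, pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, because those classes do not derive from the built-in `ConnectionError` and `TimeoutError`. Client errors such as 401 or 404 are better left out of `retry_on`, since repeating the same request will not fix them.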

Quality Assessment

After processing, evaluate the results by:
  • Checking text extraction completeness
  • Verifying table and figure recognition
  • Reviewing document structure classification
  • Testing retrieval quality in your RAG pipeline

Integration Examples

Automatic Quality Improvement

def improve_processing_quality(api_token, file_name):
    """Automatically upgrade processing method for better quality."""
    methods = ['basic', 'ocr', 'hi_res', 'hi_res_ft', 'graphorlm', 'mai']
    
    for method in methods:
        try:
            print(f"Trying {method} method...")
            result = process_document(api_token, file_name, method)
            
            # Add your quality assessment logic here
            if assess_quality(result):
                print(f"Success with {method} method")
                return result
                
        except Exception as e:
            print(f"Failed with {method}: {e}")
            continue
    
    raise Exception("All processing methods failed")

def assess_quality(result):
    """Add your quality assessment logic here."""
    # Example: check if processing was successful
    return result.get('status') == 'success'

Batch Reprocessing

const batchReprocess = async (apiToken, files, method) => {
  const results = [];
  const failed = [];
  
  for (const fileName of files) {
    try {
      console.log(`Processing ${fileName} with ${method}...`);
      const result = await processDocument(apiToken, fileName, method);
      results.push(result);
      
      // Wait between requests to avoid rate limiting
      await new Promise(resolve => setTimeout(resolve, 1000));
      
    } catch (error) {
      console.error(`Failed to process ${fileName}:`, error);
      failed.push({ fileName, error: error.message });
    }
  }
  
  console.log(`Processed: ${results.length}, Failed: ${failed.length}`);
  return { successful: results, failed };
};

// Usage
const files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];
batchReprocess('grlm_your_token', files, 'hi_res')
  .then(results => console.log('Batch processing complete:', results));

Processing with Progress Tracking

import time
from typing import List, Dict

def process_with_progress(api_token: str, files_and_methods: List[Dict]):
    """Process multiple files with progress tracking."""
    total = len(files_and_methods)
    completed = 0
    results = []
    
    print(f"Starting batch processing of {total} files...")
    
    for item in files_and_methods:
        file_name = item['file_name']
        method = item['method']
        
        try:
            print(f"[{completed + 1}/{total}] Processing {file_name} with {method}...")
            start_time = time.time()
            
            result = process_document(api_token, file_name, method)
            
            duration = time.time() - start_time
            completed += 1
            
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'success',
                'duration': duration,
                'result': result
            })
            
            print(f"✅ Completed {file_name} in {duration:.1f}s")
            
        except Exception as e:
            completed += 1
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'failed',
                'error': str(e)
            })
            
            print(f"❌ Failed {file_name}: {e}")
        
        # Progress update
        progress = (completed / total) * 100
        print(f"Progress: {progress:.1f}% ({completed}/{total})")
        
        # Small delay between requests
        time.sleep(0.5)
    
    return results

# Usage
processing_queue = [
    {'file_name': 'document1.pdf', 'method': 'hi_res'},
    {'file_name': 'document2.pdf', 'method': 'hi_res_ft'},
    {'file_name': 'document3.pdf', 'method': 'ocr'}
]

results = process_with_progress('grlm_your_token', processing_queue)

Troubleshooting

Causes: Large files, complex documents, or heavy server load
Solutions:
  • Increase request timeout (5+ minutes recommended)
  • Try a simpler processing method first
  • Process during off-peak hours
  • Contact support for very large documents
Causes: Incorrect file name, file deleted, or wrong project
Solutions:
  • Verify exact file name (case-sensitive)
  • Use the List Sources endpoint to check available files
  • Ensure you’re using the correct API token for the project
Causes: Corrupted files, unsupported content, or method incompatibility
Solutions:
  • Try a different processing method
  • Check file integrity
  • Re-upload the file if necessary
  • Contact support for persistent issues
Causes: Method not suitable for document type, or complex layout
Solutions:
  • Upgrade to Hi-Res or Hi-Res FT method
  • Ensure document quality is good
  • Consider pre-processing the document
  • Review processing results in the dashboard

Next Steps

After successfully processing your documents: