Key Features
- Multiple Models: Access to 8 language models including GPT-4o, GPT-4.1, and high-speed Groq models
- Prompt Integration: Custom prompt templates for specialized response behaviors
- Temperature Control: Adjust creativity from deterministic (0.0) to creative (2.0) responses
- Streaming Support: Real-time response generation for interactive applications
- Quality Metrics: Built-in evaluation using DeepEval metrics
Available Endpoints
List LLM Nodes
GET
/{flow_name}/llm
Retrieve all LLM nodes with configurations and metrics.
Update Configuration
PATCH
/{flow_name}/llm/{node_id}
Modify LLM node settings including model and temperature.
List Prompts
GET
/{flow_name}/prompts
Access available prompt templates for LLM customization.
LLM Node Structure
Configuration Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Language model for response generation |
| promptId | string | Prompt template for instruction guidance |
| temperature | float (0.0-2.0) | Creativity and randomness control |
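Before sending an update, the three parameters above can be checked client-side. A minimal sketch, assuming the parameter names from the table and the model names from the Available Models table below; the helper itself is illustrative and not part of the API:

```python
# Illustrative client-side validation of an LLM node configuration.
# Parameter names follow the Configuration Parameters table; this helper
# is an assumption, not part of the API surface.
KNOWN_MODELS = {
    "gpt-4o", "gpt-4o-mini", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
    "gpt-3.5-turbo-0125", "mixtral-8x7b-32768", "llama-3.1-8b-instant",
}

def validate_llm_config(config: dict) -> dict:
    """Check model, promptId, and temperature before issuing a PATCH."""
    if config.get("model") not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {config.get('model')!r}")
    if not isinstance(config.get("promptId"), str):
        raise ValueError("promptId must be a string")
    temperature = config.get("temperature", 0.0)
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return config

validate_llm_config({"model": "gpt-4o", "promptId": "tech-docs", "temperature": 0.1})
```

Catching an out-of-range temperature or a typoed model name locally avoids a round trip to the PATCH endpoint.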
Available Models
| Model | Context Window | Best For | Expected Latency |
|---|---|---|---|
| gpt-4o | 128K tokens | High accuracy, complex reasoning | 2-4 seconds |
| gpt-4o-mini | 128K tokens | Balanced quality and speed | 1-2 seconds |
| gpt-4.1 | 128K tokens | Latest capabilities | 2-5 seconds |
| gpt-4.1-mini | 128K tokens | Modern features, efficient | 1-2 seconds |
| gpt-4.1-nano | 128K tokens | Resource optimization | 0.8-1.5 seconds |
| gpt-3.5-turbo-0125 | 16K tokens | High-volume processing | 0.5-1 second |
| mixtral-8x7b-32768 | 32K tokens | Real-time processing | 0.5-1 second |
| llama-3.1-8b-instant | 8K tokens | Ultra-fast responses | 0.3-0.8 seconds |
Common Configurations
Maximum Accuracy
For technical documentation and critical applications (for example, gpt-4o with a temperature of 0.0).
Balanced Performance
For general Q&A and customer support (for example, gpt-4o-mini with a temperature of 0.3).
High-Speed Processing
For real-time chat and instant responses (for example, llama-3.1-8b-instant with a temperature of 0.2).
Creative Generation
For content creation and diverse outputs (for example, gpt-4.1 with a temperature of 0.8).
Quality Metrics
LLM nodes track response quality using DeepEval metrics:
- Contextual Precision: Accuracy of context usage
- Contextual Recall: Completeness of context utilization
- Answer Relevancy: Relevance of response to question
- Faithfulness: Adherence to context without hallucination
Python Example
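A minimal Python sketch of the update flow, using only the standard library. The base URL, flow name, and node id are placeholders for your deployment; the path and body fields follow the Available Endpoints and Configuration Parameters sections above:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; replace with your deployment

def update_llm_node(flow_name, node_id, model, temperature, send=True):
    """PATCH /{flow_name}/llm/{node_id} with a new model and temperature."""
    payload = json.dumps({"model": model, "temperature": temperature}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/{flow_name}/llm/{node_id}",
        data=payload,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
    if not send:
        return req  # return the unsent request for inspection
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but do not send) a request switching a node to balanced settings.
req = update_llm_node("support-flow", "llm-1", "gpt-4o-mini", 0.3, send=False)
print(req.get_method(), req.full_url)
```

The same pattern applies to the GET endpoints; only the method and path change.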
Best Practices
Model Selection
- Use gpt-4o for maximum accuracy in critical applications
- Use gpt-4o-mini for balanced performance in general use cases
- Use Groq models (mixtral, llama) for real-time applications
- Use creative models (gpt-4.1) for content generation
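The guidance above can be encoded as a simple lookup; a sketch in which the use-case keys and the fallback default are assumptions, while the model choices come straight from the bullets:

```python
# Hypothetical helper encoding the model-selection guidance above.
# The use-case names and the default are assumptions for illustration.
def pick_model(use_case: str) -> str:
    """Map a use case to a suggested model, per the Model Selection bullets."""
    suggestions = {
        "critical": "gpt-4o",                # maximum accuracy
        "general": "gpt-4o-mini",            # balanced performance
        "realtime": "llama-3.1-8b-instant",  # Groq, real-time applications
        "creative": "gpt-4.1",               # content generation
    }
    return suggestions.get(use_case, "gpt-4o-mini")  # balanced default

print(pick_model("realtime"))  # llama-3.1-8b-instant
```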
Temperature Settings
- 0.0-0.1: Deterministic responses for factual queries
- 0.2-0.4: Slight variation for natural conversations
- 0.5-0.8: Creative responses for content generation
- 0.9-2.0: Highly creative and diverse outputs
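One way to keep these bands consistent across an application is a named mapping; a minimal sketch in which the style names and the specific values within each band are assumptions:

```python
# Illustrative mapping from response style to a temperature inside the
# bands listed above. Style names and exact values are assumptions.
TEMPERATURE_BANDS = {
    "deterministic": 0.0,   # 0.0-0.1: factual queries
    "conversational": 0.3,  # 0.2-0.4: natural conversations
    "creative": 0.7,        # 0.5-0.8: content generation
    "exploratory": 1.2,     # 0.9-2.0: highly diverse outputs
}

def temperature_for(style: str) -> float:
    try:
        return TEMPERATURE_BANDS[style]
    except KeyError:
        raise ValueError(f"unknown style: {style!r}") from None

print(temperature_for("conversational"))  # 0.3
```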
Performance Optimization
- Monitor processing time and adjust model selection accordingly
- Enable streaming for better user experience with longer responses
- Use appropriate context window sizes for your content
- Track quality metrics to ensure response accuracy
Troubleshooting
Common issues and solutions:
- Slow responses: Switch to faster models (Groq) or lower temperature
- Inconsistent quality: Use higher-quality models or lower temperature
- Resource usage: Use efficient models (nano, mini) for high-volume processing
- Poor context usage: Optimize prompt templates and verify context formatting

