Build, train, and deploy AI models with confidence
Everything you need to create datasets, fine-tune language models, deploy production APIs, and set up verification models that guarantee your AI behaves correctly.
Getting Started
ANRAK AI is an enterprise platform for fine-tuning large language models. You bring your data, choose a base model, and train a custom AI that understands your domain — then deploy it as an API or run it locally.
Sign Up
Create an account at anrak.ai/auth/signup. Every new account receives free credits to get started. No credit card required.
Dashboard Overview
After signing in, your dashboard shows an overview of your organization:
- Datasets — Training data you've uploaded or generated
- Training Jobs — Fine-tuning runs with real-time progress
- Models — Your trained models, ready to deploy or download
- Deploy — Production API endpoints with key management
- Inference Playground — Test any model interactively
- Evaluations — Benchmark your models against standard tests
Credits
ANRAK uses a credit-based system. Credits are consumed for dataset generation, training compute, and inference requests. Your remaining balance is always visible on the dashboard. See the Pricing page for details.
Creating Datasets
ANRAK offers five ways to create training datasets. You can upload your own data, generate it with AI, use neurosymbolic verification for high-precision generation, convert CSVs into conversational data, or augment existing datasets.
Upload a Dataset
Upload your own training data in JSONL, CSV, or Parquet format. For supervised fine-tuning, use the standard chat format with a messages array:
{"messages": [
{"role": "system", "content": "You are a helpful medical receptionist."},
{"role": "user", "content": "What are your visiting hours?"},
{"role": "assistant", "content": "Our visiting hours are 9 AM to 8 PM daily."}
]}For preference training (DPO), include chosen and rejected response pairs. Maximum file size is 100 MB.
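The platform's exact DPO schema isn't shown here; a plausible preference-pair record, by analogy with the chat format above (the `chosen`/`rejected` field names come from the description, but the overall layout is an assumption), might look like:

```json
{"messages": [{"role": "user", "content": "What are your visiting hours?"}], "chosen": "Our visiting hours are 9 AM to 8 PM daily.", "rejected": "We're open whenever."}
```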
AI Generation
Let a frontier model generate your training data. Provide a topic and instructions describing what you want, then choose:
- Task Type — Instruction Following, Multi-turn Chat, Chain of Thought, or Code Generation
- Teacher Model — GPT-5, Claude Opus 4.5, Claude Sonnet 4.5, o3, and others
- Sample Count — How many examples to generate (100 to 100,000)
Neurosymbolic Generation
The most precise way to generate training data. Neurosymbolic generation combines AI generation with rule-based verification — every sample is checked against your rules before inclusion.
You define three things:
1. Domain & Prompts
Your domain (e.g., "hospital_customer_service"), a system prompt defining the AI's role, and a user template with variables for diverse scenarios.
2. Verification Rules
Rules every sample must pass: required elements, forbidden phrases, regex patterns, length constraints, JSON structure requirements, and factual accuracy checks.
3. Knowledge Base
A set of facts (visiting hours, phone numbers, policies) that the AI must reference accurately. Each entry has a key, value, and optional aliases.
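As an illustration, knowledge base entries with a key, value, and optional aliases might be expressed as (the JSON layout is an assumption; the docs specify only the three fields, and the values are hypothetical):

```json
[
  {"key": "visiting_hours", "value": "9 AM to 8 PM daily", "aliases": ["hours", "opening hours"]},
  {"key": "front_desk_phone", "value": "555-0142", "aliases": ["phone", "contact number"]}
]
```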
Samples that fail verification are automatically regenerated with feedback. The platform also supports diversity settings — rotating through different scenarios, customer personas, conversation depths, and temperature levels to ensure variety.
For even higher quality, attach verification models (cops) to your generation run. See the Verification Models section below.
CSV to Q&A
Upload a CSV file and the platform converts each row into conversational question-and-answer pairs. This is ideal for turning structured data (product catalogs, FAQs, knowledge bases) into training data.
Configure the number of Q&A pairs per row and provide optional system context to guide the conversion style.
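The conversion concept can be sketched in plain Python. This is a toy, template-based version; the platform itself uses an AI model and applies your pairs-per-row and system-context settings:

```python
import csv
import io
import json

# Toy sketch of CSV -> Q&A conversion: each row becomes a chat-format
# training example. Only illustrates the output shape.
csv_text = "product,price\nStethoscope,49.99\n"

examples = []
for row in csv.DictReader(io.StringIO(csv_text)):
    examples.append({"messages": [
        {"role": "user", "content": f"How much does the {row['product']} cost?"},
        {"role": "assistant", "content": f"The {row['product']} costs ${row['price']}."},
    ]})

print(json.dumps(examples[0]))
```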
Data Augmentation
Expand an existing dataset by applying transformations. Select a source dataset and choose one of the six available augmentation types.
Set a multiplier (e.g., 2x) to control how many new samples are generated per original example.
Training Models
ANRAK supports five training methods. All use LoRA (Low-Rank Adaptation) for efficient fine-tuning — you get a custom model without the cost of full-parameter training.
Training Types
| Type | What It Does | Best For |
|---|---|---|
| SFT | Trains on input-output pairs with cross-entropy loss | Instruction tuning, format learning, domain adaptation |
| DPO | Learns from preference pairs (chosen vs. rejected responses) | Aligning with quality preferences, safety training |
| RL | Optimizes against a reward function using policy gradients | Math, code, verifiable tasks, custom business metrics |
| RLHF | Full pipeline: SFT, then reward model training, then RL refinement | Subjective quality (helpfulness, tone, style) |
| Distillation | Generates data from a teacher model, then trains a smaller student | Knowledge transfer, eliminating expensive system prompts |
Choosing Base Models
Select a base model to fine-tune from the model families supported on the platform.
Hyperparameters
Key settings you can configure for each training run:
- Epochs — Number of full passes through the dataset (1-10)
- Batch Size — Samples per training step (1-32)
- Learning Rate — Step size for weight updates (typical: 2e-5)
- LoRA Rank — Dimensionality of the adapter (higher = more capacity, more compute)
- Loss Function — Cross Entropy (default), PPO, CISPO, DRO, or Importance Sampling
The platform provides sensible defaults. Adjust only if you have specific requirements.
Monitoring Training
Track your training job in real time:
- Live loss curve and step progress
- Training logs with warnings and errors
- Automatic checkpoints saved during training
- Inline evaluation benchmarks (IFEval, MMLU, GSM8K, HellaSwag, ARC)
- Training duration and cost tracking
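Progress can also be polled over the API. In this sketch, the per-job path (`GET /api/v1/training/{job_id}`) and the response fields (`status`, `step`, `loss`) are assumptions based on the endpoint table, not confirmed by the docs:

```python
import time
import requests

TERMINAL_STATES = {"completed", "failed", "cancelled"}

def is_terminal(status):
    # Stop polling once the job reaches a terminal state.
    return status in TERMINAL_STATES

def poll_job(job_id, api_key, base_url="https://api.anrak.ai", interval=30):
    # Hypothetical per-job endpoint; check the API reference for the
    # real path and response shape.
    while True:
        r = requests.get(
            f"{base_url}/api/v1/training/{job_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        job = r.json()
        print(job.get("status"), job.get("step"), job.get("loss"))
        if is_terminal(job.get("status")):
            return job
        time.sleep(interval)
```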
Deploying Models
Deploy your fine-tuned model as a production API with one click. The endpoint is OpenAI-compatible, so existing client code works by just changing the base URL.
Deploy a Model
Navigate to the Deploy page and click "Deploy Model." Select your trained model from the dropdown. The platform provisions a serverless GPU endpoint that scales automatically.
API Keys
Create API keys for each deployment. Keys are shown only once at creation — store them securely. You can create multiple keys per deployment and revoke them individually.
Inference Endpoint
Send requests to your deployed model using standard HTTP:
```shell
curl -X POST "https://api.anrak.ai/api/v1/deployments/inference" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Your prompt here..."}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
```

```python
import requests

response = requests.post(
    "https://api.anrak.ai/api/v1/deployments/inference",
    headers={
        "X-API-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "messages": [
            {"role": "user", "content": "Your prompt here..."}
        ],
        "max_tokens": 1024,
        "temperature": 0.7
    }
)
print(response.json()["content"])
```

The response follows the OpenAI chat completions format. You can also use the /v1/chat/completions endpoint with the OpenAI Python SDK by setting base_url="https://api.anrak.ai".
Verification Models
Verification models — called "cops" — are ANRAK's approach to neurosymbolic AI. They are small, specialized models that verify your primary model's outputs in real time, catching hallucinations, rule violations, and inconsistencies that regex rules miss.
What Are Cops?
A cop is a small fine-tuned model (typically 500M to 7B parameters) trained for a single verification task. Think of it as a quality control inspector that checks every response your primary model generates.
Why small models? Because they're:
- Fast — 50-150ms per check, running in parallel
- Reliable — Trained on one narrow task, they're consistent where large models aren't
- Semantic — Unlike regex rules, cops understand meaning: "9 AM to 8 PM" and "9:00 AM to 8:00 PM" are the same fact
- Independent — They can't be tricked by the primary model because they're separate systems
Training a Cop
Creating a cop follows the same training flow as any other model, with three additional steps:
Prepare verification training data
Create a dataset where each example is a response paired with a judgment. For a grounding cop, examples would include responses labeled as "grounded" (factually supported by context) or "hallucinated" (contains unsupported claims).
{"messages": [
{"role": "system", "content": "You check if responses are grounded in the provided context. Output JSON."},
{"role": "user", "content": "Context: Visiting hours are 9 AM to 8 PM.\nResponse: Our visiting hours are from nine to eight."},
{"role": "assistant", "content": "{\"pass\": true, \"reason\": \"Hours match the context\"}"}
]}
{"messages": [
{"role": "system", "content": "You check if responses are grounded in the provided context. Output JSON."},
{"role": "user", "content": "Context: Visiting hours are 9 AM to 8 PM.\nResponse: We offer free valet parking."},
{"role": "assistant", "content": "{\"pass\": false, \"reason\": \"Parking info not in context\"}"}
]}Train a small model
Use SFT with a small base model (Llama 3.2 1B or 3B recommended). Cops don't need to be large — their power comes from specialization, not size. Training is fast and inexpensive.
Set the model role to "cop"
After training, go to your model's detail page, open the COPS tab, and set the role to "Cop" with the appropriate cop type (e.g., Grounding). This marks the model as a verification agent.
Attaching Cops
Once you have a trained cop, attach it to any primary model:
- Go to your primary model's detail page
- Open the COPS tab
- Click Attach Cop
- Select the cop model from the dropdown
- Choose when it runs (generation, inference, or both)
- Set the severity level (critical, error, or warning)
You can attach multiple cops to the same model. They run in parallel, so adding more cops doesn't significantly increase latency. Each cop can be toggled on/off individually.
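The parallel-execution claim can be illustrated with a toy sketch. The cop calls here are simulated with sleeps, and the latencies are hypothetical:

```python
import asyncio
import time

async def run_cop(name, delay, response):
    # Simulated cop: in reality each call hits a small verification model.
    await asyncio.sleep(delay)
    return (name, "pass")

async def verify_all(response):
    # All attached cops check the same response concurrently, so total
    # latency tracks the slowest cop, not the sum of all cops.
    cops = [("grounding", 0.12), ("consistency", 0.09), ("instruction", 0.05)]
    results = await asyncio.gather(*(run_cop(n, d, response) for n, d in cops))
    return dict(results)

start = time.perf_counter()
verdicts = asyncio.run(verify_all("Our visiting hours are 9 AM to 8 PM."))
elapsed = time.perf_counter() - start
print(verdicts)
print(f"{elapsed:.2f}s")  # roughly the slowest cop (~0.12s), not the 0.26s sum
```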
Cop Types
ANRAK supports seven cop types, each specialized for a different verification task:
Grounding
Ensures every factual claim is traceable to the provided context or knowledge base. Catches hallucinated facts.
Domain Constraint
Validates responses stay within domain boundaries. Catches medical advice from a receptionist, legal opinions from a chatbot, etc.
Consistency
Checks for contradictions with prior conversation turns or the system prompt. Catches "We close at 5" followed by "Open until 8."
Reasoning
Validates that logical reasoning chains support the conclusion. Catches flawed step-by-step reasoning.
Tool Use
Verifies the model actually called the tools it claims and that responses match tool outputs.
Instruction
Checks that responses follow all system prompt constraints: formatting, tone, length limits, behavioral rules.
Custom
User-defined verification logic for domain-specific checks not covered by the built-in types.
Generation vs. Inference
Cops can run at two points in the lifecycle:
At Generation Time
When generating training datasets, cops verify each sample before it's included. Failed samples are regenerated with the cop's feedback. This ensures your training data is clean before it ever enters the fine-tuning pipeline.
At Inference Time
In production, cops check every response before it reaches the end user. If a cop rejects a response, the model automatically regenerates with the cop's critique. If it still fails after retries, a safe fallback response is returned.
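The inference-time flow can be sketched as a plain retry loop. The function names and fallback text are illustrative; the real pipeline runs inside the platform:

```python
FALLBACK = "I'm sorry, I can't help with that right now. Please contact the front desk."

def serve(prompt, generate, cops, max_retries=2):
    # Generate, run every cop, regenerate with the critique on failure,
    # and fall back once retries are exhausted. `generate` and each cop
    # stand in for model calls; a cop returns None on pass or a critique
    # string on failure.
    critique = None
    for _ in range(max_retries + 1):
        response = generate(prompt, critique)
        failures = [f for f in (cop(response) for cop in cops) if f]
        if not failures:
            return response
        critique = "; ".join(failures)
    return FALLBACK

# Toy stand-ins: a cop that rejects mentions of parking, and a "model"
# that corrects itself once it sees the critique.
def parking_cop(response):
    return "parking info not in context" if "parking" in response else None

def toy_generate(prompt, critique):
    if critique is None:
        return "We offer free valet parking."
    return "Our visiting hours are 9 AM to 8 PM."

print(serve("When are you open?", toy_generate, [parking_cop]))
```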
Both
The recommended setting. The same cop guards both training data quality and production behavior, providing consistent verification across the entire lifecycle.
Severity Levels
Each cop attachment has a severity that controls what happens when it flags an issue:
| Severity | At Generation | At Inference |
|---|---|---|
| Critical | Sample is rejected permanently | Response blocked, safe fallback returned |
| Error | Sample is regenerated with feedback | Response regenerated with cop critique (up to 2 retries) |
| Warning | Sample is included, issue logged | Response served, issue logged for review |
Running Models Locally
Download your fine-tuned model as a GGUF file and run it locally with Ollama. No API calls, no latency, complete privacy.
GGUF Export
GGUF is the standard format for running models locally. ANRAK automatically converts your fine-tuned model and offers three quantization levels:
| Quantization | Quality | RAM Required |
|---|---|---|
| Q4_K_M | Good — best balance of size and quality | ~6 GB |
| Q5_K_M | Better — higher quality, moderate size | ~8 GB |
| Q8_0 | Best — near-original quality, largest | ~12 GB |
Go to your model's Run Locally tab to export and download.
Ollama Setup
Three steps to run your model locally:
```shell
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Create the model from your downloaded GGUF
ollama create my-model -f Modelfile

# 3. Run it
ollama run my-model
```
Once running, Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1/chat/completions — any OpenAI client library works out of the box.
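For example, calling the local endpoint with `requests` (the model name `my-model` matches the Ollama setup above; the helper returns None if the local server isn't reachable):

```python
import requests

def ask_local(prompt, model="my-model", base_url="http://localhost:11434/v1"):
    # Calls Ollama's OpenAI-compatible chat completions endpoint;
    # returns None if Ollama isn't running or the reply is malformed.
    try:
        r = requests.post(
            f"{base_url}/chat/completions",
            json={"model": model,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=30,
        )
        return r.json()["choices"][0]["message"]["content"]
    except (requests.RequestException, KeyError):
        return None

reply = ask_local("When are your visiting hours?")
print(reply if reply is not None else "Ollama is not running")
```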
MCP Server
ANRAK provides a Model Context Protocol (MCP) server that lets AI coding assistants manage your datasets, training jobs, and models directly from your IDE.
Overview
With MCP, you can ask your AI assistant to "create a dataset," "start a training job," or "deploy my model" — and it calls the ANRAK API on your behalf. No switching between the dashboard and your editor.
Setup
Generate a platform API key from Settings > API Keys in the ANRAK dashboard. Then add the MCP server to your AI client:
The configuration key differs between MCP clients. For clients that expect a `url` field:

```json
{
  "mcpServers": {
    "anrak-ai": {
      "url": "https://anrak.ai/mcp",
      "headers": {
        "Authorization": "Bearer anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
```

For clients that expect `serverUrl`:

```json
{
  "mcpServers": {
    "anrak-ai": {
      "serverUrl": "https://anrak.ai/mcp",
      "headers": {
        "Authorization": "Bearer anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
```

To run the server locally via npx:

```json
{
  "mcpServers": {
    "anrak-ai": {
      "command": "npx",
      "args": ["-y", "@anrak/mcp-server"],
      "env": {
        "ANRAK_API_KEY": "anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
```

Available Tools
The MCP server provides 50+ tools for managing datasets, training jobs, models, deployments, and evaluations.
API Reference
All platform functionality is available through the REST API. Authenticated with either a platform API key or a deployment-specific API key.
Authentication
Two authentication methods:
Platform API Key
For managing resources (datasets, training, models). Prefix: anrak_pk_. Pass as Authorization: Bearer anrak_pk_...
Deployment API Key
For inference on deployed models. Prefix: anrak_sk_. Pass as X-API-Key: anrak_sk_...
Endpoints
Base URL: https://api.anrak.ai
| Path | Description |
|---|---|
| /api/v1/datasets | Create, list, upload, generate datasets |
| /api/v1/training | Start, monitor, and manage training jobs |
| /api/v1/models | Manage models, cops, GGUF exports |
| /api/v1/deployments | Deploy models, manage API keys |
| /v1/chat/completions | OpenAI-compatible inference endpoint |
| /api/v1/evaluations | Run and view model evaluations |
Code Examples
Start a Training Job
```python
import requests

response = requests.post(
    "https://api.anrak.ai/api/v1/training",
    headers={
        "Authorization": "Bearer anrak_pk_YOUR_KEY",
        "Content-Type": "application/json"
    },
    json={
        "name": "my-customer-service-model",
        "base_model": "meta-llama/Llama-3.2-3B",
        "dataset_id": "YOUR_DATASET_ID",
        "training_type": "sft",
        "config": {
            "epochs": 3,
            "learning_rate": "2e-5",
            "batch_size": 8
        }
    }
)
print(response.json())
```

Chat with a Deployed Model
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.anrak.ai",
    api_key="anrak_pk_YOUR_KEY"
)

response = client.chat.completions.create(
    model="my-customer-service-model",
    messages=[
        {"role": "user", "content": "When are your visiting hours?"}
    ]
)
print(response.choices[0].message.content)
```

Ready to get started?
Create your account and start training custom AI models in minutes.