Documentation

Build, train, and deploy AI models with confidence

Everything you need to create datasets, fine-tune language models, deploy production APIs, and set up verification models that check your AI's outputs in real time.

Getting Started

ANRAK AI is an enterprise platform for fine-tuning large language models. You bring your data, choose a base model, and train a custom AI that understands your domain — then deploy it as an API or run it locally.

Sign Up

Create an account at anrak.ai/auth/signup. Every new account receives free credits to get started. No credit card required.

Dashboard Overview

After signing in, your dashboard shows an overview of your organization:

  • Datasets — Training data you've uploaded or generated
  • Training Jobs — Fine-tuning runs with real-time progress
  • Models — Your trained models, ready to deploy or download
  • Deploy — Production API endpoints with key management
  • Inference Playground — Test any model interactively
  • Evaluations — Benchmark your models against standard tests

Credits

ANRAK uses a credit-based system. Credits are consumed for dataset generation, training compute, and inference requests. Your remaining balance is always visible on the dashboard. See the Pricing page for details.

Creating Datasets

ANRAK offers five ways to create training datasets. You can upload your own data, generate it with AI, use neurosymbolic verification for high-precision generation, convert CSVs into conversational data, or augment existing datasets.

Upload a Dataset

Upload your own training data in JSONL, CSV, or Parquet format. For supervised fine-tuning, use the standard chat format with a messages array:

JSONL
{"messages": [
  {"role": "system", "content": "You are a helpful medical receptionist."},
  {"role": "user", "content": "What are your visiting hours?"},
  {"role": "assistant", "content": "Our visiting hours are 9 AM to 8 PM daily."}
]}

For preference training (DPO), include chosen and rejected response pairs. Maximum file size is 100 MB.
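The exact preference schema ANRAK expects isn't reproduced on this page; as an illustration only (the field names are an assumption, not confirmed here), DPO data commonly pairs a prompt with a chosen and a rejected completion:

```jsonl
{"prompt": [{"role": "user", "content": "What are your visiting hours?"}],
 "chosen": {"role": "assistant", "content": "Our visiting hours are 9 AM to 8 PM daily."},
 "rejected": {"role": "assistant", "content": "We're open whenever someone's at the desk."}}
```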

AI Generation

Let a frontier model generate your training data. Provide a topic and instructions describing what you want, then choose:

  • Task Type — Instruction Following, Multi-turn Chat, Chain of Thought, or Code Generation
  • Teacher Model — GPT-5, Claude Opus 4.5, Claude Sonnet 4.5, o3, and others
  • Sample Count — How many examples to generate (100 to 100,000)
Tip
Start with a small batch (100-500 samples), review the quality, then scale up. You can always generate more later.

Neurosymbolic Generation

The most precise way to generate training data. Neurosymbolic generation combines AI generation with rule-based verification — every sample is checked against your rules before inclusion.

You define three things:

1. Domain & Prompts

Your domain (e.g., "hospital_customer_service"), a system prompt defining the AI's role, and a user template with variables for diverse scenarios.

2. Verification Rules

Rules every sample must pass: required elements, forbidden phrases, regex patterns, length constraints, JSON structure requirements, and factual accuracy checks.

3. Knowledge Base

A set of facts (visiting hours, phone numbers, policies) that the AI must reference accurately. Each entry has a key, value, and optional aliases.

Samples that fail verification are automatically regenerated with feedback. The platform also supports diversity settings — rotating through different scenarios, customer personas, conversation depths, and temperature levels to ensure variety.
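A rule check of this kind can be pictured as a small function. This is only a sketch of the idea; the rule names below (`required_elements`, `forbidden_phrases`, and so on) are illustrative, not ANRAK's actual rule schema:

```python
import re

def verify_sample(text: str, rules: dict) -> list:
    """Check one generated sample against simple rule types; return failures."""
    failures = []
    for phrase in rules.get("required_elements", []):
        if phrase.lower() not in text.lower():
            failures.append(f"missing required element: {phrase!r}")
    for phrase in rules.get("forbidden_phrases", []):
        if phrase.lower() in text.lower():
            failures.append(f"contains forbidden phrase: {phrase!r}")
    for pattern in rules.get("regex_must_match", []):
        if not re.search(pattern, text):
            failures.append(f"does not match pattern: {pattern!r}")
    max_len = rules.get("max_length")
    if max_len is not None and len(text) > max_len:
        failures.append(f"too long: {len(text)} > {max_len} characters")
    return failures

rules = {
    "required_elements": ["9 AM to 8 PM"],
    "forbidden_phrases": ["medical advice"],
    "regex_must_match": [r"\b\d{1,2} (AM|PM)\b"],
    "max_length": 280,
}

print(verify_sample("Our visiting hours are 9 AM to 8 PM daily.", rules))    # []
print(verify_sample("Come by any time; ask us for medical advice.", rules))  # three failures
```

A sample that returns an empty list passes; anything else is sent back for regeneration with the failure messages as feedback.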

Context Import
You can upload a document (PDF, TXT, JSON) or paste text, and the platform will analyze it to auto-fill your domain, rules, knowledge base, and scenarios. This saves significant setup time.

For even higher quality, attach verification models (cops) to your generation run. See the Verification Models section below.

CSV to Q&A

Upload a CSV file and the platform converts each row into conversational question-and-answer pairs. This is ideal for turning structured data (product catalogs, FAQs, knowledge bases) into training data.

Configure the number of Q&A pairs per row and provide optional system context to guide the conversion style.
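Conceptually, the conversion works like this. The sketch below is only an illustration: the question template and the special handling of a `name` column are assumptions, not the platform's algorithm:

```python
import csv
import io
import json

def rows_to_qa(csv_text: str, pairs_per_row: int = 1) -> list:
    """Turn each CSV row into simple Q&A chat examples, one per column value."""
    examples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.get("name", "this item")
        # Skip the name column itself; cap pairs at pairs_per_row.
        fields = [(k, v) for k, v in row.items() if k != "name" and v]
        for field, value in fields[:pairs_per_row]:
            examples.append({"messages": [
                {"role": "user", "content": f"What is the {field} of {name}?"},
                {"role": "assistant", "content": f"The {field} of {name} is {value}."},
            ]})
    return examples

catalog = "name,price,color\nWidget,$9.99,blue\n"
for example in rows_to_qa(catalog, pairs_per_row=2):
    print(json.dumps(example))
```

On the platform, a frontier model writes the questions and answers, so the output is far more varied than this template; the row-to-conversation mapping is the same idea.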

Data Augmentation

Expand an existing dataset by applying transformations. Select a source dataset and choose from six augmentation types:

  • Paraphrase — Rewrite examples with different wording
  • Translate — Convert to other languages
  • Rephrase Formal — Make language more professional
  • Rephrase Casual — Make language more conversational
  • Add Variations — Create alternative versions
  • Expand — Add detail and depth to examples

Set a multiplier (e.g., 2x) to control how many new samples are generated per original example.

Training Models

ANRAK supports five training methods. All use LoRA (Low-Rank Adaptation) for efficient fine-tuning — you get a custom model without the cost of full-parameter training.

Training Types

  • SFT — Trains on input-output pairs with cross-entropy loss. Best for instruction tuning, format learning, and domain adaptation.
  • DPO — Learns from preference pairs (chosen vs. rejected responses). Best for aligning with quality preferences and safety training.
  • RL — Optimizes against a reward function using policy gradients. Best for math, code, verifiable tasks, and custom business metrics.
  • RLHF — Full pipeline: SFT, then reward model training, then RL refinement. Best for subjective quality (helpfulness, tone, style).
  • Distillation — Generates data from a teacher model, then trains a smaller student. Best for knowledge transfer and eliminating expensive system prompts.
Tip
Start with SFT for most use cases. It's the simplest, fastest, and works well with as few as 100 high-quality examples. Move to DPO or RL once you have preference data or a reward signal.

Choosing Base Models

Select a base model to fine-tune. Supported families:

  • Llama 3.1 / 3.2 / 3.3 — 1B, 3B, 8B, 70B
  • Qwen3 — 4B, 8B, 30B, 32B, 235B (MoE)
  • DeepSeek V3.1 — MoE architecture
  • GPT OSS — 20B, 120B (MoE)
  • Kimi K2 — MoE, reasoning-optimized
Note
Smaller models (1B-8B) train faster and are cheaper. Larger models (30B+) have more capacity but require more data and compute. For verification models (cops), we recommend 1B-3B.

Hyperparameters

Key settings you can configure for each training run:

  • Epochs — Number of full passes through the dataset (1-10)
  • Batch Size — Samples per training step (1-32)
  • Learning Rate — Step size for weight updates (typical: 2e-5)
  • LoRA Rank — Dimensionality of the adapter (higher = more capacity, more compute)
  • Loss Function — Cross Entropy (default), PPO, CISPO, DRO, or Importance Sampling

The platform provides sensible defaults. Adjust only if you have specific requirements.
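As a sketch, these settings map onto the config object of a training request. The epochs, batch_size, and learning_rate fields match the API example later on this page; lora_rank and loss_function are illustrative names, not confirmed field names:

```json
{
  "training_type": "sft",
  "config": {
    "epochs": 3,
    "batch_size": 8,
    "learning_rate": "2e-5",
    "lora_rank": 16,
    "loss_function": "cross_entropy"
  }
}
```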

Monitoring Training

Track your training job in real time:

  • Live loss curve and step progress
  • Training logs with warnings and errors
  • Automatic checkpoints saved during training
  • Inline evaluation benchmarks (IFEval, MMLU, GSM8K, HellaSwag, ARC)
  • Training duration and cost tracking

Deploying Models

Deploy your fine-tuned model as a production API with one click. The endpoint is OpenAI-compatible, so existing client code works by just changing the base URL.

Deploy a Model

Navigate to the Deploy page and click "Deploy Model." Select your trained model from the dropdown. The platform provisions a serverless GPU endpoint that scales automatically.

Note
Models not eligible for direct deployment can be downloaded as LoRA adapters or GGUF files and hosted on your own infrastructure.

API Keys

Create API keys for each deployment. Keys are shown only once at creation — store them securely. You can create multiple keys per deployment and revoke them individually.

Inference Endpoint

Send requests to your deployed model using standard HTTP:

cURL
curl -X POST "https://api.anrak.ai/api/v1/deployments/inference" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Your prompt here..."}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
Python
import requests

response = requests.post(
    "https://api.anrak.ai/api/v1/deployments/inference",
    headers={
        "X-API-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "messages": [
            {"role": "user", "content": "Your prompt here..."}
        ],
        "max_tokens": 1024,
        "temperature": 0.7
    }
)

print(response.json()["content"])

The inference response returns the generated text in its content field, as shown above. For full OpenAI compatibility, use the /v1/chat/completions endpoint with the OpenAI Python SDK by setting base_url="https://api.anrak.ai/v1".

Verification Models

Verification models — called "cops" — are ANRAK's approach to neurosymbolic AI. They are small, specialized models that verify your primary model's outputs in real time, catching hallucinations, rule violations, and inconsistencies that regex rules miss.

What Are Cops?

A cop is a small fine-tuned model (typically 500M to 7B parameters) trained for a single verification task. Think of it as a quality control inspector that checks every response your primary model generates.

Why small models? Because they're:

  • Fast — 50-150ms per check, running in parallel
  • Reliable — Trained on one narrow task, they're consistent where large models aren't
  • Semantic — Unlike regex rules, cops understand meaning: "9 AM to 8 PM" and "9:00 AM to 8:00 PM" are the same fact
  • Independent — They can't be tricked by the primary model because they're separate systems
How it works
The primary model generates a response, the cop squad verifies it in parallel, and the verified response is served to the user.

Training a Cop

Creating a cop follows the same training flow as any other model, with three additional steps:

1. Prepare verification training data

Create a dataset where each example is a response paired with a judgment. For a grounding cop, examples would include responses labeled as "grounded" (factually supported by context) or "hallucinated" (contains unsupported claims).

JSONL
{"messages": [
  {"role": "system", "content": "You check if responses are grounded in the provided context. Output JSON."},
  {"role": "user", "content": "Context: Visiting hours are 9 AM to 8 PM.\nResponse: Our visiting hours are from nine to eight."},
  {"role": "assistant", "content": "{\"pass\": true, \"reason\": \"Hours match the context\"}"}
]}
{"messages": [
  {"role": "system", "content": "You check if responses are grounded in the provided context. Output JSON."},
  {"role": "user", "content": "Context: Visiting hours are 9 AM to 8 PM.\nResponse: We offer free valet parking."},
  {"role": "assistant", "content": "{\"pass\": false, \"reason\": \"Parking info not in context\"}"}
]}
2. Train a small model

Use SFT with a small base model (Llama 3.2 1B or 3B recommended). Cops don't need to be large — their power comes from specialization, not size. Training is fast and inexpensive.

3. Set the model role to "cop"

After training, go to your model's detail page, open the COPS tab, and set the role to "Cop" with the appropriate cop type (e.g., Grounding). This marks the model as a verification agent.

Tip
You can generate cop training data by taking known-good responses, systematically corrupting them (changing facts, adding hallucinations), and labeling both versions. Start with 500-1000 examples.
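The corruption recipe in this tip can be sketched as a small helper that emits one pass/fail pair per known-good response. The helper and its fact-swap strategy are illustrative, not an ANRAK API:

```python
import json

SYSTEM = "You check if responses are grounded in the provided context. Output JSON."

def cop_pair(context: str, good_response: str, fact: str, wrong_fact: str) -> list:
    """From one known-good response, build a grounded (pass) example and a
    corrupted (fail) example by swapping a known fact for a wrong one."""
    def example(response, ok, reason):
        return {"messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context: {context}\nResponse: {response}"},
            {"role": "assistant", "content": json.dumps({"pass": ok, "reason": reason})},
        ]}
    bad_response = good_response.replace(fact, wrong_fact)
    return [
        example(good_response, True, "matches the context"),
        example(bad_response, False, f"states {wrong_fact!r}, context says {fact!r}"),
    ]

pairs = cop_pair(
    context="Visiting hours are 9 AM to 8 PM.",
    good_response="We're open to visitors from 9 AM to 8 PM.",
    fact="9 AM to 8 PM",
    wrong_fact="7 AM to 11 PM",
)
for example in pairs:
    print(json.dumps(example))
```

Run over a few hundred known-good responses with varied fact swaps, this yields a balanced dataset in the same JSONL shape shown in step 1.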

Attaching Cops

Once you have a trained cop, attach it to any primary model:

  1. Go to your primary model's detail page
  2. Open the COPS tab
  3. Click Attach Cop
  4. Select the cop model from the dropdown
  5. Choose when it runs (generation, inference, or both)
  6. Set the severity level (critical, error, or warning)

You can attach multiple cops to the same model. They run in parallel, so adding more cops doesn't significantly increase latency. Each cop can be toggled on/off individually.

Cop Types

ANRAK supports seven cop types, each specialized for a different verification task:

Grounding

Ensures every factual claim is traceable to the provided context or knowledge base. Catches hallucinated facts.

Domain Constraint

Validates responses stay within domain boundaries. Catches medical advice from a receptionist, legal opinions from a chatbot, etc.

Consistency

Checks for contradictions with prior conversation turns or the system prompt. Catches "We close at 5" followed by "Open until 8."

Reasoning

Validates that logical reasoning chains support the conclusion. Catches flawed step-by-step reasoning.

Tool Use

Verifies the model actually called the tools it claims and that responses match tool outputs.

Instruction

Checks that responses follow all system prompt constraints: formatting, tone, length limits, behavioral rules.

Custom

User-defined verification logic for domain-specific checks not covered by the built-in types.

Generation vs. Inference

Cops can run at two points in the lifecycle:

At Generation Time

When generating training datasets, cops verify each sample before it's included. Failed samples are regenerated with the cop's feedback. This ensures your training data is clean before it ever enters the fine-tuning pipeline.

At Inference Time

In production, cops check every response before it reaches the end user. If a cop rejects a response, the model automatically regenerates with the cop's critique. If it still fails after retries, a safe fallback response is returned.

Both

The recommended setting. The same cop guards both training data quality and production behavior, providing consistent verification across the entire lifecycle.

Severity Levels

Each cop attachment has a severity that controls what happens when it flags an issue:

  • Critical — At generation: sample is rejected permanently. At inference: response is blocked and a safe fallback is returned.
  • Error — At generation: sample is regenerated with feedback. At inference: response is regenerated with the cop's critique (up to 2 retries).
  • Warning — At generation: sample is included and the issue is logged. At inference: response is served and the issue is logged for review.
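The inference-time behavior can be pictured as a retry loop. This is a client-side sketch with stubbed generate and cop functions, not the platform's implementation:

```python
FALLBACK = "I'm sorry, I can't help with that right now."

def guarded_generate(generate, cops, max_retries=2):
    """Sketch of the inference-time flow: every cop checks each draft; a failed
    'critical' cop blocks the response outright, failed 'error' cops trigger
    regeneration with their critique (up to max_retries), and 'warning'
    failures would be logged but still served."""
    critique = None
    for _ in range(max_retries + 1):
        draft = generate(critique)
        results = [(cop["severity"], cop["check"](draft)) for cop in cops]
        failed = [(sev, r) for sev, r in results if not r["pass"]]
        if any(sev == "critical" for sev, _ in failed):
            return FALLBACK                      # blocked, safe fallback
        errors = [r for sev, r in failed if sev == "error"]
        if not errors:
            return draft                         # clean, or warnings only
        critique = "; ".join(r["reason"] for r in errors)
    return FALLBACK                              # still failing after retries

# Stub primary model: produces a grounded answer once it sees a critique.
def generate(critique):
    return "Visiting hours are 9 AM to 8 PM." if critique else "We offer free valet parking."

grounding_cop = {
    "severity": "error",
    "check": lambda text: {"pass": "9 AM to 8 PM" in text,
                           "reason": "claim not supported by context"},
}

print(guarded_generate(generate, [grounding_cop]))
```

Here the first draft hallucinates a parking perk, the grounding cop rejects it, and the regenerated draft passes on the second attempt.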

Running Models Locally

Download your fine-tuned model as a GGUF file and run it locally with Ollama. No API calls, no latency, complete privacy.

GGUF Export

GGUF is the standard format for running models locally. ANRAK automatically converts your fine-tuned model and offers three quantization levels:

  • Q4_K_M — Good: best balance of size and quality (~6 GB RAM)
  • Q5_K_M — Better: higher quality, moderate size (~8 GB RAM)
  • Q8_0 — Best: near-original quality, largest file (~12 GB RAM)

Go to your model's Run Locally tab to export and download.

Ollama Setup

Three steps to run your model locally:

Terminal
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Create a Modelfile pointing at your downloaded GGUF, then build the model
echo "FROM ./your-model.gguf" > Modelfile
ollama create my-model -f Modelfile

# 3. Run it
ollama run my-model

Once running, Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1/chat/completions — any OpenAI client library works out of the box.
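For example, you can hit the local endpoint with nothing but the standard library. In this sketch, the model name my-model matches the ollama create step above, and the response parsing assumes the standard OpenAI chat completions shape:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint on the default local port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

payload = {
    "model": "my-model",  # the name you gave `ollama create`
    "messages": [{"role": "user", "content": "When are your visiting hours?"}],
    "temperature": 0.7,
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    # Standard OpenAI chat completions response shape.
    print(body["choices"][0]["message"]["content"])
except OSError:
    print("Could not reach Ollama; make sure `ollama run my-model` is active.")
```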

MCP Server

ANRAK provides a Model Context Protocol (MCP) server that lets AI coding assistants manage your datasets, training jobs, and models directly from your IDE.

Overview

With MCP, you can ask your AI assistant to "create a dataset," "start a training job," or "deploy my model" — and it calls the ANRAK API on your behalf. No switching between the dashboard and your editor.

Setup

Generate a platform API key from Settings > API Keys in the ANRAK dashboard. Then add the MCP server to your AI client:

Claude Code / Codex
{
  "mcpServers": {
    "anrak-ai": {
      "url": "https://anrak.ai/mcp",
      "headers": {
        "Authorization": "Bearer anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
Cursor / Windsurf
{
  "mcpServers": {
    "anrak-ai": {
      "serverUrl": "https://anrak.ai/mcp",
      "headers": {
        "Authorization": "Bearer anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
Claude Desktop
{
  "mcpServers": {
    "anrak-ai": {
      "command": "npx",
      "args": ["-y", "@anrak/mcp-server"],
      "env": {
        "ANRAK_API_KEY": "anrak_pk_YOUR_API_KEY"
      }
    }
  }
}

Available Tools

The MCP server provides 50+ tools organized into categories:

  • Datasets (11 tools) — Create, upload, generate, augment, preview, delete
  • Training (10 tools) — Create jobs, monitor, pause, resume, get checkpoints
  • Models (15 tools) — List, deploy, download, publish, manage cops
  • Deployments (5 tools) — Manage endpoints, API keys, usage stats
  • Inference (2 tools) — Chat with any model, list available models
  • Evaluations (4 tools) — Create benchmarks, view results
  • Usage (5 tools) — Credits, costs, token usage, pricing

API Reference

All platform functionality is available through the REST API, authenticated with either a platform API key or a deployment-specific API key.

Authentication

Two authentication methods:

Platform API Key

For managing resources (datasets, training, models). Prefix: anrak_pk_. Pass as Authorization: Bearer anrak_pk_...

Deployment API Key

For inference on deployed models. Prefix: anrak_sk_. Pass as X-API-Key: anrak_sk_...

Endpoints

Base URL: https://api.anrak.ai

  • /api/v1/datasets — Create, list, upload, generate datasets
  • /api/v1/training — Start, monitor, and manage training jobs
  • /api/v1/models — Manage models, cops, GGUF exports
  • /api/v1/deployments — Deploy models, manage API keys
  • /v1/chat/completions — OpenAI-compatible inference endpoint
  • /api/v1/evaluations — Run and view model evaluations

Code Examples

Start a Training Job

Python
import requests

response = requests.post(
    "https://api.anrak.ai/api/v1/training",
    headers={
        "Authorization": "Bearer anrak_pk_YOUR_KEY",
        "Content-Type": "application/json"
    },
    json={
        "name": "my-customer-service-model",
        "base_model": "meta-llama/Llama-3.2-3B",
        "dataset_id": "YOUR_DATASET_ID",
        "training_type": "sft",
        "config": {
            "epochs": 3,
            "learning_rate": "2e-5",
            "batch_size": 8
        }
    }
)

print(response.json())

Chat with a Deployed Model

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.anrak.ai/v1",
    api_key="anrak_sk_YOUR_KEY"  # deployment key (anrak_sk_) for inference
)

response = client.chat.completions.create(
    model="my-customer-service-model",
    messages=[
        {"role": "user", "content": "When are your visiting hours?"}
    ]
)

print(response.choices[0].message.content)

Ready to get started?

Create your account and start training custom AI models in minutes.