Documentation

Build, train, and deploy AI models with confidence

Everything you need to create datasets, fine-tune language models, deploy production APIs, and set up verification models that check your AI's outputs in real time.

Getting Started

ANRAK AI is an enterprise platform for fine-tuning large language models. You bring your data, choose a base model, and train a custom AI that understands your domain — then deploy it as an API or run it locally.

Sign Up

Create an account at anrak.ai/auth/signup. Every new account receives free credits to get started. No credit card required.

Dashboard Overview

After signing in, your dashboard shows an overview of your organization:

  • Datasets — Training data you've uploaded or generated
  • Training Jobs — Fine-tuning runs with real-time progress
  • Models — Your trained models, ready to deploy or download
  • Deploy — Production API endpoints with key management
  • Inference Playground — Test any model interactively
  • Evaluations — Benchmark your models against standard tests

Credits

ANRAK uses a credit-based system. Credits are consumed for dataset generation, training compute, and inference requests. Your remaining balance is always visible on the dashboard. See the Pricing page for details.

Creating Datasets

ANRAK offers five ways to create training datasets. You can upload your own data, generate it with AI, use neurosymbolic verification for high-precision generation, convert CSVs into conversational data, or augment existing datasets.

Upload a Dataset

Upload your own training data in JSONL, CSV, or Parquet format. For supervised fine-tuning, use the standard chat format with a messages array:

JSONL
{"messages": [
  {"role": "system", "content": "You are a helpful medical receptionist."},
  {"role": "user", "content": "What are your visiting hours?"},
  {"role": "assistant", "content": "Our visiting hours are 9 AM to 8 PM daily."}
]}

For preference training (DPO), include chosen and rejected response pairs. Maximum file size is 100 MB.
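The exact preference schema ANRAK expects isn't reproduced on this page; as an illustration only (the field names are an assumption, not confirmed here), DPO data commonly pairs a prompt with a chosen and a rejected completion:

```jsonl
{"prompt": [{"role": "user", "content": "What are your visiting hours?"}],
 "chosen": {"role": "assistant", "content": "Our visiting hours are 9 AM to 8 PM daily."},
 "rejected": {"role": "assistant", "content": "We're open whenever someone's at the desk."}}
```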

AI Generation

Let a frontier model generate your training data. Provide a topic and instructions describing what you want, then choose:

  • Task Type — Instruction Following, Multi-turn Chat, Chain of Thought, or Code Generation
  • Teacher Model — GPT-5, Claude Opus 4.5, Claude Sonnet 4.5, o3, and others
  • Sample Count — How many examples to generate (100 to 100,000)
Tip
Start with a small batch (100-500 samples), review the quality, then scale up. You can always generate more later.

Neurosymbolic Generation

The most precise way to generate training data. Neurosymbolic generation combines AI generation with rule-based verification — every sample is checked against your rules before inclusion.

You define three things:

1. Domain & Prompts

Your domain (e.g., "hospital_customer_service"), a system prompt defining the AI's role, and a user template with variables for diverse scenarios.

2. Verification Rules

Rules every sample must pass: required elements, forbidden phrases, regex patterns, length constraints, JSON structure requirements, and factual accuracy checks.

3. Knowledge Base

A set of facts (visiting hours, phone numbers, policies) that the AI must reference accurately. Each entry has a key, value, and optional aliases.

Samples that fail verification are automatically regenerated with feedback. The platform also supports diversity settings — rotating through different scenarios, customer personas, conversation depths, and temperature levels to ensure variety.
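A rule check of this kind can be pictured as a small function. This is only a sketch of the idea; the rule names below (`required_elements`, `forbidden_phrases`, and so on) are illustrative, not ANRAK's actual rule schema:

```python
import re

def verify_sample(text: str, rules: dict) -> list:
    """Check one generated sample against simple rule types; return failures."""
    failures = []
    for phrase in rules.get("required_elements", []):
        if phrase.lower() not in text.lower():
            failures.append(f"missing required element: {phrase!r}")
    for phrase in rules.get("forbidden_phrases", []):
        if phrase.lower() in text.lower():
            failures.append(f"contains forbidden phrase: {phrase!r}")
    for pattern in rules.get("regex_must_match", []):
        if not re.search(pattern, text):
            failures.append(f"does not match pattern: {pattern!r}")
    max_len = rules.get("max_length")
    if max_len is not None and len(text) > max_len:
        failures.append(f"too long: {len(text)} > {max_len} characters")
    return failures

rules = {
    "required_elements": ["9 AM to 8 PM"],
    "forbidden_phrases": ["medical advice"],
    "regex_must_match": [r"\b\d{1,2} (AM|PM)\b"],
    "max_length": 280,
}

print(verify_sample("Our visiting hours are 9 AM to 8 PM daily.", rules))    # []
print(verify_sample("Come by any time; ask us for medical advice.", rules))  # three failures
```

A sample that returns an empty list passes; anything else is sent back for regeneration with the failure messages as feedback.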

Context Import
You can upload a document (PDF, TXT, JSON) or paste text, and the platform will analyze it to auto-fill your domain, rules, knowledge base, and scenarios. This saves significant setup time.

For even higher quality, attach verification models (cops) to your generation run. See the Verification Models section below.

CSV to Q&A

Upload a CSV file and the platform converts each row into conversational question-and-answer pairs. This is ideal for turning structured data (product catalogs, FAQs, knowledge bases) into training data.

Configure the number of Q&A pairs per row and provide optional system context to guide the conversion style.
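Conceptually, the conversion works like this. The sketch below is only an illustration: the question template and the special handling of a `name` column are assumptions, not the platform's algorithm:

```python
import csv
import io
import json

def rows_to_qa(csv_text: str, pairs_per_row: int = 1) -> list:
    """Turn each CSV row into simple Q&A chat examples, one per column value."""
    examples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.get("name", "this item")
        # Skip the name column itself; cap pairs at pairs_per_row.
        fields = [(k, v) for k, v in row.items() if k != "name" and v]
        for field, value in fields[:pairs_per_row]:
            examples.append({"messages": [
                {"role": "user", "content": f"What is the {field} of {name}?"},
                {"role": "assistant", "content": f"The {field} of {name} is {value}."},
            ]})
    return examples

catalog = "name,price,color\nWidget,$9.99,blue\n"
for example in rows_to_qa(catalog, pairs_per_row=2):
    print(json.dumps(example))
```

On the platform, a frontier model writes the questions and answers, so the output is far more varied than this template; the row-to-conversation mapping is the same idea.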

Data Augmentation

Expand an existing dataset by applying transformations. Select a source dataset and choose from six augmentation types:

  • Paraphrase — Rewrite examples with different wording
  • Translate — Convert to other languages
  • Rephrase Formal — Make language more professional
  • Rephrase Casual — Make language more conversational
  • Add Variations — Create alternative versions
  • Expand — Add detail and depth to examples

Set a multiplier (e.g., 2x) to control how many new samples are generated per original example.

Training Models

ANRAK supports five training methods. All use LoRA (Low-Rank Adaptation) for efficient fine-tuning — you get a custom model without the cost of full-parameter training.

Training Types

  • SFT — Trains on input-output pairs with cross-entropy loss. Best for instruction tuning, format learning, and domain adaptation.
  • DPO — Learns from preference pairs (chosen vs. rejected responses). Best for aligning with quality preferences and safety training.
  • RL — Optimizes against a reward function using policy gradients. Best for math, code, verifiable tasks, and custom business metrics.
  • RLHF — Full pipeline: SFT, then reward model training, then RL refinement. Best for subjective quality (helpfulness, tone, style).
  • Distillation — Generates data from a teacher model, then trains a smaller student. Best for knowledge transfer and eliminating expensive system prompts.
Tip
Start with SFT for most use cases. It's the simplest, fastest, and works well with as few as 100 high-quality examples. Move to DPO or RL once you have preference data or a reward signal.

Choosing Base Models

Select a base model to fine-tune. Supported families:

  • Llama 3.1 / 3.2 / 3.3 — 1B, 3B, 8B, 70B
  • Qwen3 — 4B, 8B, 30B, 32B, 235B (MoE)
  • DeepSeek V3.1 — MoE architecture
  • GPT OSS — 20B, 120B (MoE)
  • Kimi K2 — MoE, reasoning-optimized
Note
Smaller models (1B-8B) train faster and are cheaper. Larger models (30B+) have more capacity but require more data and compute. For verification models (cops), we recommend 1B-3B.

Hyperparameters

Key settings you can configure for each training run:

  • Epochs — Number of full passes through the dataset (1-10)
  • Batch Size — Samples per training step (1-32)
  • Learning Rate — Step size for weight updates (typical: 2e-5)
  • LoRA Rank — Dimensionality of the adapter (higher = more capacity, more compute)
  • Loss Function — Cross Entropy (default), PPO, CISPO, DRO, or Importance Sampling

The platform provides sensible defaults. Adjust only if you have specific requirements.
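As a sketch, these settings map onto the config object of a training request. The epochs, batch_size, and learning_rate fields match the API example later on this page; lora_rank and loss_function are illustrative names, not confirmed field names:

```json
{
  "training_type": "sft",
  "config": {
    "epochs": 3,
    "batch_size": 8,
    "learning_rate": "2e-5",
    "lora_rank": 16,
    "loss_function": "cross_entropy"
  }
}
```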

Monitoring Training

Track your training job in real time:

  • Live loss curve and step progress
  • Training logs with warnings and errors
  • Automatic checkpoints saved during training
  • Inline evaluation benchmarks (IFEval, MMLU, GSM8K, HellaSwag, ARC)
  • Training duration and cost tracking

Deploying Models

Deploy your fine-tuned model as a production API with one click. The endpoint is OpenAI-compatible, so existing client code works by just changing the base URL.

Deploy a Model

Navigate to the Deploy page and click "Deploy Model." Select your trained model from the dropdown. The platform provisions a serverless GPU endpoint that scales automatically.

Note
Models not eligible for direct deployment can be downloaded as LoRA adapters or GGUF files and hosted on your own infrastructure.

API Keys

Create API keys for each deployment. Keys are shown only once at creation — store them securely. You can create multiple keys per deployment and revoke them individually.

Inference Endpoint

Send requests to your deployed model using standard HTTP:

cURL
curl -X POST "https://api.anrak.ai/api/v1/deployments/inference" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Your prompt here..."}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
Python
import requests

response = requests.post(
    "https://api.anrak.ai/api/v1/deployments/inference",
    headers={
        "X-API-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "messages": [
            {"role": "user", "content": "Your prompt here..."}
        ],
        "max_tokens": 1024,
        "temperature": 0.7
    }
)

print(response.json()["content"])

The inference response returns the generated text in its content field, as shown above. For full OpenAI compatibility, use the /v1/chat/completions endpoint with the OpenAI Python SDK by setting base_url="https://api.anrak.ai/v1".

Verification Models

Verification models — called "cops" — are ANRAK's approach to neurosymbolic AI. They are small, specialized models that verify your primary model's outputs in real time, catching hallucinations, rule violations, and inconsistencies that regex rules miss.

What Are Cops?

A cop is a small fine-tuned model (typically 500M to 7B parameters) trained for a single verification task. Think of it as a quality control inspector that checks every response your primary model generates.

Why small models? Because they're:

  • Fast — 50-150ms per check, running in parallel
  • Reliable — Trained on one narrow task, they're consistent where large models aren't
  • Semantic — Unlike regex rules, cops understand meaning: "9 AM to 8 PM" and "9:00 AM to 8:00 PM" are the same fact
  • Independent — They can't be tricked by the primary model because they're separate systems
How it works
The primary model generates a response, the cop squad verifies it in parallel, and the verified response is served to the user.

Training a Cop

Creating a cop follows the same training flow as any other model, with three additional steps:

1. Prepare verification training data

Create a dataset where each example is a response paired with a judgment. For a grounding cop, examples would include responses labeled as "grounded" (factually supported by context) or "hallucinated" (contains unsupported claims).

JSONL
{"messages": [
  {"role": "system", "content": "You check if responses are grounded in the provided context. Output JSON."},
  {"role": "user", "content": "Context: Visiting hours are 9 AM to 8 PM.\nResponse: Our visiting hours are from nine to eight."},
  {"role": "assistant", "content": "{\"pass\": true, \"reason\": \"Hours match the context\"}"}
]}
{"messages": [
  {"role": "system", "content": "You check if responses are grounded in the provided context. Output JSON."},
  {"role": "user", "content": "Context: Visiting hours are 9 AM to 8 PM.\nResponse: We offer free valet parking."},
  {"role": "assistant", "content": "{\"pass\": false, \"reason\": \"Parking info not in context\"}"}
]}
2. Train a small model

Use SFT with a small base model (Llama 3.2 1B or 3B recommended). Cops don't need to be large — their power comes from specialization, not size. Training is fast and inexpensive.

3. Set the model role to "cop"

After training, go to your model's detail page, open the COPS tab, and set the role to "Cop" with the appropriate cop type (e.g., Grounding). This marks the model as a verification agent.

Tip
You can generate cop training data by taking known-good responses, systematically corrupting them (changing facts, adding hallucinations), and labeling both versions. Start with 500-1000 examples.
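The corruption recipe in this tip can be sketched as a small helper that emits one pass/fail pair per known-good response. The helper and its fact-swap strategy are illustrative, not an ANRAK API:

```python
import json

SYSTEM = "You check if responses are grounded in the provided context. Output JSON."

def cop_pair(context: str, good_response: str, fact: str, wrong_fact: str) -> list:
    """From one known-good response, build a grounded (pass) example and a
    corrupted (fail) example by swapping a known fact for a wrong one."""
    def example(response, ok, reason):
        return {"messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context: {context}\nResponse: {response}"},
            {"role": "assistant", "content": json.dumps({"pass": ok, "reason": reason})},
        ]}
    bad_response = good_response.replace(fact, wrong_fact)
    return [
        example(good_response, True, "matches the context"),
        example(bad_response, False, f"states {wrong_fact!r}, context says {fact!r}"),
    ]

pairs = cop_pair(
    context="Visiting hours are 9 AM to 8 PM.",
    good_response="We're open to visitors from 9 AM to 8 PM.",
    fact="9 AM to 8 PM",
    wrong_fact="7 AM to 11 PM",
)
for example in pairs:
    print(json.dumps(example))
```

Run over a few hundred known-good responses with varied fact swaps, this yields a balanced dataset in the same JSONL shape shown in step 1.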

Attaching Cops

Once you have a trained cop, attach it to any primary model:

  1. Go to your primary model's detail page
  2. Open the COPS tab
  3. Click Attach Cop
  4. Select the cop model from the dropdown
  5. Choose when it runs (generation, inference, or both)
  6. Set the severity level (critical, error, or warning)

You can attach multiple cops to the same model. They run in parallel, so adding more cops doesn't significantly increase latency. Each cop can be toggled on/off individually.

Cop Types

ANRAK supports seven cop types, each specialized for a different verification task:

Grounding

Ensures every factual claim is traceable to the provided context or knowledge base. Catches hallucinated facts.

Domain Constraint

Validates responses stay within domain boundaries. Catches medical advice from a receptionist, legal opinions from a chatbot, etc.

Consistency

Checks for contradictions with prior conversation turns or the system prompt. Catches "We close at 5" followed by "Open until 8."

Reasoning

Validates that logical reasoning chains support the conclusion. Catches flawed step-by-step reasoning.

Tool Use

Verifies the model actually called the tools it claims and that responses match tool outputs.

Instruction

Checks that responses follow all system prompt constraints: formatting, tone, length limits, behavioral rules.

Custom

User-defined verification logic for domain-specific checks not covered by the built-in types.

Generation vs. Inference

Cops can run at two points in the lifecycle:

At Generation Time

When generating training datasets, cops verify each sample before it's included. Failed samples are regenerated with the cop's feedback. This ensures your training data is clean before it ever enters the fine-tuning pipeline.

At Inference Time

In production, cops check every response before it reaches the end user. If a cop rejects a response, the model automatically regenerates with the cop's critique. If it still fails after retries, a safe fallback response is returned.

Both

The recommended setting. The same cop guards both training data quality and production behavior, providing consistent verification across the entire lifecycle.

Severity Levels

Each cop attachment has a severity that controls what happens when it flags an issue:

  • Critical — At generation: sample is rejected permanently. At inference: response is blocked and a safe fallback is returned.
  • Error — At generation: sample is regenerated with feedback. At inference: response is regenerated with the cop's critique (up to 2 retries).
  • Warning — At generation: sample is included and the issue is logged. At inference: response is served and the issue is logged for review.
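The inference-time behavior can be pictured as a retry loop. This is a client-side sketch with stubbed generate and cop functions, not the platform's implementation:

```python
FALLBACK = "I'm sorry, I can't help with that right now."

def guarded_generate(generate, cops, max_retries=2):
    """Sketch of the inference-time flow: every cop checks each draft; a failed
    'critical' cop blocks the response outright, failed 'error' cops trigger
    regeneration with their critique (up to max_retries), and 'warning'
    failures would be logged but still served."""
    critique = None
    for _ in range(max_retries + 1):
        draft = generate(critique)
        results = [(cop["severity"], cop["check"](draft)) for cop in cops]
        failed = [(sev, r) for sev, r in results if not r["pass"]]
        if any(sev == "critical" for sev, _ in failed):
            return FALLBACK                      # blocked, safe fallback
        errors = [r for sev, r in failed if sev == "error"]
        if not errors:
            return draft                         # clean, or warnings only
        critique = "; ".join(r["reason"] for r in errors)
    return FALLBACK                              # still failing after retries

# Stub primary model: produces a grounded answer once it sees a critique.
def generate(critique):
    return "Visiting hours are 9 AM to 8 PM." if critique else "We offer free valet parking."

grounding_cop = {
    "severity": "error",
    "check": lambda text: {"pass": "9 AM to 8 PM" in text,
                           "reason": "claim not supported by context"},
}

print(guarded_generate(generate, [grounding_cop]))
```

Here the first draft hallucinates a parking perk, the grounding cop rejects it, and the regenerated draft passes on the second attempt.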

Running Models Locally

Download your fine-tuned model as a GGUF file and run it locally with Ollama. No API calls, no latency, complete privacy.

GGUF Export

GGUF is the standard format for running models locally. ANRAK automatically converts your fine-tuned model and offers three quantization levels:

  • Q4_K_M — Good: best balance of size and quality (~6 GB RAM)
  • Q5_K_M — Better: higher quality, moderate size (~8 GB RAM)
  • Q8_0 — Best: near-original quality, largest file (~12 GB RAM)

Go to your model's Run Locally tab to export and download.

Ollama Setup

Three steps to run your model locally:

Terminal
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Create a Modelfile pointing at your downloaded GGUF, then build the model
echo "FROM ./your-model.gguf" > Modelfile
ollama create my-model -f Modelfile

# 3. Run it
ollama run my-model

Once running, Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1/chat/completions — any OpenAI client library works out of the box.
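For example, you can hit the local endpoint with nothing but the standard library. In this sketch, the model name my-model matches the ollama create step above, and the response parsing assumes the standard OpenAI chat completions shape:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint on the default local port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

payload = {
    "model": "my-model",  # the name you gave `ollama create`
    "messages": [{"role": "user", "content": "When are your visiting hours?"}],
    "temperature": 0.7,
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    # Standard OpenAI chat completions response shape.
    print(body["choices"][0]["message"]["content"])
except OSError:
    print("Could not reach Ollama; make sure `ollama run my-model` is active.")
```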

MCP Server

ANRAK provides a Model Context Protocol (MCP) server that lets AI coding assistants manage your datasets, training jobs, and models directly from your IDE.

Overview

With MCP, you can ask your AI assistant to "create a dataset," "start a training job," or "deploy my model" — and it calls the ANRAK API on your behalf. No switching between the dashboard and your editor.

Setup

Generate a platform API key from Settings > API Keys in the ANRAK dashboard. Then add the MCP server to your AI client:

Claude Code / Codex
{
  "mcpServers": {
    "anrak-ai": {
      "url": "https://anrak.ai/mcp",
      "headers": {
        "Authorization": "Bearer anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
Cursor / Windsurf
{
  "mcpServers": {
    "anrak-ai": {
      "serverUrl": "https://anrak.ai/mcp",
      "headers": {
        "Authorization": "Bearer anrak_pk_YOUR_API_KEY"
      }
    }
  }
}
Claude Desktop
{
  "mcpServers": {
    "anrak-ai": {
      "command": "npx",
      "args": ["-y", "@anrak/mcp-server"],
      "env": {
        "ANRAK_API_KEY": "anrak_pk_YOUR_API_KEY"
      }
    }
  }
}

Available Tools

The MCP server provides 50+ tools organized into categories:

  • Datasets (11 tools) — Create, upload, generate, augment, preview, delete
  • Training (10 tools) — Create jobs, monitor, pause, resume, get checkpoints
  • Models (15 tools) — List, deploy, download, publish, manage cops
  • Deployments (5 tools) — Manage endpoints, API keys, usage stats
  • Inference (2 tools) — Chat with any model, list available models
  • Evaluations (4 tools) — Create benchmarks, view results
  • Usage (5 tools) — Credits, costs, token usage, pricing

API Reference

All platform functionality is available through the REST API, authenticated with either a platform API key or a deployment-specific API key.

Authentication

Two authentication methods:

Platform API Key

For managing resources (datasets, training, models). Prefix: anrak_pk_. Pass as Authorization: Bearer anrak_pk_...

Deployment API Key

For inference on deployed models. Prefix: anrak_sk_. Pass as X-API-Key: anrak_sk_...

Endpoints

Base URL: https://api.anrak.ai

  • /api/v1/datasets — Create, list, upload, generate datasets
  • /api/v1/training — Start, monitor, and manage training jobs
  • /api/v1/models — Manage models, cops, GGUF exports
  • /api/v1/deployments — Deploy models, manage API keys
  • /v1/chat/completions — OpenAI-compatible inference endpoint
  • /api/v1/evaluations — Run and view model evaluations

Code Examples

Start a Training Job

Python
import requests

response = requests.post(
    "https://api.anrak.ai/api/v1/training",
    headers={
        "Authorization": "Bearer anrak_pk_YOUR_KEY",
        "Content-Type": "application/json"
    },
    json={
        "name": "my-customer-service-model",
        "base_model": "meta-llama/Llama-3.2-3B",
        "dataset_id": "YOUR_DATASET_ID",
        "training_type": "sft",
        "config": {
            "epochs": 3,
            "learning_rate": "2e-5",
            "batch_size": 8
        }
    }
)

print(response.json())

Chat with a Deployed Model

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.anrak.ai/v1",
    api_key="anrak_sk_YOUR_KEY"  # deployment key (anrak_sk_) for inference
)

response = client.chat.completions.create(
    model="my-customer-service-model",
    messages=[
        {"role": "user", "content": "When are your visiting hours?"}
    ]
)

print(response.choices[0].message.content)

Ready to get started?

Create your account and start training custom AI models in minutes.