SLM Reasoning Layers
A new architecture for trustworthy AI — small language models as deterministic verification agents for large model systems.
Kapil Chandwani, ANRAK AI · March 2026
Large Language Models hallucinate. They fabricate facts, ignore constraints, and confidently present fiction as truth. Current approaches — prompt engineering, RLHF, constitutional AI, RAG — all ask the model to police itself. This is fundamentally flawed.
We propose Small Language Models (500M–7B parameters) deployed as independent, specialized verification layers that monitor larger models in real time. These SLMs are trained on narrow tasks with high reliability, incapable of the creative deception that makes large models untrustworthy, and fast enough to operate at inference time.
This is a new class of neurosymbolic AI where the “symbolic” layer is itself a neural network — but one small enough and deterministic enough to function as a reliable reasoning primitive.
LLMs lie in five predictable ways
These are not edge cases. They are systematic, emergent properties of how large neural networks process and generate language.
Tool Use Fabrication
Claims to have used a tool or searched a database when it hasn't
Source Attribution
"Based on the document..." — then generates content not in the source
Retroactive Reasoning
Generates the answer first, then constructs reasoning to justify it
Confident Uncertainty
Presents uncertain info with the same conviction as known facts
Constraint Performance
Performs compliance rather than achieving it — finds creative workarounds
The SLM Cop Framework
Multiple small models run in parallel between your primary model and the end user. Each checks one property. A deterministic verdict engine aggregates their outputs.
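The aggregation step can be sketched in a few lines. This is a minimal illustration of what a deterministic verdict engine might look like; the `CopVerdict` type, its fields, and the three-way decision (`pass` / `regenerate` / `fallback`) are our assumptions, not a fixed API.

```python
from dataclasses import dataclass

@dataclass
class CopVerdict:
    cop: str          # which cop produced this judgment
    passed: bool      # did the response pass the check?
    critical: bool    # does a failure here require a safe fallback?
    reason: str = ""  # critique fed back to the primary model

def aggregate(verdicts):
    """Deterministic verdict engine: the same judgments always yield the same decision."""
    failures = [v for v in verdicts if not v.passed]
    if any(v.critical for v in failures):
        return "fallback", failures      # critical failure: return a safe fallback
    if failures:
        return "regenerate", failures    # soft failure: retry with the critiques
    return "pass", []                    # all cops approve: ship the response

# Example: one non-critical failure triggers regeneration
decision, fails = aggregate([
    CopVerdict("grounding", passed=True, critical=True),
    CopVerdict("consistency", passed=False, critical=False,
               reason="contradicts an earlier turn"),
])
# decision == "regenerate"
```

Because `aggregate` is ordinary branching code rather than another model, the final accept/reject decision is auditable and reproducible.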
Less capability, more reliability
Small models are better judges because they lack the capacity for creative deception. They can't construct elaborate lies — they can only check the one thing they were trained on.
Can't lie convincingly
A 500M model lacks the representational capacity to construct multi-layered, contextually appropriate deceptions. It checks a property and reports.
No sycophancy training
Never trained on conversations or human preferences. Has no concept of "pleasing the user." Trained only on (input, judgment) pairs.
Resistant to prompt injection
Doesn't process input as "instructions" — processes it as features for classification. No instruction-following pathway to exploit.
Deterministic by specialization
All 500M parameters dedicated to one task. Converges to near-deterministic behavior — the same input produces the same output.
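The "(input, judgment) pairs" mentioned above might look like the following. The field names are illustrative, not a fixed schema; the point is that a cop never sees conversational preference data, only a context, a response, and a verdict.

```python
# Hypothetical (input, judgment) pairs for training a grounding cop.
# No chat transcripts, no human-preference labels: just context,
# response, and a binary judgment with a reason.
training_pairs = [
    {
        "input": {
            "context": "Store hours: Mon-Fri 9am-5pm.",
            "response": "We're open until 8pm on weekdays.",
        },
        "judgment": {"pass": False, "reason": "Hours contradict the provided context."},
    },
    {
        "input": {
            "context": "Store hours: Mon-Fri 9am-5pm.",
            "response": "We close at 5pm on weekdays.",
        },
        "judgment": {"pass": True, "reason": ""},
    },
]
```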
The Honesty Spectrum
Seven specialized verification agents
Each cop answers exactly one question. This specificity is what makes them reliable.
Grounding Cop
1B–3B · Is every claim traceable to the provided context? Catches hallucinated facts, wrong attributions, subtle distortions.
Domain Constraint Cop
1B–3B · Does the response violate domain rules? Catches medical advice from receptionists, legal opinions from chatbots.
Consistency Cop
1B–3B · Does this contradict anything said before? Catches 'We close at 5' followed by 'Open until 8.'
Reasoning Cop
3B–7B · Does the chain-of-thought support the conclusion? Catches logical jumps and circular reasoning.
Tool Use Cop
500M–1B · Did the model actually call the tools it claims? Does the response match tool outputs?
Instruction Cop
500M–1B · Does the response follow all system prompt constraints? Formatting, tone, length, behavior.
Custom Cop
Any size · Your own verification logic for domain-specific checks not covered by built-in types.
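Whatever its specialty, each cop can report in the same minimal shape, which keeps the surrounding orchestration cop-agnostic. The exact fields below are a sketch: `pass` and `reason` match how the judgment is used later in this document, while `confidence` is an illustrative optional extra.

```python
import json

# A raw judgment as a cop endpoint might return it (illustrative example).
raw = '{"pass": false, "reason": "Claim about pricing not found in context.", "confidence": 0.97}'

def parse_judgment(raw_json: str) -> dict:
    """Validate the minimal contract every cop must honor: a boolean
    verdict plus a textual reason usable as regeneration feedback."""
    j = json.loads(raw_json)
    if not isinstance(j.get("pass"), bool) or not isinstance(j.get("reason"), str):
        raise ValueError("cop judgment missing required fields")
    return j

judgment = parse_judgment(raw)
# judgment["pass"] is False, so the orchestrator would regenerate
```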
Guards training data and production responses
At Generation Time
Cops verify each sample during dataset creation. Failed samples are regenerated with the cop's feedback — so your training data is clean before it enters the pipeline.
At Inference Time
In production, cops check every response before it reaches the user. If rejected, the model regenerates with the cop's critique. Critical failures return a safe fallback.
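The generation-time loop above can be sketched as follows. `call_generator` and `call_cop` are hypothetical stand-ins for your LLM and cop endpoints; the retry cap is an assumption.

```python
# Sketch of cop-gated dataset creation: every sample must pass a cop
# before it enters the training set; failures are regenerated with
# the cop's critique as feedback.
def build_clean_dataset(prompts, call_generator, call_cop, max_retries=3):
    dataset = []
    for prompt in prompts:
        feedback = ""
        for _ in range(max_retries):
            sample = call_generator(prompt, feedback)
            judgment = call_cop(sample)
            if judgment["pass"]:
                dataset.append(sample)     # clean sample enters the pipeline
                break
            feedback = judgment["reason"]  # regenerate with the cop's critique
    return dataset

# Example with stubs: the first attempt fails, the retry passes
def stub_generator(prompt, feedback):
    return prompt + (" (revised)" if feedback else " (draft)")

def stub_cop(sample):
    return {"pass": "revised" in sample, "reason": "needs revision"}

clean = build_clean_dataset(["Explain our refund policy."], stub_generator, stub_cop)
# clean == ["Explain our refund policy. (revised)"]
```

The inference-time loop is the same shape, with the user-facing response in place of the training sample and a safe fallback when retries are exhausted.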
How SLM Cops compare
How cops connect to any LLM
The cop system is a verification loop — not a modification to the LLM itself. It works with any model, any API, any framework.
Call LLM
Claude, GPT, or your fine-tuned model generates a response
Cops verify
Small models check the response in parallel (~100ms total)
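Running the cops concurrently is what keeps total added latency close to the slowest single cop rather than the sum of all of them. A minimal sketch using a thread pool, with dummy check functions standing in for real cop API calls:

```python
from concurrent.futures import ThreadPoolExecutor

def verify_in_parallel(response, context, check_fns):
    """Fan out one response to every cop at once and collect all judgments."""
    with ThreadPoolExecutor(max_workers=len(check_fns)) as pool:
        futures = [pool.submit(fn, response, context) for fn in check_fns]
        return [f.result() for f in futures]

# Dummy cops for illustration; in practice each fn wraps a cop-model API call
cops = [
    lambda r, c: {"cop": "grounding", "pass": c in r},
    lambda r, c: {"cop": "instruction", "pass": len(r) < 200},
]
judgments = verify_in_parallel("Hours: 9-5.", "9-5", cops)
all_pass = all(j["pass"] for j in judgments)
```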
The cop is just another LLM call
A cop model is a small language model (1B–3B parameters) running on any inference server — Ollama locally, a GPU server, or a cloud endpoint. You send it the primary model's response plus context, and it returns a JSON judgment. That's it.
response = call_primary_llm(messages)  # Claude, GPT, your model
for cop in cop_squad:
    judgment = call_cop_model(cop, response, context)
    if not judgment["pass"]:
        messages += [
            {"role": "assistant", "content": response},
            {"role": "user", "content": f"Rejected: {judgment['reason']}. Regenerate."},
        ]
        response = call_primary_llm(messages)  # retry with feedback
return response

Where cops run
Build it with any tool
The orchestration is simple. The value is in the trained cop models — small models fine-tuned to reliably detect hallucinations, rule violations, and inconsistencies in your specific domain.
n8n / Make
Webhook → HTTP node (LLM) → HTTP nodes (cops in parallel) → IF node (verdict) → loop on failure. Visual, no code.
LangChain / CrewAI
Primary agent generates, cop agents verify. Orchestrator manages the feedback loop. Each cop is a tool or agent.
Raw Python / cURL
40 lines of code. Call LLM API, call cop API, check JSON, loop. Works in any language, any framework.
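For the raw-Python route, a single cop call is just an HTTP POST. The sketch below assumes an Ollama-style `/api/generate` endpoint; the URL, prompt template, and payload fields are assumptions you would adapt to your own inference server.

```python
import json
import urllib.request

COP_URL = "http://localhost:11434/api/generate"  # assumed Ollama-style endpoint

def build_cop_payload(cop_model, response, context):
    """Package the primary model's response plus context for a cop check.
    The prompt template is illustrative, not a fixed format."""
    prompt = (
        f"Context:\n{context}\n\nResponse:\n{response}\n\n"
        'Return JSON: {"pass": true|false, "reason": "..."}'
    )
    return {"model": cop_model, "prompt": prompt, "stream": False}

def call_cop_model(cop_model, response, context):
    req = urllib.request.Request(
        COP_URL,
        data=json.dumps(build_cop_payload(cop_model, response, context)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        body = json.loads(r.read())
    return json.loads(body["response"])  # the cop's JSON judgment
```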
“A genius who sometimes lies needs a simple, honest cop. Not a smarter genius.”
The path to trustworthy AI runs through building smaller, less capable models and ensuring they behave — then using them to police the large ones.
Build trustworthy AI today
Train your own verification models on ANRAK AI. Your domain expertise becomes an executable guardrail.