Enterprise AI has a token problem. Most organizations are using large frontier models for work that does not require frontier-scale reasoning: document parsing, routing, retrieval, guardrails, classification, and structured extraction. The result is simple: every workflow costs more than it should.
Bolt was built to change that. The Bolt model family is a set of open-weight models designed for enterprise AI systems that need strong task performance without the cost, latency, and infrastructure burden of using the largest model for every request.
Instead of sending every task to a high-cost general model, Bolt helps enterprises use the right model for the right job: smaller, specialized models for routine work, larger models only when the task requires them.
That matters because production AI cost is driven by repeated, high-volume tasks. A single invoice, policy check, routing decision, or retrieval call may look cheap on its own. But at enterprise scale, millions of requests turn token waste into a serious operating cost.
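As a rough illustration of how volume drives spend, the sketch below compares hypothetical monthly token costs for routine work; the prices and token counts are assumptions for illustration, not actual Bolt or frontier-model pricing.

```python
# Illustrative only: prices and volumes below are assumptions,
# not actual model pricing.
def monthly_cost(requests_per_month, tokens_per_request, price_per_million_tokens):
    """Total monthly token spend in dollars."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens

# 5 million routine requests a month, ~1,500 tokens each
frontier = monthly_cost(5_000_000, 1_500, 10.00)  # hypothetical frontier price
small = monthly_cost(5_000_000, 1_500, 0.25)      # hypothetical small-model price

print(f"Frontier model: ${frontier:,.0f}/month")  # $75,000/month
print(f"Small model:    ${small:,.0f}/month")     # $1,875/month
```

The per-request difference is fractions of a cent; at millions of requests it is the bulk of the bill.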
Bolt is designed around a different principle: maximum accuracy per token, per dollar, and per workflow.
- Built for real workflows: Bolt models are optimized for production use cases, not just aggregate benchmark performance.
- Focused performance: Through targeted fine-tuning, Bolt models deliver strong results on domain-specific tasks while retaining general capabilities.
- Reliable in practice:
- Bolt Instruct improves structured outputs, routing, and safety detection
- Bolt Embedding retrieves relevant context more consistently in real-world RAG scenarios, even when compared against larger models
- Bolt Vision improves document understanding and structured extraction for business documents.
- Efficient by design: These models are parameter-efficient and suitable for deployment in on-premise and resource-constrained environments.
- Open and accessible: Released under the AI Squared Community License for evaluation and usage.
The takeaway: Bolt models help enterprises reduce token usage, lower AI operating costs, and run production workflows with models built for each specific task.
Introduction
Large frontier language models are powerful generalists. They can handle a wide range of tasks, and they are valuable when a workflow truly requires deep reasoning or broad language ability. But most enterprise AI work is not one-off reasoning. It is repeated, structured, high-volume work: retrieving the right context, parsing documents, detecting sensitive data, routing requests, checking policy, and turning business inputs into clean outputs. Using the largest model for every one of those tasks creates a token burden. Enterprises end up paying frontier-model prices for routine work that a smaller, specialized model can often handle faster, at lower cost, and with stronger task-specific results.
At AISquared, we call the alternative vertical intelligence: using focused models built for the workflows enterprises actually need to run in production.
Today, we’re excited to introduce the Bolt family of language models: a suite of open-weight models designed to reduce the cost and complexity of enterprise AI while delivering strong performance on targeted, high-value tasks.
Bolt models are optimized to work with the AISquared UNIFI platform across workflows such as retrieval-augmented generation (RAG), guardrails, document processing, and model routing. Together, they support a more efficient AI architecture: route routine tasks to smaller purpose-built models, reserve larger models for the hardest requests, and keep every workflow governed through UNIFI.
A key design goal for Bolt was efficiency without compromise. These models are parameter-efficient and built for deployment in resource-limited environments, including on-premise and edge systems. That makes them practical not just for testing, but for real production use where cost, latency, control, and scale all matter.
All Bolt models are released under the AI Squared Community License, enabling individuals and organizations to download, evaluate, and build on them for their own use cases. The Bolt family consists of three specialized sub-families, each targeting a core part of enterprise AI systems:
- Bolt Instruct: Instruction-following, structured outputs, and guardrails
- Bolt Embedding: High-performance vector search and enterprise RAG
- Bolt Vision: Visual document understanding and structured text extraction
In this post, we’ll explore each of these model families and examine their performance across both industry-standard benchmarks and enterprise-focused evaluations.
Bolt Instruct
The Bolt Instruct models are fine-tuned versions of the Allen Institute for AI’s OLMo models, optimized for instruction following, structured extraction, and reduced hallucination—particularly in retrieval-augmented generation (RAG) workflows. In addition to general-purpose usage, these models are designed to power several core capabilities within the AISquared UNIFI platform, including model routing and guardrails for detecting sensitive or unsafe content such as PII.
We release three variants:
- bolt-instruct-1b (based on OLMo-2-0425-1B-Instruct)
- bolt-instruct-7b (based on OLMo-3-7B-Instruct)
- bolt-instruct-32b (based on OLMo-3.1-32B-Instruct)
Training Approach
Bolt Instruct models were trained using supervised fine-tuning on a dataset of over 100,000 conversations. This dataset combines:
- AISquared’s internal “identity” dataset, enabling the model to respond as “Bolt, a language model trained by AI Squared”
- Open-source conversational data with commercially permissive licenses, including both human-curated and synthetic examples
- Internally developed datasets designed to reflect real-world UNIFI use cases, such as guardrails enforcement and model routing
This mix of data was carefully curated to preserve general-purpose capabilities while improving performance on targeted, high-value tasks.
From an infrastructure perspective, all models were trained efficiently on a single A100 GPU (80GB VRAM) using Hugging Face tooling. The 1B model was fully fine-tuned, while the 7B and 32B models were trained using QLoRA at 4-bit precision to reduce memory and compute requirements.
Model Characteristics
- Context Length
- 1B: 4,096 tokens
- 7B and 32B: 65,536 tokens
- Key Capabilities
- Strong instruction following
- Reliable structured (e.g., JSON) outputs
- Reduced hallucination in RAG workflows
- Guardrail enforcement (PII detection, unsafe content, jailbreak attempts)
- Lightweight model routing
These characteristics make Bolt Instruct well-suited for orchestrating complex enterprise workflows where reliability, structure, and efficiency are critical. In many enterprise systems, the routing layer is where token savings begin. A router decides whether a request needs a large model, a fine-tuned model, a tool, or a structured workflow. Bolt Instruct is designed to make those decisions reliably, helping reduce unnecessary calls to expensive models while keeping quality high.
Evaluations
To evaluate Bolt Instruct, we used a combination of industry-standard benchmarks and targeted internal evaluations designed to reflect real-world UNIFI use cases. For general-purpose performance, we leveraged the EleutherAI Language Model Evaluation Harness (lm-eval). For task-specific performance, we developed internal benchmarks focused on guardrails and model routing.
General-Purpose Performance
To assess whether fine-tuning impacted general capabilities, we evaluated all Bolt Instruct models on a subset of lm-eval tasks and compared them to their respective base models.
While some degradation is expected when optimizing for specialized tasks, the results show that Bolt Instruct models retain strong general performance overall. In several cases, fine-tuning led to measurable improvements. Most notably, bolt-instruct-32b outperformed its base model on tasks such as ARC-Easy, BBH, GSM8K, and TruthfulQA (MC2)—benchmarks that align closely with reasoning and structured problem-solving.

These results indicate that Bolt Instruct models remain capable generalists, while also being better aligned for targeted, high-value tasks.
Guardrails and Safety Evaluation
We next evaluated the models on internal benchmarks focused on guardrails and safety. These tests measure the model’s ability to:
- Detect personally identifiable information (PII)
- Identify unsafe or disallowed requests (e.g., NSFW or harmful content)
- Recognize jailbreak attempts
For each evaluation, models were prompted with a variety of inputs and asked to return structured JSON outputs indicating the presence or absence of these attributes.
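As a sketch of what that output contract looks like in practice, the snippet below validates that a model response is well-formed JSON with a boolean flag per attribute. The field names (`pii`, `unsafe`, `jailbreak`) are illustrative assumptions, not Bolt's actual schema.

```python
import json

# Illustrative flag names; not the actual Bolt guardrail schema.
EXPECTED_FLAGS = {"pii", "unsafe", "jailbreak"}

def parse_guardrail_response(raw: str) -> dict:
    """Validate that a model response is JSON carrying a boolean
    flag for each guardrail attribute."""
    result = json.loads(raw)
    missing = EXPECTED_FLAGS - result.keys()
    if missing:
        raise ValueError(f"missing guardrail flags: {sorted(missing)}")
    for flag in EXPECTED_FLAGS:
        if not isinstance(result[flag], bool):
            raise ValueError(f"flag {flag!r} must be a boolean")
    return result

# A well-formed guardrail response passes through unchanged
response = '{"pii": true, "unsafe": false, "jailbreak": false}'
print(parse_guardrail_response(response))
```

Enforcing a fixed schema like this is what lets a guardrail model sit inline in a pipeline: downstream code branches on the flags rather than parsing free text.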
We compared Bolt Instruct models against previously deployed production models, including GPT OSS 20B and GPT OSS 120B. Results (shown below) demonstrate that Bolt Instruct models achieve strong performance on these tasks, making them well-suited for enforcing guardrails in production systems.

Model Routing Evaluation
Finally, we evaluated Bolt Instruct models on model routing tasks—an important capability for orchestrating multi-model systems.
For this benchmark, we generated synthetic conversations of varying complexity and defined multiple possible downstream routes (e.g., small general models, large frontier models, or specialized tool-augmented systems). The model’s task was to select the most appropriate route for each input.
We report two evaluation metrics:
- Strict Accuracy: The model selects the single best possible downstream system
- Lenient Accuracy: The model selects any acceptable system capable of completing the task
For example, a simple prompt like “Tell me a quick joke about cats” is best handled by a small, cost-efficient model, but could also be answered by a larger model. However, routing this request to a specialized system (e.g., a text-to-SQL model) would be incorrect.
This distinction allows us to evaluate both optimal decision-making (strict) and practical system performance (lenient).
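The two metrics can be sketched as follows; the route names and the gold-label format are illustrative assumptions about the benchmark, not its actual structure.

```python
def routing_accuracy(predictions, gold):
    """Compute strict and lenient routing accuracy.

    gold maps each query id to (best_route, acceptable_routes);
    this structure is an assumption for illustration."""
    strict = lenient = 0
    for qid, route in predictions.items():
        best, acceptable = gold[qid]
        strict += route == best
        lenient += route in acceptable
    n = len(predictions)
    return strict / n, lenient / n

gold = {
    "q1": ("small-general", {"small-general", "large-frontier"}),
    "q2": ("text-to-sql", {"text-to-sql"}),
}
preds = {"q1": "large-frontier", "q2": "text-to-sql"}
print(routing_accuracy(preds, gold))  # (0.5, 1.0): q1 acceptable but not optimal
```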
Results show that Bolt Instruct models perform strongly across both metrics, indicating their effectiveness as reliable routers within complex AI systems.

Bolt Embedding
Retrieval quality also affects token cost. Poor retrieval sends too much context to the model, adds noise to the prompt, and can force repeat calls when the answer is weak. Bolt Embedding is designed to retrieve the right context more consistently, helping reduce wasted tokens before generation even begins.
The Bolt Embedding models are fine-tuned versions of IBM’s Granite Embedding models (R2), which are based on the ModernBERT architecture. These models have been further optimized for enterprise retrieval-augmented generation (RAG) use cases, while maintaining strong general-purpose embedding performance.
We release two variants:
- Bolt Embedding Small (45M parameters)
- Bolt Embedding Large (100M parameters)
Both models are designed for efficient, large-scale vector search and are well-suited for embedding enterprise documents during ingestion into an AISquared knowledge base.
Training Approach
To train these models, we assembled a large-scale dataset of approximately 7.5 million triplets, each consisting of:
- An anchor query
- A positively associated text chunk
- A hard negative sample
While many source datasets only included positive pairs, we augmented them by generating highly similar negative samples using generative models. This significantly increases task difficulty and improves retrieval quality in practice.
Training was performed using cached multiple negatives ranking loss, allowing for an effective batch size of 1,024 samples. This setup ensures that each training batch closely approximates a realistic, large-scale RAG retrieval scenario.
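A simplified sketch of the underlying in-batch loss is below (plain NumPy, without the gradient caching that makes the effective batch size of 1,024 feasible): each anchor's matching positive sits on the diagonal of the similarity matrix, and every other positive in the batch serves as a negative.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """In-batch multiple-negatives ranking loss over L2-normalized
    (batch, dim) embedding matrices. Cross-entropy is taken against
    the diagonal, so each anchor must rank its own positive above
    all other positives in the batch."""
    scores = scale * anchors @ positives.T               # (batch, batch) similarities
    # Row-wise log-softmax, then negative log-likelihood of the diagonal
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Perfectly matched pairs score a much lower loss than shuffled pairs
anchors = np.eye(4)
print(multiple_negatives_ranking_loss(anchors, anchors))
print(multiple_negatives_ranking_loss(anchors, np.roll(anchors, 1, axis=0)))
```

In practice, libraries such as Sentence Transformers provide a cached variant of this loss that accumulates the similarity matrix in chunks, which is what makes the large effective batch size practical on limited hardware.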
Evaluations
We evaluated Bolt Embedding models across multiple settings to measure both general-purpose embedding quality and real-world retrieval performance.
MTEB Benchmark Evaluation
We first evaluated the models on a subset of tasks from the Massive Text Embedding Benchmark (MTEB) and compared them against their respective base models.
As expected, we observe modest degradation on some general-purpose benchmark tasks following fine-tuning. This reflects the tradeoff inherent in optimizing models for more specialized, domain-relevant behavior.
However, these results are consistent with our design goals: prioritizing performance in applied enterprise settings over maximizing aggregate benchmark scores.

Real-World RAG Evaluation (LLM-as-a-Judge)
To better reflect real-world performance, we conducted a second evaluation using a fully simulated enterprise RAG pipeline across multiple domains.
In this setup:
- Queries were issued against domain-specific corpora
- Each model retrieved candidate contexts
- An LLM-as-a-judge evaluated whether the correct piece of context required to answer the query was retrieved, and at what rank
We report the rank position of the correct context, where a score of 1 indicates perfect retrieval (i.e., the correct document was the top result). Lower scores are therefore better.
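A minimal sketch of this metric, with toy retrieval results standing in for the real pipeline:

```python
import statistics

def rank_of_correct(retrieved_ids, correct_id):
    """1-based rank of the correct context in a ranked retrieval list
    (1 means the correct chunk was the top result; lower is better)."""
    return retrieved_ids.index(correct_id) + 1

# Toy results: (retrieved ids in ranked order, id of the correct chunk)
results = [
    (["c7", "c2", "c9"], "c7"),   # rank 1, perfect retrieval
    (["c4", "c1", "c3"], "c1"),   # rank 2
    (["c5", "c8", "c6"], "c6"),   # rank 3
]
ranks = [rank_of_correct(ids, gold) for ids, gold in results]
print(statistics.mean(ranks), statistics.stdev(ranks))  # 2 1.0
```

The mean captures overall retrieval quality, while the standard deviation captures consistency, which is why we report both below.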
Across ~900 queries per model, all models performed well on straightforward retrieval tasks—the correct document was ranked #1 in the majority of cases. However, meaningful differences emerged on more challenging queries.
Bolt Embedding models achieved:
- Lower average retrieval rank (better overall performance)
- Lower standard deviation (more consistent performance)
In particular:
- bolt-embedding-large achieved the best overall performance, with the lowest mean retrieval rank and lowest variance
- bolt-embedding-small also outperformed its base model and remained competitive with larger alternatives
Compared to baseline models:
- Both Bolt variants outperform their corresponding Granite Embedding R2 base models
- Both models also outperform Google’s EmbeddingGemma (300M) model, despite its larger size
These results highlight an important distinction: while general-purpose benchmarks may show modest tradeoffs after fine-tuning, Bolt Embedding models deliver more reliable and accurate retrieval in real-world RAG scenarios. The improvements are most pronounced on harder queries, where consistent top-ranked retrieval is critical for downstream answer quality.

Bolt Vision
The Bolt Vision, or Bolt-VL, models are fine-tunes of Qwen3.5-4B (for bolt-vl-4b) and Qwen3.5-9B (for bolt-vl-9b), optimized to convert highly structured business documents into structure-preserving Markdown text while retaining the strong general performance of the base models. Key capabilities include:
- Strong visual reasoning & instruction following
- Improved structured information extraction from business documents

These models were trained with supervised fine-tuning on a selection of business document images, primarily invoices, paired with AI-generated Markdown representations of those documents. Fine-tuning used LoRA via Unsloth, with Hugging Face's Transformers Reinforcement Learning (TRL) library. The models were also trained on AI Squared's "identity" dataset, enabling them to answer user queries as "Bolt, an LLM trained by AI Squared".
Benchmark Performance
To evaluate bolt-vl, we ran OmniDocBench (document parsing in real-world scenarios) and OCRBench v2 (visual text localization and reasoning) to measure performance on the model attributes of interest relative to the base Qwen3.5-9B model, which we re-ran on these benchmarks to keep the evaluation process consistent. Because these benchmarks are standard for document parsing models, the evaluation also provides a basis for comparing the bolt-vl models against the broader model ecosystem.


Comparison of the performance between the bolt-vl-9b and -4b models and a reference model, Qwen3.5-9B. Scores have been normalized so that higher scores are better for all tasks.
Overall, these results show that general visual performance has not been harmed by the fine-tuning process, while domain-relevant performance has been improved.

Document AI is one of the clearest examples of token burden. Invoices, forms, statements, and records are processed at high volume, and every extra token becomes a recurring cost. Bolt Vision is built to extract structured information from business documents without relying on large frontier models for every page.
Internal Invoice Benchmark
The purpose of fine-tuning these models was to excel at translating business documents into Markdown text, an essential step in extracting structured information from documents such as invoices. To test this, we used the same document processing engine that AISquared uses for customer-facing invoice parsing, running an ablation study in which the vision model was treated as the independent variable.
We considered two outcome metrics:
- Line Item Accuracy
- The percentage of line items that were perfectly transcribed
- Invoice Metadata Accuracy
- The percentage of metadata fields (e.g. PO number, invoice number, invoice date, etc.) that were perfectly transcribed
We chose overall accuracy on complete invoice fields rather than per-character transcription accuracy because, in the context of the business use case, a single-character transcription error can produce mismatched shipping tracking numbers, invoice numbers, and similar identifiers when the extracted data is added to a downstream database.
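A minimal sketch of this exact-match field accuracy, with hypothetical field names and values:

```python
def field_accuracy(predicted: dict, truth: dict) -> float:
    """Share of ground-truth fields transcribed exactly. Exact string
    match is deliberate: a single wrong character in an invoice or
    tracking number corrupts downstream records, so no partial credit."""
    if not truth:
        return 1.0
    correct = sum(predicted.get(field) == value for field, value in truth.items())
    return correct / len(truth)

# Hypothetical invoice metadata; field names are illustrative
truth = {"invoice_number": "INV-0042", "po_number": "PO-981", "invoice_date": "2024-03-01"}
pred = {"invoice_number": "INV-0042", "po_number": "PO-981", "invoice_date": "2024-03-10"}
print(field_accuracy(pred, truth))  # two of three fields match exactly
```

The same scoring applies to line items, treating each line item as a field that must be transcribed perfectly.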
All models were run over the same subset of 60 single-page internal invoices. The results from running each model were compared against a hand-transcribed ground truth dataset. The results are summarized below:

These results show that our bolt-vl models (1) preserve the strong foundational visual and spatial understanding of the base models while (2) delivering performance on an applied, business-relevant benchmark that exceeds Claude 4.6 Opus and is on par with GPT-5.5. Additionally, because the bolt-vl models are much smaller than other open-source models with comparable performance on our invoice benchmark (e.g., Qwen3.5-35B), we can reduce the hourly cost of hosting by over 60% and deploy them on lower-end on-premise systems and at the edge.
Bolt Models: Driving Value in the AISquared UNIFI Platform
The Bolt model family is purpose-built to integrate seamlessly with the AISquared UNIFI platform, ensuring that AI outputs are securely and efficiently embedded into existing business applications. By excelling at targeted enterprise tasks, Bolt models are critical for maximizing UNIFI’s core value propositions: Model Flexibility, Enterprise Guardrails, and Closed-Loop Improvement.
Bolt Instruct: Orchestration, Governance, and General Intelligence
Bolt Instruct models are the foundation for complex AI orchestration and governance within UNIFI.
- General Chat & Instruction Following: Instruct models have been tuned to excel as conversational chatbots, able to respond to user requests quickly and effectively.
- Guardrails and Safety: Instruct models are used for providing guardrails on inputs and outputs, directly supporting UNIFI’s Enterprise Guardrails and Inline Governance features. The Guardrails and Safety Evaluation demonstrated that Bolt Instruct models achieve strong performance in detecting PII, unsafe content, and jailbreak attempts, even compared to larger production models. This ensures secure and compliance-aware operations for both enterprise and federal environments.
- Model Routing: By acting as the engine for routing requests to the appropriate downstream model(s), Bolt Instruct ensures optimal resource usage and cost efficiency. The Model Routing Evaluation confirmed the models’ effectiveness, achieving strong Strict and Lenient Accuracy metrics required for reliably orchestrating multi-model systems.
- LLM-as-a-Judge & General Capabilities: The models serve as a general chatbot, for translation, and for built-in LLM-as-a-judge functionality, which is vital for closing the feedback loop and iteratively tuning models. The models’ strong performance retention on reasoning benchmarks (ARC-Easy, BBH) ensures reliable decision-making and instruction-following for these complex, general-purpose tasks.
Bolt Embedding: High-Performance Retrieval for Enterprise RAG
Bolt Embedding models are essential for turning raw data into actionable knowledge within the UNIFI platform.
- Search and Retrieval: Embedding models convert ingested documents (from lakes, warehouses, and apps) into vector embeddings for search and retrieval. The models' design for efficient, large-scale vector search supports UNIFI's Plug-and-Play Data Integration.
- Reliable RAG: The Real-World RAG Evaluation is the key indicator of value. By achieving a lower average retrieval rank and more consistent performance on challenging queries than larger models like EmbeddingGemma, Bolt Embedding ensures that the UNIFI RAG pipeline retrieves the most relevant context reliably, leading to higher-quality, data-backed insights for end-users.
Bolt Vision: Unlocking Business-Critical Data from Documents
Bolt Vision models are designed to enable the ingestion and use of multimodal and unstructured data, a core requirement for enterprise intelligence.
- Structured Data Extraction: Bolt Vision models are available to extract information from charts and tables, and to extract structured data from business documents like invoices, which UNIFI then sends to a downstream database. The Internal Invoice Benchmark validates this capability: the models exceeded the performance of Claude 4.6 Opus and matched GPT-5.5 on Line Item and Invoice Metadata Accuracy. This performance is critical for ensuring the extracted data is perfectly transcribed, preventing errors when added to downstream systems.
- Efficiency and Multimodality: The preserved general visual performance allows the models to answer multimodal chat queries. Furthermore, the model’s small size and efficiency—leading to over 60% reduction in hosting costs compared to other comparable models—supports UNIFI’s commitment to resource-constrained and on-premise deployment.
Conclusion
The Bolt model family represents a practical demonstration of vertical intelligence in action. As our evaluations illustrate, smaller, well-tuned models consistently deliver strong—and often superior—performance on the targeted, high-value tasks that define enterprise workflows, all while remaining efficient and practical to deploy. This is the core premise of Bolt: the most effective model for a specific workflow isn’t necessarily the largest, but the one most precisely optimized for the task. With Bolt Instruct, Bolt Embedding, and Bolt Vision, we’ve engineered a suite of models designed to handle real production workloads reliably across on-premise and resource-constrained environments. We have released all Bolt models under the AI Squared Community License and invite you to download, evaluate, and build on them for your own enterprise use cases.