MagStack™ Cluster Models
Enterprise AI models optimized for distributed inference across MagStack clusters
Thox.ai provides specialized AI models designed for distributed inference across MagStack™ device clusters. These models leverage tensor parallelism and MoE (Mixture of Experts) architectures to run frontier-class AI on your desk with complete privacy.
- Healthcare: HIPAA Compliant
- Legal: Attorney-Client Privilege
- Enterprise: SOC2 & GDPR
- Research: FERPA Compliant
Recommended for Most Users
Start with thox-cluster-nano: it offers a 1 million token context window in just 24GB, making it ideal for processing entire documents, codebases, and datasets on a 2-device cluster.
Core Cluster Models
Cluster Nano
Recommended · Entry · Thox-ai/thox-cluster-nano
Long-context model with 1 million token window for processing entire documents, datasets, and complex analyses. MoE architecture with 128 experts.
| Parameters | Context | Memory | Min Devices | Speed | Base Model |
|---|---|---|---|---|---|
| 30B total, 3.5B active (MoE) | 1M tokens | 24GB | 2x | 80-120 tok/s | Nemotron-3-Nano |
Key Features:
- 1M token context - process entire documents
- MoE architecture (3.5B active of 30B)
- 5-15 concurrent professional users
- HIPAA, GDPR, SOC2, FERPA compliant
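As a sanity check on the 24GB figure, a back-of-envelope estimate shows how a 30B-parameter model can fit that budget. The ~4.5 bits/weight quantization level below is an illustrative assumption; the spec sheet does not state the quantization actually used.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# 30B parameters at an assumed ~4.5 bits/weight:
nano_weights = weight_memory_gib(30, 4.5)  # ~15.7 GiB
```

That leaves roughly 8GB of the quoted 24GB for the KV cache (which grows with the 1M-token context), activations, and runtime overhead, split across the 2-device cluster.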
Cluster Scout
Professional · Thox-ai/thox-cluster-scout
Professional multimodal model with vision capabilities and industry-leading 10M token context. Native image understanding for healthcare, legal, and finance.
| Parameters | Context | Memory | Min Devices | Speed | Base Model |
|---|---|---|---|---|---|
| 109B total, 17B active (MoE) | 10M tokens | 67GB | 4x | 60-90 tok/s | Llama 4 Scout |
Key Features:
- 10M token context - industry-leading
- Native vision & image understanding
- Multilingual (12 languages)
- Medical imaging, chart analysis, OCR
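Since the Quick Start serves these models through Ollama, Scout's image input can be exercised via Ollama's standard `/api/generate` endpoint, which accepts base64-encoded images in an `images` array. A minimal request-body builder follows; the prompt and image bytes are placeholders, and this assumes the cluster exposes the standard Ollama API.

```python
import base64
import json

def vision_body(model: str, prompt: str, image_bytes: bytes) -> str:
    """JSON body for Ollama's /api/generate with one attached image.
    Ollama expects base64-encoded image data in the "images" field."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    })

body = vision_body("Thox-ai/thox-cluster-scout",
                   "Summarize the findings in this chart.",
                   b"placeholder-image-bytes")  # real PNG/JPEG bytes in practice
```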
Cluster Maverick
Enterprise · Thox-ai/thox-cluster-maverick
Enterprise flagship model with frontier multimodal intelligence. For Fortune 500, hospitals, universities, and government.
| Parameters | Context | Memory | Min Devices | Speed | Base Model |
|---|---|---|---|---|---|
| 400B total, 17B active (MoE) | 1M tokens | 245GB | 12x | 30-50 tok/s | Llama 4 Maverick |
Key Features:
- Frontier-class multimodal AI
- 200+ concurrent enterprise users
- All major compliance frameworks
- Fortune 500, hospitals, government
Specialized Cluster Variants
Purpose-built models optimized for specific professional workflows: software engineering, high-speed operations, frontier reasoning, and government/defense security requirements.
Cluster Code
Professional · Thox-ai/thox-cluster-code
Elite software engineering model with GPT-4o competitive performance. Supports 92 programming languages with repository-level analysis, code generation, debugging, and collaborative code review.
| Size | Context | Devices |
|---|---|---|
| 32B | 128K tokens | 4+ |
Key Features:
- 92 programming languages support
- GPT-4o competitive on code generation
- Repository-level analysis (128K context)
- 73.7 Aider score - elite code repair
Cluster Swift
Professional · Thox-ai/thox-cluster-swift
Speed-optimized model for high-volume, real-time applications. Handles 30-50+ concurrent users with <100ms latency. Ideal for customer support, call centers, and interactive applications.
| Size | Context | Devices |
|---|---|---|
| 8B | 32K tokens | 2+ |
Key Features:
- 50+ tokens/sec ultra-fast responses
- <100ms first token latency
- 30-50+ concurrent users
- Real-time chat & customer support
Cluster Deep
Enterprise · Thox-ai/thox-cluster-deep
Frontier reasoning model with state-of-the-art capabilities. Largest openly available model for research institutions, strategic consulting, financial modeling, legal research, and complex quantitative analysis.
| Size | Context | Devices |
|---|---|---|
| 405B | 128K tokens | 12+ |
Key Features:
- 405B parameters - largest open model
- Frontier-class reasoning capabilities
- Research-grade deep analysis
- Strategic consulting & financial modeling
Cluster Secure
Enterprise · Thox-ai/thox-cluster-secure
Government/defense-grade model with maximum security. Supports UNCLASSIFIED through SECRET workloads with N+2 redundancy, air-gap deployment, ITAR compliance, and FedRAMP High authorization.
| Size | Context | Devices |
|---|---|---|
| 72B | 128K tokens | 6+ |
Key Features:
- ITAR, FedRAMP High, FISMA compliant
- Air-gapped deployment ready
- UNCLASSIFIED to SECRET workloads
- N+2 redundancy for mission assurance
Detailed Use Cases & ROI
Cluster Code - Software Engineering Teams
Use Cases:
- Software Teams (25 engineers): Repository analysis, code reviews, architecture design, testing
- Startups (15-20 engineers): Rapid prototyping, technical debt reduction, performance optimization
- DevOps (20 engineers): IaC, CI/CD pipelines, monitoring, incident response
- QA Engineering (15 engineers): Test automation, coverage analysis, security testing
Engineering Team ROI:
- Development Velocity: 40-50% faster feature development
- Code Quality: 80%+ test coverage, reduced vulnerabilities
- Onboarding: 60% faster new engineer ramp-up
- Cost Savings: $50,000 - $150,000/year vs cloud alternatives
3-Year TCO (6 devices, 30 engineers):
$29,500 (~$985/user)
vs GitHub Copilot: $54,000 - $67,500
Pays for itself in 12-18 months
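The payback figures can be reproduced approximately with simple arithmetic: spread the alternative's 3-year cost evenly per month and divide the cluster TCO by that monthly run rate. This is a rough sketch; the exact months quoted above depend on rounding and pricing assumptions not stated here.

```python
def per_user_cost(total_tco: float, users: int) -> float:
    """3-year TCO divided across the team."""
    return total_tco / users

def payback_months(total_tco: float, alternative_3yr_cost: float) -> float:
    """Months until the cluster TCO equals avoided subscription spend,
    assuming the alternative accrues evenly over 36 months."""
    return total_tco / (alternative_3yr_cost / 36)

# Cluster Code: $29,500 TCO, 30 engineers, vs $54,000-$67,500 for Copilot
per_user = per_user_cost(29_500, 30)   # ~$983/user
fast = payback_months(29_500, 67_500)  # ~15.7 months
slow = payback_months(29_500, 54_000)  # ~19.7 months
```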
Cluster Swift - High-Speed Operations
Use Cases:
- Customer Support (40-50 agents): Real-time query assistance, knowledge base search
- Call Centers (30-40 agents): Live transcription, agent assistance, sentiment analysis
- Interactive Apps (50+ users): Real-time chatbots, collaboration, dynamic content
- Healthcare Admissions (20-30 staff): Patient intake, scheduling, HIPAA-compliant support
Enterprise ROI:
- Customer Support: 50% faster response times, higher satisfaction
- Call Center: $80,000/year operational efficiency gains
- Healthcare: 40% faster patient processing
- Cost Savings: $40,000 - $130,000/year vs cloud APIs
3-Year TCO (3 devices, 50 users):
$13,000 (~$260/user)
vs Cloud AI APIs: $54,000 - $144,000
Pays for itself in 3-9 months
Real-Time Performance:
| First Token | Throughput | Response Time | Concurrent | Uptime |
|---|---|---|---|---|
| <100ms | 50+ tok/s | <1 second | 50+ users | 99.9% |
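These figures are mutually consistent: at the quoted 100ms time-to-first-token and 50 tok/s streaming rate, a typical short support reply lands under the 1-second response target. A quick check (the ~40-token reply length is an assumption for illustration):

```python
def response_time_s(first_token_s: float, reply_tokens: int, tok_per_s: float) -> float:
    """End-to-end latency for one streamed reply: time to first token
    plus generation time for the remaining tokens."""
    return first_token_s + reply_tokens / tok_per_s

t = response_time_s(0.100, 40, 50)  # 0.9 s for a ~40-token reply
```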
Cluster Deep - Frontier Reasoning
Use Cases:
- R1 Universities (30-40 faculty): Grant proposals, literature reviews, collaboration
- Consulting Firms (20-30 consultants): Strategic analysis, market research, forecasting
- Financial Analysis (25-35 analysts): Quantitative modeling, risk assessment, compliance
- Legal Research (20-30 attorneys): Case law analysis, litigation strategy, compliance
Research Institution ROI:
- Grant Success: 4-5x improvement in grant funding
- Publications: 3-5x increase in peer-reviewed output
- Research Speed: 60% faster literature review and analysis
- Cost Savings: $100,000 - $300,000/year vs cloud frontiers
3-Year TCO (14 devices, 40 researchers):
$79,000 (~$1,975/user)
vs Claude Opus/GPT-4: $108,000 - $288,000
Pays for itself in 9-24 months
Cluster Secure - Government/Defense
Use Cases:
- DOD (20-30 analysts): Intelligence analysis, mission planning, threat assessment
- Intelligence Community (15-25 analysts): All-source analysis, counterintel, OSINT
- Defense Contractors (20-30 personnel): ITAR-controlled analysis, export control
- Federal Law Enforcement (15-25 agents): Case investigation, counterterrorism
Government/Defense ROI:
- Mission Effectiveness: $200,000+/year improvement
- Compliance: Avoid classification spills and violations
- Security: Zero external data exposure
- Availability: 99.99% mission assurance with N+2 redundancy
3-Year TCO (10 devices, 30 cleared personnel):
$124,000 (~$4,133/user)
Cloud AI: NOT AUTHORIZED for SECRET
Only authorized solution for classified AI
Additional Cluster Models
Cluster 70B
Professional · Thox-ai/thox-cluster-70b
Enterprise-grade model for complex reasoning, analysis, and professional workflows.
| Size | Context | Devices |
|---|---|---|
| 72B | 64K tokens | 2x |
Cluster 100B
Professional · Thox-ai/thox-cluster-100b
Expert-level model for enterprise, research, healthcare, and legal workloads.
| Size | Context | Devices |
|---|---|---|
| 110B | 96K tokens | 4x |
Cluster 200B
Enterprise · Thox-ai/thox-cluster-200b
Frontier-class model matching cloud AI capabilities for any industry application.
| Size | Context | Devices |
|---|---|---|
| 405B | 128K tokens | 8x |
Cluster Coordinator
Utility · Coordinator
Lightweight cluster orchestration and management model.
Quick Start
```shell
# Pull and run thox-cluster-nano (recommended)
$ ollama pull Thox-ai/thox-cluster-nano
$ ollama run Thox-ai/thox-cluster-nano

# For software engineering teams (4+ devices)
$ ollama pull Thox-ai/thox-cluster-code
$ ollama run Thox-ai/thox-cluster-code

# For high-volume/real-time apps (2+ devices)
$ ollama pull Thox-ai/thox-cluster-swift
$ ollama run Thox-ai/thox-cluster-swift

# For frontier reasoning research (12+ devices)
$ ollama pull Thox-ai/thox-cluster-deep
$ ollama run Thox-ai/thox-cluster-deep

# For government/defense (6+ devices)
$ ollama pull Thox-ai/thox-cluster-secure
$ ollama run Thox-ai/thox-cluster-secure

# For vision capabilities (4+ devices)
$ ollama pull Thox-ai/thox-cluster-scout
$ ollama run Thox-ai/thox-cluster-scout

# Enterprise flagship (12+ devices)
$ ollama pull Thox-ai/thox-cluster-maverick
$ ollama run Thox-ai/thox-cluster-maverick

# Use clusterctl for automatic model selection
$ clusterctl recommend
# Output: Recommended: thox-cluster-nano (2 devices, 32GB RAM)
```
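Once a model is pulled, it can also be driven programmatically. The sketch below targets Ollama's documented REST endpoint on its default port; it assumes the cluster exposes the standard Ollama API on localhost (adjust the URL if the coordinator serves it elsewhere).

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_body(model: str, prompt: str) -> bytes:
    """JSON body for a non-streaming generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Run one prompt against a locally served cluster model."""
    req = request.Request(OLLAMA_URL, data=build_body(model, prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running server):
# generate("Thox-ai/thox-cluster-nano", "Summarize the attached contract.")
```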
Model Selection Guide
| Use Case | Recommended | Devices | Why |
|---|---|---|---|
| Full document/codebase analysis | thox-cluster-nano | 2+ | 1M token context handles entire repos |
| Software engineering teams | thox-cluster-code | 4+ | 92 languages, GPT-4o competitive coding |
| High-volume customer support | thox-cluster-swift | 2+ | 50+ concurrent users, <100ms latency |
| Advanced research & analysis | thox-cluster-deep | 12+ | 405B frontier reasoning model |
| Government/defense classified | thox-cluster-secure | 6+ | ITAR, FedRAMP, air-gap ready |
| Medical imaging & chart analysis | thox-cluster-scout | 4+ | Native vision, 10M context, HIPAA |
| Fortune 500 enterprise AI | thox-cluster-maverick | 12+ | Frontier-class, 200+ concurrent users |
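When the only constraint is device count, the table reduces to a simple rule of thumb. The function below is a hypothetical re-implementation of what `clusterctl recommend` might do for the three core models; the thresholds come from the table, but the logic is illustrative, not Thox.ai's actual code.

```python
# Minimum-device thresholds for the three core models, from the table above.
CORE_MODELS = [
    ("thox-cluster-maverick", 12),
    ("thox-cluster-scout", 4),
    ("thox-cluster-nano", 2),
]

def recommend(devices: int) -> str:
    """Pick the largest core model the cluster can host."""
    for name, min_devices in CORE_MODELS:
        if devices >= min_devices:
            return name
    raise ValueError("MagStack clusters need at least 2 devices")
```

Real selection would also weigh per-device RAM and workload (code, vision, classified), as `clusterctl recommend` reports in the Quick Start.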
Complete Privacy & Compliance
All cluster models run entirely on your local MagStack configuration. No data is transmitted to external servers. Your code, documents, medical records, and conversations never leave your devices.
CONFIDENTIAL AND PROPRIETARY INFORMATION
This documentation is provided for informational and operational purposes only. The specifications and technical details herein are subject to change without notice. Thox.ai LLC reserves all rights in the technologies, methods, and implementations described.
Nothing in this documentation shall be construed as granting any license or right to use any patent, trademark, trade secret, or other intellectual property right of Thox.ai LLC, except as expressly provided in a written agreement.
Patent Protection
The MagStack™ magnetic stacking interface technology, including the magnetic alignment system, automatic cluster formation, NFC-based device discovery, and distributed inference me...
Reverse Engineering Prohibited
You may not reverse engineer, disassemble, decompile, decode, or otherwise attempt to derive the source code, algorithms, data structures, or underlying ideas of any Thox.ai hardwa...
Thox.ai™, Thox OS™, MagStack™, and the Thox.ai logo are trademarks or registered trademarks of Thox.ai LLC in the United States and other countries.
NVIDIA, Jetson, TensorRT, and related marks are trademarks of NVIDIA Corporation. Ollama is a trademark of Ollama, Inc. All other trademarks are the property of their respective owners.
© 2026 Thox.ai LLC. All Rights Reserved.