Thox.ai Edge Device

The most powerful edge AI device for professionals. Run any Ollama-compatible model locally with blazing-fast inference. For healthcare, legal, research, development, and beyond.

100 TOPS AI Accelerator
16GB LPDDR5 RAM
MagStack™ Clustering
Wi-Fi 6E + 2.5G Ethernet
TPM 2.0 Security
6" x 4" Form Factor
$799 · Ships Q3 2026

[Product render: Thox.ai Edge Device running Thox OS v1.1; SYS/NET/AI/PWR status LEDs; Ethernet, USB-C, HDMI, USB-A, and DC ports]

Available Colors

Midnight Black

Arctic White

Space Gray

Technical Specifications

Enterprise-grade hardware engineered for any AI workload. Built for professionals who demand performance and privacy.

Dimensions

Height
6 inches (152.4 mm)
Width
4 inches (101.6 mm)
Depth
1.2 inches (30.5 mm)
Weight
450g (0.99 lbs)

Compute

CPU
ARM Cortex-A78AE 8-core @ 2.84 GHz
GPU
NVIDIA Ampere (Jetson Orin NX 16GB module)
NPU
100 TOPS AI Accelerator
RAM
16GB LPDDR5 6400 MT/s
Storage
2TB NVMe SSD

AI Performance

INT8 Inference
100 TOPS
FP16 Performance
25 TFLOPS
7B Model (Ollama)
45-72 tokens/s
14B Model (TensorRT-LLM)
45-56 tokens/s (+60% vs. Ollama)
32B Model (TensorRT-LLM)
20-24 tokens/s (+100% vs. Ollama)
Max Context
128K tokens

Connectivity

Ethernet
2.5 Gigabit
Wi-Fi
Wi-Fi 6E (802.11ax)
Bluetooth
5.3 LE
USB
2x USB-C 3.2, 1x USB-A 3.0
HDMI
HDMI 2.1 (4K60)

Power

Input
12V DC / USB-C PD 65W
Typical Load
25W
Max Power
45W
Idle Power
5W

MagStack™ Clustering

Stacking Interface
Magnetic alignment (8x N52 magnets)
NFC Discovery
ST25DV64K (30mm range)
Data Connection
12-pin pogo (10 Gbps USB 3.2)
Power Passthrough
USB-PD up to 100W
Alignment Accuracy
±0.5mm, self-centering
Cluster Formation
~10 seconds automatic
Max Stack Height
8 devices
Cluster Interconnect
Wi-Fi 6E / 2.5GbE
Auto-Discovery
mDNS + NFC handshake (see the discovery sketch after this list)
Combined RAM (8x)
Up to 128GB
Combined Compute (8x)
Up to 800 TOPS
Patent Pending
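
For integrators curious about the discovery flow, here is a minimal Python sketch of browsing for cluster nodes over mDNS using the zeroconf library. The "_thox._tcp.local." service type is an assumption for illustration; the actual service name advertised by Thox OS is not documented here.

    # Minimal sketch: browse for MagStack nodes over mDNS (zeroconf).
    # ASSUMPTION: the "_thox._tcp.local." service type is hypothetical.
    from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

    class ThoxListener(ServiceListener):
        def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
            info = zc.get_service_info(type_, name)
            if info:
                print(f"Found node {name} at {info.parsed_addresses()}")

        def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
            print(f"Node {name} left the cluster")

        def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
            pass  # metadata changes are ignored in this sketch

    zc = Zeroconf()
    browser = ServiceBrowser(zc, "_thox._tcp.local.", ThoxListener())
    input("Browsing for MagStack nodes; press Enter to stop.\n")
    zc.close()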

MagStack™ Clustering

Stack multiple devices to combine RAM and compute power. Run larger AI models than ever before.

Single Device
Combined RAM
16GB
Total Compute
100 TOPS
Max Model Size
32B
Performance
20-72 tok/s
2x Stack
Combined RAM
32GB
Total Compute
200 TOPS
Max Model Size
70B
Performance
25-45 tok/s
Popular
4x Stack
Combined RAM
64GB
Total Compute
400 TOPS
Max Model Size
100B+
Performance
15-30 tok/s
8x Stack
Combined RAM
128GB
Total Compute
800 TOPS
Max Model Size
200B+
Performance
10-20 tok/s

How MagStack™ Works

1. Approach: NFC antennas detect proximity at 30mm and initiate a handshake.

2. Align & Connect: N52 magnets self-align and pogo pins establish a 10 Gbps data link.

3. Form Cluster: A leader is elected in ~10 seconds and models are auto-partitioned.

4. Run Models: Pipeline parallelism splits model layers across devices over the 10 Gbps link (see the sketch below).
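
To make step 4 concrete, here is an illustrative Python sketch of contiguous layer partitioning across a stack. The even-split strategy is an assumption for illustration; the actual Thox OS partitioner is not documented here.

    # Illustrative only: split a model's layers into contiguous pipeline
    # stages, one stage per device. NOT the actual Thox OS scheduler.
    def partition_layers(n_layers: int, n_devices: int) -> list[range]:
        base, extra = divmod(n_layers, n_devices)
        stages, start = [], 0
        for d in range(n_devices):
            size = base + (1 if d < extra else 0)  # spread the remainder
            stages.append(range(start, start + size))
            start += size
        return stages

    # Example: a 64-layer model on a 4x stack -> 16 layers per device.
    for device, stage in enumerate(partition_layers(64, 4)):
        print(f"device {device}: layers {stage.start}-{stage.stop - 1}")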

2-Device Bundle

Run thox-cluster-nano with 1M context or 70B models

$1499 (regularly $1598) · Save $99
  • 2x Thox.ai Edge Devices
  • 32GB Combined RAM
  • 200 TOPS Compute
  • Pre-configured cluster
  • thox-cluster-nano included
Best Value

4-Device Bundle

Run thox-cluster-100b for enterprise workloads

$2899 (regularly $3196) · Save $297
  • 4x Thox.ai Edge Devices
  • 64GB Combined RAM
  • 400 TOPS Compute
  • Pre-configured cluster
  • thox-cluster-100b included
  • Priority support

Thox.ai™, Thox OS™, and MagStack™ are trademarks of Thox.ai LLC. MagStack magnetic stacking technology is Patent Pending.

MagStack™ Optimized

Cluster AI Models

Models designed for distributed inference across MagStack™ clusters. Available on Ollama.

thox-cluster-nano

RECOMMENDED

Our recommended cluster model featuring a 1 million token context window based on NVIDIA Nemotron-3-Nano. Process entire codebases in a single context - no chunking or summarization needed.

30B Parameters · 1M Context · 24GB Memory · 2+ Devices · 20-35 tok/s
Recommended

Cluster Nano

Long-context model with 1 million token window for processing entire documents, datasets, and complex analyses. MoE architecture with 128 experts.

Parameters
30B
Context
1M tokens
Min Devices
2x
Speed
80-120 tok/s
Base Model
Nemotron-3-Nano
Thox-ai/thox-cluster-nano
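
Because cluster models ship through Ollama, querying one should look like any other Ollama call. A minimal Python sketch against the standard /api/generate endpoint, assuming the device is reachable at the hypothetical hostname thox.local:

    # Minimal sketch: query thox-cluster-nano via Ollama's /api/generate.
    # ASSUMPTION: "thox.local" is a placeholder for your device's address.
    import requests

    resp = requests.post(
        "http://thox.local:11434/api/generate",
        json={
            "model": "Thox-ai/thox-cluster-nano",
            "prompt": "Summarize the architecture of this codebase.",
            "stream": False,  # one JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])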

Cluster Code

Elite software engineering model with performance competitive with GPT-4o. Supports 92 programming languages with repository-level analysis, code generation, debugging, and collaborative code review.

Parameters
32B
Context
128K tokens
Min Devices
4x
Speed
100-150 tok/s
Base Model
Qwen2.5-Coder
Thox-ai/thox-cluster-code

Cluster Swift

Speed-optimized model for high-volume, real-time applications. Handles 30-50+ concurrent users with <100ms latency. Ideal for customer support, call centers, and interactive applications.

Parameters
8B
Context
32K tokens
Min Devices
2x
Speed
50+ tok/s
Base Model
Ministral-3
Thox-ai/thox-cluster-swift

Cluster Deep

Frontier reasoning model with state-of-the-art capabilities. Largest openly available model for research institutions, strategic consulting, financial modeling, legal research, and complex quantitative analysis.

Parameters
405B
Context
128K tokens
Min Devices
12x
Speed
120-180 tok/s
Base Model
Llama 3.1
Thox-ai/thox-cluster-deep

Cluster Secure

Government/defense-grade model with maximum security. Supports UNCLASSIFIED through SECRET workloads with N+2 redundancy, air-gap deployment, ITAR compliance, and FedRAMP High authorization.

Parameters
72B
Context
128K tokens
Min Devices
6x
Speed
60-90 tok/s
Base Model
Qwen2.5
Thox-ai/thox-cluster-secure

Cluster Scout

Professional multimodal model with vision capabilities and industry-leading 10M token context. Native image understanding for healthcare, legal, and finance.

Parameters
109B
Context
10M tokens
Min Devices
4x
Speed
60-90 tok/s
Base Model
Llama 4 Scout
Thox-ai/thox-cluster-scout

Cluster Maverick

Enterprise flagship model with frontier multimodal intelligence. For Fortune 500, hospitals, universities, and government.

Parameters
400B
Context
1M tokens
Min Devices
12x
Speed
30-50 tok/s
Base Model
Llama 4 Maverick
Thox-ai/thox-cluster-maverick

Cluster 70B

Enterprise-grade model for complex reasoning, analysis, and professional workflows.

Parameters
72B
Context
64K tokens
Min Devices
2x
Speed
25-45 tok/s
Base Model
Qwen 3
Thox-ai/thox-cluster-70b

Cluster 100B

Expert-level model for enterprise, research, healthcare, and legal workloads.

Parameters
110B
Context
96K tokens
Min Devices
4x
Speed
15-30 tok/s
Base Model
Qwen 3
Thox-ai/thox-cluster-100b
Frontier

Cluster 200B

Frontier-class model matching cloud AI capabilities for any industry application.

Parameters
405B
Context
128K tokens
Min Devices
8x
Speed
10-20 tok/s
Base Model
Llama 3.3
Thox-ai/thox-cluster-200b

Which Model Should I Use?

Use Case | Recommended Model | Why
Large document analysis | thox-cluster-nano | 1M context for full documents and datasets
Research & complex reasoning | thox-cluster-70b | 70B params for advanced analysis
Healthcare, legal, enterprise | thox-cluster-100b | Expert-level professional workloads
Frontier-class AI tasks | thox-cluster-200b | Matches cloud AI capabilities locally

Latest Compatible Models

The newest Ollama models from 2024-2025, optimized for Thox.ai devices. Vision-enabled, multilingual, and professional-grade.

View complete model catalog and compatibility guide
Recommended

Ministral-3 8B

Vision, 32+ languages, edge AI

Vision · Multilingual · Tools
Speed
40-60 tokens/s
Backend
Ollama
Context
256K tokens
Recommended

Llama 4 Scout

Frontier multimodal, 12 languages

Vision · Multilingual · 10M context
Speed
35-50 tokens/s
Backend
Hybrid
Context
10M tokens
Min Devices
2x
Recommended · +60% faster

Qwen 3 14B

Advanced reasoning, vision

Vision · Thinking · Tools
Speed
30-45 tokens/s
Backend
TensorRT-LLM
Context
128K tokens

Phi-4 Mini (3.8B)

Ultra-fast, multilingual, tools

Fast · Multilingual · Low memory
Speed
70-95 tokens/s
Backend
Ollama
Context
128K tokens
+60% faster

Qwen 2.5 Coder 14B

Code specialist, reasoning

Coding · Thinking · Tools
Speed
28-42 tokens/s
Backend
TensorRT-LLM
Context
128K tokens

Gemma 3 8B

Vision, single GPU optimized

Vision · Tools · Cloud-ready
Speed
38-55 tokens/s
Backend
Ollama
Context
128K tokens

Latest 2024-2025 models with vision, multilingual (32+ languages), and thinking capabilities. Hybrid Ollama + TensorRT-LLM inference delivers 60-100% faster performance on 14B+ models. Compatible with 100+ Ollama models.
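
As a quick compatibility check, the models installed on a device can be enumerated through Ollama's standard /api/tags endpoint. A minimal Python sketch; the hostname is a placeholder:

    # Minimal sketch: list installed models via Ollama's /api/tags.
    import requests

    tags = requests.get("http://thox.local:11434/api/tags", timeout=10).json()
    for model in tags.get("models", []):
        print(f"{model['name']:40s} {model['size'] / 1e9:6.1f} GB")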

What's in the Box

Everything you need to get started.

  • Thox.ai Edge Device
  • 65W GaN USB-C Power Adapter
  • USB-C to USB-C Cable (1m)
  • Quick Start Guide
  • Ethernet Cable (CAT6, 1m)
  • Mounting Bracket Kit
  • Thermal Pad Set
Operating System

Powered by Thox OS

A custom operating system purpose-built for AI inference at the edge.

TensorRT-LLM Acceleration

60-100% faster inference on 14B+ models via TensorRT-LLM

Hybrid Smart Routing

Auto-routes to optimal backend: Ollama or TensorRT-LLM
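
A hypothetical sketch of that routing decision in Python. The 14B threshold comes from the performance figures above, but the function and logic are illustrative, not the actual Thox OS implementation:

    # HYPOTHETICAL routing logic: prefer TensorRT-LLM where the spec
    # sheet cites 60-100% gains (14B+ models), else default to Ollama.
    def choose_backend(params_billion: float, trt_engine_ready: bool) -> str:
        if params_billion >= 14 and trt_engine_ready:
            return "tensorrt-llm"  # prebuilt engine available
        return "ollama"            # small models or no engine yet

    assert choose_backend(32, True) == "tensorrt-llm"
    assert choose_backend(7, True) == "ollama"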

Native Jetson Execution

Runs directly on device with JetPack 6.x integration

Hybrid AI Runtime

  • Ollama + TensorRT-LLM backends
  • Thox.ai Coder models (7B/14B/32B)
  • Smart router with auto-backend
  • OpenAI-compatible API (see the sketch after this list)
  • 60-100% faster on 14B+ models
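
Because the API is OpenAI-compatible, existing clients should work unmodified. A minimal sketch with the official openai Python package; the hostname and API key below are placeholders (Ollama-style servers typically accept any non-empty key):

    # Minimal sketch: point the standard OpenAI client at the device.
    # ASSUMPTIONS: the "thox.local" hostname and port 11434 are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://thox.local:11434/v1", api_key="thox")
    completion = client.chat.completions.create(
        model="Thox-ai/thox-cluster-nano",
        messages=[{"role": "user", "content": "Draft release notes for v1.1."}],
    )
    print(completion.choices[0].message.content)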

Ready for Any Workflow

  • Intuitive web dashboard
  • API access for any application
  • CLI tools for power users
  • Automatic updates (OTA)

Thox OS™ is a trademark of Thox.ai LLC. All rights reserved.

Frequently Asked Questions

Got questions? We've got answers.

Ready to Order?

Secure your Thox.ai Edge Device today with a $99 refundable deposit. Expected shipping Q3 2026.