Advanced Configuration
Fine-tune your THOX.ai device for optimal performance and security.
Model Management
Listing Models
# List all available models
thox models list --all
# List installed models
thox models list --installed
Installing Models
# Install a model
thox models pull thox-coder-large
# Install with specific quantization
thox models pull thox-coder-large:q4_k_m
Model Priority
Configure model loading priority and memory allocation:
# /etc/thox/models.yaml
models:
  thox-coder:
    priority: high
    memory_limit: 8GB
    auto_load: true
  thox-chat:
    priority: medium
    memory_limit: 4GB
    auto_load: false
Custom Models
Import GGUF-compatible models:
# Import from local file
thox models import ./my-model.gguf --name custom-model
# Import from URL
thox models import https://example.com/model.gguf --name custom-model
Hybrid Inference
THOX.ai uses a hybrid architecture: Ollama serves smaller models, and Hardware-Accelerated Inference serves larger ones. For models of 14B parameters and up, this improves token throughput by 60-100%.
Router Configuration
# /opt/thox/configs/router-config.json
{
  "strategy": "model_size",
  "model_size_threshold_b": 10.0,
  "fallback_enabled": true,
  "accelerated_models": {
    "thox-coder-max": "thox-coder-32b-accel",
    "thox-coder-pro": "thox-coder-14b-accel"
  }
}
Routing Strategies
model_size (default)
Routes based on parameter count. Models of 10B parameters or more use Hardware-Accelerated Inference.
explicit
Maps each model directly to a backend, for production control.
performance
Routes based on latency requirements, for real-time applications.
fallback
Tries Hardware-Accelerated Inference first and falls back to Ollama for reliability.
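To make the model_size strategy concrete, here is a minimal Python sketch of how a router could apply the configuration shown earlier. It is an illustration only, not the THOX.ai router implementation: the `choose_backend` helper, the backend labels, the 7B size used for thox-chat, and the choice to send unmapped large models to Ollama are all assumptions. Only the config keys come from the router-config.json example.

```python
import json

# Config keys mirror the router-config.json example in this section.
ROUTER_CONFIG = json.loads("""
{
  "strategy": "model_size",
  "model_size_threshold_b": 10.0,
  "fallback_enabled": true,
  "accelerated_models": {
    "thox-coder-max": "thox-coder-32b-accel",
    "thox-coder-pro": "thox-coder-14b-accel"
  }
}
""")


def choose_backend(model, params_b, config=ROUTER_CONFIG):
    """Return (backend, engine_name) for a model (hypothetical helper).

    params_b is the model's parameter count in billions.
    """
    threshold = config["model_size_threshold_b"]
    engine = config["accelerated_models"].get(model)
    # Large models that have a built engine go to the accelerated backend.
    if params_b >= threshold and engine is not None:
        return ("accelerated", engine)
    # Everything else is served by Ollama under its own name (assumption).
    return ("ollama", model)


print(choose_backend("thox-coder-pro", 14.0))
print(choose_backend("thox-chat", 7.0))  # 7B is an assumed size
```

In this sketch a model above the threshold that has no entry in accelerated_models stays on Ollama, which is one plausible reading of the configuration rather than documented behavior.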
Check Router Status
curl http://thox.local:8080/router/status | jq
Hardware-Accelerated Inference
Hardware-Accelerated Inference provides high-performance inference for large models with custom attention kernels, paged KV caching, and advanced compression.
Building Accelerated Inference Engines
# List available models
./build-accel-engines.sh --list
# Build recommended models
./build-accel-engines.sh --all
# Build a specific model
./build-accel-engines.sh --model thox-coder-14b
Performance Comparison
| Model | Ollama (tok/s) | Hardware-Accelerated Inference (tok/s) | Improvement |
|---|---|---|---|
| 14B (thox-coder-pro) | 28 | 45-56 | +60-100% |
| 32B (thox-coder-max) | 12 | 20-24 | +67-100% |
| 13B (thox-review) | 30 | 48-55 | +60-83% |
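The Improvement column is the relative gain over the Ollama baseline: (accelerated - baseline) / baseline. The short Python sketch below reproduces that arithmetic from the table's figures. The `improvement_pct` helper is illustrative and not part of any THOX.ai tool.

```python
def improvement_pct(baseline_tps, accelerated_tps):
    """Relative throughput gain over the baseline, in percent."""
    return (accelerated_tps - baseline_tps) / baseline_tps * 100.0


# (model, Ollama tok/s, accelerated low tok/s, accelerated high tok/s)
ROWS = [
    ("14B (thox-coder-pro)", 28, 45, 56),
    ("32B (thox-coder-max)", 12, 20, 24),
    ("13B (thox-review)", 30, 48, 55),
]

for name, base, low, high in ROWS:
    lo_pct = improvement_pct(base, low)
    hi_pct = improvement_pct(base, high)
    print(f"{name}: +{lo_pct:.0f}% to +{hi_pct:.0f}%")
```

For the 14B row the lower bound works out to about 61%, which the table rounds down to 60%.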
Service Management
# Check Accelerated Inference service status
systemctl status thox-accelerator
# Restart Accelerated Inference service
sudo systemctl restart thox-accelerator
# View Accelerated Inference logs
journalctl -u thox-accelerator -f
Performance Tuning
Ollama Inference Settings
# /etc/thox/inference.yaml
inference:
  # Number of threads for CPU operations
  threads: 4
  # Batch size for inference
  batch_size: 512
  # Context window size
  context_length: 8192
  # GPU memory fraction to use
  gpu_memory_fraction: 0.9
  # Enable flash attention
  flash_attention: true
  # KV cache quantization
  kv_cache_type: q8_0
Hardware-Accelerated Inference Settings
# Environment variables in thox-accelerator.service
TRT_MAX_BATCH_SIZE=4
TRT_MAX_INPUT_LEN=4096
TRT_MAX_OUTPUT_LEN=2048
TRT_KV_CACHE_FREE_GPU_MEM_FRACTION=0.4
TRT_ENABLE_PAGED_KV_CACHE=1
TRT_ENABLE_CHUNKED_CONTEXT=1
Memory Optimization
Low Memory Mode
Uses a smaller context window and aggressive offloading.
thox config set memory_mode low
High Performance Mode
Maximizes speed and uses all available memory.
thox config set memory_mode high
Benchmarking
# Run performance benchmark
thox benchmark --model thox-coder
# Example output (values vary by device and model):
Model: thox-coder
Prompt eval: 125 tokens/s
Generation: 45 tokens/s
Memory usage: 6.2GB
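The two throughput figures can be combined into a rough latency estimate: prompt tokens are processed at the prompt-eval rate, and output tokens at the generation rate. The Python sketch below uses the rates from the sample output and ignores model load time and other overhead. The `estimate_seconds` helper and the token counts are illustrative assumptions.

```python
def estimate_seconds(prompt_tokens, output_tokens,
                     prompt_tps=125.0, generation_tps=45.0):
    """Rough request time: prompt processing plus token generation."""
    return prompt_tokens / prompt_tps + output_tokens / generation_tps


# Example: a 1000-token prompt followed by a 450-token reply.
# 1000 / 125 = 8 s of prompt evaluation, 450 / 45 = 10 s of generation.
total = estimate_seconds(1000, 450)
print(f"Estimated time: {total:.1f} s")  # prints "Estimated time: 18.0 s"
```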
Security Settings
API Authentication
Enable API key authentication for remote access:
# Generate API key
thox auth generate-key --name "my-app"
# Enable authentication
thox config set auth.enabled true
# Use in requests
curl -H "Authorization: Bearer sk-xxx" http://thox.local:8080/v1/models
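The same authenticated request can be made from application code. The following is a minimal Python sketch using only the standard library. The host, port, path, and Bearer scheme come from the curl example; the helper names are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "http://thox.local:8080"  # host and port from the curl example
API_KEY = "sk-xxx"  # placeholder; substitute the key you generated


def build_models_request(base_url=BASE_URL, api_key=API_KEY):
    """Build a GET request for /v1/models with a Bearer token."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_models(timeout=10):
    """Send the request and decode the body.

    Assumes a reachable device and a JSON response.
    """
    request = build_models_request()
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network.
req = build_models_request()
print(req.get_method(), req.full_url)
```

Call `list_models()` only when the device is reachable. Once TLS is enabled, the base URL would need the https scheme.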
Network Access Control
# /etc/thox/security.yaml
network:
  # Bind address (0.0.0.0 listens on all interfaces)
  bind_address: "0.0.0.0"
  # Allowed IP ranges
  allowed_ips:
    - "192.168.1.0/24"
    - "10.0.0.0/8"
  # Rate limiting
  rate_limit:
    requests_per_minute: 60
    tokens_per_minute: 100000
TLS/HTTPS
Enable HTTPS for secure connections:
# Generate self-signed certificate
thox tls generate --hostname thox.local
# Or use existing certificate
thox tls import --cert /path/to/cert.pem --key /path/to/key.pem
# Enable TLS
thox config set tls.enabled true
Security Note: When exposing your device to the internet, always enable authentication, use HTTPS, and configure firewall rules to restrict access.
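When choosing allowed_ips ranges or matching firewall rules, it helps to check which client addresses a set of CIDR ranges actually covers. The Python sketch below does this with the standard ipaddress module, using the two ranges from the security.yaml example. The `is_allowed` helper and the sample client addresses are illustrative, and the sketch does not claim to reproduce how the device itself evaluates the list.

```python
import ipaddress

# Ranges copied from the security.yaml example.
ALLOWED_IPS = ["192.168.1.0/24", "10.0.0.0/8"]


def is_allowed(client_ip, allowed=ALLOWED_IPS):
    """Return True if client_ip is inside any allowed CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed)


# Sample addresses: two private clients and one documentation-range address.
for ip in ("192.168.1.42", "10.20.30.40", "203.0.113.7"):
    print(ip, "allowed" if is_allowed(ip) else "blocked")
```

Note that 10.0.0.0/8 covers every address starting with 10, so a narrower range may be preferable on larger networks.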
Backup & Restore
Creating Backups
# Full backup (config + models)
thox backup create --output /path/to/backup.tar.gz
# Config only backup
thox backup create --config-only --output /path/to/config-backup.tar.gz
# Automatic scheduled backup
thox backup schedule --daily --keep 7 --output /backups/
Restoring from Backup
# Full restore
thox backup restore /path/to/backup.tar.gz
# Restore config only
thox backup restore --config-only /path/to/backup.tar.gz
What's Included
Configuration
- Device settings
- Model configurations
- Network settings
- API keys
Data
- Installed models
- Custom prompts
- Chat history (optional)
- Usage statistics
Factory Reset
# Reset to factory defaults (keeps models)
thox system reset --keep-models
# Full factory reset
thox system reset --full
Warning: Factory reset is irreversible. Always create a backup before performing a reset.
© 2026 THOX.ai LLC. All Rights Reserved.