Pylox Forge.
Rented intelligence, or owned behavior.
Trained by us. Served on cloud GPU or on-prem. You choose.
The stack
Four layers. Composable.
Every model we ship gets the same optimization stack: a fine-tuned base, hardware-aware quantization, and one of two speculative-decoding drafters. Deployable on cloud GPU or on-prem.
Base model fine-tune
Strong open base, continued pretraining on your data in an air-gapped environment. The model speaks your domain natively. You own the weights.
Hardware-aware quantization
FP4 on Blackwell, FP8 on Hopper. 2 to 3 times the throughput of BF16 with no measurable quality loss.
EAGLE-3 speculative decoding
A 150M-parameter autoregressive drafter proposes K tokens. The base model verifies them in a single parallel pass. 3 to 4 times faster than vanilla autoregressive decoding. Ships first; the drafter trains in hours.
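For readers who want to see the draft-and-verify loop concretely, here is a minimal sketch. It assumes greedy decoding and uses two toy functions, `target_next` and `draft_next`, as hypothetical stand-ins for the base model and the drafter. It illustrates the general speculative-decoding idea only; it is not EAGLE-3 itself, which drafts from the base model's internal features and supports sampling.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def target_next(tokens: List[int]) -> int:
    # Hypothetical stand-in for the base model: a deterministic toy rule.
    return (sum(tokens) * 31 + len(tokens)) % 50

def draft_next(tokens: List[int]) -> int:
    # Hypothetical stand-in for the drafter: agrees with the target
    # except when the context length is a multiple of 4.
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 50

def greedy_decode(prompt: List[int], target: NextToken, max_new: int) -> List[int]:
    # Baseline: one target call per generated token.
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens

def speculative_decode(prompt: List[int], target: NextToken, draft: NextToken,
                       k: int, max_new: int) -> Tuple[List[int], int]:
    tokens = list(prompt)
    verify_passes = 0
    while len(tokens) - len(prompt) < max_new:
        # 1. The drafter proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks every proposed position. A real system does
        #    this in one batched forward pass; here it is a plain loop.
        verify_passes += 1
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(tokens + accepted)
            accepted.append(expected)
            # Stop at the first disagreement (the target's token replaces
            # the bad draft) or after the bonus token past the k-th draft.
            if i == k or proposal[i] != expected:
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new], verify_passes

prompt = [1, 2, 3]
baseline = greedy_decode(prompt, target_next, max_new=20)
fast, passes = speculative_decode(prompt, target_next, draft_next, k=4, max_new=20)
print(fast == baseline, passes)
```

Every emitted token is one the target would have chosen, so the output matches plain greedy decoding. The saving comes from needing fewer verification passes than generated tokens, and it grows with how often the drafter agrees with the base model.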
Block diffusion drafter
A 1B-parameter block-diffusion drafter proposes a block of B tokens in a single denoising step. The base model verifies them. 4 to 6 times faster than autoregressive decoding. The upgrade path from EAGLE-3; the drafter trains in days.
Adapter heads
One base. Many heads.
A LoRA or DoRA adapter is a pluggable head on top of the same base model. Legal head, customer service head, code head. Same base in memory. Heads hot-swap at inference.
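A minimal sketch of the hot-swap idea, assuming a single linear layer and LoRA-style heads. The class `MultiHeadServer` and its methods are hypothetical names used for illustration, not a real serving API.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class MultiHeadServer:
    """Hypothetical server: one shared base weight, many small adapter heads."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight  # loaded once, shared by every head
        self.heads: Dict[str, Tuple[Matrix, Matrix]] = {}

    def register(self, name: str, A: Matrix, B: Matrix) -> None:
        # A head is just a pair of small matrices; adding one never touches the base.
        self.heads[name] = (A, B)

    def forward(self, x: Vector, head: Optional[str] = None) -> Vector:
        y = matvec(self.base_weight, x)
        if head is None:
            return y
        A, B = self.heads[head]
        delta = matvec(B, matvec(A, x))  # low-rank correction chosen per request
        return [yi + di for yi, di in zip(y, delta)]

server = MultiHeadServer([[1.0, 0.0], [0.0, 1.0]])
server.register("legal", A=[[1.0, 1.0]], B=[[1.0], [0.0]])
server.register("code", A=[[1.0, -1.0]], B=[[0.0], [1.0]])

x = [2.0, 3.0]
print(server.forward(x))           # base only
print(server.forward(x, "legal"))  # same base, legal head
print(server.forward(x, "code"))   # same base, code head
```

Switching heads here is a dictionary lookup per request; the base weights are never copied or reloaded.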
LoRA
Freeze the base model. Add a pair of small low-rank matrices whose product approximates the weight update you would have made. Trains 0.1% to 1% of the parameters. Adapter file is 50 to 200 MB.
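A minimal sketch of the LoRA forward pass for one linear layer, in plain Python. The shapes, the `alpha / rank` scaling, and the zero initialization of `B` follow the common LoRA convention; the layer sizes below are illustrative assumptions, not figures for any shipped model.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    # y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained.
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

def lora_param_fraction(d_out: int, d_in: int, rank: int) -> float:
    # Trainable parameters in A (rank x d_in) and B (d_out x rank),
    # relative to the frozen d_out x d_in weight.
    return rank * (d_in + d_out) / (d_in * d_out)

random.seed(0)
d_out, d_in, rank = 6, 8, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [random.gauss(0, 1) for _ in range(d_in)]

print(lora_forward(W, A, B, x, alpha=16.0) == matvec(W, x))
print(lora_param_fraction(4096, 4096, 8))
```

For a 4096 by 4096 layer at rank 8, the adapter holds about 0.39% as many parameters as the frozen weight, which is consistent with the range quoted above. The fraction for a whole model depends on which layers get adapters.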
DoRA
LoRA's successor. Decomposes each weight matrix into magnitude and direction, trains the magnitude directly and updates the direction with a low-rank adapter. Quality lands closer to full fine-tuning at nearly the same parameter cost.
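A minimal sketch of the magnitude and direction split, assuming the usual DoRA formulation: the adapted weight is `m * (W + B A) / ||W + B A||`, with the norm taken per column and `m` a trainable vector with one entry per column. The tiny matrices are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def column_norms(M: Matrix) -> List[float]:
    return [math.sqrt(sum(row[j] ** 2 for row in M)) for j in range(len(M[0]))]

def dora_weight(W: Matrix, A: Matrix, B: Matrix, m: List[float]) -> Matrix:
    # Direction: the frozen weight plus a low-rank update, normalized per column.
    delta = matmul(B, A)
    V = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
    norms = column_norms(V)
    # Magnitude: a separate trainable scale for each column.
    return [[m[j] * V[i][j] / norms[j] for j in range(len(m))] for i in range(len(V))]

W = [[3.0, 1.0], [4.0, 2.0]]     # frozen base weight (d_out=2, d_in=2)
A = [[0.5, -0.5]]                # rank-1 factor, shape (1, d_in)
B_zero = [[0.0], [0.0]]          # zero init, shape (d_out, 1)
m = column_norms(W)              # magnitude starts at the base column norms

start = dora_weight(W, A, B_zero, m)              # reproduces W at initialization
trained = dora_weight(W, A, [[1.0], [2.0]], m)    # direction changed, magnitude fixed
print(start)
print(column_norms(trained), m)
```

With `B` at zero and `m` set to the base column norms, the adapted weight equals the base weight. Changing `B` rotates each column without changing its length, and changing `m` rescales it without changing its direction, so the two can be learned independently.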
Pricing
Fine-tunes from $1,000.
Scope depends on the model, the data, and what you want it to do. Tell us, and we will quote.