Pylox Forge.

Rented intelligence, or owned behavior.

Trained by us. Served on cloud GPU or on-prem. You choose.

The stack

Four layers. Composable.

Every model we ship gets the same optimization stack: a fine-tuned base, hardware-aware quantization, and one of two speculative-decoding drafters. Deployable on cloud GPU or on-prem.

01

Base model fine-tune

A strong open base model, continued pretraining on your data, in an air-gapped environment. The model speaks your domain natively. You own the weights.

02

Hardware-aware quantization

4-bit floating point on Blackwell, FP8 on Hopper. 2 to 3 times the throughput of BF16 with no measurable quality loss.
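To make the 4-bit idea concrete, here is a minimal sketch of block-scaled quantization onto a 4-bit float grid. It assumes an E2M1 element layout (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and keeps one full-precision scale per block. Production formats also quantize the scale and fix the block size. The function names and sample weights are hypothetical, and this is an illustration, not our calibration pipeline.

```python
# Representable magnitudes of a 4-bit E2M1 float (assumed layout; sign is separate).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block):
    """Map one block of floats to the FP4 grid using a shared scale."""
    amax = max(abs(x) for x in block)
    # Scale so the largest magnitude in the block lands on the top grid value.
    scale = amax / FP4_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        # Round the scaled magnitude to the nearest representable value.
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate weights from the scale and the grid values."""
    return [scale * c for c in codes]


# Hypothetical sample weights, for illustration only.
weights = [0.12, -0.40, 0.03, 0.75, -0.66, 0.21, 0.00, -0.09]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(scale, codes, max_err)
```

The widest gap on the grid is between 4 and 6, so the rounding error of any weight is at most one scale unit. Smaller blocks give tighter scales, at the cost of storing more of them.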

03

EAGLE-3 speculative decoding

A 150M-parameter autoregressive drafter proposes K tokens. The base model verifies them in parallel. 3 to 4 times faster than plain autoregressive decoding. Ships first; the drafter trains in hours.
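The draft-and-verify loop described above can be sketched in a few lines. This is generic greedy speculative decoding, not the EAGLE-3 architecture itself, which drafts from the base model's internal features. The two toy "models" are hypothetical stand-ins, and the verification loop only mimics the result of what a real system computes in one batched forward pass.

```python
def speculative_step(target_next, draft_next, prefix, k):
    """Run one draft-and-verify step and return the tokens it emits."""
    # The drafter proposes k tokens, one after another.
    draft = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))

    # The target checks each proposal in order.
    accepted = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token, discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every proposal matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(target_next, draft_next, prompt, n_tokens, k=4):
    """Generate n_tokens and count how many verification steps were needed."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out += speculative_step(target_next, draft_next, out, k)
        steps += 1
    return out[len(prompt):len(prompt) + n_tokens], steps


# Hypothetical toy models: the target counts upward mod 10,
# and the drafter agrees with it except right after a 7.
def target_next(seq):
    return (seq[-1] + 1) % 10


def draft_next(seq):
    return 0 if seq[-1] == 7 else (seq[-1] + 1) % 10


tokens, steps = generate(target_next, draft_next, prompt=[0], n_tokens=12, k=4)
print(tokens, steps)
```

The output is identical to what the target would produce alone. The gain is that the target runs once per step rather than once per token.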

04

Block diffusion drafter

A 1B-parameter block-diffusion drafter proposes B tokens in a single denoising step. The base model verifies them. 4 to 6 times faster than plain autoregressive decoding. The upgrade path from EAGLE-3; the drafter trains in days.
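Why a longer draft helps, and where it stops helping, can be estimated with a simplified model. The sketch below assumes each drafted token is accepted independently with the same probability, which real drafters only approximate. The function name and the acceptance rate are illustrative assumptions, not measurements of either drafter.

```python
def expected_tokens_per_pass(accept_rate, draft_len):
    """Expected tokens emitted per verification pass of the base model.

    Assumes each drafted token is accepted independently with probability
    accept_rate. The base model always contributes one token of its own,
    so the result is at least 1 and at most draft_len + 1.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Illustrative acceptance rate; compare short and long drafts.
for draft_len in (4, 8, 16):
    print(draft_len, round(expected_tokens_per_pass(0.8, draft_len), 2))
```

This counts tokens per base-model pass only. Drafter time is ignored, so end-to-end speedup is lower than the number printed.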

Configuration                          Single user      Aggregate
BF16 baseline                          80-100 tps       800-1,000 tps
Quantized                              200-300 tps      1,500-2,500 tps
Quantized + EAGLE-3 drafter            500-800 tps      3,000-4,000 tps
Quantized + block diffusion drafter    800-1,200 tps    6,000-8,000 tps

100+ concurrent users / GPU
< 300 ms time to first token
6-8x cheaper to serve
Owned weights, deployable anywhere

Adapter heads

One base. Many heads.

A LoRA or DoRA adapter is a pluggable head on top of the same base model. Legal head, customer service head, code head. Same base in memory. Heads hot-swap at inference.

Low-Rank Adaptation

LoRA

Freeze the base model. Add a pair of small low-rank matrices whose product approximates the weight update full fine-tuning would have made. Trains 0.1 to 1% of the parameters. Adapter file is 50 to 200 MB.
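A minimal sketch of one linear layer with swappable LoRA heads follows, in plain Python for readability. The class name, adapter names, and dimensions are hypothetical. A real deployment applies this to many layers at once and runs on tensors, not lists.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen base weight (d_out x d_in) plus named low-rank adapters."""

    def __init__(self, weight):
        self.weight = weight   # frozen; shared by every adapter
        self.adapters = {}     # name -> (A, B, scale)
        self.active = None     # name of the adapter in use, or None

    def add_adapter(self, name, a, b, alpha):
        rank = len(a)          # A is rank x d_in, B is d_out x rank
        self.adapters[name] = (a, b, alpha / rank)

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.active is not None:
            a, b, scale = self.adapters[self.active]
            # Compute B(Ax) so the full d_out x d_in update is never formed.
            delta = matvec(b, matvec(a, x))
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


def lora_fraction(d_in, d_out, rank):
    """Adapter parameters as a fraction of the layer's base parameters."""
    return rank * (d_in + d_out) / (d_in * d_out)


d = 4
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(W)
A = [[0.5] * d]  # rank 1
layer.add_adapter("legal", A, [[0.0] for _ in range(d)], alpha=1)  # B = 0: no-op
layer.add_adapter("code", A, [[1.0] for _ in range(d)], alpha=1)

x = [1.0, 2.0, 3.0, 4.0]
base_out = layer.forward(x)
layer.active = "legal"
legal_out = layer.forward(x)
layer.active = "code"   # hot-swap: the base weight is untouched
code_out = layer.forward(x)
print(base_out, legal_out, code_out, lora_fraction(4096, 4096, 16))
```

Swapping a head is a change of one name. For a hypothetical 4096 by 4096 layer at rank 16, the adapter holds under 1% of the layer's parameters, in line with the range quoted above.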

Weight-Decomposed Low-Rank Adaptation

DoRA

A refinement of LoRA. Decomposes each pretrained weight into a magnitude and a direction, trains the magnitude directly, and applies the low-rank update to the direction. Quality lands closer to full fine-tuning at a similar parameter cost.
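The magnitude and direction split can be sketched as follows. This assumes a weight stored as d_out rows and a norm taken per output row, which is one common convention. The `delta` argument stands for the low-rank product from a LoRA-style adapter, passed in already multiplied out for brevity. All names and values are illustrative.

```python
import math


def row_norms(m):
    """Euclidean norm of each row of a matrix (list of rows)."""
    return [math.sqrt(sum(v * v for v in row)) for row in m]


def dora_weight(w0, delta, magnitude):
    """Recompose the adapted weight: magnitude * (W0 + delta) / norm(W0 + delta).

    The direction comes from the frozen weight plus the low-rank update;
    the length of each row is set only by the trainable magnitude vector.
    """
    v = [[a + b for a, b in zip(r0, rd)] for r0, rd in zip(w0, delta)]
    return [
        [m * x / n for x in row]
        for row, m, n in zip(v, magnitude, row_norms(v))
    ]


w0 = [[3.0, 4.0], [0.0, 2.0]]
magnitude = row_norms(w0)  # initialised to the base weight's own norms

# With no update, the base weight is reproduced.
same = dora_weight(w0, [[0.0, 0.0], [0.0, 0.0]], magnitude)

# An update to the first row changes its direction but not its length.
turned = dora_weight(w0, [[1.0, 0.0], [0.0, 0.0]], magnitude)
print(same, turned, row_norms(turned))
```

Because length and direction are trained through separate parameters, the adapter can rotate a weight without rescaling it, or rescale without rotating.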

Hours to train
100 MB to store
Reversible: base stays unchanged
Yours: deliverable, no lock-in

Pricing

Fine-tunes from $1,000.

Scope depends on the model, the data, and what you want it to do. Tell us, and we will quote.

If we do not beat your baseline by the agreed range, you get your money back.