Pylox Forge.

Rented intelligence, or owned behavior.

huggingface.co/pyloxsystemsgithub.com/pyloxsystems

Trained by us. Served on cloud GPU or on-prem. You choose.

The stack

Four layers. Composable.

Every model we ship gets the same optimization stack: a fine-tuned base, hardware-aware quantization, and one of two speculative-decoding drafters. Deployable on cloud GPU or on-prem.

Base model fine-tune

Strong open base, continued pretrain on your data, air-gapped. The model speaks your domain natively. You own the weights.

Hardware-aware quantization

4-bit floating point on Blackwell, FP8 on Hopper. 2 to 3 times more throughput than BF16 with no measurable quality loss.

EAGLE-3 speculative decoding

A 150M-parameter autoregressive drafter proposes K tokens. The base model verifies in parallel. 3 to 4 times faster than vanilla autoregressive. Ships first, drafter trains in hours.

Block diffusion drafter

A 1B block-diffusion drafter proposes B tokens in a single denoising step. The base model verifies. 4 to 6 times faster than autoregressive. Upgrade path from EAGLE-3, drafter trains in days.

Configuration

Single user

Aggregate

BF16 baseline

80-100 tps

800-1,000 tps

Quantized

200-300 tps

1,500-2,500 tps

Quantized + EAGLE-3 drafter

500-800 tps

3,000-4,000 tps

Quantized + block diffusion drafter

800-1,200 tps

6,000-8,000 tps

100+

concurrent users / GPU

< 300ms

time to first token

6-8x

cheaper to serve

Owned

weights, deployable anywhere

Adapter heads

One base. Many heads.

A LoRA or DoRA adapter is a pluggable head on top of the same base model. Legal head, customer service head, code head. Same base in memory. Heads hot-swap at inference.

Low-Rank Adaptation

LoRA

Freeze the base model. Add a small low-rank matrix that approximates the weight update you would have made. Trains 0.1 to 1% of the parameters. Adapter file is 50 to 200 MB.

Weight-Decomposed Low-Rank Adaptation

DoRA

LoRA's successor. Splits the weight update into magnitude and direction, trains both separately. Quality lands closer to full fine-tuning at the same parameter cost.

Hours

to train

100 MB

to store

Reversible

base stays unchanged

Yours

deliverable, no lock-in

Pricing

Fine-tunes from $1,000.

Scope depends on the model, the data, and what you want it to do. Tell us, and we will quote.

If we do not beat your baseline by the agreed range, you get your money back.

[email protected]