What you’ve probably lived through

The expensive GPU cluster that sat idle for six weeks because nobody could get the networking right. The training run that died at hour forty because storage throughput collapsed under checkpoint writes. The driver and CUDA version matrix that eats a researcher-day every time anything updates. The inference endpoint a researcher exposed to the internet with a default config and no auth, found by a port scan before anyone inside noticed. The model weights that count as the company’s crown jewels, sitting on the same flat network as the office printers.

If any of that sounds familiar, we’ve worked through it before.

Tell us what’s breaking →

Why we’re worth ten minutes

The Craftwork Group is a young entity. The people doing the work aren’t. Our team brings more than a hundred years of combined IT experience, has worked together for more than 14 years, and runs GPU infrastructure in production today. Our production H100 cluster isn’t a case study. It’s our reference architecture.

That distinction matters in this vertical more than most. Plenty of shops will quote you a cluster build from a parts list and a vendor whitepaper. Far fewer have lived with one: the thermals under sustained load, the storage layout that keeps checkpoint writes from starving the data loaders, the Proxmox passthrough configuration that survives a kernel update, the difference between an inference stack that demos well and one that holds a queue at 2 AM.

What working with us actually looks like. Helpdesk opens at 6:30 AM and runs until 5:00 PM, with on-call coverage after hours. Standard SLA is one-hour response; in practice, most calls get answered live as they come in. We treat phone calls as priority because if you’re calling, it’s urgent. The only thing that bumps a live call is a monitoring alert flagging a system down or under attack. We’re often the ones who tell you something broke before you noticed. Often we have it fixed before you’d have called.

A note on the rest of the field. We’ve spent the last few months calling MSPs while posing as a buyer, to see how the market actually operates. More than seventy-five percent never picked up the phone. None returned the sales inquiry. If you’ve shopped for IT support before, that probably tracks. We don’t work that way.

We work as a systems integrator, not a reseller. Your environment gets built around your research workflow: who needs raw access, what gets containerized, where the security boundary sits between experiment and production. Built for teams that want the infrastructure to disappear into the background.

What we actually do for AI and ML research teams

// gpu · proxmox · passthrough · cluster-design · thermals · capacity

GPU cluster design and Proxmox deployment

Cluster architecture from the rack up: GPU selection against your actual workload, Proxmox virtualization with passthrough that survives updates, power and thermal planning for sustained load rather than benchmark bursts, and growth paths that don’t require a forklift. Built to be in use from week one, because idle accelerators are the most expensive hardware you own.

// vllm · ollama · inference · containers · orchestration · monitoring

Inference stack configuration

vLLM and Ollama stacks configured for the models you actually serve: quantization choices made deliberately, container orchestration that lets researchers ship without tickets, queue behavior tested under real concurrency, and monitoring that distinguishes a slow model from a dying node. The goal is an inference layer your team trusts enough to build on.

// nvme · zfs · high-throughput · datasets · checkpoints · backup

High-throughput storage for training data

NVMe and ZFS storage engineered for the access patterns training actually produces: sequential dataset streaming, parallel data-loader reads, and checkpoint write bursts that would flatten a general-purpose NAS. Snapshots and backup policies that protect months of training investment without throttling the next run.

// air-gap · model-weights · api-exposure · segmentation · access-control

Air-gapped deployment and secure API exposure

Proprietary weights and training data treated like the assets they are: air-gapped deployment where the threat model calls for it, network segmentation that separates research, production, and office traffic, and API exposure that is deliberate, authenticated, and logged. When a model does face the internet, it faces it on purpose, behind controls you can describe to an investor or an auditor.

Where we fit in your AI work (and where we don’t)

We run the substrate. You do the research. Our job is the layer where research time goes to die: drivers, networking, storage, orchestration, security, and the operational discipline that keeps a shared cluster fair and alive. We don’t pretend to advise on model architectures, training methodology, or your research direction.

What we won’t do: sell you hardware you don’t need, or a managed-AI abstraction that puts us between you and your own models. If a cloud instance is honestly the better economics for your stage, we’ll say so.

Your model choice. Your API keys. Your weights and your data on infrastructure you own. We configure, deploy, and operate; you keep the keys and the option to take it all elsewhere.

What happens if you reach out

No phone tree. No demo deck. A real conversation about what’s breaking in your environment, what you’ve already tried, and whether there’s a path where we’d actually be useful. If we’re not the right fit, we’ll tell you and point you somewhere honest.

If we are a fit, the next step is an infrastructure audit: a close look at your cluster, storage, and security posture, followed by a written assessment of what we’d do and what it would cost. No obligation past that point.

Book the 30-minute call →

Infrastructure built by people who run it in production