AI & ML Research
GPU clusters, inference stacks, and high-throughput storage for research teams whose hardware should be training, not waiting on networking.
AI and ML workloads have infrastructure requirements most MSPs have never seen: GPU passthrough, high-throughput storage, RDMA networking, container orchestration for inference, and the security posture that comes with proprietary model weights and training data. The expensive failure mode isn’t hardware. It’s a cluster sitting idle while researchers fight the environment instead of using it.
Tell us what’s breaking →
The expensive GPU cluster that sat idle for six weeks because nobody could get the networking right. The training run that died at hour forty because storage throughput collapsed under checkpoint writes. The driver and CUDA version matrix that eats a researcher-day every time anything updates. The inference endpoint a researcher exposed to the internet with a default config and no auth, found by a port scan before anyone inside noticed. The model weights that count as the company’s crown jewels, sitting on the same flat network as the office printers.
If any of that sounds familiar, we’ve worked through it before.
The Craftwork Group is a young company. The people doing the work aren’t. Our team brings more than a hundred years of combined IT experience, has been doing this work since 1999, and runs GPU infrastructure in production today. The H100 cluster we operate isn’t a case study. It’s our reference architecture.
That distinction matters more in this vertical than in most. Plenty of shops will quote you a cluster build from a parts list and a vendor whitepaper. Far fewer have lived with one: the thermals under sustained load, the storage layout that keeps checkpoint writes from starving the data loaders, the Proxmox passthrough configuration that survives a kernel update, the difference between an inference stack that demos well and one that holds a queue at 2 AM.
What working with us actually looks like
Our helpdesk opens at 6:30 AM and runs until 5:00 PM, with on-call coverage after hours. Our standard SLA is a one-hour response; in practice, most calls get answered live as they come in. We treat phone calls as priority, because if you’re calling, it’s urgent. The only thing that bumps a live call is a monitoring alert flagging a system that’s down or under attack. We’re often the ones who tell you something broke before you noticed, and in many of those cases it’s already fixed before you’d have called.
A note on the rest of the field
Over the last few months, we’ve called MSPs while posing as a buyer to see how the market actually operates. More than seventy-five percent never picked up the phone. None returned our sales inquiry. If you’ve shopped for IT support before, that probably tracks. We don’t work that way.
We work as a systems integrator, not a reseller. Your environment gets built around your research workflow: who needs raw access, what gets containerized, where the security boundary sits between experiment and production. Built for teams that want the infrastructure to disappear into the background.
Cluster architecture from the rack up: GPU selection against your actual workload, Proxmox virtualization with passthrough that survives updates, power and thermal planning for sustained load rather than benchmark bursts, and growth paths that don’t require a forklift. Built to be in use from week one, because idle accelerators are the most expensive hardware you own.
vLLM and Ollama stacks configured for the models you actually serve: quantization choices made deliberately, container orchestration that lets researchers ship without filing tickets, queue behavior tested under real concurrency, and monitoring that distinguishes a slow model from a dying node. The goal is an inference layer your team trusts enough to build on.
NVMe and ZFS storage engineered for the access patterns training actually produces: sequential dataset streaming, parallel data-loader reads, and checkpoint write bursts that would flatten a general-purpose NAS. Snapshots and backup policies that protect months of training investment without throttling the next run.
Proprietary weights and training data treated like the assets they are: air-gapped deployment where the threat model calls for it, network segmentation that separates research, production, and office traffic, and API exposure that is deliberate, authenticated, and logged. When a model does face the internet, it faces it on purpose, behind controls you can describe to an investor or an auditor.
We run the substrate. You do the research. Our job is the layer where research time goes to die: drivers, networking, storage, orchestration, security, and the operational discipline that keeps a shared cluster fair and alive. We don’t pretend to advise on model architectures, training methodology, or your research direction.
What we won’t do: sell you hardware you don’t need, or a managed-AI abstraction that puts us between you and your own models. If cloud instances honestly make better economic sense at your stage, we’ll say so.
Your model choice. Your API keys. Your weights and your data on infrastructure you own. We configure, deploy, and operate; you keep the keys and the option to take it all elsewhere.
No phone tree. No demo deck. A real conversation about what’s breaking in your environment, what you’ve already tried, and whether there’s a path where we’d actually be useful. If we’re not the right fit, we’ll tell you and point you somewhere honest.
If we are a fit, the next step is an infrastructure audit. A close look at your cluster, storage, and security posture, then a written assessment of what we’d do and what it would cost. No obligation past that point.
Book the 30-minute call →