All systems operational · 99.99% uptime SLA

The compute layer for
AI at scale AI-Created Art

GPU-powered workloads, run serverless AI functions, and access production-
ready model endpoints — all on one unified platform.

GPU Models
0 +
Peak Compute
0 TB/s
API Calls / Day
0 M+
Per GPU/sec
0
Certified
SOC 2

GPU CLOUD

Raw GPU power, on demand

From single A100s to multi-node H100 clusters — spin up bare-metal
GPU capacity in under a minute with no scheduling queues.

Dedicated GPU Instances

Full-node access with NVLink, PCIe Gen5, and direct NVMe storage. No noisy neighbors. Your workload gets all the bandwidth.
FP16 Throughout
0 .9 PFLOPS
HBM3 Bandwidth
0 .5 TB/s
NVLink Bandwidth
0 GB/s

RTX 4090 24 GB

from $ 0.39/hr

RTX 4090 24 GB

from $ 0.39/hr

Instant Provisioning

FBare-metal GPU instances ready in under 60 seconds. No queue, no waiting. Just compute when you need it.

NVLink Clusters

Multi-GPU nodes interconnected at 900 GB/s. Train 70B+ parameter models without bandwidth bottlenecks.

Real-Time Telemetry

Live GPU utilization, memory, power draw, and temperature. Full Prometheus metrics export included.

Persistent Storage

Attach high-speed NVMe volumes up to 25 TB. Data persists between runs, no re-download overhead.

SERVERLESS REPOS

Deploy. Ship. Sleep.
No ops required AI-Created Art

Push your AI code to a repo. We handle containers, scaling, cold
starts, and infrastructure — you ship features.

Auto-containerization

Push raw Python — we detect your stack, pin the right CUDA version, and build a production container automatically. No Dockerfile required.

Burst to Zero Scaling

Scale from zero to hundreds of GPU workers in seconds based on queue depth. Pay only for active inference time, not idle compute.

Git-native CI/CD

Connect your GitHub or GitLab repo. Every push to main triggers an automatic redeploy with zero-downtime blue/green rollout.

Global Edge Routing

Your functions run across 8 global regions. Requests are auto-routed to the nearest warm replica for minimum latency.

Featured Community Repos

llm-inference-server

Production-ready vLLM wrapper with OpenAI-compatible API, streaming support, and auto batching for 70B+ models.

diffusion-api

SDXL + ControlNet in a single serverless endpoint. Handles batching, LoRA hot-swap, and NSFW filtering out of the box.

whisper-streaming

Real-time audio transcription via WebSocket. WhisperX large-v3 with speaker diarization and VAD pre-processing.

embeddings-service

High-throughput text embedding with BGE-M3. Handles 100k+ documents/min with auto chunking and async queuing.

diffusion-api

SDXL + ControlNet in a single serverless endpoint. Handles batching, LoRA hot-swap, and NSFW filtering out of the box.

whisper-streaming

Real-time audio transcription via WebSocket. WhisperX large-v3 with speaker diarization and VAD pre-processing.

PUBLIC ENDPOINTS

150+ Models.
one API Key AI-Created Art

Skip the deployment. Call state-of-the-art models directly via REST or WebSocket — no infra, no cold start, no setup time.

/v1/chat/completions

-38ms

/v1/images/generate

-2.1s

/v1/embeddings

-12ms

/v1/audio/transcribe

-0.9s

/v1/stream/llm

-22ms

/v1/rerank

-18ms

Available Models

PROCESS

How it Works

Deploy GPUs, serverless workloads, and public AI endpoints
with speed, simplicity, and scale.

Define your Handler

Write a Python function with our decorator. No YAML, no Kubemetes, no boilerplate. Just your model logic.

Push & Deploy

Run nf deploy. We build your container, cache your model, and provision GPU replicas globally.

Define your Handler

Write a Python function with our decorator. No YAML, no Kubemetes, no boilerplate. Just your model logic.

Launch your first

Deployment Today

GPU-powered workloads, run serverless AI functions, and access production-
ready model endpoints — all on one unified platform.

PRODUCT

Cloud GPUs

Serverless

Public Endpoints

Hub

RESOURCES

Blogs

Case Studies

Referral Program

Articles

Pricing

COMPANY

About

Contact

Careers

Privacy Policy

Terms & Conditions