Slug: self-hosted-ai-stack-under-500Series: Self-Hosted Everything #1 Tags: Self-Hosted, AI Infrastructure, Coolify, Supabase, vLLM Target length: 3,500 words CTA: Gumroad — Self-Hosted AI Infrastructure Kit Publish: Mar 25, 2026 — 9:00 AM IST Platform: mrgulshanyadav.misar.blog/articles/self-hosted-ai-stack-under-500
A full production AI stack — LLM inference, vector database, auth, cache, email, CI/CD — runs under $500/month on self-hosted VPS
Managed equivalents (Supabase.com, Redis Cloud, Vercel, Resend, GitHub Actions) would cost $1,800–$2,400/month for the same workload
The crossover point is roughly 3 products or $2,000+ in managed spend — below that, managed wins on time savings
vLLM beats Ollama for concurrent production traffic; Coolify removes most of the DevOps overhead
Self-hosting requires 4–8 hours/week of ops time — budget this honestly before committing
I run five AI products — MisarIO, MisarMail, MisarDev, MisarBlog, and Assisters — on two Hetzner VPS instances. The main VPS (€38/month, 8-core, 32GB RAM) runs Coolify as the deployment platform, Forgejo for Git hosting, Redis for caching and queues, and Mailcow for transactional email. A second Hetzner VPS (€54/month) runs five separate Supabase stacks — one per product — each with its own PostgreSQL instance, GoTrue auth, PostgREST API, and S3-compatible storage. For LLM inference, I use a GPU instance (Hetzner GPU Cloud, pay-per-use, ~$80–120/month depending on workload) running vLLM with Llama 3.3 70B. Total monthly cost: approximately $210–250 for fixed infrastructure, $80–120 for GPU inference = $290–370/month total.
That's the complete picture. Everything below explains the why, the how, and what you'd actually need to replicate it.
Most self-hosting arguments start with "you own your data" or "no vendor lock-in." Those are real benefits but they're not why I made the decision. The actual reasons are more concrete.
1. The math at multiple products doesn't add up for managed.
When you run one product, managed services are clearly cheaper. Supabase.com at $25/month, Vercel Pro at $20/month, Redis Cloud at $15/month, Resend at $20/month — you're at $80/month before you've thought about it. That's fine for one product.
Now add a second product. Same costs again. Third product. The managed services scale per-project, not per-company. By the time I had three AI products with real workloads, managed was going to cost $400–500/month — just for the basic infrastructure layer, before LLM API costs.
Self-hosted infrastructure amortises across all products. The Supabase VPS running five stacks costs $54/month total. Five managed Supabase Pro plans would cost $125/month.
2. Predictability over elasticity.
Managed services bill on usage. That's great when you're growing and don't know your volumes yet. It's less great when you have a LLM feature that unpredictably calls the OpenAI API 100,000 times because of a bug, and you get a $900 bill for it.
Self-hosted infrastructure has a fixed cost. My Redis instance doesn't bill me more when I run 500,000 cache operations. My PostgreSQL instance doesn't charge per query. The ceiling is defined by VPS capacity, not by a pricing page I need to re-read every month.
3. Data sovereignty is a real constraint for AI products, not a philosophy.
If you're building AI products that process customer data, you need to think about where that data lives. For EU customers, GDPR applies. For Indian users, India's DPDP Act (2023) has specific provisions about cross-border data transfers. For healthcare or legal use cases, it gets stricter.
Running Supabase on a Hetzner VPS in Helsinki means I know exactly where every byte is stored. I can point to a specific server in a specific data centre. That's not possible when you're using Supabase.com — you're trusting their infrastructure choices.
Hetzner is the best value compute I've found for European-hosted infrastructure. The pricing is roughly 30–40% cheaper than DigitalOcean or Linode for equivalent specs, and the network performance is solid.
Main VPS (CX41, €38/month):
| Specification | Details |
|---|---|
| CPU/RAM | 8 vCPUs, 32GB RAM, 240GB NVMe |
| Services | Coolify, Forgejo, act_runner (CI), Redis, Mailcow, Nginx/Traefik |
| Use Case | Up to ~5 apps with moderate traffic, Git hosting, background jobs |
Supabase VPS (CCX22, €54/month):
| Specification | Details |
|---|---|
| CPU/RAM | 4 dedicated vCPUs, 16GB RAM, 240GB SSD |
| Services | 5× Supabase stacks (one per product) in Docker |
| Port Mapping | 5433–5437 for PostgreSQL, 8001–8005 for Kong API |
| Storage | Separate Hetzner Volume (€4.80/month per 100GB) mounted for each product |
The two-VPS architecture separates concerns cleanly. If the main VPS has an issue, the databases are unaffected. If I need to scale compute for apps, I don't touch the database VPS.
Coolify is a self-hosted PaaS that handles Docker deployment, reverse proxy (Traefik), SSL certificate management (Let's Encrypt), environment variable management, and basic monitoring. Think Vercel or Railway — but running on your own server.
What Coolify gets right:
What Coolify doesn't replace:
Setup time for Coolify: about 30 minutes on a fresh VPS. Ongoing maintenance: near zero.
Supabase open-source is the same software as Supabase.com, packaged as a Docker Compose stack. You get PostgreSQL, GoTrue (auth), PostgREST (auto-generated REST API), Realtime (websockets), Storage (S3-compatible), and Supabase Studio (the dashboard).
The main consideration is running multiple Supabase stacks on one VPS. Each stack consumes about 1–1.5GB RAM at idle. With 16GB RAM, you can comfortably run 8–10 stacks. Five stacks uses about 7GB RAM, leaving headroom for queries and connections.
Per-product Supabase stacks means:
The tradeoff: you manage five PostgreSQL instances instead of one. Backup scripts run per-instance. Schema migrations run per-product. It's more to manage, but the isolation is worth it at the product level.
PostgreSQL configuration I use across all stacks:
max_connections = 100
shared_buffers = 256MB
work_mem = 16MB
maintenance_work_mem = 128MB
This is conservative — the defaults work fine for most early-stage products.
A single Redis instance (the misar-redis container) runs on the main VPS and serves all five products via Docker network. All apps connect to it via redis://:<password>@misar-redis:6379 — internal Docker network, never exposed publicly.
Using Redis for:
Redis is simple to run self-hosted. No managed Redis service I've seen is worth the cost at moderate volumes. Redis Cloud's cheapest paid plan is $15/month for 250MB. The Redis instance on my VPS uses about 180MB for five products combined.
Mailcow is a self-hosted email server that handles SMTP, IMAP, and DKIM/DMARC/SPF configuration. I use it for transactional email via the MisarMail API layer (which is a thin API wrapper over the SMTP connection).
Running your own mail server in 2026 is viable if you do the DNS configuration correctly. The three requirements:
I haven't had deliverability issues with this setup for transactional email. For high-volume marketing email, I'd reconsider.
Forgejo is a self-hosted Git platform (forked from Gitea). It runs on the main VPS, handles all source code, and integrates with act_runner for CI/CD.
act_runner executes Forgejo Actions workflows — the same YAML format as GitHub Actions. All CI runs on the VPS instead of GitHub's cloud runners.
Why this matters for AI products: LLM application tests often involve API keys, model configs, and infrastructure-specific environment variables that you don't want to expose to a third-party CI system. Running CI on your own infra keeps those credentials internal.
This is the variable-cost part of the stack. Hetzner GPU Cloud (A100 40GB, €2.49/hour) runs vLLM serving Llama 3.3 70B for inference.
I don't run a persistent GPU instance. Instead, I use a startup script that provisions the instance when a product feature needs LLM inference capacity, and scales it down during low-traffic periods. Average utilisation across the month translates to about 35–45 hours of GPU time — roughly $87–112/month.
Why vLLM, not Ollama: Ollama is the right choice for local development and single-user scenarios. For concurrent production traffic — multiple users hitting your API simultaneously — vLLM's continuous batching and PagedAttention algorithms give significantly better throughput. At 10 concurrent requests, vLLM handles the queue in roughly the same time it takes Ollama to process 3. The difference compounds under real load.
vLLM deployment is more complex than Ollama (requires CUDA drivers, more memory management config), but it's the production-grade choice.
| Service | Self-Hosted Cost | Managed Equivalent | Managed Cost |
|---|---|---|---|
| App hosting (5 apps) | Coolify on CX41 — €38/mo | Vercel Pro × 5 | $100/mo |
| Database + Auth (5 products) | Supabase VPS (CCX22) — €54/mo | Supabase Pro × 5 | $125/mo |
| Redis | Included in main VPS | Redis Cloud Essentials × 5 | $75/mo |
| Email (transactional) | Mailcow on main VPS | Resend Pro | $89/mo |
| Git hosting | Forgejo on main VPS | GitHub Team | $16/mo |
| CI/CD | act_runner on main VPS | GitHub Actions minutes | $30–50/mo |
| LLM inference | vLLM on GPU Cloud — ~$100/mo | OpenAI GPT-4o (equiv. volume) | $800–1,200/mo |
| Storage (500GB total) | Hetzner Volumes — €24/mo | S3/Supabase Storage | $60/mo |
| Total | ~$310–370/mo | ~$1,295–1,655/mo |
The LLM inference column deserves attention. At equivalent inference volume, self-hosted Llama 3.3 70B costs roughly $0.0008–0.001 per 1K tokens (GPU time amortised). GPT-4o is $0.005–0.015 per 1K tokens input/output. At 50M tokens/month, that's a $250 vs $400–750 difference — just on that one service.
Self-hosting isn't free. The cost that doesn't appear in the table above:
Engineering time: 4–8 hours/week ongoing. This includes monitoring alerts, package/security updates, occasional incident response, and infrastructure changes. If your time is worth $100/hour, that's $400–800/week of implicit cost — far more than the managed alternative. Self-hosting only makes financial sense if you have the engineering capacity to absorb this.
On-call responsibility. If Redis goes down at 2am, it's your problem. If Supabase.com has an outage at 2am, it's their problem. Managed services include a support and SRE team in the price. Self-hosted means you are the SRE.
Managed services move faster. Supabase.com ships new features — branching, edge functions, realtime improvements — faster than the open-source release cycle. If you need those features immediately, managed has an advantage.
Cold-start complexity. Setting up this stack from scratch takes about 2–3 weeks of focused work. Supabase.com + Vercel + Redis Cloud takes about 2 hours.
Self-hosting starts making sense when two conditions are met:
Condition 1: You're running 3+ products or services on the same infrastructure. Fixed overhead amortises across all products. Below three products, the engineering time cost exceeds the managed service savings for most teams.
Condition 2: Your managed service spend exceeds $800–1,000/month. At that level, self-hosted infrastructure pays for itself within 3–4 months including setup time. Below that, the time investment rarely pencils out unless you have a specific compliance or data sovereignty requirement.
If you're at one product, under $500/month in managed costs, and don't have a specific technical or regulatory reason to self-host — use managed services. They're genuinely better for that stage.
If you're at three or more products, spending $1,000+/month on infrastructure, and have the engineering bandwidth to maintain a VPS — self-hosting is the right call.
This is the order I'd follow if I were setting this up again:
Week 1: Compute and networking
Week 2: Core services
Week 3: Supabase and apps
Ongoing:
The biggest operational risk with self-hosted infrastructure isn't the setup — it's not knowing something is broken until a user tells you.
Minimum monitoring stack:
Alerts I actually act on:
Monitoring setup takes about 4 hours. Skipping it costs you hours of debugging the first time something silently fails.
This article covered the full stack overview. Upcoming in Self-Hosted Everything:
If you want to skip the 3-week setup and start from a working template, I've packaged the full infrastructure configuration — Docker Compose files for every service, Coolify configuration, Supabase VPS setup scripts, vLLM deployment scripts, and the monitoring stack — into the Self-Hosted AI Infrastructure Kit.
It's the exact setup documented in this article, ready to deploy.
→ Self-Hosted AI Infrastructure Kit — $19
1 followers
AI systems builder · 7 years in production. RAG, self-hosted infra, agent architecture. 📬 Deep-dives → mrgulshanyadav.substack.com
Comments
Sign in to join the conversation
No comments yet. Be the first to share your thoughts!