Tech behind the tool

Vadim Kravcenko
Sep 13, 2024 · 5 min read

When I set out to build SEOJuice, the first real decision wasn't a feature or a pricing model. It was a question I kept coming back to: do I reach for the stack I already know, or the one that looks impressive on a "How We Built It" blog post? I chose boring. And I'd do it again, because boring technology that you understand deeply will outperform trendy technology you're learning on the fly every single time. (Ask me about the three weeks I lost evaluating a graph database before admitting PostgreSQL could do everything I needed.)

This article is a transparent look at every layer of SEOJuice's infrastructure -- not just the final choices, but the trade-offs behind them. If you're a founder evaluating your own stack, I hope the reasoning is more useful than the labels.

Tech-stack Philosophy

My philosophy boils down to three rules I learned the hard way running side projects for years before SEOJuice:

Rule 1: Own the critical path. If the core functionality of your product depends on a third-party API, you've handed someone else a kill switch. SEOJuice's crawling engine, link analysis, and scoring algorithms all run on infrastructure I control. The moment you outsource your core differentiator, you stop being a product and start being a reseller.

Rule 2: Pick technologies you can debug at 2 AM. I considered FastAPI for the backend. It's faster in benchmarks, more modern, and the async story is cleaner. But I've been writing Django since 2016. When Sentry fires at midnight, I don't want to spend 40 minutes reading documentation -- I want to know exactly which middleware is misbehaving. Familiarity isn't laziness; it's operational maturity.

Rule 3: Minimize moving parts. Every additional service in your architecture is another thing that can fail, another thing that needs monitoring, another dependency to update. I count the services in my docker-compose file the way a backpacker counts ounces.

Back-End Stack

At the heart of SEOJuice is Django. I chose it over Flask (too bare-bones for a product this complex), FastAPI (great for APIs, but I needed Django's admin, ORM, and template engine), and Rails (I don't know Ruby well enough to debug it blind). Django's "batteries-included" philosophy meant I didn't spend the first three months assembling an authentication system, an admin panel, and a migration framework from separate packages. The trade-off is that Django can feel heavy for simple microservices -- but SEOJuice isn't a simple microservice. It's a monolith with 15+ apps, and Django handles that well.

The application runs on ASGI via Uvicorn, sitting behind Nginx as a reverse proxy. I switched from Gunicorn (WSGI) to Uvicorn about a year in, when I added WebSocket support for real-time notifications and the MCP server. That migration was surprisingly painless -- one of the benefits of Django's gradual ASGI adoption. Cloudflare sits in front of everything, handling DNS, SSL termination, and DDoS protection. I've watched Cloudflare absorb traffic spikes that would have taken my servers down without breaking a sweat.
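The shape of that setup, sketched as a minimal Nginx server block (server name and port are placeholders, not SEOJuice's actual config) -- note the Upgrade/Connection headers, which are what WebSocket connections need to pass through the proxy:

```nginx
server {
    listen 443 ssl;
    server_name app.example.com;  # placeholder

    location / {
        proxy_pass http://127.0.0.1:8000;  # Uvicorn (ASGI)
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Required so WebSocket upgrade requests reach Uvicorn
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```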

All servers run on Linux hosted by Hetzner. Why Hetzner over AWS? Honestly, cost. A dedicated Hetzner server with 64GB RAM costs roughly what an equivalent EC2 instance costs for one week. For a bootstrapped product, that difference matters enormously. The trade-off is less automation -- no auto-scaling, no managed Kubernetes. I handle deployments with Docker and some shell scripts. It's not glamorous, but it works, and the money I save goes directly into product development.

The database is PostgreSQL with the pgvector extension. I evaluated MongoDB early on (everyone was using it in 2022) and quickly realized that SEO data is deeply relational -- pages belong to websites, keywords belong to pages, links connect pages to each other. A document store would have meant reimplementing half of what Postgres gives you for free. The pgvector extension was the clincher: it lets me store and query embedding vectors directly alongside the relational data, which powers our content similarity matching and AI features without needing a separate vector database.

For background tasks, I use Celery with Redis as the broker. Celery gets a lot of criticism in the Python community, and some of it is deserved -- the documentation is scattered, the configuration options are overwhelming, and debugging task failures can be maddening. But I haven't found anything better for the sheer volume of background work SEOJuice does: crawling thousands of pages, running NLP analysis, generating reports, monitoring backlinks. I considered Dramatiq and Huey as lighter alternatives. Dramatiq is genuinely good, but Celery's ecosystem (beat scheduler, flower monitoring, django-celery-results) won me over.
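To make that concrete, here is a sketch of what the Celery wiring looks like in a Django settings module -- queue names and schedules are illustrative, not SEOJuice's actual configuration:

```python
# Illustrative Celery settings for a Django project -- queue and task
# names are hypothetical examples, not SEOJuice's real configuration.

CELERY_BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "django-db"  # via django-celery-results

# Route heavy crawl work to its own queue so report generation
# isn't starved behind a 10,000-page crawl.
CELERY_TASK_ROUTES = {
    "crawler.tasks.*": {"queue": "crawl"},
    "reports.tasks.*": {"queue": "reports"},
}

# Periodic jobs handled by celery beat.
CELERY_BEAT_SCHEDULE = {
    "monitor-backlinks": {
        "task": "backlinks.tasks.check_all",
        "schedule": 60 * 60 * 6,  # every six hours
    },
}
```

Separate queues mean you can scale crawl workers independently of everything else -- one of the ecosystem advantages mentioned above.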

Authentication goes through Auth0. I built a custom auth system for a previous project and spent an embarrassing amount of time dealing with password reset edge cases, rate limiting, and MFA flows. Auth0 handles all of that. The cost scales with users, which stings at certain thresholds, but the engineering time it saves is worth multiples of the subscription.

Everything is containerized with Docker. My development environment and production environment run identical containers. This eliminated an entire class of "works on my machine" bugs that plagued my earlier projects. Sentry monitors exceptions in real-time -- I can trace a user-reported bug from their session to the exact line of code in under a minute.

Front-End Stack

I made a deliberate decision to avoid React, Vue, and every other JavaScript framework. The SEOJuice dashboard is server-rendered HTML with Tailwind CSS and Alpine.js for interactivity. This is the decision I get the most pushback on from other developers, and I understand why -- it feels like building with hand tools when power tools exist.

Here's my reasoning: SEOJuice is a data-heavy dashboard. Users look at tables, charts, and reports. They don't need real-time collaborative editing or complex client-side state management. Server rendering means every page load is fast, the initial payload is small, and I don't maintain a separate frontend build pipeline, API layer, or state management library. Alpine.js handles the dropdowns, modals, and interactive filters. For the few places where I need richer interactivity (the real-time monitoring dashboard), I use htmx to swap HTML fragments without a full page reload.
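The htmx pattern boils down to one branch in the view: if the request carries htmx's `HX-Request` header, return only the fragment being swapped; otherwise render the full page. A framework-agnostic sketch (function and markup are hypothetical, not SEOJuice's actual code):

```python
# Sketch of the htmx fragment-swap pattern: return a bare fragment for
# htmx requests, a full document otherwise. Names are illustrative.

def render_monitoring_panel(headers: dict, rows: list[str]) -> str:
    fragment = "<tbody>" + "".join(f"<tr><td>{r}</td></tr>" for r in rows) + "</tbody>"
    if headers.get("HX-Request") == "true":
        # htmx swaps this straight into the existing <table>
        return fragment
    # Full page load: wrap the fragment in the complete document
    return f"<html><body><table id='monitor'>{fragment}</table></body></html>"
```

The same template renders both cases, so there is no separate API layer to keep in sync with the UI.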

Static assets go through Bunny.net CDN. I tested Cloudflare's CDN (which I already use for DNS) but found Bunny's per-request pricing more predictable for my traffic patterns. The difference is marginal -- either would work.

Processing and AI

A significant chunk of what makes SEOJuice useful is processing large volumes of data and using AI for features that would be impossible to build manually.

For language models, I integrate with OpenAI and Claude. I use different models for different tasks based on months of testing: GPT-4o for structured data extraction (it follows JSON schemas more reliably), Claude for content analysis and suggestions (it writes more naturally and handles nuance better). I tried running open-source models locally -- Llama 2, then Llama 3 -- and the quality gap for production use was still too wide to justify the infrastructure cost. That calculus changes every few months, so I revisit it regularly.
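That per-task routing can be expressed as a simple lookup table. A hypothetical sketch -- the model identifiers and task names below are illustrative, not SEOJuice's actual routing:

```python
# Hypothetical per-task LLM routing, following the split described above:
# structured extraction to GPT-4o, content work to Claude.

MODEL_ROUTES = {
    "extract_structured": {"provider": "openai", "model": "gpt-4o"},
    "content_analysis": {"provider": "anthropic", "model": "claude-3-5-sonnet"},
    "suggestions": {"provider": "anthropic", "model": "claude-3-5-sonnet"},
}

def pick_model(task: str) -> dict:
    # Unknown tasks fall back to the structured-extraction route
    return MODEL_ROUTES.get(task, MODEL_ROUTES["extract_structured"])
```

Keeping the mapping in one place makes the "revisit it regularly" part cheap: re-running the evaluation means editing a dict, not hunting through call sites.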

I use pgvector for embedding search rather than a dedicated vector database like Pinecone or Weaviate. Having vectors in PostgreSQL alongside the rest of the data means I can join embedding similarity results with relational queries in a single SQL statement. The performance is adequate for our scale. If I were running billions of vectors, I'd need something specialized. At millions, Postgres handles it fine.
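Here is the kind of single-statement query that makes pgvector worth it -- cosine distance (pgvector's `<=>` operator) ordered alongside an ordinary relational join. Table and column names are hypothetical:

```python
# Sketch of a pgvector query mixing embedding similarity with relational
# filters in one statement. Schema names are illustrative, not SEOJuice's.

def similar_pages_sql(limit: int = 10) -> str:
    return f"""
        SELECT p.url, p.title,
               p.embedding <=> %(query_vec)s AS distance
        FROM pages p
        JOIN websites w ON w.id = p.website_id
        WHERE w.id = %(website_id)s
        ORDER BY p.embedding <=> %(query_vec)s
        LIMIT {limit};
    """
```

With a dedicated vector database, the `WHERE w.id = ...` filter would mean either pre-filtering IDs in application code or post-filtering similarity results -- here it's just SQL.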

On the NLP side, NumPy, NLTK, and Scikit-learn do the heavy lifting for keyword extraction, content scoring, and classification. These libraries aren't exciting, but they're battle-tested. When I run a TF-IDF analysis across 50,000 pages, I need the result to be correct, not innovative.
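In production this goes through scikit-learn's `TfidfVectorizer`, but the underlying computation is small enough to show in plain Python -- raw term counts times smoothed inverse document frequency, using scikit-learn's `smooth_idf` convention:

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw-count TF times smoothed IDF, following scikit-learn's
    smooth_idf convention: idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in tf.items()})
    return scores
```

A term that appears in every document gets the minimum IDF (exactly 1 with smoothing), so rarer terms score higher -- which is the whole point for keyword extraction.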

For crawling JavaScript-heavy sites, I use Playwright running in a separate container. It's more resource-intensive than simple HTTP requests, but modern websites often render critical content via JavaScript. If our crawler can't see it, we can't analyze it. Playwright handles SPAs, lazy-loaded content, and client-side routing that would be invisible to a traditional crawler.
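Because headless rendering is expensive, a crawler typically decides per page whether Playwright is worth it. One common heuristic -- this is a sketch of the general technique, not SEOJuice's actual logic, and the markers and threshold are illustrative:

```python
import re

# Heuristic for deciding whether a page needs a headless browser or a plain
# HTTP fetch is enough. Illustrative, not SEOJuice's production logic.

FRAMEWORK_MARKERS = ('id="root"', 'id="app"', "__NEXT_DATA__", "ng-version")

def needs_js_rendering(html: str, min_text_len: int = 200) -> bool:
    # SPA shells usually ship an empty mount point plus a framework marker
    if any(marker in html for marker in FRAMEWORK_MARKERS):
        return True
    # Crude visible-text estimate: strip tags and measure what's left
    text = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()
    return len(text) < min_text_len
```

Pages that fail the check go to the Playwright container; everything else stays on cheap HTTP requests.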

Additional Tools and Services

The supporting infrastructure is where I've been most willing to use third-party services, since none of these are core differentiators:

Crisp for live chat. I tried Intercom first -- it was too expensive and too complex for a one-person support team. Crisp does what I need at a fraction of the cost.

Customer.io for email. Transactional emails (password resets, report delivery), onboarding sequences, and product updates all run through Customer.io. I switched from a simpler setup (Django's built-in email with SendGrid) because I needed behavioral triggers -- "send a tip email if the user hasn't connected their WordPress site within 3 days."
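In Customer.io the trigger lives in their campaign builder, but the condition itself is just a predicate over user attributes. Expressed in Python for clarity (field names are hypothetical):

```python
from datetime import datetime, timedelta

# The WordPress-tip trigger from above, written as a plain predicate.
# In practice Customer.io evaluates this from tracked events and
# attributes; the field names here are hypothetical.

def should_send_wp_tip(signed_up_at: datetime,
                       wordpress_connected: bool,
                       now: datetime) -> bool:
    return (not wordpress_connected
            and now - signed_up_at >= timedelta(days=3))
```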

Paddle for payments. I started with Stripe and migrated to Paddle as a Merchant of Record. The reason was entirely practical: Paddle handles VAT calculation, collection, and remittance for every country. As a bootstrapped European founder selling globally, building my own tax compliance system would have been a full-time job. Paddle takes a larger cut than Stripe, but it eliminates an entire category of operational burden.

ChartMogul for revenue analytics. MRR, churn rate, LTV, expansion revenue -- I need these numbers accurate and updating automatically. I could compute them from Paddle's API, but ChartMogul does it better and surfaces trends I wouldn't think to check.

For visitor analytics, I run a self-hosted instance of Plausible. No cookies, no consent banners, no sending visitor data to Google. It shows me traffic sources, top pages, and referrers -- which is genuinely all I need. (I wrote about Plausible in more detail in our open-source SEO tools guide.)

What I'd Change

No stack is perfect, and I've learned enough to know what I'd do differently if I were starting from scratch:

I'd invest in Kubernetes earlier. Not day one -- that's overkill for an MVP -- but around the time I hit 10 background workers. Orchestrating containers with shell scripts and systemd works until it doesn't, and the transition to proper orchestration while serving production traffic is nerve-wracking.

I'd separate the API from the monolith sooner. Having the dashboard, the public API, the WordPress plugin API, and the MCP server all sharing a single Django process means a spike in API traffic can slow down the dashboard. I'm gradually extracting these into separate services, but it would have been cleaner to design for it from the start.

I would not change the core choices -- Django, PostgreSQL, Celery, server-rendered HTML. Those have proven themselves through two years of production traffic, and the simplicity they provide is worth more than the performance gains I might get from trendier alternatives.

Conclusion

As a solo founder, the most dangerous trap is building your stack for the Hacker News audience instead of your actual product requirements. Every technology choice I've described here was driven by a specific need, evaluated against alternatives, and chosen because it let me ship faster with fewer surprises. The result is a system I can operate alone, debug quickly, and extend without rewriting.

But this is just the beginning. The stack evolves as the product does -- I'm continuously working to improve SEOJuice, and the infrastructure has to keep pace. If you're a founder making these same decisions, I hope seeing the reasoning behind mine saves you some of the trial-and-error I went through.

Let's keep building.

Cheers,
Vadim

Discussion (4 comments)

Lisa Wang, Content Marketing Lead

7 months, 1 week

In my 12 years building B2B SaaS, choosing Django for core routing/ORM and owning critical components (instead of leaning on many third-party providers) is a pragmatic move for control and security. Practically, pairing Django with async workers (Celery + Redis), Postgres connection pooling, and lightweight observability (Prometheus/Grafana + Sentry) keeps latency predictable — we cut background-job latency ~40% after that stack change. Happy to connect and compare notes on scaling patterns.

TrafficBooster

7 months, 1 week

Nice—tbh that stack is solid and predictable. One extra thing I'd add: invest in distributed tracing (OpenTelemetry/Jaeger) so you can pinpoint whether Celery, DB, or Redis is the culprit. Actionable tips that helped my team: set worker_prefetch_multiplier=1 + ack_late for fairness, use pgBouncer in transaction pooling (watch prepared-statement quirks), and track queue length + task runtime histograms. ngl, owning infra adds ops overhead but gives control. Which change gave you the biggest latency drop — Celery tuning, pooling, or observability?

MarketingNinja42

7 months, 1 week

tbh love the transparency about using Django and minimizing third-party deps — ngl we moved a bunch of vendor logic in-house once and the maintenance surprised us. How are you handling async scraping and rate limits (Celery/RQ or something custom)?

WebDev_Guru

6 months, 3 weeks

tbh love the “own the critical components” angle — Django's a solid pick for the backend. ngl, if you're running heavy crawls I'd layer Celery + Redis for async workers, Postgres partitioning and caching, plus a rotating proxy pool with exponential backoff (saved my scraper after a couple of bans). How are you handling proxy rotations and crawl rate‑limits?
