I self-host analytics with SQLite + Astro middleware — and GA4 is never coming back
One .db file, 200 lines of TypeScript, no client-side JS, no cookie banner. Here's how I count visitors for this portfolio — and when this setup is NOT enough.
The site you’re reading has a “X people visited today” counter on the homepage. It doesn’t call Google Analytics, doesn’t embed Plausible / Umami / Fathom, doesn’t ship a single line of third-party JavaScript. The whole counter runs on:
- A
~2 MBapp.dbfile sitting next to the app. - A 60-line Astro middleware.
- One
INSERTper valid pageview. - One
SELECT COUNT(DISTINCT ip)per homepage render.
That’s it.
I switched off Google Analytics 4 and onto this setup 14 months ago for 3 freelance projects. After 14 months, I can’t remember a single reason to go back. Here’s the setup, the real trade-offs, and when you shouldn’t follow this path.

Why I left GA4
Three reasons, in order of how much they bothered me:
1. Cookie banner. GDPR + ePrivacy force you to ask consent before loading gtag.js. You drop in Cookiebot / Termly, spend 3 days making the banner just-pretty-enough, and afterward 70% of users hit “Reject”. The numbers in GA represent 30% of reality. Making business decisions on that 30% is taking placebo for a real headache.
2. Performance budget burned for no payoff. gtag.js is ~50 KB compressed. It loads, then fires 2-3 requests to Google. My mobile Lighthouse dropped from 99 to 87 because of that script alone. A portfolio site can’t live with 87.
3. I don’t need 90% of GA4’s features. No conversion funnel, no e-commerce, no audience segmentation. I just want to know: “how many people read the Chainsaw Man post today?”. GA4 is Photoshop when I need a pencil.
Architecture — 3 pieces
┌──────────────────┐
│ Browser GET │
│ /blog/foo │
└────────┬─────────┘
│ HTTP request
▼
┌──────────────────┐
│ Astro middleware │ ──► trackVisit({ ip, ua, path, ... })
│ (server-side) │
└────────┬─────────┘
│
▼
┌──────────────────┐ ┌──────────────────┐
│ visits.ts logic │ ────► │ SQLite (WAL) │
│ - bot UA filter │ │ visits table │
│ - 60s dedupe │ │ │
│ - day bucketing │ └──────────────────┘
└──────────────────┘
The browser sees nothing. No request, no cookie, no script. All tracking happens server-side, inside the same request that renders the page.
Part 1: Schema — 1 table, 9 columns
CREATE TABLE visits (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ip TEXT NOT NULL,
ua TEXT,
path TEXT,
referrer TEXT,
lang TEXT,
country TEXT,
day TEXT NOT NULL, -- YYYY-MM-DD in Asia/Ho_Chi_Minh
created_at INTEGER NOT NULL DEFAULT (strftime('%s','now'))
);
CREATE INDEX idx_visits_day ON visits(day);
CREATE INDEX idx_visits_ip_day ON visits(ip, day);
CREATE INDEX idx_visits_created ON visits(created_at DESC);
A few intentional choices:
-
dayisTEXT YYYY-MM-DD, notINTEGER timestamp. IGROUP BY dayconstantly — string equality is faster than date arithmetic, and I avoid timezone math on every query. Bucketing happens at insert time, in Vietnam time:function todayVN(): string { return new Date().toLocaleDateString('en-CA', { timeZone: 'Asia/Ho_Chi_Minh', }); }'en-CA'givesYYYY-MM-DD. Trick works in any runtime withIntl.DateTimeFormat. -
IPs stored raw, no hashing. GDPR says IP is PII — true. But this is first-party log on a server I own, never leaves the box, kept for forensic use when abuse hits (form spam, API scraping). I
chmod 600 app.dband don’t expose it through any public API. -
No
user_id, nosession_id. Both require cookies or fingerprinting — instant GDPR territory. “Distinct IP per day” is a good-enough proxy for “person” at my traffic level.
Part 2: Middleware — 30 lines

src/middleware.ts:
import { defineMiddleware } from 'astro:middleware';
import { trackVisit } from '@/lib/visits';
export const onRequest = defineMiddleware(async (ctx, next) => {
if (ctx.request.method === 'GET') {
const headers = ctx.request.headers;
let ip: string | null = null;
try {
ip = ctx.clientAddress || null;
} catch {
ip = null;
}
if (!ip) {
const fwd = headers.get('x-forwarded-for');
if (fwd) ip = fwd.split(',')[0].trim();
else ip = headers.get('x-real-ip') || headers.get('cf-connecting-ip');
}
trackVisit({
ip,
ua: headers.get('user-agent'),
path: new URL(ctx.request.url).pathname,
referrer: headers.get('referer'),
lang: ctx.locals.lang ?? null,
country:
headers.get('cf-ipcountry') ||
headers.get('x-vercel-ip-country') ||
null,
});
}
return next();
});
Subtleties:
ctx.clientAddresscan throw on prerendered routes (Astro hybrid: static pages have no per-request address). Wrap it in try/catch.- IP fallback chain:
clientAddress→x-forwarded-for[0]→x-real-ip→cf-connecting-ip. Behind local nginx the first one is fine; behind Cloudflare the last one is canonical. - Country comes from Cloudflare / Vercel header — they tag based on edge location. I don’t run MaxMind, don’t store IPs to self-geolocate.
Part 3: Filter — 3 layers before insert
export function trackVisit(v: TrackInput): void {
try {
if (!v.ip) return;
if (isBotUA(v.ua)) return;
if (!isTrackablePath(v.path)) return;
if (!shouldRecord(v.ip, v.path)) return;
insertStmt.run(v.ip, v.ua, v.path, v.referrer, v.lang, v.country, todayVN());
} catch {
// analytics must never throw
}
}
Bot UA filter:
const BOT_UA = /bot|crawler|spider|slurp|googlebot|yandex|baidu|duckduck|
facebookexternalhit|twitterbot|whatsapp|telegrambot|
linkedinbot|pingdom|uptimerobot|ahrefs|semrush|mj12|petalbot|
applebot|headless|monitor|fetch|preview/i;
One regex covers ~95% of bots I see in nginx access logs. Honest bots (Googlebot Search Indexing) self-identify in their UA — there’s no clean way to fool this regex without trying hard.
Path filter:
function isTrackablePath(p: string): boolean {
if (p.startsWith('/api/')) return false;
if (p.startsWith('/admin')) return false;
if (p.startsWith('/og/')) return false;
if (p.startsWith('/_astro/')) return false;
if (p === '/rss.xml') return false;
// Skip anything with an extension (asset)
const last = p.split('/').pop() || '';
if (last.includes('.')) return false;
return true;
}
I only track real HTML pages, never API calls or asset requests.
60s dedupe:
const recent = new Map<string, number>();
const DEDUPE_WINDOW_MS = 60_000;
function shouldRecord(ip: string, path: string): boolean {
const key = ip + '|' + path;
const now = Date.now();
const last = recent.get(key);
if (last !== undefined && now - last < DEDUPE_WINDOW_MS) return false;
recent.set(key, now);
// Lazy evict if Map > 5000 entries
if (recent.size > 5000) {
const cutoff = now - DEDUPE_WINDOW_MS * 5;
for (const [k, v] of recent) if (v < cutoff) recent.delete(k);
}
return true;
}
In-memory Map. A user mashing F5 doesn’t double-count. A user with 5 tabs of the same URL within 60s only counts once. Resets on container restart — fine, dedupe is just noise reduction.
Part 4: Reading numbers — COUNT(DISTINCT ip)
For the homepage’s “X people today” counter:
const todayUniqueStmt = db.prepare(
'SELECT COUNT(DISTINCT ip) AS c FROM visits WHERE day = ?'
);
export function publicTodayStats() {
const day = todayVN();
return {
today: todayUniqueStmt.get(day)?.c ?? 0,
pageviewsToday: todayPageviewsStmt.get(day)?.c ?? 0,
totalUnique: totalUniqueStmt.get()?.c ?? 0,
totalPageviews: totalPageviewsStmt.get()?.c ?? 0,
};
}
db.prepare runs once at module load — after that .get(day) is just bind + execute. SQLite returns in < 1ms with the idx_visits_day index across ~50k rows. I run this SSR on every homepage request and Lighthouse stays 100.
The admin page has heavier queries (top paths, top countries, 14-day daily series) — all return < 5ms on 6 months of data.
Trade-offs — including what’s NOT here
After 14 months, here’s what I don’t have with this setup, ranked by “do I miss it”:
| GA feature | Do I need it? | Replacement |
|---|---|---|
| Real-time visitors | No | tail -f log |
| Conversion funnel | No (portfolio) | — |
| User flow / session replay | No | — |
| Audience segmentation | No | — |
| E-commerce events | Yes (1 freelance project) | Separate events table |
| Click heatmap | Sometimes | Microsoft Clarity (banner cost) |
| Auto email reports | Maybe | Cron + script that emails top_paths |
| Multi-domain tracking | No | — |
| Goals + custom events | Sometimes | Add event_name column |
Heatmap is the only thing I occasionally miss. When I need it — I temporarily turn on Microsoft Clarity for a week, look at the heatmap, turn it off. A week’s worth of data answers the question; no need to keep a banner alive year-round.
When you should NOT self-host SQLite analytics
Honest take. Don’t follow this if:
- You need to share a dashboard with clients / boss. SQLite + custom UI means you build the admin page yourself. Plausible / Umami have cloud versions that share with one click.
- Site has > 1M pageviews/month. SQLite still handles it, but
COUNT(DISTINCT)on 10M rows starts to slow down. At that scale consider PostgreSQL or ClickHouse, or just self-host Plausible. - You need geolocation more granular than country. Nothing free replaces the MaxMind GeoIP database. Cloudflare headers only have country, not city.
- You deploy to Vercel / Netlify / Cloudflare Pages. Serverless has no persistent disk — SQLite can’t survive there. You’d need Turso / Cloudflare D1 (SQLite-compatible but with quirks) or just go to Postgres.
This setup fits: Astro / Next / Laravel running on a VPS with disk. Site < 500k pageviews / month. Owner can write 200 lines of TypeScript when needed.
Real numbers — this site
I run this for nhatdote.io.vn. After 8 months:
app.dbfile: 2.3 MB (covers donations, bug_reports, coffee_clicks, visits, issue_comments).visitstable: ~47k rows, average ~180/day.- Lighthouse mobile: 100/100 every metric (vs 87 with GA4).
- Backup:
cp app.db backup-$(date +%F).dbevery 3 hours via cron — one 2 MB file. 7 days = 14 MB. A year of rolling window = ~17 GB if I kept everything — I rotate 30 days. - Cost: $0 (VPS already paid for the app, SQLite adds nothing).
GA4 for the same site would be $0 but cost:
- A cookie banner I don’t want
- 50 KB of JS shipped to every visitor
- A dashboard I have to log into
- Numbers off by 70% because users hit reject
Closing
GA4 is built for enterprise marketing teams measuring $100k campaigns. If you’re not that team — you’re using a butcher knife to cut salad.
200 lines of TypeScript + 1 SQLite file + 1 middleware is the entire analytics stack I need for a portfolio and 3 freelance projects. Privacy-first not because of law — because most of the time I just want to know “how many people read this today”. SELECT COUNT(DISTINCT ip) FROM visits WHERE day = '2026-05-09' answers that faster than the GA4 dashboard loads.
Sometimes the right answer is less, not more.
Sources
- The schema, middleware and filter logic above are the actual code running on
nhatdote.io.vn. You can read aboutAstro.locals.langmiddleware patterns in the Astro middleware docs. better-sqlite3(Node binding for SQLite) — github.com/WiseLibs/better-sqlite3.