No vendor promises. No marketing fluff.
Just the stuff that actually moves the needle.
Principal Product Solutions Engineer
Charly Batista
Making databases go brrr since the early 2000s 🐘
EXPLAIN output is always ugly.
* ∞ may be a slight exaggeration
The Reality Check
"It was fast last week. I didn't change anything. Now it's crawling."
Bumping max_connections won't fix it. You need a mental model of where the time actually goes. This talk gives you that mental model.
Agenda
Mental Model
Every layer can be a bottleneck.
Tuning the wrong layer wastes time.
Always measure first.
Layer 1 — Query
Without index (Seq Scan)
-- Reads EVERY row
SELECT * FROM orders
WHERE customer_id = 42;
-- Cost: O(n) → table grows = pain
With index
CREATE INDEX idx_orders_customer
ON orders (customer_id);
-- Cost: O(log n) → scales cleanly
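A quick way to confirm the planner actually picks the index up (a sketch — exact plan output varies by version and table size):

```sql
-- Hypothetical check after creating idx_orders_customer:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Expect: Index Scan using idx_orders_customer on orders
-- instead of: Seq Scan on orders
```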
Layer 1 — Query
Composite index order matters
-- Query filters status first, then date
-- GOOD:
CREATE INDEX ON events (status, created_at);
-- BAD for this query:
CREATE INDEX ON events (created_at, status);
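The query shape this ordering assumes (illustrative, PostgreSQL syntax — adapt the predicates to your workload):

```sql
SELECT * FROM events
WHERE status = 'active'                        -- equality on the leading column
  AND created_at >= now() - interval '1 day';  -- range on the trailing column
-- Rule of thumb: equality columns first, range columns last.
```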
PG Partial index
-- Index ONLY unprocessed rows
CREATE INDEX ON jobs (created_at)
WHERE status = 'pending';
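A matching consumer query (sketch). Only rows satisfying the WHERE clause ever enter the index, so it stays tiny and hot:

```sql
SELECT * FROM jobs
WHERE status = 'pending'   -- matches the partial-index predicate
ORDER BY created_at        -- satisfied by the index order
LIMIT 100;
```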
PG Covering index (Index-Only Scan)
CREATE INDEX ON orders (customer_id)
INCLUDE (total, status);
-- No heap access needed!
MY Covering index (same idea)
CREATE INDEX idx_cover
ON orders (customer_id, total, status);
-- InnoDB: leaf stores all values
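To confirm the index-only path on MySQL (hypothetical check; plan output varies by version):

```sql
EXPLAIN SELECT customer_id, total, status
FROM orders
WHERE customer_id = 42;
-- The Extra column should show "Using index"
-- (all columns served from the index, no row lookup)
```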
Layer 2 — Engine Config
The database exists to avoid going to disk. Everything hot should live in RAM.
PostgreSQL:
shared_buffers — shared page cache
effective_cache_size — planner hint
work_mem — per-sort / per-hash
maintenance_work_mem — VACUUM, index builds
wal_buffers — WAL write buffer
MySQL / InnoDB:
innodb_buffer_pool_size — page cache + indexes
innodb_buffer_pool_instances — reduce contention
innodb_log_buffer_size — redo log buffer
tmp_table_size / max_heap_table_size
sort_buffer_size — per-session
Layer 2 — Engine Config
effective_cache_size = 50–75% of RAM (it's a planner hint, not an allocation).
Avoid pushing shared_buffers above ~40% of RAM — you start double-caching with the OS page cache.
On MySQL, split a large buffer pool with innodb_buffer_pool_instances to reduce mutex contention.
Layer 2 — Engine Config
The work_mem Trap
-- You see this in EXPLAIN ANALYZE:
Sort (cost=120345.00...)
Sort Method: external merge Disk: 87456kB
-- ^^^^ BAD
-- Fix: raise work_mem for this session
SET work_mem = '256MB';
-- now Sort Method: quicksort Memory: 25kB
work_mem is allocated per sort / per hash node, per query, per connection — the totals multiply fast.
Keep the global work_mem low (16–64 MB); raise it per session or per transaction where a query needs it.
Use hash_mem_multiplier (PG 13+) for fine-grained control of hash memory.
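SET LOCAL scopes the bump to a single transaction, which is safer than a session-wide SET:

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies only inside this transaction
-- ... run the big sort / hash join here ...
COMMIT;                        -- work_mem reverts automatically
```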
sort_buffer_size is also per-session. Same trap applies.
Layer 3 — I/O
NVMe SSD — random read latency ~100 µs
SATA SSD — random read latency ~200–500 µs
Spinning HDD — random read latency ~8–12 ms
synchronous_commit = off for async commits — faster, but a crash can lose the last few transactions (no corruption, just lost commits)
innodb_flush_method = O_DIRECT avoids double-buffering
Layer 4 — OS / Kernel
Virtual Memory
# Don't swap under databases!
vm.swappiness = 1 # (not 0, keep swapper alive)
# Dirty page write-back
vm.dirty_ratio = 15 # max dirty pages before blocking writes
vm.dirty_background_ratio = 3 # background flush threshold
# Huge pages (PostgreSQL)
vm.nr_hugepages = <size/2MB>
# saves TLB pressure on large shared_buffers
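A back-of-envelope sizing sketch for vm.nr_hugepages, assuming a hypothetical 8 GB shared_buffers and 2 MB huge pages (the precise requirement is best read from the postmaster's actual shared memory usage):

```shell
# Hypothetical sizing: 8 GB shared_buffers, 2 MB huge pages,
# plus ~10% headroom for other shared memory segments.
SHARED_BUFFERS_MB=8192
HUGEPAGE_MB=2
PAGES=$(( SHARED_BUFFERS_MB / HUGEPAGE_MB ))   # 4096 pages for the buffers
PAGES=$(( PAGES + PAGES / 10 ))                # add ~10% headroom
echo "vm.nr_hugepages = $PAGES"
```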
Network & File Descriptors
# Max open files
fs.file-max = 2097152
# For the DB process ulimit:
# nofile = 65535
# TCP tuning (high-concurrency)
net.core.somaxconn = 65535
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_fin_timeout = 15
# NUMA: pin buffer pool to NUMA node
numactl --cpunodebind=0 --membind=0 \
mysqld
Layer 4 — OS / Kernel
Disable THP
# Check
cat /sys/kernel/mm/transparent_hugepage/enabled
# Disable (add to rc.local / systemd unit)
echo never > \
/sys/kernel/mm/transparent_hugepage/enabled
echo never > \
/sys/kernel/mm/transparent_hugepage/defrag
I/O Scheduler
# Check current scheduler
cat /sys/block/nvme0n1/queue/scheduler
# [none] mq-deadline kyber bfq
# NVMe/SSD: use 'none' or 'mq-deadline'
echo mq-deadline > \
/sys/block/nvme0n1/queue/scheduler
# HDD: bfq or mq-deadline
none is often optimal on NVMe.
Layer 2 — Engine Config
# PostgreSQL: use a connection pooler
# PgBouncer (transaction mode)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 50 # actual PG connections
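A common starting point for sizing the pool is the PostgreSQL wiki rule of thumb, connections ≈ (cores × 2) + effective spindle count — a sketch, not a law; CORES and SPINDLES below are hypothetical example values:

```shell
CORES=16
SPINDLES=1                        # effectively 1 for SSD/NVMe
POOL=$(( CORES * 2 + SPINDLES ))  # rule-of-thumb pool size
echo "default_pool_size = $POOL"
```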
On MySQL, thread_cache_size and innodb_thread_concurrency still matter at high concurrency.
Avoid max_connections > 200–300 in PostgreSQL without a pooler — each connection is a full backend process.
Reality Check
"More max_connections handles more users" — no: past a point, more connections mean more contention, not more throughput.
SELECT * prevents covering index scans, inflates network/memory, and breaks query plans.
Diagnose First
PG EXPLAIN ANALYZE BUFFERS
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = 42;
-- Key things to look for:
-- "Seq Scan" → missing index?
-- "Rows=10000 vs actual rows=3"
-- → stale statistics, run ANALYZE
-- "Buffers: shared hit=0 read=8000"
-- → cache miss → I/O bound
-- "Sort Method: external merge Disk"
-- → raise work_mem
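One quick way to decide which queries to EXPLAIN first is pg_stat_statements (column names below are the PG 13+ ones; the extension also needs to be listed in shared_preload_libraries):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC   -- biggest total time sinks first
LIMIT 10;
```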
MY EXPLAIN FORMAT=JSON
EXPLAIN FORMAT=JSON
SELECT * FROM orders
WHERE customer_id = 42;
-- Look for:
-- "access_type": "ALL" → Seq Scan
-- "rows": 500000 → bad estimate
-- "Using filesort" → no useful index for ORDER BY
-- "Using temporary" → temp table → slow
-- Enable slow query log!
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.1;
Maintenance Matters
-- Per-table override for hot tables
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2
);
Bloated tables = more pages to scan = slower everything.
-- Persistent statistics (default in 5.6+)
SET GLOBAL innodb_stats_persistent = ON;
SET GLOBAL innodb_stats_auto_recalc = ON;
-- Force refresh after bulk load
ANALYZE TABLE orders;
Stale stats → wrong index choice → 100× slower plans.
Keep them fresh with ANALYZE (PG) or ANALYZE TABLE (MySQL).
The Inflection Point
You've done the work. Indexes are optimal. Config is tuned. And it's still slow. Now what?
iostat -x 1 # I/O saturation
sar -u 1 # CPU iowait
vmstat 1 # paging / blocking
perf top # CPU profiling
# PG:
SELECT * FROM pg_stat_bgwriter;
SELECT * FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
# MySQL:
SHOW ENGINE INNODB STATUS\G
Layer 5 — Hardware
Random I/O latency is the #1 bottleneck for OLTP. NVMe vs SATA SSD can be 5× difference. HDD is 80× worse.
Upgrade when: iowait > 20%, iostat shows 100% util
More RAM = larger buffer pool = fewer disk reads. If your working set doesn't fit, add RAM before anything else.
Upgrade when: buffer hit rate < 95%
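The hit-rate check itself, for PostgreSQL (a sketch — measure over a steady-state window, not right after a restart):

```sql
SELECT round(100.0 * sum(blks_hit)
             / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS hit_pct
FROM pg_stat_database;
-- Below ~95% on an OLTP workload usually means the working set
-- no longer fits in the buffer cache.
```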
Databases are rarely CPU-bound in OLTP. More cores help with concurrency. Clock speed matters for complex queries.
Upgrade when: CPU > 80% with no iowait
Layer 5 — Hardware
| Storage Type | Random IOPS | Latency | Best For | Avoid For |
|---|---|---|---|---|
| NVMe U.2 / PCIe | 500K–2M+ | < 100μs | Hot OLTP, WAL, primary DB | — |
| SATA SSD | 50K–100K | 200–500μs | Read replicas, dev/staging | Write-heavy primaries |
| HDD (7200 RPM) | 100–200 | 8–12ms | Cold backups, archives | Any live DB workload |
| AWS EBS gp3 | 3K–16K | 1–2ms | General cloud OLTP | High-frequency micro-transactions |
| AWS EBS io2 / local NVMe | Up to 256K | < 500μs | Production OLTP in cloud | Cost-sensitive workloads |
The Framework
Take-Home Reference
| PostgreSQL setting | Starting point |
|---|---|
| shared_buffers | 25% RAM |
| effective_cache_size | 50–75% RAM |
| work_mem | 16–64 MB (global) |
| maintenance_work_mem | 1–2 GB (autovacuum) |
| max_connections | < 200–300 + pooler |
| checkpoint_completion_target | 0.9 |
| random_page_cost | 1.1–1.5 (NVMe) |
| wal_compression | on (PG 15+: lz4) |
| MySQL / InnoDB setting | Starting point |
|---|---|
| innodb_buffer_pool_size | 70–80% RAM |
| innodb_buffer_pool_instances | 1 per GB, max 16 |
| innodb_flush_method | O_DIRECT |
| innodb_flush_log_at_trx_commit | 1 (safe) / 2 (fast) |
| innodb_io_capacity | match device IOPS |
| innodb_log_file_size | 256 MB – 2 GB |
| max_connections | 151 default → tune |
| slow_query_log | always ON in prod |
Summary
Measure before you tune: EXPLAIN ANALYZE, pg_stat_statements, slow query log.
Performance Without Magic: How Open Source Databases Scale · FOSSASIA 2026

linkedin.com/in/charlybatista
📦 Slides
github.com/elchinoo/fossasia-2026
🐘 EDB
enterprisedb.com