No vendor promises. No marketing fluff.
Just the stuff that actually moves the needle.
Principal Product Solutions Engineer
Charly Batista
Making databases go brrr since the early 2000s 🐘
EXPLAIN output is always ugly.
* ∞ may be a slight exaggeration
The Reality Check
"It was fast last week. I didn't change anything. Now it's crawling."
Bumping max_connections won't fix it. You need a mental model of where the time actually goes. This talk gives you that mental model.
Agenda
Mental Model
Every layer can be a bottleneck.
Tuning the wrong layer wastes time.
Always measure first.
Layer 1 — Query
Without index (Seq Scan)
-- Reads EVERY row
SELECT * FROM orders
WHERE customer_id = 42;
-- Cost: O(n) → table grows = pain
With index
CREATE INDEX idx_orders_customer
ON orders (customer_id);
-- Cost: O(log n) → scales cleanly
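A quick way to confirm the planner actually picks the index up (a sketch — exact plan output varies by version and table size):

```sql
-- Hypothetical check after creating idx_orders_customer:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Expect: Index Scan using idx_orders_customer on orders
-- instead of: Seq Scan on orders
```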
Layer 1 — Query
Composite index order matters
-- Query filters status first, then date
-- GOOD:
CREATE INDEX ON events (status, created_at);
-- BAD for this query:
CREATE INDEX ON events (created_at, status);
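The query shape this ordering assumes (illustrative, PostgreSQL syntax — adapt the predicates to your workload):

```sql
SELECT * FROM events
WHERE status = 'active'                        -- equality on the leading column
  AND created_at >= now() - interval '1 day';  -- range on the trailing column
-- Rule of thumb: equality columns first, range columns last.
```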
PG Partial index
-- Index ONLY unprocessed rows
CREATE INDEX ON jobs (created_at)
WHERE status = 'pending';
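A matching consumer query (sketch). Only rows satisfying the WHERE clause ever enter the index, so it stays tiny and hot:

```sql
SELECT * FROM jobs
WHERE status = 'pending'   -- matches the partial-index predicate
ORDER BY created_at        -- satisfied by the index order
LIMIT 100;
```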
PG Covering index (Index-Only Scan)
CREATE INDEX ON orders (customer_id)
INCLUDE (total, status);
-- No heap access needed!
MY Covering index (same idea)
CREATE INDEX idx_cover
ON orders (customer_id, total, status);
-- InnoDB: leaf stores all values
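To confirm the index-only path on MySQL (hypothetical check; plan output varies by version):

```sql
EXPLAIN SELECT customer_id, total, status
FROM orders
WHERE customer_id = 42;
-- The Extra column should show "Using index"
-- (all columns served from the index, no row lookup)
```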
Layer 2 — Engine Config
The database exists to avoid going to disk. Everything hot should live in RAM.
PostgreSQL:
shared_buffers — shared page cache
effective_cache_size — planner hint
work_mem — per-sort / per-hash
maintenance_work_mem — VACUUM, index builds
wal_buffers — WAL write buffer
MySQL / InnoDB:
innodb_buffer_pool_size — page cache + indexes
innodb_buffer_pool_instances — reduce contention
innodb_log_buffer_size — redo log buffer
tmp_table_size / max_heap_table_size
sort_buffer_size — per-session
Layer 2 — Engine Config
effective_cache_size = 50–75% of RAM (it's a planner hint, not an allocation).
Avoid pushing shared_buffers above ~40% of RAM — you start double-caching with the OS page cache.
On MySQL, split a large buffer pool with innodb_buffer_pool_instances to reduce mutex contention.
Layer 2 — Engine Config
The work_mem Trap
-- You see this in EXPLAIN ANALYZE:
Sort (cost=120345.00...)
Sort Method: external merge Disk: 87456kB
-- ^^^^ BAD
-- Fix: raise work_mem for this session
SET work_mem = '256MB';
-- now Sort Method: quicksort Memory: 25kB
work_mem is allocated per sort / per hash node, per query, per connection — the totals multiply fast.
Keep the global work_mem low (16–64 MB); raise it per session or per transaction where a query needs it.
Use hash_mem_multiplier (PG 13+) for fine-grained control of hash memory.
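SET LOCAL scopes the bump to a single transaction, which is safer than a session-wide SET:

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies only inside this transaction
-- ... run the big sort / hash join here ...
COMMIT;                        -- work_mem reverts automatically
```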
sort_buffer_size is also per-session. Same trap applies.
Layer 3 — I/O
NVMe SSD — random read latency ~100 µs
SATA SSD — random read latency ~200–500 µs
Spinning HDD — random read latency ~8–12 ms
synchronous_commit = off for async commits — faster, but a crash can lose the last few transactions (no corruption, just lost commits)
innodb_flush_method = O_DIRECT avoids double-buffering
Layer 4 — OS / Kernel
Virtual Memory
# Don't swap under databases!
vm.swappiness = 1 # (not 0, keep swapper alive)
# Dirty page write-back
vm.dirty_ratio = 15 # max dirty pages before blocking writes
vm.dirty_background_ratio = 3 # background flush threshold
# Huge pages (PostgreSQL)
vm.nr_hugepages = <size/2MB>
# saves TLB pressure on large shared_buffers
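A back-of-envelope sizing sketch for vm.nr_hugepages, assuming a hypothetical 8 GB shared_buffers and 2 MB huge pages (the precise requirement is best read from the postmaster's actual shared memory usage):

```shell
# Hypothetical sizing: 8 GB shared_buffers, 2 MB huge pages,
# plus ~10% headroom for other shared memory segments.
SHARED_BUFFERS_MB=8192
HUGEPAGE_MB=2
PAGES=$(( SHARED_BUFFERS_MB / HUGEPAGE_MB ))   # 4096 pages for the buffers
PAGES=$(( PAGES + PAGES / 10 ))                # add ~10% headroom
echo "vm.nr_hugepages = $PAGES"
```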
Network & File Descriptors
# Max open files
fs.file-max = 2097152
# For the DB process ulimit:
# nofile = 65535
# TCP tuning (high-concurrency)
net.core.somaxconn = 65535
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_fin_timeout = 15
# NUMA: pin buffer pool to NUMA node
numactl --cpunodebind=0 --membind=0 \
mysqld
Layer 4 — OS / Kernel
Disable THP
# Check
cat /sys/kernel/mm/transparent_hugepage/enabled
# Disable (add to rc.local / systemd unit)
echo never > \
/sys/kernel/mm/transparent_hugepage/enabled
echo never > \
/sys/kernel/mm/transparent_hugepage/defrag
I/O Scheduler
# Check current scheduler
cat /sys/block/nvme0n1/queue/scheduler
# [none] mq-deadline kyber bfq
# NVMe/SSD: use 'none' or 'mq-deadline'
echo mq-deadline > \
/sys/block/nvme0n1/queue/scheduler
# HDD: bfq or mq-deadline
none is often optimal on NVMe.
Layer 2 — Engine Config
# PostgreSQL: use a connection pooler
# PgBouncer (transaction mode)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 50 # actual PG connections
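A common starting point for sizing the pool is the PostgreSQL wiki rule of thumb, connections ≈ (cores × 2) + effective spindle count — a sketch, not a law; CORES and SPINDLES below are hypothetical example values:

```shell
CORES=16
SPINDLES=1                        # effectively 1 for SSD/NVMe
POOL=$(( CORES * 2 + SPINDLES ))  # rule-of-thumb pool size
echo "default_pool_size = $POOL"
```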
On MySQL, thread_cache_size and innodb_thread_concurrency still matter at high concurrency.
Avoid max_connections > 200–300 in PostgreSQL without a pooler — each connection is a full backend process.
Reality Check
"More max_connections handles more users" — no: past a point, more connections mean more contention, not more throughput.
SELECT * prevents covering index scans, inflates network/memory, and breaks query plans.
Diagnose First
PG EXPLAIN ANALYZE BUFFERS
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = 42;
-- Key things to look for:
-- "Seq Scan" → missing index?
-- "Rows=10000 vs actual rows=3"
-- → stale statistics, run ANALYZE
-- "Buffers: shared hit=0 read=8000"
-- → cache miss → I/O bound
-- "Sort Method: external merge Disk"
-- → raise work_mem
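One quick way to decide which queries to EXPLAIN first is pg_stat_statements (column names below are the PG 13+ ones; the extension also needs to be listed in shared_preload_libraries):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC   -- biggest total time sinks first
LIMIT 10;
```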
MY EXPLAIN FORMAT=JSON
EXPLAIN FORMAT=JSON
SELECT * FROM orders
WHERE customer_id = 42;
-- Look for:
-- "access_type": "ALL" → Seq Scan
-- "rows": 500000 → bad estimate
-- "Using filesort" → no useful index for ORDER BY
-- "Using temporary" → temp table → slow
-- Enable slow query log!
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.1;
Maintenance Matters
-- Per-table override for hot tables
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2
);
Bloated tables = more pages to scan = slower everything.
-- Persistent statistics (default in 5.6+)
SET GLOBAL innodb_stats_persistent = ON;
SET GLOBAL innodb_stats_auto_recalc = ON;
-- Force refresh after bulk load
ANALYZE TABLE orders;
Stale stats → wrong index choice → 100× slower plans.
Keep them fresh with ANALYZE (PG) or ANALYZE TABLE (MySQL).
The Inflection Point
You've done the work. Indexes are optimal. Config is tuned. And it's still slow. Now what?
iostat -x 1 # I/O saturation
sar -u 1 # CPU iowait
vmstat 1 # paging / blocking
perf top # CPU profiling
# PG:
SELECT * FROM pg_stat_bgwriter;
SELECT * FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
# MySQL:
SHOW ENGINE INNODB STATUS\G
Layer 5 — Hardware
Random I/O latency is the #1 bottleneck for OLTP. NVMe vs SATA SSD can be 5× difference. HDD is 80× worse.
Upgrade when: iowait > 20%, iostat shows 100% util
More RAM = larger buffer pool = fewer disk reads. If your working set doesn't fit, add RAM before anything else.
Upgrade when: buffer hit rate < 95%
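The hit-rate check itself, for PostgreSQL (a sketch — measure over a steady-state window, not right after a restart):

```sql
SELECT round(100.0 * sum(blks_hit)
             / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS hit_pct
FROM pg_stat_database;
-- Below ~95% on an OLTP workload usually means the working set
-- no longer fits in the buffer cache.
```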
Databases are rarely CPU-bound in OLTP. More cores help with concurrency. Clock speed matters for complex queries.
Upgrade when: CPU > 80% with no iowait
Layer 5 — Hardware
| Storage Type | Random IOPS | Latency | Best For | Avoid For |
|---|---|---|---|---|
| NVMe U.2 / PCIe | 500K–2M+ | < 100μs | Hot OLTP, WAL, primary DB | — |
| SATA SSD | 50K–100K | 200–500μs | Read replicas, dev/staging | Write-heavy primaries |
| HDD (7200 RPM) | 100–200 | 8–12ms | Cold backups, archives | Any live DB workload |
| AWS EBS gp3 | 3K–16K | 1–2ms | General cloud OLTP | High-frequency micro-transactions |
| AWS EBS io2 / local NVMe | Up to 256K | < 500μs | Production OLTP in cloud | Cost-sensitive workloads |
The Framework
Take-Home Reference
| PostgreSQL setting | Starting point |
|---|---|
| shared_buffers | 25% RAM |
| effective_cache_size | 50–75% RAM |
| work_mem | 16–64 MB (global) |
| maintenance_work_mem | 1–2 GB (autovacuum) |
| max_connections | < 200–300 + pooler |
| checkpoint_completion_target | 0.9 |
| random_page_cost | 1.1–1.5 (NVMe) |
| wal_compression | on (PG 15+: lz4) |
| MySQL / InnoDB setting | Starting point |
|---|---|
| innodb_buffer_pool_size | 70–80% RAM |
| innodb_buffer_pool_instances | 1 per GB, max 16 |
| innodb_flush_method | O_DIRECT |
| innodb_flush_log_at_trx_commit | 1 (safe) / 2 (fast) |
| innodb_io_capacity | match device IOPS |
| innodb_log_file_size | 256 MB – 2 GB |
| max_connections | 151 default → tune |
| slow_query_log | always ON in prod |
Summary
Measure before you tune: EXPLAIN ANALYZE, pg_stat_statements, slow query log.
Performance Without Magic: How Open Source Databases Scale · FOSSASIA 2026

linkedin.com/in/charlybatista
📦 Slides
github.com/elchinoo/fossasia-2026
🐘 EDB
enterprisedb.com