There are great resources to help tune PostgreSQL for performance, but say you've got a Postgres database which suddenly becomes very slow. How can you figure out what's going on?
General system state
A slow database server will typically be running out of:
- CPU,
- memory, or
- disk.
So the first thing you should do is to get an overview of the system state using common UNIX tools such as:
- top (press c to see the full command line): look at CPU and memory usage
- iostat -x -m 5: look at I/O wait and service time
If you notice that memory is exhausted, you could play with the following postgresql.conf settings:
- shared_buffers
- work_mem
- effective_cache_size
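For example, on a dedicated server with 8 GB of RAM, a rough starting point might look like the lines below in postgresql.conf. These values are purely illustrative assumptions, not recommendations; tune them for your workload.

shared_buffers = 2GB           # often suggested as roughly 25% of RAM
work_mem = 32MB                # applies per sort/hash operation, so keep it modest
effective_cache_size = 6GB     # how much OS file cache the planner may assume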
If you see a lot of I/O, then try adjusting these settings:
- checkpoint_segments
- autovacuum_naptime
- wal_buffers
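Again as an illustration only (the right values, and even the accepted units, depend on your write volume and Postgres version; checkpoint_segments was replaced by max_wal_size in 9.5), you could spread checkpoints out and give the WAL more buffer space:

checkpoint_segments = 16       # fewer, larger checkpoints on write-heavy systems
wal_buffers = 16MB             # write WAL in larger chunks
autovacuum_naptime = 1min      # how often the autovacuum launcher wakes up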
Finding slow queries
On the database side, start by determining whether:
- one large query is holding up all of the resources, or
- the sheer number of queries is the main problem.
To do this, make sure that the following setting is turned on in your postgresql.conf:
stats_command_string = on
Then fire up psql dbname and look at the currently executing queries:
SELECT usename, current_query, query_start
FROM pg_stat_activity;
(You can also add the procpid column to the query if you want to map a query to the process ID you see in top.)
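To focus on the statements that have been running the longest, a variant along these lines can help (the column names assume a pre-9.2 server, matching the columns above; newer versions use pid, query and state instead):

SELECT usename, procpid, now() - query_start AS runtime, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY runtime DESC;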
If you can't get anything useful out of pg_stat_activity, you may want to enable slow query logging by adding this to your postgresql.conf:
log_min_duration_statement = 100
(All queries which take more than 100 ms to execute will be printed to the main Postgres log file.)
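The resulting log entries look roughly like this (the exact prefix depends on your log_line_prefix setting, and the statement here is just a placeholder):

LOG:  duration: 532.417 ms  statement: SELECT ...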
Examining a particular query
Once you have identified a slow query, you can time it by running this command before executing the query manually:
\timing
To get an idea of where Postgres spends its time when executing that query, look at the query plan:
EXPLAIN your_query;
The numbers you see there are estimates. To run the query and get actual numbers, use this instead:
EXPLAIN ANALYZE your_query;
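As a rough idea of what to look for (the plan and numbers below are made up, and the output format varies between versions), a sequential scan over a large table with high actual times is a typical red flag:

Seq Scan on orders  (cost=0.00..18334.00 rows=1000000 width=85) (actual time=0.019..412.333 rows=1000000 loops=1)
  Filter: (status = 'open'::text)
Total runtime: 498.120 ms

An index on the filtered column will often turn such a plan into an index scan with a much lower runtime.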
If you still can't figure out why a certain query takes so long, have a look at the ratio between live and dead rows in the relevant tables:
ANALYZE VERBOSE tablename;
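The same live/dead row information is also available from the statistics views, for example (assuming the statistics collector is enabled):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;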
Having too many dead rows is often a sign of insufficient vacuuming. You might want to turn autovacuum on if it isn't already, or to make it a bit more aggressive by tweaking these settings:
- autovacuum_vacuum_scale_factor
- autovacuum_analyze_scale_factor
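For instance (illustrative values only; the right numbers depend on table size and churn), lowering the scale factors makes autovacuum kick in after a smaller fraction of a table has changed:

autovacuum = on
autovacuum_vacuum_scale_factor = 0.05    # default 0.2: vacuum after ~5% of rows change
autovacuum_analyze_scale_factor = 0.02   # default 0.1: analyze after ~2% of rows change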
Troubleshooting database performance is a bit of a black art. Many thanks to Mark Kirkwood for sharing his Postgres wisdom with me.
If there's anything else you have successfully used to find the cause of your performance woes, please feel free to leave a comment.
The PostgreSQL wiki has an abundance of further information on this topic.
As a rule of thumb:
- If you're CPU-bound (backends consuming all available CPU), you need to reduce the complexity and/or the number of queries hitting the database.
- If you're disk-bound (backends waiting for disk I/O):
  - On writes: get a RAID controller with a battery-backed write cache.
  - On reads: you need to either increase your RAM to match the size of your working set or reduce the size of the working set. Most often people mix OLTP (live/fresh/regularly used) and DWH (old/not regularly used/historical) data, which causes the working set to grow indefinitely. Partitioning or archiving historical data usually fixes these issues.
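As a sketch of that last idea (the table and column names here are made up, and declarative partitioning requires PostgreSQL 10 or later; older versions use inheritance-based partitioning instead), historical rows can be split off by date range:

CREATE TABLE orders (
    id         bigint,
    created_at timestamptz NOT NULL,
    status     text
) PARTITION BY RANGE (created_at);

-- recent data stays in a "hot" partition; old years can be archived or dropped
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

Queries that only touch recent data then scan only the recent partition, which keeps the working set small.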