There are great resources to help you tune PostgreSQL for performance, but say you've got a Postgres database that suddenly becomes very slow. How can you figure out what's going on?

General system state

A slow database server will typically be running out of:

  • CPU,
  • memory, or
  • disk.

So the first thing you should do is to get an overview of the system state using common UNIX tools such as:

  • top (press c to see the full command line): look at CPU and memory usage
  • iostat -x -m 5: look at i/o wait and service time

If you notice that memory is exhausted, you could play with the following postgresql.conf settings:

shared_buffers  
work_mem  
effective_cache_size
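As a rough starting point, here is what those settings might look like on a dedicated server with 4 GB of RAM. These values are purely illustrative; the right numbers depend on your workload:

```
# postgresql.conf -- illustrative values for a dedicated 4 GB server
shared_buffers = 1GB         # commonly around 25% of RAM on a dedicated box
work_mem = 16MB              # per sort/hash operation, per backend, so be careful
effective_cache_size = 3GB   # estimate of OS + Postgres cache; a planner hint only
```

Note that work_mem is allocated per operation, not per connection, so a complex query on a busy server can use many multiples of it.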

If you see a lot of I/O, then try adjusting these settings:

checkpoint_segments  
autovacuum_naptime  
wal_buffers  
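Again, a hedged sketch of what adjusting these might look like (values illustrative, for a write-heavy workload):

```
# postgresql.conf -- illustrative values for a write-heavy workload
checkpoint_segments = 16   # allow more WAL between checkpoints so they run less often
                           # (replaced by max_wal_size in PostgreSQL 9.5 and later)
wal_buffers = 8MB          # WAL buffered in shared memory before being written out
autovacuum_naptime = 30s   # how often the autovacuum launcher wakes up
```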

Finding slow queries

On the database side, start by determining whether:

  • there is one large query holding up all of the resources or
  • the number of queries is the main problem.

To do this, make sure that the following setting is turned on in your postgresql.conf (on PostgreSQL 8.3 and later it was renamed to track_activities and is on by default):

stats_command_string = on

Then fire up psql dbname and look at the currently executing queries:

SELECT usename, current_query, query_start  
FROM pg_stat_activity;

(You can also add the procpid column (pid on PostgreSQL 9.2 and later) to the query if you want to map a query to the process ID you see in top.)
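For example, to list the longest-running statements first, you could try something like this (the column names shown are for older servers; on PostgreSQL 9.2 and later, use pid and query instead of procpid and current_query):

```sql
SELECT procpid, usename,
       now() - query_start AS runtime,
       current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'   -- skip idle connections
ORDER BY runtime DESC;
```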

If you can't get anything useful out of pg_stat_activity, you may want to enable slow query logging by adding this to your postgresql.conf:

log_min_duration_statement = 100

(Any query that takes more than 100 ms to execute will then be printed to the main Postgres log file.)
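It helps to give the resulting log lines some context as well. A minimal sketch, with illustrative values:

```
# postgresql.conf -- log statements slower than 100 ms, with some context
log_min_duration_statement = 100
log_line_prefix = '%t [%p] '   # prepend timestamp and process ID to each log line
```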

Examining a particular query

Once you have identified a slow query, you can time it by running this command before executing the query manually:

\timing

To get an idea of where Postgres spends its time when executing that query, look at the query plan:

EXPLAIN your_query;

The numbers you see there are estimates. To run the query and get actual numbers, use this instead:

EXPLAIN ANALYZE your_query;
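The most useful thing to compare in the output is the planner's estimated row counts against the actual ones. A hypothetical example (the orders table and column here are made up for illustration):

```sql
-- Hypothetical example: compare estimated vs. actual row counts.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- If a plan node shows an estimate like "rows=10" but an actual count like
-- "actual ... rows=100000", the planner's statistics are stale or misleading:
-- run ANALYZE orders; and look at the plan again.
```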

If you still can't figure out why a certain query takes so long, have a look at the ratio between live and dead rows in the relevant tables:

ANALYZE VERBOSE tablename;

Having too many dead rows is often a sign of insufficient vacuuming. You might want to turn autovacuum on if it isn't already, or to make it a bit more aggressive by tweaking these settings:

autovacuum_vacuum_scale_factor  
autovacuum_analyze_scale_factor  
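On PostgreSQL 8.3 and later, you can also get live and dead row counts for all your tables at once from the statistics collector, which makes it easy to spot the worst offenders:

```sql
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```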

Troubleshooting database performance is a bit of a black art. Many thanks to Mark Kirkwood for sharing his Postgres wisdom with me.

If there's anything else you have successfully used to find the cause of your performance woes, please feel free to leave a comment.

The PostgreSQL wiki has an abundance of further information on this topic.

As a rule of thumb:

If you're CPU-bound (backends consuming all available CPU), you need to reduce the complexity and/or the number of queries hitting the database.

If you're disk-bound (backends waiting for disk I/O):

  • on writes: get a RAID controller with a battery-backed write cache.
  • on reads: either increase your RAM to match the size of your working set, or reduce the size of the working set.

Most often, people mix OLTP (live/fresh/regularly used) and DWH (old/rarely used/historical) data, which causes the working set to grow indefinitely. Partitioning or archiving historical data usually fixes these issues.
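On PostgreSQL 10 and later, one way to separate hot and historical data is declarative range partitioning. A hypothetical orders table split by year might look like this (table and column names are made up for illustration):

```sql
CREATE TABLE orders (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- hot OLTP partition
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- historical partition: can be detached and archived once it goes cold
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
```

Queries that filter on created_at then only scan the relevant partitions, which keeps the working set for live traffic small.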

Comment by Robe