Anand Sukumaran Nair
Startup Founder, Software Engineer, Abstract thinker
Co-founder & CTO @ Engagespot (Techstars NYC '24)
Tracking progress of ALTER TABLE query in PostgreSQL with 100M+ rows
Jul 22, 2023Recently I had to perform an ALTER TABLE
query on a huge PostgreSQL table with almost 600 Million rows. The query will take a couple of hours to finish. Postgres doesn’t give any updates on the progress so we’ll have no idea about the ETA.
But there is a hack to track the progress. As you know Postgres stores the table data as page files in disk. Every table will have one or page file segments that looks like 3746
, 3746.1
, 3746.2
etc. Depending on the table size, there will be multiple file segments where each file will be 1GB in size.
When you issue an ALTER TABLE
query that causes modification to the page layout, Postgres will recreate the page files from scratch. So, if you already know the size of your table files before running this query, you can track the growth of new data files being created.
Finding the table file
To find out the data file of your table, execute this query.
SELECT d.oid::text || '/' || t.relfilenode AS datafile
FROM pg_database AS d
CROSS JOIN pg_class AS t
WHERE d.datname = current_database()
AND t.relname = 'TableNAME';
You’ll get the value as something like - 25071/34216
. Here, 25071 denotes the database folder, and 34216 denotes the table file.
Now, open terminal on the db server, open the data file directory (usually - /var/lib/postgresql/data
)
Execute the command ls -l base/25071/34216* --block-size=M
` (make sure to replace the number with the value you get on step 1). This command will list all the data files of the table/index. Copy them down. It would look like this -
-rw------- 1 postgres postgres 1024M Jul 21 14:53 base/25071/34216
-rw------- 1 postgres postgres 1024M Jul 21 14:53 base/25071/34216.1
-rw------- 1 postgres postgres 913M Jul 21 14:53 base/25071/34216.2
This means, you have three data files (or index) for this table. Note down the file size for each file.
Now execute the ALTER
query on your table.
Again, goto the data directory and execute this command ls -latr base/25071 --block-size=M | tail -1
(note that I used only the first portion of the number). This will return information about the new files that are being created. You can perform this command anytime as required and find out which file is now being created, and what is it’s size.
We’ll get results like -
-rw------- 1 postgres postgres 308M Jul 21 14:54 34223
Means, the first file is now being created and it’s now 308MB. But we know its actual size was 1024M. So we know the progress.
After sometime when you execute this command, it might be like
-rw------- 1 postgres postgres 39M Jul 21 14:54 34223.1
Means, now it’s building the second file, and it’s now 39M. That also needs to be around 1024M to finish This way, we can track the progress estimate!
This approach may not work if the ALTER
query doesn’t need a page rebuild.
If you’re interested, you can watch my video that explains the internal architecture of PostreSQL and WAL files for durability