Edit

Share via


Determine table and relation size in Azure Cosmos DB for PostgreSQL

Important

Azure Cosmos DB for PostgreSQL is on a retirement path and no longer recommended for new projects. Instead, use one of these two services:

  • For PostgreSQL workloads: use the Elastic Clusters feature of Azure Database For PostgreSQL to use the horizontal scale-out and distributed PostgreSQL features contained within the open source Citus extension.

  • For NoSQL workloads, use Azure Cosmos DB for NoSQL for a distributed database solution that includes a 99.999% availability service level agreement (SLA), instant autoscale, and automatic failover across multiple regions.

The usual way to find table sizes in PostgreSQL, pg_total_relation_size, drastically under-reports the size of distributed tables on Azure Cosmos DB for PostgreSQL. All this function does on a cluster is to reveal the size of tables on the coordinator node. In reality, the data in distributed tables lives on the worker nodes (in shards), not on the coordinator. A true measure of distributed table size is obtained as a sum of shard sizes. Azure Cosmos DB for PostgreSQL provides helper functions to query this information.

Function Returns
citus_relation_size(relation_name)
  • Size of actual data in table (the "main fork").
  • A relation can be the name of a table or an index.
citus_table_size(relation_name)
citus_total_relation_size(relation_name)
  • citus_table_size plus:

    • size of indices

These functions are analogous to three of the standard PostgreSQL object size functions, except if they can't connect to a node, they error out.

Example

Here's how to list the sizes of all distributed tables:

SELECT logicalrelid AS name,
       pg_size_pretty(citus_table_size(logicalrelid)) AS size
  FROM pg_dist_partition;

Output:

┌───────────────┬───────┐
│     name      │ size  │
├───────────────┼───────┤
│ github_users  │ 39 MB │
│ github_events │ 37 MB │
└───────────────┴───────┘

Next steps