Postgres COUNT() function
Count rows or non-null values in a result set
The Postgres COUNT()
function counts the number of rows in a result set or the number of non-null values in a specific column.
It's useful for data analysis, reporting, and understanding the size and composition of your datasets. Some common use cases include calculating the total number of records in a table, finding the number of distinct values in a column, or determining how many rows meet certain conditions.
Function signatures
The COUNT()
function has two main forms:
- Counts the total number of rows in the result set.
- Counts the number of rows where the input expression is not NULL.
DISTINCT
is an optional keyword, that removes duplicate values before counting.
Example usage
Consider an orders
table that tracks orders placed by customers of an online store. It has columns order_id
, customer_id
, product_id
, and order_date
. We'll use the COUNT()
function to analyze this data.
Count all rows
To get the total number of orders, you can use COUNT(*)
:
This query will return the total number of rows in the orders
table.
Count non-null values
To count how many orders have a product_id
(assuming some orders might not have a product associated):
This query will return the number of orders where product_id
is not NULL.
Count distinct values
To find out how many unique customers have placed orders:
This query will return the number of distinct customer_id
values in the orders
table.
Advanced examples
We use the orders
table created in the previous section to demonstrate more use cases of the COUNT()
function.
Combine COUNT() with GROUP BY
You can use COUNT()
with GROUP BY
to get counts for different categories:
This query counts the number of orders for each month.
Use COUNT() in a subquery
You can use COUNT()
in a subquery to filter based on counts:
This query finds customers who have placed more orders than the average number of orders per customer.
Combine COUNT() with CASE
You can use COUNT()
with CASE
statements to only count rows that meet specific conditions:
This query counts the total number of orders, as well as the number of high-value and low-value orders.
Use COUNT() with FILTER clause
Postgres also allows using a FILTER
clause with aggregate functions, which can be more readable than CASE
statements:
This query counts the total number of orders, as well as the number of orders placed after April 1, 2023.
Additional considerations
Performance implications
COUNT(*)
is generally faster than COUNT(column)
or COUNT(DISTINCT column)
because it doesn't need to check for NULL values or uniqueness. However, on very large tables, even COUNT(*)
can be slow if it needs to scan the entire table.
For frequently used counts, consider maintaining a separate counter table or using materialized views to improve performance.
NULL handling
Both COUNT(column)
and COUNT(DISTINCT column)
expressions do not count NULL values. If you need to include NULL values in your count, use COUNT(*)
or COUNT(COALESCE(column, 0))
.
Alternative approaches
- For approximate counts of distinct values in very large datasets, consider using the
pg_stat_statements
extension or theHyperLogLog
algorithm (available through extensions likepostgresql-hll
). - For faster counts on large tables, consider using estimate counts based on table statistics with
pg_class.reltuples
.
Resources
Last updated on