SQL CUME_DIST

Summary: in this tutorial, you will learn how to calculate cumulative distribution values of rows using the SQL CUME_DIST() function.

Introduction to SQL CUME_DIST() Function

CUME_DIST() is a window function that calculates the cumulative distribution of a value within a set of values.

The CUME_DIST() function returns the number of rows with values less than or equal to (<=) the current row’s value, divided by the total number of rows:

N / total_rows

In this formula:

  • N is the number of rows whose value is less than or equal to the current row’s value, in the order defined by the ORDER BY clause.
  • total_rows is the number of rows in the partition or result set being evaluated.
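
To make the formula concrete, the following sketch compares CUME_DIST() with a manual N / total_rows calculation. The scores data set and the s1 and s2 aliases are hypothetical and exist only for this illustration. The inline VALUES syntax assumes PostgreSQL or SQLite 3.25 or later; other database systems may need a different way to build the sample rows.

```sql
-- Hypothetical inline data: five scores, two of them tied at 20.
WITH scores (score) AS (
    VALUES (10), (20), (20), (30), (40)
)
SELECT
    score,
    -- Built-in cumulative distribution over the whole result set.
    CUME_DIST() OVER (ORDER BY score) AS cume_dist_val,
    -- Manual N / total_rows; multiplying by 1.0 avoids integer division.
    1.0 * (SELECT COUNT(*) FROM scores s2 WHERE s2.score <= s1.score)
        / (SELECT COUNT(*) FROM scores) AS manual_val
FROM scores s1
ORDER BY score;
```

Under these assumptions, both columns should evaluate to 0.2, 0.6, 0.6, 0.8, and 1.0. The two rows with the score 20 share the value 0.6 because three of the five rows have a score less than or equal to 20.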

The return value of the CUME_DIST() function is greater than 0 and less than or equal to 1:

0 < CUME_DIST() <= 1

Rows with the same value in the ORDER BY columns (ties) receive the same CUME_DIST() value.

The syntax of the CUME_DIST() function is as follows:

CUME_DIST() OVER (
	[PARTITION BY partition_expr1, partition_expr2, ...]
	ORDER BY sort_expr1 [ASC | DESC], sort_expr2, ...
)

In this syntax:

  • First, the PARTITION BY clause divides the result set into partitions. If you omit the PARTITION BY clause, the function treats the whole result set as a single partition.
  • Then, the ORDER BY clause sorts the rows within each partition.
  • Finally, the CUME_DIST() function is applied to each sorted partition independently.
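
The effect of the PARTITION BY clause can be sketched with a small query. The salaries data set and its dept and salary columns are hypothetical, and the inline VALUES syntax again assumes PostgreSQL or SQLite 3.25 or later.

```sql
-- Hypothetical inline data: two departments with different row counts.
WITH salaries (dept, salary) AS (
    VALUES
        ('A', 100), ('A', 200), ('A', 200), ('A', 400),
        ('B', 100), ('B', 300)
)
SELECT
    dept,
    salary,
    -- The distribution restarts for each dept, so total_rows is
    -- 4 for department A and 2 for department B.
    CUME_DIST() OVER (
        PARTITION BY dept
        ORDER BY salary
    ) AS cume_dist_val
FROM salaries
ORDER BY dept, salary;
```

With this data, department A should yield 0.25, 0.75, 0.75, and 1.0, while department B should yield 0.5 and 1.0. Removing the PARTITION BY line would instead rank all six rows together as a single partition.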

SQL CUME_DIST() function example

See the following employees and departments tables from the sample database:

[Figure: employees and departments tables]

The following statement creates a view named department_headcounts based on the employees and departments tables for the demonstration:

CREATE VIEW department_headcounts
AS
SELECT 
	department_name,
	COUNT(employee_id) AS headcount
FROM 
	employees e
	INNER JOIN departments d
		ON d.department_id = e.department_id
-- Every non-aggregated column in the SELECT list must appear in GROUP BY
GROUP BY 
	d.department_id,
	department_name;

The following statement finds the cumulative distribution values by headcount of each department:

SELECT
	department_name,
	headcount,
	ROUND(
		CUME_DIST() OVER (
			ORDER BY headcount
		)
	,2) cume_dist_val
FROM
	department_headcounts;

The following shows the output:

[Figure: output of the CUME_DIST() query]

In this example, the departments are sorted by headcount in ascending order. The result set contains 11 rows in total.

The Administration department has a headcount of one. The Human Resources and Public Relations departments have the same headcount as Administration. As a result, three departments have a headcount of one. The CUME_DIST() function uses the following formula to calculate the cumulative distribution value of the first row:

3 / 11 = 0.27

The same logic is applied to the second and third rows.

The Marketing department has a headcount of two. The function counts the departments whose headcount is less than or equal to 2, including Marketing itself. The result is 5. Therefore, the CUME_DIST() value of the Marketing department is 5 / 11 = 0.45, rounded to two decimal places.

The same logic is applied to the remaining rows.
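
The arithmetic in this walkthrough can be double-checked with a plain query. This is only a sketch: the column aliases are hypothetical, and some database systems require a FROM clause (for example, FROM dual) for a SELECT without a table.

```sql
SELECT
    -- 3 of the 11 departments have a headcount of one.
    ROUND(3.0 / 11, 2) AS first_three_rows,
    -- 5 of the 11 departments have a headcount of two or less.
    ROUND(5.0 / 11, 2) AS marketing_row;
```

The decimal literals 3.0 and 5.0 matter here: in database systems that perform integer division, 3 / 11 would evaluate to 0 rather than 0.27.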

In this tutorial, you have learned how to calculate the cumulative distribution of a value in a set of values using the SQL CUME_DIST() function.
