Overview

When using aggregation functions, they can contain the DISTINCT keyword. It acts as a qualifier for them, to ensure that only unique values are being processed. Here’s how a sample syntax looks like:

aggregation function (DISTINCT expression [clause] ...) ...

DISTINCT keyword can be combined with the following aggregate functions:

  • AVG()
  • COUNT()
  • MAX()
  • MIN()
  • SUM()

All functions listed above, operate on the same input and return types, that are supported by their counterparts without any qualifiers. They can be grouped without any limitations, provided that they utilise a single DISTINCT keyword.

Any type of multiple DISTINCT qualifiers used in a query with GROUP BY statements are not supported at the moment, however if the query doesn’t include such statements, it will get executed but the output’s column order will be reversed at the moment

Examples

In this section we’ll focus on a few examples, that showcase sample usage of the above mentioned concepts. They will be based on creation of the following tables:

CREATE TABLE customer (
  customer_id int,
  cust_name string
);
INSERT INTO customer
    (customer_id, cust_name)
VALUES
    (11112, 'Alex'),
    (11113, 'Aaron'),
    (11114, 'Alice'),
    (11115, 'Nina'),
    (11116, 'Rosy'),
    (11117, 'Martha'),
    (11118, 'John');

CREATE TABLE rental (
    rental_id int,
    rental_date timestamp,
    return_date timestamp,
    car string,
    customer_id int,
    total_price int
);
INSERT INTO rental (rental_id, rental_date, return_date, car, customer_id, total_price)
VALUES
(8557411, '2022-04-02 09:10:19', '2022-04-10 10:15:05', 'Audi', 11112, 1400),
(8557421, '2022-04-06 07:00:30', '2022-04-19 07:10:19', 'BMW', 11115, 2000),
(8557451, '2022-04-19 08:00:20', '2022-04-24 08:05:00', 'Cadillac', 11112, 1000),
(8557311, '2022-05-11 09:15:28', '2022-05-18 09:00:18', 'Audi', 11115, 1500),
(8557321, '2022-05-20 10:12:22', '2022-05-28 10:08:48', 'Audi', 11113, 1500),
(8557351, '2022-06-10 12:18:09', '2022-06-20 18:12:23', 'Cadillac', 11114, 1200),
(8557221, '2022-06-17 14:02:02', '2022-06-20 14:17:02', 'Chevrolet', 11112, 1300),
(8557251, '2022-07-12 05:19:49', '2022-07-19 07:15:28', 'Chevrolet', 11116, 1400),
(8557231, '2022-08-09 09:29:08', '2022-08-24 09:30:58', 'Cadillac', 11114, 2000),
(8557291, '2022-08-18 15:15:20', '2022-09-01 15:30:19', 'BMW', 11117, 3000);

Here’s how the created tables will look like, respectively:

SELECT * FROM customer;

+-------------+-----------+
| customer_id | cust_name |
+-------------+-----------+
| 11112       | Alex      |
| 11113       | Aaron     |
| 11114       | Alice     |
| 11115       | Nina      |
| 11116       | Rosy      |
| 11117       | Martha    |
| 11118       | John      |
+-------------+-----------+

SELECT * FROM rental;

+------------+---------------------+---------------------+-----------+---------------+-------------+
| rental_id  | rental_date         | return_date         | car       | customer_id   | total_price |
+------------+---------------------+---------------------+-----------+---------------+-------------+
| 8557411    | 2022-04-02 09:10:19 | 2022-04-10 10:15:05 | Audi      | 11112         | 1400        |
| 8557421    | 2022-04-06 07:00:30 | 2022-04-19 07:10:19 | BMW       | 11115         | 2000        |
| 8557451    | 2022-04-19 08:00:20 | 2022-04-24 08:05:00 | Cadillac  | 11112         | 1000        |
| 8557311    | 2022-05-11 09:15:28 | 2022-05-18 09:00:18 | Audi      | 11115         | 1500        |
| 8557321    | 2022-05-20 10:12:22 | 2022-05-28 10:08:48 | Audi      | 11113         | 1500        |
| 8557351    | 2022-06-10 12:18:09 | 2022-06-20 18:12:23 | Cadillac  | 11114         | 1200        |
| 8557221    | 2022-06-17 14:02:02 | 2022-06-20 14:17:02 | Chevrolet | 11112         | 1300        |
| 8557251    | 2022-07-12 05:19:49 | 2022-07-19 07:15:28 | Chevrolet | 11116         | 1400        |
| 8557231    | 2022-08-09 09:29:08 | 2022-08-24 09:30:58 | Cadillac  | 11114         | 2000        |
| 8557291    | 2022-08-18 15:15:20 | 2022-09-01 15:30:19 | BMW       | 11117         | 3000        |
+------------+---------------------+---------------------+-----------+---------------+-------------+

DISTINCT combined with COUNT function

The following example uses DISTINCT qualifier combined with COUNT() function to calculate the number of unique car brands in rentals:

SELECT COUNT (DISTINCT car) AS number_of_car_brands
FROM rental;

When executing the above code, it will return the following output:

+----------------------+
| number_of_car_brands |
+----------------------+
| 4                    |
+----------------------+

Here’s another example, that uses DISTINCT qualifier combined with COUNT() function to calculate the amount of rentals by each customer:

SELECT c.cust_name AS customer_name, COUNT (DISTINCT r.rental_id) AS rental_count
FROM rental r
JOIN customer c ON r.customer_id = c.customer_id
GROUP BY c.cust_name;

It will calculate the rental_count by each customer_name as shown below:

+----------------+--------------+
| customer_name  | rental_couunt|
+----------------+--------------+
| Nina           | 2            |
| Aaron          | 1            |
| Alice          | 2            |
| Martha         | 1            |
| Alex           | 3            |
| Rosy           | 1            |
+----------------+--------------+

DISTINCT combined with MAX() function

The following example uses DISTINCT qualifier combined with MAX() function to find maximum single spending per each customer, dropping any repeated transactions:

SELECT c.cust_name AS customer_name,
       MAX (DISTINCT r.total_price) AS max_spending
FROM rental r
JOIN customer c ON r.customer_id = c.customer_id
GROUP BY c.cust_name;

The output for that code will be as follows:

+---------------+--------------+
| customer_name | max_spending |
+---------------+--------------+
| Martha        | 3000         |
| Rosy          | 1400         |
| Alex          | 1400         |
| Alice         | 2000         |
| Nina          | 2000         |
| Aaron         | 1500         |
+---------------+--------------+

DISTINCT combined with SUM() function

The following example compares the sum of unique revenues versus the sum of all revenues in rental data:

SELECT
    SUM (DISTINCT r.total_price) AS unique_revenue,
    SUM (r.total_price) AS total_revenue
FROM rental r;

Here’s the ouput of the above query:

+----------------+---------------+
| unique_revenue | total_revenue |
+----------------+---------------+
| 11400          | 16300         |
+----------------+---------------+

The result may help to understand what is the impact of repeating transactions on total revenue.

Limitations

There are few use-cases that we are aware of but do not support currently:

  • Aggregation functions with DISTINCT keyword used as an argument of an expression, e.g.
SELECT 1 + COUNT(DISTINCT col) FROM table
  • Any type of multiple DISTINCT qualifiers used in a query with GROUP BY statements