Group


Group By


Command Syntax (MySQL/DB2) Description Example (MySQL/DB2)
GROUP BY SELECT column_name(s) FROM table_name GROUP BY column_name(s) GROUP BY clause is used in collaboration with the SELECT statement to arrange data with identical values into groups. SELECT dep_id, COUNT(*) FROM employees GROUP BY dep_id;
This returns the department IDs and the number of employees in them, grouped by the department IDs.

Syntax:

SELECT      column_name(s)
FROM        table_name
WHERE       condition
GROUP BY    column_name(s)
ORDER BY    column_name(s);
  • The GROUP BY statement groups rows that have the same values into summary rows, like “find the number of customers in each country”.
  • The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns.

Customers:

CustomerID CustomerName ContactName Address City PostalCode Country
1
Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constitución 2222 México D.F. 05021 Mexico
3 Antonio Moreno Taquería Antonio Moreno Mataderos 2312 México D.F. 05023 Mexico
4
Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbköp Christina Berglund Berguvsvägen 8 Luleå S-958 22 Sweden

Orders:

OrderID CustomerID EmployeeID OrderDate ShipperID
10248 90 5 1996-07-04 3
10249 81 6 1996-07-05 1
10250 34 4 1996-07-08 2

Shippers:

ShipperID ShipperName
1 Speedy Express
2 United Package
3 Federal Shipping

Example:

  • List the number of customers in each country, sorted from high to low
SELECT   COUNT(CustomerID), Country
FROM     Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;

Output:

Number of Records: 21

Count Country
13  USA 
11  France 
11  Germany 
Brazil 
UK 

Group & Count

  • Let’s say we want to select country from Author table
  • And want them sorted in order
SELECT  country
FROM    Author
ORDER BY 1   # same as country = 1
  • Let’s say we want to count how many DISTINCT countries are in the Author table
  • We can use the GROUP BY because it will create a group for every UNIQUE/DISTINCT country in the table
SELECT  country, COUNT(country)
FROM    Author
GROUP BY country

# The way it is now it will output a column named country and the second column will be labeled 2 because it's the second col
SELECT  country, COUNT(country) AS Count
FROM    Author
GROUP BY country
# This output will give us a second column labeled Count

Group & Having

  • We can further expand our query to limit the number of countries that ONLY have more than 4 titles
  • In other words we want to extract the countries with a count >= 4
  • We already covered HAVING so we use it here
  • HAVING is isolated to the GROUP BY clause, so GROUP BY has to be used right before it
  • While WHERE applies to the SELECT clause
SELECT  country, COUNT(country) AS Count
FROM    Author
GROUP BY country
HAVING  count(country >= 4)

with JOIN

  • List the number of orders sent by each shipper
SELECT    Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM      Orders
LEFT JOIN Shippers 
       ON Orders.ShipperID = Shippers.ShipperID
GROUP BY  ShipperName;

Output:

Number of Records: 3

ShipperName NumberOfOrders
Federal Shipping  68 
Speedy Express  54 
United Package  74 

Rollup

The ROLLUP is an extension of the GROUP BY clause. The ROLLUP option allows you to include extra rows that represent the subtotals, which are commonly referred to as super-aggregate rows, along with the grand total row. By using the ROLLUP option, you can use a single query to generate multiple grouping sets.

Note that a grouping set is a set of columns by which you group. For example, a query that returns the inventory by the warehouse,  the grouping set is (warehouse).

SELECT    warehouse,      
          SUM (quantity) qty 
FROM      inventory 
GROUP BY  warehouse;

The following illustrates the basic syntax of the SQL ROLLUP.We will use the inventory table below:

one column rollup

  • The following statement uses the GROUP BY clause and the SUM() function to find the total inventory by warehouse:
SELECT   warehouse, 
         SUM(quantity) 
FROM     inventory
GROUP BY warehouse;

  • To retrieve the total products in all warehouses, you add the ROLLUP to the GROUP BY clause
SELECT   warehouse, 
         SUM(quantity) 
FROM     inventory 
GROUP BY ROLLUP (warehouse);

  • As you can see in the result, the NULL value in the warehouse column specifies the grand total super-aggregate line. In this example, the ROLLUP option causes the query to produce another row that shows the total products in all warehouses

coalesce

We’ll cover coalesce in another part.

  • To make the output more readable, you can use the COALESCE() function to substitute the NULL value by the All warehouses as follows:
SELECT      
         COALESCE(warehouse, 'All warehouses') AS warehouse,
         SUM(quantity) 
FROM     inventory 
GROUP BY ROLLUP (warehouse);

multiple columns rollup

  • The following statement calculates the inventory by warehouse and product:
SELECT   warehouse,
         product, 
         SUM(quantity)
FROM     inventory 
GROUP BY warehouse,
         product;

  • Now we add the ROLLUP clause
SELECT  warehouse,
        product,
        SUM(quantity)
FROM    inventory 
GROUP BY ROLLUP (warehouse , product);

Note that the output consists of summary information at two levels of analysis, not just one:

  • Following each set of product rows for a specified warehouse, an extra summary row appears displaying the total inventory. In these rows, values in the productcolumn set to NULL.

  • Following all rows, an extra summary row appears showing the total inventory of all warehouses and products. In these rows, the values in the  warehouse and product columns set to NULL.

partial rollup

You can use ROLLUP to perform a partial roll-up that reduces the number of subtotals calculated as shown in the following example:

SELECT   warehouse,
         product,
         SUM(quantity)
FROM     inventory 
GROUP BY warehouse,
         ROLLUP (product);

In this example, the ROLLUP only makes a supper-aggregate summary for the product column, not the warehouse column.

Coalesce

The COALESCE() function returns the first non-null value in a list.

Syntax: COALESCE(val1, val2, ...., val_n)

Technical Details

Works in: SQL Server (starting with 2008), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
  • If we want to substitute a value from another column to use for the missing value in a different column, then we would use COALESCE.
  • Let’s say we want to look at product name, but if that data is missing we want it to substitute product code for the product name, and we’ll put the new data in a new column
SELECT  COALESCE(product,product_code) AS product_info
FROM    MyTable
  • If you remember the example above: what we are asking is, if the value of warehouse is NULL then substitute it with ‘All warehouses’
  • As you see in the image that’s what happened in the total row
SELECT      COALESCE(warehouse, 'All warehouses') AS warehouse

  • Here is another example
SELECT  COALESCE(NULL, 'Pick Me', 'No Pick ME') AS answer
[OUTPUT] Pick Me