Subquery


There are ways to work with multiple tables in the same query, we can use:

  1. Sub-queries
  2. Implicit Join
  3. Join operators (INNER, OUTER…)

Subqueries are not a specific command or statement but are smaller Queries nested inside another.

  • Why do we use subqueries? To make our code easier to understand
  • On the surface they appear more complicated but think of them as a way to group several ideas together to represent a broader thought
  • They decrease the amount of time we access the database
  • Make the logic simpler (hopefully)
  • Regardless of how many queries are nested within each other, the inner most query is performed first and the process continues outward till all queries have been performed
  • Values of inner queries can be used in outer queries
  • The outer most query is called: outer query or outer select, with the inner most query being the inner select

Subquery CAN NOT have more than one column specified in the SELECT clause

Subquery CAN NOT be nested in SET command

Subqueries that return more than one row can only be used with multiple value operators

  • It’s simply a query within a query.
  • The most inner query is executed first and moves outwards to the next outer query and ….
  • You can subquery with
    • With the IN or NOT IN operator
    • With comparison operators
      • Equal (=)
      • Greater than (>)
      • Less than (<)
      • Greater than or equal ( >=)
      • Less than or equal (<=)
      • Not equal ( !=) or (<>)
    • With the EXISTS or NOT EXISTS operator
    • With the ANY or ALL operator
    • In the FROM clause
    • In the SELECT clause

data

  • Here are two tables we’ll use to demonstrate: employees, departments
  • Find all employee_id with location_id = 1700
  • Of course you can use INNER JOIN on department_id WHERE location_id=1700 but for the sake of this section let’s
  • go about it as follows so I can explain subqueriy and WITH in the next section

query1

  • Let’s say you first find a list of all department_id at location_id = 1700
SELECT 
       *
FROM
       departments
WHERE
       location_id = 1700;

Output:

query2

  • Now that you have a list of all department_id for that location
  • Use it in table employees to extract all the employee_id, first and last names
SELECT 
           employee_id, first_name, last_name
FROM
           employees
WHERE
           department_id IN (1 , 3, 9, 10, 11)
ORDER BY   first_name , last_name;

subquery IN

  • Let’s do it using a subquery, remember the inner most query is executed first
  • The inner query will create a list of all department_id that are at location_id=1700
  • Outer query will check the department_id FROM the employees table to see if it is the list generated in the subquery
  • Extracts all the rows and ORDER BY first and last name in that order

So let’s extract all employees that are at location=1700

SELECT 
        employee_id, first_name, last_name
FROM
        employees
WHERE
        department_id IN (SELECT 
                                department_id
                          FROM
                                departments
                          WHERE
                                location_id = 1700)
ORDER BY 
        first_name , last_name;

subquery NOT IN

So let’s extract all employees that are NOT at location=1700

# ______ You guessed it just replace with NOT IN in the code chunk above

subquery MAX

  • Let’s use one of the comparison operators this time

Find the employees with the highest salary, we can use MAX

SELECT 
        employee_id, first_name, last_name, salary
FROM
        employees
WHERE
        salary = (  SELECT 
                          MAX(salary)
                    FROM
                          employees)
ORDER BY
        first_name , last_name;

subquery AVG

Find the employees whose salaries are greater than the average of all employees

SELECT 
        employee_id, first_name, last_name, salary
FROM
        employees
WHERE
        salary > (  SELECT 
                          AVG(salary)
                    FROM
                          employees);

subquery EXISTS

Products

ProductID ProductName SupplierID CategoryID Unit Price
1 Chais 1 1 10 boxes x 20 bags 18
2 Chang 1 1 24 - 12 oz bottles 19
3 Aniseed Syrup 1 2 12 - 550 ml bottles 10
4 Chef Anton’s Cajun Seasoning 2 2 48 - 6 oz jars 22
5 Chef Anton’s Gumbo Mix 2 2 36 boxes 21.35

Suppliers

SupplierID SupplierName ContactName Address City PostalCode Country
1 Exotic Liquid Charlotte Cooper 49 Gilbert St. London EC1 4SD UK
2 New Orleans Cajun Delights Shelley Burke P.O. Box 78934 New Orleans 70117 USA
3 Grandma Kelly’s Homestead Regina Murphy 707 Oxford Rd. Ann Arbor 48104 USA
4 Tokyo Traders Yoshi Nagase 9-8 Sekimai Musashino-shi Tokyo 100 Japan

List the suppliers with a product price less than 20

SELECT       SupplierName
FROM         Suppliers
WHERE EXISTS 
             (SELECT    ProductName
             FROM       Products 
             WHERE      Products.SupplierID = Suppliers.supplierID
                        AND Price < 20);

Number of Records: 24

SupplierName
Exotic Liquid 
New Orleans Cajun Delights 
Tokyo Traders 
Mayumi’s 
Pavlova, Ltd. 

With


Let’s start by stating that WITH is not supported by all database systems.

  • WITH allows you to give a subquery block a name (refactoring), which then can be referenced within the main query
  • It defines a temporary relation and this temporary relation (let’s pretend it’s a temporary table) is then used by the query associated with the WITH clause
  • It only exists for the duration of the query that it belongs to
  • The alias name assigned to the subquery is treated as it’s an inline view or table
  • Introduced by Oracle back with Oracle 9i
  • Think of it as a function created for the main function to use as its argument where the main function can perform iterative calculations

Let me go through some examples

outside attribute

  • If you’re familiar with R functions, think of a function that has a name, arguments, return value
  • Now think of WITH that has a name TempTable with a return value/attribute of avg_value
  • Syntax for that so far would be: WITH TempTable(avg_value) AS
  • What do I mean by outside attribute: look avg_value it is outside the AS () block
  • Let’s follow that in the code below
WITH    TempTable(avg_value)  AS ( Subquery to calculate the avg_value)
  • Subquery to calculate avg_value is
SELECT  AVG (measured_value)
FROM    MainTable
  • Let’s put the subquery in the ( subquery ) for WITH
WITH    TempTable(avg_value)  AS (
        SELECT  AVG (measured_value)
        FROM    MainTable
        )
  • So far with have WITH that calculates the AVG of column measure_value from MainTable and stores it in the attribute avg_value within the temporary table TempTable
  • So if we want to access the values of the temporary table we just call on TemTable.avg_value

To find the list of measured_value that’s greater than the average of all values measured we need to SELECT the values from the MainTable that are greater than the calculated avg_value from the WITH query

SELECT  measured_value
FROM    MainTable, TempTable
WHERE   MainTable.measured_value > TempTable.avg_value

Put it all together to find the measured_value > average of all values

WITH    TempTable(avg_value)  AS (
        SELECT  AVG (measured_value)
        FROM    MainTable
        )

SELECT  measured_value
FROM    MainTable, TempTable
WHERE   MainTable.measured_value > TempTable.avg_value

inside attribute

  • Let’s see if we can write it in a different way with this example
  • We want to list the average quantity of products ordered per ProductID
  • Data is in OrderDetails
OrderDetailID OrderID ProductID Quantity
1 10248 11 12
2 10248 42 10
3 10248 72 5
4 10249 14 9
5 10249 51 40
518 10443 28 12

Start with subquery

Let’s calculate the total number of products for each ProductID

SELECT    
          SUM(Quantity)  AS total
FROM     
          OrderDetails
GROUP BY  
          ProductID
  • So far we have a list of totals for each productID, in order to find the average
  • we call the attribute total from the subquery into the outer query
  • so let’s write the outer query like we did above
  • Ooops, we haven’t given the subquery a name and used a WITH yet so our FROM statement is not complete
SELECT
         AVG(total)
FROM
         *****

Let’s put it all together to find the average quantity of product sold

WITH    TempTable  AS (
        SELECT    SUM(Quantity)  AS total
        FROM      OrderDetails
        GROUP BY  ProductID
        )
SELECT  AVG(total)
FROM    TempTable;
  • As you see the attribute of TempTable.total was declared inside the AS ()
  • unlike the prior example where avg_value was declared in TempTable(avg_value) outside the AS()

Any & All


ANY & ALL can be found in theFilter document. But the nature of both of these operators are subqueries.

  • The use of either ANY or ALL will occur in a subquery
  • The ANY and ALL operators allow you to perform a comparison between a single column value and a range of other values.

Any

The ANY operator, means that the condition will be true if the operation is true for ANY of the values in the range.

  • returns a boolean value as a result
  • returns TRUE if ANY of the subquery values meet the condition
  • the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
  • Syntax:
SELECT      column_name(s)
FROM        table_name
WHERE       column_name operator ANY
                (SELECT column_name
                FROM table_name
                WHERE condition);

All

The ALL operator, means that the condition will be true if the operation is true for ALL of the values in the range.

  • returns a boolean value as a result
  • returns TRUE if ALL of the subquery values meet the condition
  • is used with SELECT, WHERE and HAVING
  • the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
  • Syntax:
SELECT ALL  column_name(s)
FROM        table_name
WHERE       condition(s)  

# ----    OR    ------
SELECT      column_name(s)
FROM        table_name
WHERE       column_name operator ALL
                (SELECT column_name
                FROM table_name
                WHERE condition);

Any Location


Subq in Where

Extract information for all the employees with a salary < average salary of the entire table

  • If you use a function with WHERE many times you’ll end up with an error “Cannot evaluate Aggregate function like AVG() in the WHERE clause”
  • If we want to extract values from employees table where
  • salary is less than average salary of the entire table
  • !st Query: we target the AVG() salary for the entire table
# ___ Inner QUERY, get avg salary from the entire table
SELECT  AVG(salary)
FROM    Employees;

# ___ Outer Query
SELECT emp_id, f_name, l_name, salary
FROM    Employees
WHERE   salary < 
                (SELECT  AVG(salary)
                 FROM    Employees);

Subq in Select

Extract information and average salary for each employee using an alias.

# ___ Start with the inner query first
SELECT  emp_id, salary,
        (SELECT AVG(salary) FROM    Emmployees)
        AS  avg_salary)
FROM    Employees;

Subq in From

Here we use the result of the subq to substitute the table name. This is called Derived Tables or Table Expressions

  • Let’s derive non-sensitive information from the table in a query
  • The subquery below could be done without but this is helpful specially when we use multiple tables and unions
# ___ Create a query to extract non-sensitive information
SELECT  * 
FROM    (SELECT emp_id, f_name, l_name, dep_id
        FROM    Employees)
        AS      All_emp

Multiple Tables


In all these examples below, for the code to work substitute (inner query) with the actual code for the inner query. I presented the way I did below because it is easier for beginners to understand it this way.

I’ve also use different column labels because what’s used in the image is annoying and not realistic

Example 1

Retrieve only the employee records from the employees table for which a department ID exists in the departments table

  • Once again starting from the inner query where we want to extract the dept_id that’s in the departments table
  • In the outer query we extract the information we need for the results found IN the inner query
  • So as you see the inner query targets one table and the outer query targets the other tables to extract a subset based on the query of the other table
# ___ Inner query ___
SELECT  dept_id_dep
FROM    Departments

# ___ Outer query ___
SELECT  * 
FROM    Employees
WHERE   dep_id_dep IN
        ( inner query );

Example 2

Retrieve location of an employee, when you can see that the Employees table does not have any information regarding location, but Departments table does in loc_id

  • To find information for location ‘L0002’
# ___ Inner query ___
SELECT  dept_id_dep
FROM    Departments
WHERE   loc_id = 'L0002'

# ___ Outer query ___
SELECT  *
FROM    Emmployees
WHERE   dep_id IN
        ( inner query );

Example 3

Retrieve dept_id, dep_name for employees who earn more than 70K. Note dep_name & dep_id are in Departments table while salary info is in Employees table

# ___ Inner query ___
SELECT  dep_id
FROM    Employees
WHERE   salary > 70000

# ___ Outer query
SELECT  dept_id, dep_name
FROM    Departments
WHERE   dept_id IN
        ( inner query );

See more subqueries using Implicit Joins in the next section

Implicit Joins


This section crosses between subquery and joins, it is more a join than a subquery so I’ll include it here as well as in the Joins page.

2 Tables in From

  • Consider what would happen if we used 2 tables in the From statement
  • This will result in a FULL JOIN - Cartesian Join
  • You will end up with more rows than for each table individually

Every row from the first table is joined with every row from the second table, no common keys/columns are used, so you can imagine how big can become if we have two large tables

SELECT  *
FROM    Employees, Departments

Filter

Lets limit the results of the join using additional operands. We want the rows where dept_id matches dep_id. Note the use of table names to prefix with every column name

SELECT  *
FROM    Employees, Departments
WHERE   Employees.dep_id = Departments.dept_id;

Alias

Since table names can be long and our statements can become long and complicated, we can use aliases without the use of AS

SELECT  *
FROM    Employees E, Departments D
WHERE   E.dep_id = D.dept_id;

# __ To subset even further we can narrow the SELECT statement
SELECT  Emp_id, Dep_name
FROM    Employees E, Departments D
WHERE   E.dep_id = D.dept_id;

Example

  • Retrieve the dep_name for each employee.
SELECT  f_name, dep_name
FROM    Employees E, Departments D
WHERE   E.dep_id = D.dept_id_dep;

Summary


This is a summary table of 4 sections, so it’ll be at the end of each of these sections:

  • Functions
  • Subqueries
  • Implicit Joins
  • DateTime
Command Syntax (MySQL/DB2) Description Example (MySQL/DB2)
COUNT SELECT COUNT(column_name) FROM table_name WHERE condition; COUNT function returns the number of rows that match a specified criterion. SELECT COUNT(dep_id) FROM employees;
AVG SELECT AVG(column_name) FROM table_name WHERE condition; AVG function returns the average value of a numeric column. SELECT AVG(salary) FROM employees;
SUM SELECT SUM(column_name) FROM table_name WHERE condition; SUM function returns the total sum of a numeric column. SELECT SUM(salary) FROM employees;
MIN SELECT MIN(column_name) FROM table_name WHERE condition; MIN function returns the smallest value of the SELECTED column. SELECT MIN(salary) FROM employees;
MAX SELECT MAX(column_name) FROM table_name WHERE condition; MAX function returns the largest value of the SELECTED column. SELECT MAX(salary) FROM employees;
ROUND SELECT ROUND(2number, decimals, operation) AS RoundValue; ROUND function rounds a number to a specified number of decimal places. SELECT ROUND(salary) FROM employees;
LENGTH SELECT LENGTH(column_name) FROM table; LENGTH function returns the length of a string (in bytes). SELECT LENGTH(f_name) FROM employees;
UCASE SELECT UCASE(column_name) FROM table; UCASE function displays the column name in each table in uppercase. SELECT UCASE(f_name) FROM employees;
LCASE SELECT LCASE(column_name) FROM table; LCASE function displays the column name in each table in lowercase. SELECT LCASE(f_name) FROM employees;
DISTINCT SELECT DISTINCT column_name FROM table; DISTINCT function is used to display data without duplicates. SELECT DISTINCT UCASE(f_name) FROM employees;
DAY SELECT DAY(column_name) FROM table DAY function returns the day of the month for a given date. SELECT DAY(b_date) FROM employees where emp_id = 'E1002';
CURRENT_DATE SELECT CURRENT_DATE; CURRENT_DATE is used to display the current date. SELECT CURRENT_DATE;
DATEDIFF() SELECT DATEDIFF(date1, date2); DATEDIFF() is used to calculate the difference between two dates or time stamps. The default value generated is the difference in number of days. SELECT DATEDIFF(CURRENT_DATE, date_column) FROM table;
FROM_DAYS() SELECT FROM_DAYS(number_of_days); FROM_DAYS() is used to convert a given number of days to YYYY-MM-DD format. SELECT FROM_DAYS(DATEDIFF(CURRENT_DATE, date_column)) FROM table;
DATE_ADD() SELECT DATE_ADD(date, INTERVAL n type); DATE_ADD() is used to calculate the date after lapse of mentioned number of units of date type, i.e. if n=3 and type=DAY, the result is a date 3 days after what is mentioned in date column. The type valiable can also be months or years. SELECT DATE_ADD(date, INTERVAL 3 DAY);;
DATE_SUB() SELECT DATE_SUB(date, INTERVAL n type); DATE_SUB() is used to calculate the date prior to the record date by mentioned number of units of date type, i.e. if n=3 and type=DAY, the result is a date 3 days before what is mentioned in date column. The type valiable can also be months or years. SELECT DATE_SUB(date, INTERVAL 3 DAY);;
Subquery SELECT column_name [, column_name ] FROM table1 [, table2 ] WHERE column_name OPERATOR (SELECT column_name [, column_name ] FROM table1 [, table2 ] [WHERE])

Subquery is a query within another SQL query and embedded within the WHERE clause.

A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved.

SELECT emp_id, f_name, l_name, salaryFROM employeeswhere salary< (SELECT AVG(salary)FROM employees);

SELECT * FROM ( SELECT emp_id, f_name, l_name, dep_id FROM employees) AS emp4all;


SELECT * FROM employees WHERE job_id IN (SELECT job_ident FROM jobs);

Implicit Inner Join SELECT column_name(s) FROM table1, table2 WHERE table1.column_name = table2.column_name; Implicit Inner Join combines two or more records but displays only matching values in both tables. Inner join applies only the specified columns. SELECT * FROM employees, jobs where employees.job_id = jobs.job_ident;
Implicit Cross Join SELECT column_name(s) FROM table1, table2; Implicit Cross Join is defined as a Cartesian product where the number of rows in the first table is multiplied by the number of rows in the second table. SELECT * FROM employees, jobs;