Subquery
There are ways to work with multiple tables in the same query, we can use:
- Sub-queries
- Implicit Join
- Join operators (INNER, OUTER…)
Subqueries are not a specific command or statement but are smaller Queries nested inside another.
- Why do we use subqueries? To make our code easier to understand
- On the surface they appear more complicated but think of them as a way to group several ideas together to represent a broader thought
- They decrease the amount of time we access the database
- Make the logic simpler (hopefully)
- Regardless of how many queries are nested within each other, the inner most query is performed first and the process continues outward till all queries have been performed
- Values of inner queries can be used in outer queries
- The outer most query is called: outer query or outer select, with the inner most query being the inner select
Subquery CAN NOT have more than one column specified in the
SELECT
clauseSubquery CAN NOT be nested in
SET
commandSubqueries that return more than one row can only be used with multiple value operators
- It’s simply a query within a query.
- The most inner query is executed first and moves outwards to the next outer query and ….
- You can subquery with
- With the
IN
orNOT IN
operator - With comparison operators
- Equal (=)
- Greater than (>)
- Less than (<)
- Greater than or equal ( >=)
- Less than or equal (<=)
- Not equal ( !=) or (<>)
- With the
EXISTS
orNOT EXISTS
operator - With the
ANY
orALL
operator - In the
FROM
clause - In the
SELECT
clause
- With the
data
- Here are two tables we’ll use to demonstrate: employees, departments
- Find all employee_id with location_id = 1700
- Of course you can use
INNER JOIN
on department_idWHERE
location_id=1700 but for the sake of this section let’s - go about it as follows so I can explain subqueriy and
WITH
in the next section
query1
- Let’s say you first find a list of all department_id at location_id = 1700
SELECT *
FROM
departments
WHERE= 1700; location_id
Output:
query2
- Now that you have a list of all department_id for that location
- Use it in table employees to extract all the employee_id, first and last names
SELECT
employee_id, first_name, last_name
FROM
employees
WHEREIN (1 , 3, 9, 10, 11)
department_id ORDER BY first_name , last_name;
subquery IN
- Let’s do it using a subquery, remember the inner most query is executed first
- The inner query will create a list of all department_id that are at location_id=1700
- Outer query will check the department_id
FROM
the employees table to see if it is the list generated in the subquery - Extracts all the rows and
ORDER BY
first and last name in that order
So let’s extract all employees that are at location=1700
SELECT
employee_id, first_name, last_name
FROM
employees
WHEREIN (SELECT
department_id
department_id
FROM
departments
WHERElocation_id = 1700)
ORDER BY first_name , last_name;
subquery NOT IN
So let’s extract all employees that are NOT at location=1700
# ______ You guessed it just replace with NOT IN in the code chunk above
subquery MAX
- Let’s use one of the comparison operators this time
Find the employees with the highest salary, we can use
MAX
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE= ( SELECT
salary MAX(salary)
FROM
employees)
ORDER BY first_name , last_name;
subquery AVG
Find the employees whose salaries are greater than the average of all employees
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE> ( SELECT
salary AVG(salary)
FROM employees);
subquery EXISTS
Products
ProductID | ProductName | SupplierID | CategoryID | Unit | Price |
---|---|---|---|---|---|
1 | Chais | 1 | 1 | 10 boxes x 20 bags | 18 |
2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19 |
3 | Aniseed Syrup | 1 | 2 | 12 - 550 ml bottles | 10 |
4 | Chef Anton’s Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22 |
5 | Chef Anton’s Gumbo Mix | 2 | 2 | 36 boxes | 21.35 |
Suppliers
SupplierID | SupplierName | ContactName | Address | City | PostalCode | Country |
---|---|---|---|---|---|---|
1 | Exotic Liquid | Charlotte Cooper | 49 Gilbert St. | London | EC1 4SD | UK |
2 | New Orleans Cajun Delights | Shelley Burke | P.O. Box 78934 | New Orleans | 70117 | USA |
3 | Grandma Kelly’s Homestead | Regina Murphy | 707 Oxford Rd. | Ann Arbor | 48104 | USA |
4 | Tokyo Traders | Yoshi Nagase | 9-8 Sekimai Musashino-shi | Tokyo | 100 | Japan |
List the suppliers with a product price less than 20
SELECT SupplierName
FROM Suppliers
WHERE EXISTS
(SELECT ProductName
FROM Products Products.SupplierID = Suppliers.supplierID
WHERE < 20); AND Price
Number of Records: 24
SupplierName |
---|
Exotic Liquid |
New Orleans Cajun Delights |
Tokyo Traders |
Mayumi’s |
Pavlova, Ltd. |
With
Let’s start by stating that WITH
is not supported by all database systems.
WITH
allows you to give a subquery block a name (refactoring), which then can be referenced within the main query- It defines a temporary relation and this temporary relation (let’s pretend it’s a temporary table) is then used by the query associated with the
WITH
clause - It only exists for the duration of the query that it belongs to
- The alias name assigned to the subquery is treated as it’s an inline view or table
- Introduced by Oracle back with Oracle 9i
- Think of it as a function created for the main function to use as its argument where the main function can perform iterative calculations
Let me go through some examples
outside attribute
- If you’re familiar with R functions, think of a function that has a name, arguments, return value
- Now think of WITH that has a name TempTable with a return value/attribute of avg_value
- Syntax for that so far would be:
WITH TempTable(avg_value) AS
- What do I mean by outside attribute: look avg_value it is outside the
AS ()
block - Let’s follow that in the code below
TempTable(avg_value) AS ( Subquery to calculate the avg_value) WITH
- Subquery to calculate avg_value is
AVG (measured_value)
SELECT FROM MainTable
- Let’s put the subquery in the ( subquery ) for
WITH
TempTable(avg_value) AS (
WITH AVG (measured_value)
SELECT
FROM MainTable )
- So far with have
WITH
that calculates the AVG of column measure_value from MainTable and stores it in the attribute avg_value within the temporary table TempTable - So if we want to access the values of the temporary table we just call on TemTable.avg_value
To find the list of measured_value that’s greater than the average of all values measured we need to SELECT the values from the MainTable that are greater than the calculated avg_value from the WITH query
SELECT measured_value
FROM MainTable, TempTable> TempTable.avg_value WHERE MainTable.measured_value
Put it all together to find the measured_value > average of all values
TempTable(avg_value) AS (
WITH AVG (measured_value)
SELECT
FROM MainTable
)
SELECT measured_value
FROM MainTable, TempTable> TempTable.avg_value WHERE MainTable.measured_value
inside attribute
- Let’s see if we can write it in a different way with this example
- We want to list the average quantity of products ordered per ProductID
- Data is in OrderDetails
OrderDetailID | OrderID | ProductID | Quantity |
---|---|---|---|
1 | 10248 | 11 | 12 |
2 | 10248 | 42 | 10 |
3 | 10248 | 72 | 5 |
4 | 10249 | 14 | 9 |
5 | 10249 | 51 | 40 |
… | … | … | … |
518 | 10443 | 28 | 12 |
Start with subquery
Let’s calculate the total number of products for each ProductID
SELECT SUM(Quantity) AS total
FROM
OrderDetails
GROUP BY ProductID
- So far we have a list of totals for each productID, in order to find the average
- we call the attribute total from the subquery into the outer query
- so let’s write the outer query like we did above
- Ooops, we haven’t given the subquery a name and used a
WITH
yet so ourFROM
statement is not complete
SELECTAVG(total)
FROM*****
Let’s put it all together to find the average quantity of product sold
AS (
WITH TempTable SUM(Quantity) AS total
SELECT
FROM OrderDetails
GROUP BY ProductID
)AVG(total)
SELECT FROM TempTable;
- As you see the attribute of TempTable.total was declared inside the
AS ()
- unlike the prior example where avg_value was declared in TempTable(avg_value) outside the
AS()
Any & All
ANY
& ALL
can be found in theFilter document. But the nature of both of these operators are subqueries.
- The use of either
ANY
or ALL will occur in a subquery - The
ANY
andALL
operators allow you to perform a comparison between a single column value and a range of other values.
Any
The ANY
operator, means that the condition will be true if the operation is true for ANY of the values in the range.
- returns a boolean value as a result
- returns TRUE if ANY of the subquery values meet the condition
- the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
- Syntax:
column_name(s)
SELECT
FROM table_name
WHERE column_name operator ANY
(SELECT column_name
FROM table_name WHERE condition);
All
The ALL
operator, means that the condition will be true if the operation is true for ALL of the values in the range.
- returns a boolean value as a result
- returns TRUE if ALL of the subquery values meet the condition
- is used with
SELECT, WHERE
andHAVING
- the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
- Syntax:
column_name(s)
SELECT ALL
FROM table_namecondition(s)
WHERE
# ---- OR ------
column_name(s)
SELECT
FROM table_name
WHERE column_name operator ALL
(SELECT column_name
FROM table_name WHERE condition);
Any Location
Subq in Where
Extract information for all the employees with a salary < average salary of the entire table
- If you use a function with WHERE many times you’ll end up with an error “Cannot evaluate Aggregate function like AVG() in the WHERE clause”
- If we want to extract values from employees table where
- salary is less than average salary of the entire table
- !st Query: we target the AVG() salary for the entire table
# ___ Inner QUERY, get avg salary from the entire table
SELECT AVG(salary);
FROM Employees
# ___ Outer Query
SELECT emp_id, f_name, l_name, salary
FROM Employees<
WHERE salary
(SELECT AVG(salary); FROM Employees)
Subq in Select
Extract information and average salary for each employee using an alias.
# ___ Start with the inner query first
SELECT emp_id, salary,
(SELECT AVG(salary) FROM Emmployees)
AS avg_salary); FROM Employees
Subq in From
Here we use the result of the subq to substitute the table name. This is called Derived Tables or Table Expressions
- Let’s derive non-sensitive information from the table in a query
- The subquery below could be done without but this is helpful specially when we use multiple tables and unions
# ___ Create a query to extract non-sensitive information
*
SELECT
FROM (SELECT emp_id, f_name, l_name, dep_id
FROM Employees) AS All_emp
Multiple Tables
In all these examples below, for the code to work substitute (inner query) with the actual code for the inner query. I presented the way I did below because it is easier for beginners to understand it this way.
I’ve also use different column labels because what’s used in the image is annoying and not realistic
Example 1
Retrieve only the employee records from the employees table for which a department ID exists in the departments table
- Once again starting from the inner query where we want to extract the dept_id that’s in the departments table
- In the outer query we extract the information we need for the results found
IN
the inner query - So as you see the inner query targets one table and the outer query targets the other tables to extract a subset based on the query of the other table
# ___ Inner query ___
SELECT dept_id_dep
FROM Departments
# ___ Outer query ___
*
SELECT
FROM Employees
WHERE dep_id_dep IN; ( inner query )
Example 2
Retrieve location of an employee, when you can see that the Employees table does not have any information regarding location, but Departments table does in loc_id
- To find information for location ‘L0002’
# ___ Inner query ___
SELECT dept_id_dep
FROM Departments= 'L0002'
WHERE loc_id
# ___ Outer query ___
*
SELECT
FROM Emmployees
WHERE dep_id IN; ( inner query )
Example 3
Retrieve dept_id, dep_name for employees who earn more than 70K. Note dep_name & dep_id are in Departments table while salary info is in Employees table
# ___ Inner query ___
SELECT dep_id
FROM Employees> 70000
WHERE salary
# ___ Outer query
SELECT dept_id, dep_name
FROM Departments
WHERE dept_id IN; ( inner query )
See more subqueries using Implicit Joins in the next section
Implicit Joins
This section crosses between subquery and joins, it is more a join than a subquery so I’ll include it here as well as in the Joins page.
2 Tables in From
- Consider what would happen if we used 2 tables in the From statement
- This will result in a FULL JOIN - Cartesian Join
- You will end up with more rows than for each table individually
Every row from the first table is joined with every row from the second table, no common keys/columns are used, so you can imagine how big can become if we have two large tables
*
SELECT FROM Employees, Departments
Filter
Lets limit the results of the join using additional operands. We want the rows where dept_id matches dep_id. Note the use of table names to prefix with every column name
*
SELECT
FROM Employees, Departments= Departments.dept_id; WHERE Employees.dep_id
Alias
Since table names can be long and our statements can become long and complicated, we can use aliases without the use of AS
*
SELECT
FROM Employees E, Departments D= D.dept_id;
WHERE E.dep_id
# __ To subset even further we can narrow the SELECT statement
SELECT Emp_id, Dep_name
FROM Employees E, Departments D= D.dept_id; WHERE E.dep_id
Example
- Retrieve the dep_name for each employee.
SELECT f_name, dep_name
FROM Employees E, Departments D= D.dept_id_dep; WHERE E.dep_id
Summary
This is a summary table of 4 sections, so it’ll be at the end of each of these sections:
- Functions
- Subqueries
- Implicit Joins
- DateTime
Command | Syntax (MySQL/DB2) | Description | Example (MySQL/DB2) |
---|---|---|---|
COUNT | SELECT COUNT(column_name) FROM table_name WHERE condition; |
COUNT function returns the number of rows that match a specified criterion. |
SELECT COUNT(dep_id) FROM employees; |
AVG | SELECT AVG(column_name) FROM table_name WHERE condition; |
AVG function returns the average value of a numeric column. |
SELECT AVG(salary) FROM employees; |
SUM | SELECT SUM(column_name) FROM table_name WHERE condition; |
SUM function returns the total sum of a numeric column. |
SELECT SUM(salary) FROM employees; |
MIN | SELECT MIN(column_name) FROM table_name WHERE condition; |
MIN function returns the smallest value of the SELECTED column. |
SELECT MIN(salary) FROM employees; |
MAX | SELECT MAX(column_name) FROM table_name WHERE condition; |
MAX function returns the largest value of the SELECTED column. |
SELECT MAX(salary) FROM employees; |
ROUND | SELECT ROUND(2number, decimals, operation) AS RoundValue; |
ROUND function rounds a number to a specified number of decimal places. |
SELECT ROUND(salary) FROM employees; |
LENGTH | SELECT LENGTH(column_name) FROM table; |
LENGTH function returns the length of a string (in bytes). |
SELECT LENGTH(f_name) FROM employees; |
UCASE | SELECT UCASE(column_name) FROM table; |
UCASE function displays the column name in each table in uppercase. |
SELECT UCASE(f_name) FROM employees; |
LCASE | SELECT LCASE(column_name) FROM table; |
LCASE function displays the column name in each table in lowercase. |
SELECT LCASE(f_name) FROM employees; |
DISTINCT | SELECT DISTINCT column_name FROM table; |
DISTINCT function is used to display data without duplicates. |
SELECT DISTINCT UCASE(f_name) FROM employees; |
DAY | SELECT DAY(column_name) FROM table |
DAY function returns the day of the month for a given date. |
SELECT DAY(b_date) FROM employees where emp_id = 'E1002'; |
CURRENT_DATE | SELECT CURRENT_DATE; |
CURRENT_DATE is used to display the current date. |
SELECT CURRENT_DATE; |
DATEDIFF() | SELECT DATEDIFF(date1, date2); |
DATEDIFF() is used to calculate the difference between two dates or time stamps. The default value generated is the difference in number of days. |
SELECT DATEDIFF(CURRENT_DATE, date_column) FROM table; |
FROM_DAYS() | SELECT FROM_DAYS(number_of_days); |
FROM_DAYS() is used to convert a given number of days to YYYY-MM-DD format. |
SELECT FROM_DAYS(DATEDIFF(CURRENT_DATE, date_column)) FROM table; |
DATE_ADD() | SELECT DATE_ADD(date, INTERVAL n type); |
DATE_ADD() is used to calculate the date after lapse of mentioned number of units of date type, i.e. if n=3 and type=DAY, the result is a date 3 days after what is mentioned in date column. The type valiable can also be months or years. |
SELECT DATE_ADD(date, INTERVAL 3 DAY);; |
DATE_SUB() | SELECT DATE_SUB(date, INTERVAL n type); |
DATE_SUB() is used to calculate the date prior to the record date by mentioned number of units of date type, i.e. if n=3 and type=DAY, the result is a date 3 days before what is mentioned in date column. The type valiable can also be months or years. |
SELECT DATE_SUB(date, INTERVAL 3 DAY);; |
Subquery | SELECT column_name [, column_name ] FROM table1 [, table2 ] WHERE column_name OPERATOR (SELECT column_name [, column_name ] FROM table1 [, table2 ] [WHERE]) |
A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved. |
|
Implicit Inner Join | SELECT column_name(s) FROM table1, table2 WHERE table1.column_name = table2.column_name; |
Implicit Inner Join combines two or more records but displays only matching values in both tables. Inner join applies only the specified columns. |
SELECT * FROM employees, jobs where employees.job_id = jobs.job_ident; |
Implicit Cross Join | SELECT column_name(s) FROM table1, table2; |
Implicit Cross Join is defined as a Cartesian product where the number of rows in the first table is multiplied by the number of rows in the second table. |
SELECT * FROM employees, jobs; |