Filter
This document is hard to define because if you think about it most commands we use in SQL are geared towards filtering the data. Here are some that come to mind:
SELECT, FROM, WHERE, LENGTH, MAX, MIN, TOP(), GROUP BY
….
See what I mean, even GROUP BY is a way to filter the data so I’ll do my best to not repeat to many commands in this section
Extract
Extract
EXTRACT
extracts a part from a given date.- SYNTAX:
EXTRACT(part FROM date)
- part can be any of the following as long as it’s part of a DATE
- works in MySQL 4.0
Parameter | Description |
---|---|
part | Required. The part to extract. Can be one of the following:
|
date | Required. The date to extract a part from |
Examples:
# --- week from date
EXTRACT(WEEK FROM "2017-06-15")
SELECT
# --- minute from a datetime
EXTRACT(MINUTE FROM "2017-06-15 09:34:21")
SELECT
# --- year and month from a datetime
EXTRACT(YEAR_MONTH FROM "2017-06-15 09:34:21")
SELECT
# --- year & month from date
SELECTEXTRACT(YEAR FROM Date) AS YEAR, --time grouping
EXTRACT(MONTH FROM Date) AS MONTH, --time grouping
--which products are sold
ProductId, --which stores are selling
StoreID, SUM(quantity) AS UnitsSold, --how many (impacts inventory)
AVG(UnitPrice) AS UnitPriceProxy, --can be interesting
COUNT(DISTINCT salesID) AS NumTransactions --unique transactions can be interesting
FROM [your_project_name_here].sales.sales_info
GROUP BY
YEAR, MONTH, ProductId, StoreID
ORDER BY YEAR, MONTH, ProductId, StoreID
Subset
Substr
with SELECT
Is used to SELECT
& extract a section, a sub of a string from a string
- If we want to extract a substring from customer_name starting with position 5 and being 3 chars long
SELECT SUBSTR (customer_name, 5,3) AS extracted_string
FROM TableName;
with WHERE
Or can use it with WHERE to filter out after the fact of SELECT and any other calculations
- Here we use it to extract a customer_id that starts with JO and is only 2 characters long
SELECT customer_id
FROM TalbeName
WHERE SUBSTR(Customer_id, 1,2) = "JO"
In & Not In
- The
IN
operator allows you to specify multiple values in aWHERE
clause. - The
IN
operator is a shorthand for multipleOR
conditions. - As you can guess
NOT IN
is the opposite
*
SELECT
FROM Customers IN ('Germany', 'France', 'UK'); WHERE Country
Like
Command | Syntax (MySQL/DB2) | Description | Example (MySQL/DB2) |
---|---|---|---|
LIKE | SELECT column1, column2, ... FROM table_name WHERE columnN LIKE pattern; |
Two wildcards often used in conjunction with the LIKE operator are percent sign(%) and underscore sign (_), depending upon the SQL engine being used. |
SELECT f_name , l_name FROM employees WHERE address LIKE '%Elgin,IL%'; This command will output all entries with Elgin,IL in the Address. |
Sometimes we want values that begin with a pattern LIKE
“Tur” so we use LIKE
along with WHERE
to filter/extract those values.
- % Represents zero, one, or multiple characters
- _ Represents a single character (MS Access uses a question mark (?) instead)
COUNT(CustomerID), Country
SELECT
FROM Customers 'Tu%' - for countries that start with Tu
WHERE Country LIKE '%p' - for countries that end with p
WHERE Country LIKE '%bs%' - for countries that have bs in any position
WHERE Country LIKE 'a__%'; - for countries that start with a and are at
WHERE Country LIKE 3 char long (a+two bars) least
Wildcard
- A wildcard character is used to substitute one or more characters in a string.
- Wildcard characters are used with the
LIKE
operator. - The
LIKE
operator is used in aWHERE
clause to search for a specified pattern in a column.
*
SELECT
FROM Customers 'a%'; WHERE CustomerName LIKE
Is Null
Is Not Null
A field with a NULL value is a field with no value.
If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value.
- Both used to test for
IS NULL
values or norIS NOT NULL
values - At times data is collected with devices that allow fields to be blank/
NULL
. Those fields can affect data analysis so we have to understand their occurrence and their effect - Remember: a field that contains SPACES or ZERO is not
NULL
by this definition
COUNT(CustomerID), Country
SELECT
FROM Customers NULL
WHERE Country IS
#-- OR --
NULL; WHERE Country IS NOT
Between
Command | Syntax (MySQL/DB2) | Description | Example (MySQL/DB2) |
---|---|---|---|
BETWEEN | SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2; |
The BETWEEN operator selects values within a given range. The values can be numbers, text, or dates. The BETWEEN operator is inclusive: begin and end values are included. |
SELECT * FROM employees WHERE salary BETWEEN 40000 AND 80000; This generates all records of employees with salaries between 40000 and 80000. |
- The
BETWEEN
operator selects values within a given range. The values can be numbers, text, or dates. - The
BETWEEN
operator is inclusive: begin and end values are included.
*
SELECT
FROM Products 10 AND 20;
WHERE Price BETWEEN
'2016-01-01' AND '2018-12-31' WHERE Year BETWEEN
Case
- Syntax:
CASE WHEN condition1 THEN result1
WHEN condition2 THEN result2\
WHEN conditionN THEN resultN
ELSE result END;
- The
CASE
expression goes through conditions and returns a value when the first condition is met (like an if-then-else statement). - Once a condition is true, it will stop reading and return the result.
- If no conditions are true, it returns the value in the
ELSE
clause. - If there is no
ELSE
part and no conditions are true, it returns NULL. - See How To Case for examples
Exists
Note: all these examples in EXISTS
are also examples of SUBQUERY
Syntax:
column_name(s)
SELECT
FROM table_name
WHERE EXISTS
(SELECT column_name
FROM table_name WHERE condition);
- The
EXISTS
operator is used to test for the existence of any record in a subquery. - The
EXISTS
operator returns TRUE if the subquery returns one or more records.
Products
ProductID | ProductName | SupplierID | CategoryID | Unit | Price |
---|---|---|---|---|---|
1 | Chais | 1 | 1 | 10 boxes x 20 bags | 18 |
2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19 |
3 | Aniseed Syrup | 1 | 2 | 12 - 550 ml bottles | 10 |
4 | Chef Anton’s Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22 |
5 | Chef Anton’s Gumbo Mix | 2 | 2 | 36 boxes | 21.35 |
Suppliers
SupplierID | SupplierName | ContactName | Address | City | PostalCode | Country |
---|---|---|---|---|---|---|
1 | Exotic Liquid | Charlotte Cooper | 49 Gilbert St. | London | EC1 4SD | UK |
2 | New Orleans Cajun Delights | Shelley Burke | P.O. Box 78934 | New Orleans | 70117 | USA |
3 | Grandma Kelly’s Homestead | Regina Murphy | 707 Oxford Rd. | Ann Arbor | 48104 | USA |
4 | Tokyo Traders | Yoshi Nagase | 9-8 Sekimai Musashino-shi | Tokyo | 100 | Japan |
List the suppliers with a product price less than 20
SELECT SupplierName
FROM Suppliers
WHERE EXISTS
(SELECT ProductName
FROM Products Products.SupplierID = Suppliers.supplierID
WHERE < 20); AND Price
Number of Records: 24
SupplierName |
---|
Exotic Liquid |
New Orleans Cajun Delights |
Tokyo Traders |
Mayumi’s |
Pavlova, Ltd. |
List the suppliers with a product price equal to 22
SELECT SupplierName
FROM Suppliers
WHERE EXISTS
(SELECT ProductName
FROM Products Products.SupplierID = Suppliers.supplierID
WHERE Price = 22); AND
Number of Records: 1
SupplierName |
---|
New Orleans Cajun Delights |
Select
Is usually the first command in your query, as you see above (or a long reference to a select statement). It is used to specify:
- the column/field names of the table you want to select
- both
FROM
andSELECT
will get more complicated as we start nesting aggregations and conditions into them, but remember the base of all these statements
Distinct
Select Distinct
Is a specific SELECT
where we only choose the unique value in column(s)
- In a table, a column often contains many duplicate values, and sometimes we only want the list of unique (distinct) values in that column(s)
SELECT DISTINCT Country FROM CountryTable;
Select Top
- The
SELECT TOP
clause is used to specify the number of records to return. - The
SELECT TOP
clause is useful on large tables with thousands of records. Returning a large number of records can impact performance. TOP
has to be first in theSELECT
statementTOP
can be used withSUM, AVG, COUNT
…- Syntax:
* -- where N is a number (TOP 5 * ) returns TOP 5 rows
SELECT TOP N FROM TableName;
Example:
TOP(5) AVG (column_name) AS top_five_avgs
SELECT # -- we used (5) because we are calculating the AVG as well
# -- could've been TOP 5 AS top_five
SQL Server/MS Access
- The method varies with other servers
|percent column_name(s)
SELECT TOP number
FROM TableName WHERE condition;
MySQL
column_name(s)
SELECT
FROM table_name
WHERE condition LIMIT number;
Fetch
- The previous command
SELECT TOP
cannot be used in Oracle,FETCH
is used instead
column_name(s)
SELECT
FROM table_namecolumn_name(s)
ORDER BY FETCH FIRST number ROWS ONLY;
Rownum
- Older Oracle syntax was
column_name(s)
SELECT
FROM table_name< number; WHERE ROWNUM
Where
Is used to filter the chosen records based on a specified condition(s)
- you can concatenate multiple conditions using logical operators
AND, OR
- narrows your query so that the database returns only the data with an exact value match or the data that matches a certain condition that you want to satisfy
WHERE
is also used inUPDATE, DELETE
and many other SQL commandsWHERE
CANNOT be used with aggregate functions
SELECT DISTINCT Country
FROM CountryTable = "blah"
WHERE Country = "blue"
AND Country = "blee"; OR Country
That covers the basics of a query, now we move on and build statements with more common commands that help, sort, filter, aggregate, calculate, summarize….and many other functions I haven’t used yet.
Having
Command | Syntax (MySQL/DB2) | Description | Example (MySQL/DB2) |
---|---|---|---|
HAVING | SELECT column_name(s) FROM table_name GROUP BY column_name(s) HAVING condition |
HAVING clause is used in conjunction with GROUP BY clause in collaboration with the SELECT statement in order to filter the data as per the given condition and then group as per identical values of a specified parameter. |
SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY" FROM EMPLOYEES GROUP BY DEP_ID HAVING count(*) < 4 ORDER BY AVG_SALARY; |
HAVING
was added becauseWHERE
cannot be used with aggregate functionsHAVING
HAS to be combined with an aggregate function or theGROUP BY
clauseGOUP BY
has to occur BEFORE you can useHAVING
WHERE
is dedicated to theSELECT
clause, whileHAVING
is dedicated to theGROUP BY
clause- Syntax:
column_name(s)
SELECT
FROM table_name
WHERE conditioncolumn_name(s)
GROUP BY
HAVING conditioncolumn_name(s); ORDER BY
Example 1
HAVING
is used instead of WHERE
with aggregate functions as it is usually followed by an aggregate calculation
- I prefer to indent commands that follow GROUP BY and a few other commands because it clarifies the thought process (for me)
- In the code below I know most of the heavy work is done in the first 3 lines, once I use
GROUP BY
then - I want to see each group greater than 5
- and I want those values ordered in descending order, that’s why I indent the way I do
- I guess I’m used to R and python where I indent related section of code
SELECT CustomerID, Country
FROM Customers
GROUP BY Country COUNT(CustomerID) > 5
HAVING COUNT(CustomerID) DESC; ORDER BY
Example 2
count([member_biz_key]) AS bizCount,
SELECT
[member_biz_key]
FROM
[dbo].[Member]
GROUP BY
[member_biz_key]
HAVING count([member_biz_key]) > 1
ORDER BY bizCount desc
Any
ANY
& ALL
can be found in theFilter document. But the nature of both of these operators are subqueries.
- The use of either
ANY
or ALL will occur in a subquery - The
ANY
andALL
operators allow you to perform a comparison between a single column value and a range of other values.
The ANY
operator, means that the condition will be true if the operation is true for ANY of the values in the range.
- returns a boolean value as a result
- returns TRUE if ANY of the subquery values meet the condition
- the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
- Syntax:
column_name(s)
SELECT
FROM table_name
WHERE column_name operator ANY
(SELECT column_name
FROM table_name WHERE condition);
All
The ALL
operator, means that the condition will be true if the operation is true for ALL of the values in the range.
- returns a boolean value as a result
- returns TRUE if ALL of the subquery values meet the condition
- is used with
SELECT, WHERE
andHAVING
- the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
- Syntax:
column_name(s)
SELECT ALL
FROM table_namecondition(s)
WHERE
# ---- OR ------
column_name(s)
SELECT
FROM table_name
WHERE column_name operator ALL
(SELECT column_name
FROM table_name WHERE condition);