Filter


This document is hard to define because if you think about it most commands we use in SQL are geared towards filtering the data. Here are some that come to mind:

SELECT, FROM, WHERE, LENGTH, MAX, MIN, TOP(), GROUP BY….

See what I mean, even GROUP BY is a way to filter the data so I’ll do my best to not repeat to many commands in this section

Extract


Extract

  • EXTRACT extracts a part from a given date.
  • SYNTAX: EXTRACT(part FROM date)
  • part can be any of the following as long as it’s part of a DATE
  • works in MySQL 4.0
Parameter Description
part

Required. The part to extract. Can be one of the following:

  • MICROSECOND

  • SECOND

  • MINUTE

  • HOUR

  • DAYOFWEEK - returns 1-7 with Sunday=1

  • DAYOFYEAR

  • DAY

  • WEEK - returns 1-53 begin with Sunday

  • ISOWEEK

  • MONTH

  • QUARTER

  • YEAR

  • ISOYEAR

  • SECOND_MICROSECOND

  • MINUTE_MICROSECOND

  • MINUTE_SECOND

  • HOUR_MICROSECOND

  • HOUR_SECOND

  • HOUR_MINUTE

  • DAY_MICROSECOND

  • DAY_SECOND

  • DAY_MINUTE

  • DAY_HOUR

  • YEAR_MONTH

date Required. The date to extract a part from

Examples:

# --- week from date
SELECT  EXTRACT(WEEK FROM "2017-06-15")  

# --- minute from a datetime
SELECT  EXTRACT(MINUTE FROM "2017-06-15 09:34:21")

# --- year and month from a datetime 
SELECT  EXTRACT(YEAR_MONTH FROM "2017-06-15 09:34:21")  

# --- year & month from date 
SELECT
     EXTRACT(YEAR FROM Date) AS YEAR,             --time grouping 
     EXTRACT(MONTH FROM Date) AS MONTH,           --time grouping
     ProductId,                                   --which products are sold
     StoreID,                                     --which stores are selling
     SUM(quantity) AS UnitsSold,                  --how many (impacts inventory)
     AVG(UnitPrice) AS UnitPriceProxy,            --can be interesting
     COUNT(DISTINCT salesID) AS NumTransactions   --unique transactions can be interesting
FROM [your_project_name_here].sales.sales_info
GROUP BY
     YEAR, MONTH, ProductId, StoreID 
ORDER BY 
     YEAR, MONTH, ProductId, StoreID

Subset


Substr

with SELECT

Is used to SELECT & extract a section, a sub of a string from a string

  • If we want to extract a substring from customer_name starting with position 5 and being 3 chars long
SELECT          
        SUBSTR (customer_name, 5,3) AS extracted_string
FROM
        TableName;

with WHERE

Or can use it with WHERE to filter out after the fact of SELECT and any other calculations

  • Here we use it to extract a customer_id that starts with JO and is only 2 characters long
SELECT  customer_id
FROM    TalbeName
WHERE  
        SUBSTR(Customer_id, 1,2) = "JO"

In & Not In

  • The IN operator allows you to specify multiple values in a WHERE clause.
  • The IN operator is a shorthand for multiple OR conditions.
  • As you can guess NOT IN is the opposite
SELECT  * 
FROM    Customers 
WHERE   Country  IN ('Germany', 'France', 'UK');

Like

Command Syntax (MySQL/DB2) Description Example (MySQL/DB2)
LIKE SELECT column1, column2, ... FROM table_name WHERE columnN LIKE pattern;

LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

Two wildcards often used in conjunction with the LIKE operator are percent sign(%) and underscore sign (_), depending upon the SQL engine being used.

SELECT f_name , l_name FROM employees WHERE address LIKE '%Elgin,IL%';
This command will output all entries with Elgin,IL in the Address.

Sometimes we want values that begin with a pattern LIKE “Tur” so we use LIKE along with WHERE to filter/extract those values.

  • % Represents zero, one, or multiple characters
  • _ Represents a single character (MS Access uses a question mark (?) instead)
SELECT    COUNT(CustomerID), Country 
FROM      Customers 
WHERE     Country   LIKE  'Tu%'    - for countries that start with Tu 
WHERE     Country   LIKE  '%p'     - for countries that end with p 
WHERE     Country   LIKE  '%bs%'   - for countries that have bs in any position 
WHERE     Country   LIKE  'a__%';  - for countries that start with a and are at
                                         least 3 char long (a+two bars)

Wildcard

  • A wildcard character is used to substitute one or more characters in a string.
  • Wildcard characters are used with the LIKE operator.
  • The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
SELECT  *  
FROM    Customers 
WHERE   CustomerName LIKE 'a%';

Is Null

Is Not Null

A field with a NULL value is a field with no value.

If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value.

  • Both used to test for IS NULL values or nor IS NOT NULL values
  • At times data is collected with devices that allow fields to be blank/NULL. Those fields can affect data analysis so we have to understand their occurrence and their effect
  • Remember: a field that contains SPACES or ZERO is not NULL by this definition
SELECT    COUNT(CustomerID), Country 
FROM      Customers 
WHERE     Country IS NULL 

#-- OR  --
WHERE     Country IS NOT NULL;

Between

Command Syntax (MySQL/DB2) Description Example (MySQL/DB2)
BETWEEN SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2; The BETWEEN operator selects values within a given range. The values can be numbers, text, or dates. The BETWEEN operator is inclusive: begin and end values are included. SELECT * FROM employees WHERE salary BETWEEN 40000 AND 80000;
This generates all records of employees with salaries between 40000 and 80000.
  • The BETWEEN operator selects values within a given range. The values can be numbers, text, or dates.
  • The BETWEEN operator is inclusive: begin and end values are included.
SELECT  *  
FROM    Products 
WHERE   Price   BETWEEN 10 AND 20;

WHERE   Year    BETWEEN '2016-01-01' AND '2018-12-31'

Case

  • Syntax:
CASE    WHEN condition1 THEN result1
        WHEN condition2 THEN result2\
        WHEN conditionN THEN resultN 
        ELSE result 
END;
  • The CASE expression goes through conditions and returns a value when the first condition is met (like an if-then-else statement).
  • Once a condition is true, it will stop reading and return the result.
  • If no conditions are true, it returns the value in the ELSE clause.
  • If there is no ELSE part and no conditions are true, it returns NULL.
  • See How To Case for examples

Exists

Note: all these examples in EXISTS are also examples of SUBQUERY

Syntax:

SELECT        column_name(s) 
FROM          table_name 
WHERE EXISTS              
        (SELECT   column_name
        FROM     table_name
        WHERE    condition);
  • The EXISTS operator is used to test for the existence of any record in a subquery.
  • The EXISTS operator returns TRUE if the subquery returns one or more records.

Products

ProductID ProductName SupplierID CategoryID Unit Price
1 Chais 1 1 10 boxes x 20 bags 18
2 Chang 1 1 24 - 12 oz bottles 19
3 Aniseed Syrup 1 2 12 - 550 ml bottles 10
4 Chef Anton’s Cajun Seasoning 2 2 48 - 6 oz jars 22
5 Chef Anton’s Gumbo Mix 2 2 36 boxes 21.35

Suppliers

SupplierID SupplierName ContactName Address City PostalCode Country
1 Exotic Liquid Charlotte Cooper 49 Gilbert St. London EC1 4SD UK
2 New Orleans Cajun Delights Shelley Burke P.O. Box 78934 New Orleans 70117 USA
3 Grandma Kelly’s Homestead Regina Murphy 707 Oxford Rd. Ann Arbor 48104 USA
4 Tokyo Traders Yoshi Nagase 9-8 Sekimai Musashino-shi Tokyo 100 Japan

List the suppliers with a product price less than 20

SELECT       SupplierName 
FROM         Suppliers 
WHERE EXISTS 
             (SELECT     ProductName
              FROM       Products 
              WHERE      Products.SupplierID = Suppliers.supplierID
                         AND Price < 20);

Number of Records: 24

SupplierName
Exotic Liquid 
New Orleans Cajun Delights 
Tokyo Traders 
Mayumi’s 
Pavlova, Ltd. 

List the suppliers with a product price equal to 22

SELECT      SupplierName 
FROM        Suppliers 
WHERE EXISTS                
        (SELECT     ProductName
        FROM        Products             
        WHERE       Products.SupplierID = Suppliers.supplierID  
                    AND Price = 22);

Number of Records: 1

SupplierName
New Orleans Cajun Delights 

Select

Is usually the first command in your query, as you see above (or a long reference to a select statement). It is used to specify:

  • the column/field names of the table you want to select
  • both FROM and SELECT will get more complicated as we start nesting aggregations and conditions into them, but remember the base of all these statements

Distinct

Select Distinct

Is a specific SELECT where we only choose the unique value in column(s)

  • In a table, a column often contains many duplicate values, and sometimes we only want the list of unique (distinct) values in that column(s)
SELECT DISTINCT     Country  
FROM                CountryTable;

Select Top

  • The SELECT TOP clause is used to specify the number of records to return.
  • The SELECT TOP clause is useful on large tables with thousands of records. Returning a large number of records can impact performance.
  • TOP has to be first in the SELECT statement
  • TOP can be used with SUM, AVG, COUNT
  • Syntax:
SELECT TOP  N *         -- where N is a number (TOP 5 * ) returns TOP 5 rows
FROM        TableName;

Example:

SELECT TOP(5) AVG (column_name)  AS  top_five_avgs
# -- we used (5) because we are calculating the AVG as well
# -- could've been TOP 5 AS top_five

SQL Server/MS Access

  • The method varies with other servers
SELECT TOP  number|percent  column_name(s)
FROM                        TableName
WHERE                       condition;

MySQL

SELECT       column_name(s)
FROM         table_name
WHERE        condition
LIMIT        number;

Fetch

  • The previous command SELECT TOP cannot be used in Oracle, FETCH is used instead
SELECT          column_name(s)
FROM            table_name
ORDER BY        column_name(s)
FETCH FIRST     number ROWS ONLY;

Rownum

  • Older Oracle syntax was
SELECT          column_name(s)
FROM            table_name
WHERE           ROWNUM < number;

Where

Is used to filter the chosen records based on a specified condition(s)

  • you can concatenate multiple conditions using logical operators AND, OR
  • narrows your query so that the database returns only the data with an exact value match or the data that matches a certain condition that you want to satisfy
  • WHERE is also used in UPDATE, DELETE and many other SQL commands
  • WHERE CANNOT be used with aggregate functions
SELECT DISTINCT     Country  
FROM                CountryTable 
WHERE               Country = "blah"                     
                    AND Country = "blue" 
                    OR  Country = "blee";

That covers the basics of a query, now we move on and build statements with more common commands that help, sort, filter, aggregate, calculate, summarize….and many other functions I haven’t used yet.

Having

Command Syntax (MySQL/DB2) Description Example (MySQL/DB2)
HAVING SELECT column_name(s) FROM table_name GROUP BY column_name(s) HAVING condition HAVING clause is used in conjunction with GROUP BY clause in collaboration with the SELECT statement in order to filter the data as per the given condition and then group as per identical values of a specified parameter. SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY" FROM EMPLOYEES GROUP BY DEP_ID HAVING count(*) < 4 ORDER BY AVG_SALARY;
  • HAVING was added because WHERE cannot be used with aggregate functions
  • HAVING HAS to be combined with an aggregate function or the GROUP BY clause
  • GOUP BY has to occur BEFORE you can use HAVING
  • WHERE is dedicated to the SELECT clause, while HAVING is dedicated to the GROUP BY clause
  • Syntax:
SELECT          column_name(s)
FROM            table_name
WHERE           condition
GROUP BY        column_name(s)
HAVING          condition
ORDER BY        column_name(s);

Example 1

HAVING is used instead of WHERE with aggregate functions as it is usually followed by an aggregate calculation

  • I prefer to indent commands that follow GROUP BY and a few other commands because it clarifies the thought process (for me)
  • In the code below I know most of the heavy work is done in the first 3 lines, once I use GROUP BY then
  • I want to see each group greater than 5
  • and I want those values ordered in descending order, that’s why I indent the way I do
  • I guess I’m used to R and python where I indent related section of code
SELECT       CustomerID, Country 
FROM         Customers 
GROUP BY     Country         
             HAVING COUNT(CustomerID) > 5
ORDER BY     COUNT(CustomerID) DESC;

Example 2

SELECT   count([member_biz_key]) AS bizCount,
        [member_biz_key] 
FROM   
        [dbo].[Member] 
GROUP BY    
        [member_biz_key] 
HAVING   
        count([member_biz_key]) > 1 
ORDER BY   
        bizCount desc

Any

ANY & ALL can be found in theFilter document. But the nature of both of these operators are subqueries.

  • The use of either ANY or ALL will occur in a subquery
  • The ANY and ALL operators allow you to perform a comparison between a single column value and a range of other values.

The ANY operator, means that the condition will be true if the operation is true for ANY of the values in the range.

  • returns a boolean value as a result
  • returns TRUE if ANY of the subquery values meet the condition
  • the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
  • Syntax:
SELECT      column_name(s)
FROM        table_name
WHERE       column_name operator ANY
                (SELECT column_name
                FROM table_name
                WHERE condition);

All

The ALL operator, means that the condition will be true if the operation is true for ALL of the values in the range.

  • returns a boolean value as a result
  • returns TRUE if ALL of the subquery values meet the condition
  • is used with SELECT, WHERE and HAVING
  • the operator below must be a standard comparison operator (=,<>,!=,>,>=,<, <=)
  • Syntax:
SELECT ALL  column_name(s)
FROM        table_name
WHERE       condition(s)  

# ----    OR    ------
SELECT      column_name(s)
FROM        table_name
WHERE       column_name operator ALL
                (SELECT column_name
                FROM table_name
                WHERE condition);