SQL Basics


Although SQL is an ANSI/ISO standard, there are different versions of the SQL language.

However, to be compliant with the ANSI standard, they all support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.

semicolon

Some database systems require a semicolon at the end of each SQL statement. Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the same call to the server.

Naming


This is not only relevant to SQL but most data analysis, as we tend to use all SQL, R, python or at least 2 of the 3.

Caps

capitalization/case sensitivity

With SQL, capitalization usually doesn’t matter. You could enter SELECT or select or SeLeCT. They all work! But if you use capitalization as part of a consistent style your queries will  look more professional.

To enter SQL queries like a pro, it is always a good idea to use all caps for clause starters (e.g. SELECT, FROM, WHERE, etc.). Functions should also be in all caps (e.g. SUM()). The only time that capitalization does matter is when it is inside quotes (more on quotes below).

Vendors of SQL databases may use slightly different variations of SQL. These variations are called SQL dialects.

  • Some SQL dialects are case sensitive. BigQuery is one of them. Vertica is another.
  • But most, like MySQL, PostgreSQL, and SQL Server, aren’t case sensitive.
  • This means if you searched for country_code = ‘us’, it will return all entries that have ‘us’, ‘uS’, ‘Us’, and ‘US’.
  • This isn’t the case with BigQuery. BigQuery is case sensitive, so that same search would only return entries where the country_code is exactly ‘us’. If the country_code is ‘US’, BigQuery wouldn’t return those entries as part of your result.

Quotes

single or double quotes

There are two situations where it does matter what kind of quotes you use:

  1. When you want strings to be identifiable in any SQL dialect. Within each SQL dialect there are rules for what is accepted and what isn’t. But a general rule across almost all SQL dialects is to use single quotes for strings. This helps get rid of a lot of confusion. So if we want to reference the country US in a WHERE clause (e.g. country_code = 'US'), then use single quotes around the string 'US'.

  2. When your string contains an apostrophe or quotation marks. The second situation is when your string has quotes inside it. Suppose you have a column favorite_food in a table called FavoriteFoods and the other column corresponds to each friend.

Generally speaking, this should be the only time you would use double quotes instead of single quotes.

friend favorite_food
Rachel DeSantos Shepherd’s pie
Sujin Lee Tacos
Najil Okoro Spanish paella

Columns

snake_case names

Column names should be all lowercase. This helps keep your queries consistent and easier to read while not impacting the data that will be pulled when you run them. So, spaces are bad in SQL names. Never use spaces.This means that ‘total tickets’, which has a space between the two words, should be entered as total_tickets with an underscore instead of a space.

It is always good to give your columns useful names, especially when using functions. After running your query, you want to be able to quickly understand your results.

Tables

CamelCase names for tables

Table names should be in CamelCase. CamelCase capitalization means that you capitalize the start of each word, like a two-humped (Bactrian) camel. So the table TicketsByOccasion uses CamelCase capitalization.

Please note that the capitalization of the first word in CamelCase is optional; camelCase is also used. Some people differentiate between the two styles by calling CamelCase, PascalCase, and reserving camelCase for when the first word isn’t capitalized, like a one-humped (Dromedary) camel; for example, ticketsByOccasion.

At the end of the day, CamelCase is a style choice. There are other ways you can name your tables, including:

  • All lower or upper case, like ticketsbyoccasion or TICKETSBYOCCASION
  • With snake_case,  like tickets_by_occasion

Keep in mind, the option with all lowercase or uppercase letters can make it difficult to read your table name, so it isn’t recommended for professional use.

The second option, snake_case, is technically okay. With words separated by underscores, your table name is easy to read, but it can get very long because you are adding the underscores. It also takes more time to enter. If you use this table a lot, it can become a chore.

Indent

As a general rule, you want to keep the length of each line in a query <= 100 characters. Now it is much easier to understand what you are trying to do in the SELECT clause.

Sure, both queries will run without a problem because indentation doesn’t matter in SQL. But proper indentation is still important to keep lines short. And it will be valued by anyone reading your query, including yourself!

Comments

/*  */

Some tables aren’t designed with descriptive naming conventions.

  • In the example, field1 was the column for a customer’s last name, but you wouldn’t know it by the name.
  • A better name would have been something such as last_name. In these cases, you can place comments alongside your SQL to help you remember what the name represents.
  • Comments are text placed between certain characters, /* and */, or after two dashes – – as shown below.
SELECT 
        field1     /* this is the last name column */ 
FROM    
        table      -- this is the customer data table
WHERE  
        field1
LIKE
        'Ch%';

– –

Comments can also be added outside of a statement as well as within a statement.

  • You can use this flexibility to provide an overall description of what you are going to d
  • step-by-step notes about how you achieve it, and
  • why you set different parameters/conditions.
-- This is an important query used later to join with the accounts table
SELECT     
        rowkey,     -- key used to join with account_id 
        Info.date,  -- date is in string format YYYY-MM-DD HH:MM:SS
        Info.code -- e.g., 'pub-###' 
FROM Publishers

#

You can use # in place of the two dashes, –, in the above query but keep in mind

  • # isn’t recognized in all SQL dialects (MySQL doesn’t recognize #)
  • So it is best to use – and be consistent with it.
  • When you add a comment to a query using –, the database query engine will ignore everything in the same line after –. It will continue to process the query starting on the next line.

How To Query


Basics Summary

Here are some of the basic commands we’ll cover and that you’ll use often:

Command Syntax Description Example
SELECT SELECT column1, column2, … FROM table_name; SELECT statement is used to fetch data from a database. SELECT city FROM placeofinterest;
WHERE SELECT column1, column2, …FROM table_name WHERE condition; WHERE clause is used to extract only those records that fulfill a specified condition. SELECT * FROM placeofinterest WHERE city == ‘Rome’ ;
COUNT SELECT COUNT * FROM table_name ; COUNT is a function that takes the name of a column as argument and counts the number of rows when the column is not NULL. SELECT COUNT(country) FROM placeofinterest WHERE country=‘Canada’;
DISTINCT SELECT DISTINCT columnname FROM table_name; DISTINCT function is used to specify that the statement is a query which returns unique values in specified columns. SELECT DISTINCT country FROM placeofinterest WHERE type=‘historical’;
LIMIT SELECT * FROM table_name LIMIT number; LIMIT is a clause to specify the maximum number of rows the result set must have. SELECT * FROM placeofinterest WHERE airport=“pearson” LIMIT 5;
INSERT INSERT INTO table_name (column1,column2,column3…) VALUES(value1,value2,value3…); INSERT is used to insert new rows in the table. INSERT INTO placeofinterest (name,type,city,country,airport) VALUES(‘Niagara Waterfalls’,‘Nature’,‘Toronto’,‘Canada’,‘Pearson’);
UPDATE UPDATE table_name SET[[column1]=[VALUES]] WHERE [condition]; UPDATE used to update the rows in the table. UPDATE placeofinterest SET name = ‘Niagara Falls’ WHERE name = “Niagara Waterfalls”;
DELETE DELETE FROM table_name WHERE [condition]; DELETE statement is used to remove rows from the table which are specified in the WHERE condition. DELETE FROM placeofinterest WHERE city IN (‘Rome’,‘Vienna’);

What is a query?

A query is a request for data or information from a database. When you query databases, you use SQL to communicate your question or request. You and the database can always exchange information as long as you speak the same language.

Every programming language, including SQL, follows a unique set of guidelines known as syntax. Syntax is the predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement. As soon as you enter your search criteria using the correct syntax, the query starts working to pull the data you’ve requested from the target database.

Basic

The syntax of every SQL query is the same:

  • Use SELECT to choose the columns you want to return.
  • Use FROM to choose the tables where the columns you want are located.
  • Use WHERE to filter for certain information.

A SQL query is like filling in a template. You will find that if you are writing a SQL query from scratch, it is helpful to start a query by writing the SELECT, FROM, and WHERE keywords in the following format:

Step 1

SELECT

FROM

WHERE ;

Notice that the SQL statement shown above has a semicolon at the end. The semicolon is a statement terminator and is part of the American National Standards Institute (ANSI) SQL-92 standard, which is a recommended common syntax for adoption by all SQL databases. However, not all SQL databases have adopted or enforce the semicolon, so it’s possible you may come across some SQL statements that aren’t terminated with a semicolon. If a statement works without a semicolon, it’s fine.

Step 2

  • Enter the table name after the FROM
  • the table columns you want after the SELECT
  • finally, the conditions you want to place on your query after the WHERE.
  • make sure to add a new line and indent when adding these, as shown below:
SELECT Specifies the columns from which to retrieve data
FROM Specifies the table from which to retrieve data
WHERE Specifies criteria that the data must meet

Following this method each time makes it easier to write SQL queries. It can also help you make fewer syntax errors.

Example of a query

Here is how a simple query would appear in BigQuery, a data warehouse on the Google Cloud Platform.

SELECT    first_name
FROM      customer_data.customer_name
WHERE     first_name = 'Tony'

The above query uses three commands to locate customers with the first_name, ‘Tony’:

  1. SELECT the column named first_name
  2. FROM a table named customer_name (in a dataset named customer_data) (The dataset name is always followed by a dot, and then the table name.)
  3. But only return the data WHERE the first_name is ‘Tony’

As you can conclude, this query had the correct syntax, but wasn’t very useful after the data was returned because it  only showed the first column and a long list of “Tony”

Multiple columns in a query

Of course, as a data professional, you will need to work with more data beyond customers named Tony. Multiple columns that are chosen by the same SELECT command can be indented and grouped together.

If you are requesting multiple data fields from a table, you need to include these columns in your SELECT command. Each column is separated by a comma as shown below:

SELECT   
        column A,
        column B,
        column C
FROM 
        customer_data.customer_name 
WHERE
        condition1 = 'Tony'
        AND condition 2
        AND condition 3;

Notice that unlike the SELECT command that uses a comma to separate fields / variables / parameters, the WHERE command uses the AND statement to connect conditions.

Data Types


I will only cover part of SQL Server data types the rest could be found in the tables below or looked up for the server you are using:

varchar

Is a variable char STRING. SQL does not set a predetermined size it expands with the size. The max is 50 anything above that will truncate

​varbinary

Can store any type of data, we can use HASH passwords, strings, integers…

String Data Types

Data type Description Max size Storage
char(n) Fixed width character string 8,000 characters Defined width
varchar(n) Variable width character string 8,000 characters 2 bytes + number of chars
varchar(max) Variable width character string 1,073,741,824 characters 2 bytes + number of chars
text Variable width character string 2GB of text data 4 bytes + number of chars
nchar Fixed width Unicode string 4,000 characters Defined width x 2
nvarchar Variable width Unicode string 4,000 characters
nvarchar(max) Variable width Unicode string 536,870,912 characters
ntext Variable width Unicode string 2GB of text data
binary(n) Fixed width binary string 8,000 bytes
varbinary Variable width binary string 8,000 bytes
varbinary(max) Variable width binary string 2GB
image Variable width binary string 2GB

Numeric Data Types

Data type Description Storage
bit Integer that can be 0, 1, or NULL
tinyint Allows whole numbers from 0 to 255 1 byte
smallint Allows whole numbers between -32,768 and 32,767 2 bytes
int Allows whole numbers between -2,147,483,648 and 2,147,483,647 4 bytes
bigint Allows whole numbers between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 8 bytes
decimal(p,s)

Fixed precision and scale numbers.

Allows numbers from -10^38 +1 to 10^38 –1.

The p parameter indicates the maximum total number of digits that can be stored (both to the left and to the right of the decimal point). p must be a value from 1 to 38. Default is 18.

The s parameter indicates the maximum number of digits stored to the right of the decimal point. s must be a value from 0 to p. Default value is 0

5-17 bytes
numeric(p,s)

Fixed precision and scale numbers.

Allows numbers from -10^38 +1 to 10^38 –1.

The p parameter indicates the maximum total number of digits that can be stored (both to the left and to the right of the decimal point). p must be a value from 1 to 38. Default is 18.

The s parameter indicates the maximum number of digits stored to the right of the decimal point. s must be a value from 0 to p. Default value is 0

5-17 bytes
smallmoney Monetary data from -214,748.3648 to 214,748.3647 4 bytes
money Monetary data from -922,337,203,685,477.5808 to 922,337,203,685,477.5807 8 bytes
float(n)

Floating precision number data from -1.79E + 308 to 1.79E + 308.

The n parameter indicates whether the field should hold 4 or 8 bytes. float(24) holds a 4-byte field and float(53) holds an 8-byte field. Default value of n is 53.

4 or 8 bytes
real Floating precision number data from -3.40E + 38 to 3.40E + 38 4 bytes

Date and Time Data Types

Data type Description Storage
datetime From January 1, 1753 to December 31, 9999 with an accuracy of 3.33 milliseconds 8 bytes
datetime2 From January 1, 0001 to December 31, 9999 with an accuracy of 100 nanoseconds 6-8 bytes
smalldatetime From January 1, 1900 to June 6, 2079 with an accuracy of 1 minute 4 bytes
date Store a date only. From January 1, 0001 to December 31, 9999 3 bytes
time Store a time only to an accuracy of 100 nanoseconds 3-5 bytes
datetimeoffset The same as datetime2 with the addition of a time zone offset 8-10 bytes
timestamp Stores a unique number that gets updated every time a row gets created or modified. The timestamp value is based upon an internal clock and does not correspond to real time. Each table may have only one timestamp variable

Other Data Types

Data type Description
sql_variant Stores up to 8,000 bytes of data of various data types, except text, ntext, and timestamp
uniqueidentifier Stores a globally unique identifier (GUID)
xml Stores XML formatted data. Maximum 2GB
cursor Stores a reference to a cursor used for database operations
table Stores a result-set for later processing