The SDS Blog: 10 Essential Topics to Understand SQL

The SDS Blog: 10 Essential Topics to Understand SQL

topics to understand SQL

(Time to Read: 12 mins)

People that master SQL understand the three main parts of a functioning SQL system:

First, they understand how and why a SQL database is used.

Second, they can identify the key structures of SQL databases.

Third, they get the information they need efficiently. 

Now you might be wondering, where do you start? 

Here’s the deal:

We will flesh out what exactly the purpose of SQL (structured Query Language) is and how you are likely to interact with your company database.

Once you understand the use of the SQL database, we will look at how different databases are structured and why.

Now we can get our hands dirty and learn how to manipulate the tables and data to create the outputs we want. 

The 10 areas we will cover in his article:

  1. Key elements of a database
  2. Relational Keys
  3. Select Queries
  4. Joins
  5. Data Modelling
  6. Fact and Dimension Table Types
  7. Star Schema
  8. Snow Flake Schema
  9. Data Definition Language (DDL) basics
  10. Data Manipulation Language (DML) basic

 

 1. Key elements of a database:

A database usually takes the key areas of a business and breaks them into structured tables.

Each table will store information on a key area:

Staff table – A record of each member of staff and usually includes the hierarchy of management and job description.

Customer table – This table will store who your customers are and how to contact them e.g. address, email etc.

Product table- A table that brings all your product information together e.g. product name, product colour, product description e.g.

Orders table – This table keeps a record of each order your company has processed, this table will refer to a lot of key area in the business e.g. who made the sale, who was sold to, what was sold and when.

 

 2. Relational keys:

With each part of the business having its own table, we need to know how a table is structured and how they relate to each other:

What is a primary Key?

Tables within a database are structured for a purpose so that each table removes the need for duplicate values.

For example, a product table should not have duplicates of the same product.

A primary key is assigned to each unique row within a table. This creates an efficient structure when you need to find related information.

The product table could have a product ID unique to each product, though other aspects of the product might be duplicated e.g.

Product ID Product Type

Product ID

(Primary Key)

Product Name Product Type Product Colour
1 Stylish Kettle Kettle Red
2 Basic Toaster Toaster Red
3 Retro Kettle Kettle Blue

In this example there are duplicate values in each row, however each row is unique and is referenced by a primary key.

What is a foreign key?

A foreign key is a common field shared by two or more tables holding data from different business areas.

A foreign key is usually the primary key in one table and a reference key in another table.

For example, an order table would have a unique row for each order, however each order would reference information from other tables e.g.

Order ID

(Primary Key)

Order Date Product ID (Foreign key) Customer ID (Foreign Key)
1 1/2/1996 2 142
2 6/3/2001 1 362
3 7/9/2007 14 283

We can see by referencing the Product ID from the Order table to the product table above, that we have orders for a basic toaster and for a stylish kettle as well as an item with product ID 14.

The orders table also references different customers from the customer table that will have matching primary keys (Customer ID).

 

 3. Select Queries

Understanding the structure of a table allows us to write queries that access the information we need form each table.

What is a query:

A query is a SQL statement that asks a question of the database.

Why do we use a SQL query?

Imagine our orders table had all the company’s transactions over ten years and you wanted to know what was sold on 1st December 2017.

If you were to get this information from the table it could take a long time scrolling to find the right lines.

However, with a SQL Query we can get the computer to present the information we need.

SQL Query example:

SELECT

Order_Date, Order_ID, Product_ID

FROM Orders

Where Order_Date = “1 Dec 2017”

 This query would show us the results of every order placed on this day.

For a select statement we must assign which fields we want to retrieve, and then the table the fields come from.

 We can then add any filters for the data allowing us to get answers to specific queries quickly.

 

4. Joins:

Now we know how each table is structured (Primary Keys), how tables are linked (Foreign Keys) and how to write select queries. We can now look at the ways in which we can retrieve data from multiple tables simultaneously.

 Why use a join?

A join is how we connect two or more tables together. This allows us to summarise data that would normally be stored in different tables into one result set. This is useful when analysing data for report creation.

 What is a join?

We saw earlier that our product table and order table have the Product ID field in common. We could then use this ID to join the two tables e.g. 

SELECT

Order_ID, Order_Date, Product_Name, Product_Type, Product_Colour

FROM

Orders

JOIN Product

ON Orders.Product_ID = Product.Product_ID 

As you can see, the results would allow access to the information from the two separate tables.

 What are the different types of joins?

 topics to understand SQL 1

Inner Join:
This join includes only the information from both tables that matches.

E.g. if we changed the join in the above example to an inner join, would only include products that were included in orders and orders that included the products from the products table.

If there were any orders of products that were not in the products table these records would be excluded.

 topics to understand SQL 2

Left (Outer) Join:

This join includes all data from the first table referenced and only the records that match from the second table.

E.g. if we changed the join in the above example to a left join:  The results would show all order details. However, it would only show the product details that were includes in the orders.  

 

Right (Outer) Join:

 This is like the left join

This join includes all data from the second table referenced and only the records that match from the first table.

E.g. if we changed the join in the above example to a left join:  The results would show all product details. However, it would only show the order details that used products within the products table.

Full Outer Join:

This join is the opposite of the inner join.

All records are brought from both tables, where a link between the tables can be made the details will be joined.

E.g. if we changed the join in the above example to a full outer join:  The results would show all orders and product records, however if the order had a product that was missing from the product table all the product details would be null also if there was not an order for one of the products all the order details would be null.  

 

5. Data Modelling:

Now we know how to join tables in a query, we can look at the relationship between tables.

What are the different types of table relationships?

There are three types of relational models between tables: one-to-one, one-to-many, and many-to-many.

What is a one-to-one relationship?

A one-to-one relationship means a single ID in one table would reference a single ID in another table.

E.g. we have a table for our employees linked to a table for employee roles.

Employee:

Employee_ID First_Name Last_Name
154 John Smith
765 Tim Woods
388 Brian Jones

Employee roles:

Employee_ID Role Department
154 Engineer Maintenance
675 Scientist Research
388 Customer Service Call centre

 We can see that these tables could be joined using the Employee ID.

The relationship between the two tables is one-to-one because each employee has a unique ‘Employee ID’ and each ‘Employee ID’ has one role. 

 What is a one-to-many relationship?

A one-to-many relationship requires one table with unique values to be related to a table that has multiple values for each of the original values.

For example, we can look at our Customer table again and its relationship to the Order table.

Customer Table:

Customer_ID (PK) Customer_Name Country
142 Robinson James USA
283 Carl Jung DE
362 Rupert Maximus UK

 

Order Table:

Order_ID (PK) Order_Date Product_ID (FK) Customer_ID (FK)
1 1/2/1996 2 142
2 6/3/2001 1 362
3 7/9/2007 14 283
2 7/10/2007 14 362
5 1/3/1998 2 142

We can see that these tables could be joined using the Customer ID.

The relationship between the two tables is one-to-many because each customer is unique and each customer can have more than one order. In this example, we can see that Customer IDs 362 and 142 have two orders.

What is a many-to-many relationship?

A many-to-many relationship means that are no unique IDs used in either table

For example, we can look at our Orders and Product table.

Orders:

Order ID (PK) Order Date Product ID (FK) Customer ID (FK)
1 1/2/1996 2 142
2 6/3/2001 1 362
3 7/9/2007 14 283
2 7/10/2007 14 362
5 1/3/1998 2 142

 

Product:

Product ID (Primary Key) Product Name Product Type Product Colour
1 Stylish Kettle Kettle Red
2  Basic Toaster Toaster Red
3 Retro Kettle Kettle Blue
14 Cheese grater Utensil Green
15 Kitchen Board Utensil White

 The relationship between the two tables is many-to-many because each order can have many products and each product can be part of many customer orders. In this example, we can see Order ID 2 has two products and Product ID 14 and 2 appear in two orders.

 

6. Table Types:

When creating a database, we use two types of table to split up our data: dimension tables and fact tables.

 What is a Fact table?

A Fact table (also known as a measure table) as the name suggests stores a record of all the facts related to a subject.
In our example the Fact table is the Orders table, here we have all the facts needed in a transaction.

We have the who (Customer ID), the what (Product ID) and the when (Order Date). This allows each line of the order table to summarise everything about a transaction.

As well as the who, what and when the Fact table holds the quantities and totals involved in the transaction e.g. Product quantity, Order total, Cost and Profit. 

What is a Dimension table?

Dimension tables give structure to our Fact tables. If we look at our orders table, the Product ID and Customer ID do not mean much on their own. However, with our Dimension tables we know that Product ID 1 is a Stylish Kettle and Customer ID is Carl from Germany.

Using Dimension tables to store the details of each area of the transaction allows the database to save space and speed up processing time. 

Why do we use fact and dimension tables?

We use Fact and Dimension tables to improve the performance of our database and allow for easy maintenance.

If we were to store all the text information from our dimension table in the fact table, it would look nice to the user. However, due to the duplication of text fields, the server would really struggle to handle large datasets or process queries.

Also, if an update to the details stored in our database was required, it would have to make many more changes, for example:
If Product ID 2 changed from being “Stylish Kettle” to “Normal Kettle”, we would have to find and change every row that has “Stylish Kettle”.
With Fact and Dimension tables, we only update one row within the Product Dimension table and all references within the Orders table are updated.

 

 7. What is the star schema?

With an understanding of the key structural aspects of a database, we can look are the various types of database structures and their uses.
The star schema is easiest to understand, as we have already been working with one within our examples:

As we can see the at the centre of our star schema is our Fact table Orders. Each of our Dimension tables are linked to the Fact table, created a star looking schema.

Why use a star schema?

This is the most common schema for databases that are used for reports purposes, as this schema is easier for a user to navigate.  

What is a snow flake schema?

We covered snowflake schemas briefly when looking at the many-to-many relationship between our Staff table and Staff role table.

A snowflake schema uses a technique called normalisation to remove duplicates from a dimension table. This leaves just IDs referencing a table specific to this attribute.

 For example, if we normalised our product table to fit a star schema: 

Product:

Product ID (Primary Key) Product Name Product Type Product Colour
1 Stylish Kettle 1 Red
2  Basic Toaster 2 Red
3 Retro Kettle 1 Blue
14 Cheese grater 3 Green
15 Kitchen Board 3 White

 

Product Type:

Product Type ID Product Type
1 Kettle
2 Toaster
3 Utensil

As we can see, the Product Type text field within the Product table has been replaced with an ID which references the Product Type table where the text values are stored.

Why use a snow flake schema?

The most common use for snowflake schemas are transactional databases. These are live database that records each transaction as it happens.

For example, every time an order is placed with Amazon, a transaction is created. This will most likely be stored within a snow flake schema as it offers the quickest processing time for saving and querying data which is required for sales and stock taking. 

 

8. What is Data Definition Language (DDL)?

 Data Definition Language (DDL) is a vocabulary used in Transact-SQL (T-SQL) to create, edit and delete tables. 

What are the most frequently used DDLs?

CREATE:

We use this each time we create a table, for example:

CREATE TABLE Product (

               Product ID int,

               Product Name Varchar(255),

               Product Type Varchar (255),

               Product Colour Varchar (255),

               );

This script would create the table Product with the fields set out as the example we used above.
“int” and “Varchar” after the field name designate the data type of the field. “int” shows the field will store an integer, “Varchar” that the field will store text (or a string)

DROP:

We use this to delete a table, including its structure and data. For example:

DROP TABLE Product;

This would delete the Product table and any data contained within it.

TRUNCATE:

We use truncate to remove the data stored in a table but retain the table structure, for example:

TRUNCATE TABLE Product;

This would delete all data stored in the Product table, however, the table would remain with each field as set up originally.

ALTER:

We can use this to change the structure of a table.

For example, we can remove a field:

ALTER TABLE Product

DROP COLUMN Product Colour

This would remove the column the table and all data stored within it.

We can also add additional columns to an existing table:

ALTER TABLE Product

ADD Product Description Varchar (255)

 

 9. What is Data Manipulation Language (DML)?

Data Manipulation Language (DML) is a vocabulary used to retrieve and work with data stored within the database.

What are the most frequently used DMLs?

SELECT:

We have already used this DML to specify which records we want to retrieve for reviewing.

E.g.

SELECT

Order Date, Order ID, Product ID

FROM Orders

Where Order Date = “1 Dec 2017”

This would retrieve the data in each row stored in the fields Order Date, Order ID and Product ID.

The FROM and WHERE clauses are essential to DML queries to specify which table we wish the query to run on, and which range of data we wish to retrieve.

In this example we are accessing data from Orders table with an order date specified as “1 Dec 2017”

DELETE:

This is the opposite to the select statement, where the data in each row we specify is removed from the table e.g.

DELETE FROM Product

WHERE Order Date = “1 Dec 2017”

This statement would remove each row from the database that matched the details of the WHERE clause. All data in the associated fields i.e. Product ID would also be deleted.

UPDATE:

We can use this DML to correct or update a value within a table e.g.

UPDATE Product

SET Product Name = “Normal Kettle”, Product Colour = “Green”

WHERE Product ID = 1

This statement would select rows where the Product ID is 1, and then update the existing values within fields Product Name and Product Colour.

INSERT:

We can use the insert DML to add missing values to our tables e.g.

INSERT INTO Product (Product ID, Product Name, Product Type, Product Colour)

VALUES (13, “Stylish Cheese Grater”, 3, “Blue”)

This statement specifies the fields we wish to insert and the values to be added. You do not have to specify all fields, however any field not selected will be defaulted to a null value.

0

Sebastian Moncada
Sebastian Moncada

Content Manager

What are you waiting for?

EMPOWER YOUR CAREER WITH SUPERDATASCIENCE

CLAIM YOUR TRIAL MEMBERSHIP NOW
as seen on: