All Articles

Learning SQL, Part 6 - Joins

This is the sixth article in the learning SQL series:

Joins Overview

Joins in SQL are a way to combine related tables. Typically, this is done through a foreign key - a column in a table used to reference another table. By instructing SQL to combine the tables using like keys, we are able to write more complex queries and utilize columns from more than one table.

Types of Joins

There are multiple different types of joins, but this article will give an overview of the two most commonly used joins - the INNER JOIN and the LEFT OUTER JOIN.

INNER JOIN

The inner join is used in situations where we want records that exist in both tables. For example, we may have 100 customers in a DimCustomer table and various sales transactions in a FactSales table. Out of these 100 customers, maybe only 80 of them had sales this year. An inner join would return records for just these 80 customers - since they existed in both tables.

Let’s walk through an example. There are 18,484 distinct customers in our DimCustomer table (example SQL below). Let’s say we are trying to determine sales dollars in FactInternetSales for each customer in 2011. Based on the nature of an Inner Join, how many records will be returned - the full 18,484 (one for each customer) or a smaller number that corresponds to those who had a sales record in 2011?

SELECT COUNT(*) as [record_count]   --returns 18484
FROM [AdventureWorksDW2017].[dbo].[DimCustomer]
Example 1 Query: Total Customer Count

The SQL to answer this question is below. This query returns 2,216 rows. It returns the lower row count due to the fact that not every customer had a sales record in 2011. The results include the CustomerKey (we could have used additional customer attributes if needed) and sales totals - but only for customers who existed in both tables.

SELECT
	 customer.CustomerKey           --returns 2216 rows
	,SUM(sales.ExtendedAmount) as [sales]
FROM [AdventureWorksDW2017].[dbo].[DimCustomer] customer
INNER JOIN [AdventureWorksDW2017].[dbo].[FactInternetSales] sales
	ON customer.CustomerKey = sales.CustomerKey
        AND YEAR(sales.OrderDate) = 2011	
GROUP BY
	 customer.CustomerKey
Example 2 Query: Inner Join

LEFT OUTER JOIN

The left outer join also joins two tables, but keeps records from the left table that did not have a match.

Continuing on the same example, what happens if we change our inner join to a left outer join? The below query (note the join type is the only change) now returns the full 18,484 rows.

The CustomerKeys that do not have any sales simply show a NULL in the sales column. This is the expected behavior from a Left Join - records from the left table that do not have a match will simply show a NULL in columns from the right table (unless specifically handled differently using a ISNULL or COALESCE function).

SELECT
	 customer.CustomerKey           --returns 18484 rows
	,SUM(sales.ExtendedAmount) as [sales]
FROM [AdventureWorksDW2017].[dbo].[DimCustomer] customer
LEFT OUTER JOIN [AdventureWorksDW2017].[dbo].[FactInternetSales] sales
	ON customer.CustomerKey = sales.CustomerKey
        AND YEAR(sales.OrderDate) = 2011	
GROUP BY
	 customer.CustomerKey
Example 3 Query: Left Outer Join

Other Join Types

Other common joins include the FULL OUTER JOIN and RIGHT JOIN. The Full Outer Join returns records from both tables and shows nulls where there was no match. The Right Join is just the opposite of our left join example above (keeps records from the right table).

Why Join Types Matter

Different join types can give different results, particularly in cases where two tables do not have exact matches or lookups from one to the other. To determine which type of join to use, you should test the tables to ensure that all the keys exist from one to another (you could even do something similar to the examples above, but removing the year restriction for example).

An Inner Join can perform better and should be used where possible; however, based on the question you are looking to answer, a LEFT or FULL OUTER join may be the best choice if you want to show NULLs or Zeroes for records that did not have a lookup/match in the secondary table.

Next Steps

Play around with the different join types to become familiar with the syntax and how the different types affect results. Choose a fact table that you want to aggregate metrics on, pick one or more dimension keys, and start to build out the joins. You can join more than one table to build more complex queries that return richer datasets.

For a deeper dive into some of the concepts related to this article, check out the following books: