This is the sixth article in the learning SQL series:
- Part 1: Setup your environment
- Part 2: Basic Select Statements
- Part 3: Where Statements
- Part 4: More Where Statements
- Part 5: Aggregations and Group By Statements
- Part 6: Joins (this article)
Joins in SQL are a way to combine related tables. Typically, this is done through a foreign key - a column in a table used to reference another table. By instructing SQL to combine the tables using like keys, we are able to write more complex queries and utilize columns from more than one table.
There are multiple different types of joins, but this article will give an overview of the two most commonly used joins - the INNER JOIN and the LEFT OUTER JOIN.
The inner join is used in situations where we want records that exist in both tables. For example, we may have 100 customers in a DimCustomer table and various sales transactions in a FactSales table. Out of these 100 customers, maybe only 80 of them had sales this year. An inner join would return records for just these 80 customers - since they existed in both tables.
Let’s walk through an example. There are 18,484 distinct customers in our DimCustomer table (example SQL below). Let’s say we are trying to determine sales dollars in FactInternetSales for each customer in 2011. Based on the nature of an Inner Join, how many records will be returned - the full 18,484 (one for each customer) or a smaller number that corresponds to those who had a sales record in 2011?
SELECT COUNT(*) as [record_count] --returns 18484 FROM [AdventureWorksDW2017].[dbo].[DimCustomer]
The SQL to answer this question is below. This query returns 2,216 rows. It returns the lower row count due to the fact that not every customer had a sales record in 2011. The results include the CustomerKey (we could have used additional customer attributes if needed) and sales totals - but only for customers who existed in both tables.
SELECT customer.CustomerKey --returns 2216 rows ,SUM(sales.ExtendedAmount) as [sales] FROM [AdventureWorksDW2017].[dbo].[DimCustomer] customer INNER JOIN [AdventureWorksDW2017].[dbo].[FactInternetSales] sales ON customer.CustomerKey = sales.CustomerKey AND YEAR(sales.OrderDate) = 2011 GROUP BY customer.CustomerKey
The left outer join also joins two tables, but keeps records from the left table that did not have a match.
Continuing on the same example, what happens if we change our inner join to a left outer join? The below query (note the join type is the only change) now returns the full 18,484 rows.
The CustomerKeys that do not have any sales simply show a NULL in the sales column. This is the expected behavior from a Left Join - records from the left table that do not have a match will simply show a NULL in columns from the right table (unless specifically handled differently using a ISNULL or COALESCE function).
SELECT customer.CustomerKey --returns 18484 rows ,SUM(sales.ExtendedAmount) as [sales] FROM [AdventureWorksDW2017].[dbo].[DimCustomer] customer LEFT OUTER JOIN [AdventureWorksDW2017].[dbo].[FactInternetSales] sales ON customer.CustomerKey = sales.CustomerKey AND YEAR(sales.OrderDate) = 2011 GROUP BY customer.CustomerKey
Other common joins include the FULL OUTER JOIN and RIGHT JOIN. The Full Outer Join returns records from both tables and shows nulls where there was no match. The Right Join is just the opposite of our left join example above (keeps records from the right table).
Different join types can give different results, particularly in cases where two tables do not have exact matches or lookups from one to the other. To determine which type of join to use, you should test the tables to ensure that all the keys exist from one to another (you could even do something similar to the examples above, but removing the year restriction for example).
An Inner Join can perform better and should be used where possible; however, based on the question you are looking to answer, a LEFT or FULL OUTER join may be the best choice if you want to show NULLs or Zeroes for records that did not have a lookup/match in the secondary table.
Play around with the different join types to become familiar with the syntax and how the different types affect results. Choose a fact table that you want to aggregate metrics on, pick one or more dimension keys, and start to build out the joins. You can join more than one table to build more complex queries that return richer datasets.