POS/410 Flexnet Week 3 - Online
Lecture - Joins and Sub-queries
One of the most powerful features of the SQL language is its flexibility in retrieving and manipulating data. Data can be retrieved from one or more tables and then sorted and summarized as desired. Last week you retrieved data from a single table. This week we’ll take a look at retrieving data from multiple tables. There are two basic approaches: joins and sub-queries. In most case what can be accomplished with a join can be accomplished via a join and vice-versa.
Personal preferences will probably dictate which approach you use. Most people seem to prefer joins and they are generally faster. However, in some cases sub-queries will run faster and for certain type of queries they are the only way to do it.
This lecture will present an overview of this important and interesting topic. It should provide enough background to get you started with your related homework and project work.
First I’ll provide background information on both methods. Then I’ll cover some more aspects of joins in more depth, especially those that may be less obvious or well known. More extensive coverage of sub-queries is best saved to a later lecture.
The examples will be drawn from the class employees database.
Background
Joins
Joins are the basic operation used to pull or compare data from more than one table in a single SELECT statement. In most cases an equivalent operation can be performed using a sub-query. I’ll cover that subject later in the lecture.
There are two ways of coding a join. One uses the standard ANSI SQL syntax. The other uses an older syntax that one commonly encounters in commercial practice. Both are acceptable (although ANSI is preferable) and both will work in MS Access and SQL Server. Below are the two forms of the join.
Older form of Join
SELECT columnlist
FROM table1, table2
WHERE table1.keyfield1 = table2.keyfield2
Typically keyfield1 and keyfield2 would represent the primary and foreign keys of the two tables. Here is an example from the class employees database (a slightly older version).

In this case the job_title_code is the foreign key in the employees table that references the job_title_code primary key in the job_titles table. These two comprise what the literature often calls the join column. We have to use the table name followed by the dot operator to qualify the field name so that the SQL processor will know which field we are referring to.
ANSI Join
SELECT columnlist
FROM table1
INNER JOIN table2
ON table1.keyfield1 = table2.keyfield2
Note there are three other types of joins: LEFT OUTER, RIGHT OUTER and FULL OUTER. I’ll cover those later. Be aware some databases do not support FULL OUTER and some don’t even support RIGHT OUTER. Fortunately for our class SQL Sever is not among these. Below is an example of the ANSI style join using the class database.

This is the default syntax in most databases now, and the one you should use.
Sub-queries
A SELECT statement may contain other SELECT statements. The contained SELECT statements are called sub-queries or sometimes sub-selects (MySQL uses this term). Where and how you can use a sub-query varies from implementation to implementation. One common use is wherever you would use an expression. For example finding all employees whose salary is greater than the average. In this case you compare the salary of an employee to the results of a summary query that returns the average salary. Unfortunately, discussion of this type of sub-query will have to wait until a future workshop after we have looked at summary queries.
Another common use is performing lookups with the IN operator. Say, for example, you wanted to list all the non-exempt employees. This requires information from both the employees table and the job_titles table, which has the exempt status. Here’s the SQL:

Remember the IN operator? It tests to see whether a value is in a list. Like this:
First_name IN (:John”, “Sally”, “Bob”, “Alice”)
This saves you having to write first_name = “John” or first_name = “Sally”, or… What the sub-query above does is generate a list which can be searched via the IN operator. This is a very common query in commercial practice, especially the negative form: find all parts not in the Nogales warehouse, find employee records with no corresponding employee benefits record, etc. In fact, this query is so common there will be a bonus point example on the quiz. And, yes, it could be done with a join, but this syntax is widely used, often in a slightly different form using EXISTS which can be very fast.
FYI, this type of query can also be done with the standard MS Access Query Designer. Put the sub-query in the criteria row. See below. Presumably, this same thing can be done in other GUI query designers.

More about Joins
Now that we have the basics out of the way, let’s take a look at few more aspects of joins:
Table aliases
Equi-joins
Outer joins
Self joins
Table Aliases
Tired of writing out the names of tables both in the FROM clause and then again in the ON condition and maybe in the AND? Well, you can give a table a shorthand name called an alias. Our original syntax could have been:
SELECT columnlist
FROM table1 as alias1
INNER JOIN table2 as alias2
ON alias1.keyfield1 = alias2.keyfield2
Here is our original example redone using aliases:

I used an alias for the table names and thereafter used the aliases to qualify the field names. One more thing: I added job_title_code to the SELECT columnlist. Why did I have to use e. before job_title_code? I qualified the field name with an alias to prevent the SQL processor from responding that I had used an ambiguous field name. I had to do this since I had used the same name in both tables.
Using a table name is good practice and you should always use them. It makes your queries easier to read and manage. We will use them throughout the remainder of this lecture.
Equi-joins
An equi-join is the kind we have been using where the fields in the join columns must match. There is another type, sometimes seen, where the condition is > or <> or similar. These are called theta-joins. They are an advanced topic and will not be covered in this lecture (nor required for your homework or team project).
Outer Joins
There are three types of outer joins: left, right and full. The left join matches records from both the left and right tables like the inner join. Then it adds the unmatched records from the left table. The missing values that would have come from the right table are filled in with nulls. The books treat this as something of passing interest but in commercial practice it is extremely common, especially fro reporting. Examples: list the names of all my real estate agents along with their commissions this month, if any. An inner join will only return agents with sales that month, not all agents which is what I want. I want to see all names and a 0 or “no sales” if there were no sales.
Here is an example of an inner join that shows the job_titles and associated employee names within each job_title_code. Note there are 10 records and 5 titles.

Here is the same thing with a left join. Note there are now 11 records and the missing engineer title shows. The missing employee name for the engineer job has been filled with nulls.

This is a simple example, but it illustrates the difference. You need to be careful to use the correct join or you will not get the results you expect. This can be especially important for accurate reporting.
Here is the SQL (note the keyword outer is optional):

The right join does the same thing the other way around. It selects all matched records plus unmatched records from the right table. Again, the missing values are filled with nulls. As you might expect the full outer join selects unmatched records from both tables.
Self Joins
Can you join a table to itself? Of course. Remember the recursive relationship from DBM380? I’ve added another column to my employees table. This column contains the SSN of the employee’s manager. Since a manager is also an employee this SSN “points” back into the same table. Shown below are a few sample records.

Observe that Roy Orbison with SSN 15 reports to Gert Stangl. Gert has an SSN of 14 which is the value in Roy’s mgr_soc_sec_no field.
To list employees with their managers we would use a self-join. The trick is to use aliases to create a “shadow copy”, if you will, of the employees table with a different name. We know how to do this using aliases. Then, we can join that to the original. Shown below is the actual SQL code to do this.

The results of the query are shown below. Note we have used a left join to include the person at the top (Jean-Claude) who reports to no one.

Conclusion
This lecture reviewed the two basic approaches to retrieving information from more than one table. The two techniques are sub-queries and joins. We covered the basics of the various joins, especially the inner join and the left join. We also started an exploration of sub-queries. One type, in particular, is very common in commercial practice and we looked at an example.
Hopefully, this will provide additional background for your assignment(s).
Dan D'Urso