Lesson 7 - MySQL Step By Step: Queries Over Multipe Tables (JOIN)
In the previous lesson, MySQL step by step: Data Types and NULL, we went through data types and explained
the NULL
value. Today, we're going to start with a simple content
management system, that will look similarly to the one here at ICT.social. We're
going to show queries through multiple tables.
Conceptual model
In the next few lessons, we're going to create kind of a simlified ICT.social database. First of all, let's talk about how it's going to look like. Today, of course, we're going to make just a small part. And because a picture can tell more than a thousand words, let's start right with it:
What you see is a conceptual model. It's created using the UML notation (a graphical language) and in practice, we create these kind of diagrams very often created before we even start writing any code. This way, we can think about what we really need to create.
As we can see, the system user can write comments and articles. The articles belong into sections. So it's a database of a very simple content management system, which you can certainly imagine thanks to ICT.social.
Creating tables and data
Today, we're going to focus on queries through multiple tables. Let's create some tables first. Tables for the users and articles will just be enough for us right now.
Users
Because the CMS user's going to have a different structure than our current
user, we're going to create the user
table again. Let's drop the
current one:
DROP TABLE `user`;
Then we'll create a new table. The user's going to have (besides its id) a nickname, email, and password:
CREATE TABLE `user` ( `user_id` int AUTO_INCREMENT, `nickname` varchar(155), `email` varchar(155), `password` varchar(255), PRIMARY KEY (`user_id`) );
Let's insert some users right away:
INSERT INTO `user` (`nickname`, `email`, `password`) VALUES ('Michael', '[email protected]', 'dGg#@$DetA53d'), ('David', '[email protected]', '$#fdfgfHBKBKS'), ('Denny', '[email protected]', 'Jmls_aSW2RFss'), ('Emma', '[email protected]', 'fw8QT32qmcsld');
Articles
Articles will be bound to the user who wrote them, that means their author.
To make the relationship between the tables, we're going to add a column with
the author id to the article
table. The id of the user who wrote
the article is going to be stored there (it's the primary key from the
user
table).
We're talking about the 1:N relationship (1 user can have N (multiple) articles and each article belongs exactly to one user). The part (here our article) always stores the id of the whole (the user in this case) where it belongs to.
The article will contain (in addition to its id again) the author id, short
description, url, title, content and the publication date. Let's create the
article
table:
CREATE TABLE `article` ( `article_id` int AUTO_INCREMENT, `author_id` int, `description` varchar(155), `url` varchar(155), `title` varchar(155), `content` text, `published` datetime, PRIMARY KEY (`article_id`) );
The only thing worth mentioning is the use of the TEXT
type for
the article content.
Next, we're going to add articles and assign users as authors to them. I took 4 articles from ICT.social, which I greatly shortened and simplified. The query looks like this:
INSERT INTO `article` (`author_id`, `description`, `url`, `title`, `content`, `published`) VALUES (1, "What's an algorithm? If you don't know, read this article.", 'what-is-an-algorithm', 'Algorithm', "When we talk about algorithms, let's agree on what an algorithm is all about. Simply put, the algorithm is a guide to solving a problem. When we look at it from a human perspective, the algorithm might be a guide to get up in the morning. Even if it sounds simple, it's quite a problem. Computers are machines and they do not think. Therefore, we must describe all the steps of the algorithm. This brings us to the first property of the algorithm - it must be elementary (consisting of a finite number of simple and easy-to-understand steps, i.e. commands). 'Get out of bed' is certainly not an algorithm. 'Open your eyes, take out the blanket, sit down, put your feet on the floor and rise' - this sounds quite detailed and would be a true algorithm. But we'll probably work in IT, so we'll solve problems of sorting elements by their value or of searching for an element based on its content. These are the two basic tasks that computers do most often and which need to be thought through and optimized to take as little time as possible. As another example of algorithm, it could be also to solve a quadratic equation or to solve a sudoku.", '2012-3-21'), (2, 'Bacteria is a cellular automaton in combination with a game.', 'bacteria-cellular-automaton', 'Bacteria', 'Bacteria is a cellular automaton developed by the British mathematician John Horton Conway in 1970. This whole game is controlled by four simple rules:/n/n 1. A living bacteria with fewer than two living neighbors die./n 2. A living bacteria with more than three living neighbors dies due to overgrowth./n 3. A living bacteria with two or three living neighbors survives without a change to the next generation./n 4. A dead bacteria with exactly three living neighbors becomes alive again./n With a right initial placement of the bacteria, these seemingly primitive rules can create walking formations, clusters "firing" walking fives, surprisingly complex symmetrical explosions, oscillators (periodically oscillating groups), or the infinite spectacle of how complex and perfect shapes can be created by these four conditions. The entire program is conceived as a game, where your goal is to create a colony living as long as possible.', '2012-2-14'), (3, 'Cheese Mouse is a relaxing 2D game.', 'cheese-mouse-relaxing-platformer', 'Cheese Mouse', 'Cheese mouse is a 2D game with a "hot island atmosphere" where you control a mouse and you have to get to the cheese. But there are lots of pitfalls and enemies like snakes, rats, piranhas, robots, mummies, and all sorts of creatures. I created the game with several colorful worlds at the elementary school with Veisen and it ended up at the 2nd place in the Bonusweb game competition, where it won $200. It was created in Game Maker through summer holidays, in my childhood, which greatly influenced its graphic design. I like to play it sometimes to relax and for better mood.', '2004-6-22'), (2, 'Pacman is a remake of the iconic game.', 'pacman-remake', 'Pacman', "This is an absolutely basic version of this game including a level editor, so you can create your own levels. Over time, I'm going to modify it a bit and add some new features, fullscreen and better graphics. The engine of the game will also be the core of my new Geckon man project, which is still in the writing stage.", '2011-6-3');
Queries through multiple tables
Now, we have the articles in the database and the users assigned to them.
Let's make a query over these 2 tables, get the articles and join the nicknames
of their authors to them. I didn't use the word "join" accidentally, the command
to access another table in a query is called JOIN
.
Let's write the query and then explain it. We're going to write queries on
multiple lines to make them more readable:
SELECT `title`, `nickname` FROM `article` JOIN `user` ON `author_id` = `user_id` ORDER BY `nickname`;
The results:
Pacman David Bacteria David Cheese Mouse Denny Algorithm Michael
In the first line of the SELECT
statement, we select the columns
as they were all from a single table, simply listing what we're interested in.
Since we're selecting articles and attaching the users to them, we select from
the article
table. To attach data of another table, we use the
JOIN
clause specifying the table we want to attach, and then the
ON
clause. ON
is similar to WHERE
, but it
only applies to the attached table and not to the one which we primarily select
the data from. In the condition, we define that to every article the user whose
user_id
is stored in the author_id
column should be
attached. We ordere the results by the users' nickname. In case we needed just
some of the articles, we'd place a WHERE
clause before the
ORDER BY
, as we're used to.
INNER JOIN and OUTER JOIN
The INNER
and OUTER
JOIN
s are two
types of the JOIN
statements. They work the same way and the only
difference is what happens when the item, to which the relationship refers to,
doesn't exist.
INNER JOIN
If we use just the JOIN
keyword in an SQL query, it's considered
an INNER JOIN
by MySQL. If there's no user with the id that is
stored in the article, such an article without a user wouldn't be included in
the results. The relationship is unbreakable.
Let's try it. We'll add an article that'll refer to the id of a non-existing user:
INSERT INTO `article` (`author_id`, `description`, `url`, `title`, `content`, `published`) VALUES (99, 'An article with a non-existent user to test different JOINS.', 'article-without-user', 'Article without an author', 'This article is assigned to a nonexistent user with the ID of 99 and is used to test different types of JOINS in the MySQL database.', '2012-10-21');
The inserted article refers to a user with the user_id
equal to
99
, who is not in the database. Let's run our SQL query with the
JOIN again. Just to be clear, it's better to define that we want the
INNER JOIN
explicitly:
SELECT `title`, `nickname` FROM `article` INNER JOIN `user` ON `author_id` = `user_id` ORDER BY `nickname`;
The results:
Pacman David Bacteria David Cheese Mouse Denny Algorithm Michael
The results are still the same, and the article without an author isn't in the results.
LEFT OUTER JOIN
OUTER JOIN
s allow you to select even those results, that have
failed to join due to missing entries. Let's try the LEFT JOIN
,
which considers the result valid even if the left part of the join condition
(article) exists and the right one (the one that's being joined, the user)
doesn't. The columns of the part being joined will contain the NULL
value:
SELECT `title`, `nickname` FROM `article` LEFT JOIN `user` ON `author_id` = `user_id` ORDER BY `nickname`;
The results:
Article without an author NULL Pacman David Bacteria David Cheese Mouse Denny Algorithm Michael
We can see that the article's been selected anyway, even the right part (the one being joined, the user) hasn't been selected. Before we join tables, it's a good idea to think about whether the join can fail and what we want to happen in that case. In a real application, this probably shouldn't happen.
RIGHT OUTER JOIN
Similarly, as the LEFT
outer JOIN
considered the
result valid when at least the left part existed, the RIGHT JOIN
does the opposite. If there's the user (the right part being joined) and there's
no article (the left part), it's still going to be included in the results.
Personally, I haven't used this JOIN
yet. We already have one such
user in the table, it's Emma. Let's try the RIGHT JOIN
:
SELECT `title`, `nickname` FROM `article` RIGHT JOIN `user` ON `author_id` = `user_id` ORDER BY `nickname`;
The results:
Pacman David Bacteria David Cheese Mouse Denny NULL Emma Algorithm Michael
As expected, the article without an author disappeared, and the Emma showed up.
We'd certainly find some other JOIN
statements in MySQL, but
this is enough for our purposes.
Using the WHERE clause
Theoretically, we can avoid using JOINS
and use the
FROM
and WHERE
clauses instead. In FROM
,
we can list multiple tables separated by commas. Then, in WHERE
, we
specify the condition to join the table. Ideally, the database converts this
query to the INNER JOIN
first and then processes it.
SELECT `title`, `nickname` FROM `article`, `user` WHERE `author_id` = `user_id` ORDER BY `nickname`;
The results are the same as with INNER JOIN
:
Pacman David Bacteria David Cheese Mouse Denny Algorithm Michael
The disadvantage of joining with the WHERE
clause is that we
wouldn't be able to perform all kinds of JOIN
s and in some cases,
the queries can be less optimized. We never know how the database will optimize
such a query and it will vary according to the used database. Let's consider
this approach rather a curiosity than for a real use.
In the next lesson, MySQL Step By Step: More Queries And the M:N Relationship, we'll continue with queries through multiple tables and add another database part of our content management system.