Lesson 5 - MySQL step by step: Ordering, Limit, and Aggregation
In the previous lesson, MySQL step by step: Selecting Data (Searching), we prepared test data and learned basic
queries. It was the SELECT
statement and several operators. Today,
we're going to look at ordering and aggregation.
Ordering
We haven't yet cared about the order of the results that the
SELECT
query returned. In fact, there even wasn't any order.
Internally, the database keeps certain sophisticated rules (which are above the
level of this course) and it returned the items as they went. If we made a
change in the database and called the same query again, the order would be
probably completely different. Of course, the database is able to sort the
returned results.
We can sort by any column. When sorting by the id, we get the items in the
order they were inserted into the database. We can sort by numeric columns, but
also by the text ones (which sorts by alphabet). We can also sort by date and
all other data types, the database will always handle it. Let's select all the
users and sort them by their last name. We'll use the ORDER BY
clause to do so, which comes at the end of the query:
SELECT `first_name`, `last_name` FROM `user` ORDER BY `last_name`
The result:
Patricia Adams James Baker Donald Brown David Clark Michelle Davis Lisa Green ...
Of course, there could also be a WHERE
clause in the query, but
to keep things simple, we've just selected all the users.
We can sort by several criteria (columns). Let's sort the users by the number articles they wrote and those with the same number also in alphabetical order:
SELECT `first_name` ,`last_name`, `articles_count` FROM `user` ORDER BY `articles_count`, `last_name`
The result:
Michelle Davis 0 Charles Lopez 0 Aron Heaven 1 Daniel Jones 1 Paul Lee 1 Betty Miller 1 Jenifer Williams 1 Donald Brown 2 Lucy Harris 2 Anthony Jackson 2 ...
Sort direction
Of course, we can also sort in the other direction. We can sort in ascending
order (the default direction) with the ASC
keyword and in
descending with the DESC
keyword. Let's try to create a list of the
users by the number of their articles. First will be those with the most
articles, so we'll sort in descending order. Those with the same number of
articles will be sorted in alphabetical order:
SELECT `first_name`, `last_name`, `articles_count` FROM `user` ORDER BY `articles_count` DESC, `last_name`;
The result:
Jeff Moore 18 John Smith 17 Donna Johnson 12 Stephen Murphy 12 Paul Walker 9 Carol Young 9 ...
We always have to specify the DESC
keyword again. You can see
that the ordering by last_name
is ascending because we've put the
DESC
keyword only after articles_count
.
Limit
Let's stay with our user rating by their number of articles for a while. Now, we want to list just 10 best users. If you had one million users, it probably wouldn't be a good idea to select all of them and then use just 10 in your application and throw the 999 990 away. We'll define a limit, the maximum number of the entries we want to select. At the same time, we'll define the sorting order. The limit is always at the end of a query:
SELECT `first_name`, `last_name`, `articles_count` FROM `user` ORDER BY `articles_count` DESC, `last_name` LIMIT 10;
Try it.
We can use LIMIT
and ORDER BY
for other commands as
well, such as DELETE
or UPDATE
. This way, we can
ensure that only one entry is deleted or updated.
Aggregation functions
The database offers many of so-called aggregation functions. These are functions that manipulate with multiple values in some way and return just a single value as the result.
COUNT()
An example of such a function is the COUNT()
function, that
returns the number of table rows that meet certain criteria. Let's count how
many users wrote at least one article:
SELECT COUNT(*) FROM `user` WHERE `articles_count` > 0;
The result:
29
We use SELECT
to get the value of COUNT()
.
COUNT()
is not a command, it's a function which gets the table rows
as inputs, and the results are returned by the SELECT
. We use
parentheses to call functions as in other programming languages (at least
in most of them). The asterisk means that we're interested in all the columns.
For example, we can count only the users who have their name filled in (more
precisely, who don't have NULL
in the name column but let's keep
this for further lessons).
You'd certainly come up with another way to achieve this
result. You'd simply select some value as before (for example the name), then
send these rows to your application and count how many names there are. The data
could be removed then. However, such a transfer would overload the database
unnecessarily and slow down the application. COUNT()
transfers only
a single number. Never count by selecting values, just use the
COUNT()
function!
AVG()
AVG()
returns the average of given values. Let's look at the
average number of articles per user:
SELECT AVG (`articles_count`) FROM `user`;
The result:
5.3226
SUM()
SUM()
returns the sum of the given values. Let's see how many
articles were written by people born since 1980:
SELECT SUM(`articles_count`) FROM `user` WHERE `birth_date` > '1980-1-1';
The result:
65
MIN()
The MIN()
function returns the minimum (the lowest value). Let's
find the lowest date of birth:
SELECT MIN(`birth_date`) FROM `user`;
The result:
1935-05-15
Beware, if we'd like to select the first and last names too, this code won't work:
-- This code won't work SELECT `first_name`, `last_name`, MIN(`birth_date`) FROM `user`;
The aggregation function works with multiple column values and the selected
columns (first_name
and last_name
) won't be related
with the value returned by MIN()
anyhow. We could solve this
problem with a sub-query or even more easily, avoid the MIN()
and
MAX()
functions and use ORDER BY
and
LIMIT
instead:
SELECT `first_name`, `last_name`, `birth_date` FROM `user` ORDER BY `birth_date` LIMIT 1;
The result:
Linda Martin 1935-05-15
MAX()
Similar to MIN()
, there's also the MAX()
function.
Let's find the maximum number of articles a user wrote:
SELECT MAX(`articles_count`) FROM `user`;
The result:
18
MySQL has some more aggregation functions, but these are no longer interesting to us.
Grouping
We can group database entries by certain criteria. We use grouping almost
always together with aggregation functions. Let's group the users by
first_name
:
SELECT `first_name` FROM `user` GROUP BY `first_name`;
The result:
Anthony Aron Barbara Betty Carol ...
We can see that each name is included only once, even if there are there
multiple times in the database. We'll now add, besides the name, also the number
of its occurrences in the table, using the COUNT(*)
aggregate
function:
SELECT `first_name`, COUNT(*) FROM `user` GROUP BY `first_name`;
The result:
... Jeff 1 Jenifer 1 John 2 Joseph 1 Linda 1 ...
We can see that there are, for example, two Johns.
AS
To simplify a query, we can define aliases in it. For example, we can rename
a column with a long name to make the query more readable. We'll find this often
in queries through multiple tables where it's very useful. For tables, we use
AS
to simplify the operations within the query. For columns, we use
AS
so our application can see the data under a different name than
it's really in the database. This can be especially useful for aggregation
functions because there's no column for their result in the database, so it
could be difficult to work with the result instead. Let's modify our last
query:
SELECT `first_name`, COUNT(*) AS `cnt` FROM `user` GROUP BY `first_name`;
The result:
... Jeff 1 Jennifer 1 John 2 Joseph 1 Linda 1 ...
In the next lesson MySQL step by step: Data Types and NULL, we'll say something about NULL
and
introduce charts of data types in MySQL. I also have some additional test data
for you, for a content management system, so we can do more advanced queries