Lesson 7 - LINQ in C# - Revolution in querying
In the previous lesson, Queue and stack in C# .NET, we talked about queues and stacks. In today's tutorial, we're going to introduce the revolutionary LINQ technology. LINQ refers to a set of tools for querying data which simplifies and generalizes working with any sort of data.
Motivation
We've all worked with different kinds of collections and used them in different ways. We look for an element in an array differently than when we read data from an XML file or search for a user in a database. Now, imagine there was a unified way to query data, in other words, a way to run the same query on ordinary arrays, on an XML file, or on a database. As you may have guessed, LINQ provides exactly this functionality. It's an enormous abstraction, and its only cost is a small performance overhead that rarely matters in practice. This technology brought programming in C# to new heights.
LINQ as a language
LINQ is a rather extensive and sophisticated technology. Its name is an acronym for Language INtegrated Query. As the name suggests, it's a query language integrated directly into the C# syntax. It has been part of the language since C# 3.0 and of the .NET Framework since version 3.5. Since .NET Framework 4.0, queries can also opt in to parallel execution on multiple threads through Parallel LINQ (the AsParallel() method), which can speed up work on large collections.
LINQ is very similar to the SQL language, so it's a declarative language. We tell the program what we're looking for and let it work its magic. The advantage of LINQ's integration into C# is that queries are checked at compile time.
Let's make a small example before we go any further. Create a new project (a console application), and name it LINQ. Let's create a simple array of strings.
string[] names = {"David", "Martin", "Daniel", "Peter", "John", "Elisa"};
Now, using a LINQ query, we'll select the array items that are longer than 5 characters. Add the following code to the program:
var query = from n in names where (n.Length > 5) select n;
The query looks a lot like SQL, so those who know it have a leg up on everyone else. I bet none of you have ever run an SQL query on an array, have you? We'll go over what the query does in a bit, but first, let's finish our program by having it print the query result to the console:
{CSHARP_CONSOLE}
string[] names = {"David", "Martin", "Daniel", "Peter", "John", "Elisa"};
var query = from n in names
            where (n.Length > 5)
            select n;
// printing the result
foreach (string name in query)
{
    Console.WriteLine(name);
}
Console.ReadKey();
{/CSHARP_CONSOLE}
The program's output:
Console application
Martin
Daniel
What the query looks like
Let's return to our query, which looked like this:
var query = from n in names where (n.Length > 5) select n;
SQL programmers will surely be surprised to see that the query is backwards. There is a reason behind this, which we'll get into later.
First, we specify where we want to select the data from using the from keyword. from is followed by a variable that represents an item of the collection for the rest of the query. Then the in keyword follows, along with the collection. It works much like the foreach loop. We write queries on multiple lines to keep things clear, which is especially important for more complex queries.
To add a condition, all you have to do is add a line with the where keyword, followed by the condition. We write conditions in the same way as we did before.
On the last line, the select keyword determines the values we want to select. In the example above, we select the whole element of the collection, i.e. n. However, we could just as well select only some value derived from n, for example its length, n.Length.
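As a minimal sketch of such a projection, the query below selects the length of each long name instead of the name itself. The class and method names (ProjectionDemo, LongNameLengths) are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative names only; they are not part of the lesson's project.
public static class ProjectionDemo
{
    // Returns the lengths of all names longer than 5 characters.
    public static IEnumerable<int> LongNameLengths(string[] names)
    {
        return from n in names
               where n.Length > 5
               select n.Length; // project each element to its length
    }

    public static void Main()
    {
        string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };
        foreach (int length in LongNameLengths(names))
        {
            Console.WriteLine(length); // prints 6 twice (Martin and Daniel)
        }
    }
}
```

Note that the result is now a sequence of int values rather than strings, because the select clause decides the element type of the result.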
The var keyword
We store the query into a variable declared with var, which is new to us. In fact, var isn't even a data type. The keyword lets us leave the choice of data type up to the compiler, meaning that C# determines the data type automatically at compile time. Theoretically, we could use var in other cases as well. For example:
var s = "C# will recognize that this is a string and it will assign the string type to the variable s";
var i = 10;
The above code will be translated to this:
string s = "C# will recognize that this is a string and it will assign the string type to the variable s";
int i = 10;
In other words, var lets the compiler determine the data type at compile time and saves us from having to specify it. In ordinary programs, using var everywhere would be rather confusing, since explicit data types make the code easier to read. With that in mind, we'll specify data types as we're used to. Don't replace all data types with the var keyword, as some amateurs do.
var was introduced along with LINQ for three reasons. First, the data types in LINQ queries are rather complex, so it'd be tedious to specify them explicitly every time. Second, if we changed the collection type, we would also have to change the query type, which would require us to edit the code, and the technology wouldn't be all that universal. Third, anonymous types come with LINQ, and we need var to be able to store them. We'll get to them soon. What you need to know is that var has its place in queries and shouldn't be used in regular code (even though it could theoretically be used there).
Generally, it's best to use var only if it simplifies the declaration and it's still clear which type the variable is. Let's make four examples with and without using var:
int a = 10;
List<Dictionary<string, string>> dictionaries = new List<Dictionary<string, string>>();
IOrderedQueryable<User> fromNY = from u in db.Users where u.City == "New York" orderby u.Name select u;
int b = a;
We can modify the code using var like this:
var a = 10;
var dictionaries = new List<Dictionary<string, string>>();
var fromNY = from u in db.Users where u.City == "New York" orderby u.Name select u;
var b = a;
var doesn't help us much with the first variable. However, using it with the generic list of generic dictionaries is fine, since the right side of the assignment still tells us which type the dictionaries variable is. Either way, it'd be much clearer to write a separate class instead; storing collections in collections is not good practice. As for the LINQ query, the data type is complex. If we removed the orderby clause, the type would change to IQueryable<User>. Using var, we don't have to worry about the data type or change it to fit the query. The last var is a cautionary example, since we have no way of telling which data type the b variable is without looking up the declaration of a.
Another common mistake is thinking that var declares a variable of a dynamic type, so that we could store whatever we want in it. This isn't true: the type is strictly assigned at compile time and cannot be changed afterward. With that in mind, you should be able to understand why the code below won't work:
// this code will not work
var variable = "It contains text now";
variable = 10; // Now it contains a number
The code above declares a variable of the string type, and the compiler then reports an error because the second line tries to assign an int to a string variable.
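To check the inferred types for yourself, a small sketch like the following prints the type the compiler picked for each var. The VarDemo class and its InferredTypes method are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper; the names are not part of the lesson's project.
public static class VarDemo
{
    // Returns the run-time types of three variables declared with var.
    public static Type[] InferredTypes()
    {
        var s = "text";                 // inferred as string
        var i = 10;                     // inferred as int
        var list = new List<string>();  // inferred as List<string>

        // i = "ten"; // would not compile: a string cannot be assigned to an int

        return new[] { s.GetType(), i.GetType(), list.GetType() };
    }

    public static void Main()
    {
        foreach (Type type in InferredTypes())
        {
            Console.WriteLine(type);
        }
    }
}
```

The types are fixed as soon as the code compiles, which is exactly why the reassignment in the commented line is rejected.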
Under the hood
How does it all work? If you look at the beginning of your source code, you will see that it contains the following:
using System.Linq;
This bit of code is prepared by default for all types of projects. Let's see what happens if we comment it out. As soon as you do, Visual Studio will underline the variable names in the query from our previous example.
LINQ works using providers. There are several types of providers and you can even define your own. Next, we'll use LINQ to Objects, which is implemented in the System.Linq namespace and extends the regular collections such as List and Array with some more methods. In other words, it uses extension methods.
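To illustrate the mechanism, here is a minimal sketch of a custom extension method. LongerThan and the StringSequenceExtensions class are hypothetical and are not part of System.Linq; they only show how a static method can appear on every sequence of strings, including arrays.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of System.Linq.
public static class StringSequenceExtensions
{
    // The "this" modifier on the first parameter makes the method callable
    // as if it were an instance method of any IEnumerable<string>.
    public static IEnumerable<string> LongerThan(this IEnumerable<string> source, int length)
    {
        foreach (string item in source)
        {
            if (item.Length > length)
            {
                yield return item;
            }
        }
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };
        // Called on the array even though arrays define no such method.
        foreach (string name in names.LongerThan(5))
        {
            Console.WriteLine(name); // Martin, then Daniel
        }
    }
}
```

The methods in System.Linq are made available in the same way, which is why they only show up once the namespace is imported with a using directive.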
Now, keeping the using line commented out, let's bring up the Visual Studio list of methods on our array. Type names. (names and a dot) to see them:

Now, uncomment the line and do the same thing once again:

There are suddenly lots of new methods on an ordinary array. When C# performs a LINQ query, it calls these collection methods in the background. We use lambda expressions within them, which we've already gone over in the OOP course.
Our query:
{CSHARP_CONSOLE}
string[] names = {"David", "Martin", "Daniel", "Peter", "John", "Elisa"};
var query = from n in names
            where (n.Length > 5)
            select n;
// printing the result
foreach (string name in query)
{
    Console.WriteLine(name);
}
Console.ReadKey();
{/CSHARP_CONSOLE}
C# translates this query to the following:
var query = names.Where(n => n.Length > 5);
The compiler drops the trivial select n because it would return each element unchanged. If we selected something else, e.g. n.Length, a Select() call would be appended: names.Where(n => n.Length > 5).Select(n => n.Length). You can test that both notations work the same. We can work with LINQ like this as well, but the SQL-like notation is often nicer to read. Either way, we've just explained why where comes before select in the query: the data must first be filtered by the Where() method before the result is mapped by the Select() method, so the query clauses follow the order in which the methods are called internally.
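A quick way to convince yourself that both notations are equivalent is to write the same query twice and compare the results. The sketch below does that for a query with a non-trivial projection; SyntaxDemo and BothSyntaxesAgree are illustrative names.

```csharp
using System;
using System.Linq;

// Illustrative helper; the names are not part of the lesson's project.
public static class SyntaxDemo
{
    // Builds the same query in both notations and compares the results.
    public static bool BothSyntaxesAgree(string[] names)
    {
        // Query syntax with a projection to upper case.
        var querySyntax = from n in names
                          where n.Length > 5
                          select n.ToUpper();

        // The same pipeline written as explicit extension method calls.
        var methodSyntax = names.Where(n => n.Length > 5)
                                .Select(n => n.ToUpper());

        // SequenceEqual compares both results element by element.
        return querySyntax.SequenceEqual(methodSyntax);
    }

    public static void Main()
    {
        string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };
        Console.WriteLine(BothSyntaxesAgree(names)); // True
    }
}
```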
Last of all, we'll reveal what hides behind the mysterious var in our example. At compile time, var is inferred as IEnumerable<string>; the object actually returned for the query on our array is of the type:
System.Linq.Enumerable.WhereArrayIterator<string>
This is an internal implementation type whose exact name can differ between .NET versions. Since we shouldn't have to care what type a LINQ query returns, the var keyword was introduced, as mentioned above.
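If you're curious, you can print the concrete type yourself. The following sketch assumes nothing beyond the query from this lesson; QueryTypeDemo and BuildQuery are illustrative names, and the printed type name depends on the .NET version you run it on, so no output is shown for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper; the names are not part of the lesson's project.
public static class QueryTypeDemo
{
    // Builds the query from the lesson and exposes it through its interface.
    public static IEnumerable<string> BuildQuery(string[] names)
    {
        var query = from n in names
                    where n.Length > 5
                    select n;
        return query;
    }

    public static void Main()
    {
        string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };
        IEnumerable<string> query = BuildQuery(names);

        // Prints the concrete iterator type; its name is an implementation detail.
        Console.WriteLine(query.GetType());

        // Whatever the concrete type is, it behaves as a sequence of strings.
        Console.WriteLine(query.Count()); // 2
    }
}
```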
In the next lesson, LINQ providers, anonymous types, grouping and sorting in C#, we'll continue with more queries.