Lesson 7 - LINQ in C# - Revolution in querying

In the previous lesson, Queue and stack in C# .NET, we talked about queues and stacks. In today's tutorial, we're going to introduce the revolutionary LINQ technology. LINQ is a set of tools for querying data that simplifies and generalizes working with many kinds of data sources.

Motivation

We've all worked with different kinds of collections and used them in different ways. We look for an element in an array differently than we read data from an XML file or search for a user in a database. Now, imagine there was a unified way to query data, so that we could run the same kind of query on ordinary arrays, on an XML file, or on a database. As you may have guessed, LINQ provides us with this functionality. It's an enormous abstraction, and the price we pay for it is usually only a small performance overhead. This technology brought programming in C# to new heights.

LINQ as a language

LINQ is a rather extensive and sophisticated technology. Its name is an acronym for Language INtegrated Query. As the name suggests, it's a query language that is integrated directly into the C# language syntax. It has been part of the language since C# 3.0 and part of the .NET Framework since version 3.5. Since .NET Framework 4.0, a query can also opt in to running on multiple threads through Parallel LINQ (PLINQ) by calling AsParallel(), which can speed up work on large collections.
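
Parallel execution is something a query asks for explicitly. The following is a minimal sketch of that opt-in, assuming a made-up array of numbers and written as top-level statements (C# 9 or later; in an older project, the statements would go inside Main). Counting is used because the order of results from a parallel query isn't guaranteed.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: the numbers 1 to 1000.
int[] numbers = Enumerable.Range(1, 1000).ToArray();

// AsParallel() opts this one query into PLINQ.
// Without it, the query runs on the calling thread only.
int evenCount = (from x in numbers.AsParallel()
                 where x % 2 == 0
                 select x).Count();

Console.WriteLine(evenCount); // 500
```

For a collection this small, the parallel version is unlikely to be faster. The sketch only shows where the call goes.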

LINQ is very similar to the SQL language and, like SQL, it's declarative. We tell the program what we're looking for and let it work out how to get it. An advantage of LINQ's integration into C# is that the compiler checks both the syntax and the types of a query at compile time.

Let's make a small example before we go any further. Create a new project (a console application), and name it LINQ. Let's create a simple array of strings.

string[] names = {"David", "Martin", "Daniel", "Peter", "John", "Elisa"};

Now, using a LINQ query, we'll select the array items that are longer than 5 characters. Add the following code to the program:

var query = from n in names
            where (n.Length > 5)
            select n;

The query looks a lot like SQL, so those who know it have a leg up on everyone else. I bet none of you have ever run an SQL query on an array, have you? We'll go over what the query is doing in a bit, but first, let's finish our program by having it print the query result to the console:

{CSHARP_CONSOLE}

string[] names = {"David", "Martin", "Daniel", "Peter", "John", "Elisa"};
var query = from n in names
            where (n.Length > 5)
            select n;

// printing the result
foreach (string name in query)
{
    Console.WriteLine(name);
}
Console.ReadKey();
{/CSHARP_CONSOLE}

The program's output:

Console application
Martin
Daniel

What the query looks like

Let's return to our query, which looked like this:

var query = from n in names
            where (n.Length > 5)
            select n;

SQL programmers will surely be surprised to see that the query is backwards. There is a reason behind this, which we'll get into later.

First, we specify where we want to select the data from using the from keyword. The from keyword is followed by a variable that represents a single item of the collection for the rest of the query. Then the in keyword follows, along with the collection itself. It works sort of like the foreach loop. We write queries on multiple lines to keep things clear. This is especially important for more complex queries.

To add a condition, all you have to do is add a line with the where keyword, followed by the condition. We write conditions in the same way as we did before.
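
To make the comparison with foreach concrete, here is a rough hand-written counterpart of the from and where lines. It's only a sketch: the longNames list is a name made up for this illustration, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };

// Hypothetical result list; the LINQ query needs no such helper.
List<string> longNames = new List<string>();

// "from n in names" corresponds to the loop header,
// "where (n.Length > 5)" corresponds to the if condition.
foreach (string n in names)
{
    if (n.Length > 5)
    {
        longNames.Add(n);
    }
}

Console.WriteLine(string.Join(", ", longNames)); // Martin, Daniel
```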

On the last line, we have the select keyword, which determines the values we want to select. In the example above, we select the whole element, i.e. n. However, we could just as well select only some of its data, for example its length, n.Length.
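
As a small sketch of selecting something other than the element itself, the query below keeps the same condition but selects n.Length. The variable name lengths is made up for this illustration, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };

// Same filter as before, but select now maps each name to its length.
var lengths = from n in names
              where (n.Length > 5)
              select n.Length;

// Prints 6 twice, once for "Martin" and once for "Daniel".
foreach (int length in lengths)
{
    Console.WriteLine(length);
}
```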

The var keyword

We store the query in a variable declared with var, which is new to us. In fact, var isn't a data type at all. The keyword lets us leave the choice of the data type up to the compiler, meaning that C# determines the data type automatically when compiling the code. Theoretically, we could use var in other cases as well. For example:

var s = "C# will recognize that this is a string and it will assign the string type to the variable s";
var i = 10;

The above code will be translated to this:

string s = "C# will recognize that this is a string and it will assign the string type to the variable s";
int i = 10;

In other words, var lets the compiler determine the data type at compile time and saves us from having to specify it. In ordinary programs, using var everywhere would be rather confusing, since stating data types explicitly makes the code easier to read. With that in mind, we'll keep specifying data types as we're used to. Don't substitute all data types with the var keyword as some amateurs do.

Var was introduced along with LINQ for three reasons. First, the data types in LINQ queries are rather complex, so it'd be tedious to specify them explicitly every time. Second, if we changed the collection type, we'd also have to change the query type, which would force us to edit the code, and the technology wouldn't be all that universal. Third, anonymous types come with LINQ, and we need var to be able to store them. We'll get to them soon. What you need to know is that var has its place in queries and shouldn't be overused in regular code (even though it can be used there).

Generally, it's best to use var only if it simplifies the declaration and the type of the variable is still clear. Let's look at four declarations, first without and then with var:

int a = 10;
List<Dictionary<string, string>> dictionaries = new List<Dictionary<string, string>>();
IOrderedQueryable<User> fromNY = from u in db.Users
                                 where u.City == "New York"
                                 orderby u.Name
                                 select u;
int b = a;

We can modify the code using var like this:

var a = 10;
var dictionaries = new List<Dictionary<string, string>>();
var fromNY = from u in db.Users
             where u.City == "New York"
             orderby u.Name
             select u;
var b = a;

Var doesn't help us much with the first variable. However, using it with the generic list of generic dictionaries is fine, since we can still tell from the right side of the assignment which type the dictionaries variable has. Either way, it'd be much clearer to write a separate class instead. Storing collections in collections is not a good practice. As for the LINQ query, the data type is complex. If we removed the orderby clause, the type would change to IQueryable<User>. Using var, we don't have to worry about the data type or change it to fit the query. The last var is a rather cautionary example, since we have no way of telling which data type the b variable has without looking up the declaration of a.
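
The effect of orderby on the query type can be tried without a database. The sketch below is an assumption-laden stand-in: instead of db.Users, it wraps a made-up string array with AsQueryable(), and it's written as top-level statements (C# 9 or later). The explicit types are there only to let the compiler confirm what each query returns.

```csharp
using System;
using System.Linq;

// In-memory stand-in for a database table (hypothetical data).
IQueryable<string> names = new[] { "Peter", "David", "Martin", "Daniel" }.AsQueryable();

// With orderby, the query's type is IOrderedQueryable<string>.
IOrderedQueryable<string> sorted = from n in names
                                   where n.Length > 5
                                   orderby n
                                   select n;

// Without orderby, the same query is only an IQueryable<string>.
IQueryable<string> unsorted = from n in names
                              where n.Length > 5
                              select n;

// With var, the declaration stays the same whichever query we write.
var either = from n in names
             where n.Length > 5
             orderby n
             select n;

Console.WriteLine(string.Join(", ", sorted));   // Daniel, Martin
Console.WriteLine(string.Join(", ", unsorted)); // Martin, Daniel
Console.WriteLine(either.Count());              // 2
```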

Another common mistake is thinking that var declares a variable of a dynamic type, so that we could store whatever we want in it. This isn't true. The type is strictly assigned at compile time and cannot be changed afterward. With that in mind, you should be able to understand why the code below won't work:

// this code will not work
var variable = "It contains text now";
variable = 10; // Now it contains a number

The compiler infers the string type for the variable and then reports an error on the second assignment, because an int cannot be assigned to a variable of the string type. The program won't even compile.
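
For contrast, here is a small sketch of what does work with the same variable. It's written as top-level statements (C# 9 or later), and the assigned values are made up for this illustration.

```csharp
using System;

var variable = "It contains text now";

// The inferred type is string and stays string.
Console.WriteLine(variable.GetType()); // System.String

// Assigning another string is fine.
variable = "Another text";

// A number has to be converted to a string first.
variable = 10.ToString();
Console.WriteLine(variable); // 10
```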

Under the hood

How does it all work? If you look at the beginning of your source code, you will see that it contains the following:

using System.Linq;

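Visual Studio adds this line by default in most project templates (newer templates may import the namespace implicitly instead, in which case the line isn't visible in the file). Let's see what happens if we comment it out. As soon as you do, Visual Studio will underline the names variable in the query from our previous example.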

LINQ works using providers. There are several types of providers and you can even define your own. Next, we'll use LINQ to Objects, which is implemented in the System.Linq namespace and extends the regular collections such as List and Array with some more methods. In other words, it uses extension methods.
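
To illustrate what an extension method is, the following sketch adds a method of our own to every sequence of strings. The NameExtensions class and the LongerThan method are hypothetical names invented for this example and aren't part of LINQ. The code is written as top-level statements (C# 9 or later) and deliberately doesn't import System.Linq.

```csharp
using System;
using System.Collections.Generic;

string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };

// LongerThan is called as if it were a method of the array itself.
// Prints Martin and Daniel.
foreach (string name in names.LongerThan(5))
{
    Console.WriteLine(name);
}

// Hypothetical helper class, made up for this illustration.
static class NameExtensions
{
    // The "this" modifier on the first parameter makes it an extension method.
    public static IEnumerable<string> LongerThan(this IEnumerable<string> source, int length)
    {
        foreach (string item in source)
        {
            if (item.Length > length)
            {
                yield return item;
            }
        }
    }
}
```

The methods in System.Linq are attached to collections in the same way, which is why they disappear when the using line is commented out.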

Now, let's (keeping the using line commented out) bring up the Visual Studio list of methods on our array. Write names. (names and a dot), so we can see them:

(Screenshot: regular array methods in C# .NET)

Now, uncomment the line and do the same thing once again:

(Screenshot: LINQ array methods in C# .NET)

There are suddenly lots of new methods on an ordinary array. When C# performs a LINQ query, it calls these collection methods in the background. We use lambda expressions within them, which we've already gone over in the OOP course.

Our query:

{CSHARP_CONSOLE}

string[] names = {"David", "Martin", "Daniel", "Peter", "John", "Elisa"};

var query = from n in names
            where (n.Length > 5)
            select n;

// printing the result
foreach (string name in query)
{
    Console.WriteLine(name);
}
Console.ReadKey();

{/CSHARP_CONSOLE}

C# translates this query into method calls equivalent to the following:

var query = names.Where(n => n.Length > 5).Select(n => n);

You can test that the query actually works the same. (Strictly speaking, the compiler leaves out a trailing Select(n => n) when the query contains a where clause, since it changes nothing, but the result is identical.) We can work with LINQ like this as well, but the SQL-like notation is much nicer to read. Either way, we've just explained why where comes before select in the query. The data must first be filtered using the Where() method before we map the result to the form we need using the Select() method. The clauses are written in the same order in which LINQ calls the methods internally.
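
One way to run that test is to compare both notations with SequenceEqual(). This is a minimal sketch written as top-level statements (C# 9 or later); the variable names querySyntax and methodSyntax are made up for this illustration.

```csharp
using System;
using System.Linq;

string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };

// The SQL-like notation.
var querySyntax = from n in names
                  where (n.Length > 5)
                  select n;

// The same query written with method calls and lambda expressions.
var methodSyntax = names.Where(n => n.Length > 5).Select(n => n);

// SequenceEqual compares both results element by element.
Console.WriteLine(querySyntax.SequenceEqual(methodSyntax)); // True
```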

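Last of all, we'll reveal what hides behind that mysterious var in our example. For the compiler, the query is simply an IEnumerable<string>. The object created at run time for the query on our array is of an internal type, whose name may differ between .NET versions. Here it is: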

System.Linq.Enumerable.WhereArrayIterator<string>

Since we shouldn't have to know what type a LINQ query returns, the var keyword was introduced, as mentioned above.
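
The difference between the type the compiler works with and the type of the object at run time can be checked with a short sketch like the one below, written as top-level statements (C# 9 or later). No expected output is given for GetType(), since the name of the internal class is an implementation detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] names = { "David", "Martin", "Daniel", "Peter", "John", "Elisa" };

// The compile-time type of this query is IEnumerable<string>,
// so the declaration compiles even without var.
IEnumerable<string> query = from n in names
                            where (n.Length > 5)
                            select n;

// The run-time type is an internal iterator class.
Console.WriteLine(query.GetType());

// Whatever the internal class is, it implements IEnumerable<string>.
Console.WriteLine(query is IEnumerable<string>); // True
```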

In the next lesson, LINQ providers, anonymous types, grouping and sorting in C#, we'll continue with more queries.


Article has been written for you by David Capka Hartinger
