Regular expressions in C# .NET

In the last lesson, Tuple and ValueTuple in C# .NET, we introduced Tuple and ValueTuple.

Regular expressions were created out of the need to work with text strings in a unified way. They're a useful tool not only for verifying whether a given string meets specified rules (validation), but also for searching for substrings in a relatively simple way. They often let us get rid of a few nested conditions.

A regular expression is a string composed of certain characters. I don't know anyone who would read such a string and immediately understand what the expression means. The grammar of regular expressions isn't complicated, but it's quite hard to read, so it pays to comment expressions once they're written. I'll start with an example of a regular expression:

[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}

The purpose of this regular expression is simply to determine whether an entered string is an email address. The expression is quite simplified, so some invalid addresses will pass through it.

We'll show how to work with expressions in our programs and explain the meaning of individual parts.

The Regex class

The Regex class allows us to work with regular expressions. In the constructor, we pass a regular expression to it and then use the IsMatch() method to find out whether an entered string meets the rule. It may sound confusing because we test a string with a string.

We'll show it right away. Let's create a new instance of the Regex class and pass a regular expression to it.

using System.Text.RegularExpressions;
// The regular expression for email address verification (the verbatim @"..." string keeps \. intact)
Regex r = new Regex(@"[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}");

Then we'll call the IsMatch() method, which returns a bool: true if the tested string meets the rule, false if not. We can then write a condition that verifies whether the user has entered a valid email:

Console.Write("Enter an email: ");
if (r.IsMatch(Console.ReadLine())) {
    Console.WriteLine("You have entered a valid email.");
} else {
    Console.WriteLine("You have entered an INvalid email.");
}
Console.ReadKey();

Result:

Console application
Enter an email: Hello@World
You have entered an INvalid email.

Now that we know how to work with regular expressions, let's look at how to write them.

Writing regular expressions

Dots

A dot stands for any single character (except a line break). For example, the expression .... (4 dots) matches anything that contains 4 characters.

Cdf = invalid

Good = valid

A@x9 = valid

A@x9O = valid

Now notice the last example. It has 5 characters while the regular expression has only 4 dots, so logically it shouldn't pass. But the matching works something like this:

  1. The first character matches the first dot - the expression has been fulfilled so far
  2. The second character matches the second dot - the expression has been fulfilled so far
  3. The third character matches the third dot - the expression has been fulfilled so far
  4. The fourth character matches the fourth dot - the expression has been fulfilled
  5. There's nothing left in the expression, so the match succeeds and the fifth character is never checked.

As you can see, IsMatch() only checks whether the string contains a match for the regular expression, so extra characters before or after the match don't matter. Fixing this is easy: put a caret (^, Alt + 94) before the expression, which requires the match to start at the beginning of the string, and a dollar sign ($, right Alt + Ů on a Czech keyboard) after it, which requires the match to reach the end of the string. Together they force the whole string to fit the rule; the expression between them is evaluated in the usual way.
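
The difference between the two variants can be tried out with a short sketch like the following. The variable names are only illustrative, and the code assumes a project that supports top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var unanchored = new Regex("....");    // four characters anywhere in the input
var anchored = new Regex("^....$");    // exactly four characters from start to end

bool looseMatch = unanchored.IsMatch("A@x9O");   // true: "A@x9" is found inside
bool strictFive = anchored.IsMatch("A@x9O");     // false: five characters
bool strictFour = anchored.IsMatch("Good");      // true: exactly four characters

Console.WriteLine($"{looseMatch} {strictFive} {strictFour}");   // True False True
```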

Brackets

Brackets define a set of characters that may (or may not) appear at a given position in the string. If the characters are allowed, we simply write them inside the brackets (we don't separate them with anything). If, on the other hand, they aren't allowed, we add a caret right after the opening bracket (^, typed with Alt + 94). To cover the whole alphabet, we don't need to write abcd….; we can simply write [a-zA-Z], which accepts all characters between a-z and A-Z. The ranges follow the ASCII table, so a letter such as č doesn't match this expression.
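
A small sketch of both kinds of bracket expressions might look like this. The sample inputs and variable names are made up for illustration, and the + quantifier ("one or more") is borrowed from the Quantifiers section so that whole words can be tested.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var lettersOnly = new Regex("^[a-zA-Z]+$");   // one or more ASCII letters
var noDigits = new Regex("^[^0-9]+$");        // one or more characters that are NOT digits

bool asciiWord = lettersOnly.IsMatch("Hello");     // true
bool accented = lettersOnly.IsMatch("\u010Daj");   // false: č (U+010D) lies outside a-z and A-Z
bool withoutDigits = noDigits.IsMatch("Hello!");   // true
bool withDigit = noDigits.IsMatch("Hello1");       // false: contains a digit

Console.WriteLine($"{asciiWord} {accented} {withoutDigits} {withDigit}");
```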

Parentheses

Parentheses group a part of an expression. Quantifiers (see below) then apply to the entire content of the parentheses.
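
To see what grouping changes, here is a minimal sketch comparing a quantifier applied to a group with the same quantifier applied to a single character. The patterns are illustrative examples, not taken from the lesson.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var grouped = new Regex("^(ab){2}$");   // the whole group "ab" twice: "abab"
var ungrouped = new Regex("^ab{2}$");   // only "b" twice: "abb"

bool groupedAbab = grouped.IsMatch("abab");      // true
bool groupedAbb = grouped.IsMatch("abb");        // false
bool ungroupedAbb = ungrouped.IsMatch("abb");    // true
bool ungroupedAbab = ungrouped.IsMatch("abab");  // false

Console.WriteLine($"{groupedAbab} {groupedAbb} {ungroupedAbb} {ungroupedAbab}");
```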

Escape characters

Sometimes we want to use a metacharacter literally, for example to verify that the user entered (hello|world). To do that, we write a backslash (\, right Alt + Q on a Czech keyboard) before each metacharacter. The expression could then look like this:

\(hello\|world\)
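
In C# code, a sketch of this could look as follows. It assumes a verbatim string (the @ prefix) so that the backslashes don't have to be doubled, and it also uses Regex.Escape(), which is not covered in the lesson, as one possible way to escape a literal text automatically.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var literal = new Regex(@"^\(hello\|world\)$");   // parentheses and pipe taken literally
var alternation = new Regex("^(hello|world)$");   // unescaped: "hello" OR "world"

bool literalText = literal.IsMatch("(hello|world)");   // true
bool literalHello = literal.IsMatch("hello");          // false
bool eitherWord = alternation.IsMatch("hello");        // true

// Regex.Escape() adds the backslashes for us.
var escaped = new Regex("^" + Regex.Escape("(hello|world)") + "$");
bool escapedText = escaped.IsMatch("(hello|world)");   // true

Console.WriteLine($"{literalText} {literalHello} {eitherWord} {escapedText}");
```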

Quantifiers

Quantifiers say how many times the preceding character or group is repeated. The basic one is {N}, where N is the number of repetitions. Then there's {N,M}, where N is the minimum and M the maximum number of repetitions; it's written without a space after the comma. There are also shorthand quantifiers: a question mark (?) stands for {0,1}, an asterisk (*) for zero or more repetitions, and a plus (+) for one or more. The asterisk and the plus have no upper limit; the same can be written with an open-ended range as {0,} and {1,}.

So the example of four individual characters, which you saw in the introduction, could also be written as:

^.{4}$
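
The individual quantifiers can be compared side by side in a sketch like this one. The patterns and inputs are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var twoToThree = new Regex("^a{2,3}$");   // "aa" or "aaa"
var optionalU = new Regex("^colou?r$");   // "u" zero times or once
var zeroOrMore = new Regex("^a*$");       // also accepts an empty string
var oneOrMore = new Regex("^a+$");        // needs at least one "a"

bool tooFew = twoToThree.IsMatch("a");        // false
bool inRange = twoToThree.IsMatch("aaa");     // true
bool tooMany = twoToThree.IsMatch("aaaa");    // false
bool american = optionalU.IsMatch("color");   // true
bool british = optionalU.IsMatch("colour");   // true
bool emptyStar = zeroOrMore.IsMatch("");      // true
bool emptyPlus = oneOrMore.IsMatch("");       // false

Console.WriteLine($"{tooFew} {inRange} {tooMany} {american} {british} {emptyStar} {emptyPlus}");
```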

Wildcards

In practice, wildcards are used as well. They shorten expressions, which then read (a little) better.

Wildcards come in lowercase and uppercase; the uppercase one is the opposite of the lowercase one. \d stands for a digit, so for ASCII text it behaves like [0-9], and \D like [^0-9]. In .NET, \d also matches decimal digits from other Unicode scripts unless the RegexOptions.ECMAScript option is used. \w stands for any letter, digit or underscore, and \s for whitespace (such as a space).
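
A short sketch of the wildcards in action follows. It uses the static Regex.IsMatch(input, pattern) overloads instead of creating instances, and the Arabic-Indic digit (U+0663) is an arbitrary example of a non-ASCII digit chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
bool digits = Regex.IsMatch("2021", @"^\d+$");      // true: digits only
bool nonDigits = Regex.IsMatch("abc", @"^\D+$");    // true: no digit anywhere
bool wordChars = Regex.IsMatch("user_1", @"^\w+$"); // true: letters, digit, underscore
bool hasSpace = Regex.IsMatch("a b", @"\s");        // true: contains whitespace
bool spaceInWord = Regex.IsMatch("a b", @"^\w+$");  // false: the space is not a word character

// By default \d also accepts digits from other scripts; ECMAScript mode limits it to 0-9.
bool unicodeDigit = Regex.IsMatch("\u0663", @"^\d$");                          // true
bool asciiOnly = Regex.IsMatch("\u0663", @"^\d$", RegexOptions.ECMAScript);    // false

Console.WriteLine($"{digits} {nonDigits} {wordChars} {hasSpace} {spaceInWord} {unicodeDigit} {asciiOnly}");
```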

Examples

The following examples show validation of several strings using regular expressions. We'll see that they can express almost any rule.

Example 1 - a phone number

As an example, try to come up with a regular expression that verifies whether the user has entered a valid US phone number.

Solution:

^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$
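
The solution can be checked against a few inputs with a sketch like the one below. The sample numbers are invented for the test.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var phone = new Regex(@"^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$");

bool withParentheses = phone.IsMatch("(555) 123-4567");   // true
bool withDashes = phone.IsMatch("555-123-4567");          // true
bool digitsOnly = phone.IsMatch("5551234567");            // false: separators are required
bool missingSpace = phone.IsMatch("(555)123-4567");       // false: no space after ")"

Console.WriteLine($"{withParentheses} {withDashes} {digitsOnly} {missingSpace}");
```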

Example 2 - Date and time validation

The user enters a date in the format mm-dd-yyyy, optionally followed by a time in the format hh:mm. Leading zeros can be omitted in both the date and the time, so instead of 07-11-2021 05:03 the user can write just 7-11-2021 5:3. The year doesn't have to be written in full either: instead of 1999, 99 is enough. The month must be in the range 01-12 and the day in the range 01-31. Any year is allowed; hours are 0-23 and minutes 0-59.

Solution:

^(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01])-([0-9]{2}|[0-9]{4})( ([01]?[0-9]|2[0-3]):[0-5]?[0-9])?$
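
Below is a sketch that tests a pattern enforcing the ranges from the task (month 1-12, day 1-31, hours 0-23, minutes 0-59, a two- or four-digit year). The sample inputs are made up. Keep in mind that a regular expression like this can't tell how many days a particular month has, so 2-31-2021 still passes.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
var date = new Regex(
    @"^(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01])-([0-9]{2}|[0-9]{4})" +
    @"( ([01]?[0-9]|2[0-3]):[0-5]?[0-9])?$");

bool full = date.IsMatch("07-11-2021 05:03");      // true
bool shortened = date.IsMatch("7-11-2021 5:3");    // true: leading zeros omitted
bool shortYear = date.IsMatch("7-11-99");          // true: two-digit year, no time
bool badMonth = date.IsMatch("13-11-2021");        // false: month 13
bool badDay = date.IsMatch("7-32-2021");           // false: day 32
bool badHour = date.IsMatch("7-11-2021 24:00");    // false: hour 24

Console.WriteLine($"{full} {shortened} {shortYear} {badMonth} {badDay} {badHour}");
```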

Example 3 - IP address validation

The user enters an IP address. The IP address consists of four numbers in the range 0-255, separated by dots. An example of a valid IP address is 1.234.1.234; 1.234.432.1 is invalid (432 is out of range). Leading zeros are optional: the user can write 25 instead of 025, and 5 instead of 005.

Solution:

^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|0?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|0?[0-9]{1,2})$
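
Because the pattern for one number repeats four times, it can be built from a single piece in code. The following sketch uses one possible formulation that covers the range 0-255 exactly, including values such as 249; the helper variable octet and the sample addresses are assumptions made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); in an older project, move the body into Main().
// One number 0-255: 250-255, 200-249, 100-199, or 0-99 with optional leading zeros.
string octet = @"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|0?[0-9]{1,2})";
var ip = new Regex("^" + octet + @"(\." + octet + "){3}$");

bool valid = ip.IsMatch("1.234.1.234");          // true
bool outOfRange = ip.IsMatch("1.234.432.1");     // false: 432 is out of range
bool upperEdge = ip.IsMatch("255.249.206.199");  // true: all parts within 0-255
bool justOver = ip.IsMatch("256.1.1.1");         // false: 256 is out of range
bool leadingZeros = ip.IsMatch("005.025.1.1");   // true: leading zeros are allowed
bool tooShort = ip.IsMatch("1.2.3");             // false: only three parts

Console.WriteLine($"{valid} {outOfRange} {upperEdge} {justOver} {leadingZeros} {tooShort}");
```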

In the next part, we'll show more interesting functions than just string comparisons.

In the next quiz, Online object-oriented programming in C#.NET quiz, we will test the experience gained from the course.


 

Article has been written for you by Filip Smolík