Lesson 8 - Strings in Python - Working with single characters
In the last lesson, Solved tasks for Python lesson 7, we learned to work with lists. If you noticed some similarities between lists and strings, you were onto something. For the rest of you, it may come as a surprise that a string is essentially a sequence of characters, and we can work with it in much the same way as with a list.
First, we'll check out how it works by simply printing the character at a given position:
{PYTHON}
s = "Hello ICT.social"
print(s)
print(s[2])
The output:
Console application
Hello ICT.social
l
As you can see, we can access the characters of a string using square brackets, just like we do with lists. Indexes start at 0, so s[2] is the third character. Keep in mind that characters at given positions are read-only in Python. For example, we can't write the following:
# This code doesn't work
s = "Hello ICT.social"
s[1] = "o"
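To see what "read-only" means in practice, here is a minimal sketch that attempts the assignment and catches the resulting error. The variable name error_message is only illustrative.

```python
# Strings are immutable: assigning to an index raises a TypeError.
s = "Hello ICT.social"
error_message = None
try:
    s[1] = "o"
except TypeError as error:
    error_message = str(error)
    print("Assignment failed:", error_message)

# Reading by index still works, including negative indexes counted from the end.
print(s[1])   # e
print(s[-1])  # l
```

The string itself stays unchanged after the failed assignment.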
However, there is a simple workaround: convert the string to a list. We can then work with the characters the same way we learned to work with list items. Afterwards, we get our string back using the join() method.
{PYTHON}
s = list("Hello ICT.social")
print(s)
s[1] = "o"
s = "".join(s)
print(s)
Notice that we call the join() method on an empty string. We can also specify any other string as the separator (what goes between each pair of characters). The output:
Console application
['H', 'e', 'l', 'l', 'o', ' ', 'I', 'C', 'T', '.', 's', 'o', 'c', 'i', 'a', 'l']
Hollo ICT.social
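As a small illustration of the separator, the following sketch joins the same list of characters with different strings. The variable names letters and spaced are made up for this example.

```python
letters = list("ICT")
print("".join(letters))    # ICT
print("-".join(letters))   # I-C-T
print(", ".join(letters))  # I, C, T

# join() accepts any iterable of strings, including a string itself.
spaced = " ".join("abc")
print(spaced)  # a b c
```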
Character occurrence in a sentence analysis
Let's write a simple program that analyzes a given sentence for us. We'll search for the number of vowels, consonants, and non-alphanumeric characters (e.g. space or !).
We'll hard-code the input string in our code so we won't have to type it again every time. Once the program is complete, we'll replace the string with input(""). We'll iterate over the characters using a loop. By the way, we won't focus much on program speed here (we'll choose practical and simple solutions).
First, let's define vowels and consonants. We don't have to count non-alphanumeric characters, since their number is the string length minus the number of vowels and consonants. Since we don't want to deal with letter case (uppercase/lowercase), we'll convert the entire string to lowercase at the start. Let's also set up variables for the individual counters. Since the code is a bit more complex, we'll add comments as well.
# the string that we want to analyze
s = "A programmer gets stuck in the shower because the instructions on the shampoo were: Lather, Wash, and Repeat."
print(s)
s = s.lower()
# Counters initialization
vowels_count = 0
consonants_count = 0
# definition of character groups
vowels = "aeiouy"
consonants = "bcdfghjklmnpqrstvwxz"
# the main loop
for char in s:
    pass  # the loop body comes next
First of all, we prepare the string and convert it to lowercase. Then, we initialize the counters to zero. For the definition of the character groups, we only need ordinary strings. The main loop iterates over each character in the string s, so in each iteration of the loop the variable char will contain the current character.
Now let's increment the counters. For simplicity's sake, I'll focus on the loop instead of rewriting the code over and over again:
# the main loop
for char in s:
    if char in vowels:
        vowels_count += 1
    elif char in consonants:
        consonants_count += 1
The in operator is already known to us. First of all, we try to find the character char from our sentence in the vowels string and, if it's there, increase the vowel counter. If it's not among the vowels, we look for it in the consonants and, if found, increase the consonant counter.
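As a quick reminder of how in behaves on strings, here is a short sketch. The test characters are chosen only for illustration.

```python
vowels = "aeiouy"
print("e" in vowels)      # True
print("x" in vowels)      # False
print("x" not in vowels)  # True

# On strings, `in` also matches whole substrings, not just single characters.
print("eio" in vowels)    # True
print("ea" in vowels)     # False, these letters are not adjacent in this order
```

Since our loop always tests a single character, the substring behavior doesn't affect the counting.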
Now all that's missing is printing the results at the end:
{PYTHON}
# the string that we want to analyze
s = "A programmer gets stuck in the shower because the instructions on the shampoo were: Lather, Wash, and Repeat."
print(s)
s = s.lower()
# Counters initialization
vowels_count = 0
consonants_count = 0
# definition of character groups
vowels = "aeiouy"
consonants = "bcdfghjklmnpqrstvwxz"
# the main loop
for char in s:
    if char in vowels:
        vowels_count += 1
    elif char in consonants:
        consonants_count += 1
print("Vowels: %d" % (vowels_count))
print("Consonants: %d" % (consonants_count))
print("Non-alphanumeric characters: %d" % (len(s) - (vowels_count + consonants_count)))
Console application
A programmer gets stuck in the shower because the instructions on the shampoo were: Lather, Wash, and Repeat.
Vowels: 33
Consonants: 55
Non-alphanumeric characters: 21
That's it, we're done!
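If we wanted to reuse the analysis for different sentences, one possible sketch is to wrap the same logic in a function. The name analyze and the returned tuple are assumptions made for this example, not part of the lesson's program. Note that any character that is neither a vowel nor a consonant lands in the third group, so digits would be counted there too.

```python
def analyze(sentence):
    """Return the counts (vowels, consonants, others) for the given sentence."""
    vowels = "aeiouy"
    consonants = "bcdfghjklmnpqrstvwxz"
    lowered = sentence.lower()
    vowels_count = 0
    consonants_count = 0
    for char in lowered:
        if char in vowels:
            vowels_count += 1
        elif char in consonants:
            consonants_count += 1
    # Everything that is not a listed letter (spaces, punctuation, digits, ...)
    others_count = len(lowered) - (vowels_count + consonants_count)
    return vowels_count, consonants_count, others_count

result = analyze("Lather, Wash, and Repeat.")
print("Vowels: %d, consonants: %d, others: %d" % result)
# Vowels: 7, consonants: 12, others: 6
```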
The ASCII value
Perhaps you've already heard of the ASCII table, especially in connection with the MS-DOS era, when there was practically no other way to store text. Individual characters were stored as numbers in the range 0 to 255. The system provided a table of 256 characters in which each numerical code was assigned to one character. Strictly speaking, ASCII itself defines only the first 128 codes (0 to 127); the upper half came from system-specific extensions.
Hopefully, you understand why this method is no longer sufficient. The table simply could not contain all the characters of all international alphabets. Today, we use Unicode (typically in the UTF-8 encoding), where characters are represented in a different way (this is the default in Python 3, but not in Python 2). In Python, we still have the option to work with the numeric values of individual characters, and for basic English characters these match the ASCII values. The main advantage is that the letters are stored in the table next to each other, alphabetically. For example, at position 97 we can find "a", at 98 "b", etc. It's the same with digits, but unfortunately, accented characters are scattered elsewhere in the table.
Now, let's convert a character into its ASCII value and, vice versa, create a character from its ASCII value:
{PYTHON}
# conversion from text to ASCII value
c = "a" # character
i = ord(c) # ordinal (ASCII) value of the character
print("The character %s was converted to its ASCII value of %d" %(c, i))
# conversion from an ASCII value to text
i = 98
c = chr(i)
print("The ASCII value of %d was converted to its textual value of %s" % (i, c))
We use the ord() function to get the ordinal (ASCII) value of a character and the chr() function to get the character from its ordinal value.
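To illustrate how the characters are laid out next to each other, here is a short sketch using ord() and chr(). The accented letter is written as the escape "\u00e9" (the character é) so the example doesn't depend on how the file is saved.

```python
# Lowercase letters have consecutive codes, so arithmetic on codes walks the alphabet.
print(ord("a"), ord("b"), ord("z"))   # 97 98 122
print(chr(ord("a") + 2))              # c

# Digits are consecutive as well.
print(ord("0"), ord("9"))             # 48 57

# Uppercase letters sit in a separate block, 32 positions before the lowercase ones.
print(ord("A"), ord("a") - ord("A"))  # 65 32

# An accented letter lies outside the 0-127 range, far from its base letter.
print(ord("e"), ord("\u00e9"))        # 101 233
```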
The Caesar cipher
Let's create a simple program to encrypt text. If you've ever heard of the Caesar cipher, then you already know exactly what we're going to program. The encryption is based on shifting characters in the alphabet by a certain fixed number of positions. For example, if we shift the word "hello" by 1 character forwards, we'd get "ifmmp". The user will be allowed to select the size of the shift.
Let's get right into it! We need variables for the original text, the encrypted message, and the shift. Then, we need a loop iterating over each character, and finally we print the encrypted message. We'll hard-code the message again so we won't have to type it over and over during the testing phase. After we finish the program, we'll replace the contents of the variable with the input() function. The cipher doesn't work with accented characters, spaces, or punctuation marks. We'll just assume the user won't enter them. Ideally, we should remove accents before encryption, as well as anything except letters.
# variable initialization
s = "blackholesarewheregoddividedbyzero"
print("Original message: %s" % (s))
message = ""
shift = 1
# loop iterating over characters
for char in s:
    pass  # the encryption step comes next
# printing
print("Encrypted message: %s" % (message))
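The clean-up mentioned above (removing accents and anything except letters) isn't part of the lesson's program, but it could be sketched roughly like this. The function name letters_only is hypothetical; unicodedata is a module from Python's standard library.

```python
import unicodedata

def letters_only(text):
    """Lowercase the text, strip accents, and keep only the letters a-z."""
    # NFD decomposition splits an accented letter into its base letter
    # followed by a separate combining mark, which the filter below drops.
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")

print(letters_only("Black holes are where God divided by zero!"))
# blackholesarewheregoddividedbyzero
print(letters_only("Caf\u00e9"))  # cafe
```

Letters that don't decompose into a base letter plus an accent are simply dropped by this sketch, which may or may not be what you want.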
Now, let's move into the loop. We'll convert the character char to its ASCII (ordinal) value, increase the value by the shift, and convert it back to a character. This character will be added to the final message:
{PYTHON}
# variable initialization
s = "blackholesarewheregoddividedbyzero"
print("Original message: %s" % (s))
message = ""
shift = 1
# loop iterating over characters
for char in s:
    i = ord(char)
    i += shift
    character = chr(i)
    message += character
# printing
print("Encrypted message: %s" % (message))
Console application
Original message: blackholesarewheregoddividedbyzero
Encrypted message: cmbdlipmftbsfxifsfhpeejwjefecz{fsp
Let's try it out! The result looks pretty good. However, we can see that the characters after "z" overflow to the ASCII values of other characters ("{" in the output above). The characters are therefore no longer just letters, but other nasty characters as well. Let's treat the alphabet as a cycle, so the shifting flows smoothly from "z" back to "a" and so on. We'll get by with a simple condition that decreases the ASCII value by the length of the alphabet, so we end up back at "a".
i = ord(char)
i += shift
# overflow control
if i > ord("z"):
    i -= 26
character = chr(i)
message += character
If i exceeds the ASCII value of "z", we reduce it by 26 (the number of letters in the English alphabet). The -= operator does the same as writing i = i - 26. It's simple and our program is now operational. Note that this single check is only enough for shifts between 0 and 26. Notice that we don't use direct character codes anywhere. There's ord("z") in the condition even though we could write 122 there directly. I set it up this way so that our program contains no hard-coded ASCII values, which makes it clearer how it works. Try to code the decryption program as practice for yourself.
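The single condition above is enough for small shifts. As an alternative that works for any shift size, here is a sketch that wraps around using the modulo operator instead. The function name caesar is an assumption made for this example, and like the lesson's program it expects only lowercase letters a-z.

```python
def caesar(text, shift):
    """Shift each lowercase letter a-z by `shift` positions, wrapping around."""
    result = ""
    for char in text:
        position = ord(char) - ord("a")     # 0 for "a" ... 25 for "z"
        position = (position + shift) % 26  # wrap around the alphabet
        result += chr(position + ord("a"))
    return result

print(caesar("blackholesarewheregoddividedbyzero", 1))
# cmbdlipmftbsfxifsfhpeejwjefeczafsp
print(caesar("xyz", 27))  # yza
```

Working with the position in the alphabet (0 to 25) rather than the raw code keeps the wrap-around to a single expression.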
In the next lesson, Solved tasks for Python lesson 8, we'll see that there are still a couple more things strings can do that we haven't touched on. Spoiler: We'll learn how to decode Morse code.