Course Overview: We'll learn how strings are stored and managed in Python.
Hello, and welcome back. In this lecture, I'm going to introduce you to the concept of strings in Python. Strings are an essential data type that you will almost always end up working with when coding, and specifically when coding in Python. Understanding how to declare, initialize, and manipulate them is therefore a fundamental skill that you need to master. The objectives of this lecture are to introduce you to strings and to demonstrate many of the common operations that can be performed on them, such as indexing, slicing, and conversions. For this lecture, you need to know what the primitive Python data types are: you should be familiar with integers, floats, and Booleans. You should also know how to troubleshoot and debug your own Python scripts, especially if you intend to follow along and reuse the code presented within these demonstrations.
In the previous lecture on data types, we talked about switches, and how they can be thought of as generating a stream of zeroes and ones. In turn, these zeroes and ones are processed by the computer and can be interpreted as a number, such as the number 65, as an alphanumeric character, such as the letter A, or even as a raw hardware instruction. So let's now take a closer look at the process a computer undertakes when converting a stream of binary digits into an alphanumeric character. Under the hood, the process is fairly simple. The stream of binary digits, ones and zeroes, represents a number, which is conveniently written as an equivalent hexadecimal value. That value is then used against a lookup table to convert it into a particular alphanumeric character. The lookup table in this case is the standard ASCII table, which provides a mapping from numbers, or lookup codes, to individual alphanumeric characters, and vice versa. So let's step through an example. We start off with the binary value 01000001, consisting of eight bits, or one byte, which converts to the hexadecimal value 41 (65 in decimal).
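As a minimal sketch of the binary-to-hexadecimal step, the built-ins `int` (with base 2), `format`, and `hex` can be used to reproduce the example. The variable names are illustrative only; the computer stores the number itself, and hexadecimal is simply a compact way of writing it.

```python
# The eight-bit (one-byte) pattern from the example
bits = "01000001"

# Interpret the bit string as a base-2 integer
value = int(bits, 2)
print(value)                 # 65

# Write the same value in hexadecimal (two hex digits per byte)
print(format(value, "02X"))  # 41
print(hex(value))            # 0x41

# Going the other way: hexadecimal 41 back to the eight-bit pattern
print(format(0x41, "08b"))   # 01000001
```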
The hexadecimal value 41 is then used as a lookup code on the ASCII table. If we select row 4 down the left-hand side and intersect it with column 1 along the top, we arrive at the end result, which is the alphanumeric character A. Likewise, if we wanted the letter B, we would use the hexadecimal value 42, then 43 for the letter C, and so on. What you can start to see now is that a string such as ABC is really just a sequence of individual alphanumeric characters, each mapped from a hexadecimal value used as a lookup code against the ASCII table. Because we are starting out with eight bits, or one byte, as the binary representation, a single byte can represent at most 256 (two to the power of eight) different characters. In the early days, this was sufficient, but over time requirements have evolved, and we now have to represent the alphabets of many international languages, mathematical symbols, emojis, icons, et cetera. To accommodate this ever-increasing expansion, Unicode was introduced, and Python 3 fully supports Unicode.
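The lookup step, and the idea that a string is just a sequence of codes, can be sketched with the built-ins `chr` and `ord`. The sample characters below are arbitrary choices; they are written with `\u` and `\U` escapes so the example does not depend on how the source file is encoded.

```python
# chr() looks up the character for a numeric code; ord() goes the other way.
print(chr(0x41))         # A
print(chr(0x42))         # B
print(hex(ord("C")))     # 0x43

# A string is a sequence of characters, each with its own numeric code.
word = "ABC"
codes = [hex(ord(ch)) for ch in word]
print(codes)             # ['0x41', '0x42', '0x43']

# Rebuild the same string from its codes.
rebuilt = "".join(chr(code) for code in [0x41, 0x42, 0x43])
print(rebuilt)           # ABC

# Python 3 strings are Unicode, so codes beyond the 128 ASCII values work too.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, e-acute, euro sign, an emoji
for ch in samples:
    # Print each code point and how many bytes UTF-8 needs for it (1 to 4).
    print(hex(ord(ch)), len(ch.encode("utf-8")))
```

The final loop previews the next point: in UTF-8, plain ASCII characters still take one byte, while other characters take two, three, or four.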
Unicode is a superset of ASCII, and its UTF-8 encoding uses between one and four bytes, or up to 32 bits, to represent a character. So, to quickly summarize: the original ASCII character encoding standard has 128 valid code points, extended ASCII has 256 valid code points, and UTF-8, the most common Unicode encoding, can encode 1,112,064 valid code points. Okay. Now that we have covered the fundamentals of how strings are represented, let's break out into our PyCharm integrated Python terminal and perform some basic operations on strings. Remember that when we created the Hello World script, we typed Hello World!, with an exclamation mark, within enclosing double quotation marks. The enclosing double quotation marks are important, because they tell the Python interpreter where the string starts and finishes. Creating a small, simple phrase such as Hello World in Python is very easy, as we just witnessed. However, what if you wanted to put a quotation mark inside the string itself, given that the string is already using quotation marks to indicate where it starts and finishes? If you left the inner quotation mark untouched, you would end up with a syntax error. So let's take a look at this now. Suppose I start with the string Quote, within enclosing double quotes, and then expand it to Quote Me, this time also adding a double-quote character to be part of the defined string, like so. You will see that Python is not able to understand that syntax. In such cases, you have to escape the problematic quotation mark. The escape character in Python is a single backslash. Let's now use this to patch our Quote Me string, like so.
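The exact string typed in the demonstration is not visible in the transcript, so the sketch below assumes a reconstruction in which a double quote is appended to Quote Me. It shows the broken form (commented out, since it would not parse) and the escaped fix.

```python
# Without escaping, the second double quote would end the string early and the
# extra quote would start an unterminated literal (a SyntaxError), e.g.:
#     broken = "Quote Me""

# Escaping the inner quote with a backslash makes it part of the string.
quote = "Quote Me\""
print(quote)        # Quote Me"

# The backslash itself is not stored: 8 characters plus 1 quote mark.
print(len(quote))   # 9

# The same escape works for a single quote inside a single-quoted string.
apostrophe = 'It\'s here'
print(apostrophe)   # It's here
```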
When it comes to defining strings within Python, there are multiple methods. For starters, and for most common requirements, we can use either double quotation marks or single quotation marks to enclose and define strings. When you use single quotation marks, you can use a double quotation mark inside without having to escape it, which is really convenient. But there will still be instances where, for example, you need both single and double quotation marks within a single string, and in those cases you will need to use the escaping backslash character. Another thing you might encounter when working with strings is the requirement to define a multi-line string, one that contains line breaks and is made up of multiple lines. Again, Python provides a nice language feature for creating such strings: triple quotation marks. Triple quotation marks allow you to define multi-line strings, like so. Here we are splitting the string Hello World over two lines, using enclosing triple quotation marks. You can see that the Python interpreter has processed the string and added a line break character between the words Hello and World. Line breaks are written using the escape sequence \n, which represents a single newline character. When coding with Python, you are going to want to assign string values to variables. We can accomplish this like so. Here we define the variable test and initialize it with the string Hello. We can then, at any later stage after that initialization, recall the variable like this, and the value assigned to test is printed back to the screen. A common requirement when working with strings is the need to join two or more individual strings together, which is known as string concatenation. For example, we could take our existing test variable and add the string World to it, using the plus operator, like so. Note that we have not used any blank spaces.
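A minimal sketch of the quoting styles, triple-quoted multi-line strings, variable assignment, and concatenation described above. The sample strings other than Hello and World are illustrative choices, not taken from the demonstration.

```python
# Single quotes let double quotes appear without escaping, and vice versa.
single = 'She said "Hello"'
double = "It's here"
print(single)  # She said "Hello"
print(double)  # It's here

# When both kinds of quote are needed, escape one of them.
both = 'She said "It\'s here"'
print(both)    # She said "It's here"

# Triple quotes allow a string to span several lines;
# the line break is stored as the newline character \n.
multi = """Hello
World"""
print(repr(multi))  # 'Hello\nWorld'

# Assign a string to a variable, then recall it later.
test = "Hello"
print(test)    # Hello

# Concatenate with +; no space is inserted automatically.
test = test + "World"
print(test)    # HelloWorld
```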
When the concatenation is evaluated, the outcome is HelloWorld, with no space between the two words, as we see here. This demonstrates the concept of string concatenation. Okay, returning now to the slides, we will review the importance of string immutability. Immutability is important because it tells us whether a string object's contents can change once the object has been created; for Python strings, they cannot. So let's consider the Python example presented here on the slide. Before we walk through the explanation, take some time to review the code and consider what values are printed out for the variables a and b. Let's now walk through this line by line, from the top down. For starters, we define a variable a, initialized with the string Hello. What this actually means is that the variable a points to an immutable string object with the contents Hello. Next, we define the variable b, which is initialized with the variable a. More specifically, this says that variable b points to the same immutable string object that variable a is pointing at, which in this case is the string Hello. Next, we update variable a to be the result of concatenating itself with the string World. The important point here is that this creates a completely new immutable string object, HelloWorld, and variable a now points at this new object. Variable b remains pointing at the same immutable string object it was initialized with, which still contains the string Hello. Therefore, print(a) prints HelloWorld, and print(b) prints Hello. This example is intended to drive home the point of string immutability. Okay, if you're ready, let's move on and review the concept referred to as slicing. String slicing allows us to extract substrings, or sequences of characters, from a string by specifying a starting and an ending position. For example, suppose we start off with the string Hello World.
We can then use string slicing to pull out substrings, that is, sequences of characters from within the string, such as the characters that make up the word Hello and/or World, as well as many other examples. The table presented here will help you understand how string slicing works in terms of configuring the starting and ending positions. We'll continue our examples with the string Hello World. Notice that, as with string indexing, string slicing can be configured by counting from either the front or the back of the string. Okay, let's jump back into our PyCharm-embedded Python terminal. We'll reuse variable a, which still exists within our current session and still holds the string value HelloWorld, with no space, which we established in our previous demonstration. This can be clearly seen in the variables pane on the right-hand side. For our first slice attempt, let's define a slice on a, like this: a[1:5]. Here we are saying that we want to extract characters starting from the front of the string, from position 1 up to, but not including, position 5. The resulting slice, as printed in the terminal here, is the four-letter string ello, E-L-L-O. Next, let's demonstrate slicing from the front again, starting at position 6 and going all the way through to the end of the string, which is denoted by not setting the end position: a[6:]. This time it results in the four-letter string orld, O-R-L-D, as can be seen here. Let's now repeat the slice, but this time set the ending position to -1: a[6:-1]. Before we execute this, review the table presented here to determine what will be extracted. As expected, we get the three-letter string orl. We can also create a slice that starts from the beginning of the string by not setting the starting position, like this: a[:5]. Here we set the ending position to 5, and this time, when we run the slice, we get the five-letter string H-E-L-L-O. We can also extract individual characters, like this.
a[5] returns W. Both a[4] and a[6] return o.
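The slicing and indexing results from this demonstration can be reproduced with the sketch below. It assumes a holds HelloWorld with no space, which is the only value consistent with every result reported above; the last two lines also include the negative indexes discussed next.

```python
a = "HelloWorld"   # assumed value of a (no space between the words)

# Slices: the start position is included, the end position is excluded.
print(a[1:5])      # ello
print(a[6:])       # orld  (no end position: run to the end of the string)
print(a[6:-1])     # orl   (stop one character before the end)
print(a[:5])       # Hello (no start position: begin at index 0)

# Individual characters by position, counting from 0.
print(a[5])        # W
print(a[4], a[6])  # o o

# Negative indexes count back from the end: -1 is the last character.
print(a[-4])       # o
print(a[-5])       # W
```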
We can also use negative indexes, which count back from the end of the string. Here a[-4] returns o, and a[-5] returns W. So as you can see, Python provides a lot of flexibility when working with strings, in terms of indexing and slicing. Alright, that concludes this lecture on strings. We now have an understanding of strings and how they're represented and stored within a Python program. You now understand what a string is and how to declare, initialize, and manipulate strings. And, importantly, you also understand that strings are immutable objects. Please go ahead and close this lecture, and we'll see you in the next one.
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession", as everything AWS starts with the customer. Outside of work, his passions are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few startups.