Working with Binary Data
Introduction to Python
In this first course, we introduce the Python language, the declaration model, and how variables and functions are used in Python.
Our learning objectives for this course are to introduce the Python language and to be able to recognize and explain its core concepts.
- [Instructor] Hi and welcome back. In this lecture we'll look at binary data. Our learning objectives are to know the difference between text and binary data, to be able to open files in text or binary mode, and to use the struct module to process binary data streams.
Now a file can be opened in binary mode. This allows for raw, or non-delimited, reads, which do not treat newlines and carriage returns as anything special. So in binary mode, read will return a bytes object, which is an array of eight-bit integers, not a Python string, which is an array of Unicode characters. So we use the decode method to convert the bytes object to a string. We use the write method to write raw data to a file. And we use the seek method to position the next read, and tell to determine the current location within the file.
When you read data from a network application, such as getting the HTML source of a webpage, it is retrieved as binary data. Even though it is theoretically text, it arrives encoded, typically as ASCII or UTF-8. So this is represented by a bytes object, which is an array of bytes. To convert a bytes object to a string, call its decode method. When going the other direction, as in writing some text out to a network application, you will need to convert from a Python string, which is an in-memory representation, to a sequence of bytes. To do this, call the string's encode method.
If you need to process a raw binary file, the struct module provides the Struct class. You can instantiate a Struct with a format string representing the binary data layout. From the instance, you can call unpack to decode a binary stream, or pack to encode values into one. The size property is the number of bytes needed for the data. The format string describes the data layout using format codes. Each code is a letter representing a data type, and can be preceded by a size or repeat count, depending on the data type. The format string as a whole can begin with a prefix character, which specifies the byte order, size, and alignment.
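The binary-mode steps described above can be sketched as follows. This is a minimal illustration rather than the lecture's own script: the file name demo.bin, the temporary directory, and the helper name binary_roundtrip are all hypothetical.

```python
import os
import tempfile


def binary_roundtrip(text):
    """Encode text to bytes, write it in binary mode, and read it back."""
    raw = text.encode("utf-8")              # str -> bytes

    # Hypothetical scratch location, used only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")

        with open(path, "wb") as f:         # "wb": write raw bytes
            f.write(raw)

        with open(path, "rb") as f:         # "rb": read raw bytes
            first = f.read(5)               # returns bytes, not str
            position = f.tell()             # offset after the first read
            f.seek(0)                       # jump back to the start
            everything = f.read()           # line endings are left untouched

    return first, position, everything.decode("utf-8")  # bytes -> str


first, position, decoded = binary_roundtrip("hello\r\nworld")
print(type(first).__name__, first, position, repr(decoded))
```

Because the file is opened in binary mode, the carriage return and newline come back exactly as they were written.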
Native byte order or alignment refers to the same byte order or alignment used by the C compiler on the current platform.
Standard refers to a standard set of sizes for typical numerical objects, such as shorts, ints, longs, floats, and doubles. The default is native. The question mark conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte. The q and Q conversion codes are available in native mode only if the platform C compiler supports C long long, or, on Windows, __int64. They're always available in standard modes. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has an __index__ method, then that method is called to convert the argument to an integer before packing. The n and N conversion codes are only available for the native size, selected as the default or with the @ byte order character. For the standard size, you can use whichever of the other integer formats fits your application. For the f and d conversion codes, the packed representation uses the IEEE 754 binary32 format, for f, or binary64 format, for d, regardless of the floating-point format used by the platform. The P format character is only available for the native byte ordering, selected as the default or with the @ byte order character. The byte order character, equals, chooses to use little or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the P format is not available.
So in our example here, first we create some assorted values, and then second we create a Struct object with the desired data layout. Third, the size property gives the size of the data in bytes. Four, pack converts the values into a binary stream using the format. Five, unpack converts the binary stream back into a tuple of values, and six, we decode the raw bytes into a string, and strip off the trailing null bytes that were added by pack. And here's the output of our script.
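The six steps just described might look something like the sketch below. The on-screen script is not part of this transcript, so the values, the format string "<HI10sf", and the variable names are assumptions chosen for illustration.

```python
from struct import Struct

# 1. Some assorted values (made up for this sketch).
values = (7, 123456, b"python", 2.5)

# 2. A Struct with the desired layout. "<" selects little-endian byte
#    order with standard sizes and no padding; H = unsigned short,
#    I = unsigned int, 10s = 10-byte string, f = 4-byte float.
record = Struct("<HI10sf")

# 3. Size of the packed data in bytes: 2 + 4 + 10 + 4.
print(record.size)

# 4. pack converts the values into a bytes object using the format.
packed = record.pack(*values)

# 5. unpack converts the bytes back into a tuple of values.
count, ident, raw_name, ratio = record.unpack(packed)

# 6. Decode the raw bytes and strip the null padding that pack added
#    to fill the 10-byte string field.
name = raw_name.decode("ascii").rstrip("\x00")
print(count, ident, name, ratio)
```

With the default native prefix the same codes could occupy more bytes, because the platform's C alignment rules may insert padding between fields.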
In our parsing example, we first define the layout of the bitmap header, then we read the first 14 bytes of the bitmap file in binary mode. We unpack the binary header into individual values, and then we output the individual values. When we're reading binary data, first, we use the read method to read a specified number of bytes; second, read returns bytes, not a string; third, we use the decode method to convert bytes to a string.
Bitwise operations. Python has bitwise operations to manipulate and compare individual bits in an integer. This is sometimes used for flags or for packing more information into a byte. Instead of using a 32-bit integer to store that something is true, you can just use one bit. The and and or operators work on two integers, and return a new integer with the bits set according to the operator. The and operator can be used to clear bits. It is typically used with a mask composed of ones for all the bits you don't want to clear. When comparing two numbers, the and operator, the ampersand, sets a result bit to one if the corresponding bits in both numbers are set to one; otherwise it sets the result bit to zero. The or operator sets a result bit to one if either of the corresponding bits in the two numbers is set to one; otherwise it sets the result bit to zero. The complement operator reverses the value of the bits, i.e. it changes ones to zeroes, and zeroes to ones. The xor operator sets a result bit to one if the corresponding bits have different values. Otherwise it sets the result bit to zero. The left shift operator moves bits to the left a specified number of places. The shift right operator works like shift left, but moves bits to the right.
In our bitwise example, on line one, a and b are integers. Line two, bitwise and. Line three, bitwise or. Line four, bitwise xor. Line five, complement, which flips the bit values. Line six, shift right one bit. Line seven, shift right three bits. Line eight, shift left one bit, and line nine, shift left three bits.
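The bitmap header steps mentioned in this lecture (define the layout, read 14 bytes, unpack, output) can be sketched as follows. The field layout is the commonly documented 14-byte BMP file header and is an assumption here, as are the helper name parse_bmp_header and the made-up sample values. An in-memory io.BytesIO stream stands in for a real file opened with mode "rb".

```python
import io
import struct

# Assumed layout of the 14-byte BMP file header, little-endian, no padding:
# 2-byte signature, 4-byte file size, two 2-byte reserved fields,
# 4-byte offset to the start of the pixel data.
BMP_FILE_HEADER = struct.Struct("<2sIHHI")


def parse_bmp_header(stream):
    """Read and unpack the file header from a binary stream."""
    data = stream.read(BMP_FILE_HEADER.size)     # read returns bytes
    signature, file_size, _res1, _res2, pixel_offset = \
        BMP_FILE_HEADER.unpack(data)
    return {
        "signature": signature.decode("ascii"),  # bytes -> str
        "file_size": file_size,
        "pixel_offset": pixel_offset,
    }


# Build a fake header in memory instead of relying on a file on disk.
fake_file = io.BytesIO(BMP_FILE_HEADER.pack(b"BM", 70, 0, 0, 54))
header = parse_bmp_header(fake_file)
for field, value in header.items():
    print(field, value)
```

For a real image, the same function could be given the object returned by open with mode "rb".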
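The bitwise operators described in this lecture can be tried out with a short sketch like the one below. The values of a and b and the flag names are assumptions picked so the results are easy to check by hand, not the values from the lecture's script.

```python
a = 0b1100  # 12
b = 0b1010  # 10

results = {
    "and": a & b,             # 0b1000 = 8: bits set in both
    "or": a | b,              # 0b1110 = 14: bits set in either
    "xor": a ^ b,             # 0b0110 = 6: bits that differ
    "complement": ~a,         # -13: every bit flipped
    "shift right 1": a >> 1,  # 6
    "shift right 3": a >> 3,  # 1
    "shift left 1": a << 1,   # 24
    "shift left 3": a << 3,   # 96
}
for name, value in results.items():
    print(name, value)

# Flags packed into one integer (hypothetical flag names).
READ, WRITE = 0b01, 0b10
flags = READ | WRITE          # set both flags
flags_no_write = flags & ~WRITE  # mask has ones everywhere except WRITE
print(bool(flags & WRITE), bool(flags_no_write & WRITE))
```

The complement of 12 is -13 because Python integers are signed and not limited to a fixed width, so flipping every bit yields -(a + 1).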
Okay, that concludes our binary data lecture. See you in the next one.
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.