Working with Python

More on Functions

Continuing on from Practical Data Science with Python, this Course explores a variety of Python features in a practical, hands-on way. It starts by looking at Python functions and you are given a guided walkthrough of a range of functions relating to data science. It then moves on to dictionaries in Python and how to create them. After that, you'll be guided through flow control in Python, loops, and finally, you'll be shown a technical demonstration in Python looking at classes, variables, and stringification.

Learning Objectives

The main objective of this Course is to enhance your knowledge of Python, and learn about Python functions, loops, dictionaries, and flow control.

Intended audience

This Course is intended for IT professionals who already have a good knowledge of Python and who want to enhance that knowledge from a data science perspective.



Hello and welcome back. We'll have a look at defining functions.

So I've mentioned how we can create functions using Python. What is a function? A function performs a task. This is a function which literally does nothing. What pass is telling us is yes, I know there's nothing going on here, but don't get angry at me. If I left this blank, then I would get an error. However if I have a pass, this simply means that this is empty space, but it's fine. There's nothing happening there. So we can use this to refute the idea that a function performs a task, right? They don't actually have to do anything.
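As a rough sketch of the kind of empty function being described here (the name do_nothing is an assumption, the transcript does not show what the function on screen was called):

```python
def do_nothing():
    # pass is a placeholder statement: it keeps the function body
    # syntactically valid even though nothing happens in it.
    pass


# Calling the function runs without error and has no visible effect.
do_nothing()
```

Deleting the pass line and leaving the body empty would stop the cell from running at all, which is the error mentioned above.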

They should perform tasks, but they don't have to do anything. The way I would like us to see functions is simply as repeatable pieces of code. They are a chunk of code to which you have assigned a name, which you can give arguments to and which you can get things out of. Again, both of these are optional. At the most fundamental, it is a block of code that you can point to and say do this, or do that. You may be capturing some piece of analysis. You may be capturing some script or setting up another Python script. You may be capturing anything, but they are just chunks of code.

Okay, I'm going to create a function which takes in a predicted value and an observed value. All it's going to do is compute the squared distance between the two. So I'm going to write return, predicted minus observed. And then I'm going to raise predicted minus observed to the power of two: the squared distance between two points. Now if I call error on two numbers, for example on nine and 10, then we should see one, because they are one apart.
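Written out, the function described here might look something like the following. The names error, predicted and observed come from the talk; the exact layout on screen is an assumption.

```python
def error(predicted, observed):
    # Squared distance between the predicted and the observed value.
    return (predicted - observed) ** 2


print(error(9, 10))  # 1, since (9 - 10) squared is 1
print(error(8, 10))  # 4, since (8 - 10) squared is 4
```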

If I do it between eight and 10, I should expect to see four. Syntactically, what are the elements of this? If I defined predicted and observed, how have I formatted this thing, this function? The colon is one of the most important aspects. What the colon says to Python is I've finished defining the name and arguments and all of that stuff that's going into my function, and I'm now ready to jump to the function body. And because Jupyter has a very nice friendly IDE, when you have a colon there and you hit enter, it will automatically take you one level of indentation in.

So Python is quite a unique language in that the indentation is part of the syntax of the code. So indentation is not simply a nice thing to do like it is in Java. Indenting your code in Java is just a little courtesy that you afford to the other people who might want to read your code. However in Python, it is integral to the code. This will not work if I have my function looking something like this, with the body not indented. Association to a function is done by indentation. So the return statement is associated with the function because it's indented.
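One way to see this without breaking a notebook cell is to hand Python the badly indented source as a string and let it try to compile it. This is only an illustrative sketch: the use of the built-in compile function and the variable names are assumptions, not something shown in the talk.

```python
# The same error function, but with the return line NOT indented.
bad_source = (
    "def error(predicted, observed):\n"
    "return (predicted - observed) ** 2\n"
)

try:
    compile(bad_source, "<example>", "exec")
    outcome = "compiled fine"
except IndentationError as exc:
    # Python refuses the code because the function has no indented body.
    outcome = "IndentationError: " + exc.msg

print(outcome)
```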

And I can have as much going on here as I want to. I could do predicted to the power of three or 43. I could print "Hello There". I can do as much as I want, and then eventually I will actually return the result of this operation. So colons are an integral part. Next, the arguments into my function. It's worth saying that predicted and observed, or pred and obs, only exist in what we call scope. They only exist within the scope of the function. So when I put these numbers in, I'm giving them temporary names, predicted and observed. I'm working with predicted and observed within the function. Once I've run this function, if I try and access predicted, it doesn't exist, it's not there. It only existed in the function to describe what to do with the data coming in. I've cut this function out. So the blue part is the name of the function. After that, the arguments go into the function.
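The point about scope can be checked with a small sketch like this one. It assumes nothing else in the session has already defined a variable called predicted; the variable visible_outside is a hypothetical helper added for the illustration.

```python
def error(predicted, observed):
    # predicted and observed are local names: they only exist
    # while a call to error is running.
    return (predicted - observed) ** 2


result = error(8, 10)

try:
    predicted  # look the name up outside the function
    visible_outside = True
except NameError:
    # The name was never created outside the function's scope.
    visible_outside = False

print(result)           # 4
print(visible_outside)  # False
```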

And then there's one more thing of interest. What's print? I'll show you with an example what the difference between print and return is. Say I had a function called error and another function called error_2, error_2_Tokyo_Drift. In error_2_Tokyo_Drift I have changed return to be print. We now have two completely different functions. If I call both of them on eight and 10 and print the results, I will get completely different outputs. I'll print something in between, blah blah blah blah, just to separate them out.

When I print the result of the error function, what I get is the number four, because four is the result of the error function. Return takes a value inside the function and gives it back out. Then I print the result of error_2_Tokyo_Drift. This function's sole purpose is to compute the squared error and then print it out to the screen. So it prints four to the screen, but then the function itself returns a reference to nothing. This is exactly the same as having a blank return statement, return nothing. So I can't get any data out of this function, right?

Similarly, if I take error for eight and 10 again, store the result in a variable called err, and then print err to the screen, I get four. If I did the same thing for error_2_Tokyo_Drift, then err actually contains nothing. If I look at err on its own in a cell, err is nothing, there's nothing in it. The print statement's job is to stick things into the output for us so that we can read them. Return won't print anything out, because its job is to give the computed value back to the program or user. So print and return are very different. Print doesn't return anything, it's a function that returns nothing. We're writing functions that actually return things. So I'm going to get rid of Tokyo Drift then. I hope that clarified things a bit for you.
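A side-by-side sketch of the two functions, reconstructed from the description above (the function names are from the talk, the variable err_2 is an assumption added to keep the two results apart):

```python
def error(predicted, observed):
    # Gives the squared error back to the caller.
    return (predicted - observed) ** 2


def error_2_Tokyo_Drift(predicted, observed):
    # Only displays the squared error; nothing is handed back.
    print((predicted - observed) ** 2)


err = error(8, 10)
print(err)    # 4

err_2 = error_2_Tokyo_Drift(8, 10)  # the function itself prints 4 here
print(err_2)  # None, because there was no return value to store
```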

So if a function doesn't return anything, it will always return a reference to this special object called None. There is a single None within the entire Python language that sits in memory, and things point to it when there's nothing there, for example when a function doesn't return anything. So any function that doesn't actually have a return statement in it will always return a reference to None. So those are the main aspects of functions, the main bits that we need to be aware of when it comes to functions.
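A short sketch of that behaviour, using two hypothetical functions (no_return and bare_return are names invented for this example):

```python
def no_return():
    # No return statement at all.
    total = 1 + 1


def bare_return():
    # A return statement with nothing after it.
    return


a = no_return()
b = bare_return()

print(a is None)  # True
print(b is None)  # True
print(a is b)     # True: both names point at the same single None object
```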

Now there are different kinds of arguments that we can pass in. This function can be called in three ways. I can pass in two numbers, and the function will decide that the first one belongs to predicted and the second one belongs to observed. I can call this function by name, saying I want observed to be equal to eight and predicted to be equal to 10. Or I can swap that around, so predicted is given by eight and observed is given by 10. I'm getting the same output each time because we're squaring the difference, so the order doesn't change anything.
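The three calls described here could be written roughly as follows. The variable names on the left are assumptions added so the results can be compared.

```python
def error(predicted, observed):
    return (predicted - observed) ** 2


# 1. By position: the first value goes to predicted, the second to observed.
by_position = error(10, 8)

# 2. By name: the order in which the names are written does not matter.
by_name = error(observed=8, predicted=10)

# 3. By name, with the values swapped around.
swapped = error(predicted=8, observed=10)

print(by_position, by_name, swapped)  # 4 4 4
```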

So I can pass them by name, or I can pass them by position in this function here. If I wanted to have a function that contains default values, say I wanted a default value of zero for both my predicted and my observed, so that if I call it without passing data in I'm still going to compute something, then I can specify default arguments within what we call the signature of the function. So this says if predicted and observed don't get anything passed into them, set them to zero. I could pass in solely observed. I could say I want observed to be equal to 10, so then we're going to compute zero minus 10, all squared, which is the same as 10 minus zero, all squared.
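Here is a small sketch of default values in the signature, assuming the same error function as before with both defaults set to zero:

```python
def error(predicted=0, observed=0):
    # Any argument that is not supplied falls back to 0.
    return (predicted - observed) ** 2


print(error())             # 0, both defaults are used
print(error(observed=10))  # 100, predicted falls back to 0
print(error(3))            # 9, 3 goes to predicted by position
print(error(3, 5))         # 4, both values passed by position
```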

I can pass in a single value. I can pass in no values. I can pass in one positionally, I can pass in two positionally. I can pass in any sort of thing. Functions are often a mixture of defaulted behavior and non-defaulted behavior. And I can specify how I want my function to be called using special characters.

Special notation. So if I add an asterisk into my function definition, every argument after the asterisk has to be specified by name. So I'm getting shouted at when I run this with positional values. If I pass in only pred is equal to 10, then I get told I'm missing one required keyword-only argument called obs. So then I'll go okay, right, obs is going to be five. Then my function is happy, but I've defined the behavior of this function so that it doesn't allow anything to be passed in by position. It only accepts keywords. So this is how we can extend the behavior of our function.
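A sketch of the keyword-only version, using the shortened names pred and obs mentioned in the talk. The two variables holding the error messages are assumptions added so the failures can be shown without stopping the cell.

```python
def error(*, pred, obs):
    # Everything after the bare asterisk must be passed by name.
    return (pred - obs) ** 2


try:
    error(10, 5)  # positional values are rejected
except TypeError as exc:
    positional_failure = str(exc)

try:
    error(pred=10)  # obs has no default, so it is still required
except TypeError as exc:
    missing_failure = str(exc)

print(positional_failure)
print(missing_failure)
print(error(pred=10, obs=5))  # 25
```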

And I can have things before this, so I could have a positional one, I'm going to call it positional_var. I could pass in a positional variable. So if I just put in "hi, I'm positional", like that, and then I pass in the rest of these guys by name, then this function is going to be happy. I'm going to print out the positional variable like that. This can be passed by position, this can be passed by name.
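Putting a regular parameter in front of the asterisk might look like this. The name positional_var is from the talk; the rest of the layout is an assumption.

```python
def error(positional_var, *, pred, obs):
    # positional_var sits before the asterisk, so it can be given by
    # position; pred and obs sit after it and must be given by name.
    print(positional_var)
    return (pred - obs) ** 2


result = error("hi, I'm positional", pred=10, obs=5)
print(result)  # 25

# A parameter before the asterisk can still be passed by name as well.
same_result = error(positional_var="hi, I'm positional", pred=10, obs=5)
```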

Everything after the asterisk has to be passed by name; everything before it can be passed by position. That is laying on some sort of extra complexity when it comes to functions. We can also have a final argument called something like kwargs, which you'll often see in visualization libraries. And this says that anything that gets passed in by name after this that isn't the positional variable, predicted, or observed, stick it in an object called a dictionary. And we'll use bits of it if we want to, essentially.

I can just put in something like "another thing", right? If I pass that in by position after the keywords, I get told that a positional argument follows keyword arguments. That's true. So I pass it by name instead: another thing is going to be given by "another element", and I print out kwargs. So this function now takes in a single positional argument, two keyword-only arguments, and then a collection of things at the end which are going to be put in this dictionary called kwargs, which contains the name of each variable and the value it takes.
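A sketch of the function as it stands at this point in the talk. The keyword another_thing and its value are guesses at what was typed on screen, so treat them as hypothetical.

```python
def error(positional_var, *, pred, obs, **kwargs):
    # Any extra named arguments are collected into the kwargs dictionary.
    print(positional_var)
    print(kwargs)
    return (pred - obs) ** 2


result = error(
    "hi, I'm positional",
    pred=10,
    obs=5,
    another_thing="another element",  # not a named parameter, so it lands in kwargs
)
# The second print inside the function shows:
# {'another_thing': 'another element'}
print(result)  # 25
```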

Now we can extend this more and more and more, but this should give you an idea of why you get shouted at when you use certain kinds of functions. They will often tell you things like you're specifying a positional argument after a keyword argument. That's not allowed: you're not allowed to give a keyword argument, then say the number 10, and then another keyword argument. The rules are: get your positional ones done first, then do your keyword ones.
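This particular rule is enforced before the code even runs, so a call that breaks it cannot be wrapped in try and except directly. As an illustrative workaround, the sketch below hands the offending call to the built-in compile function as a string; the variable names are assumptions.

```python
# A call that puts a bare 10 between two keyword arguments.
bad_call = "error(pred=5, 10, obs=3)"

try:
    compile(bad_call, "<example>", "eval")
    rejected = False
except SyntaxError as exc:
    # Python rejects the call while parsing it, before error is looked up.
    rejected = True
    print(exc.msg)

print(rejected)  # True
```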

Then you've got your kwargs at the end, your keyword arguments, so this guy. The star star means that after you've received observed, well, after you've received all of these named things, stick every other keyword argument into a dictionary where the key is the name of the variable and the value is the actual data. We can do it so that we collect extra positional arguments as well, with args, and then we have kwargs after that. These are things that you'll actually come across.
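To see both collectors side by side, here is a minimal sketch with a hypothetical function called collect (not from the talk) that simply hands back whatever it was given:

```python
def collect(*args, **kwargs):
    # args is a tuple of the extra positional values;
    # kwargs is a dictionary mapping each extra name to its value.
    return args, kwargs


positional, named = collect(1, 2, colour="red", size=3)

print(positional)  # (1, 2)
print(named)       # {'colour': 'red', 'size': 3}
```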

So this is as much as you need to know about functions. In reality, what do you need to know about functions? You need to know how to define something like error which will take in argument one, argument two, and argument three, and sum all the arguments together. So A1, A2, A3, we can write these nice simple statements. And then obviously return. We can return multiple things at the same time, so I can run this new error function with one, two, and three in it, and I can stick the outputs into X, Y, Z, just by comma separating them. Then I should have X as this, Y as this, and Z as this, right? That is multiple return from a function.
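The transcript does not say which three values the function on screen handed back, so the sketch below picks its own: the sum, the smallest and the largest of the three arguments. Treat those choices as assumptions made purely to illustrate returning several values at once.

```python
def error(a1, a2, a3):
    total = a1 + a2 + a3
    smallest = min(a1, a2, a3)
    largest = max(a1, a2, a3)
    # Comma-separated values after return are packed into a single tuple.
    return total, smallest, largest


# Comma-separated names on the left unpack that tuple again.
x, y, z = error(1, 2, 3)

print(x)  # 6
print(y)  # 1
print(z)  # 3
```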

About the Author

Delivering training and developing courseware for multiple aspects across Data Science curriculum, constantly updating and adapting to new trends and methods.
