
Continue the journey to data and machine learning, with this course from Cloud Academy.

In previous courses, the core principles and foundations of Data and Machine Learning have been covered and best practices explained.

This course gives an informative introduction to deep learning and neural networks.

This course is made up of 12 expertly instructed lectures along with 4 exercises and their respective solutions.

**Please note**: the Pima Indians Diabetes dataset can be found at this GitHub repository or on the Kaggle page mentioned throughout the course.

**Learning Objectives**

- Understand the core principles of deep learning
- Be able to apply every component of the neural network framework

**Intended Audience**

- It would be advisable to complete the Intro to Data and Machine Learning course before starting.

Hello and welcome to this video on activation functions. In this video, we will talk about activation functions other than the sigmoid and the step function that we've introduced before. Activation functions are nonlinear functions applied when passing the output of a layer to the next layer or to the final output. They're one of the key ingredients of neural networks and they are what make neural networks so versatile.

So far, we've encountered the sigmoid and the step activation functions. The sigmoid can be easily implemented in Python as a function that returns one over one plus the exponential of minus x. This function smoothly maps all of the real axis onto the interval zero to one. The sigmoid has a value of one half when x is equal to zero; for positive values of x, it quickly goes to plus one, while for negative values of x, it goes to zero. We used the sigmoid when defining logistic regression. The step function obtains a similar effect for very large positive and very large negative values of x, snapping the positive values to one and the negative values to zero. However, it does so with a very sharp, discontinuous transition at x equal to zero. We used the step function in the perceptron and the multilayer perceptron.

There are other activation functions that we can use to extend the multilayer perceptron to other types of fully connected networks. Let's see a few examples. The hyperbolic tangent, or simply tanh, is similar to the sigmoid in that it's bounded and smoothly varying. However, since it varies between minus one and plus one, it penalizes negative values of x with a negative weight. The rectified linear unit, or simply the rectifier, is defined as the maximum between zero and x. It was originally motivated from biology and it's been shown to be more effective than sigmoid and tanh in neural networks.
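As a minimal sketch of the four functions described so far, the snippet below uses only Python's standard `math` module. The function names are chosen for illustration and are not taken from the course material, and the value of the step function at exactly x = 0 is a convention, assumed here to be 1.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)): maps the real axis onto (0, 1)."""
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def step(x):
    """Step function: 0 for negative x, 1 otherwise (value at 0 is assumed)."""
    return 1.0 if x >= 0 else 0.0


def tanh(x):
    """Hyperbolic tangent: smooth and bounded between -1 and +1."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: the maximum between 0 and x."""
    return max(0.0, x)


if __name__ == "__main__":
    # Print a small table to compare the four functions side by side.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  step={step(x):.0f}  "
              f"tanh={tanh(x):+.4f}  relu={relu(x):.1f}")
```

The branch inside `sigmoid` is a design choice for numerical safety rather than part of the definition: both branches compute the same mathematical value.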

The ReLU is probably the most popular activation function for deep neural networks. It assigns zero to all negative values of x, and it leaves x unchanged for positive values, which means it's not bounded on the positive side. This will turn out to be useful to improve the training speed, as we shall see later on. The softplus function is a smooth approximation of the rectified linear unit and it's defined as the logarithm of one plus e to the x. It behaves very similarly to the ReLU for very large positive and very large negative values of x, but it has a smooth transition near zero. We can use any of these functions to connect the output of one layer to the input of the next in order to make the neural network nonlinear. For example, we could put a step function activation between the two layers, or we could use a smooth bounded function like the sigmoid or the tanh, or use ReLU or softplus to emphasize larger values while removing negative values.
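The softplus definition and the idea of placing an activation between two layers can be sketched as follows. This is an illustrative, standard-library-only example: the weights, the layer sizes, and the helper names (`dense`, `two_layer`, `identity`) are made up for the demonstration and do not come from the course. With the identity in place of an activation, the two layers collapse into a single linear map; with ReLU between them, the output depends nonlinearly on the input.

```python
import math


def relu(x):
    """Rectified linear unit: the maximum between 0 and x."""
    return max(0.0, x)


def softplus(x):
    """Smooth approximation of ReLU: log(1 + exp(x))."""
    # The equivalent form max(x, 0) + log1p(exp(-|x|)) avoids overflow
    # when x is large and positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def identity(x):
    """No activation at all, used here only as a linear baseline."""
    return x


def dense(weights, biases, inputs):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def two_layer(inputs, activation):
    """Two dense layers with `activation` applied between them."""
    # Hypothetical weights, chosen only for illustration.
    w1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
    w2, b2 = [[1.0, 1.0]], [0.0]
    hidden = [activation(h) for h in dense(w1, b1, inputs)]
    return dense(w2, b2, hidden)[0]


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  softplus={softplus(x):.4f}")

    # Without an activation, these particular weights cancel out exactly.
    print(two_layer([1.0, 3.0], identity), two_layer([3.0, 1.0], identity))
    # With ReLU between the layers, the same weights compute |x1 - x2|.
    print(two_layer([1.0, 3.0], relu), two_layer([3.0, 1.0], relu))
```

With these weights the ReLU network returns 2 for both `[1, 3]` and `[3, 1]` but 0 for their sum `[4, 4]`, so it is not additive, which is one simple way to see that the activation has made the network nonlinear.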

Each of these activation functions will make the overall network nonlinear, and this is the secret power of neural networks. With nonlinearities at each layer, they are able to approximate very complex functions and deal with all sorts of inputs and outputs. In conclusion, in this video, we've learned a few other activation functions and learned how they make neural networks nonlinear. Nonlinearities are the secret of neural networks and they're what make them so powerful. So thank you for watching, and see you in the next video.

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.