Data Visualization with Python using Bokeh
Bokeh is an interactive visualization library in Python that provides visual artefacts for modern web browsers. In this course, we're going to have a look at the fundamental tools that are necessary to build interactive plots in Python using Bokeh.
Bokeh exposes two interface levels to users: bokeh.plotting and bokeh.models, and this course will focus mainly on the bokeh.plotting interface.
We'll start things off by exploring two key concepts in Bokeh: Column Data Source and Glyphs. Then we'll move on to looking at different aspects related to the customization of a bokeh plot, as well as focusing on how to introduce interactivity into a Bokeh object.
You'll also learn about using inspectors to report information about the plot, and we'll investigate different ways to plot multiple Bokeh objects in one figure. We'll round off the course by looking at plot methods for categorical variables.
- Learn about Column Data Sources and Glyphs in Bokeh and how they are used
- Learn how to customize your plots and add interactivity to them
- Understand how inspectors can be added to plots to provide additional information
- Learn how to plot multiple Bokeh objects in one figure
- Understand the plot methods available for categorical variables
- Data scientists
- Anyone looking to build interactive plots in Python using Bokeh
To get the most out of this course, you should have a good understanding of Python. Before taking this course, we also recommend taking our Data Visualization with Python using Matplotlib course.
The GitHub repo for this course can be found here: https://github.com/cloudacademy/interactive-data-visualization-with-bokeh
Welcome back. The `bokeh.plotting` interface provides a convenient way to create plots on top of `glyphs`, which are the fundamental building blocks of any bokeh object. More specifically, glyphs map data properties to visual properties. The family of glyphs includes many objects: from standard geometrical attributes, such as lines, rectangles, squares, to more sophisticated shapes, such as wedges and patches.
Here we will cover the most important glyphs. If you are curious to know all the possible shapes involved in the construction of a bokeh plot, please see the official documentation.
To create plots using the bokeh.plotting interface, we need at least four steps.
First, we need to specify the output: it is good practice to tell bokeh where to generate the desired output.
Second, we must create a Figure Object: this is a very important step, and it basically creates an empty, highly customizable plot with typical default options.
Third, we add renderers: here we apply glyphs to the plot.
Fourth, and finally, we need to tell Bokeh either to show or save the results.
Note that Bokeh lets you fill the plot with data from several kinds of sources: lists, tuples, and Python dictionaries all work, and so do NumPy arrays and pandas DataFrame columns.
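The four steps can be sketched as follows. This is only a minimal illustration, not the course notebook: the file name and the data points are made up, and the output is written to the system temp directory so the snippet can run outside a notebook.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# Step 1: specify the output (a hypothetical file in the temp directory).
out_path = os.path.join(tempfile.gettempdir(), "first_plot.html")
output_file(out_path)

# Step 2: create an empty figure with default options.
plot = figure()

# Step 3: add a renderer by applying a glyph method to made-up data.
plot.line([1, 2, 3, 4], [3, 1, 4, 2])

# Step 4: save the result. In a notebook you would typically call
# show(plot) instead, after importing show from bokeh.plotting.
save(plot)
```

Opening the saved HTML file in a browser displays the plot together with its default toolbar.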
Apart from NumPy and pandas data structures, bokeh extensively uses a particular data structure, called the Column Data Source (CDS for short). It’s essentially a simpler version of a pandas DataFrame: this particular data structure has a data attribute, which is a python dictionary that maps string names to sequences of data.
When using bokeh, it is good practice to create a CDS explicitly. Even if you do not, one is created in the back-end for you: when you pass data directly to a bokeh glyph method, bokeh creates a column data source behind the scenes.
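As a small illustration of that structure, here is a sketch that builds a CDS from a plain dictionary. The column names `x` and `y` and their values are made up for the example.

```python
from bokeh.models import ColumnDataSource

# A CDS built from a dictionary that maps column names to sequences.
source = ColumnDataSource(data={"x": [1, 2, 3], "y": [4, 5, 6]})

# The data attribute is the underlying dictionary of columns.
print(source.data["x"])

# column_names lists the available columns.
print(source.column_names)
```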
Now, let’s take a hands-on look at how bokeh works. To do so, open a Jupyter notebook similar to the one you see here on my screen. I strongly encourage you to follow the steps you find in the README of the course-specific GitHub repository in order to have the same environment as mine.
We first need to import the data into our session. We use pandas to do so, and that has been done for you in this snippet.
We also need the `Date` column to be a datetime. Indeed, if you inspect, for instance, the facebook data frame and apply the info method to it, you see that the column is of type object.
We do not want that: we want that column to be a datetime. To get there, we first define a dictionary that maps each Symbol to the corresponding data frame and then, for each data frame, we apply the pandas to_datetime function to the original Date column and store the result back in that column.
This has been done for you in the following snippet. A simple call of the info method on the facebook dataframe shows that the column Date is now of type datetime.
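For readers without the notebook at hand, the conversion might look roughly like this. The data frames, symbols, and values below are small made-up stand-ins for the course data, so treat the names `aapl_df` and `frames` as hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the course data; Date starts out as plain strings.
fb_df = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "Close": [10.0, 12.5]})
aapl_df = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "Close": [20.0, 19.5]})

# Dictionary mapping each symbol to its data frame (hypothetical symbols).
frames = {"FB": fb_df, "AAPL": aapl_df}

# Convert the Date column of every data frame to a datetime type.
for symbol, df in frames.items():
    df["Date"] = pd.to_datetime(df["Date"])

# info() now reports a datetime type for the Date column.
fb_df.info()
```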
One of the easiest examples of glyphs is the line. Now we’ll look at an example that shows how to generate a single line glyph from two pandas DataFrame columns using the line glyph method.
First, we carry out the necessary imports. From bokeh plotting, we import the figure function, which creates an empty canvas with typical defaults. We also import the show function, which is used to tell bokeh to show the figure in the browser. And since we are using a Jupyter notebook, we also import the output_notebook function, which configures the default output state to generate output in notebook cells when the show function is called during the Output Visualisation phase.
To do so, we need to call the output_notebook function in this way. And then we create a figure object by calling the figure function, and storing it into the variable plot.
We then apply the line function which basically requires two arguments: the x coordinate and the y coordinate.
Let’s say that the x coordinate is going to be the Date column of our facebook data frame, and then we have as the y coordinate, the Close column of that data frame.
Although it’s not necessary, we pass those arguments as lists by applying the pandas to_list method on those columns.
Finally, we call show, which in this case, displays the output in a Jupyter cell.
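Put together, the cell looks roughly like the sketch below. The data frame here is a tiny made-up stand-in for the facebook data, and the notebook-only calls are left as comments so the snippet also runs as a plain script.

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up stand-in for the facebook data frame.
fb_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Close": [10.0, 12.5, 11.0],
})

# In a notebook, import output_notebook and show from bokeh.plotting
# and call output_notebook() here.

# Create an empty figure and draw a line glyph from two columns.
plot = figure()
renderer = plot.line(fb_df["Date"].to_list(), fb_df["Close"].to_list())

# In a notebook, finish with show(plot) to display the figure.
```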
That's awesome, isn't it? With just two lines of code we have produced a remarkable plot. We also see that on the top right border of our figure we have different icons. The set of icons is called the toolbar, and it is a very important object in any bokeh figure. We will see how to deal with it in Lecture 3 of this course.
For the moment, it is important to point out that those icons come by default with the figure method, and can be easily customized. In particular, note that if we inspect any of them, we get an interactive label.
The first icon you see at the top of the toolbar is the `Pan` tool. It is selected by default and is used to pan the plot region.
We can use the reset icon to come back to the original output. Another possibility is `Box Zoom` which is used to draw rectangular regions to zoom in, like this.
You first select box zoom and then you draw the rectangle region you are interested in so that now you have a better zoom on that particular area. Again, if you want to go back to the original plot, you just need to select the reset icon.
Wheel zoom is instead used to zoom in and out on the plot, centered on the mouse location.
Finally, note that the save icon lets you save the plot as an image file even if saving was not specified in the code, and the help icon will redirect you to the official documentation (https://docs.bokeh.org/en/latest/docs/user_guide/tools.html).
Note that the x coordinates are not expressed as a Date. What we see is an internal bokeh representation of that particular date time object. Hence, we need to convert those numbers into a proper date. To do so, we simply set the argument x_axis_type to datetime inside the `figure()` function.
We run this again and now we see that the dates are expressed correctly, as desired.
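As a sketch of that change, again with a small made-up data frame standing in for the facebook data:

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up stand-in for the facebook data frame.
fb_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Close": [10.0, 12.5, 11.0],
})

# x_axis_type="datetime" makes the x axis format its ticks as dates.
plot = figure(x_axis_type="datetime")
plot.line(fb_df["Date"].to_list(), fb_df["Close"].to_list())

# In a notebook, call show(plot) to display the figure.
```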
Ok, so far we have used data from a pandas data frame inside the line function, but we can also use a column data source directly.
As I said at the beginning of this lecture, Bokeh has its own data structure, called Column Data Source, which can be used as input of any Bokeh Object. This is highly recommended since it brings flexibility to customization in a plot.
Creating a column data source is pretty easy using ColumnDataSource from the models interface: we just need to pass the pandas data frame to ColumnDataSource, as shown here:
We can access the data, for example the Close column, through the data attribute of the source, using the column name as the key. For simplicity, we’ll just show the first ten observations.
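A sketch of both steps follows. The data frame is a made-up stand-in with only three rows, and the variable name `fb_source` is hypothetical.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up stand-in for the facebook data frame.
fb_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Close": [10.0, 12.5, 11.0],
})

# Build the column data source directly from the data frame.
fb_source = ColumnDataSource(fb_df)

# Each column is available through the data attribute, keyed by name.
# The slice keeps at most the first ten observations.
print(fb_source.data["Close"][:10])
```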
To create the aforementioned figure now using the CDS, we specify the source argument inside the line, and we pass the column names of the corresponding variables we wish to have as x and y coordinates.
Hence, instead of passing a pandas dataframe inside the line, we define the source argument to be the facebook data frame source and we pass the columns we wish to have as the x and y coordinates - in this case, Date and Close.
We then run the cell to show the plot.
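In code, the CDS version of the line plot might look like this. The data and the name `fb_source` are assumptions made for the example.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up stand-in for the facebook data frame.
fb_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Close": [10.0, 12.5, 11.0],
})
fb_source = ColumnDataSource(fb_df)

plot = figure(x_axis_type="datetime")

# With source set, x and y are column names rather than the data itself.
renderer = plot.line("Date", "Close", source=fb_source)

# In a notebook, call show(plot) to display the figure.
```

Passing column names keeps the glyph tied to the source, which is what later makes per-column customization convenient.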
Another useful glyph is circle, which is used to scatter circle markers on a plot.
As an example, suppose we wish to investigate the relationship between the daily absolute difference in closing price and volume. We start by creating a new column in the original data frame called Diff, which holds the first order difference in price between two consecutive elements. For this we employ the pandas diff method, applied to the Close column.
Since we are interested in the absolute difference, we then take the absolute value of this quantity using the python abs function.
If we inspect the first two rows from that data frame, we see we have a new column, which is the result of the operation above.
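A sketch of that computation on a small made-up data frame:

```python
import pandas as pd

# Made-up stand-in for the facebook data frame.
fb_df = pd.DataFrame({
    "Close": [10.0, 12.5, 11.0],
    "Volume": [1000, 1500, 1200],
})

# First order difference of the closing price, then its absolute value.
# The first row has no previous value, so its Diff is NaN.
fb_df["Diff"] = abs(fb_df["Close"].diff())

print(fb_df.head(2))
```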
We can now create a scatter plot using the circle method on the figure object. To do so, we create another column data source by calling ColumnDataSource on fb_df, and we store it in the variable facebook data frame source v2.
We then initialize a figure object without any argument, and we store it in the new variable new plot; we then apply the circle method to the plot, with the Volume on the x-axis and the difference in price, namely Diff, on the y-axis. Remember to pass the facebook data frame source v2 as the source of this figure. We then show the plot.
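The scatter plot can be sketched as below. One deliberate swap: instead of the circle method used in the lecture, this sketch calls the generic scatter method with marker="circle", on the assumption that you may be running a newer Bokeh release where the marker-specific methods are being phased out. The data and the name `fb_source_v2` are made up.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up stand-in for the facebook data frame, with the Diff column added.
fb_df = pd.DataFrame({
    "Close": [10.0, 12.5, 11.0],
    "Volume": [1000, 1500, 1200],
})
fb_df["Diff"] = abs(fb_df["Close"].diff())

fb_source_v2 = ColumnDataSource(fb_df)

# Figure without arguments, then circle markers: Volume on x, Diff on y.
new_plot = figure()
renderer = new_plot.scatter("Volume", "Diff", source=fb_source_v2, marker="circle")

# In a notebook, call show(new_plot) to display the figure.
```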
Note that we can use different markers in bokeh. As an example, it is worth mentioning asterisk, circle_cross, diamond, and inverted_triangle, to name a few. Please check the documentation for further details. For instance, if we want to use inverted_triangle, we just replace circle with that method and then run the cell to get the output.
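To compare a few markers side by side, here is a minimal sketch. It assumes the generic scatter method with its marker argument rather than the marker-specific methods, and the coordinates are made up.

```python
from bokeh.plotting import figure

plot = figure()

# Marker names mentioned above, each drawn on its own row of points.
markers = ["asterisk", "circle_cross", "diamond", "inverted_triangle"]

for row, marker in enumerate(markers):
    # size is given in screen units (pixels).
    plot.scatter([1, 2, 3], [row, row, row], marker=marker, size=12)

# In a notebook, call show(plot) to display the figure.
```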
Finally, it is worth mentioning bars as a member of the bokeh family of glyphs. Suppose, for instance, we are interested in plotting the observed daily volumes.
One possible way to visualize them is by means of bars. Bokeh provides the hbar and vbar glyph functions to plot data using bars, where h and v stand for horizontal and vertical, respectively. Here we apply vbar to the figure object.
We create another figure and store it in a new variable. We apply the vbar function to it and, unlike other glyphs, bars require a different argument: top, which gives the value each bar reaches, in our case Volume.
We still have the x argument, which denotes the x-center coordinate: in our case Date. Since we are dealing with date time we need to specify the x axis type as date time. We then pass the facebook data frame source v2 as source and, finally, we show the plot.
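A sketch of the bar plot follows. The width argument is an addition not mentioned in the lecture: on a datetime axis the x units are milliseconds, so this example assumes a bar width of most of one day. The data and variable names are made up.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up stand-in for the facebook data frame.
fb_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Volume": [1000, 1500, 1200],
})
fb_source_v2 = ColumnDataSource(fb_df)

bar_plot = figure(x_axis_type="datetime")

# Assumed bar width: 80% of one day, expressed in milliseconds.
one_day_ms = 24 * 60 * 60 * 1000

# x is the center of each bar, top is the height it reaches.
renderer = bar_plot.vbar(
    x="Date", top="Volume", width=0.8 * one_day_ms, source=fb_source_v2
)

# In a notebook, call show(bar_plot) to display the figure.
```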
We will cover this family of plots in more detail in Lecture 6 when dealing with categorical variables.
That concludes this lecture. In the next one, we are going to cover how to customize a bokeh object. See you there!
Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.
He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.