Data Visualization with Python using Bokeh
Bokeh is an interactive visualization library in Python that provides visual artefacts for modern web browsers. In this course, we're going to have a look at the fundamental tools that are necessary to build interactive plots in Python using Bokeh.
Bokeh exposes two interface levels to users: bokeh.plotting and bokeh.models, and this course will focus mainly on the bokeh.plotting interface.
We'll start things off by exploring two key concepts in Bokeh: the Column Data Source and Glyphs. Then we'll look at different ways to customize a Bokeh plot, and at how to introduce interactivity into a Bokeh object.
You'll also learn about using inspectors to report information about the plot and we'll also investigate different ways to plot multiple Bokeh objects in one figure. We'll round off the course by looking at plot methods for categorical variables.
- Learn about Column Data Sources and Glyphs in Bokeh and how they are used
- Learn how to customize your plots and add interactivity to them
- Understand how inspectors can be added to plots to provide additional information
- Learn how to plot multiple Bokeh objects in one figure
- Understand the plot methods available for categorical variables
- Data scientists
- Anyone looking to build interactive plots in Python using Bokeh
To get the most out of this course, you should have a good understanding of Python. Before taking this course, we also recommend taking our Data Visualization with Python using Matplotlib course.
The GitHub repo for this course can be found here: https://github.com/cloudacademy/interactive-data-visualization-with-bokeh
Welcome back. Recall one of the most important outputs we saw in the last lecture. To jog your memory, I’ll show the snippet here: we created a figure with datetime as the x-axis type, and then we applied a line glyph by passing a Column Data Source (CDS). In the previous lecture this was built on the Facebook data frame, but now it’s for Apple, with the x axis being the date and the y axis the closing price.
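A minimal sketch of the snippet recalled above, assuming a recent Bokeh release. The three rows of data and the column names Date and Close are illustrative stand-ins for the Apple data frame, not the course's actual dataset.

```python
from datetime import datetime

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows standing in for the Apple data frame (assumed columns: Date, Close)
source = ColumnDataSource({
    "Date": [datetime(2020, 1, 2), datetime(2020, 1, 3), datetime(2020, 1, 6)],
    "Close": [100.0, 101.5, 99.8],
})

# Figure with a datetime x axis, then a line glyph fed by the Column Data Source
p = figure(x_axis_type="datetime")
line = p.line(x="Date", y="Close", source=source)

# Calling show(p) from bokeh.plotting would render the figure in a browser or notebook
```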
I am sure that you have been asking yourself a few questions while going through the process of plotting glyphs. For example, you might be asking: how can I manage the figure size? Can I add a title? What about a legend? Do glyphs have visual properties that can be easily customized, such as size, color and transparency? Well, each of these questions has an answer, and we will go through those in this lecture.
One of the easiest improvements we can think of is to provide a custom value for both `plot_width` and `plot_height` inside the figure function.
For example, we can set plot_width equal to 800 and plot_height equal to 450 to make the figure bigger and more readable.
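As a rough sketch of the sizing step: the lecture uses plot_width and plot_height, which are the argument names in older Bokeh releases. Bokeh 3.0 removed them in favor of width and height, so the example below assumes a recent release and uses the newer names.

```python
from bokeh.plotting import figure

# width/height take the place of plot_width/plot_height in Bokeh 3.x
p = figure(x_axis_type="datetime", width=800, height=450)
```

On an older installation, the same call would be written with plot_width=800 and plot_height=450, as in the lecture.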
Another improvement we can easily add to the figure is the title: we simply use the title argument inside the figure function, and we pass a string that we wish to show along with the plot. An example is adding a title such as "Apple Closing Price Jan-Sept 2020".
We now see that on the top left of our figure we have a title. To control the font and the size of the title, we cannot work directly inside the `figure()` function; instead, we have to set specific attributes on the figure object that it returns.
For instance, if we want to set the font size of the title to `15pt`, we set the text_font_size attribute of the figure's title attribute to the string `15pt`. We can also change the color of the title by setting its text_color attribute to orange, for example. Here is the result.
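The title steps just described might look like the following sketch. The variable name p is our own choice; the attribute names are the ones mentioned in the lecture.

```python
from bokeh.plotting import figure

# The title text is passed to figure(); its styling is set afterwards on p.title
p = figure(x_axis_type="datetime", title="Apple Closing Price Jan-Sept 2020")
p.title.text_font_size = "15pt"
p.title.text_color = "orange"
```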
Another useful argument is `tools`: this controls the tools we wish to have in the toolbar. Tools can be supplied conveniently as a comma-separated string of tool shortcut names. For example, suppose we wish to keep only the following tools: we define the string "pan, wheel_zoom, box_zoom, reset".
We then pass the selected tools to the figure via the tools argument. Like this, only a few tools are shown in the toolbar.
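A short sketch of the string form of the tools argument. The variable names tools and tool_names are ours, added only so the resulting toolbar can be inspected.

```python
from bokeh.plotting import figure

# Comma-separated tool shortcut names
tools = "pan,wheel_zoom,box_zoom,reset"
p = figure(x_axis_type="datetime", tools=tools)

# Class names of the tools that ended up in the toolbar
tool_names = [type(t).__name__ for t in p.toolbar.tools]
```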
Alternatively, we can pass a list of tools expressed by their bokeh object class, as follows.
First, we import the WheelZoomTool, the WheelPanTool and the BoxZoomTool from bokeh.models. Then, instead of passing a string to the tools argument, we pass a list of Bokeh objects, namely WheelZoomTool, WheelPanTool and BoxZoomTool.
We run the cell and we get a similar toolbar, built with a different syntax. Note that we can also hide the toolbar altogether by setting toolbar_location equal to None and tools equal to an empty string. Now the toolbar has disappeared.
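The two variants above can be sketched as follows, assuming a recent Bokeh release. The names p and bare are hypothetical. Note that WheelPanTool pans along a single dimension (the width by default), so this toolbar is not identical to the one built from the string.

```python
from bokeh.models import BoxZoomTool, WheelPanTool, WheelZoomTool
from bokeh.plotting import figure

# Tools supplied as a list of tool objects instead of a string
p = figure(
    x_axis_type="datetime",
    tools=[WheelZoomTool(), WheelPanTool(), BoxZoomTool()],
)

# No toolbar at all: hide it and register no tools
bare = figure(x_axis_type="datetime", toolbar_location=None, tools="")
```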
Finally, we can also specify the x-axis and y-axis labels directly in the figure function using the arguments x_axis_label and y_axis_label.
We can specify, for instance, that x_axis_label is equal to Date and y_axis_label is equal to the string Closing Price. You now see that those two labels are shown in the plot.
Note that we can also control the label size using the attribute `axis_label_text_font_size` on the plot.axis attribute. We set this to be equal to 12pt. This applies the same font size to the labels of both axes.
If you want to be more precise and control the x and y axes separately, the syntax is the same, but instead of applying axis_label_text_font_size to the generic axis attribute, you apply it to the xaxis and yaxis attributes and choose a different size for each. The result confirms this.
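A sketch of the axis label settings described above. The sizes 14pt and 10pt for the separate axes are arbitrary values chosen for illustration; the lecture does not state which sizes it uses.

```python
from bokeh.plotting import figure

p = figure(
    x_axis_type="datetime",
    x_axis_label="Date",
    y_axis_label="Closing Price",
)

# Same label size on both axes
p.axis.axis_label_text_font_size = "12pt"

# Or a different size per axis (these lines override the setting above)
p.xaxis.axis_label_text_font_size = "14pt"
p.yaxis.axis_label_text_font_size = "10pt"
```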
We can also control different glyph features, such as color, transparency and size. For instance, suppose we want to change the color and size of the line, as well as controlling the transparency of that object. This can be easily done by using the arguments `color`, `line_width`, and `alpha` inside the line function.
For instance we specify line_width equal to 3, color equal to orange, and alpha, which is the transparency argument, as 0.8. You see that color and line width have changed.
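The glyph styling above could be written as in this sketch. The data rows and the column names Date and Close are made-up stand-ins for the Apple data frame.

```python
from datetime import datetime

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows (assumed columns: Date, Close)
source = ColumnDataSource({
    "Date": [datetime(2020, 1, 2), datetime(2020, 1, 3), datetime(2020, 1, 6)],
    "Close": [100.0, 101.5, 99.8],
})

p = figure(x_axis_type="datetime")
# Thicker, orange, slightly transparent line
line = p.line(
    x="Date", y="Close", source=source,
    line_width=3, color="orange", alpha=0.8,
)
```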
It’s a good habit to specify that we are dealing with discrete-time observations. So, instead of just plotting a plain vanilla line, we add on top of it markers, such as a circle for each observation. In this way, we can easily tell the reader that the grid is discrete.
To do so, we apply the circle function to the plot and we pass the same coordinates - close for the y coordinate and date for the x coordinate - using the same data source, and we fill the circle with a given color.
So, let’s go back here and just apply the circle function to the plot, passing Date as the x coordinate and the closing price as the y coordinate. I fill the marker with white, and require each marker to have a size equal to 7.
If we run the cell, we see that the markers clearly highlight each single trading day. We can improve the readability of the plot by reducing the size of the markers, say to 3. Here is the result.
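A sketch of the markers layered on top of the line. The lecture applies circle to the plot; recent Bokeh releases deprecate passing size to circle in favor of scatter, so this example uses scatter with a circle marker as a stand-in. The data and column names are again illustrative.

```python
from datetime import datetime

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows (assumed columns: Date, Close)
source = ColumnDataSource({
    "Date": [datetime(2020, 1, 2), datetime(2020, 1, 3), datetime(2020, 1, 6)],
    "Close": [100.0, 101.5, 99.8],
})

p = figure(x_axis_type="datetime")
p.line(x="Date", y="Close", source=source, line_width=3, color="orange", alpha=0.8)

# One white-filled circle per observation, drawn from the same data source
markers = p.scatter(
    x="Date", y="Close", source=source,
    marker="circle", fill_color="white", size=3,
)
```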
We can also add a legend to our plot. This operation in a bokeh figure is pretty straightforward: we just need to specify a legend_label inside the glyphs - in our case, the line - and we pass a string saying Apple.
Here is the result. The legend is shown on the top right of our plot. That is pretty easy, isn’t it?
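Adding the legend entry might look like this sketch, with the same made-up data as a stand-in for the Apple data frame.

```python
from datetime import datetime

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows (assumed columns: Date, Close)
source = ColumnDataSource({
    "Date": [datetime(2020, 1, 2), datetime(2020, 1, 3), datetime(2020, 1, 6)],
    "Close": [100.0, 101.5, 99.8],
})

p = figure(x_axis_type="datetime")
# legend_label on the glyph is enough to create the legend
p.line(x="Date", y="Close", source=source, legend_label="Apple")
```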
Legends are useful when we have multiple plots. So let us plot our three datasets in just one figure.
In lecture 5, we’ll cover different methods to deal with multiple plots. Here, we loop over all possible stocks - in our case APPLE, FACEBOOK and GOOGLE - and basically we will create a line identified by a distinct color for each single stock.
To do so, we employ bokeh.palettes, and we choose the Colorblind palette.
Therefore, from bokeh palettes we import the Colorblind3 palette: this is very useful since it allows us to assign different colors to different stocks in the legend.
What I want to do now is to apply this palette to every single stock so that each one will be identified by a distinct color. This is done for you in the following snippet.
We create a new object called new_plot by calling the figure function with the usual settings. We then loop over the map_stocks_df dictionary, which maps each stock name to its associated data frame: its keys and values are bound to the variables name and data in the loop, and each element is assigned a color from the Colorblind palette.
This allows us to produce a line for each single element inside the new plot object. We then show the result here.
If you do not remember the structure of map_stocks_df, here it is. Ok, let’s run this cell and show the result.
We can specify the legend location by setting the location attribute of the plot's legend attribute to top_left. Now you see that we have moved the legend from the top right to the top left.
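The loop and the legend placement above can be sketched as follows. The contents of map_stocks_df are made up, and pairing each stock with a color via zip is our assumption about how the lecture's snippet assigns colors; the transcript does not show the exact mechanism.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Colorblind3
from bokeh.plotting import figure

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
# Made-up prices: stock name -> data frame with Date and Close columns
map_stocks_df = {
    "APPLE": pd.DataFrame({"Date": dates, "Close": [100.0, 101.5, 99.8]}),
    "FACEBOOK": pd.DataFrame({"Date": dates, "Close": [200.0, 198.0, 203.0]}),
    "GOOGLE": pd.DataFrame({"Date": dates, "Close": [1300.0, 1310.0, 1295.0]}),
}

new_plot = figure(x_axis_type="datetime")

# One line per stock, each with its own palette color and legend entry
for (name, data), color in zip(map_stocks_df.items(), Colorblind3):
    new_plot.line(
        x="Date", y="Close", source=ColumnDataSource(data),
        color=color, legend_label=name,
    )

# Move the legend from its default top-right corner
new_plot.legend.location = "top_left"
```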
One possible improvement we can make is as follows: you see that the three series have different scales and magnitudes. To make the plot more readable, we can reshape the data so that the stocks will be free of the scale effect.
We first create a function called daily change that is applied to each single row in the data frame and returns the percentage change in price for each stock versus the previous observation.
We then apply the same logic we saw before to create a new column called Returns and we apply the daily change to each element in the Close column.
For simplicity, I am going to copy and paste this snippet and fix it to satisfy our needs: indeed, we apply the daily change method for each single observation in the close column and we create a new column called returns in the data frame.
Now the data has this new column called returns and we can plot the returns by simply changing the y coordinate for each single line call. Therefore, we copy the above snippet and instead of plotting the close column for the y coordinate, I am gonna plot the return. Here is the result.
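The lecture's own daily change helper is not shown in the transcript, so the sketch below swaps in pandas' built-in pct_change as a stand-in. It returns the change as a fraction (0.10 rather than 10), and the first row has no previous observation, so its value is missing. The prices are made up.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Close": [100.0, 110.0, 99.0],
})

# Fractional change versus the previous row; the first row becomes NaN
df["Returns"] = df["Close"].pct_change()

# Same plotting call as before, with Returns as the y coordinate
p = figure(x_axis_type="datetime")
line = p.line(x="Date", y="Returns", source=ColumnDataSource(df))
```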
What we see is that the plots are overlapping now because the scale has changed. We can therefore use a line width equal to 1 to make the plot more readable. Also note that we can fill the background of the legend box with any color: in this case, we’ll use the light blue color. To do so, we set the legend.background_fill_color attribute as equal to light blue for the new_plot object.
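A sketch of the thinner lines and the colored legend box. The returns are made-up numbers, and "lightblue" is the named color we assume corresponds to the light blue mentioned above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Colorblind3
from bokeh.plotting import figure

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
# Made-up returns: stock name -> data frame with Date and Returns columns
map_stocks_df = {
    "APPLE": pd.DataFrame({"Date": dates, "Returns": [0.000, 0.015, -0.017]}),
    "FACEBOOK": pd.DataFrame({"Date": dates, "Returns": [0.000, -0.010, 0.025]}),
    "GOOGLE": pd.DataFrame({"Date": dates, "Returns": [0.000, 0.008, -0.011]}),
}

new_plot = figure(x_axis_type="datetime")
for (name, data), color in zip(map_stocks_df.items(), Colorblind3):
    new_plot.line(
        x="Date", y="Returns", source=ColumnDataSource(data),
        color=color, line_width=1, legend_label=name,
    )

# Fill the legend box background
new_plot.legend.background_fill_color = "lightblue"
```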
Now, the coolest thing about Bokeh is that it allows us to generate dynamic plots, and legends are a good example. Indeed, it is possible to dynamically select which series to show inside the figure by muting a series directly via the legend box.
To do so, we need to add a couple of extra parameters inside the line call, and without any effort, our plot will shock our audience! Those parameters are: muted_color and muted_alpha.
Let’s write the code more readably by putting each argument on a new line. In particular, muted_color is set equal to the color coming from the palette, i.e. the color variable, and muted_alpha specifies the transparency we wish to have when we disable an item from the legend - in this case, let’s set it to 0.2.
We then set the legend.click_policy attribute equal to "mute". This is really what we need to activate this feature inside our plot. We run the cell and we get the following output. Now you can navigate the new output legend and you have interactivity: for instance, you can disable the apple series just by clicking on it - and you can choose to reactivate it by clicking on it again.
If you want to look at just the Apple series, all you need to do is deactivate both the Facebook and Google series from the legend, and just the blue series will be shown. That is really cool, isn't it?
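Putting the muting pieces together, a sketch under the same assumptions as before: made-up returns, and zip used to pair each stock with a palette color.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Colorblind3
from bokeh.plotting import figure

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
# Made-up returns: stock name -> data frame with Date and Returns columns
map_stocks_df = {
    "APPLE": pd.DataFrame({"Date": dates, "Returns": [0.000, 0.015, -0.017]}),
    "FACEBOOK": pd.DataFrame({"Date": dates, "Returns": [0.000, -0.010, 0.025]}),
    "GOOGLE": pd.DataFrame({"Date": dates, "Returns": [0.000, 0.008, -0.011]}),
}

new_plot = figure(x_axis_type="datetime")
for (name, data), color in zip(map_stocks_df.items(), Colorblind3):
    new_plot.line(
        x="Date",
        y="Returns",
        source=ColumnDataSource(data),
        color=color,
        line_width=1,
        legend_label=name,
        muted_color=color,  # keep the same color when the series is muted
        muted_alpha=0.2,    # but draw it almost transparent
    )

# Clicking a legend entry now mutes or unmutes the matching series
new_plot.legend.click_policy = "mute"
```

With click_policy set to "mute", a clicked series is faded rather than removed; Bokeh also accepts "hide" if the series should disappear entirely.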
That concludes this lecture. In the next lecture, we are going to cover how to leverage the bokeh object with Inspectors. See you there!
Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.
He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.