This course is the second course in a two-part series on how to build an application in Python. In the first course, we built a data ingestion process that extracted named entities from articles across a few different publications. We extracted named entities from around 100,000 articles and we saved the results into Cloud Firestore. In this second course, we'll explore the codebase for a web application used to visualize those results.
We'll kick off the course by checking out some quality of life changes implemented while developing this app. That includes a custom bash theme, a replacement debugger, a debugger command for starting an IPython shell, and pytest plugins. After that, we're going to review the data access layer and its accompanying tests. That's going to include multiple implementations of each data access service. Then we'll check out Python's web application standard.
Next, we'll review the web application layer and its accompanying tests. That's going to include a fast web application framework, custom middleware, request hooks, and application configuration. After that, we're going to review the presentation layer, including a Vue.js app and Materialize CSS. Finally, we're going to run the app locally and trace some requests through the application using the debugger.
If you have any feedback relating to this course, feel free to contact us at support@cloudacademy.com.
Learning Objectives
- Implement a few developer quality of life changes
- Implement a testable data access layer
- Understand how a Python web app operates
- Understand how to build and test a more complex web app
- Understand how to use ipdb and IPython
- Enhance your knowledge of the Python programming language
Intended Audience
This course is intended for software developers or anyone who wants to learn more about building apps with Python.
Prerequisites
- Before taking this course, please make sure you have taken the first course in this two-part series: Building a Python Application: Course One
- You should also have an understanding of Python 3, Linux CLI, HTML/JS, and Git
Resources
The source code for the course is available on GitHub.
Hello and welcome. In this lesson, we're going to explore the web application layer. Recall that the WSGI server expects the application to be callable. With Falcon, this is done by creating an instance of falcon.API. When building a web application, we often want some bit of functionality to run at different points during a request's lifecycle.
Falcon allows us to configure middleware that accomplishes this by being called at one of three phases: before routing the request to a resource, after routing the request to a resource, and before the response is returned. The middleware in this application runs before returning the response. Its responsibility is to set the cross-origin resource sharing (CORS) headers, so it is evaluated for every response.
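As a rough sketch of what a response-phase middleware can look like, here is a small class that follows Falcon's process_response convention. The class name, the allow_origin parameter, and the FakeResponse stand-in are assumptions made for illustration; they are not taken from the course repository.

```python
class CORSMiddleware:
    """Illustrative middleware that sets a CORS header on every response."""

    def __init__(self, allow_origin="*"):
        # Assumed default: allow any origin.
        self.allow_origin = allow_origin

    def process_response(self, req, resp, resource, req_succeeded):
        # Falcon calls this hook just before the response is returned.
        resp.set_header("Access-Control-Allow-Origin", self.allow_origin)


class FakeResponse:
    """Hypothetical stand-in for falcon.Response, used only for this demo."""

    def __init__(self):
        self.headers = {}

    def set_header(self, name, value):
        self.headers[name] = value


resp = FakeResponse()
CORSMiddleware().process_response(req=None, resp=resp, resource=None, req_succeeded=True)
print(resp.headers)
```

In a real application, an instance of a class like this would be handed to Falcon when the app is created, for example falcon.API(middleware=[CORSMiddleware()]).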
These resources here are responsible for handling the requests. Falcon allows us to map a URL to a resource. Falcon resources are just plain old Python classes with request handlers for our HTTP verbs. Routes can include parameters such as this pub value. And this allows us to extract values from the URL path and pass those values to our request function.
This function here returns the API instance with the routes configured. We have three resources, one for publications, one for entity frequencies, and one for word clouds. Let's check them out. The publications resource accepts a data_storage object and a bucket name.
Falcon is automatically going to map our request to this on_get method here. The implementation for this is pretty minimal because all of the heavy lifting happens in the data access layer. It attempts to get the publications using our data access layer's publications method, and it passes in a bucket name which is used to generate the image URL.
We convert all of the publication models into dictionaries and set the response's media property to this list. This is going to return the list of dictionaries as JSON. The frequencies resource is similar. The key difference is that on_get here accepts a publication. Recall that the word_counts method of the data source requires a publication.
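To make the shape of the publications resource concrete, here is a minimal sketch written as plain Python so it runs without Falcon installed. The Publication fields, the example URL, and the NoOpDataStorage body are assumptions; only the publications method name, the bucket name argument, and the on_get and media conventions come from the lesson.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass
class Publication:
    """Hypothetical model; the real field names may differ."""
    name: str
    image_url: str


class NoOpDataStorage:
    """Stand-in data access service that returns 10 canned publications."""

    def publications(self, bucket_name):
        return [
            Publication(f"pub-{i}", f"https://storage.example.com/{bucket_name}/pub-{i}.png")
            for i in range(10)
        ]


class PublicationsResource:
    def __init__(self, data_storage, bucket_name):
        self.data_storage = data_storage
        self.bucket_name = bucket_name

    def on_get(self, req, resp):
        # Falcon routes GET requests for the mapped URL to this method.
        publications = self.data_storage.publications(self.bucket_name)
        # Assigning a list of dicts to resp.media lets Falcon serialize it as JSON.
        resp.media = [asdict(p) for p in publications]


# Exercise the handler with a minimal fake response object.
resp = SimpleNamespace(media=None)
PublicationsResource(NoOpDataStorage(), "example-bucket").on_get(req=None, resp=resp)
print(len(resp.media), resp.media[0])
```

Keeping the handler this thin means the interesting behavior lives in the data access layer, where it can be tested without any HTTP machinery.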
This value comes from the URL mapping. Falcon extracts the value from the URL path and passes it to our function for us. Using the get_param method of the request, we can get the URL query parameters, and there are also data-type-specific variants, such as get_param_as_int.
So we pass the parameters to the word_counts method, convert the results to dictionaries, and set our media property. The word cloud resource here accepts a blob_storage object, a data_storage object, and a bucket name.
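Here is a hedged sketch of how a frequencies handler might combine the path value with query parameters. The parameter names count and checkpoint, their defaults, and the word_counts keyword arguments are assumptions; FakeRequest and RecordingStorage are stand-ins that exist only so the example runs without Falcon.

```python
from types import SimpleNamespace


class FrequenciesResource:
    def __init__(self, data_storage):
        self.data_storage = data_storage

    def on_get(self, req, resp, pub):
        # `pub` is filled in from the URL template, e.g. "/frequencies/{pub}".
        count = req.get_param_as_int("count", default=10)  # hypothetical name
        checkpoint = req.get_param_as_int("checkpoint", default=None)  # hypothetical name
        records = self.data_storage.word_counts(pub, count=count, checkpoint=checkpoint)
        resp.media = [dict(r) for r in records]


class RecordingStorage:
    """Fake data access service that remembers how it was called."""

    def word_counts(self, publication, count, checkpoint):
        self.called_with = (publication, count, checkpoint)
        return [{"word": "python", "count": 3}]


class FakeRequest:
    """Stand-in for falcon.Request exposing only get_param_as_int."""

    def __init__(self, params):
        self.params = params

    def get_param_as_int(self, name, default=None):
        value = self.params.get(name)
        return default if value is None else int(value)


storage = RecordingStorage()
resp = SimpleNamespace(media=None)
FrequenciesResource(storage).on_get(FakeRequest({"checkpoint": "1"}), resp, pub="example-pub")
print(storage.called_with)
```

The path value arrives as an ordinary keyword argument, while the query parameters are read from the request and converted to integers before they reach the data access layer.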
Notice that rather than on_get, we have an on_post method, which maps to an HTTP POST request. We'll circle back on this decorator in a moment. We loop over all of our publications, then we get the top 5,000 entities for each publication and we pass the frequencies to the generate_word_cloud function, which by default returns the image as bytes. Then we save the image to blob storage.
This resource here is intended to be used by developers to generate the images; it is not meant to be run by just anyone. So this decorator here helps to ensure that only requests containing this specific token are allowed. This function will be called before on_post gets called, and it'll check the token against this hard-coded value. If that doesn't match, we raise an error. Otherwise, on_post will get called as normal.
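The following is a simplified, plain-Python sketch of the idea behind that decorator. In Falcon itself this would be falcon.before with a hook function, and the error would be something like falcon.HTTPForbidden; here, before and Forbidden are hand-written stand-ins, and the token value and the use of the Authorization header are assumptions.

```python
import functools
from types import SimpleNamespace

EXPECTED_TOKEN = "not-a-real-secret"  # hypothetical hard-coded value


class Forbidden(Exception):
    """Stand-in for falcon.HTTPForbidden (HTTP 403)."""


def require_token(req, resp, resource, params):
    # Same argument order Falcon passes to a "before" hook.
    if req.get_header("Authorization") != EXPECTED_TOKEN:
        raise Forbidden("invalid or missing token")


def before(hook):
    """Simplified stand-in for falcon.before: run the hook, then the responder."""
    def decorator(responder):
        @functools.wraps(responder)
        def wrapper(self, req, resp, **params):
            hook(req, resp, self, params)
            return responder(self, req, resp, **params)
        return wrapper
    return decorator


class WordCloudResource:
    @before(require_token)
    def on_post(self, req, resp):
        # Image generation is omitted; only the guarded entry point is shown.
        resp.status = "201 Created"


class FakeRequest:
    """Stand-in for falcon.Request exposing only get_header."""

    def __init__(self, headers):
        self.headers = headers

    def get_header(self, name):
        return self.headers.get(name)


ok = SimpleNamespace(status=None)
WordCloudResource().on_post(FakeRequest({"Authorization": EXPECTED_TOKEN}), ok)
print(ok.status)
```

A request without the expected token never reaches on_post, because the hook raises before the responder is called.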
At the top of the file, we have this bit of code here, which sets our default debugger to be ipdb. We could set this using the PYTHONSTARTUP script variable. However, I wanted to make this visible for the course, so I put it here. This create_app function here doesn't accept any parameters; it's intended to be called by Gunicorn.
So it gets its configuration from environment variables. When we have Gunicorn call this function, it's going to configure this Falcon application and then return it so that Gunicorn can use it as its callable. By default, this function is set to use the NoOp data access layer. Let's check out the tests. When we tested the data access layer, we asserted that, given a specific input, we receive the expected output.
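As a minimal sketch of a parameterless factory that reads its configuration from environment variables, consider the following. The variable names DATA_STORAGE, BLOB_STORAGE, and BUCKET_NAME, the registry dictionaries, and the dictionary returned by _create_app are all assumptions for illustration; the real _create_app would build and return the Falcon application.

```python
import os


class NoOpDataStorage:
    """Returns canned data; needs no external services."""


class NoOpBlobStorage:
    """Pretends to store blobs; needs no external services."""


def _create_app(blob_storage, data_storage, bucket_name):
    # The real function would create the Falcon app and register routes.
    # A dict keeps this sketch free of third-party imports.
    return {
        "blob_storage": blob_storage,
        "data_storage": data_storage,
        "bucket_name": bucket_name,
    }


# Hypothetical registries mapping an environment variable's value to a class.
DATA_STORAGES = {"noop": NoOpDataStorage}
BLOB_STORAGES = {"noop": NoOpBlobStorage}


def create_app():
    """Takes no parameters so a WSGI server can call it directly."""
    data_cls = DATA_STORAGES[os.environ.get("DATA_STORAGE", "noop")]
    blob_cls = BLOB_STORAGES[os.environ.get("BLOB_STORAGE", "noop")]
    bucket_name = os.environ.get("BUCKET_NAME", "example-bucket")
    return _create_app(blob_cls(), data_cls(), bucket_name)
```

With a layout like this, Gunicorn's factory syntax could start the app with something along the lines of gunicorn "package.main:create_app()", where package.main is a placeholder module path.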
Now, it's not much different when testing a web application. However, we do need to consider that input and output for web applications involve interactions over HTTP. Now that means our input needs to be HTTP requests and our output will be in the form of HTTP responses.
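To illustrate the "requests in, responses out" idea without any framework, here is a small standard-library sketch that calls a WSGI application the way a test client would. The app function and the simulate_get helper are hypothetical and written only for this example; Falcon's own test client offers a much richer version of the same idea.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI application that always answers with a JSON list."""
    body = json.dumps([{"name": "example"}]).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def simulate_get(wsgi_app, path):
    """Build a GET request environ, call the app, and capture the response."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = simulate_get(app, "/publications")
print(status, headers["Content-Type"], json.loads(body))
```

The assertions in a web test then target the status, the headers, and the decoded body rather than the return value of an ordinary function.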
So we're not just testing functions, classes and methods directly. Rather, we're going to be interacting with the web application by using a web client. Falcon helps by providing us with a test client. This fixture here is module scoped.
Recall that in previous tests we specified function scoping so that the fixture was run per test. In this case, we only want one application to be created and we're going to use it for all the tests. The test client expects us to provide it with a WSGI application, so we're going to use the create_app function.
Recall that this uses the NoOp data access implementations by default. I like to use NoOp implementations by default, rather than production settings. My reasoning is this: we're going to start this application more often while developing and testing than in production. Granted, it is likely to be up and running longer in production. Here, though, we're talking about starting up the app.
This app is going to be started up every time we run our tests, and it's going to be started whenever we're using it in development. So I like to make sure that we can start this application without needing access to external services. And I find that makes it easier to test and develop.
As an added bonus, I find that it makes developer onboarding easier because they can start working on the code base right away even if they don't have access to those external services. It's been my experience that we're going to need to configure production systems no matter what.
We're going to need to set environment variables and configure file and folder permissions. We'll have to set up service credentials, et cetera. So I prefer to use defaults that reduce the developer effort. Let's see what we're testing for each Falcon resource.
For publications, we're using our Falcon client to submit a GET request to /publications. We assert that the HTTP status code is returned as 200. We check to see if the Access Control header is set to an asterisk. This header value here is set by our CORS middleware and I've specified the default as an asterisk.
We assert that the result here is JSON data by calling the json property, and we further assert that it's a list. Because we're using our NoOp data access, which is hard-coded to return 10 records, we can safely assert that the length is 10. And in this last bit, we assert that these properties exist for the publications object, at least for the first one.
So this test checks the status code, response headers and data type and format of the response data. This test here for get_frequencies is basically the same thing as this last one. Since the frequencies resource accepts URL parameters which are used for paging the results, we basically run the same test as the last two, except we also pass in some URL parameters.
Since we're using the NoOp version with our 10 hard-coded results, by specifying that we want all records after index one, we should get eight records returned. This helps us to understand if our checkpoint is working correctly.
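The arithmetic behind that expectation can be sketched with a small stand-in. This NoOpDataStorage is written for illustration, and the meaning given to checkpoint (return everything after the record at that index) is an assumption inferred from the description above.

```python
class NoOpDataStorage:
    """Returns 10 canned records; the checkpoint semantics are an assumption."""

    def word_counts(self, publication, checkpoint=None):
        records = [{"word": f"word-{i}", "count": 100 - i} for i in range(10)]
        if checkpoint is None:
            return records
        # Assumed meaning: skip every record up to and including this index.
        return records[checkpoint + 1:]


def test_paging_after_index_one():
    results = NoOpDataStorage().word_counts("example-pub", checkpoint=1)
    # Indexes 0 and 1 are skipped, leaving indexes 2 through 9.
    assert len(results) == 8
    assert results[0]["word"] == "word-2"
    return results


test_paging_after_index_one()
```

Because the canned data has a fixed size, the expected count can be asserted exactly, which is what makes the NoOp implementation convenient for this kind of test.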
For our word cloud resource, we're not actually testing the image creation and upload process. This was more a matter of time than anything else. With more time I'd have preferred to make this a bit more efficient and add some tests. However, what we are testing here is that unauthorized requests will result in a 403 status, and that authorized requests result in a 201.
So these tests are gonna help us to ensure that we don't accidentally allow this URL to be unprotected, even if, as we have here, the protection is fairly simplistic due to this basic token. Let's look at the final test here. This test here is meant to verify that when we call create_app, it configures the resources with the correct data access implementations.
When we call create_app, it checks for specific environment variables and based on the results, it determines which implementations to use. At the end, it returns the results of this internal _create_app function. I wanted to be able to test that the implementations selected in create_app are correct, at least for our default state of NoOp. To do that, I'm patching the internal _create_app function.
Python's unittest package offers a lot of great functionality. From the mock module, we can use this patch function to replace existing functionality. This is telling the patch function that we want to replace the _create_app function from the main module of the current package with this version here. The end result of this is that when we run this test, the original _create_app function is no longer going to be the one used; rather, this new version here is going to be used in its place. And that's just for this test.
This new version allows us to loop over the arguments passed in by the create_app function and assert that if we're using the empty environment variables, that we're going to get our NoOp storage classes. The test itself is using monkeypatch to set the environment variables for just this test. pytest gives us access to monkeypatch by specifying it as an argument.
If we run all of these tests, we'll see that they're all passing. I find it valuable to keep in mind that these tests are here to validate our assumptions. We're not developing in order to make some arbitrary tests happy. We're developing tests to prove that the code we write functions the way we think it does.
When done well, these tests become a guardrail keeping us from making breaking changes. However, they can also become an albatross that slows productivity when the tests are written solely for the sake of having tests.
Actively developed code evolves and changes over time. When tests evolve alongside the code base, they tend to keep their value. If code and tests don't evolve together, the result is often that progressively more and more tests are going to be skipped, until eventually no one runs the tests at all, because it's just too much effort to get them up to date.
Okay, so that's our web application layer. It's a Falcon application that interacts with our data access layer to present the publications, word counts, and word cloud images. In our next lesson, we're going to review the front end code. So whenever you're ready, I will see you in the next lesson.
Lectures
- Course Introduction
- Quality of Life for Developers
- What Is It That We're Building?
- Exploring the Data Access Layer
- The Web Server Gateway Interface
- Exploring the Front End Code
- Running the Web App
- Summary / Next Steps
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.