One of the best ways to learn new programming languages and concepts is to build something. Learning the syntax is always just the first step. After learning the syntax, the question that arises tends to be: what should I build? Finding a project to build can be challenging if you don’t already have some problems in mind to solve.
Throughout this course, we’re going to learn more about Python 3 by building a data ingestion process. We’re going to go from setting up a development VM through to deploying the app to a 16-core cloud VM where we can test it. The application is going to allow us to submit articles to a front-end HTTP endpoint, where they’ll be enqueued onto a multiprocessing queue. On the back end, a set of worker processes will dequeue the articles, extract named entities from each article, and enqueue the results to be saved. A set of saver processes will dequeue the results and save the records to Cloud Firestore. The front end and back end will be run together using supervisord, and we’ll use setuptools to create a software distribution for deploying the app.
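The flow described above, where articles are enqueued, worker processes extract entities, and saver processes store the results, can be sketched with the standard library's multiprocessing module. This is a minimal illustration under several assumptions, not the course's implementation: extract_entities is a hypothetical stand-in for spaCy that simply treats capitalized words as entities, the savers push records onto a results queue instead of writing to Cloud Firestore, and None is used as a shutdown sentinel.

```python
import multiprocessing as mp
from typing import List, Tuple

SENTINEL = None  # a consumer that reads this value exits its loop


def extract_entities(text: str) -> List[str]:
    """Hypothetical stand-in for spaCy NER: treat capitalized words as entities."""
    return [word.strip(".,!?") for word in text.split() if word[:1].isupper()]


def worker(in_queue, out_queue) -> None:
    # Dequeue articles until the sentinel arrives; enqueue (text, entities) results.
    for text in iter(in_queue.get, SENTINEL):
        out_queue.put((text, extract_entities(text)))


def saver(out_queue, store) -> None:
    # Dequeue results and "save" them; the course writes to Cloud Firestore instead.
    for record in iter(out_queue.get, SENTINEL):
        store.put(record)


def run_pipeline(
    texts: List[str], n_workers: int = 2, n_savers: int = 2
) -> List[Tuple[str, List[str]]]:
    ctx = mp.get_context("fork")  # POSIX-only; see the note below
    in_queue, out_queue, store = ctx.Queue(), ctx.Queue(), ctx.Queue()
    workers = [ctx.Process(target=worker, args=(in_queue, out_queue)) for _ in range(n_workers)]
    savers = [ctx.Process(target=saver, args=(out_queue, store)) for _ in range(n_savers)]
    for proc in workers + savers:
        proc.start()

    for text in texts:  # the front end's role: enqueue each submitted article
        in_queue.put(text)
    for _ in workers:  # one sentinel per worker so every worker exits
        in_queue.put(SENTINEL)
    for proc in workers:
        proc.join()

    for _ in savers:  # workers are done, so the savers can be told to stop
        out_queue.put(SENTINEL)
    # Drain before joining: a process that put items on a queue won't exit
    # until those items have been flushed to the underlying pipe.
    results = [store.get() for _ in texts]
    for proc in savers:
        proc.join()
    return results
```

The sketch requests the fork start method so it can run as a plain script without a main guard. Fork isn't available on Windows and isn't the default on macOS; with the default spawn method, you would drop the explicit context and call run_pipeline from under an if __name__ == "__main__": block.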
This course is broken up into sprints to give you a real-world development experience and to guide you through each step of building an application with Python.
The source code for the course is available on GitHub.
If you have any feedback relating to this course, feel free to contact us at firstname.lastname@example.org.
- Configure a local development environment for an app using a VM
- Implement a data processor that can accept text, extract named entities, and return the results
- Implement a multiprocess-aware message queue and test it with Pytest
- Create data models to use as messages to pass on the message queue
- Create the backend for the application
- Create a web endpoint that we can use to enqueue our post models
- Implement a method for running the frontend and backend together
- Run the application using a dataset to see how the system performs under actual server load
This course is intended for software developers or anyone who already has some experience in building simple apps in Python and wants to move on to something more complex.
To get the most out of this course, you should be familiar with Python, ideally Python 3, and have some knowledge of Linux and how to use Git.
Hello and welcome to Sprint Seven! When we implemented the backend, we used setuptools to create an entrypoint that we can use to run the backend. When we implemented the frontend, we used uvicorn as the entrypoint for our frontend. The frontend and backend are two parts of one holistic system, and they're intended to be run together, so we need a mechanism that lets them run as a single entity. Our goal for this sprint is to implement a method for running the front and backends together.
We're going to run our frontend and backend together as a Linux daemon. Linux daemons are processes that run in the background and are not attached to an interactive shell, which means they keep running even after we log out of the system. PEP 3143 spells out the specific steps involved in turning a Python program into a well-behaved daemon. Now, we could implement this ourselves in code; however, there are already tools that allow us to daemonize an existing process. The solution I chose for this was a library called Supervisor. Supervisor is a process control system. It's not a substitute for init; rather, it's a tool that lets us control the starting, stopping, and restarting of our processes.
Supervisor is broken into two components: the daemon and the controller. The daemon reads a configuration file that we create, which tells it how to start our processes, and the controller allows us to interact with that daemon to control those processes. We're going to use two configuration files. One is going to serve as our local development version; the other is going to be the version that we use in production. The local version is going to be called supervisord.local.conf, and I'm going to paste in the completed configuration. This is a standard INI-style file. On Linux systems, it's common for daemons to have the letter d added to the end of the process name, so when you see supervisord, it's referring to the daemon.
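Because the configuration is plain INI, a rough stand-in for a file like supervisord.local.conf can be generated and checked with Python's configparser. Treat this as a sketch: the paths, the socket location, the stopwaitsecs value, and the frontend:app module path passed to gunicorn are hypothetical placeholders rather than the course's actual values. The rpcinterface:supervisor section is an addition not mentioned in the walkthrough; Supervisor's sample configuration notes that it must be present for supervisorctl to work.

```python
import configparser
import io


def render_supervisord_conf(
    bin_dir: str,
    frontend_workers: int,
    asgi_app: str = "frontend:app",  # hypothetical module:variable for the FastAPI app
    socket_path: str = "/tmp/supervisor.sock",  # hypothetical socket location
) -> str:
    """Return the text of a supervisord-style INI file (all values are placeholders)."""
    # interpolation=None keeps configparser from interpreting '%' itself;
    # supervisord does its own %(name)s expansion when it reads the file.
    conf = configparser.ConfigParser(interpolation=None)
    # Settings for the daemon itself.
    conf["supervisord"] = {"logfile": "/tmp/supervisord.log", "loglevel": "info"}
    # HTTP server on a Unix domain socket for interacting with the daemon.
    conf["unix_http_server"] = {"file": socket_path}
    # RPC interface that supervisorctl relies on.
    conf["rpcinterface:supervisor"] = {
        "supervisor.rpcinterface_factory": "supervisor.rpcinterface:make_main_rpcinterface"
    }
    # Where the controller should find the daemon.
    conf["supervisorctl"] = {"serverurl": f"unix://{socket_path}"}
    # Lower priority starts first and stops last, so the backend is up before the frontend.
    conf["program:backend"] = {
        "priority": "1",
        "command": f"{bin_dir}/ingestiond",
        "stopwaitsecs": "30",
    }
    conf["program:frontend"] = {
        "priority": "2",
        "command": (
            f"{bin_dir}/gunicorn -k uvicorn.workers.UvicornWorker "
            f"-w {frontend_workers} {asgi_app}"
        ),
        "stopwaitsecs": "30",
    }
    buffer = io.StringIO()
    conf.write(buffer)
    return buffer.getvalue()
```

Writing the returned text to a file and starting it with supervisord -c would follow the same steps as in the walkthrough. A production variant would mainly change bin_dir and the worker settings; the backend's worker options aren't modeled here because the walkthrough doesn't show ingestiond's command-line flags.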
When you see supervisorctl, as in control, it's our controller. This first section, the supervisord section, holds settings for the daemon itself; we're just going to set the log level to info and specify a log file. The unix_http_server section has supervisord start an HTTP server listening on a Unix socket, which gives us an interface for interacting with the processes. The supervisorctl section configures the URL the controller uses to reach supervisord, and in this case, it's using that Unix socket. The next two sections, the program sections, are where we define the programs that should be started. The priority property determines the startup and shutdown order when we're running things as a group. Command specifies the app that we want to run. The stopwaitsecs setting specifies how many seconds supervisord waits, after asking a program to stop, before it kills the parent process. We're using the ingestiond entrypoint for our backend process, and we're using gunicorn for our frontend.
Recall that when we created the frontend, we used uvicorn to run the app. Uvicorn can be paired with Gunicorn: Gunicorn is a production-quality server and process manager, and it's able to use uvicorn workers, which will run our frontend application. So we do need to install gunicorn. To test that the frontend works, we can run this command here, and we can see that it's starting up our application. If we run the command here for the backend, we can see it's displaying the logs. So the commands are working as expected.
Now, let's install supervisor with pip. If we attempt to run supervisor, say by passing in the help flag, notice the shell reports that the command isn't found, because the package doesn't install a command called supervisor. Recall that Supervisor is made up of the daemon and the controller. To run the daemon, we call supervisord and use the -c flag to specify our configuration. Notice it returns without error, but also without any real indication of what just happened.
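To make the stopwaitsecs behavior concrete: when supervisord is asked to stop a program, it sends the program's stopsignal (TERM by default) and, if the process hasn't exited within stopwaitsecs, follows up with SIGKILL. A process that wants to shut down cleanly therefore needs to notice the signal and finish up inside that window. The sketch below is a hypothetical worker loop, not the course's ingestiond code; StopFlag and run_worker are illustrative names, and the "work" is just upper-casing strings.

```python
import queue
import signal
from typing import List


class StopFlag:
    """Flips to True when the process receives SIGTERM, supervisord's default stop signal."""

    def __init__(self) -> None:
        self.requested = False
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: record the request and let the loop wind down.
        self.requested = True


def run_worker(inbox: queue.Queue, stop: StopFlag) -> List[str]:
    """Process items until a stop is requested, always finishing the current item."""
    done: List[str] = []
    while not stop.requested:
        try:
            # A short timeout makes the loop re-check the flag regularly,
            # well inside a stopwaitsecs window (10 seconds by default).
            item = inbox.get(timeout=0.1)
        except queue.Empty:
            continue
        done.append(item.upper())  # stand-in for real work such as entity extraction
    return done
```

Items still waiting in the queue when the flag flips are simply left behind in this sketch; a real worker would need to decide whether to drain them or hand them off before the window closes.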
The job of the daemon is to run our programs in the background. Using htop, we can see our processes; let's filter for, say, Python 3.8. If we switch to tree view by pressing T, notice that supervisord is the parent process of the backend, which is this ingestiond process here. It also has children of its own: our worker and saver processes, as well as processes for the queues. This gunicorn process is also a child of supervisord. Supervisord owns these processes, and we can now use the controller to interact with them.
The controller's command is supervisorctl, and we do need to specify the configuration file. We can pass the subcommands we want to run directly, or we can use the controller interactively. Calling status, we can see our two programs along with their status, their PIDs, and their uptime. Running the controller without any arguments jumps us into interactive mode. There we can act on multiple programs, for example by calling status all, or we can specify the front or backend. Let's exit out of this and check our logs. In my testing, I found that everything seems to be written to standard error, so we're going to check out the standard error log, and if we tail the last few lines, we can see the logs from our backend.
Let's stop everything using stop all. Our processes are now stopped. We can start them up again by calling start all, and by calling status, we can see they're both up and running. Checking the logs again, we can see that the backend is indeed running and spaCy has loaded. With this running, we can try to submit a post through FastAPI's documentation page. First, we need to authenticate, so let's grab that API key, paste it in, and test out this endpoint. I'm just going to type in something that has a few entities it can extract, submit it, and now, if we check the logs, notice there are no messages in the logs yet.
Now, that's because our cache size is set to 10. So let's submit this same message nine more times, and if we go back to the logs, we can see that it extracted John, Apple, and Microsoft 10 times. Okay, we can have the controller shut down the daemon by calling shutdown, and tailing the logs again shows us that the backend has shut down.
Okay, so everything seems to be working. Let's create a production version of this configuration file. We're going to call it supervisord.conf, and I'm going to paste in the result here. The only real differences between this and the local version are the paths to our commands and the settings we use for our front and backend worker processes. In production, we're going to use a 16-core server, so we'll have the backend create 14 worker processes and 15 saver processes. The frontend is going to run four worker processes, which gives us four frontend processes, each connecting to the input queue and enqueueing Posts.
Okay, our goal for this sprint was to get the front and backend components running together as a holistic system, and by using Supervisor, we were able to accomplish that. In our next sprint, we're going to start deploying this to that 16-core server, a more production-ready environment, so that we can start to test out this code. So when you're ready to get this thing deployed, I'll see you in the next sprint!
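The empty logs after the first submission in this sprint come down to batching: records are held in a cache and only written, and logged, once the cache is full. A minimal sketch of that idea, assuming the cache lives in the saver and that flush_fn stands in for the Firestore write; BatchingSaver and its parameters are hypothetical names, not the course's classes.

```python
from typing import Callable, List


class BatchingSaver:
    """Buffer records and hand them to flush_fn once cache_size have accumulated."""

    def __init__(self, flush_fn: Callable[[List[dict]], None], cache_size: int = 10) -> None:
        self.flush_fn = flush_fn  # hypothetical stand-in for a batched database write
        self.cache_size = cache_size
        self._cache: List[dict] = []

    def add(self, record: dict) -> None:
        self._cache.append(record)
        # Nothing is written (or logged) until the cache reaches its size limit.
        if len(self._cache) >= self.cache_size:
            self.flush()

    def flush(self) -> None:
        # Also worth calling on shutdown so a partially filled cache isn't lost.
        if self._cache:
            self.flush_fn(self._cache)
            self._cache = []  # start a fresh list; the flushed one is left untouched
```

With cache_size=10, the first nine records sit in memory and the tenth triggers a single flush of all ten, which matches the behavior seen in the logs. Calling flush() when the process is asked to stop, for example from a SIGTERM handler, keeps a partially filled cache from being lost.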
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.