Sprint 8
Difficulty: Advanced
Duration: 1h 57m
Students: 1830
Ratings: 4.2/5
Description

One of the best ways to learn new programming languages and concepts is to build something. Learning the syntax is always just the first step. After learning the syntax, the question that arises tends to be: what should I build? Finding a project to build can be challenging if you don't already have some problems in mind to solve.

Throughout this course, we're going to learn more about Python 3 by building a data ingestion process. We're going to go from setting up a development VM through to deploying the app to a 16-core cloud VM where we can test it. The application is going to allow us to submit articles to a front-end HTTP endpoint, where they'll be enqueued onto a multiprocessing queue. On the back end, a set of worker processes will dequeue the articles, extract named entities from each article, and enqueue the results to be saved. A set of saver processes will dequeue the results and save the records to Cloud Firestore. The front end and back end will run together under supervisord, and we'll use setuptools to create a software distribution for deploying the app.
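To make the shape of that pipeline concrete, here is a stripped-down sketch of the enqueue/worker/saver pattern. It is only an illustration of the idea, not the course's actual code (which is linked below): the entity extraction is faked with a placeholder, whereas the course uses spaCy and saves results to Cloud Firestore.

```python
# Minimal sketch of the queue/worker/saver pattern; not the course's implementation.
import multiprocessing as mp

def worker(inbound: mp.Queue, outbound: mp.Queue) -> None:
    """Dequeue article text, 'extract' entities, enqueue the results."""
    while True:
        article = inbound.get()
        if article is None:          # sentinel: shut down and pass it along
            outbound.put(None)
            break
        # Placeholder for spaCy named-entity extraction.
        entities = [w for w in article.split() if w.istitle()]
        outbound.put((article, entities))

def saver(outbound: mp.Queue) -> None:
    """Dequeue results and persist them (Cloud Firestore in the course)."""
    while True:
        result = outbound.get()
        if result is None:
            break
        print("saving:", result)

if __name__ == "__main__":
    inbound, outbound = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(inbound, outbound)),
             mp.Process(target=saver, args=(outbound,))]
    for p in procs:
        p.start()
    inbound.put("Ada Lovelace wrote about the Analytical Engine")
    inbound.put(None)                # signal shutdown
    for p in procs:
        p.join()
```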

This course is broken up into sprints to give you a real-world development experience, and guide you through each step of building an application with Python.

The source code for the course is available on GitHub.

If you have any feedback relating to this course, feel free to contact us at support@cloudacademy.com.

Learning Objectives

  • Configure a local development environment for an app using a VM
  • Implement a data processor that can accept text, extract named entities, and return the results
  • Implement a multi-process aware message queue and use pytest
  • Create data models to use as messages to pass on the message queue
  • Create the backend for the application
  • Create a web endpoint that we can use to enqueue our post models
  • Implement a method for running the frontend and backend together
  • Run the application using a dataset to see how the system performs under actual server load

Intended Audience

This course is intended for software developers or anyone who already has some experience in building simple apps in Python and wants to move on to something more complex.

Prerequisites

To get the most out of this course, you should be familiar with Python, ideally Python 3, and have some knowledge of Linux and how to use Git.

Transcript

Hello, and welcome to Sprint Eight. We've spent a lot of effort so far getting the system to this point. Our next step is to get it running on a more production-ready server. We're going to use Google Cloud's Compute Engine to start up a virtual machine, so let's do a little preparation.

First, I want to clear out the existing test data from Firestore. I'm leaving it in place for now because I want you to see what that test data looks like. Here's a quick look at the data structure for the final data. We have a document for each publication. We track the number of articles with a count property, and we store the entities in a collection called "Ents". Each entity has its own document. This ID here is the result of using Python's hash function to hash the entity text. The reasoning was that I wanted a consistent method for getting IDs for each entity, but I didn't want to use the text itself and run into any weird issues with specific character sets. Each of these entity documents has a word property that stores the entity, and a count of how many times it was used for the given publication. Let's delete this collection, which will delete all of the test data. Okay, perfect.

Let's start up a virtual machine. I'm going to name it "ingest-0", and I'm going to run it in us-east4. I'll select a 16-core n1-standard machine type, and I want to use Ubuntu 18.04 with a 50-gigabyte SSD. I'm going to use a service account that I created previously. This service account has project editor access. It's far more permissive than it needs to be; however, I didn't want to invest time in locking it down for the demo, since that's not really the focus of the course. Let's jump forward once this is complete, and we'll go from there.

Okay, our virtual machine is up and running, so we now have somewhere to deploy to. However, we still need a deployment plan. I'm going to create a few files here, and then we'll populate them in a moment. First will be bootstrap.sh, next will be build.sh, and finally deploy.sh.

Let's start with bootstrap. When we set up our Vagrant virtual machine, we compiled Python 3.8 and created our virtual environment. We need to do the same for our newly created cloud virtual machine, and this script does just that. It's exactly what we ran previously, with a little extra to make sure it doesn't run twice: once it completes, it creates this file here to signal that the virtual machine has been bootstrapped.

Let's check out our build script, which is responsible for creating a distribution. It starts by installing the dependencies, including our development dependencies, which allows us to run pytest in development and make sure everything's working before we run the build. Once the tests pass, it creates a distribution. Let's fire these off and see how they work. We've seen the tests run before; calling this collects the tests and runs them, and in our case they're all passing. I want to show what happens when we run this setup. To do that, you'll want to watch the explorer here on the side. When we run setup.py and specify the sdist option, setuptools creates a distribution file. Notice this dist directory got created here, and inside we have our application, tarred and gzipped.
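For reference, a minimal setup.py along these lines is what lets the sdist option produce that tarred, gzipped file in dist. The package name and dependency list below are assumptions for illustration only; the course's actual file is in the GitHub repository.

```python
# Hypothetical minimal setup.py; the course's real file will differ.
from setuptools import setup, find_packages

setup(
    name="ingest",                       # assumed package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "spacy",                         # named-entity extraction
        "google-cloud-firestore",        # saving results
    ],
    extras_require={"dev": ["pytest"]},  # development-only dependencies
)
```

Running `python setup.py sdist` against a file like this is what creates the dist directory and the .tar.gz distribution shown in the video.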
The deploy.sh file uses the Google Cloud SDK to configure the virtual machine. It starts by making a directory that we can use to store our code. Then it uses secure copy to copy the new distribution into the directory we just created. It also copies over bootstrap.sh and the supervisord configuration file. Next, it runs bootstrap. Then it installs our distribution using the version of pip from our virtual environment. And finally, it installs the spaCy model.

The complete process for building and deploying is to run our build script, followed by our deploy script. When we run build, we run it from inside our development VM, because that's where the version of Python we've set up is configured. The deployment, however, uses gcloud. My personal preference is to avoid giving virtual machines any sort of credentials, and that includes authenticating to the gcloud SDK. So, while we run the build script from the development VM, I'm going to run the deployment from my host machine, where I've already authenticated gcloud.

In order to deploy, we need to know the name of the virtual machine, which is ingest-0. This is going to bootstrap our virtual machine and install our distribution. And here it is. This is going to allow us to connect in via SSH, and we're going to set the name. It'll take a second to connect, but here we are. Notice we have our visual indicator that we're inside our virtual environment.

If we use htop, we can get a sense of what this system is doing by default. We have 16 cores, roughly 60 gigs of RAM, and there are 32 processes running. None of them are Python 3.8, and that's what we'd expect, because we just installed it. The Python version for our activated environment is indeed Python 3.8, meaning the bootstrap process worked successfully. If we call the back end with the defaults, we can see it loads up our spaCy models, which means our workers are running. So, our deployment process works.

Now, with this ready, we can actually start to test things out, and that's what we're going to do in the next sprint. So, whenever you're ready to start sending some data through the system, I'll see you in the next lesson.
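If you want to verify the spaCy model yourself from the activated virtual environment on the new machine, a quick smoke test might look like the following. This is just a sanity check, not part of the course's deployment; the model name is an assumption, so use whichever spaCy model the deploy script installs.

```python
# Quick check that the installed spaCy model loads and extracts entities.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model name
doc = nlp("Ada Lovelace wrote about the Analytical Engine.")
print([(ent.text, ent.label_) for ent in doc.ents])
```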

Lectures

Course Introduction - Sprint 1 - Sprint 2 - Sprint 3 - Sprint 4 - Sprint 5 - Part One - Sprint 5 - Part Two - Sprint 6 - Sprint 7 - Sprint 9 - Post Mortem

About the Author
Students: 100796
Labs: 37
Courses: 44
Learning Paths: 58

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.