Hands-on Lab

Deploying Large Language Models Using Ray Serve

Up to 1h 30m
Lab description

Ray Serve is a framework for deploying and serving machine learning and large language model (LLM) inference workloads. It is designed to scale, support complex multi-model workflows, and efficiently utilize costly resources such as GPUs. The Phi-3 model released by Microsoft is a capable LLM that has been optimized to run on CPUs and in low-memory environments.

Learning how to use Ray Serve to deploy a large language model will benefit anyone who works with machine learning models and wants to run them in a production environment.

In this hands-on lab, you will use a development environment to implement a Ray Serve deployment, and you will run your deployment on a virtual machine.
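A deployment of this kind can be sketched as follows. This is a minimal illustration, not the lab's actual code: `PhiDeployment`, `load_model`, and the request shape are assumptions, and model loading is reduced to a placeholder. The chat-template helper reflects Phi-3's documented prompt format.

```python
def format_phi3_prompt(user_message: str) -> str:
    """Wrap a user message in Phi-3's chat template."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"


try:
    # Requires: pip install "ray[serve]"
    from ray import serve
    from starlette.requests import Request

    @serve.deployment(ray_actor_options={"num_cpus": 2})
    class PhiDeployment:
        def __init__(self):
            # Placeholder: the lab loads a Phi-3 model here (e.g. with a
            # CPU-friendly runtime); load_model is a hypothetical helper.
            self.model = load_model()

        async def __call__(self, request: Request) -> dict:
            body = await request.json()
            prompt = format_phi3_prompt(body.get("prompt", ""))
            return {"response": self.model(prompt)}

    app = PhiDeployment.bind()
    # serve.run(app)  # would start Serve locally and expose HTTP on port 8000
except ImportError:
    serve = None  # Ray not installed; the sketch above still shows the shape
```

Once a deployment like this is running, it can be queried over HTTP with a JSON body containing a `prompt` field.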

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

  • Implement a Ray Serve deployment that allows you to interact with a large language model
  • Test your deployment on a virtual machine
  • Deploy your Ray Serve deployment to a Ray cluster

Intended audience

  • Anyone looking to learn about deploying machine learning models
  • Cloud Architects
  • Data Engineers
  • DevOps Engineers
  • Machine Learning Engineers
  • Software Engineers


Familiarity with the following will be beneficial but is not required:

  • Large Language Models
  • The Python programming language


Environment before

Environment after

About the author

Andrew is a Labs Developer with previous experience in the internet service provider, audio streaming, and cryptocurrency industries. He has also been a DevOps Engineer and enjoys working with CI/CD and Kubernetes.

He holds multiple AWS certifications including Solutions Architect Associate and Professional.

Lab steps
Implementing a Ray LLM Deployment
Logging In to the Amazon Web Services Console
Connecting to the Virtual Machine Using EC2 Instance Connect
Manually Running Your Deployment