Vault Architecture
Difficulty: Intermediate
Duration: 1h 44m
Students: 1802
Ratings: 4.5/5
Description

HashiCorp Vault provides a simple and effective way to manage security in cloud infrastructure. The HashiCorp Vault service secures, stores, and tightly controls access to tokens, passwords, certificates, API keys, and other secrets in modern computing.

This course will enable you to recognize, explain, and implement the services and functions provided by the HashiCorp Vault service.

Agenda

In this course we learn to recognize and implement the core HashiCorp Vault services in cloud infrastructure. The topics we cover are as follows: 

  • Vault architecture and its core components
  • Vault policies and how they are used to grant or forbid access to operations in Vault
  • Secrets and secret management as performed within Vault
  • Vault cubbyholes and how they can be utilized
  • Vault dynamic secrets
  • Vault authentication and Vault identities

Intended Audience

This course will appeal to anyone looking to extend their knowledge of cloud security best practices, and to learn more about the tools and services available to help manage cloud security. If you are performing any of the roles below, we recommend completing this course. 

  • Architects and Developers
  • System Administrators
  • Security specialists
  • DevOps specialists
  • And anyone else interested in managing and maintaining secrets 

Learning Objectives

At the end of this course you will be able to explain and implement the HashiCorp Vault service, and you will also be able to use the Vault CLI and API to execute tasks related to Vault administration. By completing this course, you will:

  • Understand the core principles of Vault, including how Vault can be used to manage and maintain secrets
  • Understand the key benefits of using Vault, including how to deploy and configure it within your own environments
  • Be able to evaluate and select HashiCorp Vault services
  • Know how to use the Vault CLI and API to execute tasks related to administration and configuration

Prerequisites

We recommend completing the Cloud Academy DevOps Fundamentals Learning Path so you have a basic understanding of system administration and configuration tasks.

 

Transcript

Welcome back!

In this lecture we'll provide a full review of the Vault Architecture, how it has been designed, and how it is expected to be used. Within this lecture we'll cover each of the following. We'll provide a review of current security requirements and the need to manage and maintain secrets in distributed infrastructures. We'll highlight use cases in which Vault can be used. We'll introduce you to the components that make up the Vault architecture. And finally, we'll start to examine the Vault Server itself, how it is initialized and how it generally works.

Traditionally, when an organization launched a new application, it involved operations, development, and network/security teams. All teams had to collaborate and coordinate so that the application could be successfully deployed onto physical infrastructure, which in essence was static and therefore relatively easy to secure. Now, with the adoption of the cloud, and cloud very much the norm for hosting new applications, teams need to manage security and secrets in a dynamic and elastic environment. With this in mind, managing security and secrets becomes harder because the cloud infrastructure dynamically adjusts to optimize itself against the demand profile. Further still, adopting multi-cloud environments introduces more complexity for managing security and secrets.

The primary challenge of multi-cloud adoption is heterogeneity: how can operations, security, and development teams apply a consistent approach to security? In the past security could be managed easily, as the physical infrastructure was a well-known and non-moving part with clear boundaries. Security was applied through the use of appropriate controls applied at various layers. But how does this work in the new world order? First we must step back and re-examine our assumptions about security. The traditional data center had, quote, four walls and a pipe, and a clear network perimeter. Anyone inside the network was assumed to be authorized to access the infrastructure. Firewalls served as bulkheads between front-end, user-facing applications and backend databases. IP addresses were generally static, which allowed security professionals to provide additional constraints on application interactions based on IP address. However, a cloud doesn't have a distinct perimeter, and with multi-cloud, the surface area expands considerably. And because the network topology is software-defined, any server can become Internet-facing with a few API calls. This lack of control over network topologies makes it hard to force all traffic through security or compliance tools. Infrastructure may also span multiple sites, meaning there isn't a single ingress point to allow secured traffic to flow into a network. And the decomposition of monolithic applications into highly ephemeral microservices means that IP addresses are highly dynamic, rendering IP-based security inappropriate for many scenarios. Rather than attempting to recreate the traditional castle and moat approach, security professionals typically focus on addressing the following core requirements. First, distributed secrets management. Application-specific secrets such as database usernames and passwords can become exposed given the lack of network perimeter.

Providing a mechanism for ops and development teams to manage and rotate distributed secrets is a much larger issue in this environment, and is paramount. Second, encryption of data in flight and at rest. Traffic between application components that might reside on different providers and even geographies must be encrypted. Third, identity management. This means authenticating identity between application components, for example through the use of expiring tokens that provide assurance of identity. Before we reveal exactly what Vault excels at, let's quickly reiterate some of the key challenges encountered when managing secrets within a distributed infrastructure. How can we centralize secrets? How can we manage the lifecycle of a secret? How can we audit secret access? How can we manage access to secrets? How can we securely distribute secrets across hybrid environments? And finally, how can we mitigate a compromised secret? Vault solves the challenge of security for distributed infrastructure. It provides multiple layers of security that are independent of the network.

Vault provides secrets management, encryption as a service, and privileged access management. Security operators use Vault to manage secrets centrally, such as private encryption keys, API tokens, and database credentials. Vault will store and manage the distribution of these secrets to applications and end users. Having considered the challenges involved with managing secrets, in combination with keeping modern infrastructures secure, let's now review the key objectives that Vault was created to accomplish. Vault has three key objectives. First, Vault is designed to be used as a single source of secrets for both human and machine actors, eliminating secret sprawl. Second, Vault is designed to scale to meet the needs of the largest organizations. Secrets are encrypted both at rest and in transit. Third, Vault is designed to provide full secret lifecycle management and governance. Secrets management includes renewals, revocation, and the ability to manage leases for secrets. Vault can be used in a variety of use cases, some of which are presented here. We'll now cover a few of the key Vault workflows and configuration processes. In doing so you should begin to understand how Vault is designed to work, and how it should be configured and used.

As we'll see, at its core, Vault has an API through which all interactions take place. A key feature of Vault is the ability to establish access control for the various operations that can be performed within Vault. As you will see later on, every operation within Vault is path based, and access control is specified against these paths through the use of policies. For example, Vault can be configured with a policy which safeguards access and secret distribution to particular applications and infrastructure. Policies provide a declarative way to either grant or deny access to operations and paths. Typically someone within your security team will create the rules and permissions that will dictate who can do what within Vault. The policies are created using a declarative syntax written in either HCL, short for HashiCorp Configuration Language, or JSON. The policies are then uploaded and applied to Vault. We'll cover policies in greater detail in a separate Policies lecture later on in the course.

Before any operations are performed within Vault, the actor, human or machine, must authenticate to Vault. Vault provides a flexible authentication system, whereby different authentication backends can be configured to establish the identity of the client or user in question.

Example authentication backends that can be configured are: AWS, GitHub, GCP, Kubernetes, LDAP, and Okta. Having successfully completed the authentication sequence, Vault will return a token, which itself will be mapped to a set of Vault policies controlling which operations can be performed when the token is presented back to Vault.

Secrets management and workflow within Vault is performed through the use of a secrets management backend. Vault provides several different backends which can be enabled or disabled based on your requirements. Example secrets backends are: AWS, TOTP or Time-based One-Time Password Algorithm, Key/Value, and databases. Custom secrets backends, such as a Kerberos plugin, can be built and configured via a plugin system. Secrets can be stored and generated. Some secrets can be generated dynamically, while others are stored verbatim. Secrets are returned to the user or client in accordance with any defined and applicable policies. Working with secrets requires the presentation of an authentication token as discussed in the previous slide.

Vault's philosophy is that everything should be encrypted whenever possible. So Vault uses ciphertext wrapping to encrypt all data at rest and in flight. This minimizes exposure of secrets and sensitive information. The Vault server or service provides all of the features we have previously discussed.

In this section we will begin to examine the Vault server itself. The Vault server provides an HTTP API which clients interact with, and manages the interaction between all the backends, ACL enforcement, and secret lease revocation. When the Vault server starts, the HTTP API is exposed so that clients can begin to interact with it. The Vault server requires a storage backend so that the encrypted data is durable and persists across restarts.

The Vault server can be started up in dev mode, which is useful for development, testing, and exploration. The dev server stores all its data in memory but still encrypted, listens on localhost without TLS, and automatically unseals and shows you the unseal key and root token. For obvious reasons, the dev mode configuration should never be used in production. To start the Vault server in dev mode, from within your terminal enter vault server -dev. The Vault service will start up as seen here.

Outside of dev mode, Vault starts in a sealed state. Before any operation can be performed on the Vault it must be unsealed. This is done by providing the unseal keys. When the Vault is initialized it generates an encryption key which is used to protect all the data. That key is protected by a master key.

By default, Vault uses a technique known as Shamir's secret sharing algorithm to split the master key into five shares, any three of which are required to reconstruct the master key. Note that both parameters are defaults and are configurable, but only during the initialization of the Vault. By using a secret sharing technique, we avoid the need to place absolute trust in the holder of the master key, and avoid storing the master key at all. The master key is only retrievable by reconstructing it from the shares. The shares are not useful for making any requests to Vault, and can only be used for unsealing. Once unsealed, the standard ACL mechanisms are used for all requests.
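To make the five-shares, threshold-of-three idea more concrete, here is a minimal Python sketch of Shamir's secret sharing. It is only an illustration: it works over a prime field with integer secrets for readability, the names split_secret and combine_shares are made up for this example, and Vault's own implementation differs in its details.

```python
import secrets

# A prime larger than any secret we split; 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1


def split_secret(secret: int, shares: int = 5, threshold: int = 3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    # Each share is a point (x, f(x)) with x = 1..shares; x = 0 is never handed out.
    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine_shares(points):
    """Recover the constant term via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Calling split_secret(master_key) returns five points, and passing any three of them to combine_shares gives back the original value. Fewer than three points do not determine the polynomial, which is why no single share holder can reconstruct the key alone.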

When Vault is deployed into a production environment, it should be highly available to minimize any downtime. Certain storage backends, such as Consul, provide additional coordination functions that enable Vault to run in a highly available configuration. When supported by the backends, Vault will automatically run in high availability mode without additional configuration. When running in high availability mode, Vault servers have two additional states they can be in, standby and active. For multiple Vault servers sharing a storage backend, only a single instance will be active at any time while other instances are hot standbys. The active server operates in a standard fashion and processes all requests. The standby servers do not process requests, and instead redirect to the active Vault. Meanwhile, if the active Vault server is sealed, fails, or loses network connectivity, then one of the standbys will take over and become the active instance. Note that only unsealed servers act as standbys.
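The active and standby behaviour just described can be modelled with a small Python sketch. This is a toy model under stated assumptions: the VaultNode and Cluster classes are hypothetical, and the election here simply picks the first eligible node, whereas a real deployment relies on the coordination functions of the storage backend.

```python
from dataclasses import dataclass


@dataclass
class VaultNode:
    name: str
    sealed: bool = False
    healthy: bool = True

    def eligible(self) -> bool:
        # Only unsealed, reachable servers can be active or act as standbys.
        return self.healthy and not self.sealed


class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.active = None
        self.elect()

    def elect(self):
        """Keep the current active node if still eligible, else promote a standby."""
        if self.active is not None and self.active.eligible():
            return self.active
        self.active = next((n for n in self.nodes if n.eligible()), None)
        return self.active

    def handle(self, node: VaultNode, request: str) -> str:
        """Simulate `node` receiving `request` from a client."""
        if not node.eligible():
            return f"error: {node.name} is sealed or unavailable"
        active = self.elect()
        if node is active:
            return f"{node.name} processed {request}"
        # Standbys do not process requests; they point the client at the active node.
        return f"{node.name} redirected {request} to {active.name}"
```

In this model, sealing the active node and then sending a request to a standby causes that standby to be promoted on the next call, which mirrors the takeover described above.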

In this section we cover some of the internal structure and architecture of Vault. Again in doing so, this will help you understand how Vault works and how it should be used. Let's begin to break down this picture. There is a clear separation of components that are inside or outside of the security barrier. Only the storage backend and the HTTP API are outside. All other components are inside the barrier. The storage backend is untrusted and is used to durably store encrypted data. When the Vault server is started, it must be provided with a storage backend so that data is available across restarts. The HTTP API similarly must be started by the Vault server on start so that clients can interact with it. Vault operations in the form of requests come through the HTTP API layer. Inbound requests pass through a Barrier. The Barrier is a cryptographic seal around Vault. Similarly, all outgoing data to be persisted in the storage backend passes back through the barrier. In terms of data flow there are no exceptions, all data that flows between Vault and the Storage Backend passes through the barrier. Secrets engines are components which store, generate, or encrypt data. 

Secrets engines are incredibly flexible, so it is easiest to think about them in terms of their function. Secrets engines are provided some set of data, they take some action on that data, and they return a result. Some secrets engines simply store and read data, like an encrypted Redis or Memcached. Other secrets engines connect to other services and generate dynamic credentials on demand. Other secrets engines provide encryption as a service, TOTP generation, certificates, and much more. Secrets engines are enabled at a path in Vault. When a request comes to Vault, the router automatically routes anything with the route prefix to the secrets engine. In this way, each secrets engine defines its own paths and properties.
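The routing of requests to secrets engines by path prefix can be sketched in a few lines of Python. The Router and KVEngine classes below are hypothetical stand-ins, the mount paths are examples only, and the longest-prefix rule is an assumption of this sketch rather than a statement about Vault's internals.

```python
class Router:
    """Maps mount paths (route prefixes) to secrets engine handlers."""

    def __init__(self):
        self.mounts = {}

    def mount(self, prefix: str, engine):
        # Normalise so every mount path ends with a single slash, e.g. "secret/".
        self.mounts[prefix.strip("/") + "/"] = engine

    def route(self, path: str):
        """Return (engine, path relative to the mount) for the longest matching prefix."""
        path = path.lstrip("/")
        matches = [p for p in self.mounts if path.startswith(p)]
        if not matches:
            raise LookupError(f"no secrets engine mounted for {path!r}")
        prefix = max(matches, key=len)
        return self.mounts[prefix], path[len(prefix):]


class KVEngine:
    """In-memory stand-in for an engine that simply stores and reads data."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


router = Router()
router.mount("secret", KVEngine())
engine, key = router.route("secret/app/db")
engine.write(key, {"password": "example-only"})
```

The engine only ever sees the part of the path after its mount point, here app/db, which is what allows each engine to define its own paths and properties.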

To the user, secrets engines behave similarly to a virtual filesystem, supporting operations like read, write, and delete. A secret as stored within Vault is considered to be, first, any piece of sensitive data that if acquired by an unauthorized party would cause some form of harm to the owning organization. Second, confidential and therefore cryptographically protected. And third, to have an associated lease. Vault uses a storage backend which is responsible for the persistent and durable storage of all encrypted data. Depending on requirements, different storage backends can be configured; however, only one storage backend can be configured per Vault cluster. Example storage backends are: Filesystem, S3, Database, or Consul. All data is encrypted at rest within the storage backend, and in transit, using 256-bit AES. Advanced Encryption Standard, AES, is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology in 2001.

Audit devices are the components in Vault that keep a detailed log of all requests and responses to Vault. Because every operation with Vault is an API request/response, the audit log contains every authenticated interaction with Vault, including errors. Multiple audit devices can be enabled and Vault will send the audit logs to each of them. This allows you to not only have a redundant copy, but also a second copy in case the first is tampered with. Example audit devices are: File, Syslog, and Socket. Auth methods are the components in Vault that perform authentication and are responsible for assigning identity and a set of policies to a user. Having multiple auth methods enables you to choose the auth method that makes the most sense for your use case of Vault and your organization. For example, on developer machines, the GitHub auth method is the easiest to use, but for servers the AppRole method is the recommended choice. Vault Tokens are conceptually similar to a session cookie as used by most websites. Once a user successfully authenticates to Vault, regardless of which auth method and backend is used, a token is returned which is then used for all subsequent activity performed against Vault.

In the example as seen here, the user authenticates using an LDAP auth method. Assuming the user enters the correct password, the LDAP authentication will succeed and Vault will generate and return a Vault token, as can be seen highlighted here. In a different example, a user may authenticate using the GitHub auth method, assuming it is enabled. Here the user logs in using his or her personal GitHub API token. When the GitHub API token has been verified by GitHub, Vault again generates a token for the user. Regardless of the auth method used, once a user has acquired a Vault token, the same token is presented back to Vault for all subsequent activity, whether it is to write secrets, read secrets, or encrypt data, for example. This remains the case as long as the Vault token remains valid and hasn't expired. 
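The session-cookie analogy can be illustrated with a short Python sketch of a token store. Everything here is hypothetical: the TokenStore class, its method names, and the policy names are invented for the example, and the sketch assumes the auth method has already verified the caller before issue is called.

```python
import secrets
import time


class TokenStore:
    """Hypothetical in-memory token store: token -> (policies, expiry time)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.tokens = {}

    def issue(self, policies, ttl_seconds):
        # Called after an auth method (LDAP, GitHub, ...) has verified the caller.
        token = secrets.token_urlsafe(24)
        self.tokens[token] = (tuple(policies), self.clock() + ttl_seconds)
        return token

    def lookup(self, token):
        """Return the policies attached to a valid token, or None if unknown or expired."""
        entry = self.tokens.get(token)
        if entry is None:
            return None
        policies, expires_at = entry
        if self.clock() >= expires_at:
            del self.tokens[token]
            return None
        return policies
```

The same token is presented on every later request, and lookup returns the attached policies until the time to live has passed, after which the caller must authenticate again.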

In this section we go over the methods and approaches by which you interact and work with Vault. Before we review the Vault API and associated client tools which are used to interact with Vault, we'll re-emphasize that every activity undertaken against Vault is performed against a path. Internally Vault uses path based routing to ensure the requested action is performed against the correct endpoint. Additionally all activities are controlled and authorized through the use of policies, which themselves have been created by specifying permit and deny rules against particular paths. This controls who can access and manage which resources in Vault. At its core Vault provides an HTTP based API. All actions, management, and activity performed against Vault are done through the HTTP API. Vault provides a CLI client tool which is just a wrapper over the HTTP API. All common functionality is implemented within the Vault CLI.
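As a rough illustration of permit and deny rules specified against paths, the Python sketch below evaluates a simplified policy. The paths, the rule that a trailing asterisk means a prefix match, and the tie-breaking on the most specific rule are assumptions made for this example; it is a simplified model, not Vault's policy engine.

```python
# Hypothetical policy: path pattern -> set of capabilities granted on it.
POLICY = {
    "secret/data/app/*": {"read", "list"},
    "secret/data/app/admin": {"deny"},
}


def matches(pattern: str, path: str) -> bool:
    # A trailing "*" is treated as a prefix match; anything else must match exactly.
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return pattern == path


def is_allowed(policy: dict, path: str, capability: str) -> bool:
    """Default deny; the most specific matching rule decides."""
    candidates = [p for p in policy if matches(p, path)]
    if not candidates:
        return False
    # Exact patterns beat prefix patterns; among equals, the longer pattern wins.
    best = max(candidates, key=lambda p: (not p.endswith("*"), len(p)))
    capabilities = policy[best]
    if "deny" in capabilities:
        return False
    return capability in capabilities
```

With this policy, is_allowed(POLICY, "secret/data/app/db", "read") is permitted, while any request for secret/data/app/admin is refused by the explicit deny rule, and paths with no matching rule are refused by default.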

In this section we will cover the Auditing features that Vault provides. Audit devices are the components in Vault that keep a detailed log of all requests and responses to Vault. Because every operation with Vault is an API request/response, the audit log contains every authenticated interaction with Vault, including errors. Multiple audit devices can be enabled and Vault will send the audit logs to each of them. This allows you to not only have a redundant copy, but also a second copy in case the first is tampered with. Each line in the audit log is a JSON object. The type field specifies which type of object it is. Currently, only two types exist: request and response. The line contains all of the information for any given request and response. By default, all sensitive information is hashed before it is written to the audit logs. The following example shows how to enable the file audit device. In this example all activity and operations performed against Vault will be recorded in a file named vault_audit.log located in the logs directory. The example then tails the file and pipes the output through jq. For those unfamiliar with jq, it is like sed for JSON data: you can use it to slice, filter, map, and transform structured data with the same ease as other utilities such as sed, awk, and grep. As mentioned here, the audit log can, if needed, be easily ingested into Elasticsearch for easier searching and reporting.
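To show what a JSON audit line with hashed sensitive values might look like, here is a minimal Python sketch. The flat field layout, the field names, the SENSITIVE_KEYS set, and the salt are all assumptions for illustration; HMAC-SHA256 is used here as a keyed hash so that raw values never reach the log, and real audit entries contain considerably more detail.

```python
import hashlib
import hmac
import json

SENSITIVE_KEYS = {"client_token", "password"}  # assumed set for this sketch


def hash_value(salt: bytes, value: str) -> str:
    # Keyed hash so the raw value never appears in the log.
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "hmac-sha256:" + digest


def audit_line(entry_type: str, fields: dict, salt: bytes) -> str:
    """Serialise one audit entry as a single JSON line, hashing sensitive fields."""
    if entry_type not in ("request", "response"):
        raise ValueError("type must be 'request' or 'response'")
    safe = {
        key: hash_value(salt, val) if key in SENSITIVE_KEYS else val
        for key, val in fields.items()
    }
    return json.dumps({"type": entry_type, **safe}, sort_keys=True)


line = audit_line(
    "request",
    {"path": "secret/data/app", "client_token": "example-token"},
    salt=b"example-salt",
)
```

Because the hash is keyed and deterministic, an operator who holds the salt can hash a known token and search the log for it, without the log itself ever exposing the token.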

Okay, that completes this lecture on the Vault Architecture. Go ahead and close this lecture, and we'll see you shortly in the next one.

About the Author

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).