This post is the first of a two-part series on using Vault in production. Both posts are slightly redacted forms of internal documentation. This post will cover why we chose our specific workflow, and the second post will cover day-to-day usage of Vault.
Problems
Sensitive credentials and keys are stored in certain code repositories (Github).
- Anyone with access to Github has access to these credentials.
- Anyone who has checked out code has these sensitive credentials on their hard drive.
- Key rollovers are a very difficult, manual process.
Sensitive credentials and keys are stored in plain text.
- Anyone who can see these credentials can use them.
Shared credentials and keys are used in numerous places.
- Generating a meaningful audit log is difficult.
Goals
- Encrypt sensitive credentials and keys at rest.
- Store sensitive credentials and keys in a central, remote, network accessible location.
- Gate and audit access to sensitive credentials and keys.
- Provide a unique identifier to each user/agent (for auditing purposes).
Solution
Vault
By leveraging Vault, we can meet all of our goals.
1. Encrypt sensitive credentials and keys at rest.
Vault encrypts all stored data at rest.
2. Store sensitive credentials and keys in a central, remote, network accessible location.
Vault is a highly available secret management solution that is network accessible via its HTTP API or via running a local client.
3. Gate and audit access to sensitive credentials and keys.
4. Provide a unique identifier to each user/agent (for auditing purposes).
Vault allows for per user, per machine, or per app credentials, controlling access as granularly as needed or desired. In addition, all requests and key usages are recorded in Vault’s logs or syslog, which can be shipped to a centralized logging solution.
Implementation Strategy
While Vault provides the primitives and tools, we still need to form a process that understands and works with SeatGeek both now and in the future. With encryption and auditing handled, our job is to store and provide access to secrets as well as manage tokens.
NOTE: The following assumes knowledge about specific Vault features, general AWS knowledge, and SeatGeek’s Base AMI.
Storing Secrets
At SeatGeek (and most other software shops), the two most common types of secrets are the following:
- Per Environment
This includes secrets that are the same for every machine or application but differ based on the current environment. It is important to note that they are commonly used, or can be used, by all machines or applications.
Examples: New Relic, PagerDuty
- Per Application
This includes secrets that differ between applications, where an application is the combination of itself and the environment in which it runs. This also includes secrets that are not common to every application, regardless of whether a single value is always used.
Examples: Braintree Token, Spreedly Key, Sentry DSN
To address these two use cases, we will be using Vault’s generic secret backend.
The reasons for using this backend are simplicity and flexibility. It allows arbitrary key-value pairs to be stored, encrypted, and retrieved from Vault without relying on third-party services.
The generic secret backend allows key-value pairs to be written under the namespace secret, and these can be associated with various ACLs. The currently used schema is of the following form:
secret/ENVIRONMENT/APP/KEY value=VALUE
Here, the top level under the secret namespace is ENVIRONMENT, with each APP getting its own bucket per ENVIRONMENT in which KEYs are written. Vault KEYs can contain a dictionary of key-value pairs themselves, and so the secret VALUE is written to the key value.
NOTE: bucket == namespace
Separate environments exist, including the staging, testing, and production environments referenced throughout this document.
Each app will have a bucket created when it is configured to launch in a given environment. Additionally, for our per-environment secrets there is a common bucket under each ENVIRONMENT namespace.
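The bucket layout described above can be sketched as a small path builder. This is only an illustration: the function names and the example environment, app, and key values are hypothetical, and the slash-separated form assumes Vault's usual path convention for the generic backend.

```python
def secret_path(environment: str, app: str, key: str) -> str:
    """Build the Vault path for a secret in an app's bucket."""
    for part in (environment, app, key):
        # Reject empty components and stray slashes so buckets cannot be escaped.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"secret/{environment}/{app}/{key}"


def common_secret_path(environment: str, key: str) -> str:
    """Build the path for a per-environment secret in the common bucket."""
    return secret_path(environment, "common", key)
```

For example, a hypothetical app named myapp in staging would read its Sentry DSN from secret_path("staging", "myapp", "sentry_dsn").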
Accessing Secrets
The basic premise here is a client authenticates and is granted a token. That token, among other things, is associated with a role and corresponding set of authorizations in the form of policies or permissions.
Authentication
At SeatGeek (for the time being), there are two Vault clients we need to worry about:
- Developers
These are people who write code at SeatGeek. Developers should be granted enough access to be able to do their jobs while keeping our sensitive information secure and our applications running.
- Machines
These include any servers running within SeatGeek infrastructure. Machines should be able to self-authenticate in order to retrieve necessary secrets for provisioning and running applications.
NOTE: This workflow differs for Admins, who are granted root tokens with no permission restrictions.
To provide these levels of access, two different Vault authentication strategies will be used: github authentication and app-id authentication.
The github authentication strategy was chosen here as we are already using it as a means of authenticating people for internal applications, and so some user grouping has already been done.
The app-id authentication strategy is used for roughly the same reasons as the generic secret backend. It is the simplest and most flexible to implement without relying on other systems.
Successful authentication via either of these methods results in a Vault token, which can be used to retrieve secrets.
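As a rough sketch of what retrieving a secret with a token looks like over the HTTP API, the snippet below builds (but does not send) a read request and parses a response body. The helper names and the example address are hypothetical; the X-Vault-Token header and the /v1/ path prefix are part of Vault's HTTP API, and the value key follows the schema described earlier.

```python
import json
import urllib.request


def build_read_request(vault_addr: str, token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the secret at `path`."""
    url = f"{vault_addr.rstrip('/')}/v1/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def extract_value(response_body: bytes) -> str:
    """Pull the secret out of the `value` key of a read response."""
    return json.loads(response_body)["data"]["value"]
```

Sending the request is then a matter of passing it to urllib.request.urlopen and feeding the response body to extract_value.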
Our github authentication simply allows anyone in the SeatGeek Github organization on the team-developers team to request and retrieve a Vault token. This is done by making a Vault login request with a Github personal access token. While this does not include everyone who writes code, it handles the majority of users for now.
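A minimal sketch of that login request follows, assuming the github auth backend is mounted at its default path (auth/github). The helper names and the example address are hypothetical; the request is only built here, not sent.

```python
import json
import urllib.request


def build_github_login_request(vault_addr: str, github_token: str) -> urllib.request.Request:
    """Build (but do not send) a login request carrying a Github personal access token."""
    url = f"{vault_addr.rstrip('/')}/v1/auth/github/login"
    body = json.dumps({"token": github_token}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def client_token(response_body: bytes) -> str:
    """Extract the Vault token from a successful login response."""
    return json.loads(response_body)["auth"]["client_token"]
```

The returned client token is what gets sent on subsequent requests to read secrets.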
Our app-id strategy, reserved for machine authentication, is highly dependent on AWS and our newer infrastructure strategies. When an AWS machine boots up, it can be configured to run with an IAM Role. This role is unique per application per environment, and also includes an id which can be retrieved from an instance’s metadata on the machine itself. Using this information, all SeatGeek IAM roles are whitelisted within Vault against their matching app and associated with an IP range that corresponds to the environment’s VPC IP range. This is our user-id in Vault terms. Machines can then make a Vault login request with the app they are responsible for running (applied during configuration management) and their IAM Role Instance Profile ID (attachment id). Assuming all pieces line up (IP address, app id, IAM Role Instance Profile ID), a Vault token is granted.
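The "all pieces line up" check can be modeled roughly as below. This is a toy model of the whitelist logic, not Vault's implementation; the WhitelistEntry type, the function name, and the example values are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class WhitelistEntry:
    user_id: str     # IAM Role Instance Profile ID
    app_id: str      # app the role is allowed to authenticate as
    cidr_block: str  # the environment's VPC IP range


def login_allowed(entries, app_id: str, user_id: str, source_ip: str) -> bool:
    """Return True only if app id, user id, and source IP all match one entry."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        entry.user_id == user_id
        and entry.app_id == app_id
        and ip in ipaddress.ip_network(entry.cidr_block)
        for entry in entries
    )
```

Note that any machine inside the whitelisted range that presents the same two identifiers passes this check, which is the credential-reuse concern listed later.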
Additionally, and only for machine authentication, there is an ENVIRONMENT-base-ami role that all machines can authenticate as. This allows all machines on boot to retrieve environment secrets via Vault’s app-id strategy without knowing which app is to be deployed. This is, or would be, used primarily to test the Base AMI in isolation in our environments.
In both of these app-id authentication scenarios, the user-id is the machine IAM Role Id. However, when applications authenticate, the user-id is app-IAM_ROLE_ID. user-ids must be unique, and this allows us to have two user-ids for a given IAM Role along with the appropriate configuration.
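A sketch of how a machine might assemble its app-id login payload follows. It assumes the instance profile ID is read from the EC2 iam/info metadata document, and it reads app-IAM_ROLE_ID as the app name joined to the profile ID with a hyphen; both the function names and that reading are assumptions, not confirmed details of our setup.

```python
import json


def instance_profile_id(iam_info_body: str) -> str:
    """Extract the instance profile ID from the EC2 `iam/info` metadata document."""
    return json.loads(iam_info_body)["InstanceProfileId"]


def app_id_login_payload(app_id: str, profile_id: str, as_application: bool = False) -> dict:
    """Build the body for an app-id login request."""
    # Assumption: "app-IAM_ROLE_ID" means "<app name>-<profile ID>".
    user_id = f"{app_id}-{profile_id}" if as_application else profile_id
    return {"app_id": app_id, "user_id": user_id}
```

A machine testing the Base AMI in staging would pass "staging-base-ami" as the app id, while an application would pass its own name with as_application=True.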
In the latest release of Vault, the app-id strategy has been deprecated in favor of a new app-role strategy. Ultimately we will migrate from app-id to app-role with roughly the same implementation, but we are currently held back by the version of Vault (0.6.0) and the vault-ruby gem (0.6.0) we are using.
Authorization
Vault implements authorization via its own ACLs, or policies. These provide a set of permissions which can be scoped to various operations within Vault, typically indicated by namespaces. In the case of obtaining secrets, that namespace is secret. Additionally, these ACLs can be associated with the various authentication strategies. A more generic way to think of it: a client authenticates and is granted a token. That token, among other things, is associated with a role and corresponding set of policies (same as other authentication/authorization strategies).
The policies described below are currently used to control access to Vault secrets.
As far as developer authorization, all Github users are granted staging-read-only and testing-read-write, which, if not obvious, means that any secret under the staging namespace can be read, and that they have free rein within the testing namespace. production read-only access will be granted on a per-application basis to service owners, and will be implemented via Github teams.
As far as machine authorization, machines are granted the ENVIRONMENT-APP-read-only and ENVIRONMENT-common-read-only policies. As such, machines can access the common bucket and their app bucket within their ENVIRONMENT, and nothing else. Cross-ENVIRONMENT and cross-app secret access is currently disabled and discouraged, although this might be revisited in the future.
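To make the policy names concrete, here is a sketch that renders policy documents of the kind these names imply. The policy bodies are assumptions inferred from the names and the bucket layout, not our actual policy files; the path/policy syntax is the HCL form Vault accepted at the time, and the function names are hypothetical.

```python
def render_policy(path_prefix: str, level: str) -> str:
    """Render a single-path Vault policy granting `level` under `path_prefix`."""
    if level not in ("read", "write"):
        raise ValueError(f"unsupported policy level: {level!r}")
    return f'path "{path_prefix}/*" {{\n  policy = "{level}"\n}}\n'


def developer_policies() -> dict:
    """Policies granted to all Github users."""
    return {
        "staging-read-only": render_policy("secret/staging", "read"),
        "testing-read-write": render_policy("secret/testing", "write"),
    }


def machine_policies(environment: str, app: str) -> dict:
    """Policies granted to a machine running `app` in `environment`."""
    return {
        f"{environment}-{app}-read-only": render_policy(f"secret/{environment}/{app}", "read"),
        f"{environment}-common-read-only": render_policy(f"secret/{environment}/common", "read"),
    }
```

Generating the policies from the environment and app names keeps the naming scheme and the path layout from drifting apart.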
Important to note here is the inability for non-Admins to write or update anything in Vault. These permissions are currently restricted to members of the Operations team, but this will surely be revisited in the future.
Token Management
As of now, Vault tokens last forever once granted. This is a temporary measure that allows for simplicity of use, but additional tooling will allow this to be changed.
Causes for Concern
- Admins are granted root tokens
- Developer authentication and authorization is reliant on Github
- Machine credentials can be used on other machines within an IP Range
- Assumptions are made around machines running a single application
- Tokens last forever and can be reused if retrieved
- Vault is not using TLS
- Metrics are not currently sent anywhere
- No UI solution for managing secrets
- Not possible to easily assume an application’s environment
Strategic Improvements
Admin Tokens
Currently, Admins are granted root tokens without permission restrictions. The latest version of Vault (0.6.2) has changed the ways in which root tokens are created and used, and as such, these could be substituted with Admin tokens or tokens with equivalent or slightly fewer permissions granted.
Developer Authentication/Authorization
With a centralized login system, developers would be able to authenticate by means other than Github, which would potentially be more flexible and less dependent on a third party. Permission granularity could also be provided on a per-user basis, allowing for trusted production access (ex: service owner access).
Machine Authentication
While we are already leveraging AWS for machine authentication, there are improvements in Vault to make this simpler and more secure. This integration would tie us tighter to AWS infrastructure, but it is doubtful we would run servers elsewhere, and if so we have an existing strategy.
These improvements involve allowing machines to authenticate one time with AWS dynamic metadata, addressing the issue of credential (re)use on different machines. Machines can currently be whitelisted by IAM Role or AMI.
App Authentication
We currently have a decent strategy for machine authentication, but our application authentication lacks flexibility. Specifically we assume that a single machine is running a single application and as such has a single IAM Role with the appropriate permissions for that application. This does not work if multiple applications coexist on a single machine, or if an application is broken up into tiers.
A way to combat this is to have application authentication use a different mechanism than machine authentication. This will require a revisit, but will most likely leverage Vault’s Cubbyhole to distribute multi-application scoped tokens via one-time tokens.
Token Management
Tokens currently last forever, and should have leases and TTLs. This would involve additional work to renew token leases as necessary.
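The renewal work could start from a check like the one sketched below, which decides when a token with a TTL is due for renewal. The function names and the default of renewing at half the TTL are assumptions chosen for illustration, not a decided policy.

```python
def seconds_until_expiry(issued_at: float, ttl_seconds: float, now: float) -> float:
    """Seconds left before the token expires, never negative."""
    return max(0.0, issued_at + ttl_seconds - now)


def should_renew(issued_at: float, ttl_seconds: float, now: float,
                 renew_fraction: float = 0.5) -> bool:
    """Return True once `renew_fraction` of the TTL has elapsed."""
    if not 0 < renew_fraction <= 1:
        raise ValueError("renew_fraction must be in (0, 1]")
    return (now - issued_at) >= ttl_seconds * renew_fraction
```

Renewing well before expiry leaves room for retries if Vault is briefly unreachable.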
TLS
TLS is disabled on our Vault cluster as it is addressed only within our internal network. With the requirement of TLS for all HTTP 2.0 connections, this will be revisited in the future and most likely with Vault serving as an internal CA.
Metrics and Monitoring
We are still in the early stages of adoption and use, but Vault has support for shipping application stats via a few means including StatsD.
Web UI
Either writing our own or adopting an existing open source solution would be extremely beneficial, as it would remove the burden of managing secrets from the Operations team while also allowing developers more control over how their applications are configured.
Locally Assuming App Roles
There is currently no way to run a command locally using the credentials in staging/production for a given application. Something like a .env file writer or a foreman-style command runner for our application manifests could go a long way in allowing developers to run services locally while simulating an environment.
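A .env file writer could be as small as the sketch below, which turns a mapping of secret names to values into .env text. The function name and the quoting rules are assumptions; fetching the secrets from Vault is left out.

```python
import re

_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_dotenv(secrets: dict) -> str:
    """Render secrets as sorted, double-quoted KEY="value" lines."""
    lines = []
    for name in sorted(secrets):
        if not _NAME.match(name):
            raise ValueError(f"invalid variable name: {name!r}")
        # Escape backslashes first, then double quotes, so values survive quoting.
        value = secrets[name].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{name.upper()}="{value}"')
    return "\n".join(lines) + "\n"
```

The output can be written to a local .env file that is excluded from version control, so that secrets do not end up back in a repository.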
Vault Configuration
https://www.vaultproject.io/docs/config/index.html
Our current Vault configuration takes into account the following conditions:
- Vault is running within our internal network and is not publicly accessible.
- Consul is already being used
Vault differentiates itself from other secret management services with its high availability option, and we leverage the Consul backend to deliver that. The Consul client is already configured to run on all of our machines (with default port mappings), with our Vault servers being no different. This also means that all data is stored encrypted in Consul, and so the Consul install should also be highly available.
As Vault is run within our internal network (and for other reasons), TLS is disabled. While TLS is desirable, we need to do additional work to make internal TLS usage a reality. Vault is also running on the standard default port of 8200 and listening on all network interfaces.
If you think these kinds of things are interesting, consider working with us as an Infrastructure Engineer at SeatGeek. Or, if infrastructure isn’t your thing, we have other openings in engineering and beyond!