Bootstrapping AWS with Terraform and CodeCommit

A rough model I’ve been working on and thinking about recently is for the AWS account (or accounts) to be put together so that there’s a “bastion” or “bootstrap” instance that can be used to build out the rest of the environment. There is a certain chicken-and-egg problem around this, particularly if you want to use AWS resources and services to do the bootstrapping.

I’m going to talk (at length) about a solution I’ve recently got sorted out. It has a number of prerequisites that I’ll outline before getting into how it all hangs together. The key goal is to limit manual tinkering as far as possible, and script up as much as possible, so that the process is both repeatable and can be exposed to standard sorts of coding practices (version control, review, and so on).

One caveat around what I’m presenting – the Terraform state is stored locally to where we are running Terraform, which is not best practice. Ideally we’d be tucking it away in something like S3, which I will probably cover at a later point.
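
This isn’t hard to retrofit later – as a rough sketch (the bucket name here is hypothetical, and the bucket would need to exist before terraform init is run), remote state only needs a backend block along these lines:

terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "bastion/terraform.tfstate"
    region = "eu-west-2"
  }
}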

I’ll start with the basics: don’t do any of this using the root credentials for your account. If you are used to doing so, stop right now: create an ‘admin’ user which, one way or another, has at least the arn:aws:iam::aws:policy/AdministratorAccess policy, give it a stupidly strong password, and turn on MFA. For our bootstrapping purposes, I also created an admin_cli user which does not have permission to use the web console, and has the same sorts of privileges (although you can probably be more precise if needed). Most importantly, this one does have an Access Key and Secret Key, whereas the admin user does not. Ideally the policies for these users are attached to an admin group and the users added to that group, but it’s not critical.

As an aside, I’d recommend that you make it a practice to associate policies with groups rather than users – it makes it much simpler to manage the effective permissions a given user has.
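
Note that the bootstrap script further down assumes the users themselves already exist – it only scripts the groups and group memberships. If you’d rather create the CLI-only user from the command line (using whatever administrative credentials you already have to hand), a rough sketch would be:

aws iam create-user --user-name admin_cli
aws iam create-access-key --user-name admin_cli

The access key and secret that the second command spits out are what go into ~/.aws/credentials below; the group membership and policies are sorted out later by bootstrap.sh.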

First, you want a ~/.aws/credentials file with the admin_cli credentials in it. After bootstrapping, this can probably be removed, but in the short term, I’d suggest you chmod 400 ~/.aws/credentials and be careful.

[admin_cli]
aws_access_key_id=AUAHTIWHU3M38YZRHS2Q
aws_secret_access_key=randomstringofstuff

It’s also worth having ~/.aws/config to store some defaults:

[default]
region=eu-west-2
output=json
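
A quick sanity check that the profile is usable – this just reports the account and ARN that the credentials resolve to:

$ aws sts get-caller-identity --profile admin_cli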

Bootstrap with CLI

The initial round of booting up needs to be done via the AWS CLI, so I’m going to assume you have that installed and in your path. Additionally note that I’ve got all this in a subdirectory of a project folder:

$ ls -l rahookwork/
-rw-r--r--   1 robert  staff   181  6 Oct 11:11 README.md
drwxr-xr-x  12 robert  staff   408  6 Oct 20:09 bastion-scripts
drwxr-xr-x   6 robert  staff   204  6 Oct 13:37 bootstrap-scripts
drwxr-xr-x   5 robert  staff   170  6 Oct 16:29 data

There’s a file env.rc with some constants that are most likely to need customisation:

export AWS_PROFILE=admin_cli
export AWS_DEFAULT_REGION=eu-west-2
export AWS_ACCOUNT_ID=569192483465

export ADMIN_GROUP=admin
export DEV_GROUP=developers
export ADMIN_USERS="admin_cli admin"
export DEV_USERS="somedev"
export KEY_NAME=demo_bastion

The initial bootstrap script assumes in general that there’s nothing in the target AWS account, but does try to make some allowance for stuff already existing. You’ll notice a pattern where I’m using get operations with the CLI and throwing away the output. The CLI has a somewhat annoying design philosophy and treats “failure to find” as an error, which makes it messy to check whether something exists before creating it. Because, guess what? Yep, creating things through the CLI is not idempotent either, and an attempt to create something that already exists will also result in script errors.

Here then is the giant bootstrap.sh. Despite its size, it should be reasonably self-explanatory; it will:

  1. create some groups with policies attached to them
  2. add some (pre-existing) users to the groups
  3. create a public/private key pair that we will use later for getting to our bastion instance
  4. make some CodeCommit repositories (note that I added these bootstrap scripts to the relevant repository after it was created, by hand)
  5. build some public/private key pairs that are attached to our dev users so that they will be able to use CodeCommit.

Not all of this strictly needs to be done by this initial bootstrap script, as it could have been managed somewhat through Terraform, but as development proceeded this was a convenient place and time to do it.

#!/bin/bash

cd `dirname $0`
[ -d ../data ] || mkdir ../data
[[ -s ./env.rc ]] && source ./env.rc

echo "======== setting up groups ========"
aws iam get-group --group-name $ADMIN_GROUP > /dev/null 2>&1
if [ $? -gt 0 ]
then
  aws iam create-group --group-name $ADMIN_GROUP
  aws iam attach-group-policy --group-name $ADMIN_GROUP --policy-arn "arn:aws:iam::aws:policy/AdministratorAccess"
fi

aws iam get-group --group-name $DEV_GROUP > /dev/null 2>&1
if [ $? -gt 0 ]
then
  aws iam create-group --group-name $DEV_GROUP
  aws iam attach-group-policy --group-name $DEV_GROUP --policy-arn "arn:aws:iam::aws:policy/AmazonS3FullAccess"
  aws iam attach-group-policy --group-name $DEV_GROUP --policy-arn "arn:aws:iam::aws:policy/IAMUserChangePassword"
fi

for ID in $ADMIN_USERS
do
  aws iam add-user-to-group --group-name $ADMIN_GROUP --user-name $ID
done

for ID in $DEV_USERS
do
  aws iam add-user-to-group --group-name $DEV_GROUP --user-name $ID
done

echo "======= setting up bastion key pair ======="
aws ec2 describe-key-pairs --output text --key-name $KEY_NAME >/dev/null 2>&1
if [ $? -gt 0 ]
then
  aws ec2 create-key-pair --key-name $KEY_NAME --query 'KeyMaterial' | sed -e 's/^"//' -e 's/"$//' -e's/\\n/\
/g'> ../data/$KEY_NAME.pem
  chmod 400 ../data/$KEY_NAME.pem
fi
aws ec2 describe-key-pairs --output text --key-name $KEY_NAME

echo "======== setting up CodeCommit repositories ========"
aws codecommit create-repository --repository-name bootstrap-scripts --repository-description "Bootstrap scripts for account" > /dev/null 2>&1
aws codecommit create-repository --repository-name bastion-scripts --repository-description "Bastion scripts for account" > /dev/null 2>&1
for REPO in `aws codecommit list-repositories --output text | cut -f 3`
do
  aws codecommit get-repository --repository-name $REPO --query 'repositoryMetadata.cloneUrlSsh' | sed 's/"//g'
done

echo "======== attaching SSL key to users ========"
for ID in $DEV_USERS
do
  ssh-keygen -b 2048 -f ../data/${ID}_key -N '' -C $ID >/dev/null
  chmod 400 ../data/${ID}_key
  aws iam upload-ssh-public-key --user-name $ID --ssh-public-key-body "$(cat ../data/${ID}_key.pub)"
  aws iam list-ssh-public-keys --user-name $ID --output text | cut -f2,5
done

You will notice that keys – particularly the bastion private key that will be used to SSH to the instance – are dropped in the data directory one level up from these scripts. That’s done to minimise the chance of those being checked in to CodeCommit at some later point!
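
As a side note, for the dev users to actually make use of those SSH keys against CodeCommit, each of them will want an entry along these lines in their ~/.ssh/config – the User value is the SSH key ID reported by list-ssh-public-keys (the one shown here is just a placeholder), and the IdentityFile is wherever they’ve stashed the private key generated above:

Host git-codecommit.*.amazonaws.com
  User APKAEXAMPLEKEYID
  IdentityFile ~/.ssh/somedev_key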

Terraform scripts

I’m going to assume that you’re somewhat familiar with Terraform, so will largely dump the scripts on you without comment. One thing you are going to want in the bastion-scripts folder (since we’re going to have it tucked away in the bastion-scripts CodeCommit repository we built) is a .gitignore to exclude state:

.terraform/
terraform.tfstate
terraform.tfstate.backup

You’re also going to want a terraform.tfvars file which encapsulates most of the stuff that is locally configurable. You’re going to want this to be in alignment with the env.rc file, or things will go wrong.

aws_region="eu-west-2"
aws_profile="admin_cli"
aws_account_id="569192483465"
bastion_key="demo_bastion"

The variables.tf file has some bits that may need to be configured as well. Take particular note of bastion_ssh_inbound which will need to include the IP address of the machine you’re going to run Terraform on!

variable "tags" {
  default = {
    "owner"       = "rahook"
    "project"     = "work-bootstrap"
    "client"      = "Internal"
  }
}

variable "bastion_user" {
  default = "ec2-user"
}

variable "dev_user" {
  default = "somedev"
}

variable "bastion_ami_name" {
  default = "amzn-ami-hvm-2017.09.0.20170930-x86_64-ebs"
}

variable "bastion_instance_type" {
  default = "t2.micro"
}

# 172.16.0.0 - 172.16.255.255
variable "bastion_vpc_cidr" {
  default = "172.16.0.0/16"
}

# 172.16.10.0 - 172.16.10.255
variable "bastion_subnet_cidr" {
  default = "172.16.10.0/24"
}

variable "bastion_ssh_inbound" {
  type = "list"
  default = [ "192.168.1.0/24", "27.15.212.0/24"]
}

variable "root_vol_size" {
  default = 10
}

/* variables to inject via terraform.tfvars */
variable "aws_region" {}
variable "aws_account_id" {}
variable "aws_profile" {}
variable "bastion_key" {}
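
As noted above, bastion_ssh_inbound needs to cover the public address of the machine you’ll be running Terraform (and SSH) from. If you’re not sure what that is, checkip.amazonaws.com is an AWS-hosted service that will echo it back to you:

$ curl -s https://checkip.amazonaws.com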

The outputs.tf file will dump some useful stuff out at the end of the run:

output "bastion_public_dns" {
  value = "${aws_instance.bastion.public_dns}"
}

output "bastion_private_dns" {
  value = "${aws_instance.bastion.private_dns}"
}

output "connect_string" {
  value = "ssh -i ${var.bastion_key}.pem ${var.bastion_user}@${aws_instance.bastion.public_dns}"
}

I’ve stuffed all the actual work into a single file. If this bothers you, you can rework it with modules, but short term this was sufficient for me. I’ll break it up a little, but bear in mind this is all a single main.tf file.

We start by specifying the provider, and some glue code that will help us find the desired AMI:

provider "aws" {
  region  = "${var.aws_region}"
  profile = "${var.aws_profile}"
}

data "aws_ami" "target_ami" {
  most_recent = true

  filter {
    name = "owner-alias"
    values = ["amazon"]
  }

  filter {
    name = "name"
    values = [ "${var.bastion_ami_name}" ]
  }
}

Next up, I build a dedicated VPC (and subnet) to hold the bastion instance in, partly so that I have the option later of bolting it down behind a NAT router, and partly because I anticipate putting some other management services in this VPC. You’ll see too that wherever possible I’m tagging the resources. Trust me, once you have a ton of resources you will thank yourself for having tagged them at the beginning. Note also that the VPC routing in play is extremely crude, and you are likely to want to fine-tune it for production purposes.

resource "aws_vpc" "bastion_vpc" {
  cidr_block = "${var.bastion_vpc_cidr}"
  enable_dns_support = true
  enable_dns_hostnames = true

  tags {
    Name = "Bastion-vpc"
    Project = "${var.tags["project"]}"
    Owner = "${var.tags["owner"]}"
    Client = "${var.tags["client"]}"
  }
}

resource "aws_subnet" "bastion_subnet" {
  vpc_id = "${aws_vpc.bastion_vpc.id}"
  cidr_block = "${var.bastion_subnet_cidr}"
  map_public_ip_on_launch = true

  tags {
    Name = "Bastion-subnet"
    Project = "${var.tags["project"]}"
    Owner = "${var.tags["owner"]}"
    Client = "${var.tags["client"]}"
  }
}

resource "aws_internet_gateway" "bastion-gateway" {
  vpc_id = "${aws_vpc.bastion_vpc.id}"

  tags {
    Name = "Bastion-gateway"
    Project = "${var.tags["project"]}"
    Owner = "${var.tags["owner"]}"
    Client = "${var.tags["client"]}"
  }
}

resource "aws_route_table_association" "bastion-rta" {
  subnet_id = "${aws_subnet.bastion_subnet.id}"
  route_table_id = "${aws_route_table.bastion-rt.id}"
}

resource "aws_route_table" "bastion-rt" {
  vpc_id = "${aws_vpc.bastion_vpc.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.bastion-gateway.id}"
  }

  tags {
    Name = "Bastion-rt"
    Project = "${var.tags["project"]}"
    Owner = "${var.tags["owner"]}"
    Client = "${var.tags["client"]}"
  }
}

Next up, let’s make sure that there’s a minimal attack surface on the EC2 box:

resource "aws_security_group" "bastion_ssh" {
  name = "bastion_ssh"
  description = "allows ssh access to bastion"
  vpc_id = "${aws_vpc.bastion_vpc.id}"

  ingress {
    from_port = 22
    to_port = 22
    protocol = "tcp"
    cidr_blocks = "${var.bastion_ssh_inbound}"
  }

  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

This is the interesting and annoyingly complicated bit – we want the EC2 instance we build to be able to check things out of the CodeCommit repositories (i.e. other scripts that will bootstrap up the rest of the environment), but we want to avoid putting credentials or keys on this box unless we have no other choice. By associating a role (with appropriate policies attached) with the instance, we can avoid the problem.

resource "aws_iam_role" "bastion_role" {
  name_prefix = "bastion"
  path = "/"
  description = "role and policies the bastion can use"
  force_detach_policies = true
  assume_role_policy = "${data.aws_iam_policy_document.ec2-service-role-policy.json}"
}

resource "aws_iam_role_policy_attachment" "bastion-role-codecommit" {
  role = "${aws_iam_role.bastion_role.name}"
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeCommitReadOnly"
}

resource "aws_iam_instance_profile" "bastion_profile" {
  name_prefix = "bastion"
  role = "${aws_iam_role.bastion_role.name}"
}

data "aws_iam_policy_document" "ec2-service-role-policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

Finally, we specify the EC2 instance, which is pretty tiny:

resource "aws_instance" "bastion" {
  ami = "${data.aws_ami.target_ami.id}"
  instance_type = "${var.bastion_instance_type}"
  key_name = "${var.bastion_key}"
  subnet_id = "${aws_subnet.bastion_subnet.id}"
  vpc_security_group_ids = [ "${aws_security_group.bastion_ssh.id}" ]

  iam_instance_profile = "${aws_iam_instance_profile.bastion_profile.name}"

  root_block_device {
    volume_type = "gp2"
    volume_size = "${var.root_vol_size}"
  }

  tags {
    Name = "Bastion"
    Project = "${var.tags["project"]}"
    Owner = "${var.tags["owner"]}"
    Client = "${var.tags["client"]}"
  }

  volume_tags {
    Project = "${var.tags["project"]}"
    Owner = "${var.tags["owner"]}"
    Client = "${var.tags["client"]}"
  }
}

I’ve knocked up a dummy CodeCommit repository that I can later try to check out, to verify that the instance role is working as hoped:

resource "aws_codecommit_repository" "bastion-smoketest" {
  repository_name = "bastion-smoketest"
  description     = "smoke test scripts for the bastion."
}

The last step is to configure the box after it is launched. Note that this uses the bastion private key generated by the initial bootstrap.sh script to allow remote execution.

resource "null_resource" "update" {
  connection {
    type = "ssh"
    agent = false
    user = "${var.bastion_user}"
    host = "${aws_instance.bastion.public_dns}"
    private_key = "${file("${path.module}/../data/${var.bastion_key}.pem")}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo yum install git -y",
      "mkdir ~/.aws ~/bin && cd ~/bin && wget https://releases.hashicorp.com/terraform/0.10.7/terraform_0.10.7_linux_amd64.zip && unzip terraform*zip",
      "sudo git config --system credential.https://git-codecommit.${var.aws_region}.amazonaws.com.helper '!aws --profile default codecommit credential-helper $@'",
      "sudo git config --system credential.https://git-codecommit.${var.aws_region}.amazonaws.com.UseHttpPath true",
      "aws configure set region ${var.aws_region}",
      "aws configure set output json",
      "cd ~ && git clone https://git-codecommit.${var.aws_region}.amazonaws.com/v1/repos/bastion-smoketest"
    ]
  }
}

The first bits of the provisioning are simple and obvious:

  1. update installed software
  2. install git
  3. install terraform

What’s not so obvious is the configuration of git on the box to allow it to use the instance role. Each of those steps is necessary, particularly ensuring that the default AWS CLI configuration for the user has a region matching the Git configuration. The specification of --profile default is redundant, since we’ve only created a default profile when invoking aws configure, but it may help to leave it there as a reminder.
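
Once the instance is up, a quick way to convince yourself that the role and git wiring are behaving is to run something along these lines on the bastion itself:

$ aws sts get-caller-identity           # should report the assumed bastion role, not an IAM user
$ aws configure get region              # should match the CodeCommit region
$ git config --system --get-regexp codecommit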

There’s a variety of potential pain-in-the-neck pieces around this if you’re doing anything complicated – there are a lot of assumptions that the CodeCommit repository is in the same region (and account) as the EC2 box, and we’re assuming that the box is able to reach out to the broader internet, at least for installing Terraform.

I hope to follow this up soon with an example of developing Terraform scripts on the desktop, committing them to CodeCommit, and having Terraform on this bastion box use them to build more bits in the AWS account.
