Getting Rid of Public IPv4s
Background
In February 2024, AWS started charging for every public IPv4 address. This led to an unplanned increase in our monthly AWS bill, and that charge will only grow as our deployment grows. I knew we had public IPs on our EC2 instances and ECS tasks, but the cost was still significantly higher than I could account for.
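For a sense of scale: at the published rate of 0.005 USD per public IPv4 address per hour (the list price at the time of writing), and assuming an average month of about 730 hours, each address costs roughly 3.65 USD a month. As a hypothetical example, an internet-facing ALB that holds one address in each of three AZs costs:

```latex
\text{monthly cost} = N_{\text{IP}} \times 0.005\,\tfrac{\text{USD}}{\text{IP}\cdot\text{h}} \times 730\,\text{h}
\quad\Rightarrow\quad
3 \times 0.005 \times 730 = 10.95\ \text{USD}
```

Small per address, but it multiplies with every load balancer, AZ, instance and task.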
Luckily, AWS also provides a tool to track public IPs: the IP Address Manager (IPAM) under Amazon VPC. Setting it up was trivial, and within about 10 minutes I had found where the extra public IPs came from: our ALBs, each of which used one public IP per AZ.
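If you prefer to check this from Terraform rather than the IPAM console, a rough sketch is to count the network interfaces an ALB has created. ALB interfaces carry a description of the form "ELB app/<name>/<id>"; the load balancer name "my-alb" below is hypothetical, and the cost output assumes the 0.005 USD/IP-hour rate and a 730-hour month:

```hcl
# Sketch: count the ENIs of a (hypothetical) internet-facing ALB named "my-alb".
# Each ENI in a public subnet holds its own public IPv4 address.
data "aws_network_interfaces" "alb" {
  filter {
    name   = "description"
    values = ["ELB app/my-alb/*"]
  }
}

output "alb_public_ipv4_count" {
  # Typically one per enabled AZ; an ALB can add more when it scales.
  value = length(data.aws_network_interfaces.alb.ids)
}

output "alb_public_ipv4_monthly_usd" {
  # Assumed rate: 0.005 USD per IP-hour, ~730 hours per month.
  value = length(data.aws_network_interfaces.alb.ids) * 0.005 * 730
}
```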
A full IPv6 migration is a bigger project than we can take on right now, and IPv6-only is not fully supported as of June 2024, so reducing our IPv4 footprint is the most logical choice.
The first step is to revisit your VPC and classify its subnets as public or private (AWS also describes other types, such as VPN-only and isolated subnets). A public subnet reaches the internet through an internet gateway, and resources in it need public IPs to do so; a private subnet needs a NAT device to reach the internet, but its resources don't need public IPs.
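In Terraform, the difference between the two comes down to the default route in the subnet's route table. A minimal sketch; aws_internet_gateway.igw and aws_nat_gateway.main are hypothetical names assumed to be defined elsewhere:

```hcl
# Public subnet: default route to the internet gateway.
# Resources need a public IPv4 to use this route.
resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.myvpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

# Private subnet: default route to a NAT gateway (which itself sits in a
# public subnet). Resources only need private IPs.
resource "aws_route_table" "private_rt" {
  vpc_id = aws_vpc.myvpc.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}
```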
Changing a public subnet into a private one is VERY easy: just point its route table at a NAT gateway instead of the internet gateway. HOWEVER, be aware that the subnets used by an internet-facing ALB HAVE to be public. On the other hand, an EC2 instance cannot be moved to a different VPC, subnet, or AZ. So the tradeoff is: change the subnet, or change the resources (for EC2, this means launching new instances from images of the existing ones and migrating the EBS volumes, which is quite cumbersome).
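To illustrate the ALB constraint: an internet-facing ALB stays in the public subnets even when its targets move to private ones. A minimal sketch; aws_subnet.ecs_public_subnet_az and aws_security_group.alb are hypothetical names, not our actual configuration:

```hcl
resource "aws_lb" "backend" {
  name               = "backend-alb"
  load_balancer_type = "application"
  # internet-facing, so every subnet listed here must be public
  internal        = false
  subnets         = [for s in aws_subnet.ecs_public_subnet_az : s.id]
  security_groups = [aws_security_group.alb.id]
}
```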
In our case, we have only a few EC2 instances, in a separate subnet, while the ECS public subnets hold several different types of resources. So we decided to keep the ECS subnets public, add new private subnets, and migrate the ECS services to them. Later, we switched the EC2 subnet to private as well.
Changes
The first step was to create a new NAT gateway in a public subnet. I then added it to the route table of a PRIVATE subnet and tested it with a new EC2 instance in that subnet. This instance has no public IP, of course, but it can still be reached with EC2 Instance Connect, using the ssh command of the AWS CLI (https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ec2-instance-connect/ssh.html):
aws ec2-instance-connect ssh --instance-id <instance_id> [--os-user <user>]
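As far as we understand, when the target instance only has a private IP, this command connects through an EC2 Instance Connect Endpoint in the VPC, so one has to exist and the instance's security group must accept SSH from it. A minimal sketch; the subnet choice and aws_security_group.test_instance are hypothetical:

```hcl
# Security group for the endpoint: allow outbound SSH into the VPC.
resource "aws_security_group" "eice" {
  name   = "eice"
  vpc_id = aws_vpc.myvpc.id

  egress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.myvpc.cidr_block]
  }
}

# The endpoint itself, placed in the private subnet.
resource "aws_ec2_instance_connect_endpoint" "private" {
  subnet_id          = aws_subnet.ecs_private_subnet_az["a"].id
  security_group_ids = [aws_security_group.eice.id]
}

# Let the endpoint reach the test instance on port 22.
resource "aws_security_group_rule" "ssh_from_eice" {
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  security_group_id        = aws_security_group.test_instance.id
  source_security_group_id = aws_security_group.eice.id
}
```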
Once logged in, run:
ping ietf.org
If everything is working, you should see responses like these:
PING ietf.org (104.16.44.99) 56(84) bytes of data.
64 bytes from 104.16.44.99 (104.16.44.99): icmp_seq=1 ttl=40 time=9.62 ms
64 bytes from 104.16.44.99 (104.16.44.99): icmp_seq=2 ttl=40 time=8.81 ms
64 bytes from 104.16.44.99 (104.16.44.99): icmp_seq=3 ttl=40 time=8.72 ms
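We created the NAT gateway by hand, but it can also be expressed in Terraform. A minimal sketch, assuming hypothetical names for the public subnet and internet gateway. Note that the NAT gateway's Elastic IP is itself a billed public IPv4, and the NAT gateway has its own hourly and data-processing charges, so the approach trades many public addresses for one rather than eliminating them:

```hcl
# Elastic IP for the NAT gateway (one public IPv4, billed like any other).
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  # Must be a PUBLIC subnet so the NAT gateway can reach the internet gateway.
  subnet_id = aws_subnet.ecs_public_subnet_az["a"].id

  # Create the internet gateway first.
  depends_on = [aws_internet_gateway.igw]
}
```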
Once the NAT is confirmed to be working, the rest of the migration is straightforward with Terraform. Since we keep our EC2 subnets separate from the ECS subnets, we first created the new private subnets for ECS:
resource "aws_subnet" "ecs_private_subnet_az" {
  for_each          = toset(var.ecs_subnets)
  vpc_id            = aws_vpc.myvpc.id
  cidr_block        = cidrsubnet(aws_vpc.myvpc.cidr_block, 4, var.az_private_netnum[each.key])
  availability_zone = "${local.aws_region}${each.key}"
}
where we already defined:
variable "ecs_subnets" {
  default = ["a", "b", "c"]
}

variable "az_private_netnum" {
  # Netnum used in cidrsubnet function: https://developer.hashicorp.com/terraform/language/functions/cidrsubnet
  default = {
    a = 7
    b = 8
    c = 9
    d = 10
    e = 11
    f = 12
  }
}
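To make the netnums concrete, here is what they expand to for a hypothetical VPC CIDR of 10.0.0.0/16 (our actual CIDR is not shown here). With 4 new bits, the /16 is cut into sixteen /20 blocks of 4096 addresses each, and the netnum (0 to 15) picks the block, so the values in az_private_netnum must not collide with netnums already used by other subnets:

```hcl
locals {
  example_vpc_cidr = "10.0.0.0/16" # hypothetical VPC CIDR

  example_private_cidrs = {
    a = cidrsubnet(local.example_vpc_cidr, 4, 7) # "10.0.112.0/20"
    b = cidrsubnet(local.example_vpc_cidr, 4, 8) # "10.0.128.0/20"
    c = cidrsubnet(local.example_vpc_cidr, 4, 9) # "10.0.144.0/20"
  }
}
```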
We also defined a route table and route table associations for these subnets:
resource "aws_route_table" "ecs_private_routing" {
  vpc_id = aws_vpc.myvpc.id

  route {
    # Default route through the NAT gateway created earlier (placeholder ID)
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = "nat-XYZZZZZZZZZZZZZZZ"
  }
}

resource "aws_route_table_association" "ecs_private_subnet_az_route_table" {
  for_each       = toset(var.ecs_subnets)
  subnet_id      = aws_subnet.ecs_private_subnet_az[each.key].id
  route_table_id = aws_route_table.ecs_private_routing.id
}
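Since our NAT gateway was created outside Terraform, its ID is hard-coded above. One alternative is to look it up with a data source. A minimal sketch, assuming the NAT gateway carries a hypothetical Name tag of "main-nat":

```hcl
# Look up the manually created NAT gateway instead of hard-coding its ID.
data "aws_nat_gateway" "main" {
  vpc_id = aws_vpc.myvpc.id
  state  = "available"

  tags = {
    Name = "main-nat" # hypothetical tag value
  }
}

# Then, in aws_route_table.ecs_private_routing:
#   nat_gateway_id = data.aws_nat_gateway.main.id
```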
After one round of terraform plan and apply to create these private subnets, we simply switched the ECS service over to them:
resource "aws_ecs_service" "myservice" {
  name                = "backend"
  cluster             = aws_ecs_cluster.ecs.id
  task_definition     = aws_ecs_task_definition.myservice_backend.arn
  desired_count       = 20
  launch_type         = "FARGATE"
  scheduling_strategy = "REPLICA"

  network_configuration {
    # Run tasks in the new private subnets
    subnets = [for subnet in aws_subnet.ecs_private_subnet_az : subnet.id]
    security_groups = [
      aws_security_group.ecs_public_v2.id,
      aws_security_group.ecs_internal_default.id
    ]
    # No public IPv4 per task; outbound traffic goes through the NAT gateway
    assign_public_ip = false
  }

  load_balancer {
    ...
  }
}
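The ALB in the public subnets keeps serving these tasks because, with Fargate's awsvpc networking, it forwards to the tasks' private IPs; this requires target_type = "ip" on the target group. A minimal sketch of such a target group; the name, port, and health check path are hypothetical:

```hcl
resource "aws_lb_target_group" "backend" {
  name     = "backend"
  port     = 8080 # hypothetical container port
  protocol = "HTTP"
  vpc_id   = aws_vpc.myvpc.id
  # Fargate tasks are registered by private IP, not by instance ID
  target_type = "ip"

  health_check {
    path = "/health" # hypothetical health check endpoint
  }
}
```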
Finally, we switched the EC2 subnet to private by changing its route table to use the new NAT gateway.
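In Terraform terms, that is just the default route of the EC2 subnet's route table. A minimal sketch, where aws_route_table.ec2_routing is a hypothetical name; if the same table also serves the subnet that hosts the NAT gateway, it must not be changed this way, since the NAT gateway itself needs the internet-gateway route. Note that rerouting does not release public IPv4 addresses already attached to running instances: Elastic IPs stay associated (and billed) until released, and auto-assigned public IPs stay with running instances, so those need to be cleaned up separately.

```hcl
resource "aws_route_table" "ec2_routing" {
  vpc_id = aws_vpc.myvpc.id

  route {
    cidr_block = "0.0.0.0/0"
    # Previously: gateway_id = aws_internet_gateway.igw.id
    nat_gateway_id = "nat-XYZZZZZZZZZZZZZZZ" # same placeholder NAT ID as above
  }
}

# In the EC2 subnet's aws_subnet block, also stop auto-assigning public
# IPv4s to instances launched from now on:
#   map_public_ip_on_launch = false
```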
Overall
It took me a couple of days to test and update the Terraform files, and we migrated our staging environment over the weekend. Production uses the same Terraform configuration, so its migration should take just another weekend. We are all set for a world of minimal public IPs!