AWSTemplateFormatVersion: '2010-09-09' Description: A stack for deploying containerized Multi-Model-Server inference API in AWS Fargate. This stack runs containers in a public VPC subnet, and includes a public facing load balancer to register the services in. Parameters: ServiceName: Type: String Default: mms-api Description: A name for the service ImageUrl: Type: String Description: The url of a docker image that contains the application process that will handle the traffic for this service ContainerCpu: Type: Number Default: 4096 Description: How much CPU to give the container. 1024 is 1 CPU ContainerMemory: Type: Number Default: 8192 Description: How much memory in megabytes to give the container ContainerPort: Type: Number Default: 8080 Description: What port number the application inside the docker container is binding to LBPort: Type: Number Default: 80 Description: What port number the load balancer is listening to Path: Type: String Default: "*" Description: A path on the public load balancer that this service should be connected to. Use * to send all load balancer traffic to this service. DesiredCount: Type: Number Default: 3 Description: How many copies of the service task to run InferenceModelName: Type: String Default: dicom_featurization_service Description: The model name used in API endpoint and container command InferenceModelS3Location: Type: String Description: The S3 location of the inference model package ESKNNIndexName: Type: String Default: visual-search-knn Description: The ElasticSearch endpoint to index the image feature vector into KNN index Mappings: SubnetConfig: VPC: CIDR: '10.0.0.0/16' PublicOne: CIDR: '10.0.0.0/24' PublicTwo: CIDR: '10.0.1.0/24' Resources: # VPC in which containers will be networked. # It has two public subnets # We distribute the subnets across the first two available subnets # for the region, for high availability. VPC: Type: AWS::EC2::VPC Properties: EnableDnsSupport: true EnableDnsHostnames: true CidrBlock: !FindInMap ['SubnetConfig', 'VPC', 'CIDR'] # Two public subnets, where containers can have public IP addresses PublicSubnetOne: Type: AWS::EC2::Subnet Properties: AvailabilityZone: Fn::Select: - 0 - Fn::GetAZs: {Ref: 'AWS::Region'} VpcId: !Ref 'VPC' CidrBlock: !FindInMap ['SubnetConfig', 'PublicOne', 'CIDR'] MapPublicIpOnLaunch: true PublicSubnetTwo: Type: AWS::EC2::Subnet Properties: AvailabilityZone: Fn::Select: - 1 - Fn::GetAZs: {Ref: 'AWS::Region'} VpcId: !Ref 'VPC' CidrBlock: !FindInMap ['SubnetConfig', 'PublicTwo', 'CIDR'] MapPublicIpOnLaunch: true # Setup networking resources for the public subnets. Containers # in the public subnets have public IP addresses and the routing table # sends network traffic via the internet gateway. InternetGateway: Type: AWS::EC2::InternetGateway GatewayAttachement: Type: AWS::EC2::VPCGatewayAttachment Properties: VpcId: !Ref 'VPC' InternetGatewayId: !Ref 'InternetGateway' PublicRouteTable: Type: AWS::EC2::RouteTable Properties: VpcId: !Ref 'VPC' PublicRoute: Type: AWS::EC2::Route DependsOn: GatewayAttachement Properties: RouteTableId: !Ref 'PublicRouteTable' DestinationCidrBlock: '0.0.0.0/0' GatewayId: !Ref 'InternetGateway' PublicSubnetOneRouteTableAssociation: Type: AWS::EC2::SubnetRouteTableAssociation Properties: SubnetId: !Ref PublicSubnetOne RouteTableId: !Ref PublicRouteTable PublicSubnetTwoRouteTableAssociation: Type: AWS::EC2::SubnetRouteTableAssociation Properties: SubnetId: !Ref PublicSubnetTwo RouteTableId: !Ref PublicRouteTable # ECS Resources ECSCluster: Type: AWS::ECS::Cluster # A security group for the containers we will run in Fargate. # Two rules, allowing network traffic from a public facing load # balancer and from other members of the security group. # # Remove any of the following ingress rules that are not needed. # If you want to make direct requests to a container using its # public IP address you'll need to add a security group rule # to allow traffic from all IP addresses. FargateContainerSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Access to the Fargate containers VpcId: !Ref 'VPC' EcsSecurityGroupIngressFromPublicALB: Type: AWS::EC2::SecurityGroupIngress Properties: Description: Ingress from the public ALB GroupId: !Ref 'FargateContainerSecurityGroup' IpProtocol: -1 SourceSecurityGroupId: !Ref 'PublicLoadBalancerSG' EcsSecurityGroupIngressFromSelf: Type: AWS::EC2::SecurityGroupIngress Properties: Description: Ingress from other containers in the same security group GroupId: !Ref 'FargateContainerSecurityGroup' IpProtocol: -1 SourceSecurityGroupId: !Ref 'FargateContainerSecurityGroup' # Load balancers for getting traffic to containers. # This sample template creates one load balancer: # # - One public load balancer, hosted in public subnets that is accessible # to the public, and is intended to route traffic to one or more public # facing services. # A public facing load balancer, this is used for accepting traffic from the public # internet and directing it to public facing microservices PublicLoadBalancerSG: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Access to the public facing load balancer VpcId: !Ref 'VPC' SecurityGroupIngress: # Allow access to ALB from anywhere on the internet on port 80 - CidrIp: 0.0.0.0/0 IpProtocol: tcp FromPort: !Ref LBPort ToPort: !Ref LBPort PublicLoadBalancer: Type: AWS::ElasticLoadBalancingV2::LoadBalancer Properties: Scheme: internet-facing LoadBalancerAttributes: - Key: idle_timeout.timeout_seconds Value: '30' Subnets: # The load balancer is placed into the public subnets, so that traffic # from the internet can reach the load balancer directly via the internet gateway - !Ref PublicSubnetOne - !Ref PublicSubnetTwo SecurityGroups: [!Ref 'PublicLoadBalancerSG'] TargetGroup: Type: AWS::ElasticLoadBalancingV2::TargetGroup Properties: HealthCheckIntervalSeconds: 6 HealthCheckPath: /ping HealthCheckProtocol: HTTP HealthCheckTimeoutSeconds: 5 HealthyThresholdCount: 2 UnhealthyThresholdCount: 2 TargetType: ip Port: !Ref 'LBPort' Protocol: HTTP VpcId: !Ref 'VPC' PublicLoadBalancerListener: Type: AWS::ElasticLoadBalancingV2::Listener DependsOn: - PublicLoadBalancer Properties: DefaultActions: - TargetGroupArn: !Ref 'TargetGroup' Type: 'forward' LoadBalancerArn: !Ref 'PublicLoadBalancer' Port: !Ref 'LBPort' Protocol: HTTP # This is a role which is used by the ECS tasks themselves. ECSTaskExecutionRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: Statement: - Effect: Allow Principal: Service: [ecs-tasks.amazonaws.com] Action: ['sts:AssumeRole'] Path: / Policies: - PolicyName: AmazonECSTaskExecutionRolePolicy PolicyDocument: Statement: - Effect: Allow Action: # Allow the ECS Tasks to download images from ECR - 'ecr:GetAuthorizationToken' - 'ecr:BatchCheckLayerAvailability' - 'ecr:GetDownloadUrlForLayer' - 'ecr:BatchGetImage' # Allow the ECS tasks to upload logs to CloudWatch - 'logs:CreateLogStream' - 'logs:PutLogEvents' Resource: '*' # This is a role which is used by the ECS tasks themselves. ECSTaskRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: Statement: - Effect: Allow Principal: Service: [ecs-tasks.amazonaws.com] Action: ['sts:AssumeRole'] Path: / Policies: - PolicyName: s3-es-ddb-access PolicyDocument: Statement: - Effect: Allow Action: - 's3:ListBucket' - 's3:GetObject' - 's3:GetObjectVersion' - 's3:PutObject' - 's3:PutObjectAcl' - 'es:ESHttpGet' - 'es:ESHttpHead' - 'es:ESHttpPost' - 'es:ESHttpPut' - 'es:ESHttpPatch' - 'dynamodb:PutItem' - 'dynamodb:UpdateItem' - 'dynamodb:GetItem' Resource: '*' # The task definition. This is a simple metadata description of what # container to run, and what resource requirements it has. TaskDefinition: Type: AWS::ECS::TaskDefinition Properties: Family: !Ref 'ServiceName' Cpu: !Ref 'ContainerCpu' Memory: !Ref 'ContainerMemory' NetworkMode: awsvpc RequiresCompatibilities: - FARGATE ExecutionRoleArn: !Ref ECSTaskExecutionRole TaskRoleArn: !Ref ECSTaskRole ContainerDefinitions: - Name: !Ref 'ServiceName' Cpu: !Ref 'ContainerCpu' Memory: !Ref 'ContainerMemory' Image: !Ref 'ImageUrl' EntryPoint: - "python" - "/usr/local/bin/dockerd-entrypoint.py" Command: - "multi-model-server" - "--start" - "--models" - !Sub "${InferenceModelName}=${InferenceModelS3Location}" - "--mms-config" - "/home/model-server/config.properties" Environment: - Name: es_index Value: !Ref ESKNNIndexName - Name: aws_region Value: !Ref AWS::Region PortMappings: - ContainerPort: !Ref 'ContainerPort' LogConfiguration: LogDriver: awslogs Options: awslogs-group: !Ref CloudWatchLogsGroup awslogs-region: !Ref AWS::Region awslogs-stream-prefix: !Ref ServiceName CloudWatchLogsGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub ecs-fargate-${ServiceName} RetentionInDays: 365 # The service. The service is a resource which allows you to run multiple # copies of a type of task, and gather up their logs and metrics, as well # as monitor the number of running tasks and replace any that have crashed Service: Type: AWS::ECS::Service DependsOn: PublicLoadBalancerListener Properties: ServiceName: !Ref 'ServiceName' Cluster: !Ref ECSCluster LaunchType: FARGATE DeploymentConfiguration: MaximumPercent: 200 MinimumHealthyPercent: 75 DesiredCount: !Ref 'DesiredCount' NetworkConfiguration: AwsvpcConfiguration: AssignPublicIp: ENABLED SecurityGroups: - !Ref FargateContainerSecurityGroup Subnets: - !Ref PublicSubnetOne - !Ref PublicSubnetTwo TaskDefinition: !Ref 'TaskDefinition' LoadBalancers: - ContainerName: !Ref 'ServiceName' ContainerPort: !Ref 'ContainerPort' TargetGroupArn: !Ref 'TargetGroup' # These are the values output by the CloudFormation template. Be careful # about changing any of them, because of them are exported with specific # names so that the other task related CF templates can use them. Outputs: InferenceAPIUrl: Description: The url of the external load balancer Value: !Join ['', ['http://', !GetAtt 'PublicLoadBalancer.DNSName', '/predictions/', !Ref InferenceModelName]]