AWSTemplateFormatVersion: "2010-09-09" Description: | (SO9052) - This solution creates an Amazon SageMaker Studio domain along with the necessary resources required for the Intelligent Document Processing Workshop. Parameters: UserProfileName: Type: String Description: The user profile name for the IDP workshop Default: 'SageMakerUser' DomainName: Type: String Description: The domain name of the Sagemaker studio instance Default: 'IDPSagemakerDomain' Mappings: JupyterMap: us-east-1: jupyterimage: "arn:aws:sagemaker:us-east-1:081325390199:image/jupyter-server-3" us-east-2: jupyterimage: "arn:aws:sagemaker:us-east-2:429704687514:image/jupyter-server-3" us-west-1: jupyterimage: "arn:aws:sagemaker:us-west-1:742091327244:image/jupyter-server-3" us-west-2: jupyterimage: "arn:aws:sagemaker:us-west-2:236514542706:image/jupyter-server-3" af-south-1: jupyterimage: "arn:aws:sagemaker:af-south-1:559312083959:image/jupyter-server-3" ap-east-1: jupyterimage: "arn:aws:sagemaker:ap-east-1:493642496378:image/jupyter-server-3" ap-south-1: jupyterimage: "arn:aws:sagemaker:ap-south-1:394103062818:image/jupyter-server-3" ap-northeast-2: jupyterimage: "arn:aws:sagemaker:ap-northeast-2:806072073708:image/jupyter-server-3" ap-southeast-1: jupyterimage: "arn:aws:sagemaker:ap-southeast-1:492261229750:image/jupyter-server-3" ap-southeast-2: jupyterimage: "arn:aws:sagemaker:ap-southeast-2:452832661640:image/jupyter-server-3" ap-northeast-1: jupyterimage: "arn:aws:sagemaker:ap-northeast-1:102112518831:image/jupyter-server-3" ca-central-1: jupyterimage: "arn:aws:sagemaker:ca-central-1:310906938811:image/jupyter-server-3" eu-central-1: jupyterimage: "arn:aws:sagemaker:eu-central-1:936697816551:image/jupyter-server-3" eu-west-1: jupyterimage: "arn:aws:sagemaker:eu-west-1:470317259841:image/jupyter-server-3" eu-west-2: jupyterimage: "arn:aws:sagemaker:eu-west-2:712779665605:image/jupyter-server-3" eu-west-3: jupyterimage: "arn:aws:sagemaker:eu-west-3:615547856133:image/jupyter-server-3" eu-north-1: jupyterimage: "arn:aws:sagemaker:eu-north-1:243637512696:image/jupyter-server-3" eu-south-1: jupyterimage: "arn:aws:sagemaker:eu-south-1:592751261982:image/jupyter-server-3" sa-east-1: jupyterimage: "arn:aws:sagemaker:sa-east-1:782484402741:image/jupyter-server-3" RegionMap: us-east-1: datascience: "arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" us-east-2: datascience: "arn:aws:sagemaker:us-east-2:429704687514:image/datascience-1.0" us-west-1: datascience: "arn:aws:sagemaker:us-west-1:742091327244:image/datascience-1.0" us-west-2: datascience: "arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" af-south-1: datascience: "arn:aws:sagemaker:af-south-1:559312083959:image/datascience-1.0" ap-east-1: datascience: "arn:aws:sagemaker:ap-east-1:493642496378:image/datascience-1.0" ap-south-1: datascience: "arn:aws:sagemaker:ap-south-1:394103062818:image/datascience-1.0" ap-northeast-2: datascience: "arn:aws:sagemaker:ap-northeast-2:806072073708:image/datascience-1.0" ap-southeast-1: datascience: "arn:aws:sagemaker:ap-southeast-1:492261229750:image/datascience-1.0" ap-southeast-2: datascience: "arn:aws:sagemaker:ap-southeast-2:452832661640:image/datascience-1.0" ap-northeast-1: datascience: "arn:aws:sagemaker:ap-northeast-1:102112518831:image/datascience-1.0" ca-central-1: datascience: "arn:aws:sagemaker:ca-central-1:310906938811:image/datascience-1.0" eu-central-1: datascience: "arn:aws:sagemaker:eu-central-1:936697816551:image/datascience-1.0" eu-west-1: datascience: "arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" eu-west-2: datascience: "arn:aws:sagemaker:eu-west-2:712779665605:image/datascience-1.0" eu-west-3: datascience: "arn:aws:sagemaker:eu-west-3:615547856133:image/datascience-1.0" eu-north-1: datascience: "arn:aws:sagemaker:eu-north-1:243637512696:image/datascience-1.0" eu-south-1: datascience: "arn:aws:sagemaker:eu-south-1:488287956546:image/sagemaker-data-wrangler-1.0" sa-east-1: datascience: "arn:aws:sagemaker:sa-east-1:782484402741:image/datascience-1.0" Resources: LambdaExecutionRole: Type: "AWS::IAM::Role" Properties: Description: 'Lambda Role for Cloudformation' Policies: - PolicyName: sagemaker-access PolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Action: - sagemaker:UpdateDomain - sagemaker:UpdateUserProfile - sagemaker:CreateStudioLifecycleConfig - sagemaker:DeleteStudioLifecycleConfig - sagemaker:CreateApp - sagemaker:DeleteApp Resource: '*' AssumeRolePolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Principal: Service: - lambda.amazonaws.com Action: - "sts:AssumeRole" Path: / ManagedPolicyArns: - 'arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess' - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole SageMakerExecutionRole: Type: AWS::IAM::Role Properties: Description: 'IDP IAM role for Sagemaker' Policies: - PolicyName: s3-access PolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Action: - s3:GetObject - s3:PutObject - s3:DeleteObject - s3:ListBucket Resource: arn:aws:s3:::* - PolicyName: comprehend-passrole PolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Action: - iam:PassRole Resource: arn:aws:iam::*:role/* Condition: StringLikeIfExists: 'iam:PassedToService': "comprehend.amazonaws.com" - PolicyName: textract-comprehend-sl-access PolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Action: - textract:GetDocumentTextDetection - textract:GetDocumentAnalysis - textract:AnalyzeDocument - textract:AnalyzeID - textract:AnalyzeExpense - textract:DetectDocumentText - textract:StartDocumentAnalysis - textract:StartDocumentTextDetection - textract:StartLendingAnalysis - textract:GetLendingAnalysis - comprehend:DetectEntities - comprehend:DetectPiiEntities - comprehend:ContainsPiiEntities - comprehend:DescribePiiEntitiesDetectionJob - comprehend:ListPiiEntitiesDetectionJobs - comprehend:StartPiiEntitiesDetectionJob - comprehend:StopPiiEntitiesDetectionJob - comprehend:StartEntitiesDetectionJob - comprehend:ClassifyDocument - comprehend:DescribeDocumentClassificationJob - comprehend:DescribeDocumentClassifier - comprehend:CreateDocumentClassifier - comprehend:CreateEntityRecognizer - comprehend:DescribeEntityRecognizer - comprehend:CreateEndpoint - comprehend:DescribeEndpoint - comprehend:DeleteEndpoint - comprehend:DeleteDocumentClassifier - comprehend:DeleteEntityRecognizer - comprehend:StopTrainingDocumentClassifier - comprehend:StopTrainingEntityRecognizer - comprehend:ListEndpoints - comprehend:ListEntityRecognizers - comprehend:ListEntityRecognizerSummaries - comprehend:ListDocumentClassifiers - comprehend:ListDocumentClassifierSummaries - comprehend:ImportModel - comprehend:StartDocumentClassificationJob - comprehendmedical:DetectEntitiesV2 - comprehendmedical:DetectPHI - comprehendmedical:InferICD10CM - comprehendmedical:InferRxNorm - comprehendmedical:InferSNOMEDCT - comprehendmedical:StartICD10CMInferenceJob - comprehendmedical:StartRxNormInferenceJob - comprehendmedical:StartSNOMEDCTInferenceJob - comprehendmedical:StartPHIDetectionJob Resource: '*' AssumeRolePolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Principal: Service: - sagemaker.amazonaws.com - comprehend.amazonaws.com Action: - sts:AssumeRole ManagedPolicyArns: - arn:aws:iam::aws:policy/AmazonSageMakerFullAccess - arn:aws:iam::aws:policy/IAMReadOnlyAccess - arn:aws:iam::aws:policy/AmazonAugmentedAIIntegratedAPIAccess DefaultVpcLambda: Type: AWS::Lambda::Function Properties: FunctionName: CFGetDefaultVpcId Code: ZipFile: | import json import boto3 import cfnresponse ec2 = boto3.client('ec2') def lambda_handler(event, context): if 'RequestType' in event and event['RequestType'] == 'Create': vpc_id = get_default_vpc_id() subnets = get_subnets_for_vpc(vpc_id) if vpc_id and subnets: cfnresponse.send(event, context, cfnresponse.SUCCESS, {'VpcId': vpc_id , "Subnets" : subnets}, '') else: cfnresponse.send(event, context, cfnresponse.FAILED, {'Error': "Missing Default VPC. A Default VPC is required"}, '') else: cfnresponse.send(event, context, cfnresponse.SUCCESS, {},'') def get_default_vpc_id(): vpcs = ec2.describe_vpcs(Filters=[{'Name': 'is-default', 'Values': ['true']}]) vpcs = vpcs['Vpcs'] vpc_id = vpcs[0]['VpcId'] return vpc_id def get_subnets_for_vpc(vpcId): response = ec2.describe_subnets( Filters=[ { 'Name': 'vpc-id', 'Values': [vpcId] } ] ) subnet_ids = [] for subnet in response['Subnets']: subnet_ids.append(subnet['SubnetId']) return subnet_ids Description: Return default VPC ID and Subnets Handler: index.lambda_handler MemorySize: 512 Role: !GetAtt LambdaExecutionRole.Arn Runtime: python3.9 Timeout: 5 DefaultVpcFinder: Type: Custom::ResourceForFindingDefaultVpc Properties: ServiceToken: !GetAtt DefaultVpcLambda.Arn LccLambda: Type: AWS::Lambda::Function Properties: FunctionName: LccForSagemakerStudio Code: ZipFile: | import json import boto3 import base64 import textwrap import cfnresponse sagemaker = boto3.client('sagemaker') def lambda_handler(event, context): print(event) script = textwrap.dedent('''\ #!/bin/bash set -eux echo "Cloning IDP repository" export REPOSITORY_URL="https://github.com/aws-samples/aws-ai-intelligent-document-processing" git -C /home/sagemaker-user clone $REPOSITORY_URL echo "Cloning complete"''') script_byte = script.encode("ascii") base64_bytes = base64.b64encode(script_byte) base64_string = base64_bytes.decode("ascii") if 'RequestType' in event and event['RequestType'] == 'Create': domain_id = event['ResourceProperties']['DomainID'] user_profile = event['ResourceProperties']['UserProfileName'] try: resp = sagemaker.create_studio_lifecycle_config(StudioLifecycleConfigName='idp-git-bootstrap', StudioLifecycleConfigContent=base64_string, StudioLifecycleConfigAppType='JupyterServer') lcc_config_arn = resp['StudioLifecycleConfigArn'] jupyter_setting = {'JupyterServerAppSettings': { 'DefaultResourceSpec': { 'LifecycleConfigArn': lcc_config_arn, 'InstanceType': 'system' }, 'LifecycleConfigArns': [ lcc_config_arn ] } } resp_d = sagemaker.update_domain(DomainId=domain_id, DefaultUserSettings=jupyter_setting) cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, '') except Exception as e: print(e) cfnresponse.send(event, context, cfnresponse.FAILED, {'Error': 'Unable to create Lifecycle Config'}, '') elif 'RequestType' in event and event['RequestType'] == 'Delete': try: resp = sagemaker.delete_studio_lifecycle_config(StudioLifecycleConfigName='idp-git-bootstrap') cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, '') except Exception as e: print(e) cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Message': 'Unable to delete Lifecycle Config.. proceeding'}, '') else: cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, '') Description: Creates a Lifecycle Config for the SageMaker Studio JupyterApp to clone the IDP Github repo Handler: index.lambda_handler MemorySize: 512 Role: !GetAtt LambdaExecutionRole.Arn Runtime: python3.9 Timeout: 5 DefaultLcc: Type: Custom::ResourceForLcc DependsOn: - StudioDomain Properties: ServiceToken: !GetAtt LccLambda.Arn DomainID: !Ref StudioDomain UserProfileName: !Ref UserProfileName StudioDomain: Type: AWS::SageMaker::Domain Properties: AppNetworkAccessType: PublicInternetOnly AuthMode: IAM DefaultUserSettings: ExecutionRole: !GetAtt SageMakerExecutionRole.Arn JupyterServerAppSettings: DefaultResourceSpec: InstanceType: system SageMakerImageArn: !FindInMap - JupyterMap - !Ref 'AWS::Region' - jupyterimage DomainName: !Ref DomainName SubnetIds: !GetAtt DefaultVpcFinder.Subnets VpcId: !GetAtt DefaultVpcFinder.VpcId UserProfile: Type: AWS::SageMaker::UserProfile DependsOn: DefaultLcc Properties: DomainId: !GetAtt StudioDomain.DomainId UserProfileName: !Ref UserProfileName UserSettings: ExecutionRole: !GetAtt SageMakerExecutionRole.Arn JupyterApp: Type: AWS::SageMaker::App DependsOn: UserProfile Properties: AppName: default AppType: JupyterServer DomainId: !GetAtt StudioDomain.DomainId UserProfileName: !Ref UserProfileName DataScienceApp: Type: AWS::SageMaker::App DependsOn: UserProfile Properties: AppName: instance-event-engine-datascience-ml-t3-medium AppType: KernelGateway DomainId: !GetAtt StudioDomain.DomainId ResourceSpec: InstanceType: ml.t3.medium SageMakerImageArn: !FindInMap - RegionMap - !Ref 'AWS::Region' - datascience UserProfileName: !Ref UserProfileName ### S3 Bucket For A2I A2IBucket: Type: AWS::S3::Bucket DeletionPolicy: Delete Properties: BucketName: !Join - "-" - - "idp-a2i" - !Select - 0 - !Split - "-" - !Select - 2 - !Split - "/" - !Ref "AWS::StackId" CorsConfiguration: CorsRules: - AllowedOrigins: - "*" AllowedMethods: - GET BucketEncryption: ServerSideEncryptionConfiguration: - ServerSideEncryptionByDefault: SSEAlgorithm: AES256 VersioningConfiguration: Status: Enabled