# AWS Service Catalog

---

All self-provisioned products in this solution, such as the data science environment, Studio user profile, SageMaker notebooks, and MLOps project templates, are delivered and deployed via [AWS Service Catalog](https://aws.amazon.com/servicecatalog).

One of the main advantages of using AWS Service Catalog for self-service provisioning is that users can configure and deploy approved products and AWS resources without needing full privileges to the underlying AWS services. All AWS Service Catalog products are deployed under a specified service role with a defined set of permissions.

The AWS Service Catalog approach offers the following features:

+ Product definition via CloudFormation templates that follow best practices and compliance requirements
+ Controlled access to the underlying AWS services
+ Self-service product provisioning for entitled end users
+ Implementation of technology recommendations and patterns
+ Security and information privacy guardrails

**Governed and secure environments** are delivered via AWS Service Catalog to:

+ Research and train ML models
+ Share reproducible results
+ Develop data processing automation
+ Develop ML model training automation
+ Define ML deployment resources
+ Test ML models before deployment

![AWS Service Catalog pipeline](../img/service-catalog-pipeline.png)

## AWS Service Catalog products in this solution

The following sections describe the products delivered as part of this solution.

### Data science environment product

This product provisions an end-to-end data science environment (Studio) for a specific data science team (Studio domain) and stage (dev/test/prod). It deploys the following team- and stage-specific resources:

+ VPC:
    - Dedicated `data science Team VPC`
    - Private subnets in each of the selected Availability Zones (AZs); up to four AZs are supported
    - If the NAT gateway option is selected: NAT gateways and public subnets in each of the selected AZs
+ VPC endpoints:
    - S3 VPC endpoint (`gateway` type) and endpoint policy to access the S3 data and model buckets; only the data and model buckets can be accessed
    - VPC endpoints (`interface` type) to access AWS public services (CloudWatch, SSM, SageMaker, CodeCommit)
    - VPC endpoint to access the `Shared services VPC`
+ Security groups:
    - SageMaker security group for SageMaker resources. No ingress allowed
    - VPC endpoint security group for all VPC endpoints. **HTTPS 443 ingress only**
    - VPC endpoint security group to access the `Shared services VPC`. **HTTP 80 ingress only**
+ IAM roles:
    - Data Scientist IAM role
    - Data science team administrator IAM role
+ AWS KMS:
    - KMS key for data encryption in the S3 buckets
    - KMS key for data encryption on EBS volumes attached to SageMaker instances (for training, processing, and batch jobs)
+ Amazon S3 buckets:
    - Amazon S3 data bucket with a pre-defined bucket policy, encrypted with an AWS KMS key
    - Amazon S3 model bucket with a pre-defined bucket policy, encrypted with an AWS KMS key

### Team-level products

#### SageMaker project templates

This solution deploys two custom [SageMaker projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-whatis.html) as Service Catalog products:

+ Multi-account model deployment
+ Model building, training, and validation with SageMaker Pipelines

These products are available as SageMaker projects in Studio and are deployable only from within Studio.
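Under the hood, choosing one of these project templates in Studio calls the SageMaker `CreateProject` API, which provisions the corresponding Service Catalog product. The boto3 sketch below is illustrative only: the project name is hypothetical, the product and provisioning artifact IDs are placeholders for the IDs of the deployed project template, and any template-specific parameters would go in `ProvisioningParameters`.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder values for illustration only. Look up the real product and
# version IDs of the project template in the AWS Service Catalog console or CLI.
response = sm.create_project(
    ProjectName="my-model-build-project",              # hypothetical project name
    ProjectDescription="Model building, training, and validation pipeline",
    ServiceCatalogProvisioningDetails={
        "ProductId": "prod-xxxxxxxxxxxxx",             # Service Catalog product ID
        "ProvisioningArtifactId": "pa-xxxxxxxxxxxxx",  # product version to launch
        # "ProvisioningParameters": [{"Key": "...", "Value": "..."}],
    },
)

print(response["ProjectArn"])
```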
![sm-mlops-projects](../img/sm-mlops-projects.png)

#### User profile for Studio domain (_not implemented in this version_)

Each provisioning of a data science environment product creates a Studio domain with a _default user profile_. You can optionally create new user profiles manually (from the AWS CLI or the SageMaker console):

+ Each user profile has its own dedicated compute resource and a slice of the shared EFS file system
+ Each user profile can be associated with its own execution role (or use the default domain execution role)

❗ There is a limit of one SageMaker domain per region per account, so you can provision only one data science environment product per region per account.

#### SageMaker notebook product (_not implemented in this version_)

This product is available to the Data Scientist and data science Team Administrator roles. Each notebook is provisioned with a pre-defined lifecycle configuration. The following considerations apply to the notebook product:

+ Only specific instance types are allowed for the notebook
+ A pre-defined notebook execution role is attached to the notebook
+ The notebook execution role enforces the use of security configurations and controls (for example, the notebook can be started only in VPC attachment mode)
+ The notebook has write access only to the project-specific S3 buckets (the data and model buckets deployed by the data science environment product)
+ The notebook-attached EBS volume is encrypted with its own AWS KMS key
+ The notebook is started in the SageMaker VPC, subnet, and security group

---

[Back to README](../README.md)

---

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0