# Accessing Client Side Encrypted Data with Athena and Lake Formation

## Overview of Steps

* Create a KMS key
* Set up a Fargate (serverless) based EKS cluster
* Enable EMR on EKS
* Prepare a Spark job to client-side encrypt (CSE) the raw data (.csv file) & populate the Glue catalog
* Verify the data is client-side encrypted on S3
* Put the bucket & CSE data under Lake Formation control and set up a Lake Formation data filter
* Set up Lake Formation data access permissions
* Adjust the KMS key policy
* Query the data with Athena (V3)
* Set up cross-account access for the consumer account via Lake Formation
* Set up Lake Formation resource links in the consumer account
* Verify Athena access in the consumer account

---------

## Create a KMS key

```bash
export KeyID=$(aws kms create-key --query KeyMetadata.KeyId --output text)
```

## Set up a serverless (Fargate) EKS environment - for EMR on EKS

```bash
eksctl create cluster -f cluster.yaml
```

----

## Set up EMR on EKS

```bash
./add-emr-toeks.sh
```

-----

## Create a bucket

```bash
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=$(aws configure get region)
export s3DemoBucket=s3://emr-eks-demo-${ACCOUNT_ID}-${AWS_REGION}
aws s3 mb $s3DemoBucket
```

## Copy the raw data tripdata.csv into the bucket - ready for the Spark job

```bash
aws s3 cp s3://aws-data-analytics-workshops/shared_datasets/tripdata/tripdata.csv tripdata.csv
aws s3 cp tripdata.csv s3://emr-eks-demo-${ACCOUNT_ID}-${AWS_REGION}/data/tripdata.csv
```

-----

## Prep Spark code to read the raw data, write it to Parquet (with client-side encryption from EMR), and populate the Glue catalog

```bash
cat > trip-count-encrypt-write4.py <
```
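The heredoc body of `trip-count-encrypt-write4.py` is cut off in this copy. As a rough sketch only (not the original script), the PySpark job might look something like the following. It assumes the demo bucket created above, a hypothetical Glue database/table named `tripdata.trip_data_encrypted`, and that the client-side-encryption settings (the `fs.s3.cse.*` EMRFS properties and the KMS key ID) are supplied in the EMR on EKS job configuration rather than inside the script itself.

```python
# trip-count-encrypt-write4.py -- illustrative sketch only; the real script is
# truncated above. Bucket, database, and table names below are assumptions.
import sys

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Client-side encryption (fs.s3.cse.* EMRFS properties plus the KMS key ID)
    # is expected to be set in the EMR on EKS job submission, so the Parquet
    # write below lands in S3 as CSE-encrypted objects.
    spark = (
        SparkSession.builder
        .appName("trip-count-encrypt-write")
        .enableHiveSupport()  # use the Glue Data Catalog as the metastore
        .getOrCreate()
    )

    # Hypothetical: pass the demo bucket name as the first job argument.
    bucket = sys.argv[1] if len(sys.argv) > 1 else "emr-eks-demo-ACCOUNT-REGION"

    # Read the raw CSV copied into the bucket earlier.
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(f"s3://{bucket}/data/tripdata.csv")
    )
    print(f"Total trips: {df.count()}")

    # Write Parquet back to S3 (client-side encrypted via the job's CSE config)
    # and register the table in the Glue catalog.
    spark.sql("CREATE DATABASE IF NOT EXISTS tripdata")
    (
        df.write
        .mode("overwrite")
        .format("parquet")
        .option("path", f"s3://{bucket}/output/tripdata-parquet/")
        .saveAsTable("tripdata.trip_data_encrypted")
    )

    spark.stop()
```

The important point is that the script itself only reads and writes S3 paths and calls `saveAsTable`; the encryption stays transparent to the job because EMRFS applies CSE-KMS on the way in and out, which is what the later "verify data is client side encrypted on S3" step checks.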