{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dashboard Using Yahoo! Finance Data\n", "\n", "In this example, we demonstrate how to deploy a simple self-contained dashboard.\n", "\n", "We deploy an example with [Yahoo! Finance](https://finance.yahoo.com/) for data, but the same method can be used to deploy any kind of self-contained dashboard. If you're looking to use the dashboard to interact with any kind of machine learning model, check out the [SageMaker Dashboards for ML](https://github.com/awslabs/sagemaker-dashboards-for-ml) example instead.\n", "\n", "We first import all of the required Python packages for this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "from dotenv import load_dotenv\n", "import os\n", "from pathlib import Path\n", "import sagemaker\n", "import warnings\n", "\n", "import utils" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our AWS CloudFormation Stack created a number of other resources that we'll need to refer to later on in this notebook (e.g. [Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html), [Amazon ECR Repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/Repositories.html), [Amazon ECS Cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html), etc). We have access to this information in a `.env` file. Using the [`dotenv`](https://pypi.org/project/python-dotenv/) Python package, the contents of `.env` are loaded as environment variables, and we can access the AWS resource identifiers later on in the notebook using `os.environ`: e.g. `os.environ['DASHBOARD_ECR_REPOSITORY']`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dotenv_filepath = Path('../../../../.env').resolve()\n", "if dotenv_filepath.exists():\n", " load_dotenv()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dashboard Development" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Source code for the example dashboard can be found at `./dashboard/src/app.py`. We use [Streamlit](https://www.streamlit.io/) in this example because of its simplicity, flexibility and integrations with a wide variety of visualisation tools such as [Altair](https://altair-viz.github.io/), [Bokeh](https://docs.bokeh.org/en/latest/index.html) and [Plotly](https://plotly.com/). We could start the dashboard server from the `dashboard` Conda environment, but this would lead to differences between development and deployment environments which should be avoided. Instead, we use Docker containers from the beginning to simplify the deployment later on. Our Dockerfile is defined at `./dashboard/Dockerfile` and has an `ENTRYPOINT` set to `streamlit run src/app.py` to start the dashboard server within the container. We start off by building the Docker image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image_name = 'dashboard'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!(cd dashboard && docker build -t {image_name} . --no-cache)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With our Docker image built, we can now choose a local port that will be used by the dashboard server. 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "port = 8501" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We're now ready to start the Docker container. We have defined a utility function called `get_docker_run_command` that constructs the correct `docker run` command. It handles a number of things, including:\n", "\n", "* port forwarding: access the server running inside the Docker container.\n", "* local directory mounts: edit source files without having to rebuild the Docker container.\n", "* debug modes: change the verbosity of error messages.\n", "* permissions: pass the IAM role from the SageMaker Notebook Instance to the Docker container.\n", "\n", "After running the cell below, you should click on the `Dashboard URL` link that appears at the top, rather than the `URL` output by the Streamlit server. All local ports must be accessed via the [Jupyter Server Proxy](https://github.com/jupyterhub/jupyter-server-proxy) at `https://{notebook-url}/proxy/{port}`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "url = utils.get_dashboard_url(port)\n", "command = utils.get_docker_run_command(port, image_name, local_dir_mount='./dashboard/src', debug=True)\n", "!echo Dashboard URL: {url}\n", "!{command}" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You should be able to interact with the widgets on the dashboard.\n", "\n", "When you're ready, try making a few changes to `src/app.py`. When returning to the dashboard, you should notice a few extra buttons appear in the top right corner. Click 'Rerun'. You should see your changes appear on the dashboard. Click 'Always rerun' to perform a 'live reload' whenever a change is detected.\n", "\n", "When you're finished developing the dashboard, interrupt the Jupyter kernel by clicking on the stop button or clicking 'Kernel' -> 'Interrupt'." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Dashboard Deployment" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Docker Container Testing" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "With dashboard development finished, we should now test the Docker container without the local directory mount. Our existing Docker image contains outdated dashboard source code, so we must rebuild the image to pick up the latest changes: our `Dockerfile` copies the source code files into the image." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!(cd dashboard && docker build -t {image_name} . --no-cache)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Just like before, we use `get_docker_run_command` to start the dashboard Docker container, but this time we don't mount a local directory." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "url = utils.get_dashboard_url(port)\n", "command = utils.get_docker_run_command(port, image_name)\n", "!echo Dashboard URL: {url}\n", "!{command}" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "If all looks good here, we can continue with the dashboard deployment." ] },
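{ "cell_type": "markdown", "metadata": {}, "source": [ "As an optional extra check before deployment, we can run a quick non-interactive smoke test of the image. The sketch below is not part of the solution's utilities: it assumes the container's Streamlit server listens on port 80 (as defined in the `Dockerfile`), that the chosen local port is free, and that the `requests` package is available in the `conda_dashboard` environment. Unlike `get_docker_run_command`, it does not forward the Notebook Instance's IAM role, which is fine for a simple HTTP check." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional smoke test (sketch): run the image detached, request the landing page, then stop it.\n", "# If the port is still in use from an earlier run, stop that container first.\n", "import time\n", "\n", "import requests\n", "\n", "container_id = !docker run --rm -d -p {port}:80 {image_name}\n", "container_id = container_id[0]\n", "time.sleep(10)  # give the Streamlit server a few seconds to start\n", "response = requests.get(f'http://localhost:{port}')\n", "print(f'HTTP status code: {response.status_code}')  # expect 200 from a healthy server\n", "!docker stop {container_id}" ] },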
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Amazon ECR Upload" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dashboard Docker image is currently stored on our Amazon SageMaker Notebook Instance. We need to access this image from [Amazon Elastic Container Service](https://aws.amazon.com/ecs/) (ECS), so we need to upload this image to [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (ECR). As part of the AWS CloudFormation Stack creation, an Amazon ECR Repository was created for us to store the dashboard Docker image. We can retrieve the necessary identifiers from the environment variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "AWS_ACCOUNT_ID = os.environ['AWS_ACCOUNT_ID']\n", "AWS_REGION = os.environ['AWS_REGION']\n", "DASHBOARD_ECR_REPOSITORY = os.environ['DASHBOARD_ECR_REPOSITORY']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We just need to tag the local dashboard Docker image, login to Amazon ECR and push the local dashboard Docker image to the Amazon ECR Repository." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!docker tag {image_name} {AWS_ACCOUNT_ID}.dkr.ecr.{AWS_REGION}.amazonaws.com/{DASHBOARD_ECR_REPOSITORY}:latest\n", "!eval $(aws ecr get-login --no-include-email)\n", "!docker push {AWS_ACCOUNT_ID}.dkr.ecr.{AWS_REGION}.amazonaws.com/{DASHBOARD_ECR_REPOSITORY}:latest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Amazon ECS Service Start" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dashboard Docker image is now on Amazon ECR, but the dashboard isn't actually running yet.\n", "\n", "Amazon ECS is a fully-managed service for running Docker containers. You don't need to provision or manage servers; you just define the task that needs to be run and specify the resources the task needs. Our AWS CloudFormation Stack created a number of Amazon ECS resources for the dashboard: most notably a [Cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html), a [Task Definition](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html) and a [Service](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html). We can retrieve the necessary identifiers from the environment variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DASHBOARD_ECS_CLUSTER = os.environ['DASHBOARD_ECS_CLUSTER']\n", "DASHBOARD_ECR_SERVICE = os.environ['DASHBOARD_ECR_SERVICE']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our example Task Definition states that the dashboard Docker container should be run with a single vCPU and 2GB of memory. Our example Service starts with a desired task count of 0, so while the dashboard is being developed there are no tasks being run on Amazon ECS. When the desired task count is set to 1 or more, the service will try to maintain that number of instances of the task definition simultaneously. When we set the desired task count to 2 (with the command below), the service will start 2 new tasks. Our [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) (ALB) is used to distribute traffic across tasks. When a task fails, the service will deprovision the failing task and provision a replacement." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws ecs update-service --cluster {DASHBOARD_ECS_CLUSTER} --service {DASHBOARD_ECR_SERVICE} --desired-count 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After running the above command, the new tasks will take a few minutes to provision and be added to the ALB's [Target Group](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html).\n", "\n", "Meanwhile, we can retrieve the URL that will be used to access the deployed dashboard." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DASHBOARD_URL = os.environ['DASHBOARD_URL']\n", "DASHBOARD_ALB = os.environ['DASHBOARD_ALB']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your next actions will depend on the choices you made when launching the AWS CloudFormation Stack. If you opted to use a custom domain, you will have to add a CNAME record on that domain pointing to the `DASHBOARD_ALB` URL. Otherwise you're permitted to access the `DASHBOARD_ALB` URL directly.\n", "\n", "**Caution**: You will receive a warning from your browser when accessing the dashboard if you didn't provide a custom SSL certificate when launching the AWS CloudFormation Stack. A self-signed certificate is created and used as a backup but this is certainly not recommended for production use cases. You should obtain an SSL Certificate that has been validated by a certificate authority, import it into [AWS Certificate Manager](https://aws.amazon.com/certificate-manager/) and reference this when launching the AWS CloudFormation Stack. Should you wish to continue with the self-signed certificate (for development purposes), you should be able to proceed past the browser warning page. With Chrome, you will see a 'Your connection is not private' error message ('NET::ERR_CERT_AUTHORITY_INVALID'), but by clicking on 'Advanced' you should then see a link to proceed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if DASHBOARD_URL != DASHBOARD_ALB:\n", " warnings.warn('\\n' + '\\n'.join([\n", " \"Add CNAME record on your domain before continuing!\",\n", " \"from: {}\".format(DASHBOARD_URL),\n", " \"to: {}\".format(DASHBOARD_ALB),\n", " \"Otherwise you will see 'An error was encountered with the requested page' with Amazon Cognito.\"\n", " ]))\n", "print(f\"DASHBOARD_URL: https://{DASHBOARD_URL}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using Amazon Cognito authentication, you will be greeted by a sign-in page.\n", "\n", "A sample user called `dashboard_user` was created during the AWS CloudFormation Stack creation, and the password will have been sent to the email address you provided during launch. You should look out for an email from no-reply@verificationemail.com with the subject 'New Dashboard Account'.\n", "\n", "After typing in the temporary password, you will be prompted to enter a new one. You can use this to access the dashboard on other occasions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this stage, you should be able to access the dashboard. Some common issues and potential resolutions are as follows:\n", "\n", "**503 Service Temporarily Unavailable**\n", "\n", "You might see this when accessing the dashboard URL, and this is typically an error response from the Application Load Balancer. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "At this stage, you should be able to access the dashboard. Some common issues and potential resolutions are as follows:\n", "\n", "**503 Service Temporarily Unavailable**\n", "\n", "You might see this when accessing the dashboard URL, and this is typically an error response from the Application Load Balancer. Make sure you have followed the instructions in the notebook and waited a few minutes for the Amazon ECS Service to start. If this error message doesn't disappear after around 5 minutes, check your Amazon ECS Tasks for failures. You should avoid leaving your Amazon ECS Service with a desired task count greater than 0 when tasks are consistently failing, because the Service will keep trying to provision new tasks.\n", "\n", "**An error was encountered with the requested page.**\n", "\n", "You might see this if you are using a custom domain (or subdomain). Check that the Callback URL on the Amazon Cognito User Pool Client matches your domain (or subdomain) exactly: even a difference in capitalization or a missing trailing slash can cause issues here." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "When you've finished with this solution, make sure that you delete all unwanted AWS resources.\n", "\n", "AWS CloudFormation can be used to automatically delete all standard resources that have been created by the solution and notebooks. Before deleting the stack, we explicitly set the desired task count of our Amazon ECS Service to 0 so that no tasks are left running.\n", "\n", "**Caution**: You need to manually delete any extra resources that you may have created in these notebooks. Some examples include extra Amazon S3 buckets (in addition to the solution's default bucket), any Amazon SageMaker endpoints (created with a custom name), and extra Amazon ECR repositories." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws ecs update-service --cluster {DASHBOARD_ECS_CLUSTER} --service {DASHBOARD_ECR_SERVICE} --desired-count 0" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You can now return to AWS CloudFormation and delete the stack." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_dashboard", "language": "python", "name": "conda_dashboard" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }