{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Amazon SageMaker Lineage\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Amazon SageMaker Lineage enables events that happen within SageMaker to be traced via a graph structure. This lineage data simplifies generating reports, making comparisons, and discovering relationships between events. For example, you can trace both how a model was generated and where it was deployed.\n", "\n", "SageMaker creates the lineage graph automatically, and you can also create or modify your own graphs directly.\n", "\n", "\n", "## Key Concepts\n", "\n", "* **Lineage Graph** - A connected graph tracing your machine learning workflow end to end. \n", "* **Artifacts** - Represents a URI-addressable object or data. Artifacts are typically inputs or outputs to Actions. \n", "* **Actions** - Represents an action taken such as a computation, transformation, or job. 
\n", "* **Contexts** - Provides a method to logically group other entities.\n", "* **Associations** - A directed edge in the lineage graph that links two entities.\n", "* **Lineage Traversal** - Starting from an arbitrary point, trace the lineage graph to discover and analyze relationships between steps in your workflow.\n", "* **Experiments** - Experiment entities (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated with Artifacts, Actions, or Contexts.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook Overview\n", "\n", "This notebook demonstrates how to:\n", "* Understand the basics of lineage entities.\n", "* Create and associate lineage entities to track your workflow.\n", "* Traverse the associations between lineage entities." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "Select the `Python 3 (Data Science)` kernel in SageMaker Studio." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "region = boto3.Session().region_name\n", "sagemaker_session = sagemaker.session.Session()\n", "default_bucket = sagemaker_session.default_bucket()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "from sagemaker.lineage.context import Context\n", "from sagemaker.lineage.action import Action\n", "from sagemaker.lineage.association import Association\n", "from sagemaker.lineage.artifact import Artifact\n", "\n", "unique_id = str(int(datetime.now().replace(microsecond=0).timestamp()))\n", "\n", "print(f\"Unique id is {unique_id}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create an example context\n", "\n", "# the name must be unique across all other contexts\n", "context_name = f\"machine-learning-workflow-{unique_id}\"\n", "\n", "ml_workflow_context = Context.create(\n", "    context_name=context_name,\n", "    context_type=\"MLWorkflow\",\n", "    source_uri=unique_id,\n", "    # properties serve as a way to store metadata on lineage entities, in addition to Tags\n", "    properties={\"example\": \"true\"},\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# list all the contexts, newest first\n", "\n", "contexts = Context.list(sort_by=\"CreationTime\", sort_order=\"Descending\")\n", "\n", "for ctx in contexts:\n", "    print(ctx.context_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create an example action; the next cell associates it with the context\n", "\n", "model_build_action = Action.create(\n", "    action_name=f\"model-build-step-{unique_id}\",\n", "    action_type=\"ModelBuild\",\n", "    source_uri=unique_id,\n", "    properties={\"Example\": \"Metadata\"},\n", ")" ] }, { "cell_type": "code", "execution_count": 
null, "metadata": {}, "outputs": [], "source": [ "# Association Type can be Produced|DerivedFrom|AssociatedWith|ContributedTo\n", "context_action_association = Association.create(\n", "    source_arn=ml_workflow_context.context_arn,\n", "    destination_arn=model_build_action.action_arn,\n", "    association_type=\"AssociatedWith\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# now the Action and Context are associated:\n", "incoming_associations_to_action = Association.list(destination_arn=model_build_action.action_arn)\n", "for association in incoming_associations_to_action:\n", "    print(\n", "        f\"{model_build_action.action_name} has an incoming association from {association.source_name}\"\n", "    )\n", "\n", "outgoing_associations_from_context = Association.list(source_arn=ml_workflow_context.context_arn)\n", "for association in outgoing_associations_from_context:\n", "    print(\n", "        f\"{ml_workflow_context.context_name} has an outgoing association to {association.destination_name}\"\n", "    )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create an artifact representing inputs to the model building action\n", "input_test_images = Artifact.create(\n", "    artifact_name=\"mnist-test-images\",\n", "    artifact_type=\"TestData\",\n", "    source_types=[{\"SourceIdType\": \"Custom\", \"Value\": unique_id}],\n", "    source_uri=f\"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz\",\n", ")\n", "\n", "input_test_labels = Artifact.create(\n", "    artifact_name=\"mnist-test-labels\",\n", "    artifact_type=\"TestLabels\",\n", "    source_types=[{\"SourceIdType\": \"Custom\", \"Value\": unique_id}],\n", "    source_uri=f\"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"# create an artifact representing a trained model\n", "output_model = Artifact.create(\n", "    artifact_name=\"mnist-model\",\n", "    artifact_type=\"Model\",\n", "    source_types=[{\"SourceIdType\": \"Custom\", \"Value\": unique_id}],\n", "    source_uri=f\"s3://sagemaker-example-files-prod-{region}/datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# associate the dataset artifacts with the example action as incoming associations\n", "Association.create(\n", "    source_arn=input_test_images.artifact_arn, destination_arn=model_build_action.action_arn\n", ")\n", "Association.create(\n", "    source_arn=input_test_labels.artifact_arn, destination_arn=model_build_action.action_arn\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# associate the example action with an outgoing association to the model artifact\n", "Association.create(\n", "    source_arn=model_build_action.action_arn, destination_arn=output_model.artifact_arn\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def delete_associations(arn):\n", "    # delete incoming associations\n", "    incoming_associations = Association.list(destination_arn=arn)\n", "    for summary in incoming_associations:\n", "        assct = Association(\n", "            source_arn=summary.source_arn,\n", "            destination_arn=summary.destination_arn,\n", "            sagemaker_session=sagemaker_session,\n", "        )\n", "        assct.delete()\n", "\n", "    # delete outgoing associations\n", "    outgoing_associations = Association.list(source_arn=arn)\n", "    for summary in outgoing_associations:\n", "        assct = Association(\n", "            source_arn=summary.source_arn,\n", "            destination_arn=summary.destination_arn,\n", 
"            sagemaker_session=sagemaker_session,\n", "        )\n", "        assct.delete()\n", "\n", "\n", "def delete_lineage_data():\n", "    # associations must be removed before the entity they attach to is deleted\n", "    print(f\"Deleting context {ml_workflow_context.context_name}\")\n", "    delete_associations(ml_workflow_context.context_arn)\n", "    ctx = Context(\n", "        context_name=ml_workflow_context.context_name, sagemaker_session=sagemaker_session\n", "    )\n", "    ctx.delete()\n", "\n", "    print(f\"Deleting action {model_build_action.action_name}\")\n", "    delete_associations(model_build_action.action_arn)\n", "    actn = Action(action_name=model_build_action.action_name, sagemaker_session=sagemaker_session)\n", "    actn.delete()\n", "\n", "    for artifact in [input_test_images, input_test_labels, output_model]:\n", "        print(f\"Deleting artifact {artifact.artifact_arn} {artifact.artifact_name}\")\n", "        delete_associations(artifact.artifact_arn)\n", "        artfct = Artifact(artifact_arn=artifact.artifact_arn, sagemaker_session=sagemaker_session)\n", "        artfct.delete()\n", "\n", "\n", "delete_lineage_data()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Caveats\n", "\n", "* Associations cannot be created between two experiment entities, for example between an Experiment and a Trial.\n", "* Associations can only be created between the following resources: Action, Artifact, or Context.\n", "* The maximum numbers of manually created lineage entities are:\n", "  * Artifacts: 6000\n", "  * Contexts: 500\n", "  * Actions: 3000\n", "  * Associations: 6000\n", "* There is no limit on the number of lineage entities created automatically by SageMaker." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Contact\n", "\n", "Submit any questions or issues to https://github.com/aws/sagemaker-experiments/issues or mention @aws/sagemakerexperimentsadmin" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. 
The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-lineage|sagemaker-lineage.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-lineage|sagemaker-lineage.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (TensorFlow 2.10.0 Python 3.9 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/tensorflow-2.10.1-cpu-py39-ubuntu20.04-sagemaker-v1.2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }