{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# HIV Inhibitor Prediction Using Graph Neural Networks (GNN) on Amazon SageMaker\n", "\n", "**Note:** This notebook was last tested with the `Python 3 (Pytorch 1.12 Python 3.8 CPU Optimized)` environment image in Amazon SageMaker Studio.\n", "\n", "## Learning Objectives\n", "\n", "- Understand the basics of graph neural networks and how they can be applied to molecular graphs\n", "- Install and use the Deep Graph Library (DGL)\n", "- Build, train, and deploy a DGL model on SageMaker\n", "- Perform hyperparameter tuning of deep learning models\n", "- Use your own scripts to train custom models in SageMaker \n", "- Track model training and other tasks using SageMaker Experiments\n", "\n", "\n", "## Introduction\n", "\n", "Human immunodeficiency virus type 1 (HIV-1) is the most common cause of Acquired Immunodeficiency Syndrome (AIDS). One ongoing area of research is finding compounds that inhibit HIV-1 viral replication. Schematically, this is shown below as:\n", "\n", "\n", "
 | smiles | HIV_active |
---|---|---|
0 | CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)... | 0 |
1 | C(=Cc1ccccc1)C1=[O+][Cu-3]2([O+]=C(C=Cc3ccccc3... | 0 |
2 | CC(=O)N1c2ccccc2Sc2c1ccc1ccccc21 | 0 |
3 | Nc1ccc(C=Cc2ccc(N)cc2S(=O)(=O)O)c(S(=O)(=O)O)c1 | 0 |
4 | O=S(=O)(O)CCS(=O)(=O)O | 0 |
5 | CCOP(=O)(Nc1cccc(Cl)c1)OCC | 0 |
6 | O=C(O)c1ccccc1O | 0 |
7 | CC1=C2C(=COC(C)C2C)C(O)=C(C(=O)O)C1=O | 0 |
8 | O=[N+]([O-])c1ccc(SSc2ccc([N+](=O)[O-])cc2[N+]... | 0 |
9 | O=[N+]([O-])c1ccccc1SSc1ccccc1[N+](=O)[O-] | 0 |
10 | CC(C)(CCC(=O)O)CCC(=O)O | 0 |
11 | O=C(O)Cc1ccc(SSc2ccc(CC(=O)O)cc2)cc1 | 1 |
12 | O=C(O)c1ccccc1SSc1ccccc1C(=O)O | 0 |
13 | CCCCCCCCCCCC(=O)Nc1ccc(SSc2ccc(NC(=O)CCCCCCCCC... | 0 |
14 | Sc1cccc2c(S)cccc12 | 0 |