{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "This notebook evaluates an ML model design for its capacity to learn an embedding that distinguishes between different \"mechanisms of action\" (MOA) in the bbbc021 dataset. It does this by considering N trained models, where N is the number of chemical compounds with known MOA. Each of the N models differs from the others in that one particular compound was left out of its training set. Each model can then be tested against its \"left out\" compound to evaluate how accurately it classifies the MOA of that compound using knowledge learned from other compounds sharing the same MOA. The bbbc021 dataset has 12 MOA and 38 compounds with known MOA (there are several representative compounds per MOA, and up to 8 different concentrations per compound). There are a total of 103 'treatments' in the bbbc021 dataset with known MOA, where a treatment is the application of a particular compound at a particular concentration.\n", "\n", "During training, the network model learns to compute an embedding (vector space) that tries to position compounds with the same MOA close together, while keeping compounds with differing MOA farther apart. Once trained, a model can be used to predict the MOA of an unknown (or untrained) compound by finding its nearest labeled neighbors in the embedding space.\n", "\n", "This notebook assumes each of the N models is trained and available for evaluation, and that each image in the dataset has a computed embedding corresponding to each model.\n", "\n", "For each of the N \"one compound left out\" models, the mean embedding for each of M treatments is computed. Then, an MOA is assigned to each of the treatments corresponding to the left-out compound (i.e., for each concentration separately) based on its nearest neighbor. 
This is called NSC, or \"Not Same Compound\", analysis.\n", "\n", "[ TODO: Another analysis, NSCB (\"Not Same Compound or Batch\"), also excludes from nearest-neighbor consideration all compounds prepared in the same batch as the left-out compound (which is itself excluded at all concentrations), to keep batch-related characteristics from biasing the results. This is only possible for 10 of the 12 MOAs, because 2 only have representatives in a single batch. ]" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/secretstorage/dhcrypto.py:16: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n", " from cryptography.utils import int_from_bytes\n", "/opt/conda/lib/python3.7/site-packages/secretstorage/util.py:25: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n", " from cryptography.utils import int_from_bytes\n", "Requirement already satisfied: shortuuid in /opt/conda/lib/python3.7/site-packages (1.0.1)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n", "\u001b[33mWARNING: You are using pip version 21.1.3; however, version 21.2.4 is available.\n", "You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" ] } ], "source": [ "!pip install shortuuid" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "import math\n", "import base64\n", "import boto3\n", "import sagemaker\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import collections\n", "from collections import defaultdict\n", "from PIL import Image\n", "import sklearn\n", "from sklearn.metrics import ConfusionMatrixDisplay\n", "from matplotlib.ticker import NullFormatter\n", "from sklearn import manifold, datasets\n", "from time import time\n", "from time import sleep" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "EMBEDDING_NAME = 'bbbc021'\n", "BASELINE_TRAIN_ID = 'bneoLZG9npVDBeLCwx6qoE'" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "s3c = boto3.client('s3')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/root/bioimage-search/datasets/bbbc-021/notebooks'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%pwd" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "bioimsArtifactBucket='bioimage-search-output'\n", "bbbc021Bucket='bioimagesearchbbbc021stack-bbbc021bucket544c3e64-10ecnwo51127'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# assumes cwd=/root/bioimage-search/datasets/bbbc-021/notebooks\n", "sys.path.insert(0, \"../../../cli/bioims/src\")\n", "import bioims as bi" ] }, { "cell_type": "code", "execution_count": 8, 
"metadata": {}, "outputs": [], "source": [ "sys.path.insert(0, \"../scripts\")\n", "import bbbc021common as bb" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "sagemaker_session = sagemaker.Session()\n", "bucket = sagemaker_session.default_bucket()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'sagemaker-us-east-1-580829821648'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bucket" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build maps from each image source ID to its compound and concentration, and count images per compound and per MOA." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "image_df, moa_df = bb.Bbbc021PlateInfoByDF.getDataFrames(bbbc021Bucket)\n", "compound_moa_map = bb.Bbbc021PlateInfoByDF.getCompoundMoaMapFromDf(moa_df)\n", "\n", "sourceCompoundMap = {}\n", "sourceConcentrationMap = {}\n", "compoundCountMap = {}\n", "moaCountMap = {}\n", "for i in range(len(image_df.index)):\n", "    r = image_df.iloc[i]\n", "    # The image source ID is the DAPI file name minus its last four characters (the file extension).\n", "    imageSourceId = r['Image_FileName_DAPI'][:-4]\n", "    imageCompound = r['Image_Metadata_Compound']\n", "    sourceCompoundMap[imageSourceId] = imageCompound\n", "    sourceConcentrationMap[imageSourceId] = r['Image_Metadata_Concentration']\n", "    # Count images per compound.\n", "    compoundCountMap[imageCompound] = compoundCountMap.get(imageCompound, 0) + 1\n", "    # Count images per MOA, only for compounds with a known MOA.\n", "    if imageCompound in compound_moa_map:\n", "        imageMoa = compound_moa_map[imageCompound]\n", "        moaCountMap[imageMoa] = moaCountMap.get(imageMoa, 0) + 1" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'5-fluorouracil': 96,\n", " 'acyclovir': 96,\n", " 'AG-1478': 192,\n", " 'ALLN': 96,\n", " 'aloisine A': 96,\n", " 'alsterpaullone': 64,\n", " 'anisomycin': 96,\n", 
'aphidicolin': 96,\n", " 'arabinofuranosylcytosine': 96,\n", " 'atropine': 96,\n", " 'bleomycin': 96,\n", " 'bohemine': 64,\n", " 'brefeldin A': 96,\n", " 'bryostatin': 64,\n", " 'calpain inhibitor 2 (ALLM)': 96,\n", " 'calpeptin': 64,\n", " 'camptothecin': 96,\n", " 'carboplatin': 96,\n", " 'caspase inhibitor 1 (ZVAD)': 96,\n", " 'cathepsin inhibitor I': 96,\n", " 'Cdk1 inhibitor III': 96,\n", " 'Cdk1/2 inhibitor (NU6102)': 96,\n", " 'chlorambucil': 96,\n", " 'chloramphenicol': 64,\n", " 'cisplatin': 96,\n", " 'colchicine': 96,\n", " 'cyclohexamide': 96,\n", " 'cyclophosphamide': 64,\n", " 'cytochalasin B': 96,\n", " 'cytochalasin D': 96,\n", " 'demecolcine': 96,\n", " 'deoxymannojirimycin': 64,\n", " 'deoxynojirimycin': 96,\n", " \"3,3'-diaminobenzidine\": 96,\n", " 'docetaxel': 96,\n", " 'doxorubicin': 96,\n", " 'emetine': 96,\n", " 'epothilone B': 96,\n", " 'etoposide': 96,\n", " 'filipin': 64,\n", " 'floxuridine': 96,\n", " 'forskolin': 96,\n", " 'genistein': 96,\n", " 'H-7': 96,\n", " 'herbimycin A': 96,\n", " 'hydroxyurea': 96,\n", " 'ICI-182,780': 96,\n", " 'indirubin monoxime': 96,\n", " 'jasplakinolide': 96,\n", " 'lactacystin': 96,\n", " 'latrunculin B': 96,\n", " 'leupeptin': 96,\n", " 'LY-294002': 96,\n", " 'methotrexate': 96,\n", " 'methoxylamine': 96,\n", " 'mevinolin/lovastatin': 96,\n", " 'MG-132': 96,\n", " 'mitomycin C': 96,\n", " 'mitoxantrone': 96,\n", " 'monastrol': 192,\n", " 'neomycin': 96,\n", " 'nocodazole': 96,\n", " 'nystatin': 96,\n", " 'okadaic acid': 96,\n", " 'olomoucine': 96,\n", " 'PD-150606': 96,\n", " 'PD-169316': 64,\n", " 'PD-98059': 96,\n", " 'podophyllotoxin': 96,\n", " 'PP-2': 64,\n", " 'proteasome inhibitor I': 96,\n", " 'puromycin': 96,\n", " 'quercetin': 96,\n", " 'raloxifene': 96,\n", " 'rapamycin': 96,\n", " 'roscovitine': 96,\n", " 'SB-202190': 96,\n", " 'SB-203580': 96,\n", " 'simvastatin': 96,\n", " 'sodium butyrate': 96,\n", " 'sodium fluoride': 96,\n", " 'SP-600125': 96,\n", " 'staurosporine': 96,\n", " 
'taurocholate': 96,\n", " 'taxol': 1416,\n", " 'temozolomide': 96,\n", " 'trichostatin': 96,\n", " 'tunicamycin': 96,\n", " 'UO-126': 96,\n", " 'valproic acid': 64,\n", " 'vinblastine': 96,\n", " 'vincristine': 96,\n", " 'Y-27632': 96,\n", " 'AZ235': 96,\n", " 'AZ138': 96,\n", " 'AZ701': 96,\n", " 'AZ258': 96,\n", " 'AZ841': 96,\n", " 'AZ-A': 96,\n", " 'AZ-B': 96,\n", " 'AZ-C': 96,\n", " 'AZ-H': 96,\n", " 'AZ-I': 96,\n", " 'AZ-J': 96,\n", " 'AZ-K': 96,\n", " 'AZ-L': 96,\n", " 'AZ-M': 96,\n", " 'AZ-N': 96,\n", " 'AZ-O': 96,\n", " 'AZ-U': 96,\n", " 'TKK': 96,\n", " 'UNKNOWN': 64,\n", " 'DMSO': 1320}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compoundCountMap" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Protein degradation': 384,\n", " 'Kinase inhibitors': 192,\n", " 'Protein synthesis': 288,\n", " 'DNA replication': 384,\n", " 'DNA damage': 384,\n", " 'Microtubule destabilizers': 384,\n", " 'Actin disruptors': 288,\n", " 'Microtubule stabilizers': 1608,\n", " 'Cholesterol-lowering': 192,\n", " 'Epithelial': 256,\n", " 'Eg5 inhibitors': 192,\n", " 'Aurora kinase inhibitors': 288,\n", " 'DMSO': 1320}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "moaCountMap" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "embeddingClient = bi.client('embedding')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "imageClient = bi.client('image-management')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "trainingConfigurationClient = bi.client('training-configuration')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "embeddingInfo = trainingConfigurationClient.getEmbeddingInfo(EMBEDDING_NAME)" ] }, { "cell_type": "code", "execution_count": 
18, "metadata": {}, "outputs": [], "source": [ "plateList = imageClient.listCompatiblePlates(embeddingInfo['inputWidth'], embeddingInfo['inputHeight'], embeddingInfo['inputDepth'], embeddingInfo['inputChannels'])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "trainList = trainingConfigurationClient.getEmbeddingTrainings(EMBEDDING_NAME)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-2KrMFC136oXVYJ7YCpNfr6-mxcscf4W7NBPJkUjDWUpaM',\n", " 'messageId': '4d3807c7-64ed-4fc6-b7ba-005a47a2d285',\n", " 'filterKey': 'train-filter/bbbc021/ALLN-filter.txt',\n", " 'trainId': '2KrMFC136oXVYJ7YCpNfr6',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-42s7EYRjYWfW3Ly9gxPUzk-WhNXafEfArRztsJ55pZGVC',\n", " 'messageId': 'fd36b765-65d9-4b6b-bb04-966a2cf8b9cf',\n", " 'filterKey': 'train-filter/bbbc021/AZ-J-filter.txt',\n", " 'trainId': '42s7EYRjYWfW3Ly9gxPUzk',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-4CiT9BNcMV7ZftmY7YWArf-XfrEvWqzj42HSozCaJWwKM',\n", " 'messageId': 'e75a176f-0590-4fde-b8ed-d3f769b0b809',\n", " 'filterKey': 'train-filter/bbbc021/PP-2-filter.txt',\n", " 'trainId': '4CiT9BNcMV7ZftmY7YWArf',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-6he8YfdaT4eLrspDsnJyFe-DzvFXogFCWLK9wojwBsjCo',\n", " 'messageId': 'e096a4e3-326a-4b24-b7e2-c886b1f2035c',\n", " 'filterKey': 'train-filter/bbbc021/cytochalasinD-filter.txt',\n", " 'trainId': '6he8YfdaT4eLrspDsnJyFe',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 
'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-7HRSPABX4n2rAoMs5LVwD8-jWJSWef582UR6pz2KhdwDP',\n", " 'messageId': '9f5afb49-6421-4dbc-a1e3-720690314ae0',\n", " 'filterKey': 'train-filter/bbbc021/alsterpaullone-filter.txt',\n", " 'trainId': '7HRSPABX4n2rAoMs5LVwD8',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-7btuhRyHiQFqhp27Hyh5EW-9mGQFWPXuP97B2bWQGhUqY',\n", " 'messageId': '1ea73688-b0cb-4b39-88f3-a66de78faa1e',\n", " 'filterKey': 'train-filter/bbbc021/simvastatin-filter.txt',\n", " 'trainId': '7btuhRyHiQFqhp27Hyh5EW',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-7fTJ1Qjq5RJmk2kPZZ3t8R-5Nuqytc29kGTHCG3rrPeYq',\n", " 'messageId': '904070e2-2219-432d-a12b-a8b104e2b27d',\n", " 'filterKey': 'train-filter/bbbc021/cytochalasinB-filter.txt',\n", " 'trainId': '7fTJ1Qjq5RJmk2kPZZ3t8R',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-8CDun7CBUmC4gAz6MjvPCp-bAh2TvX5uAGFMaHqEFXq7j',\n", " 'messageId': 'b01e089a-540d-4ddc-bb01-b53f5a0ba188',\n", " 'filterKey': 'train-filter/bbbc021/docetaxel-filter.txt',\n", " 'trainId': '8CDun7CBUmC4gAz6MjvPCp',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-9JyEVkSyapQuPfttU3Zuy6-PYTkKAoRxL7XmL6XhXBBwv',\n", " 'messageId': '3aaf171e-1260-4365-b82f-518b0d75a813',\n", " 'filterKey': 'train-filter/bbbc021/AZ-A-filter.txt',\n", " 'trainId': '9JyEVkSyapQuPfttU3Zuy6',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-a5TDDad6FBHpcz7uWw8nhx-Eas5gwUDnGFeR2VDBB7vcL',\n", " 'messageId': 
'8c16a54a-4903-46a6-b930-436bc3721f66',\n", " 'filterKey': 'train-filter/bbbc021/epothiloneB-filter.txt',\n", " 'trainId': 'a5TDDad6FBHpcz7uWw8nhx',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': '',\n", " 'sagemakerJobName': 'bioims-bneoLZG9npVDBeLCwx6qoE-DVwEVKD8W77DJzKCQx43CS',\n", " 'messageId': 'ac6d3cd2-6e91-47dc-b2a3-f5e9f7292707',\n", " 'filterKey': '',\n", " 'trainId': 'bneoLZG9npVDBeLCwx6qoE',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-c9aVZXAoCg74i8QsAMPEQP-32PYuuSsw8ABVPts9tmucK',\n", " 'messageId': '6611ecca-e678-4949-acbc-1151b45974b5',\n", " 'filterKey': 'train-filter/bbbc021/cyclohexamide-filter.txt',\n", " 'trainId': 'c9aVZXAoCg74i8QsAMPEQP',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-dcAeCFupHshz3dbeZiCCQk-ApqgwPXdvGFT3VnVZbFobV',\n", " 'messageId': 'b19a0555-f6b9-42f2-a1c3-c5bfc7f84c52',\n", " 'filterKey': 'train-filter/bbbc021/mitomycinC-filter.txt',\n", " 'trainId': 'dcAeCFupHshz3dbeZiCCQk',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-g8DMAt72M4VUkoJTgF83n8-96kgKJqUMm45fJSfGRyz3i',\n", " 'messageId': '8a61b2e1-fb40-4e39-875c-0b97dbb1438e',\n", " 'filterKey': 'train-filter/bbbc021/colchicine-filter.txt',\n", " 'trainId': 'g8DMAt72M4VUkoJTgF83n8',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-gR5FE1YpCTKRZUU9zRyRNK-9sF2nBxnTFZK4TAdi9KNCF',\n", " 'messageId': '572ac182-e37a-4318-b0c3-efac79e0fb77',\n", " 'filterKey': 'train-filter/bbbc021/chlorambucil-filter.txt',\n", " 'trainId': 'gR5FE1YpCTKRZUU9zRyRNK',\n", " 'embeddingName': 'bbbc021',\n", " 
'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-gy4pEscXEMRA8ySocm2qY9-SXKiQffvVp2KRG7gRRvx54',\n", " 'messageId': 'b83fe7e2-e033-463d-8d61-f58cfb787d92',\n", " 'filterKey': 'train-filter/bbbc021/mitoxantrone-filter.txt',\n", " 'trainId': 'gy4pEscXEMRA8ySocm2qY9',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-htHgfiEJXvwq4p5SKzfqcX-CRuZ5HmsDcDiYWHuRwkzts',\n", " 'messageId': '5116c4f5-ed33-400a-aabf-765dcb5f16b7',\n", " 'filterKey': 'train-filter/bbbc021/floxuridine-filter.txt',\n", " 'trainId': 'htHgfiEJXvwq4p5SKzfqcX',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-jknhGNHribQdQfmcXzSCTH-YhV8yobYP5G4FqjmgapLib',\n", " 'messageId': '5811f93d-52ae-4718-b2e9-f8ef16363661',\n", " 'filterKey': 'train-filter/bbbc021/camptothecin-filter.txt',\n", " 'trainId': 'jknhGNHribQdQfmcXzSCTH',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-kGLL5LP2RF2rvN1BgqVGYC-TBHREpM4LNJauHc8z34CEi',\n", " 'messageId': 'b2f8be0b-bea0-4713-9bba-cad60cba8996',\n", " 'filterKey': 'train-filter/bbbc021/PD-169316-filter.txt',\n", " 'trainId': 'kGLL5LP2RF2rvN1BgqVGYC',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-n57tLbFxJoJuAkuMmiTeJt-A4MwDNSu5W7J4FBgkSgRPW',\n", " 'messageId': 'ff2e4361-4e0c-4236-809d-bf92a6eade4b',\n", " 'filterKey': 'train-filter/bbbc021/etoposide-filter.txt',\n", " 'trainId': 'n57tLbFxJoJuAkuMmiTeJt',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 
'bioims-odEByheKEgLzd7pxxwJuhK-9M7JNFaQJCWTYUPENjYNi7',\n", " 'messageId': '23de3bc2-e5aa-4d52-b4f8-c462f8d31bd7',\n", " 'filterKey': 'train-filter/bbbc021/demecolcine-filter.txt',\n", " 'trainId': 'odEByheKEgLzd7pxxwJuhK',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'roiDepth': 1,\n", " 'trainingHyperparameters': {'backend': 'gloo',\n", " 'batch_size': 1,\n", " 'seed': 1,\n", " 'epochs': 2},\n", " 'roiHeight': 128,\n", " 'trainId': 'origin',\n", " 'inputHeight': 1024,\n", " 'inputWidth': 1280,\n", " 'comments': '',\n", " 'inputChannels': 3,\n", " 'imageMethodArn': 'arn:aws:batch:us-east-1:580829821648:job-definition/imagepreprocessingjobde-c7df1aec5a8b940:1',\n", " 'embeddingVectorLength': 256,\n", " 'trainingInstanceType': 'ml.g4dn.4xlarge',\n", " 'plateMethodArn': 'arn:aws:batch:us-east-1:580829821648:job-definition/platepreprocessingjobde-f44b2d3675e9fc4:1',\n", " 'inputDepth': 1,\n", " 'modelTrainingScriptBucket': 'bioimage-search-input',\n", " 'wellMethodArn': 'wellMethodArn-placeholder',\n", " 'imagePostMethodArn': 'imagePostMethodArn-placeholder',\n", " 'embeddingName': 'bbbc021',\n", " 'modelTrainingScriptKey': 'bbbc021-1-train-script.py',\n", " 'roiWidth': 128},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-p5tzajaTAAJU4XkCFanwKX-eu5yMnHao64rUACxzXaSfB',\n", " 'messageId': '9e32c8ac-78a4-48d3-ba06-5dfe751bf81e',\n", " 'filterKey': 'train-filter/bbbc021/AZ-C-filter.txt',\n", " 'trainId': 'p5tzajaTAAJU4XkCFanwKX',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-pbuH8k6n7X1f85wbZig2b1-ViJwoNW2gPt6rnrB4UsaWu',\n", " 'messageId': 'e0ae0b13-09d1-4090-9be4-3c96c0178ced',\n", " 'filterKey': 'train-filter/bbbc021/MG-132-filter.txt',\n", " 'trainId': 'pbuH8k6n7X1f85wbZig2b1',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 
'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-pmKd7eDdjgfSgPHqm66cFM-KzPxHgVCtgqehNPng7YTGY',\n", " 'messageId': 'bc198355-4785-465f-b226-35dd0e2b5e27',\n", " 'filterKey': 'train-filter/bbbc021/lactacystin-filter.txt',\n", " 'trainId': 'pmKd7eDdjgfSgPHqm66cFM',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-qZU9GJ77LRqA2TdEnr6nTz-nymUUbMZamMW5ouDvsrXj6',\n", " 'messageId': '854a1192-9800-44fd-a9a7-64d378896561',\n", " 'filterKey': 'train-filter/bbbc021/mevinolin-lovastatin-filter.txt',\n", " 'trainId': 'qZU9GJ77LRqA2TdEnr6nTz',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-rE3D2myZgJ8hKnSCb8aRMT-kzdBvf7DWHiUYMoHqnc4Nr',\n", " 'messageId': 'dcd9f453-06b5-4b86-8e50-ab86d6c9856e',\n", " 'filterKey': 'train-filter/bbbc021/anisomycin-filter.txt',\n", " 'trainId': 'rE3D2myZgJ8hKnSCb8aRMT',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-rViqQ833enfkBrZESAAADn-d9YK83xqRs7i8AUYGXWF34',\n", " 'messageId': '025f8160-5401-414d-8cc5-27554c9bb67c',\n", " 'filterKey': 'train-filter/bbbc021/bryostatin-filter.txt',\n", " 'trainId': 'rViqQ833enfkBrZESAAADn',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-rwd7367RPqAr1LDWsnSE6b-9yHwmaCGUGgAqUXvSjDpVz',\n", " 'messageId': '7a234ad1-e428-4842-9fcd-7057b56f3821',\n", " 'filterKey': 'train-filter/bbbc021/nocodazole-filter.txt',\n", " 'trainId': 'rwd7367RPqAr1LDWsnSE6b',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-skLDWd41qhB5Wt4vYabdjB-9Bm54PQRTJtzJdUzmEHAeb',\n", " 'messageId': 
'832223bc-1002-4613-9af5-9141f7877f27',\n", " 'filterKey': 'train-filter/bbbc021/AZ-U-filter.txt',\n", " 'trainId': 'skLDWd41qhB5Wt4vYabdjB',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-t3cn8mXr3VouBDNh8zYHJb-RoEELUze379HYEHwqxTeD7',\n", " 'messageId': '116cdb51-336a-4eb1-9b1a-e33f7016facb',\n", " 'filterKey': 'train-filter/bbbc021/latrunculinB-filter.txt',\n", " 'trainId': 't3cn8mXr3VouBDNh8zYHJb',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-t6LfEkNbinGSRvFThhiVwc-dqHhy65kY3MWHZiBk9b2s7',\n", " 'messageId': 'e4c1a07f-ac6f-4975-a9f8-91d44ef91cca',\n", " 'filterKey': 'train-filter/bbbc021/taxol-filter.txt',\n", " 'trainId': 't6LfEkNbinGSRvFThhiVwc',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-t7gCc4W89ZzqE1Fik6JobC-GMGUkQHAnktR5Dh5YNhAMY',\n", " 'messageId': 'a279349d-592a-413c-89c6-f20640c54eab',\n", " 'filterKey': 'train-filter/bbbc021/emetine-filter.txt',\n", " 'trainId': 't7gCc4W89ZzqE1Fik6JobC',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-tBZsAwLBr5tvowghSa2YYk-6WhT3xZCdB36K7UYNiFqW2',\n", " 'messageId': '38a29f4a-851a-43e6-9dd9-c54934e80589',\n", " 'filterKey': 'train-filter/bbbc021/AZ258-filter.txt',\n", " 'trainId': 'tBZsAwLBr5tvowghSa2YYk',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-taJrCuSEAJm5CSKLKNjjsk-gz7jjrzLQcX4TfPYDXwmaB',\n", " 'messageId': '5f1cb7d9-98fc-4a8f-80ac-d4c91c4f3eaa',\n", " 'filterKey': 'train-filter/bbbc021/cisplatin-filter.txt',\n", " 'trainId': 'taJrCuSEAJm5CSKLKNjjsk',\n", " 
'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-ttWJoVUuu4GNmSJu6g91Vh-URHSgU9T47kD5xbzPCvrVZ',\n", " 'messageId': '1a07bc00-3894-4a58-bec4-7b1deef95450',\n", " 'filterKey': 'train-filter/bbbc021/vincristine-filter.txt',\n", " 'trainId': 'ttWJoVUuu4GNmSJu6g91Vh',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-vUiyUAgFjpXgPMiCQ5pHia-TYUqJr7YHwJ5naZn6SZy5v',\n", " 'messageId': '6302f008-803d-45d4-b125-18c00fc1f273',\n", " 'filterKey': 'train-filter/bbbc021/proteasomeinhibitorI-filter.txt',\n", " 'trainId': 'vUiyUAgFjpXgPMiCQ5pHia',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-wyudxQaKEvpFzeF8VF5wCF-h4Mr2Qzt8FLfjGvWEE2xbL',\n", " 'messageId': '0ab1be9d-c290-4062-9b31-67bc9db411eb',\n", " 'filterKey': 'train-filter/bbbc021/AZ841-filter.txt',\n", " 'trainId': 'wyudxQaKEvpFzeF8VF5wCF',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-xpvG7GXwz7iddSf6TteQr9-3J67tT2fF7T3FEHBeYwpJ2',\n", " 'messageId': '07ecced2-22ac-419c-a152-999cfdc7d7ec',\n", " 'filterKey': 'train-filter/bbbc021/AZ138-filter.txt',\n", " 'trainId': 'xpvG7GXwz7iddSf6TteQr9',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'},\n", " {'filterBucket': 'bioimage-search-input',\n", " 'sagemakerJobName': 'bioims-xv7SWrWkRDHFUtwcLn8BAr-6cSbY3aA2Yr3KnGnTtaVzX',\n", " 'messageId': '958827cc-94d2-4804-97b2-a46adb92dab7',\n", " 'filterKey': 'train-filter/bbbc021/methotrexate-filter.txt',\n", " 'trainId': 'xv7SWrWkRDHFUtwcLn8BAr',\n", " 'embeddingName': 'bbbc021',\n", " 'executeProcessPlate': 'false'}]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } 
], "source": [ "trainList" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PP-2': 'Epithelial',\n", " 'emetine': 'Protein synthesis',\n", " 'AZ258': 'Aurora kinase inhibitors',\n", " 'cytochalasin B': 'Actin disruptors',\n", " 'ALLN': 'Protein degradation',\n", " 'mitoxantrone': 'DNA replication',\n", " 'AZ-C': 'Eg5 inhibitors',\n", " 'MG-132': 'Protein degradation',\n", " 'AZ841': 'Aurora kinase inhibitors',\n", " 'docetaxel': 'Microtubule stabilizers',\n", " 'mitomycin C': 'DNA damage',\n", " 'PD-169316': 'Kinase inhibitors',\n", " 'proteasome inhibitor I': 'Protein degradation',\n", " 'vincristine': 'Microtubule destabilizers',\n", " 'AZ138': 'Eg5 inhibitors',\n", " 'demecolcine': 'Microtubule destabilizers',\n", " 'mevinolin/lovastatin': 'Cholesterol-lowering',\n", " 'AZ-A': 'Aurora kinase inhibitors',\n", " 'alsterpaullone': 'Kinase inhibitors',\n", " 'etoposide': 'DNA damage',\n", " 'floxuridine': 'DNA replication',\n", " 'AZ-U': 'Epithelial',\n", " 'simvastatin': 'Cholesterol-lowering',\n", " 'anisomycin': 'Protein synthesis',\n", " 'nocodazole': 'Microtubule destabilizers',\n", " 'AZ-J': 'Epithelial',\n", " 'taxol': 'Microtubule stabilizers',\n", " 'camptothecin': 'DNA replication',\n", " 'epothilone B': 'Microtubule stabilizers',\n", " 'latrunculin B': 'Actin disruptors',\n", " 'cyclohexamide': 'Protein synthesis',\n", " 'methotrexate': 'DNA replication',\n", " 'colchicine': 'Microtubule destabilizers',\n", " 'cisplatin': 'DNA damage',\n", " 'DMSO': 'DMSO',\n", " 'cytochalasin D': 'Actin disruptors',\n", " 'chlorambucil': 'DNA damage',\n", " 'bryostatin': 'Kinase inhibitors',\n", " 'lactacystin': 'Protein degradation'}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compound_moa_map" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def getCompoundLabel(compound):\n", "    # Remove all whitespace, then replace '/' with '-' so the label matches the\n", "    # compound name used in the training filter keys (e.g. 'mevinolin-lovastatin').\n", "    cnws = \"\".join(compound.split())\n", "    return cnws.replace('/', '-')" ] }, 
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "label_moa_map = {}\n", "labelCountMap = {}\n", "for c, m in compound_moa_map.items():\n", "    label = getCompoundLabel(c)\n", "    label_moa_map[label] = m\n", "    labelCountMap[label] = compoundCountMap[c]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PP-2': 'Epithelial',\n", " 'emetine': 'Protein synthesis',\n", " 'AZ258': 'Aurora kinase inhibitors',\n", " 'cytochalasinB': 'Actin disruptors',\n", " 'ALLN': 'Protein degradation',\n", " 'mitoxantrone': 'DNA replication',\n", " 'AZ-C': 'Eg5 inhibitors',\n", " 'MG-132': 'Protein degradation',\n", " 'AZ841': 'Aurora kinase inhibitors',\n", " 'docetaxel': 'Microtubule stabilizers',\n", " 'mitomycinC': 'DNA damage',\n", " 'PD-169316': 'Kinase inhibitors',\n", " 'proteasomeinhibitorI': 'Protein degradation',\n", " 'vincristine': 'Microtubule destabilizers',\n", " 'AZ138': 'Eg5 inhibitors',\n", " 'demecolcine': 'Microtubule destabilizers',\n", " 'mevinolin-lovastatin': 'Cholesterol-lowering',\n", " 'AZ-A': 'Aurora kinase inhibitors',\n", " 'alsterpaullone': 'Kinase inhibitors',\n", " 'etoposide': 'DNA damage',\n", " 'floxuridine': 'DNA replication',\n", " 'AZ-U': 'Epithelial',\n", " 'simvastatin': 'Cholesterol-lowering',\n", " 'anisomycin': 'Protein synthesis',\n", " 'nocodazole': 'Microtubule destabilizers',\n", " 'AZ-J': 'Epithelial',\n", " 'taxol': 'Microtubule stabilizers',\n", " 'camptothecin': 'DNA replication',\n", " 'epothiloneB': 'Microtubule stabilizers',\n", " 'latrunculinB': 'Actin disruptors',\n", " 'cyclohexamide': 'Protein synthesis',\n", " 'methotrexate': 'DNA replication',\n", " 'colchicine': 'Microtubule destabilizers',\n", " 'cisplatin': 'DNA damage',\n", " 'DMSO': 'DMSO',\n", " 'cytochalasinD': 'Actin disruptors',\n", " 'chlorambucil': 'DNA damage',\n", " 'bryostatin': 'Kinase 
inhibitors',\n", " 'lactacystin': 'Protein degradation'}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "label_moa_map" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "train_compoundLabel_map = {}" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train-filter/bbbc021/ALLN-filter.txt\n", "['train-filter', 'bbbc021', 'ALLN-filter.txt']\n", "['ALLN', '.txt']\n", "2KrMFC136oXVYJ7YCpNfr6\n", "train-filter/bbbc021/AZ-J-filter.txt\n", "['train-filter', 'bbbc021', 'AZ-J-filter.txt']\n", "['AZ-J', '.txt']\n", "42s7EYRjYWfW3Ly9gxPUzk\n", "train-filter/bbbc021/PP-2-filter.txt\n", "['train-filter', 'bbbc021', 'PP-2-filter.txt']\n", "['PP-2', '.txt']\n", "4CiT9BNcMV7ZftmY7YWArf\n", "train-filter/bbbc021/cytochalasinD-filter.txt\n", "['train-filter', 'bbbc021', 'cytochalasinD-filter.txt']\n", "['cytochalasinD', '.txt']\n", "6he8YfdaT4eLrspDsnJyFe\n", "train-filter/bbbc021/alsterpaullone-filter.txt\n", "['train-filter', 'bbbc021', 'alsterpaullone-filter.txt']\n", "['alsterpaullone', '.txt']\n", "7HRSPABX4n2rAoMs5LVwD8\n", "train-filter/bbbc021/simvastatin-filter.txt\n", "['train-filter', 'bbbc021', 'simvastatin-filter.txt']\n", "['simvastatin', '.txt']\n", "7btuhRyHiQFqhp27Hyh5EW\n", "train-filter/bbbc021/cytochalasinB-filter.txt\n", "['train-filter', 'bbbc021', 'cytochalasinB-filter.txt']\n", "['cytochalasinB', '.txt']\n", "7fTJ1Qjq5RJmk2kPZZ3t8R\n", "train-filter/bbbc021/docetaxel-filter.txt\n", "['train-filter', 'bbbc021', 'docetaxel-filter.txt']\n", "['docetaxel', '.txt']\n", "8CDun7CBUmC4gAz6MjvPCp\n", "train-filter/bbbc021/AZ-A-filter.txt\n", "['train-filter', 'bbbc021', 'AZ-A-filter.txt']\n", "['AZ-A', '.txt']\n", "9JyEVkSyapQuPfttU3Zuy6\n", "train-filter/bbbc021/epothiloneB-filter.txt\n", "['train-filter', 'bbbc021', 'epothiloneB-filter.txt']\n", "['epothiloneB', '.txt']\n", 
"a5TDDad6FBHpcz7uWw8nhx\n", "train-filter/bbbc021/cyclohexamide-filter.txt\n", "['train-filter', 'bbbc021', 'cyclohexamide-filter.txt']\n", "['cyclohexamide', '.txt']\n", "c9aVZXAoCg74i8QsAMPEQP\n", "train-filter/bbbc021/mitomycinC-filter.txt\n", "['train-filter', 'bbbc021', 'mitomycinC-filter.txt']\n", "['mitomycinC', '.txt']\n", "dcAeCFupHshz3dbeZiCCQk\n", "train-filter/bbbc021/colchicine-filter.txt\n", "['train-filter', 'bbbc021', 'colchicine-filter.txt']\n", "['colchicine', '.txt']\n", "g8DMAt72M4VUkoJTgF83n8\n", "train-filter/bbbc021/chlorambucil-filter.txt\n", "['train-filter', 'bbbc021', 'chlorambucil-filter.txt']\n", "['chlorambucil', '.txt']\n", "gR5FE1YpCTKRZUU9zRyRNK\n", "train-filter/bbbc021/mitoxantrone-filter.txt\n", "['train-filter', 'bbbc021', 'mitoxantrone-filter.txt']\n", "['mitoxantrone', '.txt']\n", "gy4pEscXEMRA8ySocm2qY9\n", "train-filter/bbbc021/floxuridine-filter.txt\n", "['train-filter', 'bbbc021', 'floxuridine-filter.txt']\n", "['floxuridine', '.txt']\n", "htHgfiEJXvwq4p5SKzfqcX\n", "train-filter/bbbc021/camptothecin-filter.txt\n", "['train-filter', 'bbbc021', 'camptothecin-filter.txt']\n", "['camptothecin', '.txt']\n", "jknhGNHribQdQfmcXzSCTH\n", "train-filter/bbbc021/PD-169316-filter.txt\n", "['train-filter', 'bbbc021', 'PD-169316-filter.txt']\n", "['PD-169316', '.txt']\n", "kGLL5LP2RF2rvN1BgqVGYC\n", "train-filter/bbbc021/etoposide-filter.txt\n", "['train-filter', 'bbbc021', 'etoposide-filter.txt']\n", "['etoposide', '.txt']\n", "n57tLbFxJoJuAkuMmiTeJt\n", "train-filter/bbbc021/demecolcine-filter.txt\n", "['train-filter', 'bbbc021', 'demecolcine-filter.txt']\n", "['demecolcine', '.txt']\n", "odEByheKEgLzd7pxxwJuhK\n", "train-filter/bbbc021/AZ-C-filter.txt\n", "['train-filter', 'bbbc021', 'AZ-C-filter.txt']\n", "['AZ-C', '.txt']\n", "p5tzajaTAAJU4XkCFanwKX\n", "train-filter/bbbc021/MG-132-filter.txt\n", "['train-filter', 'bbbc021', 'MG-132-filter.txt']\n", "['MG-132', '.txt']\n", "pbuH8k6n7X1f85wbZig2b1\n", 
"train-filter/bbbc021/lactacystin-filter.txt\n", "['train-filter', 'bbbc021', 'lactacystin-filter.txt']\n", "['lactacystin', '.txt']\n", "pmKd7eDdjgfSgPHqm66cFM\n", "train-filter/bbbc021/mevinolin-lovastatin-filter.txt\n", "['train-filter', 'bbbc021', 'mevinolin-lovastatin-filter.txt']\n", "['mevinolin-lovastatin', '.txt']\n", "qZU9GJ77LRqA2TdEnr6nTz\n", "train-filter/bbbc021/anisomycin-filter.txt\n", "['train-filter', 'bbbc021', 'anisomycin-filter.txt']\n", "['anisomycin', '.txt']\n", "rE3D2myZgJ8hKnSCb8aRMT\n", "train-filter/bbbc021/bryostatin-filter.txt\n", "['train-filter', 'bbbc021', 'bryostatin-filter.txt']\n", "['bryostatin', '.txt']\n", "rViqQ833enfkBrZESAAADn\n", "train-filter/bbbc021/nocodazole-filter.txt\n", "['train-filter', 'bbbc021', 'nocodazole-filter.txt']\n", "['nocodazole', '.txt']\n", "rwd7367RPqAr1LDWsnSE6b\n", "train-filter/bbbc021/AZ-U-filter.txt\n", "['train-filter', 'bbbc021', 'AZ-U-filter.txt']\n", "['AZ-U', '.txt']\n", "skLDWd41qhB5Wt4vYabdjB\n", "train-filter/bbbc021/latrunculinB-filter.txt\n", "['train-filter', 'bbbc021', 'latrunculinB-filter.txt']\n", "['latrunculinB', '.txt']\n", "t3cn8mXr3VouBDNh8zYHJb\n", "train-filter/bbbc021/taxol-filter.txt\n", "['train-filter', 'bbbc021', 'taxol-filter.txt']\n", "['taxol', '.txt']\n", "t6LfEkNbinGSRvFThhiVwc\n", "train-filter/bbbc021/emetine-filter.txt\n", "['train-filter', 'bbbc021', 'emetine-filter.txt']\n", "['emetine', '.txt']\n", "t7gCc4W89ZzqE1Fik6JobC\n", "train-filter/bbbc021/AZ258-filter.txt\n", "['train-filter', 'bbbc021', 'AZ258-filter.txt']\n", "['AZ258', '.txt']\n", "tBZsAwLBr5tvowghSa2YYk\n", "train-filter/bbbc021/cisplatin-filter.txt\n", "['train-filter', 'bbbc021', 'cisplatin-filter.txt']\n", "['cisplatin', '.txt']\n", "taJrCuSEAJm5CSKLKNjjsk\n", "train-filter/bbbc021/vincristine-filter.txt\n", "['train-filter', 'bbbc021', 'vincristine-filter.txt']\n", "['vincristine', '.txt']\n", "ttWJoVUuu4GNmSJu6g91Vh\n", "train-filter/bbbc021/proteasomeinhibitorI-filter.txt\n", 
"['train-filter', 'bbbc021', 'proteasomeinhibitorI-filter.txt']\n", "['proteasomeinhibitorI', '.txt']\n", "vUiyUAgFjpXgPMiCQ5pHia\n", "train-filter/bbbc021/AZ841-filter.txt\n", "['train-filter', 'bbbc021', 'AZ841-filter.txt']\n", "['AZ841', '.txt']\n", "wyudxQaKEvpFzeF8VF5wCF\n", "train-filter/bbbc021/AZ138-filter.txt\n", "['train-filter', 'bbbc021', 'AZ138-filter.txt']\n", "['AZ138', '.txt']\n", "xpvG7GXwz7iddSf6TteQr9\n", "train-filter/bbbc021/methotrexate-filter.txt\n", "['train-filter', 'bbbc021', 'methotrexate-filter.txt']\n", "['methotrexate', '.txt']\n", "xv7SWrWkRDHFUtwcLn8BAr\n" ] } ], "source": [ "for trainInfo in trainList:\n", "    if 'filterKey' in trainInfo and len(trainInfo['filterKey'])>0:\n", "        filterKey = trainInfo['filterKey']\n", "        print(filterKey)\n", "        a1=filterKey.split('/')\n", "        print(a1)\n", "        a2=a1[2].split(\"-filter\")\n", "        print(a2)\n", "        trainId = trainInfo['trainId']\n", "        print(trainId)\n", "        train_compoundLabel_map[trainId]=a2[0]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'2KrMFC136oXVYJ7YCpNfr6': 'ALLN',\n", " '42s7EYRjYWfW3Ly9gxPUzk': 'AZ-J',\n", " '4CiT9BNcMV7ZftmY7YWArf': 'PP-2',\n", " '6he8YfdaT4eLrspDsnJyFe': 'cytochalasinD',\n", " '7HRSPABX4n2rAoMs5LVwD8': 'alsterpaullone',\n", " '7btuhRyHiQFqhp27Hyh5EW': 'simvastatin',\n", " '7fTJ1Qjq5RJmk2kPZZ3t8R': 'cytochalasinB',\n", " '8CDun7CBUmC4gAz6MjvPCp': 'docetaxel',\n", " '9JyEVkSyapQuPfttU3Zuy6': 'AZ-A',\n", " 'a5TDDad6FBHpcz7uWw8nhx': 'epothiloneB',\n", " 'c9aVZXAoCg74i8QsAMPEQP': 'cyclohexamide',\n", " 'dcAeCFupHshz3dbeZiCCQk': 'mitomycinC',\n", " 'g8DMAt72M4VUkoJTgF83n8': 'colchicine',\n", " 'gR5FE1YpCTKRZUU9zRyRNK': 'chlorambucil',\n", " 'gy4pEscXEMRA8ySocm2qY9': 'mitoxantrone',\n", " 'htHgfiEJXvwq4p5SKzfqcX': 'floxuridine',\n", " 'jknhGNHribQdQfmcXzSCTH': 'camptothecin',\n", " 'kGLL5LP2RF2rvN1BgqVGYC': 'PD-169316',\n", " 'n57tLbFxJoJuAkuMmiTeJt': 'etoposide',\n", " 'odEByheKEgLzd7pxxwJuhK': 'demecolcine',\n", 
" 'p5tzajaTAAJU4XkCFanwKX': 'AZ-C',\n", " 'pbuH8k6n7X1f85wbZig2b1': 'MG-132',\n", " 'pmKd7eDdjgfSgPHqm66cFM': 'lactacystin',\n", " 'qZU9GJ77LRqA2TdEnr6nTz': 'mevinolin-lovastatin',\n", " 'rE3D2myZgJ8hKnSCb8aRMT': 'anisomycin',\n", " 'rViqQ833enfkBrZESAAADn': 'bryostatin',\n", " 'rwd7367RPqAr1LDWsnSE6b': 'nocodazole',\n", " 'skLDWd41qhB5Wt4vYabdjB': 'AZ-U',\n", " 't3cn8mXr3VouBDNh8zYHJb': 'latrunculinB',\n", " 't6LfEkNbinGSRvFThhiVwc': 'taxol',\n", " 't7gCc4W89ZzqE1Fik6JobC': 'emetine',\n", " 'tBZsAwLBr5tvowghSa2YYk': 'AZ258',\n", " 'taJrCuSEAJm5CSKLKNjjsk': 'cisplatin',\n", " 'ttWJoVUuu4GNmSJu6g91Vh': 'vincristine',\n", " 'vUiyUAgFjpXgPMiCQ5pHia': 'proteasomeinhibitorI',\n", " 'wyudxQaKEvpFzeF8VF5wCF': 'AZ841',\n", " 'xpvG7GXwz7iddSf6TteQr9': 'AZ138',\n", " 'xv7SWrWkRDHFUtwcLn8BAr': 'methotrexate'}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_compoundLabel_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the counts match. The DMSO control has no \"left out\" model, hence the -1:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(train_compoundLabel_map)==len(compound_moa_map)-1" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "tagClient = bi.client(\"tag\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "tagList = tagClient.getAllTags()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "compoundLabel_tag_map = {}\n", "for tag in tagList:\n", "    id = tag['id']\n", "    value = tag['tagValue']\n", "    type = tag['tagType']\n", "    if (value.startswith('compound:')):\n", "        a1 = value.split(\":\")\n", "        compoundLabel_tag_map[a1[1]]=id" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": 
{ "text/plain": [ "{'AZ-U': 18,\n", " 'taxol': 51,\n", " 'alsterpaullone': 26,\n", " 'cyclohexamide': 33,\n", " 'PP-2': 25,\n", " 'camptothecin': 29,\n", " 'floxuridine': 41,\n", " 'PD-169316': 24,\n", " 'demecolcine': 36,\n", " 'anisomycin': 27,\n", " 'mitoxantrone': 47,\n", " 'cytochalasinB': 34,\n", " 'simvastatin': 50,\n", " 'AZ138': 19,\n", " 'AZ258': 20,\n", " 'bryostatin': 28,\n", " 'latrunculinB': 43,\n", " 'proteasomeinhibitorI': 49,\n", " 'methotrexate': 44,\n", " 'AZ-C': 16,\n", " 'nocodazole': 48,\n", " 'vincristine': 52,\n", " 'docetaxel': 37,\n", " 'colchicine': 32,\n", " 'AZ841': 21,\n", " 'MG-132': 23,\n", " 'etoposide': 40,\n", " 'lactacystin': 42,\n", " 'AZ-A': 15,\n", " 'DMSO': 22,\n", " 'cytochalasinD': 35,\n", " 'chlorambucil': 30,\n", " 'epothiloneB': 39,\n", " 'ALLN': 14,\n", " 'emetine': 38,\n", " 'mevinolin-lovastatin': 45,\n", " 'mitomycinC': 46,\n", " 'cisplatin': 31,\n", " 'AZ-J': 17}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compoundLabel_tag_map" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "searchClient = bi.client(\"search\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the search service to construct a histogram of the distribution of matches to MOAs, pooling the results over all images of the \"left out\" compound. The number of top hits counted per query image is set to 10; a range of values (1 to 30) can also be surveyed, and in practice the outcome is remarkably insensitive to this parameter." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "def getMoaHistogram(trainId, leftOutCompoundLabel=''):\n", "    testSequence = []\n", "# Uncomment to observe the invariance of this parameter\n", "#    for j in range(1,31):\n", "#        testSequence.append(j)\n", "    testSequence.append(10)\n", "    print(\"***\")\n", "    print(trainId)\n", "    if leftOutCompoundLabel == '':\n", "        leftOutCompoundLabel=train_compoundLabel_map[trainId]\n", "    print(leftOutCompoundLabel)\n", "    leftOutMoa = label_moa_map[leftOutCompoundLabel]\n", "    print(leftOutMoa)\n", "    print(\"===\")\n", "    imageInfoMap={}\n", "    dmsoTag = compoundLabel_tag_map['DMSO']\n", "    searchPlateMap = {}\n", "    searchCount=0\n", "    imageListPlateMap={}\n", "    for plate in plateList:\n", "        plateId = plate['plateId']\n", "        #print(\"plate {}\".format(plateId))\n", "        images = imageClient.getImagesByPlateId(plateId)\n", "        imageListPlateMap[plateId] = images\n", "    print(\"Start search\")\n", "    for plate in plateList:\n", "        plateId = plate['plateId']\n", "        images = imageListPlateMap[plateId]\n", "        searchResponses = []\n", "        for image in images:\n", "            imageSourceId = image['Item']['imageSourceId']\n", "            imageId = image['Item']['imageId']\n", "            compound = sourceCompoundMap[imageSourceId]\n", "            compoundLabel = getCompoundLabel(compound)\n", "            concentration = sourceConcentrationMap[imageSourceId]\n", "            if compoundLabel==leftOutCompoundLabel:\n", "                #print(\"{} {} {} {}\".format(imageId, compound, compoundLabel, concentration))\n", "                exclusionTags = []\n", "                tag = compoundLabel_tag_map[compoundLabel]\n", "                exclusionTags.append(tag)\n", "                exclusionTags.append(dmsoTag)\n", "                search = {\n", "                    \"trainId\" : trainId,\n", "                    \"queryImageId\" : imageId,\n", "                    \"exclusionTags\" : exclusionTags,\n", "                    \"requireMoa\" : \"true\",\n", "                    \"metric\" : \"Cosine\"\n", "                }\n", "                #print(search)\n", "                searchResponse = searchClient.submitSearch(search)\n", "                searchCount += 1\n", "                searchResponses.append(searchResponse)\n", 
"        searchPlateMap[plateId] = searchResponses\n", "    searchResultsMap={}\n", "    resultCount=0\n", "    for plate in plateList:\n", "        plateId = plate['plateId']\n", "        searchResponses = searchPlateMap[plateId]\n", "        for searchResponse in searchResponses:\n", "            searchId = searchResponse['searchId']\n", "            statusValue = 'submitted'\n", "            while statusValue != 'completed' and statusValue != 'error':\n", "                sleep(1)\n", "                searchStatus = searchClient.getSearchStatus(searchId)\n", "                statusValue = searchStatus['Item']['status']\n", "            if statusValue == 'completed':\n", "                searchResults = searchClient.getSearchResults(searchId)\n", "                if plateId not in searchResultsMap:\n", "                    searchResultsMap[plateId] = []\n", "                searchResultsMap[plateId].append(searchResults)\n", "                resultCount += 1\n", "    # Note, these values will not always match because not all images have\n", "    # qualified ROIs from which an embedding can be calculated to serve\n", "    # as a query.\n", "    print(\"searchCount={} resultCount={}\".format(searchCount, resultCount))\n", "    for testCount in testSequence:\n", "        moaBinCounts = {}\n", "        hitCount=0\n", "        binCount=0\n", "        for plate in plateList:\n", "            plateId = plate['plateId']\n", "            if plateId in searchResultsMap:\n", "                searchResultsList = searchResultsMap[plateId]\n", "                for searchResults in searchResultsList:\n", "                    for i in range(testCount):\n", "                        hitCount += 1\n", "                        searchResult = searchResults[i]\n", "                        hitImageId = searchResult['imageId']\n", "                        if hitImageId not in imageInfoMap:\n", "                            imageInfo = imageClient.getImageInfo(hitImageId, 'origin')\n", "                            imageInfoMap[hitImageId]=imageInfo\n", "                        imageInfo=imageInfoMap[hitImageId]\n", "                        imageSourceId = imageInfo['Item']['imageSourceId']\n", "                        hitCompound = sourceCompoundMap[imageSourceId]\n", "                        if hitCompound in compound_moa_map:\n", "                            moa = compound_moa_map[hitCompound]\n", "                        else:\n", "                            moa = \"unknown\"\n", "                        if moa in moaBinCounts:\n", "                            c = moaBinCounts[moa]\n", "                            c += 1\n", "                            binCount += 1\n", "                            moaBinCounts[moa] = c\n", 
"                        else:\n", "                            binCount += 1\n", "                            moaBinCounts[moa] = 1\n", "        print(\"hitCount={} binCount={}\".format(hitCount, binCount))\n", "        labelCount = labelCountMap[leftOutCompoundLabel]\n", "        labelMoaCount = moaCountMap[leftOutMoa]\n", "        adjustedLabelMoaCount = labelMoaCount - labelCount\n", "        bestMoa=''\n", "        bestScore=0.0\n", "        for moa in moaBinCounts:\n", "            c = moaBinCounts[moa]\n", "            m = moaCountMap[moa]\n", "            if moa == leftOutMoa:\n", "                n = c / adjustedLabelMoaCount\n", "            else:\n", "                n = c / m\n", "            if n > bestScore:\n", "                bestMoa=moa\n", "                bestScore=n\n", "            elif n == bestScore and moa==leftOutMoa:\n", "                bestMoa=moa\n", "                bestScore=n\n", "        for moa in moaBinCounts:\n", "            c = moaBinCounts[moa]\n", "            m = moaCountMap[moa]\n", "            if moa == leftOutMoa:\n", "                n = c / adjustedLabelMoaCount\n", "            else:\n", "                n = c / m\n", "            if moa==bestMoa:\n", "                print(\"{}> {} {} {}\".format(testCount, moa, c, n))\n", "            else:\n", "                print(\"{} {} {} {}\".format(testCount, moa, c, n))\n", "        # Comment out below if observing multiple parameter values\n", "        if bestMoa==leftOutMoa:\n", "            return 1\n", "        else:\n", "            return 0" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "trainIdList = []\n", "for trainInfo in trainList:\n", "    trainId = trainInfo['trainId']\n", "    if trainId!='origin' and trainId!=BASELINE_TRAIN_ID:\n", "        trainIdList.append(trainInfo['trainId'])\n", "trainIdList.sort()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['2KrMFC136oXVYJ7YCpNfr6',\n", " '42s7EYRjYWfW3Ly9gxPUzk',\n", " '4CiT9BNcMV7ZftmY7YWArf',\n", " '6he8YfdaT4eLrspDsnJyFe',\n", " '7HRSPABX4n2rAoMs5LVwD8',\n", " '7btuhRyHiQFqhp27Hyh5EW',\n", " '7fTJ1Qjq5RJmk2kPZZ3t8R',\n", " '8CDun7CBUmC4gAz6MjvPCp',\n", " '9JyEVkSyapQuPfttU3Zuy6',\n", " 'a5TDDad6FBHpcz7uWw8nhx',\n", " 'c9aVZXAoCg74i8QsAMPEQP',\n", " 'dcAeCFupHshz3dbeZiCCQk',\n", " 'g8DMAt72M4VUkoJTgF83n8',\n", " 'gR5FE1YpCTKRZUU9zRyRNK',\n", "
'gy4pEscXEMRA8ySocm2qY9',\n", " 'htHgfiEJXvwq4p5SKzfqcX',\n", " 'jknhGNHribQdQfmcXzSCTH',\n", " 'kGLL5LP2RF2rvN1BgqVGYC',\n", " 'n57tLbFxJoJuAkuMmiTeJt',\n", " 'odEByheKEgLzd7pxxwJuhK',\n", " 'p5tzajaTAAJU4XkCFanwKX',\n", " 'pbuH8k6n7X1f85wbZig2b1',\n", " 'pmKd7eDdjgfSgPHqm66cFM',\n", " 'qZU9GJ77LRqA2TdEnr6nTz',\n", " 'rE3D2myZgJ8hKnSCb8aRMT',\n", " 'rViqQ833enfkBrZESAAADn',\n", " 'rwd7367RPqAr1LDWsnSE6b',\n", " 'skLDWd41qhB5Wt4vYabdjB',\n", " 't3cn8mXr3VouBDNh8zYHJb',\n", " 't6LfEkNbinGSRvFThhiVwc',\n", " 't7gCc4W89ZzqE1Fik6JobC',\n", " 'tBZsAwLBr5tvowghSa2YYk',\n", " 'taJrCuSEAJm5CSKLKNjjsk',\n", " 'ttWJoVUuu4GNmSJu6g91Vh',\n", " 'vUiyUAgFjpXgPMiCQ5pHia',\n", " 'wyudxQaKEvpFzeF8VF5wCF',\n", " 'xpvG7GXwz7iddSf6TteQr9',\n", " 'xv7SWrWkRDHFUtwcLn8BAr']" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainIdList" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "***\n", "2KrMFC136oXVYJ7YCpNfr6\n", "ALLN\n", "Protein degradation\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Protein degradation 826 2.8680555555555554\n", "10 DNA damage 24 0.0625\n", "10 Protein synthesis 7 0.024305555555555556\n", "10 Microtubule stabilizers 38 0.0236318407960199\n", "10 DNA replication 60 0.15625\n", "10 Actin disruptors 1 0.003472222222222222\n", "10 Epithelial 1 0.00390625\n", "10 Cholesterol-lowering 3 0.015625\n", "2\n", "***\n", "42s7EYRjYWfW3Ly9gxPUzk\n", "AZ-J\n", "Epithelial\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Epithelial 327 2.04375\n", "10 Protein synthesis 86 0.2986111111111111\n", "10 Microtubule stabilizers 37 0.023009950248756218\n", "10 DNA damage 450 1.171875\n", "10 Aurora kinase inhibitors 7 0.024305555555555556\n", "10 Protein degradation 34 0.08854166666666667\n", "10 Kinase inhibitors 4 
0.020833333333333332\n", "10 Actin disruptors 15 0.052083333333333336\n", "3\n", "***\n", "4CiT9BNcMV7ZftmY7YWArf\n", "PP-2\n", "Epithelial\n", "===\n", "Start search\n", "searchCount=64 resultCount=64\n", "hitCount=640 binCount=640\n", "10 DNA replication 85 0.22135416666666666\n", "10> Cholesterol-lowering 448 2.3333333333333335\n", "10 Protein degradation 36 0.09375\n", "10 Protein synthesis 47 0.16319444444444445\n", "10 Microtubule stabilizers 20 0.012437810945273632\n", "10 Epithelial 3 0.015625\n", "10 DNA damage 1 0.0026041666666666665\n", "4\n", "***\n", "6he8YfdaT4eLrspDsnJyFe\n", "cytochalasinD\n", "Actin disruptors\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Protein degradation 284 0.7395833333333334\n", "10 Microtubule stabilizers 99 0.061567164179104475\n", "10 DNA replication 10 0.026041666666666668\n", "10> Actin disruptors 374 1.9479166666666667\n", "10 Epithelial 52 0.203125\n", "10 Microtubule destabilizers 50 0.13020833333333334\n", "10 Protein synthesis 40 0.1388888888888889\n", "10 DNA damage 42 0.109375\n", "10 Cholesterol-lowering 9 0.046875\n", "5\n", "***\n", "7HRSPABX4n2rAoMs5LVwD8\n", "alsterpaullone\n", "Kinase inhibitors\n", "===\n", "Start search\n", "searchCount=64 resultCount=64\n", "hitCount=640 binCount=640\n", "10> Kinase inhibitors 601 4.6953125\n", "10 Epithelial 11 0.04296875\n", "10 DNA damage 24 0.0625\n", "10 Protein degradation 2 0.005208333333333333\n", "10 Aurora kinase inhibitors 2 0.006944444444444444\n", "6\n", "***\n", "7btuhRyHiQFqhp27Hyh5EW\n", "simvastatin\n", "Cholesterol-lowering\n", "===\n", "Start search\n", "searchCount=96 resultCount=95\n", "hitCount=950 binCount=950\n", "10 Protein degradation 514 1.3385416666666667\n", "10> Cholesterol-lowering 184 1.9166666666666667\n", "10 Epithelial 37 0.14453125\n", "10 Microtubule stabilizers 27 0.016791044776119403\n", "10 DNA replication 178 0.4635416666666667\n", "10 Microtubule destabilizers 10 
0.026041666666666668\n", "7\n", "***\n", "7fTJ1Qjq5RJmk2kPZZ3t8R\n", "cytochalasinB\n", "Actin disruptors\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Microtubule destabilizers 447 1.1640625\n", "10 Protein degradation 45 0.1171875\n", "10 DNA damage 109 0.2838541666666667\n", "10 Microtubule stabilizers 176 0.10945273631840796\n", "10 Actin disruptors 179 0.9322916666666666\n", "10 Protein synthesis 4 0.013888888888888888\n", "8\n", "***\n", "8CDun7CBUmC4gAz6MjvPCp\n", "docetaxel\n", "Microtubule stabilizers\n", "===\n", "Start search\n", "searchCount=96 resultCount=95\n", "hitCount=950 binCount=950\n", "10> Microtubule stabilizers 788 0.5211640211640212\n", "10 Protein degradation 48 0.125\n", "10 Microtubule destabilizers 39 0.1015625\n", "10 DNA damage 28 0.07291666666666667\n", "10 Protein synthesis 4 0.013888888888888888\n", "10 Epithelial 1 0.00390625\n", "10 Kinase inhibitors 40 0.20833333333333334\n", "10 Aurora kinase inhibitors 2 0.006944444444444444\n", "9\n", "***\n", "9JyEVkSyapQuPfttU3Zuy6\n", "AZ-A\n", "Aurora kinase inhibitors\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Aurora kinase inhibitors 944 4.916666666666667\n", "10 Microtubule stabilizers 16 0.009950248756218905\n", "10\n", "***\n", "a5TDDad6FBHpcz7uWw8nhx\n", "epothiloneB\n", "Microtubule stabilizers\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Microtubule stabilizers 511 0.33796296296296297\n", "10> Microtubule destabilizers 426 1.109375\n", "10 Actin disruptors 23 0.0798611111111111\n", "11\n", "***\n", "c9aVZXAoCg74i8QsAMPEQP\n", "cyclohexamide\n", "Protein synthesis\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Protein synthesis 581 3.0260416666666665\n", "10 DNA damage 278 0.7239583333333334\n", "10 Epithelial 69 0.26953125\n", "10 Aurora kinase inhibitors 26 
0.09027777777777778\n", "10 Protein degradation 5 0.013020833333333334\n", "10 Microtubule destabilizers 1 0.0026041666666666665\n", "12\n", "***\n", "dcAeCFupHshz3dbeZiCCQk\n", "mitomycinC\n", "DNA damage\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Epithelial 47 0.18359375\n", "10> DNA damage 661 2.295138888888889\n", "10 Microtubule stabilizers 11 0.006840796019900498\n", "10 Aurora kinase inhibitors 79 0.2743055555555556\n", "10 Protein synthesis 142 0.4930555555555556\n", "10 Microtubule destabilizers 3 0.0078125\n", "10 Protein degradation 16 0.041666666666666664\n", "10 Kinase inhibitors 1 0.005208333333333333\n", "13\n", "***\n", "g8DMAt72M4VUkoJTgF83n8\n", "colchicine\n", "Microtubule destabilizers\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Microtubule destabilizers 120 0.4166666666666667\n", "10 Microtubule stabilizers 335 0.20833333333333334\n", "10> Actin disruptors 460 1.5972222222222223\n", "10 DNA damage 25 0.06510416666666667\n", "10 Protein degradation 17 0.044270833333333336\n", "10 Kinase inhibitors 3 0.015625\n", "14\n", "***\n", "gR5FE1YpCTKRZUU9zRyRNK\n", "chlorambucil\n", "DNA damage\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> DNA damage 772 2.6805555555555554\n", "10 Protein synthesis 41 0.1423611111111111\n", "10 Epithelial 75 0.29296875\n", "10 Protein degradation 27 0.0703125\n", "10 Microtubule destabilizers 9 0.0234375\n", "10 Microtubule stabilizers 5 0.003109452736318408\n", "10 Kinase inhibitors 20 0.10416666666666667\n", "10 Aurora kinase inhibitors 11 0.03819444444444445\n", "15\n", "***\n", "gy4pEscXEMRA8ySocm2qY9\n", "mitoxantrone\n", "DNA replication\n", "===\n", "Start search\n", "searchCount=96 resultCount=89\n", "hitCount=890 binCount=890\n", "10> DNA replication 517 1.7951388888888888\n", "10 Microtubule stabilizers 96 0.05970149253731343\n", "10 
Cholesterol-lowering 99 0.515625\n", "10 Protein degradation 98 0.2552083333333333\n", "10 DNA damage 49 0.12760416666666666\n", "10 Protein synthesis 17 0.059027777777777776\n", "10 Kinase inhibitors 10 0.052083333333333336\n", "10 Epithelial 4 0.015625\n", "16\n", "***\n", "htHgfiEJXvwq4p5SKzfqcX\n", "floxuridine\n", "DNA replication\n", "===\n", "Start search\n", "searchCount=96 resultCount=94\n", "hitCount=940 binCount=940\n", "10> DNA replication 858 2.9791666666666665\n", "10 Protein degradation 64 0.16666666666666666\n", "10 Cholesterol-lowering 18 0.09375\n", "17\n", "***\n", "jknhGNHribQdQfmcXzSCTH\n", "camptothecin\n", "DNA replication\n", "===\n", "Start search\n", "searchCount=96 resultCount=94\n", "hitCount=940 binCount=940\n", "10> Protein degradation 371 0.9661458333333334\n", "10 DNA replication 214 0.7430555555555556\n", "10 Protein synthesis 42 0.14583333333333334\n", "10 Microtubule stabilizers 186 0.11567164179104478\n", "10 DNA damage 104 0.2708333333333333\n", "10 Actin disruptors 3 0.010416666666666666\n", "10 Epithelial 10 0.0390625\n", "10 Eg5 inhibitors 10 0.052083333333333336\n", "18\n", "***\n", "kGLL5LP2RF2rvN1BgqVGYC\n", "PD-169316\n", "Kinase inhibitors\n", "===\n", "Start search\n", "searchCount=64 resultCount=64\n", "hitCount=640 binCount=640\n", "10> Kinase inhibitors 510 3.984375\n", "10 DNA damage 17 0.044270833333333336\n", "10 Epithelial 94 0.3671875\n", "10 Aurora kinase inhibitors 9 0.03125\n", "10 Protein synthesis 10 0.034722222222222224\n", "19\n", "***\n", "n57tLbFxJoJuAkuMmiTeJt\n", "etoposide\n", "DNA damage\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> DNA damage 643 2.232638888888889\n", "10 Microtubule destabilizers 50 0.13020833333333334\n", "10 Microtubule stabilizers 32 0.01990049751243781\n", "10 Protein degradation 153 0.3984375\n", "10 Aurora kinase inhibitors 23 0.0798611111111111\n", "10 Epithelial 18 0.0703125\n", "10 Protein synthesis 7 
0.024305555555555556\n", "10 DNA replication 34 0.08854166666666667\n", "20\n", "***\n", "odEByheKEgLzd7pxxwJuhK\n", "demecolcine\n", "Microtubule destabilizers\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Microtubule stabilizers 202 0.1256218905472637\n", "10> Microtubule destabilizers 758 2.6319444444444446\n", "21\n", "***\n", "p5tzajaTAAJU4XkCFanwKX\n", "AZ-C\n", "Eg5 inhibitors\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Eg5 inhibitors 295 3.0729166666666665\n", "10 Microtubule stabilizers 492 0.30597014925373134\n", "10 DNA damage 104 0.2708333333333333\n", "10 Protein degradation 66 0.171875\n", "10 DNA replication 1 0.0026041666666666665\n", "10 Protein synthesis 2 0.006944444444444444\n", "22\n", "***\n", "pbuH8k6n7X1f85wbZig2b1\n", "MG-132\n", "Protein degradation\n", "===\n", "Start search\n", "searchCount=96 resultCount=83\n", "hitCount=830 binCount=830\n", "10 Kinase inhibitors 2 0.010416666666666666\n", "10 Microtubule stabilizers 64 0.03980099502487562\n", "10 Aurora kinase inhibitors 4 0.013888888888888888\n", "10> Protein degradation 544 1.8888888888888888\n", "10 DNA replication 103 0.2682291666666667\n", "10 DNA damage 63 0.1640625\n", "10 Actin disruptors 7 0.024305555555555556\n", "10 Protein synthesis 23 0.0798611111111111\n", "10 Epithelial 9 0.03515625\n", "10 Cholesterol-lowering 11 0.057291666666666664\n", "23\n", "***\n", "pmKd7eDdjgfSgPHqm66cFM\n", "lactacystin\n", "Protein degradation\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Cholesterol-lowering 45 0.234375\n", "10> Protein degradation 316 1.0972222222222223\n", "10 Epithelial 105 0.41015625\n", "10 DNA damage 330 0.859375\n", "10 Microtubule stabilizers 21 0.013059701492537313\n", "10 Protein synthesis 23 0.0798611111111111\n", "10 Actin disruptors 12 0.041666666666666664\n", "10 DNA replication 60 0.15625\n", "10 
Microtubule destabilizers 48 0.125\n", "24\n", "***\n", "qZU9GJ77LRqA2TdEnr6nTz\n", "mevinolin-lovastatin\n", "Cholesterol-lowering\n", "===\n", "Start search\n", "searchCount=96 resultCount=89\n", "hitCount=890 binCount=890\n", "10 Protein degradation 526 1.3697916666666667\n", "10 Epithelial 103 0.40234375\n", "10 Microtubule stabilizers 61 0.03793532338308458\n", "10 DNA replication 31 0.08072916666666667\n", "10> Cholesterol-lowering 148 1.5416666666666667\n", "10 Protein synthesis 13 0.04513888888888889\n", "10 DNA damage 7 0.018229166666666668\n", "10 Eg5 inhibitors 1 0.005208333333333333\n", "25\n", "***\n", "rE3D2myZgJ8hKnSCb8aRMT\n", "anisomycin\n", "Protein synthesis\n", "===\n", "Start search\n", "searchCount=96 resultCount=94\n", "hitCount=940 binCount=940\n", "10> Protein synthesis 377 1.9635416666666667\n", "10 Epithelial 138 0.5390625\n", "10 Kinase inhibitors 2 0.010416666666666666\n", "10 DNA damage 298 0.7760416666666666\n", "10 Aurora kinase inhibitors 8 0.027777777777777776\n", "10 Microtubule stabilizers 62 0.03855721393034826\n", "10 Actin disruptors 13 0.04513888888888889\n", "10 Protein degradation 35 0.09114583333333333\n", "10 DNA replication 7 0.018229166666666668\n", "26\n", "***\n", "rViqQ833enfkBrZESAAADn\n", "bryostatin\n", "Kinase inhibitors\n", "===\n", "Start search\n", "searchCount=64 resultCount=64\n", "hitCount=640 binCount=640\n", "10 DNA damage 74 0.19270833333333334\n", "10 Epithelial 112 0.4375\n", "10 Microtubule stabilizers 67 0.041666666666666664\n", "10 Microtubule destabilizers 4 0.010416666666666666\n", "10> Kinase inhibitors 346 2.703125\n", "10 Protein synthesis 9 0.03125\n", "10 Protein degradation 17 0.044270833333333336\n", "10 Aurora kinase inhibitors 11 0.03819444444444445\n", "27\n", "***\n", "rwd7367RPqAr1LDWsnSE6b\n", "nocodazole\n", "Microtubule destabilizers\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Microtubule stabilizers 100 
0.06218905472636816\n", "10> Microtubule destabilizers 624 2.1666666666666665\n", "10 Actin disruptors 185 0.6423611111111112\n", "10 Protein degradation 8 0.020833333333333332\n", "10 DNA damage 43 0.11197916666666667\n", "28\n", "***\n", "skLDWd41qhB5Wt4vYabdjB\n", "AZ-U\n", "Epithelial\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 DNA damage 112 0.2916666666666667\n", "10 Protein degradation 9 0.0234375\n", "10> Epithelial 416 2.6\n", "10 Protein synthesis 108 0.375\n", "10 Kinase inhibitors 302 1.5729166666666667\n", "10 Microtubule stabilizers 10 0.006218905472636816\n", "10 Aurora kinase inhibitors 3 0.010416666666666666\n", "29\n", "***\n", "t3cn8mXr3VouBDNh8zYHJb\n", "latrunculinB\n", "Actin disruptors\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Microtubule destabilizers 445 1.1588541666666667\n", "10> Actin disruptors 330 1.71875\n", "10 Microtubule stabilizers 124 0.07711442786069651\n", "10 Epithelial 22 0.0859375\n", "10 DNA damage 8 0.020833333333333332\n", "10 Protein degradation 20 0.052083333333333336\n", "10 Kinase inhibitors 10 0.052083333333333336\n", "10 Aurora kinase inhibitors 1 0.003472222222222222\n", "30\n", "***\n", "t6LfEkNbinGSRvFThhiVwc\n", "taxol\n", "Microtubule stabilizers\n", "===\n", "Start search\n", "searchCount=1416 resultCount=1398\n", "hitCount=13980 binCount=13980\n", "10 Actin disruptors 559 1.9409722222222223\n", "10 Protein synthesis 102 0.3541666666666667\n", "10 Epithelial 169 0.66015625\n", "10 Protein degradation 1338 3.484375\n", "10 DNA damage 399 1.0390625\n", "10> Microtubule stabilizers 8320 43.333333333333336\n", "10 Microtubule destabilizers 1253 3.2630208333333335\n", "10 DNA replication 757 1.9713541666666667\n", "10 Cholesterol-lowering 116 0.6041666666666666\n", "10 Eg5 inhibitors 607 3.1614583333333335\n", "10 Kinase inhibitors 115 0.5989583333333334\n", "10 Aurora kinase inhibitors 245 
0.8506944444444444\n", "31\n", "***\n", "t7gCc4W89ZzqE1Fik6JobC\n", "emetine\n", "Protein synthesis\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Protein synthesis 715 3.7239583333333335\n", "10 DNA damage 141 0.3671875\n", "10 Protein degradation 28 0.07291666666666667\n", "10 Epithelial 59 0.23046875\n", "10 Microtubule stabilizers 14 0.008706467661691543\n", "10 Microtubule destabilizers 3 0.0078125\n", "32\n", "***\n", "tBZsAwLBr5tvowghSa2YYk\n", "AZ258\n", "Aurora kinase inhibitors\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 Protein synthesis 62 0.2152777777777778\n", "10 Epithelial 34 0.1328125\n", "10 DNA damage 290 0.7552083333333334\n", "10> Aurora kinase inhibitors 492 2.5625\n", "10 Protein degradation 78 0.203125\n", "10 Microtubule stabilizers 3 0.0018656716417910447\n", "10 Kinase inhibitors 1 0.005208333333333333\n", "33\n", "***\n", "taJrCuSEAJm5CSKLKNjjsk\n", "cisplatin\n", "DNA damage\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> DNA damage 799 2.7743055555555554\n", "10 Protein synthesis 81 0.28125\n", "10 Microtubule stabilizers 9 0.005597014925373134\n", "10 DNA replication 13 0.033854166666666664\n", "10 Protein degradation 6 0.015625\n", "10 Kinase inhibitors 5 0.026041666666666668\n", "10 Epithelial 36 0.140625\n", "10 Aurora kinase inhibitors 11 0.03819444444444445\n", "34\n", "***\n", "ttWJoVUuu4GNmSJu6g91Vh\n", "vincristine\n", "Microtubule destabilizers\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10> Microtubule destabilizers 707 2.454861111111111\n", "10 Microtubule stabilizers 253 0.15733830845771143\n", "35\n", "***\n", "vUiyUAgFjpXgPMiCQ5pHia\n", "proteasomeinhibitorI\n", "Protein degradation\n", "===\n", "Start search\n", "searchCount=96 resultCount=95\n", "hitCount=950 binCount=950\n", "10> Protein degradation 
675 2.34375\n", "10 DNA damage 169 0.4401041666666667\n", "10 Protein synthesis 2 0.006944444444444444\n", "10 Epithelial 8 0.03125\n", "10 Kinase inhibitors 73 0.3802083333333333\n", "10 Microtubule stabilizers 10 0.006218905472636816\n", "10 DNA replication 1 0.0026041666666666665\n", "10 Cholesterol-lowering 4 0.020833333333333332\n", "10 Microtubule destabilizers 3 0.0078125\n", "10 Aurora kinase inhibitors 5 0.017361111111111112\n", "36\n", "***\n", "wyudxQaKEvpFzeF8VF5wCF\n", "AZ841\n", "Aurora kinase inhibitors\n", "===\n", "Start search\n", "searchCount=96 resultCount=93\n", "hitCount=930 binCount=930\n", "10> Aurora kinase inhibitors 625 3.2552083333333335\n", "10 Microtubule stabilizers 122 0.07587064676616916\n", "10 Microtubule destabilizers 9 0.0234375\n", "10 DNA damage 104 0.2708333333333333\n", "10 Kinase inhibitors 2 0.010416666666666666\n", "10 Protein degradation 57 0.1484375\n", "10 Protein synthesis 11 0.03819444444444445\n", "37\n", "***\n", "xpvG7GXwz7iddSf6TteQr9\n", "AZ138\n", "Eg5 inhibitors\n", "===\n", "Start search\n", "searchCount=96 resultCount=80\n", "hitCount=800 binCount=800\n", "10 Microtubule stabilizers 490 0.30472636815920395\n", "10> Eg5 inhibitors 242 2.5208333333333335\n", "10 Cholesterol-lowering 1 0.005208333333333333\n", "10 DNA damage 38 0.09895833333333333\n", "10 Protein degradation 9 0.0234375\n", "10 Kinase inhibitors 4 0.020833333333333332\n", "10 Microtubule destabilizers 7 0.018229166666666668\n", "10 DNA replication 7 0.018229166666666668\n", "10 Protein synthesis 2 0.006944444444444444\n", "38\n", "***\n", "xv7SWrWkRDHFUtwcLn8BAr\n", "methotrexate\n", "DNA replication\n", "===\n", "Start search\n", "searchCount=96 resultCount=96\n", "hitCount=960 binCount=960\n", "10 DNA damage 97 0.2526041666666667\n", "10> Protein degradation 863 2.2473958333333335\n", "==\n", "Percentage of compounds with correct predicted MOA=0.8421052631578947\n" ] } ], "source": [ "j=0\n", "correct=0\n", "for trainId in trainIdList:\n", " 
print(j+1)\n", " correct += getMoaHistogram(trainId)\n", " j += 1\n", "pc = correct/j\n", "print(\"==\")\n", "print(\"Fraction of compounds with correct predicted MOA={}\".format(pc))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }