{ "cells": [ { "attachments": { "image.png": { "image/png": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "![image.png](attachment:image.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predict fraudulent providers in Medicare using Neptune ML\n", "RUN in Sagemaker Notebooks (not Studio): This notebook uses the medicare datasets provided by CMS. In this Notebook we'll walk through how Neptune ML can predict identification of fraudulent providers. To demonstrate this we'll predict the \"Fraud\" property of providers. We'll walk through each step of loading and exporting the data, configuring and training the model, and finally we'll show how to use that model to infer Fraud using Gremlin traversals." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## If Neptune has not been setup, please run the cloudformation script from the link below\n", "https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-quick-start.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking that we are ready to run Neptune ML \n", "Run the code below to check that this cluster is configured to run Neptune ML." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install sagemaker\n", "!pip install neptune_ml_utils" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import neptune_ml_utils as neptune_ml\n", "import sagemaker\n", "\n", "neptune_ml.check_ml_enabled()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the endpoint of the Neptune Cluster" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cluster_endpoint= neptune_ml.get_host() + \":8182\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Please check that service limits of the following are available. 
If not, pls raise a ticket with AWS Support\n", "\n", "ml.r5.xlarge for endpoint usage - 2\n", "ml.p3.2xlarge for training job usage - 2\n", "ml.r5.large for processing job usage - 2\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Delete existing graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "g.V().drop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ensure you saved the \"vertex_input.csv\" and \"edge_input.csv\" files in S3 \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the nodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an IAM role that provides full access to Amazon Neptune and Amazon S3 as indicated in the instructions below\n", "https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-IAM.html#bulk-load-tutorial-IAM-CreateRole" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3_nodes_uri = \"REPLACE WITH THE S3URI WHERE VERTEX_INPUT.CSV HAS BEEN UPLOADED\"\n", "load_role = \"REPLACE WITH ARN OF IAM ROLE FROM ABOVE\"\n", "region = sagemaker.Session().boto_region_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "url = 'https://{0}/loader'.format(cluster_endpoint)\n", "headers = {'Content-Type': 'application/json'}\n", "data = {\n", " 'source': s3_nodes_uri,\n", " 'format': 'csv',\n", " 'iamRoleArn': load_role,\n", " 'region': region,\n", " 'failOnError': 'true',\n", " 'parallelism': 'MEDIUM',\n", " 'updateSingleCardinalityProperties': 'FALSE',\n", " 'queueRequest': 'TRUE'\n", "}\n", "\n", "response = requests.post(url, headers=headers, json=data)\n", "print(\"Status Code\", response.status_code)\n", "print(\"JSON Response \", response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check to see if the load is complete - you should have a status 
200 for a successful execution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the edges" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3_edges_uri = \"REPLACE WITH S3 URI WHERE EDGE_INPUT.CSV HAS BEEN UPLOADED\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "url = 'https://{0}/loader'.format(cluster_endpoint)\n", "headers = {'Content-Type': 'application/json'}\n", "data = {\n", " 'source': s3_edges_uri,\n", " 'format': 'csv',\n", " 'iamRoleArn': load_role,\n", " 'region': region,\n", " 'failOnError': 'true',\n", " 'parallelism': 'MEDIUM',\n", " 'updateSingleCardinalityProperties': 'FALSE',\n", " 'queueRequest': 'TRUE'\n", "}\n", "\n", "response = requests.post(url, headers=headers, json=data)\n", "print(\"Status Code\", response.status_code)\n", "print(\"JSON Response \", response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the load request succeeded - you should see a status code of 200. A 200 only confirms that the request was accepted; the loader status endpoint (`GET /loader/{loadId}`, using the `loadId` returned in the response above) reports whether the load itself has completed." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Provide an S3 bucket to store outputs of the data export job needed to train the model in SageMaker" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3_bucket_uri=\"REPLACE WITH S3URI OF OUTPUT BUCKET\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check to make sure the data is loaded" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "g.V().groupCount().by(label).unfold().order().by(keys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If our nodes loaded correctly then the output is:\n", "\n", "* 1084 Drugs\n", "* 1836 HCPCS code\n", "* 5983 Providers \n", "\n", "To check that our edges loaded correctly we check the edge counts:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "g.E().groupCount().by(label).unfold().order().by(keys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If our edges loaded correctly then the output is:\n", "\n", "* 57807 conducts_procedure\n", "* 62193 prescribes_drug\n", "\n", "\n", "## Prepare data for Training - This data is used for training the model in Sagemaker \n", "First, let's simulate new providers being added into our product knowledge graph by randomly removing the `fraud` property from 1% of all fraudulent providers (approx. 55) nodes. 
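
The hold-out idea can be sketched outside the graph in plain Python. This is only an illustration on toy data: the function name, the ids, and the per-class sample size are hypothetical and are not part of the notebook or of Neptune ML.

```python
import random


def hold_out_labels(labels, n_per_class, seed=0):
    # labels maps a provider id to its known label, 'YES' or 'NO'.
    rng = random.Random(seed)
    held_out = {}
    for cls in ('YES', 'NO'):
        # Sort first so that the sample is reproducible for a given seed.
        candidates = sorted(k for k, v in labels.items() if v == cls)
        n = min(n_per_class, len(candidates))
        for key in rng.sample(candidates, n):
            held_out[key] = cls
    # Everything that was not held out keeps its label for training.
    visible = {k: v for k, v in labels.items() if k not in held_out}
    return visible, held_out


# Toy data with hypothetical ids: 10 fraudulent and 30 non-fraudulent providers.
toy = {f'npi-{i}': ('YES' if i % 4 == 0 else 'NO') for i in range(40)}
visible, truth = hold_out_labels(toy, n_per_class=3)
print(len(visible), 'labels kept,', len(truth), 'held out')
```

The `truth` mapping plays the role that `test_df` plays later in this notebook: it keeps the real labels of the hidden nodes so that predictions can be scored against them.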
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin \n", "g.V().has('Provider', 'Fraud', 'YES').properties('Fraud').sample(55).drop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's store these nodes to a variable \"test_fraud\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin --store-to test_fraud\n", "g.V().hasLabel('Provider').hasNot('Fraud').valueMap('NPI')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also randomly removing the `fraud` property from a few non-fraudulent providers (approx. 55) nodes. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin \n", "g.V().has('Provider', 'Fraud', 'NO').properties('Fraud').sample(55).drop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets store all the simulated \"new\" nodes to a variable \"test_all\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin --store-to test_all\n", "g.V().hasLabel('Provider').hasNot('Fraud').valueMap('NPI', 'Fraud')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now lets prepare a pandas dataframes test_df that contains information on which of the simulated \"new\" nodes are classified as fraud versus. 
This will be necessary to evaluate the performance of our model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "test_fraud = pd.DataFrame.from_dict(test_fraud,dtype='string')\n", "test_fraud['Fraud'] = 'YES'\n", "test_all = pd.DataFrame.from_dict(test_all,dtype='string')\n", "# Strip the list brackets around each NPI; regex=False matches '[' and ']' literally\n", "test_all['NPI'] = test_all['NPI'].str.replace(']','', regex=False)\n", "test_all['NPI'] = test_all['NPI'].str.replace('[','', regex=False)\n", "test_fraud['NPI'] = test_fraud['NPI'].str.replace(']','', regex=False)\n", "test_fraud['NPI'] = test_fraud['NPI'].str.replace('[','', regex=False)\n", "# Providers missing from test_fraud were held out from the non-fraud group\n", "test_df = test_all.merge(test_fraud,on='NPI',how = 'left')\n", "test_df = test_df.fillna('NO')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_df['Fraud'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's pick one of the providers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Double-check that the providers in test_df no longer have `Fraud` values by taking the first one and running the query below " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NPI = test_df['NPI'].head(1).to_string(index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NPI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Query the graph using the above NPI " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V().has('Provider', 'NPI', ${NPI}).valueMap('NPI','Fraud')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# With these properties removed we're ready to build our Neptune ML Node Classification model to predict the Fraud indicator for these providers in our provider knowledge graph. \n", "\n", "\n", "# Export the data and model configuration\n", "\n", "
Note: Before exporting data ensure that Neptune Export has been configured as described here: Neptune Export Service
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The export process is triggered by calling to the [Neptune Export service endpoint](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-data-export-service.html). \n", "The configuration options provided to the export service are broken into two main sections, selecting the target and configuring features. \n", "\n", "## Selecting the target\n", "\n", "In the first section, selecting the target, we specify what type of machine learning task will be run, and in the case of node classification, what the target node label and property we want to predict. To run a node classification task we are only required to specify the node label and target property to infer. \n", "\n", "In this example below we specify the `Fraud` property on the `Provider` node as our target for prediction by including these values in the `targets` sub-parameter of the `additionalParams` object as shown below. \n", "\n", "```\n", "\"additionalParams\": {\n", " \"neptune_ml\": {\n", " \"targets\": [\n", " {\n", " \"node\": \"Provider\",\n", " \"property\": \"Fraud\"\n", " }\n", " ],\n", " ....\n", "```\n", "\n", "## Configuring features\n", "The second section of the configuration, configuring features, is where we specify details about the types of data stored in our graph and how the machine learning model should interpret that data. \n", "When data is exported from Neptune all properties of all nodes are included. Each property is treated as a separate feature for the ML model. Neptune ML does its best to infer the correct type of feature for a property, in many cases, the accuracy of the model can be improved by specifying information about the property used for a feature. By default Neptune ML puts features into one of two categories:\n", "\n", "* If the feature represents a numerical property (float, double, int) then it is treated as a `numerical` feature type. 
In this feature type, data is represented as a continuous set of numbers, for example `total_claim_count_mean`.\n", "* All other property types are represented as `category` features. In this feature type, each unique value of data is represented as a unique value in the set of classifications used by the model. For our data, we have already one-hot encoded the only category feature \"Female\". \n", "\n", "If all of the properties fit into these two feature types then no configuration changes are needed at the time of export. However, these defaults are not always the best choice. In such cases, additional configuration options should be specified to better define how the property should be represented as a feature. For example, to handle a free-text property, we could use techniques such as Word2vec from natural language processing to create a vector of data that represents a string of text.\n", "\n", "\n", "Running the cells below sets the export configuration and runs the export process. Neptune Export can automatically create a clone of the cluster when `cloneCluster` is set to `True`, which takes about 20 minutes to complete and incurs additional costs while the cloned cluster is running. Exporting from the existing cluster takes about 5 minutes but requires that the `neptune_query_timeout` parameter in the [parameter group](https://docs.aws.amazon.com/neptune/latest/userguide/parameters.html) is set to a large enough value (>72000) to prevent timeout errors." 
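, "\n", "\n", "As a hedged illustration of the feature configuration described above, the fragment below sketches how an explicit `features` list could sit next to `targets`. This notebook does not need it. Placing `total_claim_count_mean` on the `Provider` node and choosing min-max normalization are assumptions made for illustration; check the field names and allowed values against the Neptune ML export documentation before use.\n", "\n", "```\n", "\"additionalParams\": {\n", "    \"neptune_ml\": {\n", "      \"targets\": [ .... ],\n", "      \"features\": [\n", "        {\n", "          \"node\": \"Provider\",\n", "          \"property\": \"total_claim_count_mean\",\n", "          \"type\": \"numerical\",\n", "          \"norm\": \"min-max\"\n", "        }\n", "      ]\n", "    }\n", "}\n", "```\n", "\n", "Here the property is declared as `numerical` explicitly instead of being left to type inference."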
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_params={ \n", "\"command\": \"export-pg\", \n", "\"params\": { \"endpoint\": neptune_ml.get_host(),\n", " \"profile\": \"neptune_ml\",\n", " \"useIamAuth\": neptune_ml.get_iam(),\n", " \"cloneCluster\": False\n", " }, \n", "\"outputS3Path\": f'{s3_bucket_uri}/neptune-export',\n", "\"additionalParams\": {\n", " \"neptune_ml\": {\n", " \"version\": \"v2.0\",\n", " \"split_rate\": [0.7,0.1,0.2],\n", " \"targets\": [\n", " {\n", " \"node\": \"Provider\",\n", " \"property\": \"Fraud\",\n", " \"type\": \"classification\"\n", " }\n", " ]\n", " }\n", " },\n", "\"jobSize\": \"medium\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%neptune_ml export start --export-url {neptune_ml.get_export_service_host()} --export-iam --wait --store-to export_results\n", "${export_params}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# ML data processing, model training, and endpoint creation\n", "\n", "Once the export job is completed we are now ready to train our machine learning model and create the inference endpoint. Training our Neptune ML model requires three steps. \n", " \n", "
Note: The cells below only configure a minimal set of parameters required to run a model training.
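
The three steps are chained through their job ids, which the sketch below makes explicit. It assumes, based on the flag names used in the cells that follow, that `--data-processing-id` must match the `--job-id` given to data processing and that `--model-training-job-id` must match the `--job-id` given to model training. The helper names and the `demo` base name are hypothetical.

```python
def build_job_ids(base_name):
    # Hypothetical helper: one base name yields the ids for all three steps.
    processing_id = base_name
    training_id = f'{base_name}-1'
    return {
        'processing': {'job-id': processing_id},
        'training': {'job-id': training_id,
                     'data-processing-id': processing_id},
        'endpoint': {'id': base_name,
                     'model-training-job-id': training_id},
    }


def to_flags(params):
    # Render a dict as a flag string such as '--job-id demo'.
    return ' '.join(f'--{name} {value}' for name, value in params.items())


ids = build_job_ids('demo')
for step, params in ids.items():
    print(step, '->', to_flags(params))
```

Deriving every id from one place makes it harder for a later step to reference an id that an earlier step never created.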
\n", "\n", "## Data processing \n", "The first step (data processing) processes the exported graph dataset using standard feature preprocessing techniques to prepare it for use by DGL. This step performs functions such as feature normalization for numeric data and encoding text features using word2vec. At the conclusion of this step the dataset is formatted for model training. \n", "\n", "This step is implemented using a SageMaker Processing Job and data artifacts are stored in a pre-specified S3 location once the job is complete.\n", "\n", "Additional options and configuration parameters for the data processing job can be found using the links below:\n", "\n", "* [Data Processing](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-on-graphs-processing.html)\n", "* [dataprocessing command](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-api-dataprocessing.html)\n", "\n", "Run the cells below to create the data processing configuration and to begin the processing job." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The training_job_name can be set to a unique value below, otherwise one will be auto generated\n", "training_job_name=neptune_ml.get_training_job_name('node-classification')\n", "\n", "processing_params = f\"\"\"\n", "--config-file-name training-data-configuration.json\n", "--job-id {training_job_name} \n", "--s3-input-uri {export_results['outputS3Uri']} \n", "--s3-processed-uri {str(s3_bucket_uri)}/preloading \"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%neptune_ml dataprocessing start --wait --store-to processing_results {processing_params}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model training\n", "The second step (model training) trains the ML model that will be used for predictions. The model training is done in two stages. 
The first stage uses a SageMaker Processing job to generate a model training strategy. A model training strategy is a configuration set that specifies what type of model and model hyperparameter ranges will be used for the model training. Once the first stage is complete, the SageMaker Processing job launches a SageMaker Hyperparameter tuning job. The SageMaker Hyperparameter tuning job runs a pre-specified number of model training job trials on the processed data (set with `--max-hpo-number` in the cell below), and stores the model artifacts generated by the training in the output S3 location. Once all the training jobs are complete, the Hyperparameter tuning job also notes the training job that produced the best performing model.\n", "\n", "Additional options and configuration parameters for the model training job can be found using the links below:\n", "\n", "* [Model Training](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-on-graphs-model-training.html)\n", "* [modeltraining command](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-api-modeltraining.html)\n", "\n", "
Information: The model training process takes ~20 minutes
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "training_params=f\"\"\"\n", "--job-id {training_job_name}-1\n", "--data-processing-id {training_job_name}\n", "--instance-type ml.p3.2xlarge\n", "--s3-output-uri {str(s3_bucket_uri)}/training\n", "--max-hpo-number 2\n", "--max-hpo-parallel 2 \"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%neptune_ml training start --wait --store-to training_results {training_params}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Endpoint creation\n", "\n", "The final step is to create the inference endpoint which is an Amazon SageMaker endpoint instance that is launched with the model artifacts produced by the best training job. This endpoint will be used by our graph queries to return the model predictions for the inputs in the request. The endpoint once created stays active until it is manually deleted. Each model is tied to a single endpoint.\n", "\n", "Additional options and configuration parameters for the data processing job can be found using the links below:\n", "\n", "* [Inference Endpoint](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-on-graphs-inference-endpoint.html)\n", "* [Endpoint command](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-api-endpoints.html)\n", "\n", "
Information: The endpoint creation process takes ~5-10 minutes
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_params=f\"\"\"\n", "--id {training_job_name}\n", "--model-training-job-id {training_job_name} \"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%neptune_ml endpoint create --wait --store-to endpoint_results {endpoint_params}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once this has completed we get the endpoint name for our newly created inference endpoint. The cell below will set the endpoint name which will be used in the Gremlin queries below. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint=endpoint_results['endpoint']['name']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Fraud using Gremlin queries\n", "\n", "Now that we have our inference endpoint setup let's query our provider knowledge graph to predict Fraud indicators for the providers that we updated earlier. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets predict the Fraud indicator for `1003809195`. To accomplish this we need to add two steps:\n", "\n", "- First, we add the `with()` step to specify the inference endpoint we want to use with our Gremlin query like this\n", "`g.with(\"Neptune#ml.endpoint\",\"\")`. \n", "\n", "
Note: The endpoint values are automatically passed into the queries below
\n", "\n", "- Second, when we ask for the property within our query we use the `properties()` step with an additional `with()` step (`with(\"Neptune#ml.classification\")`) which specifies that we want to retrieve the predicted value for this property.\n", "\n", "Putting these items together we get the query below, which will predict `Fraud` for the same provider we checked earlier " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin\n", "g.with(\"Neptune#ml.endpoint\", '${endpoint}').\n", " V().has('NPI', ${NPI}).properties('Fraud').with(\"Neptune#ml.classification\").value()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Comparing the accuracy of predicted and actual Fraud labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's predict the Fraud indicator for all providers without a `Fraud` property and store the results in a variable called \"results\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin --store-to results\n", "g.with(\"Neptune#ml.endpoint\",\"${endpoint}\").\n", " V().hasLabel('Provider').hasNot('Fraud').\n", " project('NPI', 'predicted').\n", " by('NPI').\n", " by( properties(\"Fraud\").with(\"Neptune#ml.classification\").value()).\n", " order(local).by(keys, desc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare a DataFrame `results` that stores the predicted as well as the original values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results = pd.DataFrame.from_dict(results,dtype='string')\n", "results = results.merge(test_df,on='NPI',how = 'inner')\n", "results['predicted'] = results['predicted'].astype('str')\n", "# One-hot encode only the label columns so that NPI is left untouched\n", "results = pd.get_dummies(results, columns=['predicted', 'Fraud'], drop_first=True)\n", "results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Store the results in arrays that we will use to prepare 
a confusion matrix and a classification report" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_preds = results.predicted_YES.values.astype('float32')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_true = results.Fraud_YES.values.astype('float32')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install the seaborn and scikit-learn libraries that are needed for preparing the confusion matrix and the classification report" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install seaborn" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install scikit-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import necessary libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.metrics import balanced_accuracy_score\n", "from sklearn.metrics import classification_report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create confusion matrix and classification report" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_confusion_matrix(y_true, y_predicted):\n", "\n", " cm = confusion_matrix(y_true, y_predicted)\n", " # Get the per-class normalized value for each cell\n", " cm_norm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n", " \n", " # We color each cell according to its normalized value, annotate with exact counts.\n", " ax = sns.heatmap(cm_norm, annot=cm, fmt=\"d\")\n", " ax.set(xticklabels=[\"non-fraud\", \"fraud\"], yticklabels=[\"non-fraud\", \"fraud\"])\n", " ax.set_ylim([0,2])\n", " plt.title('Confusion Matrix')\n", " plt.ylabel('Real Classes')\n", " plt.xlabel('Predicted Classes')\n", " 
plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Balanced accuracy = {:.3f}\".format(balanced_accuracy_score(y_true, y_preds)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_confusion_matrix(y_true, y_preds)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(classification_report(\n", " y_true, y_preds, target_names=['non-fraud', 'fraud']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the `original` versus the `predicted` results, we see that our model correctly identified most of the held-out fraudulent providers (46 out of 55) while falsely classifying only a few non-fraudulent providers as fraud (3 out of 55)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cleaning up \n", "Now that you have completed this walkthrough you have created a SageMaker endpoint which is currently running and will incur the standard charges. If you are done trying out Neptune ML and would like to avoid these recurring costs, run the cell below to delete the inference endpoint.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neptune_ml.delete_endpoint(training_job_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to the inference endpoint, the CloudFormation script that you used has set up several additional resources. If you are finished then we suggest you delete the CloudFormation stack to avoid any recurring charges. For instructions, see [Deleting a Stack on the Amazon Web Services CloudFormation Console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html). Be sure to delete the root stack (the stack you created earlier). Deleting the root stack deletes any nested stacks." 
] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 4 }