# Build an Amazon SageMaker Pipeline to Transform Raw Texts to a Knowledge Graph

This repository provides a pipeline that creates a knowledge graph from raw texts. The pipeline chains together the following major steps:

- Data processing: transform labeled text data into the Subject-Predicate-Object (SPO) format.
- Training: use an RNN-based algorithm to train a model that predicts SPOs from given texts.
- Create a Neptune database: if the training metric (F1 score) passes the threshold, create a Neptune database.
- Batch Transform: use the model trained in the `Training` step to run inference on the test data.
- Bulk load: transform the inference results into the format recognized by the `bulkload` function of Neptune, and load the transformed data into the Neptune database.

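
To make the bulk load step more concrete, here is a minimal sketch of how predicted SPO triples might be converted into the Gremlin CSV layout that the Neptune bulk loader accepts (a vertex file with `~id` and `~label` columns, and an edge file with `~id`, `~from`, `~to` and `~label` columns). The function name `spo_to_neptune_csv`, the `entity` vertex label, the `name:String` property and the `v0`/`e0` ID scheme are assumptions made for illustration; the transformation code in this repository may differ.

```python
import csv
import io


def spo_to_neptune_csv(triples):
    """Convert (subject, predicate, object) triples to Neptune Gremlin CSV text.

    Returns a (vertices_csv, edges_csv) tuple of strings.
    The "entity" label and the ID scheme are illustrative assumptions.
    """
    # Assign one stable vertex ID per distinct subject/object string.
    vertex_ids = {}
    for subject, _, obj in triples:
        for name in (subject, obj):
            vertex_ids.setdefault(name, f"v{len(vertex_ids)}")

    # Vertex file: ~id and ~label are required system columns.
    vertex_buf = io.StringIO()
    vertex_writer = csv.writer(vertex_buf, lineterminator="\n")
    vertex_writer.writerow(["~id", "~label", "name:String"])
    for name, vertex_id in vertex_ids.items():
        vertex_writer.writerow([vertex_id, "entity", name])

    # Edge file: each triple becomes one edge labeled with the predicate.
    edge_buf = io.StringIO()
    edge_writer = csv.writer(edge_buf, lineterminator="\n")
    edge_writer.writerow(["~id", "~from", "~to", "~label"])
    for i, (subject, predicate, obj) in enumerate(triples):
        edge_writer.writerow([f"e{i}", vertex_ids[subject], vertex_ids[obj], predicate])

    return vertex_buf.getvalue(), edge_buf.getvalue()


if __name__ == "__main__":
    # Hypothetical triples, standing in for Batch Transform output.
    example = [
        ("Marie Curie", "born_in", "Warsaw"),
        ("Marie Curie", "won", "Nobel Prize"),
    ]
    vertices_csv, edges_csv = spo_to_neptune_csv(example)
    print(vertices_csv)
    print(edges_csv)
```

In practice the two CSV files would be uploaded to Amazon S3 and then passed to the Neptune loader; that part is omitted here because it depends on the cluster endpoint and IAM setup.
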
## Prerequisites

- [Create an AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) or use an existing AWS account.
- [Create a SageMaker Notebook instance](https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html). When you set up the notebook instance, pay attention to the following configurations:
  1. IAM role: attach the `AmazonSageMakerFullAccess`, `IAMFullAccess`, `AmazonS3FullAccess`, `AmazonSNSFullAccess` and `NeptuneFullAccess` policies to the IAM role.
  2. Network: the notebook must run inside a VPC so that it can access the Neptune database created in the pipeline.

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.