{ "cells": [ { "cell_type": "markdown", "id": "a23f3eb1", "metadata": {}, "source": [ "# How to use the Cohort Modeler Data Generator\n", "\n", "This notebook is currently the mechanism on how to generate data for the Cohort Modeler Sample Dataset. This will not be how we generate the data long term as we are working with the Neptune team to get sample data set uploaded that lowers the burden on customers. \n", "\n", "To use this notebook, run through ever cell. **The only modification you currently need to make is in the S3 upload cell you will need to place your S3 bucket name as a place to upload the CSVs generated through this process and ensure you have the required IAM credentials to do so.**" ] }, { "cell_type": "markdown", "id": "4dc68d58", "metadata": {}, "source": [ "## Establish libraries\n", "\n", "Import libraries and seed any random functions (to make this re-creatable)" ] }, { "cell_type": "code", "execution_count": null, "id": "b5a9fc12", "metadata": {}, "outputs": [], "source": [ "!pip install faker\n", "!pip install uuid\n", "!pip install boto3" ] }, { "cell_type": "code", "execution_count": null, "id": "1a9b7c6a", "metadata": {}, "outputs": [], "source": [ "import csv\n", "import random\n", "from faker import Faker\n", "import numpy as np\n", "import uuid\n", "\n", "myseed = 1234\n", "\n", "Faker.seed(myseed)\n", "random.seed(myseed)\n", "np.random.seed(myseed)\n", "\n", "### Setting Player Cohort Size\n", "userSize = 1000\n", "campaignSize = 10\n" ] }, { "cell_type": "markdown", "id": "121b8f51", "metadata": {}, "source": [ "## Create fake users" ] }, { "cell_type": "code", "execution_count": null, "id": "be68ff25", "metadata": {}, "outputs": [], "source": [ "usersList = []\n", "for x in range(userSize):\n", " newUser = {}\n", " newUser['~id'] = str(uuid.uuid4()) \n", " newUser['~label'] = 'player'\n", " newUser['playerId:String'] = Faker().unique.user_name()\n", " newUser['status:String'] = 'Active' if random.random() < 0.98 else 'Inactive'\n", " newUser['joinedDate:Date'] = Faker().date_between(start_date='-5y')\n", " newUser['lastPlayed:Date'] = Faker().date_between(start_date=newUser['joinedDate:Date'])\n", " newUser['ea_reputation:Int'] = int(np.random.normal(0, 25))\n", " newUser['ea_altruism:Int'] = int(abs(np.random.normal(0, 10)))\n", " newUser['ea_duty:Int'] = int(abs(np.random.normal(0, 3)))\n", " newUser['ea_mischief:Int'] = int(np.random.normal(0, 1.5))\n", " newUser['ea_malice:Int'] = int(abs(np.random.normal(0, 1)))\n", " newUser['ea_atrisk:Int'] = int(abs(np.random.normal(0, 1)))\n", " newUser['stat_totalSkinPurchases:Int'] = int(abs(np.random.normal(15,5)))\n", " newUser['stat_totalCurrencyPurchases:Int'] = int(abs(np.random.normal(1000,200))) #Add more non toxicity realated options\n", " newUser['stat_idleMinutes:Int'] = int(abs(np.random.normal(360,120)))\n", " newUser['stat_marketingEmailClickThroughs:Int'] = int(abs(np.random.normal(15,4)))\n", " newUser['stat_lastChatTimestamp:Int'] = int(abs(np.random.normal(1621279571,1296000)))\n", " newUser['stat_lastGameSession:Int'] = int(abs(np.random.normal(1621279571,1296000)))\n", " newUser['stat_lastGameSessionLength:Int'] = int(abs(np.random.normal(180,60)))\n", " newUser['stat_longestGameSessionLength:Int'] = int(abs(np.random.normal(480,120)))\n", " newUser['stat_shortestGameSessionLength:Int'] = int(abs(np.random.normal(10,5)))\n", " newUser['stat_medianGameSessionLength:Int'] = int(abs(np.random.normal(60,10)))\n", " \n", " usersList.append(newUser)\n", " \n", "print(usersList[0])" ] }, { "cell_type": "markdown", "id": "4afd97e2", "metadata": {}, "source": [ "## Create fake campaigns" ] }, { "cell_type": "code", "execution_count": null, "id": "93cca5e5", "metadata": {}, "outputs": [], "source": [ "campaignList = []\n", "emailOpenedCampaign = []\n", "emailSentCampaign = []\n", "\n", "for x in range(campaignSize):\n", " newCampaign = {}\n", " newCampaign['~id'] = campaign = str(uuid.uuid4())\n", " newCampaign['~label'] = 'campaign'\n", " newCampaign['name:String'] = Faker().text(max_nb_chars=20)\n", " newCampaign['stat_totalEmailOpened:Int'] = emailOpened =int(abs(np.random.normal(25,1)))\n", " newCampaign['stat_messagesSent:Int'] = emailSent =int(abs(np.random.normal(100,0)))\n", " newCampaign['stat_messagesDelivered:Int'] = emailDelivered =int(abs(np.random.normal(60,2)))\n", " newCampaign['stat_dailyActive:Int'] = int(abs(np.random.normal(60,5)))\n", " newCampaign['stat_newPlayers:Int'] = int(abs(np.random.normal(5,1)))\n", " \n", " emailOpenedCampaign.append(emailOpened)\n", " emailSentCampaign.append(emailSent)\n", " campaignList.append(newCampaign)\n", " " ] }, { "cell_type": "markdown", "id": "de573f99", "metadata": {}, "source": [ "## Create player actions" ] }, { "cell_type": "code", "execution_count": null, "id": "f7933251", "metadata": {}, "outputs": [], "source": [ "actionTypes = ['action_chat',\n", "'action_sharepii',\n", "'action_partyjoin',\n", "'action_randomheal',\n", "'action_grief',\n", "'action_badname',\n", "'action_harass',\n", "'action_stalk',\n", "'action_badlanguage',\n", "'action_endorse',\n", "'action_report',\n", "'action_badimage']\n", " \n", "\n", "actionList = []\n", "\n", "for action in actionTypes:\n", " newAction = {}\n", " newAction['~id'] = action\n", " newAction['~label'] = 'action'\n", " actionList.append(newAction)\n", " \n", "print(actionList)" ] }, { "cell_type": "markdown", "id": "0489a1a8", "metadata": {}, "source": [ "## Create Player Edges to Actions (engagedIn)\n", "\n", "Likely need to change the statistics based on bad activities... right now it the following code equally randomizes between good and bad actions." ] }, { "cell_type": "code", "execution_count": null, "id": "356628fb", "metadata": {}, "outputs": [], "source": [ "engageList = []\n", "\n", "for x in range(userSize):\n", " numEngage = abs(int(np.random.normal(2, 3)))\n", " numEngage = 0 if numEngage < 0 else numEngage\n", " uniqueEngage = []\n", " for y in range(numEngage):\n", " newEngage = {}\n", " newEngage['~from'] = usersList[x]['~id']\n", " randEngage = random.choice(actionList)['~id']\n", " while(True):\n", " if randEngage not in uniqueEngage:\n", " uniqueEngage.append(randEngage)\n", " break\n", " else:\n", " randEngage = random.choice(actionList)['~id']\n", " newEngage['~to'] = randEngage\n", " newEngage['~label'] = 'EngagedIn'\n", " newEngage['iterations:Int'] = abs(int(np.random.normal(800, 2000)))\n", " engageList.append(newEngage)" ] }, { "cell_type": "markdown", "id": "ee63a6ff", "metadata": {}, "source": [ "## Create player interactions" ] }, { "cell_type": "code", "execution_count": null, "id": "0f4ac62d", "metadata": {}, "outputs": [], "source": [ "interList = []\n", "\n", "for x in range(userSize):\n", " numInter = int(np.random.normal(20, 8))\n", " numInter = 0 if numInter < 0 else numInter\n", " uniqueFriends = []\n", " uniqueFriends.append(usersList[x]['~id'])\n", " for y in range(numInter):\n", " newInter = {}\n", " newInter['~from'] = usersList[x]['~id']\n", " randUser = random.choice(usersList)['~id']\n", " while(True):\n", " if randUser not in uniqueFriends:\n", " uniqueFriends.append(randUser)\n", " break\n", " else:\n", " randUser = random.choice(usersList)['~id']\n", " newInter['~to'] = randUser\n", " newInter['~label'] = 'Interactions'\n", " newInter['action_chat:Int'] = abs(int(np.random.normal(2000, 2000)))\n", " newInter['action_sharepii:Int'] = abs(int(np.random.normal(1, 1)))\n", " newInter['action_partyjoin:Int'] = abs(int(np.random.normal(4, 5)))\n", " newInter['action_randomheal:Int'] = abs(int(np.random.normal(0.4, 1.4)))\n", " newInter['action_endorse:Int'] = abs(int(np.random.normal(3, 6)))\n", " newInter['action_report:Int'] = abs(int(np.random.normal(0.2, 0.5)))\n", " interList.append(newInter)" ] }, { "cell_type": "markdown", "id": "77f010c8", "metadata": {}, "source": [ "## Create campaigns interactions" ] }, { "cell_type": "code", "execution_count": null, "id": "1c22faea", "metadata": {}, "outputs": [], "source": [ "campaignInteractionList = []\n", "campaignBidirectionalInteractionList = []\n", "campUserInteractionList = []\n", "campBidirectionalList = []\n", "\n", "for x in range(0,campaignSize):\n", " campUserInteractionList.append(random.sample(range(0, len(usersList)), emailSentCampaign[x]))\n", "\n", "for x in range(0, len(emailOpenedCampaign)):\n", " campBidirectionalList.append(random.sample(range(0, len(campUserInteractionList[x])), emailOpenedCampaign[x]))\n", "\n", "for x in range(0,campaignSize):\n", " for items in campUserInteractionList[x]:\n", " newCamp = {}\n", " newCamp['~from'] = campaignList[x]['~id']\n", " newCamp['~to'] = usersList[items]['~id']\n", " newCamp['~label'] = 'MarketingInteractions'\n", " campaignInteractionList.append(newCamp)\n", " for items in campBidirectionalList[x]:\n", " newCamp = {}\n", " newCamp['~from'] = usersList[items]['~id']\n", " newCamp['~to'] = campaignList[x]['~id']\n", " newCamp['~label'] = 'CustomerMarketingInteractions'\n", " newCamp['campaign_login:Int'] = random.randrange(20)\n", " newCamp['campaign_emailOpened:Int'] = random.randrange(1,5)\n", " newCamp['campaign_linkClicked:Int'] = random.randrange(0,2)\n", " campaignBidirectionalInteractionList.append(newCamp)\n", "\n", "print()" ] }, { "cell_type": "markdown", "id": "555fc7ba", "metadata": {}, "source": [ "## Export lists to CSVs" ] }, { "cell_type": "code", "execution_count": null, "id": "a26fcf8f", "metadata": {}, "outputs": [], "source": [ "toExport = [usersList, actionList, interList, engageList, campaignInteractionList, campaignList, campaignBidirectionalInteractionList]\n", "filenames = ['user_vertices','action_vertices','interaction_edges','engagement_edges', 'campaign_edges', 'campaign_vertices', 'campaign_bidirectional_edges']\n", "\n", "for index, currentList in enumerate(toExport):\n", " with open('./' + filenames[index] + '.csv', 'w') as csvFile:\n", " writer = csv.DictWriter(csvFile, escapechar=' ',quoting=csv.QUOTE_NONE,fieldnames = currentList[0].keys())\n", " writer.writeheader()\n", " for p in currentList:\n", " writer.writerow(p)\n", " csvFile.close()" ] }, { "cell_type": "markdown", "id": "b5655a78", "metadata": {}, "source": [ "## Upload CSVs to S3\n", "\n", "\n", "### **To upload you will need to add the S3 bucket you want to upload to and the objects in that bucket in the IAM policy attached to the notebook.**" ] }, { "cell_type": "code", "execution_count": null, "id": "23442b54", "metadata": {}, "outputs": [], "source": [ "import logging\n", "import boto3\n", "from botocore.exceptions import ClientError\n", "\n", "def upload_file(file_name, bucket, object_name=None):\n", " # If S3 object_name was not specified, use file_name\n", " if object_name is None:\n", " object_name = file_name\n", "\n", " # Upload the file\n", " s3_client = boto3.client('s3')\n", " try:\n", " response = s3_client.upload_file(file_name, bucket, object_name)\n", " except ClientError as e:\n", " logging.error(e)\n", " return False\n", " return True\n", "\n", "bucket_name = \"cohort-modeler\" #this needs to be updated to your bucket\n", "\n", "filenames_csv = ['user_vertices.csv','action_vertices.csv','interaction_edges.csv','engagement_edges.csv', 'campaign_edges.csv', 'campaign_vertices.csv', 'campaign_bidirectional_edges.csv']\n", "\n", "\n", "for item in filenames_csv:\n", " upload_file(item, bucket_name)\n" ] }, { "cell_type": "markdown", "id": "2975efc1", "metadata": {}, "source": [ "## What to do next\n", "Move over to the Cohort Modeler Sample Notebook as there will be a cell to upload the data generated into Neptune" ] }, { "cell_type": "code", "execution_count": null, "id": "d152c837", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }