{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext sparksql_magic" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "22/03/17 19:45:50 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0\n", "22/03/17 19:45:50 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException\n" ] }, { "data": { "text/html": [ "
databaseName
default
" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql\n", "show databases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Input" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df_rent_input = spark.read.csv('./data/input/rent', header=True)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24000" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_rent_input.count()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "df_rent_input.createOrReplaceTempView('rent_input')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "22/03/17 19:46:05 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException\n", " \r" ] }, { "data": { "text/html": [ "
offset
1
3
5
4
2
" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql\n", "select distinct offset from rent_input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Output" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df_rent_output = spark.read.parquet('./data/output/rent')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_rent_output.count()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "df_rent_output.count()\n", "df_rent_output.createOrReplaceTempView('rent_output')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
offset
1
3
5
4
2
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql\n", "select distinct offset from rent_output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reverse User_id Domain" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "df_reverse_user_id = spark.read.parquet('./data/reverse/user_id_sha512sha256')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_reverse_user_id.count()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df_reverse_user_id.createOrReplaceTempView('reverse_user_id')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reverse Cretit_card domain" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "df_reverse_credit_card = spark.read.parquet('./data/reverse/credit_card_sha512sha256')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24000" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_reverse_credit_card.count()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "df_reverse_credit_card.createOrReplaceTempView('reverse_credit_card')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Minus clause to validate table" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " \r" ] }, { "data": { "text/html": [ "
idcredit_carddate_renttime_rentmoviemovie_genresoffset
" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql \n", "select \n", " id, credit_card, date_rent, time_rent, movie, movie_genres, offset \n", "from rent_input\n", "minus \n", "select \n", " rui.user_id as id, rcc.credit_card, date_rent, time_rent, movie, movie_genres, offset \n", "from rent_output as ro\n", "INNER JOIN reverse_user_id as rui ON (ro.id = rui.user_id_mask) \n", "INNER JOIN reverse_credit_card as rcc ON (ro.credit_card = rcc.credit_card_mask)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "only showing top 20 row(s)\n" ] }, { "data": { "text/html": [ "
date_renttime_rentmoviemovie_genresidcredit_cardoffset
2020-04-1219:24:10Blinkity BlankAnimationabae1023971539d380f71298401acd4d48791a92b19e67d00fd452b686401e5418e61c9b4b33d0e198bb5d91b2c451ed341d4d3a214f1f51c2cc7f98ee6547d81
2020-04-123:54:29What Have I Done to Deserve This? (¿Qué he hecho yo para merecer esto!!)Comedy|Dramab3fd63acc23e666fc95b24b102ab1d7ae39ec26c4c45205b2d4c027c9c9cb966e2bea2000bbbe37c77e1773573b13d36a9057a19586a4902fe635719bea21ab21
2020-04-121:33:37Second in CommandAction|Thrillerc6546b74e8d033688b8800d673090d264e54b3bd921ef463c063807335b4b4f275f77c2b11155463513424b0d56f6209923711d6388f70c65308fc24090890b61
2020-04-124:32:28Buddha Collapsed Out of ShameDrama|War0747df874d1cad8841a36e09fd6933f617d11179638517e3de858e534e14d55461c425599469eac91e185721412aacd3c33d6de97824bb6086ef5dc89b7127131
2020-04-1216:06:10Mechanic, TheAction|Drama|Thriller6d0347ffb6b203d0bdcd592c40090c1afcaa34e2c470fb81d391435101952ee409aa5109a2327f531c860dca9fb7355b6546536c134093e9d7d7318719e139581
2020-04-122:03:09Prince & Me II: The Royal Wedding, TheComedy|Romancef44e68de464337affafae6e374941bfb93f31b497dc80cadfdb3fe049d3720721fcbcd98bdb399d2bbb4e54dd5c569975dda151e313e4d250978564f778e03331
2020-04-127:23:01NeverlandAdventure|Fantasy7dd3f1ce8e9902f2e25ab006a6b043b1fe6a21a8ed23fb68f51164f98cd29d70ed986ee20009536b41ec2b9d5c1b4a67355c0435f8f162b87c592d74dc90eb231
2020-04-126:39:22Nine DeadCrime|Mystery|Thrillerc6d180917fb29aa932990e4f91b7ba500022290113fc32fdeb7890518a0c21797907e94aaee1e669d0691fb249ec3a9aa593f2a0eb144bdc8427f3604d6d25cb1
2020-04-1219:40:11One Magic ChristmasDrama|Fantasyfc00e9897992742c919a403f3584958655310c45e11e8e76b30e44588be696a12b0c7156801270b5c5a4b21dedaa11ac8a6b320e2bf00ed0e60d7c68da570c651
2020-04-1219:26:41Sachs' Disease (La maladie de Sachs)Drama4d1931a512802ccd241903e6143578ed79eca7e071092abd7f097ec32a514ec0dddd2b1dc80a9ac760e762cae6a7522b53061fdb3df7413862745e8a6eaa1c571
2020-04-1220:41:33They Shoot Horses, Don't They?Dramaa7f4ab82a2cb8f4525744290953e5d94848c5d65eddf2d87d89cf25bce2d7b6a42de9da3c52fef2ef0a9f4154db6b60af601ec9d4b88dfcb8f83ea5240606d651
2020-04-1212:58:30Ghost in the Shell 2: Innocence (a.k.a. Innocence) (Inosensu)Action|Animation|Drama|Sci-Fi|Thriller1de000473a2c2b24c015267a782671d96b03ee80ebc097d4458fa44e3df719768b397c6893d8a187ff0c87e60afa47bd349ba1b38da43fb7ed34eb59365628731
2020-04-1222:08:18Non-StopAction|Comedy|Crime976c48b49f3ac5876c1324890637240cd690d26810aa075e4e532d4dfc3f7e51652bc8dc0330e6e9ea3301e446eb0be45994c2d7b927f9e301d31c004ef726451
2020-04-126:40:17One Body Too ManyComedy|Horror|Mystery44ed348fcdfa8dc1c989a3472fb09ec9ee80caca16a58f0c57f5eac69746e6fab574f4e7faf1573ad6dc1f3a337432e14a774ce726b420dc94886340cddb32cb1
2020-04-124:44:23Passed AwayComedyd49af67241e07bcd226e947eb174d90863765639e8f6a19cdb95d0e58fd5748a20673f87a6755a1e2d77eac8eab9ffaac617b749e8c53511a8ab3ad6ad0f8b261
2020-04-1216:43:55Pursuit of HappinessComedy|Romance58de4007dce28d93f72a40c9c14f8939823e1247a1f02e9419d33357af5e4d8b861441ca5191a2a2088145669ea531a90be9d50f00b75402d1829f4a0f803c451
2020-04-1211:43:12North Star (a.k.a. Tashunga)Action|Adventure|Crime|Drama|Westernb5885124f7f8f227b40f9d56960685ece9f3e341a8f9e5fd20c7c30c45b33ba81ee1de023446e2fd47ecdc4695895271a97c10c715df0e8de2ad21a87f4e28661
2020-04-1216:29:11Casual Sex?Comedyc782d5c8ff7f1f91c0166689c5d104c39461ea850fd1a3de16d61b71ba5eb4effd858d4abd3206ec89acb4ade3922c491eb429efbca47795454f9f29c3caa1161
2020-04-121:27:27Crush, TheThriller3a1670bb3d0aee9d93115126512c79611291451e714b9bf03fc11054a78f84bb6375f466b8e88eb53a47b4340af43dfa0e7f68dacd61a55d0baba5d3d4ed158e1
2020-04-1222:54:40HornsHorror|Mystery3c94fec6ab03a123da5292db9e40104c4f03b08fddf7ec2efcfe45a91a20a5505e00419da3f9fdb642319d6cc07fa3395563f97c87c67d237d3374876b73e27d1
" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql \n", "select\n", "*\n", "from rent_output" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "only showing top 20 row(s)\n" ] }, { "data": { "text/html": [ "
idcredit_carddate_renttime_rentmoviemovie_genresoffset
149031363254527408702020-04-1210:52:53Dry Cleaning (Nettoyage à sec)Drama1
235774915519607332020-04-124:42:13Alone (Issiz adam)Drama|Romance1
335368746851930002020-04-1216:53:05Family Stone, TheComedy|Drama|Romance1
43446817151183622020-04-1222:55:42Man Who Captured Eichmann, TheDrama|War1
550483754439666182020-04-125:27:32Desperately Seeking SusanComedy|Drama|Romance1
62016211336106182020-04-1215:12:07Caine (Shark!)Action|Adventure|Thriller1
740179528050334252020-04-121:45:48The ScapegoatDrama1
840179579344661072020-04-1212:35:09Injury to One, AnDocumentary1
93379410603940112020-04-125:19:47Skeleton CrewHorror1
1063998348874308312020-04-121:01:10Age of IceAdventure|Sci-Fi1
1149173789185678622020-04-1220:15:35Godzilla's Revenge (Gojira-Minira-Gabara: Oru Kaijû Daishingeki) (All Monsters Attack)Children1
1255615351371091822020-04-1219:01:21Mrs. Pollifax-Spynull1
1335611778983585292020-04-1223:18:28Muppets From SpaceChildren|Comedy1
1451001499526595092020-04-1218:48:35God Said 'Ha!'Comedy1
1535522659079876012020-04-1217:35:023000 Miles to GracelandAction|Thriller1
16303880594459012020-04-1222:40:37TransmorphersAction|Adventure|Sci-Fi1
1735573144186885742020-04-1217:49:14SuspectCrime|Drama|Thriller1
1835603891497944412020-04-124:56:02Timecrimes (Cronocrímenes, Los)Sci-Fi|Thriller1
1935294688414759602020-04-1210:33:51Horseman, TheCrime|Thriller1
2067639434730448172412020-04-1222:09:07Raging BullDrama1
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql \n", "SELECT *\n", " FROM rent_input" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "only showing top 20 row(s)\n" ] }, { "data": { "text/html": [ "
date_renttime_rentmoviemovie_genresidcredit_cardoffset
2020-04-1221:48:57HimizuChildren|Crime|Drama5a3d36f3c5fd7c57f33c564590ff5132a999c9c52934f7d2d76773231a14ee1a24ce6a634ce326653cc3369ffaba5077dde7029f4d5dfd3b5500b757909801fd1
2020-04-127:23:01NeverlandAdventure|Fantasy7dd3f1ce8e9902f2e25ab006a6b043b1fe6a21a8ed23fb68f51164f98cd29d70ed986ee20009536b41ec2b9d5c1b4a67355c0435f8f162b87c592d74dc90eb231
2020-04-124:26:2999 francsComedye75d986b87db8a9f803ecf8388e8d2f60817c5cf14dd32d566f01a161a08ea3f186c0ba3bc0e0d47892882b5fa298a24b6be89ed0c0541d3d485bb9f6b7cdf971
2020-04-1219:26:41Sachs' Disease (La maladie de Sachs)Drama4d1931a512802ccd241903e6143578ed79eca7e071092abd7f097ec32a514ec0dddd2b1dc80a9ac760e762cae6a7522b53061fdb3df7413862745e8a6eaa1c571
2020-04-1213:13:24Cabin BoyComedy99fc323d8ed63269908df021afd3f5e71a945fdf5ae76e081d7d6eebb56cb8a2e9d0136c647b810cfd4837add2fb31a5072ab1074da305eaffa1f742001bd13a1
2020-04-121:20:44LailaDrama|Romancea992199c0b5a96c67ef35d54367bb373d86d025607948d07f87fefe3ed896fd05152d7d99e156aafb8f527cbe0f5a8ecb2f888c68dfedeca5f5ac20b22e2779a1
2020-04-1214:45:48Magnificent Seven, TheAdventure|Westerncc836f5987b01318435a75ff6d0e6598b8902f2cc9e14e9d6d787edba366deb9a0081d6c93bf48fb9577513d326be5f226e0bd20e3853045cfaf3ea9dd3fbb351
2020-04-126:24:04BeautifulDrama|Thriller666a5391d3a9e296dae35207d4fba459c33bb64dcc3e56019699b3b51f1d289fca6aebaef15fd1e193767adfff913b072c17787d6a91a835f5e1cc625f843a921
2020-04-1213:53:53U TurnCrime|Drama|Mysteryabeab0626396b2a733f517eae43287e38ced15ce7006296a385e2f9a56ea4517d107052dc40b767b532d327020e03b3d664db3ecdf0bbc6bac3908316fc06e751
2020-04-121:17:46Kill the IrishmanAction|Crime29c6ebf11da7419171f8f780dfffa6f92aed43a81c553c6cebd3e2f9aa3e91ec0415c7791fcea322c8ab2cc8663364e24723151bcb749159430cc763970d628d1
2020-04-126:32:16Nazis: A Warning from History, TheDocumentary|War72f02662a32787464b4d4de34f045624e7c1f52247eb4f320f71290e0fa18b8299f7505cd0cf8175875823476f32a2a7dc46a43e56ad801c79cab447e0b9a3eb1
2020-04-122:22:36Not My TypeComedy|Romance9cf959a5170d8139a42e925277f1a81cc7b8d87755bf2e250fabf1ae7021030fcc46c3752e143b7af0aaa4b19a94764d19a17a39308dd04797ceba4f86065b971
2020-04-1223:39:21Partners: The Movie II (Aibô: Gekijô-ban II)Dramaccce3a074af34d3021814f212aa066e8dcc89dc0e88e4f8f1c112b6a788ff6ab4ac0245808f665c6b1ba18e2923314e87b017e46f5d998483801ea337672ab5b1
2020-04-1215:20:53Lost in the Desertnull1ec60d1a29e3852cc7c41e2396ecb9b94eb3aa4547cfe638734921581bc59f950321c89ea8f3cdddd2b4cadf5e1cd369822b4b9a47cea5356fc4c19bd87c52871
2020-04-1212:47:583 Ninjas: High Noon On Mega MountainAction|Children|Comedyf7ad3d340a57d32472e9cc52e6083c4829f657ccce89cd269a5e6e0a332ec00bdc398f1dd4b9461e2e79834e51ee776c4d10d8c9b43132db1abcd650846f63ee1
2020-04-124:38:44Forty Shades of BlueDrama017baa8bcd349fe30bf83f7de5bf008fdd0d703a01ff644a94d0cfc14369f5c2c8247c128245059c8f1b5453ac5475618ad9f6fd5927c926502151f86d9896131
2020-04-1220:22:42Funny PeopleComedy|Drama58de4007dce28d93f72a40c9c14f8939823e1247a1f02e9419d33357af5e4d8b38ed94f3068651e045327e0d02683419500dd66b6adf7759e827c714b86674601
2020-04-1217:20:55Wedding Gift, TheDrama|Romance543cf4238c6291216c7cd2eb7442e0811b38e1389d27734a64e913b4084c23e65a8106f74d232adbb320556341465f02fe1f4fd0950f015f962aa34534592bfd1
2020-04-1220:01:01DeliriousComedy|Dramac3c66b769c6b2aec4436b958a8e668b29aad8606f5148fd7889ef5442af2ae15bc2d673ba5763613aa7fee155dde6d67311fa05f1b62788eec8d3ae2d1f3b8871
2020-04-126:42:28Boys Life 4: Four PlayComedy|Drama9b9e2ce075d949d957b2d7f5a8b3203627987ddd85d9493085aa894c81332742e6b85d5006cdc836aadd20d8e7c2ed8dee8ed8fea8031d32a47467e61fa7d8791
" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sparksql \n", "SELECT * from rent_output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }