---
title: "3. Data Cleansing & Reshaping Lab"
weight: 30
chapter: true
draft: false
---

# Data Cleansing & Reshaping Lab

In the previous two exercises of this lab, you learned how to integrate data. In this lab, you will learn how to clean and reshape data using an IBM Data Refinery flow.

## Learning Objectives:

> #### In this tutorial you will learn:
> 1. How to filter data based on age criteria
> 2. How to visualize data

## Prerequisites

> 1. IBM Cloud Pak for Data
> 2. IBM Data Refinery
> 3. External Data Sources (Amazon S3, Amazon Aurora PostgreSQL)
> 4. IBM Watson Knowledge Catalog

## Estimated time

It should take you approximately 15 minutes to complete this lab.

## Lab Steps:

> #### Step 1: Create Data Refinery flow

1. Go to the project home page by clicking **Navigation Menu** -> **Projects** -> **All projects**. From the project home page, click **Add asset** and select **Data Refinery**.

![Data Ingestion](/images/30_governance_lab/data_refinery_1.png)

2. Select the final ingested data from the **datastage lab**, e.g. **Datastage_Output_Table_v1**.

![Data Ingestion](/images/30_governance_lab/data_refinery_2.png)

> #### Step 2: Define filter and criteria

1. Click **New step**.

![Data Ingestion](/images/30_governance_lab/data_refinery_3.png)

2. Click **Remove duplicates** to remove duplicate _Email Addresses_.

![Data Ingestion](/images/30_governance_lab/data_refinery_4.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_4.1.png)

3. Filter the healthcare personnel to those aged 65 and older.

![Data Ingestion](/images/30_governance_lab/data_refinery_5.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_6.png)

4. Click **Profile** to see data statistics.

![Data Ingestion](/images/30_governance_lab/data_refinery_8.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_9.png)

5. Click **Visualization** to create a pie chart of Places.

![Data Ingestion](/images/30_governance_lab/data_refinery_10.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_11.png)

6. Create a job to apply the changes by using the **Save and create a job** option.

![Data Ingestion](/images/30_governance_lab/save_refinery.png)

7. Give the job a name and click **Next**.

![Data Ingestion](/images/30_governance_lab/data_refinery_12.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_13.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_14.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_15.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_16.png)

8. Within a few seconds, the reshaped asset will be added to the project.

![Data Ingestion](/images/30_governance_lab/data_refinery_17.png)

9. Click the navigation menu, then **Catalog**, then **All catalogs**, and then **New Catalog** to create a new catalog.

![Data Ingestion](/images/30_governance_lab/create_catalog.png)

10. Provide a name for the catalog, select **Enforce Data Protection Rules**, and then click **Create** to create the new catalog.

![Data Ingestion](/images/30_governance_lab/create_catalog_2.png)

![Data Ingestion](/images/30_governance_lab/create_catalog_3.png)

11. Go back to the project (Data_Fabric_Project) and check the reshaped asset generated by the Data Refinery flow. Click the three dots at the far right of the asset, as shown below, and then click **Publish to Catalog**.

![Data Ingestion](/images/30_governance_lab/data_refinery_18.png)

![Data Ingestion](/images/30_governance_lab/data_refinery_19.png)

12. Go back to Data_Fabric_Catalog and click the asset to open it.

![Data Ingestion](/images/30_governance_lab/create_catalog_4.png)

13. Click the **Asset** tab to view the data.

![Data Ingestion](/images/30_governance_lab/create_catalog_5.png)

14. Click **Profile** to view data statistics.

![Data Ingestion](/images/30_governance_lab/create_catalog_6.png)

## Summary

In this lab, you learned how to clean and reshape data using IBM Data Refinery. You also learned how to create a catalog and how to publish data from a project to the catalog.
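
To sanity-check the results outside the UI, the cleansing operations from Step 2 (removing duplicate email addresses, keeping personnel aged 65 and older, and summarizing places for the pie chart) can be sketched in plain Python. This is only an illustrative sketch, not what Data Refinery runs internally. The column names `email`, `age`, and `place`, the sample rows, and the rule of keeping the first row for each duplicated email are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample rows; the real table's column names may differ.
rows = [
    {"email": "ana@example.com", "age": 70, "place": "Austin"},
    {"email": "ana@example.com", "age": 70, "place": "Austin"},  # duplicate email
    {"email": "raj@example.com", "age": 66, "place": "Boston"},
    {"email": "lee@example.com", "age": 42, "place": "Austin"},
    {"email": "mia@example.com", "age": 65, "place": "Austin"},
]


def remove_duplicate_emails(records):
    """Keep the first record seen for each email (assumed rule)."""
    seen = set()
    unique = []
    for record in records:
        key = record["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def filter_min_age(records, min_age=65):
    """Keep records whose age is at least min_age (65+ includes 65)."""
    return [r for r in records if r["age"] >= min_age]


def place_shares(records):
    """Return each place's fraction of the records, as a pie chart would show."""
    counts = Counter(r["place"] for r in records)
    total = sum(counts.values())
    return {place: count / total for place, count in counts.items()}


deduped = remove_duplicate_emails(rows)
seniors = filter_min_age(deduped)
shares = place_shares(seniors)

print(f"{len(rows)} rows -> {len(deduped)} unique -> {len(seniors)} aged 65+")
for place, share in shares.items():
    print(f"{place}: {share:.0%}")
```

Comparing the row counts printed here with the counts on the **Profile** tab after each step is one way to confirm that a filter behaves as expected, for example that people aged exactly 65 are included.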