---
title: "Introduction"
weight: 10
chapter: true
draft: false
---

# Introduction to Lab 2

Author: [Arpit Nanavati](https://www.linkedin.com/in/arpitn/)

Welcome to the Governance Lab!

**Data fabric** is a highly scalable, distributed data architecture that combines shared data assets with streamlined data integration and governance capabilities to tackle modern data challenges. A typical data fabric solution comprises components such as Data Catalog, Data Integration, Data Governance, and Data Visualisation.

In this **Governance Lab**, you will solve the challenges faced by different personas in Data Analytics.

* **Data Scientists** spend 80% of their time discovering, curating, and cleansing data. How can they be given quality data for analytics?
* **Data Engineers** face many challenges when integrating data from multiple data sources. How can they quickly and efficiently collect and integrate data?
* **Data Stewards** deal with data privacy and protection challenges. How can they ensure that the data is governed and that no sensitive information is shared with data consumers?

These gaps can be addressed by a governed data fabric architecture built on IBM Cloud Pak for Data (CP4D).

After completing this lab, you will understand how to:

* Create a connection between external data sources and **IBM Cloud Pak for Data** using CP4D custom connectors.
* Ingest data from multiple data sources.
* Clean, filter, or reshape data.
* Query data from multiple data sources without copying or moving the data, using the **IBM Data Virtualization** service.
* Create a data integration pipeline that transforms and integrates data from heterogeneous data sources using **IBM DataStage**.
* Protect sensitive data (such as PII) using **IBM Watson Knowledge Catalog**.
* Schedule a job to run the data integration pipeline periodically.