--- title: "Stage Raw Data" chapter: true weight: 1 --- ## Stage Raw Data In this video, we will demostrate how to install a package and then use the package to stage our data that lives in the S3 bucket.
*Summary of this video:*

1. Check out a new branch. To do this, you will need to check out the `main`/`master` branch first and then click on create a new branch (`feature/stage_tpch_data`). If you see a `Pull from Remote` button, feel free to click it. This is an indication that something has changed on the remote branch and you will want to pull those changes into your local branch.
2. Create a `packages.yml` file on the same level as your `dbt_project.yml` file.
3. Add the following code to the file:

    ```yml
    packages:
      - package: dbt-labs/dbt_external_tables
        version: 0.8.0
      - package: dbt-labs/dbt_utils
        version: 0.8.1
    ```

    The video mentions that installing the `dbt_external_tables` package will also install the `dbt_utils` package because it is used as a dependency. While this was true in previous versions, the updated version being used now does not perform the same auto-install. It's important to include the `dbt_utils` package as shown above when creating your `packages.yml` file to avoid errors when creating your external tables in step 11.

4. Run `dbt deps` to install the packages. A way to remember this command: deps = dependencies.
5. Create a `staging` folder in the `models` folder.
6. Create a folder named `tpch` inside of the `staging` folder.
7. Create a `sources.yml` file inside of the `tpch` folder.
8. Paste this code into the `sources.yml` file:

    ```yml
    version: 2

    sources:
      - name: tpch
        database: dbtworkshop
        schema: tpch_s3_data
        loader: s3
        loaded_at_field: collector_tstamp
        tables:
          - name: lineitem
            external:
              location: "s3://dbt-data-lake-[Insert AWS Account ID]/dbtworkshopdata/lineitem/"
              row_format: serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
              table_properties: "('skip.header.line.count'='1')"
            columns:
              - name: l_orderkey
                data_type: varchar(255)
                description: "primary key"
              - name: l_partkey
                data_type: varchar(255)
              - name: l_suppkey
                data_type: varchar(255)
              - name: l_linenumber
                data_type: varchar(255)
              - name: l_quantity
                data_type: varchar(255)
              - name: l_extendedprice
                data_type: varchar(255)
              - name: l_discount
                data_type: varchar(255)
              - name: l_tax
                data_type: varchar(255)
              - name: l_returnflag
                data_type: varchar(255)
              - name: l_linestatus
                data_type: varchar(255)
              - name: l_shipmode
                data_type: varchar(255)
              - name: l_shipdate
                data_type: timestamp
              - name: l_commitdate
                data_type: timestamp
              - name: l_receiptdate
                data_type: timestamp
          - name: orders
            external:
              location: "s3://dbt-data-lake-[Insert AWS Account ID]/dbtworkshopdata/orders/"
              row_format: serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
              table_properties: "('skip.header.line.count'='1')"
            columns:
              - name: o_orderkey
                data_type: varchar(255)
                description: "primary key"
              - name: o_custkey
                data_type: varchar(255)
              - name: o_orderstatus
                data_type: varchar(255)
              - name: o_totalprice
                data_type: varchar(255)
              - name: o_orderdate
                data_type: varchar(255)
              - name: o_orderpriority
                data_type: varchar(255)
              - name: o_clerk
                data_type: varchar(255)
              - name: o_shippriority
                data_type: varchar(255)
              - name: o_comment
                data_type: varchar(255)
    ```

9. Now go to the S3 Management Console and find the S3 bucket whose name starts with the prefix `dbt-data-lake-`.
10. Update the `location` fields in the `sources.yml` file to point to your `lineitem` and `orders` files. They will be almost the same as `s3://dbt-data-lake-486758181003/dbtworkshopdata/lineitem/`, just with the numbers from your own S3 bucket name. Do not include the `.csv` extension.
11. Run the command `dbt run-operation stage_external_sources` to create the external tables (a sketch of the DDL this generates, and of how models will later reference these sources, appears after this list).
12. Run `dbt docs generate` to generate the documentation site.
13. Click on `view docs` to view the documentation.
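For context on step 11, `stage_external_sources` is a macro shipped with the `dbt_external_tables` package; it does not run any models, but instead compiles each `external:` entry in `sources.yml` into a `create external table` statement for Redshift Spectrum. The statement it issues for `lineitem` should look roughly like this sketch (inferred from the source definition above, not the macro's verbatim output):

```sql
-- Sketch of the external-table DDL implied by the lineitem entry in sources.yml.
-- The exact statement dbt_external_tables generates may differ in details;
-- "stored as textfile" is an assumption that matches the CSV files in S3.
create external table tpch_s3_data.lineitem (
    l_orderkey varchar(255),      -- primary key
    l_partkey varchar(255),
    l_suppkey varchar(255),
    l_linenumber varchar(255),
    l_quantity varchar(255),
    l_extendedprice varchar(255),
    l_discount varchar(255),
    l_tax varchar(255),
    l_returnflag varchar(255),
    l_linestatus varchar(255),
    l_shipmode varchar(255),
    l_shipdate timestamp,
    l_commitdate timestamp,
    l_receiptdate timestamp
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile
location 's3://dbt-data-lake-[Insert AWS Account ID]/dbtworkshopdata/lineitem/'
table properties ('skip.header.line.count'='1');
```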
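Once the external tables exist, downstream models select from them with dbt's `source()` function instead of hard-coding schema and table names. As a preview (the file name `stg_tpch_lineitem.sql` is hypothetical here; later sections build the real staging models), a model in `models/staging/tpch/` could reference the source like this:

```sql
-- Hypothetical staging model, e.g. models/staging/tpch/stg_tpch_lineitem.sql.
-- source('tpch', 'lineitem') resolves to the external table defined in sources.yml.
select * from {{ source('tpch', 'lineitem') }}
```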
## Helpful Links

- If you wish to see the external tables you created via `dbt run-operation stage_external_sources`, go into the Redshift Query editor, connect, and check out the tables in the schema `tpch_s3_data` (an example query is shown below).
- [Operations Macros](https://docs.getdbt.com/reference/commands/run-operation)
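A quick sanity check you can run in the Redshift Query editor once the operation succeeds (assuming the `tpch_s3_data` schema name from `sources.yml`):

```sql
-- List the external tables that stage_external_sources created;
-- svv_external_tables is a built-in Redshift system view.
select schemaname, tablename, location
from svv_external_tables
where schemaname = 'tpch_s3_data';

-- Spot-check a few rows of the staged data.
select * from tpch_s3_data.lineitem limit 10;
```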