## Create single output files for recipe jobs using AWS Glue DataBrew

[AWS Glue DataBrew](https://aws.amazon.com/glue/features/databrew/) offers over 350 pre-built transformations to automate data preparation tasks (such as filtering anomalies, standardizing formats, and correcting invalid values) that would otherwise require days or weeks of writing hand-coded transformations.

You can now choose single or multiple output files instead of autogenerated files for your DataBrew recipe jobs. Generating a single output file is useful when the output is small or when downstream systems, such as visualization tools, need to consume it more easily. Alternatively, you can specify your desired number of output files when configuring a recipe job. This gives you the flexibility to manage recipe job output for visualization, data analysis, and reporting, while helping prevent you from generating too many files. In some cases, you may also want to customize the output file partitions for efficient storage and transfer.

In this post, we walk you through how to connect to and transform data from an Amazon Simple Storage Service (Amazon S3) data lake and configure the output as a single file via the DataBrew console.

## Solution overview

The following diagram illustrates our solution architecture.

![Architecture](/image/BDB-2185-image001.png)

DataBrew queries sales order data from the S3 data lake and performs data transformations. The DataBrew job then writes the final output back to the data lake as a single file.

To implement the solution, you complete the following high-level steps:

+ Create a dataset.
+ Create a DataBrew project using the dataset.
+ Build a transformation recipe.
+ Create and run a DataBrew recipe job on the full data.

## Prerequisites

To complete this solution, you should have an AWS account and the appropriate permissions to create the resources required as part of the solution.

You also need a dataset in Amazon S3. For our use case, we use a mock dataset. You can download the data files from GitHub. On the Amazon S3 console, upload all three CSV files to an S3 bucket.

![S3 Upload](/image/BDB-2185-image003.png)

The schema of the data files is as follows:

+ order_id : Order ID of the transaction
+ product_id : Product ID of the product
+ customer_id : Customer ID
+ product : Name of the product
+ state_name : State where the transaction happened
+ amount : Amount of the transaction
+ currency : Currency of the transaction
+ timestamp : Timestamp of the transaction
+ transaction_date : Transaction date of the transaction

![Schema](/image/schema.PNG)

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.
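
## Optional: Configure single-file output programmatically

The walkthrough in this post uses the DataBrew console, but the same single-file setting can also be applied when creating a recipe job with the AWS SDK. The following is a minimal boto3 sketch, not part of the original walkthrough: it assumes the dataset and a published recipe from the walkthrough already exist, and the job name, dataset name, recipe name, bucket, and role ARN are all placeholders you would replace with your own values.

```python
import boto3

# Sketch: create a DataBrew recipe job that writes a single output file.
# All resource names, the bucket, and the role ARN below are placeholders.
databrew = boto3.client("databrew")

response = databrew.create_recipe_job(
    Name="sales-order-single-file-job",
    DatasetName="sales-order-dataset",
    RecipeReference={
        "Name": "sales-order-recipe",
        "RecipeVersion": "1.0",  # a published version of the recipe
    },
    RoleArn="arn:aws:iam::111122223333:role/DataBrewJobRole",
    Outputs=[
        {
            "Location": {
                "Bucket": "my-databrew-output-bucket",
                "Key": "sales-orders-output/",
            },
            "Format": "CSV",
            "Overwrite": True,
            # Write one output file instead of multiple autogenerated part files
            "MaxOutputFiles": 1,
        }
    ],
)

# Start a run of the newly created job
databrew.start_job_run(Name=response["Name"])
print(f"Started job {response['Name']}")
```

Setting the output's maximum file count to a value greater than 1 follows the same pattern if you want a fixed number of output files rather than a single file.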