# Datajet Examples

Here's a set of examples to get you started with FireLens Datajet and familiar with its features.

Before getting started, we suggest running FireLens Datajet on an EC2 instance with a role that allows it to access CloudWatch, S3, Firehose, and Kinesis. Please see the following document for the proper IAM roles and policies.

[Document](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit)

The first example is: run Fluent Bit on the side and use FireLens Datajet to send some test data over.

- Onboarding: hard
- Conceptually: easy

Start up Fluent Bit with the following configuration:

```
[SERVICE]
    Grace 30
    Log_Level debug

[INPUT]
    Name forward
    Listen 0.0.0.0
    Port 24224

[OUTPUT]
    Name stdout
    Match *
```

Use the following Datajet config to send sample data to Fluent Bit through the forward input:

```
{
    "generator": {
        "name": "color-logger",
        "config": {
            "payloadSize": 70,
            "disableSignal": true
        }
    },
    "datajet": {
        "name": "forward",
        "config": {
            "logStream": "stdout"
        }
    },
    "stage": {
        "batchRate": 200,
        "batchLimit": 200
    }
}
```

Concepts

- Stage: The Datajet configuration defines a single stage that creates data with the generator and sends that data with the datajet. The speed and duration of the stage are configurable in the `stage` section.
- Generator: The generator is responsible for creating test data. It doesn't know how fast it needs to send the data or how much data it needs to send. It is simply responsible for generating new data. See a list of generators and their configuration options in the `./generators/generator-index.ts` file.
- Datajet: The datajet is responsible for sending out the data generated by the generator. It also doesn't care how fast to send the data or how much data to send. It is simply responsible for sending out data. See a list of datajets in the `./datajet/datajet-index.ts` file.

But what if we want to send multiple sources of data to multiple inputs at the same time? Glad you asked.
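Before moving on, the division of labor between stage, generator, and datajet can be sketched in TypeScript. This is a minimal dry run under stated assumptions, not Datajet's actual implementation: the `Generator`, `Datajet`, and `StageConfig` interfaces and the `dryRunStage` function are hypothetical names, and the sketch assumes `batchRate` means batches per second and `batchLimit` means the total number of batches. Instead of sleeping between batches, it records the offset at which each batch would be sent.

```typescript
// Hypothetical shapes for illustration only; the real interfaces live in the Datajet source.
interface Generator {
  nextBatch(): string[]; // creates data, knows nothing about pacing
}
interface Datajet {
  transmit(batch: string[]): void; // sends data, knows nothing about pacing
}
interface StageConfig {
  batchRate: number;  // ASSUMPTION: batches per second
  batchLimit: number; // ASSUMPTION: total number of batches before the stage ends
}

// Dry run of a stage: the stage alone decides pacing and duration.
// Returns the offset (ms from stage start) at which each batch would be sent.
function dryRunStage(generator: Generator, datajet: Datajet, stage: StageConfig): number[] {
  const intervalMs = 1000 / stage.batchRate;
  const offsetsMs: number[] = [];
  for (let i = 0; i < stage.batchLimit; i++) {
    datajet.transmit(generator.nextBatch());
    offsetsMs.push(i * intervalMs);
  }
  return offsetsMs;
}

// Toy stand-ins for the "color-logger" generator and the "forward" datajet.
let counter = 0;
const toyGenerator: Generator = { nextBatch: () => [`log line ${counter++}`] };
const received: string[] = [];
const toyDatajet: Datajet = {
  transmit: (batch) => {
    received.push(...batch);
  },
};

const offsets = dryRunStage(toyGenerator, toyDatajet, { batchRate: 200, batchLimit: 200 });
console.log(`batches: ${offsets.length}, records received: ${received.length}`);
```

Under these assumptions, the stage values from the first example (`batchRate` 200, `batchLimit` 200) describe roughly one second of traffic. Swapping the generator or the datajet changes what is sent and where, but not the pacing.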
## Example 2: Synchronizer

Start up Fluent Bit with the following configuration:

```
[SERVICE]
    Grace 30
    Log_Level debug

[INPUT]
    Name forward
    Listen 0.0.0.0
    Port 24224

[INPUT]
    Name tcp
    Tag sample_tcp
    Listen 0.0.0.0
    Port 6270
    Format json

[OUTPUT]
    Name stdout
    Match *
```

Use the following Datajet config to send sample data to Fluent Bit through the forward input and the tcp input:

```
{
    "component": "synchronizer",
    "config": {
        "repeat": 1,
        "waitBefore": 0.0,
        "waitAfter": 0.0,
        "waitBetween": 0.0,
        "isAsync": true
    },
    "children": [
        {
            "generator": {
                "name": "color-logger",
                "config": {
                    "payloadSize": 70,
                    "disableSignal": true
                }
            },
            "datajet": {
                "name": "forward",
                "config": {
                    "logStream": "stdout"
                }
            },
            "stage": {
                "batchRate": 200,
                "batchLimit": 200
            }
        },
        {
            "generator": {
                "name": "basic",
                "config": {
                    "contentLength": 200,
                    "batchSize": 2000,
                    "key": "log"
                }
            },
            "datajet": {
                "name": "tcp",
                "config": {
                    "host": "0.0.0.0",
                    "port": 6270
                }
            },
            "stage": {
                "batchRate": 200,
                "batchLimit": 200
            }
        }
    ]
}
```

Description:

Here the synchronizer coordinates multiple stages that send data to Fluent Bit in parallel. Synchronizers let you mimic a customer application where logs are sent to Fluent Bit through multiple routes in a variety of formats.

Concepts:

- Synchronizer: A synchronizer coordinates multiple stages to run in parallel or one after the other, any number of times.
- `isAsync` configuration: Describes whether the stages run in parallel. If `isAsync` is false, the next stage runs only after the previous stage is complete.
- `repeat` configuration: Runs the stage contents multiple times.
- `waitBefore`, `waitAfter`, `waitBetween` configurations: Define how long to wait before, after, and between loops.

Questions:

- How would you mock up an application where 1 KB logs are sent out via Forward at a rate of 1 log per second for 10 seconds, then no logs are emitted for 10 seconds, and then the cycle repeats?
- How would you mock up an application that sends log data over TCP port 6550 at a steady rate of 2 logs per second with 1 KB logs, but every 10 seconds sends logs over TCP port 7550 in a burst of 1000 logs per second with 3 KB logs for 1 second?

## Example 3: Nested Synchronizer

In this example we will use a File Datajet. Make a folder called `tmp-datajet` in the root directory of your machine to store the logs:

```
sudo mkdir -p /tmp-datajet/
sudo chmod -R 777 /tmp-datajet/
```

Start up Fluent Bit with the following configuration:

```
[SERVICE]
    Grace 30
    Log_Level debug

[INPUT]
    Name tcp
    Tag sample_tcp
    Listen 0.0.0.0
    Port 6270
    Format json

[INPUT]
    Name tail
    Tag tailLogs
    Path /tmp-datajet/*.log
    refresh_interval 2
    rotate_wait 5
    db /tmp-datajet/fluentbit-logs.db
    db.sync normal
    db.locking true
    buffer_chunk_size 40MB
    buffer_max_size 1GB
    skip_long_lines on
    mem_buf_limit 1GB

[OUTPUT]
    Name stdout
    Match *
```

Use the following Datajet config to send sample data to Fluent Bit through the tcp input and the file (tail) input:

```
{
    "component": "synchronizer",
    "config": {
        "repeat": 1,
        "waitBefore": 0.0,
        "waitAfter": 0.0,
        "waitBetween": 0.0,
        "isAsync": false
    },
    "children": [
        {
            "generator": {
                "name": "basic",
                "config": {
                    "contentLength": 200,
                    "batchSize": 1,
                    "key": "log"
                }
            },
            "datajet": {
                "name": "tcp",
                "config": {
                    "host": "0.0.0.0",
                    "port": 6270
                }
            },
            "stage": {
                "batchRate": 2,
                "batchLimit": 200
            }
        },
        {
            "component": "synchronizer",
            "config": {
                "repeat": 10,
                "waitBefore": 0.0,
                "waitAfter": 0.0,
                "waitBetween": 10.0,
                "isAsync": false
            },
            "children": [
                {
                    "generator": {
                        "name": "increment",
                        "config": {
                            "contentLength": 3000,
                            "batchSize": 1
                        }
                    },
                    "datajet": {
                        "name": "file",
                        "config": {
                            "folder": "/tmp-datajet"
                        }
                    },
                    "stage": {
                        "batchRate": 10,
                        "batchLimit": 10
                    }
                }
            ]
        }
    ]
}
```
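
To reason about how a nested configuration like this one plays out over time, it can help to estimate its duration. The TypeScript sketch below is only an illustration under stated assumptions, not part of Datajet: `StageNode`, `SynchronizerNode`, and `estimateSeconds` are hypothetical names, and it assumes `batchRate` is batches per second, `batchLimit` is the total batch count, and the wait values are in seconds. It applies the synchronizer rules described earlier: parallel children finish when the slowest one finishes, sequential children add up, and `waitBetween` applies between repeats.

```typescript
// Hypothetical node shapes mirroring only the timing-related fields of the JSON config.
interface StageNode {
  stage: { batchRate: number; batchLimit: number };
}
interface SynchronizerNode {
  component: "synchronizer";
  config: {
    repeat: number;
    waitBefore: number;  // ASSUMPTION: seconds
    waitAfter: number;   // ASSUMPTION: seconds
    waitBetween: number; // ASSUMPTION: seconds
    isAsync: boolean;
  };
  children: PlanNode[];
}
type PlanNode = StageNode | SynchronizerNode;

// Estimate the wall-clock seconds a node takes to run.
function estimateSeconds(node: PlanNode): number {
  if (!("component" in node)) {
    // ASSUMPTION: batchRate is batches per second, batchLimit is the total batch count.
    return node.stage.batchLimit / node.stage.batchRate;
  }
  const { repeat, waitBefore, waitAfter, waitBetween, isAsync } = node.config;
  const childSeconds = node.children.map(estimateSeconds);
  // Parallel children finish with the slowest one; sequential children add up.
  const onePass = isAsync
    ? Math.max(0, ...childSeconds)
    : childSeconds.reduce((sum, s) => sum + s, 0);
  const gaps = Math.max(0, repeat - 1); // waitBetween applies only between passes
  return waitBefore + repeat * onePass + gaps * waitBetween + waitAfter;
}

// Timing skeleton of the nested synchronizer config above.
const example3: SynchronizerNode = {
  component: "synchronizer",
  config: { repeat: 1, waitBefore: 0, waitAfter: 0, waitBetween: 0, isAsync: false },
  children: [
    { stage: { batchRate: 2, batchLimit: 200 } },
    {
      component: "synchronizer",
      config: { repeat: 10, waitBefore: 0, waitAfter: 0, waitBetween: 10, isAsync: false },
      children: [{ stage: { batchRate: 10, batchLimit: 10 } }],
    },
  ],
};

const example3Seconds = estimateSeconds(example3);
console.log(`estimated duration: ${example3Seconds} s`);
```

Under these assumptions, the TCP stage and the file bursts run one after the other because the outer `isAsync` is false. Setting the outer `isAsync` to true in the sketch models the two routes overlapping instead.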