# Data Prepper Developer Guide This page is for anybody who wishes to contribute code to Data Prepper. Welcome! ## Contributions First, please read our [contribution guide](../CONTRIBUTING.md) for more information on how to contribute to Data Prepper. ## Installation Prerequisites ### Java Versions Building Data Prepper requires JDK 11 or 17. The Data Prepper Gradle build runs in a Java 11 or 17 JVM, but uses [Gradle toolchains](https://docs.gradle.org/current/userguide/toolchains.html) to compile the Java code using Java 11. If you have a JDK 11 installed locally, Gradle will use your installed JDK 11. If you do not, Gradle will install JDK 11. All main source code builds on JDK 11, so it must be compatible with Java 11. The test code (unit and integration tests) runs using JDK 11. ## Building from source The assemble task will build the Jar files and create a runnable distribution without running the integration tests. If you are just looking to build Data Prepper from source, this build is faster than running the integration test suite. To build the project from source, run the following command from the project root: ``` ./gradlew assemble ``` ### Full Project Build Running the build command will assemble the Jar files needed for running DataPrepper. It will also run the integration test suite. To build, run the following command from the project root: ``` ./gradlew build ``` ## Running the project Before running Data Prepper, check that configuration files (see [configuration](configuration.md) docs for more information) have been put in the respective folders under Data Prepper home directory. When building from source, Data Prepper home directory is at `release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64` ($VERSION is the current version as defined in [gradle.properties](../gradle.properties)). The configuration files should be put in the following folders: 1. `data-prepper-config.yaml` in `config/` folder 2. `pipelines.yaml` in `pipelines/` folder Go to home directory: ``` cd release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64 ``` Data Prepper can then be run with the following commands: ``` bin/data-prepper ``` You can also supply your own pipeline configuration file path followed by the server configuration file path, but the support for this method will be dropped in a future release. Example: ``` bin/data-prepper pipelines.yaml data-prepper-config.yaml ``` Additionally, Log4j 2 configuration file is read from `config/log4j2.properties` in the application's home directory. ## Building & Running the Docker Image In some cases, you may wish to build a local Docker image and run it. This is useful if you are making a change to the Docker image, are looking to run a bleeding-edge Docker image, or are needing a custom-built Docker image of Data Prepper. ### Building the Docker Image To build the Docker image, run: ``` ./gradlew clean :release:docker:docker ``` If successful, the Docker image will be available locally. The repository is `opensearch-data-prepper` and the tag is the current version as defined in [gradle.properties](../gradle.properties). You can run the following command in Linux environments to see your Data Prepper Docker images: ``` docker images | grep opensearch-data-prepper ``` The results will look somewhat like the following: ``` opensearch-data-prepper 2.0.0-SNAPSHOT 3e81ef26250c 23 hours ago 566MB ``` ### Running from a Local Docker Image If you build a local Docker image, you can run it using a variation on the following command. You may wish to change the ports you map depending on your specific pipeline configuration. ``` docker run \ -p 21890:21890 \ -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \ -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml \ opensearch-data-prepper:2.0.0-SNAPSHOT ``` ## Contributing your Code to Data Prepper When you are ready to contribute a change to Data Prepper, please create a GitHub Pull Request (PR). Your PR should target `main`. The Data Prepper maintainers will review your PR and merge it once it is approved. Some changes containing bug fixes or security fixes may be eligible for a patch release. If you believe your change should be a patch release, please see [Backporting](#backporting) ### Branches The Data Prepper maintainers use the `main` branch for the next upcoming release (major or minor). Near the time of the next release, we create a release branch for that upcoming release (e.g. `1.2`). We perform our release builds from this branch. Any patch releases also build from that release branch. ### Backporting When you create a PR which targets `main` and need this change as a patch to a previous version of Data Prepper, use the auto backport GitHub Action. All you need to do is add the label `backport ` to your PR which is targeting `main`. After the PR is merged, the GitHub Action will create a new PR to cherry-pick those changes into the `` branch. A Data Prepper maintainer will need to approve and merge the backported code into the target branch. The auto-generated PR will be on a branch named `backport/backport--to-`. Data Prepper supports patch releases only on the latest version (e.g. 2.1) and on the last version for the previous major release (e.g. 1.4 after 2.0 has been released). These releases are only for bug fixes or security fixes. Please use backports only for bug and security fixes and only targeting candidate releases. You can ask about backporting in your PR or by creating a GitHub issue to request that a previous change be backported. ## Coding Guidance ### Documentation Documentation is very important for users of Data Prepper and contributors. We are using the following conventions for documentation. 1. Document features in markdown. Plugins should have detailed documentation in a `README.md` file in the plugin project directory. Documentation for all of Data Prepper should be included in the [Data Prepper documentation](https://opensearch.org/docs/latest/data-prepper/index/). 2. Provide Javadocs for all public classes, methods, and fields. Plugins need not follow this guidance since their classes are generally not exposed. 3. Avoid commenting within code, unless it is required to understand that code. ### Code For the most part, we use common Java conventions. Here are a few things to keep in mind. 1. Use descriptive names for classes, methods, fields, and variables. 2. Avoid abbreviations unless they are widely accepted 3. Use final on all variables which are not reassigned 4. Wildcard imports are not allowed. 5. Static imports are preferred over qualified imports when using static methods 6. Prefer creating non-static methods whenever possible. Static methods should generally be avoid as they are often used as a shortcut. Sometimes static methods are the best solution such as when using a builder. 7. Public utility or “common” classes are not permitted. 1. They are fine in test code 2. They are fine if package protected 8. Use Optional for return values if the value may not be present. This should be preferred to returning null. 9. Do not create checked exceptions, and do not throw checked exceptions from public methods whenever possible. In general, if you call a method with a checked exception, you should wrap that exception into an unchecked exception. 1. Throwing checked exceptions from private methods is acceptable. ### Formatting Please use the following formatting guidelines: * Java indent is 4 spaces. No tabs. * Maximum line width is 140 characters * We place opening braces at the end of the line, rather than on its own line The official formatting rules for this project are committed as a Checkstyle configuration in [`config/checkstyle/checkstyle.xml`](../config/checkstyle/checkstyle.xml). If you are using IntelliJ, you can use the unofficial Checkstyle IDEA plugin. [These instructions](https://stackoverflow.com/a/26957047/650176) may be useful for configuring the rules. ### Dependencies 1. You should first raise an issue in the Data Prepper project if you are interested in adding a new dependency to the core projects. 2. Avoid using dependencies which provide similar functionality to existing dependencies. 1. For example, this project uses Jackson, so do not add Gson 2. If core Java has the function or feature, prefer it over an external library. Example: Guava’s hashcode and equals methods when Java’s Objects class has them. ### Testing We have the following categories for tests: * Unit tests - Test a single class in isolation. * Integration tests - Test a large component or set of classes in isolation. * End-to-end tests - Tests which run an actual Data Prepper. The should generally be in the [`e2e-test`](../e2e-test) project. Testing Guidelines: 1. Use JUnit 5 for all new test suites 1. You are encouraged to update existing JUnit 4 tests to JUnit 5, but this is not necessary. 2. Use Hamcrest of assertions 3. Use Mockito for mocking 4. Each class should have a unit test. 5. Unit test class names should end with Test. 6. Each large component should have an integration test. 1. A good example is a plugin. Plugins should have their own integration tests which integrate all of the plugin’s classes. However, these tests do not run a full Data Prepper. 7. Integration test class names should end with IT. 8. Test names should indicate what is being tested, if we see a failed test we should be able to look at the test name and have a good idea about what just failed with minimal context about the code being written 1. Two good approaches may be used, depending on what you are testing: 1. methodUnderTest_condition_result 2. test_when_something_condition_then_something_else 2. Please avoid generic test names like “testSuccess” ### Gradle 1. Our Gradle builds use Groovy, so follow our normal Java styles in the build files. For example, use camel case rather than snake case. 2. Use Gradle strings (single quote) unless you need string interpolation. If you need string interpolation, use a GString (double quotes) ## CI Builds Before merging in your PR, the Data Prepper continuous integration (CI) builds must pass. These builds run as GitHub Actions. If an Action is failing, please view the log and determine what is causing your commit to fail. If a test fails, please check the *Summary* section of that Action. There may be artifacts for the test results. You can download these and view the result information. Additionally, many builds have *Unit Test Results* job which includes a summary of the results. ## More Information We have the following pages for specific development guidance on the topics: * [Plugin Development](plugin_development.md) * [Error Handling](error_handling.md) * [Logs](logs.md)