# Setup of local development environment
> **Note**: The subsequent steps assume an installation on ___macOS___
## 1) Installation of tools and dependencies
### Python
Everything will be installed in a virtual environment in your local project folder.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-test.txt
```
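A quick sanity check (not part of the official setup) to confirm that the virtual environment is active:
```bash
# Both should resolve to the interpreter inside .venv, not the system Python
which python
python --version
```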
### Java
Install OpenJDK 11 (AdoptOpenJDK) and Maven via Homebrew.
```bash
brew tap homebrew/cask-versions
brew update
brew tap homebrew/cask
brew tap adoptopenjdk/openjdk
brew install --cask adoptopenjdk11
brew install maven
```
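To verify that the JDK and Maven are picked up from the new installation (exact version strings will vary):
```bash
# Both commands should report a Java 11 runtime after the install above
java -version
mvn -version
```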
### Apache Spark
Install the Spark dependencies:
```bash
mkdir spark_deps
cd spark_deps
wget https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
```
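If `wget` is not available on macOS, the same driver can be downloaded with `curl` (shown here only as an equivalent alternative):
```bash
# Alternative download of the PostgreSQL JDBC driver using curl
curl -O https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
```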
Install the AWS dependencies for Apache Hadoop:
1. Check the current version of Hadoop bundled with PySpark: ```ls -al .venv/lib/python3.9/site-packages/pyspark/jars | grep hadoop```
2. Create a `pom.xml` file in the `spark_deps` folder (make sure the `hadoop-aws` version below matches the Hadoop version found in step 1):
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany.app</groupId>
    <artifactId>my-app</artifactId>
    <version>1</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>3.3.1</version>
        </dependency>
    </dependencies>
</project>
```
Then, run:
```bash
mvn --batch-mode -f ./pom.xml -DoutputDirectory=./jars dependency:copy-dependencies
mv jars/* .
```
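After these commands the `spark_deps` folder should contain the PostgreSQL driver plus `hadoop-aws` and its transitive dependencies; a quick check (exact file names depend on the versions Maven resolves):
```bash
# List the jars that were downloaded/copied into spark_deps
ls -1 *.jar
```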
## 2) Test the installation
> **Note:** Don't forget to switch to the new virtual environment in your IDE too.
### Package creation
Test whether the Python wheel package, through which the data-product-processor is distributed, can be built.
```commandline
pip install -U pip wheel setuptools
python3 setup.py bdist_wheel
```
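To verify the artifact, you can install the freshly built wheel into the active virtual environment (the exact file name under `dist/` depends on the package version, hence the wildcard):
```bash
# Install the wheel that was just built
pip install dist/*.whl
```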
### Local invocation of data-product-processor
To test whether the data-product-processor can be executed correctly, run the whole solution from the command line:
```commandline
data-product-processor \
--JOB_NAME "TEST" \
--product_path ./tests/assets/integration \
--default_data_lake_bucket <data-lake-bucket-name> \
--aws_profile <aws-profile> \
--aws_region <aws-region>
```
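If the `data-product-processor` command is not found, the package probably has to be installed into the virtual environment first; an editable install (an assumption here, not an official step) keeps local code changes visible:
```bash
# Install the project in editable mode so the CLI entry point lands on PATH
pip install -e .
which data-product-processor
```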
You might need to export `SPARK_HOME` if the Spark installation is not found in your environment:
```commandline
export SPARK_HOME="$(pwd)/.venv/lib/python3.9/site-packages/pyspark"
```
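A quick check (assuming the path above) that the variable points at the PySpark distribution bundled in the virtual environment:
```bash
# The bundled jars directory should exist if SPARK_HOME is set correctly
ls "$SPARK_HOME/jars" | head
```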
Run the tests from the command line (while the virtual environment is activated):
```commandline
pytest
```
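While iterating, you may want to run only a subset of tests; the `tests` directory name below is an assumption based on the paths used earlier in this guide:
```bash
# Run a single test directory with verbose output
pytest tests/ -v
```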
# Troubleshooting / common errors
## py4j
```
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM
```
Resolve through:
```commandline
export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.10.9-src.zip:${PYTHONPATH}"
```
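The py4j archive name must match the version bundled with your PySpark installation; check it before exporting (using the same `SPARK_HOME` as above):
```bash
# Find the exact py4j zip shipped with PySpark
ls "${SPARK_HOME}/python/lib/" | grep py4j
```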
## Slf4j not found
```commandline
[NOT FOUND ] org.slf4j#slf4j-api;1.7.5!slf4j-api.jar
```
**Solution**
Remove the cached directories `.ivy2/cache`, `.ivy2/jars` and `.m2/repository` so that the missing artifact is re-downloaded.
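Assuming the default cache locations in your home directory, that amounts to:
```bash
# Clear the Ivy and Maven caches so the missing artifact is re-resolved
rm -rf ~/.ivy2/cache ~/.ivy2/jars ~/.m2/repository
```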