# Introduction to Metadata Ingestion

## Integration Options

DataHub supports both **push-based** and **pull-based** metadata integration. Push-based integrations allow you to emit metadata directly from your data systems when metadata changes, while pull-based integrations allow you to "crawl" or "ingest" metadata from the data systems by connecting to them and extracting metadata in a batch or incremental-batch manner. Supporting both mechanisms means that you can integrate with all your systems in the most flexible way possible.

A few recipes are included in the [examples/recipes](./examples/recipes) directory. Ingesting metadata through a recipe pushes the metadata to a REST endpoint of DataHub.

Get the GMS_ENDPOINT by executing:

```
kubectl get svc
```

Find the load balancer URL for the `datahub-datahub-gms` service. Your URL will look like this: `http://{load balancer url for gms service}:8080`

Generate a token from the DataHub UI: go to Settings in the top right corner, generate a GMS_TOKEN, and make a note of it.

## Prerequisites

Inside a virtualenv, install:

```
python3 -m pip install 'acryl-datahub[datahub-rest]'
```

```
python3 -m pip install 'acryl-datahub'
```

For Glue:

```
python3 -m pip install 'acryl-datahub[glue]'
```

To install a particular version:

```
python3 -m pip install 'acryl-datahub==0.8.38'
```

## Running

Running a recipe is as simple as changing the sink values in the recipe to your GMS endpoint and token, then executing from `examples/recipes`:

```shell
datahub ingest -c redshift_to_datahub_new.yml
```

or, if you want to override the default endpoints, you can provide the environment variables as part of the command like below:

```shell
DATAHUB_GMS_HOST="https://my-datahub-server:8080" DATAHUB_GMS_TOKEN="my-datahub-token" datahub ingest -c recipe.yaml
```

### Programmatic Pipeline

In some cases, you might want to configure and run a pipeline entirely from within your custom Python script. Here is an example of how to do it.

- [glue_ingestion.py](./examples/code/glue_ingestion.py) - a basic Glue to REST programmatic pipeline.
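For orientation, a minimal sketch of such a pipeline is shown below, using the `Pipeline` API from `acryl-datahub`. The AWS region, server URL, and token values are placeholders (assumptions), not the contents of the linked file; substitute your own GMS_ENDPOINT and GMS_TOKEN from the steps above.

```python
# A minimal programmatic ingestion sketch: Glue source -> DataHub REST sink.
# Placeholder values (aws_region, server, token) must be replaced with your own.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "glue",
            "config": {
                "aws_region": "us-east-1",  # placeholder region
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {
                "server": "http://localhost:8080",  # your GMS_ENDPOINT
                "token": "<GMS_TOKEN>",  # token generated in the DataHub UI
            },
        },
    }
)

pipeline.run()
pipeline.raise_from_status()  # fail loudly if the ingestion reported errors
```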
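After installing the `glue` and `datahub-rest` extras listed under Prerequisites, a script like this can be run directly with `python3 glue_ingestion.py`. Note that the dictionary passed to `Pipeline.create` mirrors the structure of a YAML recipe file, so anything you can express in a recipe can also be configured programmatically.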