{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[AWS SDK for pandas](https://github.com/aws/aws-sdk-pandas)\n", "\n", "# 1 - Introduction" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## What is AWS SDK for pandas?\n", "\n", "An [open-source](https://github.com/aws/aws-sdk-pandas) Python package that extends the [Pandas](https://github.com/pandas-dev/pandas) library to AWS, connecting **DataFrames** with AWS data-related services (**Amazon Redshift**, **AWS Glue**, **Amazon Athena**, **Amazon Timestream**, **Amazon EMR**, etc.).\n", "\n", "Built on top of other open-source projects like [Pandas](https://github.com/pandas-dev/pandas), [Apache Arrow](https://github.com/apache/arrow) and [Boto3](https://github.com/boto/boto3), it offers abstracted functions for common ETL tasks such as loading and unloading data from **Data Lakes**, **Data Warehouses** and **Databases**.\n", "\n", "Check our [list of functionalities](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/api.html)." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## How to install?\n", "\n", "awswrangler runs on Python 3.8, 3.9 and 3.10 in almost any environment, so there are several ways to install it, depending on where you need it:\n", "\n", " - [PyPI (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#pypi-pip)\n", " - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#conda)\n", " - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#aws-lambda-layer)\n", " - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#aws-glue-python-shell-jobs)\n", " - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#aws-glue-pyspark-jobs)\n", " - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#amazon-sagemaker-notebook)\n", " - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#amazon-sagemaker-notebook-lifecycle)\n", " - [EMR Cluster](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#emr-cluster)\n", " - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.2.1/install.html#from-source)\n", "\n", "Good practices that apply to most of the methods above:\n", " - Use a new, dedicated virtual environment for each project ([venv](https://docs.python.org/3/library/venv.html))\n", " - In notebooks, always restart the kernel after installing a package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's Install it!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install awswrangler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Restart your kernel after the installation!" 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.0.0'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import awswrangler as wr\n", "\n", "wr.__version__" ] } ], "metadata": { "kernelspec": { "display_name": "awswrangler-v9JnknIF-py3.8", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "vscode": { "interpreter": { "hash": "83297b058d59ee0acd247586c837429190a8258f15c0eea6234359f5557dde51" } } }, "nbformat": 4, "nbformat_minor": 4 }