{ "cells": [ { "cell_type": "markdown", "source": [ "# S3 Write API" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "raw", "source": [ ".. article-info::\n", " :avatar-outline: muted\n", " :author: Sanhe\n", " :date: Apr 20, 2023\n", " :read-time: 15 min read\n", " :class-container: sd-p-2 sd-outline-muted sd-rounded-1" ], "metadata": { "collapsed": false, "raw_mimetype": "text/restructuredtext", "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "markdown", "source": [ "## What is S3 Write API?\n", "\n", "AWS offers many S3 Write APIs for put, copy, delete objects, and more. Since write APIs can cause irreversible impacts, it's important to ensure that you understand the behavior of the API before using it. In this section, we will learn how to use these APIs.\n", "\n", "## Simple Text / Bytes Read and Write" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "raw", "source": [ "- :meth:`~s3pathlib.core.rw.ReadAndWriteAPIMixin.read_text`\n", "- :meth:`~s3pathlib.core.rw.ReadAndWriteAPIMixin.read_bytes`\n", "- :meth:`~s3pathlib.core.rw.ReadAndWriteAPIMixin.write_text`\n", "- :meth:`~s3pathlib.core.rw.ReadAndWriteAPIMixin.write_bytes`" ], "metadata": { "collapsed": false, "raw_mimetype": "text/restructuredtext", "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "code", "execution_count": 2, "outputs": [ { "data": { "text/plain": "S3Path('s3://s3pathlib/file.txt')" }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from s3pathlib import S3Path\n", "\n", "s3path = S3Path(\"s3://s3pathlib/file.txt\")\n", "s3path" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 3, "outputs": [ { "data": { "text/plain": "'Hello Alice!'" }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path.write_text(\"Hello Alice!\")\n", "s3path.read_text()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 4, "outputs": [ { "data": { "text/plain": "b'Hello Bob!'" }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path.write_bytes(b\"Hello Bob!\")\n", "s3path.read_bytes()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Note that the ``s3path.write_bytes()`` or ``s3path.write_text()`` will overwrite the existing file silently. They don't raise an error if the file already exists. If you want to avoid overwrite, you can check the existence of the file before writing." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 5, "outputs": [], "source": [ "if s3path.exists() is False:\n", " s3path.write_text(\"Hello Alice!\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "The ``s3path.write_bytes()`` and ``s3path.write_text()`` will return a new object representing the object you just put. This is because on a versioning enabled bucket, the ``put_object`` API will create a new version of the object. So the ``s3path.write_bytes()`` and ``s3path.write_text()`` should return the new version of the object." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 6, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "False\n" ] } ], "source": [ "# in regular bucket, there's no versioning\n", "s3path_new = s3path.write_text(\"Hello Alice!\")\n", "print(s3path_new == s3path)\n", "print(s3path_new is s3path)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 7, "outputs": [], "source": [ "# in versioning enabled bucket, write_text() will create a new version\n", "s3path = S3Path(\"s3://s3pathlib-versioning-enabled/file.txt\")\n", "s3path_v1 = s3path.write_text(\"v1\")\n", "s3path_v2 = s3path.write_text(\"v2\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 8, "outputs": [ { "data": { "text/plain": "'v1'" }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path_v1.read_text(version_id=s3path_v1.version_id)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 9, "outputs": [ { "data": { "text/plain": "'v2'" }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path_v2.read_text(version_id=s3path_v2.version_id)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 10, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "v1 = FpAUGgRibznqKGqCcHUc_c_95Hn7ZaJE\n", "v2 = a8tyUUnxHJFt2J3LhEARHrMsOnSYqiSN\n" ] } ], "source": [ "print(f\"v1 = {s3path_v1.version_id}\")\n", "print(f\"v2 = {s3path_v2.version_id}\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## File-like object IO\n", "\n", "[File Object](https://docs.python.org/3/glossary.html#term-file-object) is an object exposing a file-oriented API (with methods such as ``read()`` or ``write()``) to an underlying resource. Depending on the way it was created, a file object can mediate access to a real on-disk file or to another type of storage or communication device (for example standard input/output, in-memory buffers, sockets, pipes, etc.). File objects are also called file-like objects or streams.\n", "\n", "- [json](https://docs.python.org/3/library/json.html)\n", "- [yaml](https://pyyaml.org/wiki/PyYAMLDocumentation)\n", "- [pandas](https://pandas.pydata.org/docs/reference/io.html)\n", "- [polars](https://pola-rs.github.io/polars/py-polars/html/reference/io.html)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "raw", "source": [ ".. note::\n", "\n", " Special Thanks to `smart_open `_. :meth:`S3Path.open ` is just a wrapper around ``smart_open``.\n", "\n", ".. tip::\n", "\n", " :meth:`S3Path.open ` also support ``version_id`` parameter." ], "metadata": { "collapsed": false, "raw_mimetype": "text/restructuredtext", "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "markdown", "source": [ "### JSON" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 11, "outputs": [], "source": [ "import json\n", "\n", "s3path = S3Path(\"s3://s3pathlib/data.json\")\n", "\n", "# write to s3\n", "with s3path.open(mode=\"w\") as f:\n", " json.dump({\"name\": \"Alice\"}, f)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 12, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'name': 'Alice'}\n" ] } ], "source": [ "# read from s3\n", "with s3path.open(mode=\"r\") as f:\n", " print(json.load(f))" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### YAML" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 13, "outputs": [], "source": [ "import yaml\n", "\n", "s3path = S3Path(\"s3://s3pathlib/config.yml\")\n", "\n", "# write to s3\n", "with s3path.open(mode=\"w\") as f:\n", " yaml.dump({\"name\": \"Alice\"}, f)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 14, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'name': 'Alice'}\n" ] } ], "source": [ "# read from s3\n", "with s3path.open(mode=\"r\") as f:\n", " print(yaml.load(f, Loader=yaml.SafeLoader))" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### Pandas" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 15, "outputs": [], "source": [ "import pandas as pd\n", "\n", "s3path = S3Path(\"s3://s3pathlib/data.csv\")\n", "\n", "df = pd.DataFrame(\n", " [\n", " (1, \"Alice\"),\n", " (2, \"Bob\"),\n", " ],\n", " columns=[\"id\", \"name\"]\n", ")\n", "\n", "# write to s3\n", "with s3path.open(mode=\"w\") as f:\n", " df.to_csv(f, index=False)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 16, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " id name\n", "0 1 Alice\n", "1 2 Bob\n" ] } ], "source": [ "# read from s3\n", "with s3path.open(mode=\"r\") as f:\n", " df = pd.read_csv(f)\n", " print(df)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### Polars" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 17, "outputs": [], "source": [ "import polars as pl\n", "\n", "s3path = S3Path(\"s3://s3pathlib/data.parquet\")\n", "\n", "df = pl.DataFrame(\n", " [\n", " (1, \"Alice\"),\n", " (2, \"Bob\"),\n", " ],\n", " schema=[\"id\", \"name\"]\n", ")\n", "\n", "# write to s3\n", "with s3path.open(mode=\"wb\") as f:\n", " df.write_parquet(f)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 18, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: (2, 2)\n", "┌─────┬───────┐\n", "│ id ┆ name │\n", "│ --- ┆ --- │\n", "│ i64 ┆ str │\n", "╞═════╪═══════╡\n", "│ 1 ┆ Alice │\n", "│ 2 ┆ Bob │\n", "└─────┴───────┘\n" ] } ], "source": [ "# read from s3\n", "with s3path.open(mode=\"rb\") as f:\n", " df = pl.read_parquet(f)\n", " print(df)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Tagging and Metadata\n", "\n", "[Object Tag](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) is a key-value that can help categorize storage. Tag is mutable so you can update it anytime.\n", "\n", "You can set [object metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html) in Amazon S3 at the time you upload the object. Object metadata is a set of name-value pairs. After you upload the object, you cannot modify object metadata (immutable). The only way to modify object metadata is to make a copy of the object and set the metadata." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 19, "outputs": [], "source": [ "s3path = S3Path(\"s3://s3pathlib/file.txt\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 20, "outputs": [ { "data": { "text/plain": "S3Path('s3://s3pathlib/file.txt')" }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# put initial metadata and tags\n", "s3path.write_text(\"Hello\", metadata={\"name\": \"alice\", \"age\": \"18\"}, tags={\"name\": \"alice\", \"age\": \"18\"})" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "raw", "source": [ "We have the folloinwg methods to interact with tags:\n", "\n", "- :meth:`~s3pathlib.core.tagging.TaggingAPIMixin.get_tags`: Get s3 object tags in key value pairs dict.\n", "- :meth:`~s3pathlib.core.tagging.TaggingAPIMixin.put_tags`: Do full replacement of s3 object tags.\n", "- :meth:`~s3pathlib.core.tagging.TaggingAPIMixin.update_tags`: Do partial updates of s3 object tags." ], "metadata": { "collapsed": false, "raw_mimetype": "text/restructuredtext", "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "code", "execution_count": 21, "outputs": [ { "data": { "text/plain": "{'name': 'alice', 'age': '18'}" }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you can use ``S3Path.get_tags()`` to get tags\n", "# this method returns a tuple with two item\n", "# the first item is the version_id\n", "# the second item is the tags\n", "s3path.get_tags()[1]" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 22, "outputs": [ { "data": { "text/plain": "{'name': 'alice', 'age': '24', 'email': 'alice@email.com'}" }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# do partial update\n", "s3path.update_tags({\"age\": \"24\", \"email\": \"alice@email.com\"})\n", "s3path.get_tags()[1]" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 23, "outputs": [ { "data": { "text/plain": "{'age': '30'}" }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# do full replacement\n", "s3path.put_tags({\"age\": \"30\"})\n", "s3path.get_tags()[1]" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 24, "outputs": [ { "data": { "text/plain": "{}" }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if an object doesn't have tag, it will return empty dict\n", "s3path_without_tags = S3Path(\"s3://s3pathlib/file-without-tags.txt\")\n", "s3path_without_tags.write_text(\"Hello\")\n", "s3path_without_tags.get_tags()[1]" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "raw", "source": [ "You can access the object metadata using the :attr:`~s3pathlib.core.metadata.MetadataAPIMixin.metadata` property method. It will first inspect the object-level cache; if not found, it will fetch the metadata from S3 and cache it." ], "metadata": { "collapsed": false, "raw_mimetype": "text/restructuredtext", "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "code", "execution_count": 25, "outputs": [ { "data": { "text/plain": "{'age': '18', 'name': 'alice'}" }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path.metadata" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "There's [no way to only update the metadata without updating the content](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html). You have to put the object again with the new metadata." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 26, "outputs": [], "source": [ "# the ``write_text`` method returns a new ``S3Path`` object representing the new object (with new metadata)\n", "s3path_new = s3path.write_text(\"Hello\", metadata={\"name\": \"alice\", \"age\": \"24\"})" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 27, "outputs": [ { "data": { "text/plain": "{'age': '18', 'name': 'alice'}" }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You will see old metadata because you are accessing the metadata cache of the old ``S3Path``\n", "# the cache was updated when you did the ``write_text`` above\n", "s3path.metadata" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 28, "outputs": [ { "data": { "text/plain": "{'name': 'alice', 'age': '24'}" }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You will see new metadata\n", "s3path_new.metadata" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 29, "outputs": [ { "data": { "text/plain": "{'age': '24', 'name': 'alice'}" }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can also create a new ``S3Path`` object (without cache) and access the metadata\n", "S3Path(\"s3://s3pathlib/file.txt\").metadata" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Delete, Copy, Move (Cut)\n", "\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "raw", "source": [ "``s3pathlib`` provides the following APIs:\n", "\n", "- :meth:`~s3pathlib.core.delete.DeleteAPIMixin.delete`: delete object or directory (recursively). similar to `os.remove `_ and `shutil.rmtree `_\n", "- :meth:`~s3pathlib.core.copy.CopyAPIMixin.copy_to`: copy object or directory (recursively) from one location to another. similar to `shutil.copy `_ and `shutil.copytree `_\n", "- :meth:`~s3pathlib.core.copy.CopyAPIMixin.move_to`: move (cut) object or directory (recursively) from one location to another. similar to `shutil.move `_" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "markdown", "source": [ "### Delete\n", "\n", "The ``delete`` API is the recommended API from 2.X.Y to delete:\n", "\n", "- object\n", "- directory\n", "- specific version of an object\n", "- all versions of an object\n", "- all object all versions in a directory\n", "\n", "By default, if you are trying to delete everything in S3 bucket, it will prompt to confirm the deletion. You can skip the confirmation by setting ``skip_prompt=True``." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 30, "outputs": [ { "data": { "text/plain": "3" }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3dir = S3Path(\"s3://s3pathlib/tmp/\")\n", "s3dir.joinpath(\"README.txt\").write_text(\"readme\")\n", "s3dir.joinpath(\"file.txt\").write_text(\"Hello\")\n", "s3dir.joinpath(\"folder/file.txt\").write_text(\"Hello\")\n", "s3dir.count_objects()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 31, "outputs": [ { "data": { "text/plain": "False" }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete a file\n", "s3path_readme = s3dir.joinpath(\"README.txt\")\n", "s3path_readme.delete()\n", "s3path_readme.exists()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 32, "outputs": [ { "data": { "text/plain": "2" }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3dir.count_objects()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 33, "outputs": [ { "data": { "text/plain": "0" }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete the entire folder\n", "s3dir.delete()\n", "s3dir.count_objects()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 34, "outputs": [ { "data": { "text/plain": "[S3Path('s3://s3pathlib-versioning-enabled/file.txt'),\n S3Path('s3://s3pathlib-versioning-enabled/file.txt'),\n S3Path('s3://s3pathlib-versioning-enabled/file.txt')]" }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete a specific version of an object (permanently delete)\n", "s3path = S3Path(\"s3://s3pathlib-versioning-enabled/file.txt\")\n", "s3path.delete(is_hard_delete=True)\n", "v1 = s3path.write_text(\"v1\").version_id\n", "v2 = s3path.write_text(\"v2\").version_id\n", "v3 = s3path.write_text(\"v3\").version_id\n", "s3path.list_object_versions().all()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 35, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "An error occurred (NoSuchVersion) when calling the GetObject operation: The specified version does not exist.\n" ] } ], "source": [ "s3path.delete(version_id=v1)\n", "try:\n", " s3path.read_text(version_id=v1)\n", "except Exception as e:\n", " print(e)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 36, "outputs": [ { "data": { "text/plain": "[S3Path('s3://s3pathlib-versioning-enabled/file.txt'),\n S3Path('s3://s3pathlib-versioning-enabled/file.txt')]" }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path.list_object_versions().all()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 37, "outputs": [ { "data": { "text/plain": "[]" }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete all versions of an object (permanently delete)\n", "s3path.delete(is_hard_delete=True)\n", "s3path.list_object_versions().all()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 38, "outputs": [ { "data": { "text/plain": "[S3Path('s3://s3pathlib-versioning-enabled/tmp/file1.txt'),\n S3Path('s3://s3pathlib-versioning-enabled/tmp/file1.txt'),\n S3Path('s3://s3pathlib-versioning-enabled/tmp/file2.txt'),\n S3Path('s3://s3pathlib-versioning-enabled/tmp/file2.txt')]" }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete all objects all versions in a directory (permanently delete)\n", "s3dir = S3Path(\"s3://s3pathlib-versioning-enabled/tmp/\")\n", "s3path1 = s3dir.joinpath(\"file1.txt\")\n", "s3path2 = s3dir.joinpath(\"file2.txt\")\n", "s3dir.delete(is_hard_delete=True)\n", "s3path1.write_text(\"v1\")\n", "s3path1.write_text(\"v2\")\n", "s3path2.write_text(\"v1\")\n", "s3path2.write_text(\"v2\")\n", "s3dir.list_object_versions().all()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 39, "outputs": [ { "data": { "text/plain": "[]" }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path.delete(is_hard_delete=True)\n", "s3path.list_object_versions().all()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### Copy" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 40, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Copy s3://s3pathlib/source/data.json to s3://s3pathlib-versioning-enabled/target/file.txt ...\n", "content of s3://s3pathlib-versioning-enabled/target/file.txt is: 'this is data'\n", "S3Path('s3://s3pathlib/source/data.json') still exists: True\n" ] } ], "source": [ "s3path_source = S3Path(\"s3://s3pathlib/source/data.json\")\n", "s3path_source.write_text(\"this is data\")\n", "s3path_target = s3path.change(new_dirname=\"target\")\n", "print(f\"Copy {s3path_source.uri} to {s3path_target.uri} ...\")\n", "s3path_source.copy_to(s3path_target, overwrite=True)\n", "print(f\"content of {s3path_target.uri} is: {s3path_target.read_text()!r}\")\n", "print(f\"{s3path_source} still exists: {s3path_source.exists()}\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### Move\n", "\n", "move is actually copy then delete the original file. It's a shortcut of ``copy_to`` and ``delete``." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 41, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Copy s3://s3pathlib/source/config.yml to s3://s3pathlib-versioning-enabled/target/file.txt ...\n", "content of s3://s3pathlib-versioning-enabled/target/file.txt is: 'this is config'\n", "S3Path('s3://s3pathlib/source/config.yml') still exists: False\n" ] } ], "source": [ "s3path_source = S3Path(\"s3://s3pathlib/source/config.yml\")\n", "s3path_source.write_text(\"this is config\")\n", "s3path_target = s3path.change(new_dirname=\"target\")\n", "print(f\"Copy {s3path_source.uri} to {s3path_target.uri} ...\")\n", "s3path_source.move_to(s3path_target, overwrite=True)\n", "print(f\"content of {s3path_target.uri} is: {s3path_target.read_text()!r}\")\n", "print(f\"{s3path_source} still exists: {s3path_source.exists()}\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Upload File or Folder" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "raw", "source": [ "``s3pathlib`` provides the following APIs:\n", "\n", "- :meth:`~s3pathlib.core.upload.UploadAPIMixin.upload_file`: upload a file to s3.\n", "- :meth:`~s3pathlib.core.upload.UploadAPIMixin.upload_dir`: upload a directory (recursively) to a s3 prefix.\n", "\n", "**Upload File**" ], "metadata": { "collapsed": false, "raw_mimetype": "text/restructuredtext", "pycharm": { "name": "#%% raw\n" } } }, { "cell_type": "code", "execution_count": 45, "outputs": [ { "data": { "text/plain": "False" }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# at begin, the file does not exist\n", "s3path = S3Path(\"s3pathlib\", \"daily-report.txt\")\n", "s3path.exists()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 46, "outputs": [ { "data": { "text/plain": "True" }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# upload a file, then file should exist\n", "from pathlib_mate import Path\n", "\n", "# create some test files\n", "path = Path(\"daily-report.txt\")\n", "path.write_text(\"this is a daily report\")\n", "s3path.upload_file(path) # or absolute path as string\n", "\n", "s3path.exists()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 47, "outputs": [ { "data": { "text/plain": "'this is a daily report'" }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3path.read_text()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 48, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cannot write to s3://s3pathlib/daily-report.txt, s3 object ALREADY EXISTS! open console for more details https://console.aws.amazon.com/s3/object/s3pathlib?prefix=daily-report.txt.\n" ] } ], "source": [ "# By default, upload file doesn't allow overwrite, but you can set overwrite as True to skip that check.\n", "try:\n", " s3path.upload_file(path, overwrite=False)\n", "except Exception as e:\n", " print(e)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "**Upload Folder**\n", "\n", "You can easily upload the entire folder to S3. The folder structure will be preserved." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 50, "outputs": [ { "data": { "text/plain": "False" }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# at begin, the folder does not exist\n", "s3dir = S3Path(\"s3pathlib\", \"uploaded-documents/\")\n", "s3dir.exists()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 51, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S3Path('s3://s3pathlib/uploaded-documents/README.txt')\n", "S3Path('s3://s3pathlib/uploaded-documents/folder/file.txt')\n" ] } ], "source": [ "# create some test files\n", "dir_documents = Path(\"documents\")\n", "dir_documents.joinpath(\"folder\").mkdir(exist_ok=True, parents=True)\n", "dir_documents.joinpath(\"README.txt\").write_text(\"read me first\")\n", "dir_documents.joinpath(\"folder\", \"file.txt\").write_text(\"this is a file\")\n", "\n", "s3dir.upload_dir(dir_documents, overwrite=True)\n", "\n", "# inspect s3 dir folder structure\n", "for s3path in s3dir.iter_objects():\n", " print(s3path)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## What's Next\n", "\n", "With a thorough understanding of all the features provided by s3pathlib, it's time to see how you can use this package to develop applications for production." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }