# S3 Write API

## What is the S3 Write API?

AWS offers many S3 write APIs, such as put, copy, and delete object. Because write operations can have irreversible effects, make sure you understand how each API behaves before you use it. This section shows how to use these APIs through s3pathlib.

## Simple Text / Bytes Read and Write

In [2]:
from s3pathlib import S3Path

s3path = S3Path("s3://s3pathlib/file.txt")
s3path

S3Path('s3://s3pathlib/file.txt')

In [3]:
s3path.write_text("Hello Alice!")
s3path.read_text()

'Hello Alice!'

In [4]:
s3path.write_bytes(b"Hello Bob!")
s3path.read_bytes()

b'Hello Bob!'

Note that ``s3path.write_bytes()`` and ``s3path.write_text()`` silently overwrite an existing object; they don't raise an error if it already exists. To avoid overwriting, check whether the object exists before writing.

In [5]:
if not s3path.exists():
    s3path.write_text("Hello Alice!")
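
The check-then-write pattern above can be wrapped in a small helper. The sketch below is illustrative only: ``write_text_if_absent`` is a hypothetical name, not part of s3pathlib, and it is demonstrated on a local ``pathlib.Path`` so that it runs without AWS credentials. It assumes only that the path object exposes ``exists()`` and ``write_text()``, as ``S3Path`` does in the cells above.

```python
import tempfile
from pathlib import Path


def write_text_if_absent(path, text: str) -> bool:
    """Write ``text`` only if ``path`` does not exist yet.

    Hypothetical helper (not part of s3pathlib). Works with any object that
    exposes ``exists()`` and ``write_text()``. Returns True if it wrote.
    """
    if path.exists():
        return False
    path.write_text(text)
    return True


# Demonstrated on a local file so the example runs without AWS access.
with tempfile.TemporaryDirectory() as tmp:
    local_path = Path(tmp) / "file.txt"
    first = write_text_if_absent(local_path, "Hello Alice!")
    second = write_text_if_absent(local_path, "Hello Bob!")
    content = local_path.read_text()

print(first, second, content)
```

The check and the write are two separate calls, so another writer could still create the object in between; the helper reduces accidental overwrites but does not make the operation atomic.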

``s3path.write_bytes()`` and ``s3path.write_text()`` return a new ``S3Path`` object representing the object you just put. This matters on a versioning-enabled bucket, where the ``put_object`` API creates a new version on every write, so the returned object carries the version that was just created.

In [6]:
# in a regular bucket, there's no versioning
# the returned object is equal to the original path, but it is a different Python object
s3path_new = s3path.write_text("Hello Alice!")
print(s3path_new == s3path)
print(s3path_new is s3path)

True
False


In [7]:
# in versioning enabled bucket, write_text() will create a new version
s3path = S3Path("s3://s3pathlib-versioning-enabled/file.txt")
s3path_v1 = s3path.write_text("v1")
s3path_v2 = s3path.write_text("v2")

In [8]:
s3path_v1.read_text(version_id=s3path_v1.version_id)

'v1'

In [9]:
s3path_v2.read_text(version_id=s3path_v2.version_id)

'v2'

In [10]:
print(f"v1 = {s3path_v1.version_id}")
print(f"v2 = {s3path_v2.version_id}")

v1 = FpAUGgRibznqKGqCcHUc_c_95Hn7ZaJE
v2 = a8tyUUnxHJFt2J3LhEARHrMsOnSYqiSN
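
To illustrate why each write yields its own version ID, here is a toy in-memory model of a single key in a versioning-enabled bucket. It is a sketch only: ``VersionedStore`` is a hypothetical class, the version IDs are random hex strings rather than real S3 version IDs, and nothing here calls AWS or s3pathlib.

```python
import uuid


class VersionedStore:
    """Toy model of one key in a versioning-enabled bucket (hypothetical)."""

    def __init__(self):
        self.versions = {}     # version_id -> content
        self.latest_id = None  # version returned when no id is given

    def put(self, content: str) -> str:
        # Every put stores the content under a fresh version id.
        version_id = uuid.uuid4().hex
        self.versions[version_id] = content
        self.latest_id = version_id
        return version_id

    def get(self, version_id=None) -> str:
        # Without an explicit id, read the most recent version.
        return self.versions[version_id or self.latest_id]


store = VersionedStore()
v1 = store.put("v1")
v2 = store.put("v2")
print(store.get(v1), store.get(v2), store.get())
```

Older versions stay readable by ID after a newer write, which mirrors the ``read_text(version_id=...)`` calls above.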


## File-like object IO

A [file object](https://docs.python.org/3/glossary.html#term-file-object) is an object exposing a file-oriented API (with methods such as ``read()`` or ``write()``) to an underlying resource. Depending on the way it was created, a file object can mediate access to a real on-disk file or to another type of storage or communication device (for example standard input/output, in-memory buffers, sockets, pipes, etc.). File objects are also called file-like objects or streams.

``S3Path.open()`` returns a file object, so it works with libraries that read from or write to file objects. The examples below cover:

- [json](https://docs.python.org/3/library/json.html)
- [yaml](https://pyyaml.org/wiki/PyYAMLDocumentation)
- [pandas](https://pandas.pydata.org/docs/reference/io.html)
- [polars](https://pola-rs.github.io/polars/py-polars/html/reference/io.html)
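
To make "file-like object" concrete without touching S3, the sketch below uses the standard library's in-memory ``io.StringIO`` buffer. It is a stand-in for the object that ``S3Path.open()`` returns in the examples that follow, not s3pathlib code.

```python
import io
import json

# An in-memory text buffer is a file-like object: it has write() and read().
buffer = io.StringIO()
json.dump({"name": "Alice"}, buffer)

# Rewind to the start before reading back what was written.
buffer.seek(0)
data = json.load(buffer)
print(data)
```

Any library that accepts a file object in this way can be pointed at an S3 object instead of a local buffer.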

### JSON

In [11]:
import json

s3path = S3Path("s3://s3pathlib/data.json")

# write to s3
with s3path.open(mode="w") as f:
    json.dump({"name": "Alice"}, f)

In [12]:
# read from s3
with s3path.open(mode="r") as f:
    print(json.load(f))

{'name': 'Alice'}


### YAML

In [13]:
import yaml

s3path = S3Path("s3://s3pathlib/config.yml")

# write to s3
with s3path.open(mode="w") as f:
    yaml.dump({"name": "Alice"}, f)

In [14]:
# read from s3
with s3path.open(mode="r") as f:
    print(yaml.load(f, Loader=yaml.SafeLoader))

{'name': 'Alice'}


### Pandas

In [15]:
import pandas as pd

s3path = S3Path("s3://s3pathlib/data.csv")

df = pd.DataFrame(
    [
        (1, "Alice"),
        (2, "Bob"),
    ],
    columns=["id", "name"]
)

# write to s3
with s3path.open(mode="w") as f:
    df.to_csv(f, index=False)

In [16]:
# read from s3
with s3path.open(mode="r") as f:
    df = pd.read_csv(f)
    print(df)

   id   name
0   1  Alice
1   2    Bob


### Polars

In [17]:
import polars as pl

s3path = S3Path("s3://s3pathlib/data.parquet")

df = pl.DataFrame(
    [
        (1, "Alice"),
        (2, "Bob"),
    ],
    schema=["id", "name"]
)

# write to s3
with s3path.open(mode="wb") as f:
    df.write_parquet(f)

In [18]:
# read from s3
with s3path.open(mode="rb") as f:
    df = pl.read_parquet(f)
    print(df)

shape: (2, 2)
┌─────┬───────┐
│ id  ┆ name  │
│ --- ┆ ---   │
│ i64 ┆ str   │
╞═════╪═══════╡
│ 1   ┆ Alice │
│ 2   ┆ Bob   │
└─────┴───────┘


## Tagging and Metadata

An [object tag](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) is a key-value pair that helps you categorize storage. Tags are mutable, so you can update them at any time.

You can set [object metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html) in Amazon S3 at the time you upload the object. Object metadata is a set of name-value pairs. After the upload, object metadata is immutable: the only way to change it is to make a copy of the object and set the metadata on the copy.
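
The copy-based approach mentioned above can be sketched with boto3's ``copy_object`` call and ``MetadataDirective="REPLACE"``, which belong to the AWS SDK rather than to s3pathlib. In this sketch, ``replace_metadata_in_place`` and ``RecordingClient`` are hypothetical names; the stand-in client only records the arguments so the example runs without AWS access. With a real bucket you would pass ``boto3.client("s3")`` instead.

```python
def replace_metadata_in_place(s3_client, bucket: str, key: str, metadata: dict):
    """Copy an object onto itself with new user-defined metadata.

    Hypothetical helper. ``s3_client`` is expected to behave like a boto3
    S3 client, e.g. ``boto3.client("s3")``.
    """
    return s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=metadata,
        # Use the metadata given here instead of copying the source's metadata.
        MetadataDirective="REPLACE",
    )


class RecordingClient:
    """Stand-in for a boto3 client; it only records the call arguments."""

    def copy_object(self, **kwargs):
        self.last_call = kwargs
        return {}


client = RecordingClient()
replace_metadata_in_place(
    client, "s3pathlib", "file.txt", {"name": "alice", "age": "24"}
)
print(client.last_call["MetadataDirective"])
```

Because the directive replaces the metadata taken from the source, attributes such as the content type may need to be passed again; treat that as an assumption to verify against the boto3 documentation before relying on it.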

In [19]:
s3path = S3Path("s3://s3pathlib/file.txt")

In [20]:
# put initial metadata and tags
s3path.write_text("Hello", metadata={"name": "alice", "age": "18"}, tags={"name": "alice", "age": "18"})

S3Path('s3://s3pathlib/file.txt')

In [21]:
# use ``S3Path.get_tags()`` to get the tags
# this method returns a tuple with two items:
# the first item is the version_id,
# the second item is the tags
s3path.get_tags()[1]

{'name': 'alice', 'age': '18'}

In [22]:
# do partial update
s3path.update_tags({"age": "24", "email": "alice@email.com"})
s3path.get_tags()[1]

{'name': 'alice', 'age': '24', 'email': 'alice@email.com'}

In [23]:
# do full replacement
s3path.put_tags({"age": "30"})
s3path.get_tags()[1]

{'age': '30'}
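
The difference between the partial update and the full replacement shown above can be modeled with plain dictionaries. This is a sketch of the semantics visible in the outputs, not the library's implementation; ``merge_tags`` and ``replace_tags`` are hypothetical names.

```python
def merge_tags(existing: dict, updates: dict) -> dict:
    """Partial update: keep existing keys, overwrite or add the given ones."""
    return {**existing, **updates}


def replace_tags(existing: dict, new_tags: dict) -> dict:
    """Full replacement: discard the existing tags entirely."""
    return dict(new_tags)


initial = {"name": "alice", "age": "18"}
after_update = merge_tags(initial, {"age": "24", "email": "alice@email.com"})
after_put = replace_tags(after_update, {"age": "30"})

print(after_update)
print(after_put)
```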

In [24]:
# if an object has no tags, ``get_tags()`` returns an empty dict
s3path_without_tags = S3Path("s3://s3pathlib/file-without-tags.txt")
s3path_without_tags.write_text("Hello")
s3path_without_tags.get_tags()[1]

{}

In [25]:
s3path.metadata

{'age': '18', 'name': 'alice'}

There's [no way to only update the metadata without updating the content](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html). You have to put the object again with the new metadata.

In [26]:
# the ``write_text`` method returns a new ``S3Path`` object representing the new object (with new metadata)
s3path_new = s3path.write_text("Hello", metadata={"name": "alice", "age": "24"})

In [27]:
# You will see the old metadata because you are accessing the metadata cache of the old ``S3Path``;
# that cache was not refreshed by the ``write_text`` call above
s3path.metadata

{'age': '18', 'name': 'alice'}

In [28]:
# You will see new metadata
s3path_new.metadata

{'name': 'alice', 'age': '24'}

In [29]:
# You can also create a new ``S3Path`` object (without cache) and access the metadata
S3Path("s3://s3pathlib/file.txt").metadata

{'age': '24', 'name': 'alice'}

## Delete, Copy, Move (Cut)



### Delete

Since version 2.X.Y, ``delete`` is the recommended API for deleting:

- an object
- a directory
- a specific version of an object
- all versions of an object
- all versions of all objects in a directory

By default, if you try to delete everything in an S3 bucket, ``delete`` prompts you to confirm the deletion. You can skip the confirmation by setting ``skip_prompt=True``.

In [30]:
s3dir = S3Path("s3://s3pathlib/tmp/")
s3dir.joinpath("README.txt").write_text("readme")
s3dir.joinpath("file.txt").write_text("Hello")
s3dir.joinpath("folder/file.txt").write_text("Hello")
s3dir.count_objects()

3

In [31]:
# Delete a file
s3path_readme = s3dir.joinpath("README.txt")
s3path_readme.delete()
s3path_readme.exists()

False

In [32]:
s3dir.count_objects()

2

In [33]:
# Delete the entire folder
s3dir.delete()
s3dir.count_objects()

0

In [34]:
# Delete a specific version of an object (permanently delete)
s3path = S3Path("s3://s3pathlib-versioning-enabled/file.txt")
# start from a clean state: permanently remove any existing versions
s3path.delete(is_hard_delete=True)
# create three versions and remember their version ids
v1 = s3path.write_text("v1").version_id
v2 = s3path.write_text("v2").version_id
v3 = s3path.write_text("v3").version_id
s3path.list_object_versions().all()

[S3Path('s3://s3pathlib-versioning-enabled/file.txt'),
 S3Path('s3://s3pathlib-versioning-enabled/file.txt'),
 S3Path('s3://s3pathlib-versioning-enabled/file.txt')]

In [35]:
s3path.delete(version_id=v1)
try:
    s3path.read_text(version_id=v1)
except Exception as e:
    print(e)

An error occurred (NoSuchVersion) when calling the GetObject operation: The specified version does not exist.


In [36]:
s3path.list_object_versions().all()

[S3Path('s3://s3pathlib-versioning-enabled/file.txt'),
 S3Path('s3://s3pathlib-versioning-enabled/file.txt')]

In [37]:
# Delete all versions of an object (permanently delete)
s3path.delete(is_hard_delete=True)
s3path.list_object_versions().all()

[]

In [38]:
# Delete all versions of all objects in a directory (permanently delete)
s3dir = S3Path("s3://s3pathlib-versioning-enabled/tmp/")
s3path1 = s3dir.joinpath("file1.txt")
s3path2 = s3dir.joinpath("file2.txt")
# start from a clean state, then create two versions of each file
s3dir.delete(is_hard_delete=True)
s3path1.write_text("v1")
s3path1.write_text("v2")
s3path2.write_text("v1")
s3path2.write_text("v2")
s3dir.list_object_versions().all()

[S3Path('s3://s3pathlib-versioning-enabled/tmp/file1.txt'),
 S3Path('s3://s3pathlib-versioning-enabled/tmp/file1.txt'),
 S3Path('s3://s3pathlib-versioning-enabled/tmp/file2.txt'),
 S3Path('s3://s3pathlib-versioning-enabled/tmp/file2.txt')]

In [39]:
# delete the directory created above, not the single file from the earlier cells
s3dir.delete(is_hard_delete=True)
s3dir.list_object_versions().all()

[]

### Copy

In [40]:
s3path_source = S3Path("s3://s3pathlib/source/data.json")
s3path_source.write_text("this is data")
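# NOTE: ``s3path`` still refers to s3://s3pathlib-versioning-enabled/file.txt from the
# Delete section above, so the target is that path with its directory changed to "target"
s3path_target = s3path.change(new_dirname="target")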
print(f"Copy {s3path_source.uri} to {s3path_target.uri} ...")
s3path_source.copy_to(s3path_target, overwrite=True)
print(f"content of {s3path_target.uri} is: {s3path_target.read_text()!r}")
print(f"{s3path_source} still exists: {s3path_source.exists()}")

Copy s3://s3pathlib/source/data.json to s3://s3pathlib-versioning-enabled/target/file.txt ...
content of s3://s3pathlib-versioning-enabled/target/file.txt is: 'this is data'
S3Path('s3://s3pathlib/source/data.json') still exists: True


### Move

``move_to`` copies the object to the target and then deletes the original. It's a shortcut for ``copy_to`` followed by ``delete``.

In [41]:
s3path_source = S3Path("s3://s3pathlib/source/config.yml")
s3path_source.write_text("this is config")
# NOTE: ``s3path`` still refers to s3://s3pathlib-versioning-enabled/file.txt from the
# Delete section above, so the target is that path with its directory changed to "target"
s3path_target = s3path.change(new_dirname="target")
print(f"Move {s3path_source.uri} to {s3path_target.uri} ...")
s3path_source.move_to(s3path_target, overwrite=True)
print(f"content of {s3path_target.uri} is: {s3path_target.read_text()!r}")
print(f"{s3path_source} still exists: {s3path_source.exists()}")

Move s3://s3pathlib/source/config.yml to s3://s3pathlib-versioning-enabled/target/file.txt ...
content of s3://s3pathlib-versioning-enabled/target/file.txt is: 'this is config'
S3Path('s3://s3pathlib/source/config.yml') still exists: False
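
The copy-then-delete behavior described for move can be sketched as follows. This is not how s3pathlib implements ``move_to``; ``InMemoryObject`` and ``move_by_copy_then_delete`` are hypothetical names, the bucket name is a placeholder, and the in-memory class only mimics the ``copy_to``, ``delete``, and ``exists`` calls used above so the example runs without AWS access.

```python
class InMemoryObject:
    """Tiny stand-in for an S3 object (hypothetical, no AWS calls)."""

    store = {}  # shared fake "bucket": uri -> content

    def __init__(self, uri: str):
        self.uri = uri

    def write_text(self, text: str):
        self.store[self.uri] = text

    def read_text(self) -> str:
        return self.store[self.uri]

    def exists(self) -> bool:
        return self.uri in self.store

    def copy_to(self, target, overwrite: bool = False):
        if target.exists() and not overwrite:
            raise FileExistsError(target.uri)
        self.store[target.uri] = self.store[self.uri]

    def delete(self):
        self.store.pop(self.uri, None)


def move_by_copy_then_delete(source, target, overwrite: bool = False):
    """Copy first; delete the source only after the copy succeeded."""
    source.copy_to(target, overwrite=overwrite)
    source.delete()


source = InMemoryObject("s3://example-bucket/source/config.yml")
target = InMemoryObject("s3://example-bucket/target/config.yml")
source.write_text("this is config")
move_by_copy_then_delete(source, target, overwrite=True)
print(target.read_text(), source.exists())
```

Deleting only after a successful copy means a failed copy leaves the source untouched.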


## Upload File or Folder

In [45]:
# at the beginning, the file does not exist
s3path = S3Path("s3pathlib", "daily-report.txt")
s3path.exists()

False

In [46]:
# upload a file; afterwards the file should exist
from pathlib_mate import Path

# create a local test file
path = Path("daily-report.txt")
path.write_text("this is a daily report")
s3path.upload_file(path) # an absolute path given as a string also works

s3path.exists()

True

In [47]:
s3path.read_text()

'this is a daily report'

In [48]:
# By default, ``upload_file`` doesn't overwrite an existing object; set ``overwrite=True`` to skip that check.
try:
    s3path.upload_file(path, overwrite=False)
except Exception as e:
    print(e)

cannot write to s3://s3pathlib/daily-report.txt, s3 object ALREADY EXISTS! open console for more details https://console.aws.amazon.com/s3/object/s3pathlib?prefix=daily-report.txt.


**Upload Folder**

You can upload an entire folder to S3 with ``upload_dir``. The folder structure is preserved.

In [50]:
# at the beginning, the folder does not exist
s3dir = S3Path("s3pathlib", "uploaded-documents/")
s3dir.exists()

False

In [51]:
# create some test files
dir_documents = Path("documents")
dir_documents.joinpath("folder").mkdir(exist_ok=True, parents=True)
dir_documents.joinpath("README.txt").write_text("read me first")
dir_documents.joinpath("folder", "file.txt").write_text("this is a file")

s3dir.upload_dir(dir_documents, overwrite=True)

# inspect the folder structure on S3
for s3path in s3dir.iter_objects():
    print(s3path)

S3Path('s3://s3pathlib/uploaded-documents/README.txt')
S3Path('s3://s3pathlib/uploaded-documents/folder/file.txt')
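
"The folder structure is preserved" means that each file's path relative to the local folder becomes the tail of its S3 key. The sketch below computes that mapping with the standard library only. It does not call ``upload_dir`` or AWS, and ``plan_upload_keys`` is a hypothetical helper used for illustration.

```python
import tempfile
from pathlib import Path


def plan_upload_keys(local_dir: Path, s3_prefix: str) -> list:
    """Map every file under ``local_dir`` to the S3 key it would get."""
    keys = []
    for path in local_dir.rglob("*"):
        if path.is_file():
            # The relative path (with forward slashes) is appended to the prefix.
            relative = path.relative_to(local_dir).as_posix()
            keys.append(s3_prefix + relative)
    return sorted(keys)


# Recreate the local test folder from the cell above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dir_documents = Path(tmp) / "documents"
    dir_documents.joinpath("folder").mkdir(parents=True)
    dir_documents.joinpath("README.txt").write_text("read me first")
    dir_documents.joinpath("folder", "file.txt").write_text("this is a file")
    keys = plan_upload_keys(dir_documents, "uploaded-documents/")

print(keys)
```

The resulting keys match the objects listed by ``iter_objects()`` above.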


## What's Next

With a thorough understanding of all the features provided by s3pathlib, it's time to see how you can use this package to develop applications for production.