# Pure S3 Path Manipulation

## What is Pure S3 Path

A Pure S3 Path is a Python object that represents an AWS S3 bucket, object, or folder. A Pure S3 Path object never calls the AWS API, and it does not imply that the corresponding S3 object exists. It is a lightweight abstraction that lets you work with S3 paths in a Pythonic, object-oriented manner without any network overhead.

In [1]:
from s3pathlib import S3Path

s3path = S3Path("s3://bucket/folder/file.txt")
print(s3path)

S3Path('s3://bucket/folder/file.txt')


## Construct an S3 Path object in Python

### From bucket and key parts

In a file system, you typically use a file path like ``C:\Users\username\file.txt`` on Windows or ``/Users/username/file.txt`` on a POSIX system. Constructing an S3 Path from strings is just as intuitive.

In [2]:
# construct from bucket, key parts
s3path = S3Path("bucket", "folder", "file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')

In [3]:
# construct from full path also works
s3path = S3Path("bucket/folder/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')

S3 uses ``/`` as a [delimiter to organize and browse your keys hierarchically](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html). With ``s3pathlib``, the delimiter is handled intelligently.

In [4]:
s3path = S3Path("bucket", "/folder/", "/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')
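
To illustrate what tolerant delimiter handling involves, here is a minimal standard-library sketch. It is not how ``s3pathlib`` is implemented; the helper name ``join_s3_parts`` is hypothetical, and the sketch assumes that empty segments produced by stray slashes can simply be dropped.

```python
def join_s3_parts(bucket: str, *parts: str) -> str:
    """Join a bucket and key parts into an S3 URI, tolerating stray slashes."""
    # Split every part on "/" and drop the empty pieces that come from
    # leading, trailing, or doubled slashes.
    segments = [seg for part in parts for seg in part.split("/") if seg]
    key = "/".join(segments)
    # A trailing slash on the last part marks a directory, so keep it.
    if key and parts[-1].endswith("/"):
        key += "/"
    return f"s3://{bucket}/{key}"


print(join_s3_parts("bucket", "/folder/", "/file.txt"))  # s3://bucket/folder/file.txt
print(join_s3_parts("bucket", "folder/"))                # s3://bucket/folder/
```

Keeping the trailing slash on the last part matters because it is what distinguishes a directory from an object.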

### From S3 URI

An [S3 URI](https://repost.aws/questions/QUFXlwQxxJQQyg9PMn2b6nTg/questions/QUFXlwQxxJQQyg9PMn2b6nTg/what-is-s3-uri-in-simple-storage-service?) is the unique resource identifier within the context of the S3 protocol. It follows this naming convention: ``s3://bucket-name/key-name``. You can create an S3 Path from an S3 URI.

In [5]:
s3path = S3Path("s3://bucket/folder/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')

In [6]:
s3path = S3Path.from_s3_uri("s3://bucket/folder/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')
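
For readers curious about what the ``s3://bucket-name/key-name`` convention means in terms of plain strings, the following standard-library sketch splits a URI into its bucket and key. The function name ``parse_s3_uri`` is hypothetical and is not part of ``s3pathlib``.

```python
from typing import Tuple


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key) without any network call."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


print(parse_s3_uri("s3://bucket/folder/file.txt"))  # ('bucket', 'folder/file.txt')
```

Plain string partitioning is used here rather than a URL parser because characters such as ``?`` and ``#`` are legal in S3 keys and should stay part of the key.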

### From S3 ARN

An [S3 ARN](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-arn-format.html) is the Amazon Resource Name of an S3 resource. It follows this naming convention: ``arn:aws:s3:::bucket_name/key_name``. You can create an S3 Path from an S3 ARN.

In [7]:
s3path = S3Path("arn:aws:s3:::bucket/folder/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')

In [8]:
s3path = S3Path.from_s3_arn("arn:aws:s3:::bucket/folder/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')
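
The ARN and URI forms carry the same bucket and key, so converting between them is a matter of swapping the prefix. The sketch below shows this with plain strings; ``s3_arn_to_uri`` is a hypothetical helper, and it assumes the standard ``aws`` partition only (ARNs in other partitions use a different prefix and are rejected here).

```python
def s3_arn_to_uri(arn: str) -> str:
    """Convert 'arn:aws:s3:::bucket/key' into 's3://bucket/key'."""
    prefix = "arn:aws:s3:::"
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN in the 'aws' partition: {arn!r}")
    # The part after the prefix is already in 'bucket/key' form.
    return "s3://" + arn[len(prefix):]


print(s3_arn_to_uri("arn:aws:s3:::bucket/folder/file.txt"))  # s3://bucket/folder/file.txt
```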

## S3 Path Types

S3 Path is a logical concept that can represent different types of AWS S3 concepts. Here is the list of S3 Path types:

1. 📜 **Classic S3 object**: represents an S3 object, such as ``s3://bucket/folder/file.txt``.
2. 📁 **Logical S3 directory**: represents an S3 directory, such as ``s3://bucket/folder/``.
3. 🪣 **S3 bucket**: represents an S3 bucket, such as ``s3://bucket/``.
4. **Void Path**: denotes the absence of any bucket or key, essentially representing a blank slate, no bucket, no key, no nothing.
5. **Relative Path**: represents a path relative to another S3 Path. For example, the relative path from ``s3://bucket/folder/file.txt`` to ``s3://bucket/`` is simply ``folder/file.txt``. A relative path can be joined with another S3 Path to create a new S3 Path. Importantly, any concrete path joined with a void path will result in the original concrete path.
6. **Concrete Path**: represents an S3 Path that refers to a concrete object in the S3 storage system. This includes classic S3 object paths, logical S3 directory paths, and S3 bucket paths. Any concrete path joined with a relative path will result in another concrete path.
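
The rules in this list can be summarized as a small decision function. This is only a sketch of the rules as described above, not the library's code; the function ``classify`` and its string labels are hypothetical, and a missing bucket is modeled as ``None``.

```python
from typing import Optional


def classify(bucket: Optional[str], key: str) -> str:
    """Return the path type implied by a bucket and a key."""
    if bucket is None:
        # No bucket: either nothing at all (void) or a relative path.
        return "void" if key == "" else "relative"
    if key == "":
        # A bucket without a key is the bucket itself (also a directory).
        return "bucket"
    # With a key, a trailing "/" marks a logical directory.
    return "directory" if key.endswith("/") else "object"


print(classify("bucket", "folder/file.txt"))  # object
print(classify("bucket", "folder/"))          # directory
print(classify("bucket", ""))                 # bucket
print(classify(None, "folder/file.txt"))      # relative
print(classify(None, ""))                     # void
```

The first three results are the concrete path types; the last two are the non-concrete ones.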

### Classic S3 object

Similar to a file on your local laptop, an [S3 object](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingObjects.html) stores your data. An ``S3Path`` that points to an object is only a pointer: the object itself doesn't have to exist in S3.

In [9]:
s3path = S3Path("s3://bucket/folder/file.txt")
s3path

S3Path('s3://bucket/folder/file.txt')

In [10]:
s3path.is_file()

True

In [11]:
s3path.is_dir()

False

In [12]:
s3path.is_bucket()

False

In [13]:
s3path.is_void()

False

In [14]:
s3path.is_relpath()

False

### Logical S3 Directory

Since [AWS S3 is an object storage system](https://aws.amazon.com/s3/), not a file system, directories are only a logical concept in AWS S3. AWS uses ``/`` as the path delimiter in S3 keys. There are two types of directories in AWS S3:

- Hard directory: When you create a folder in the S3 console, it creates a special object without any content (an empty string) with the ``/`` character at the end of the key. You can see the folder as an object in the [list_objects](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects_v2.html) API response.
- Soft directory: This type of directory does not actually exist; it is [a virtual concept used to help organize your objects in a folder](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html). For example, if you have an S3 object like ``s3://bucket/folder/file.txt``, then the ``s3://bucket/folder/`` path is a soft folder. Although you can see it in the S3 console, it does not actually exist.
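
The difference between the two kinds of directory becomes clearer when you look at a flat list of keys, such as the one a listing call would return. The sketch below works on a hard-coded list and makes no API call; ``split_hard_and_soft_dirs`` is a hypothetical helper written for this illustration.

```python
from typing import Iterable, Set, Tuple


def split_hard_and_soft_dirs(keys: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (hard_dirs, soft_dirs) implied by a flat list of S3 keys."""
    keys = list(keys)
    # Hard directories are placeholder objects whose key ends with "/".
    hard = {key for key in keys if key.endswith("/")}
    # Every ancestor prefix of a key is a folder the console would display.
    implied = set()
    for key in keys:
        ancestors = key.rstrip("/").split("/")[:-1]
        for depth in range(1, len(ancestors) + 1):
            implied.add("/".join(ancestors[:depth]) + "/")
    # Soft directories are implied by other keys but have no placeholder object.
    return hard, implied - hard


hard, soft = split_hard_and_soft_dirs(
    ["folder/", "folder/file.txt", "data/2024/a.csv"]
)
print(sorted(hard))  # ['folder/']
print(sorted(soft))  # ['data/', 'data/2024/']
```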

You can create an S3 directory from a string, a URI, or an ARN.

In [15]:
s3dir = S3Path("bucket", "folder/")
s3dir

S3Path('s3://bucket/folder/')

In [16]:
s3dir = S3Path("s3://bucket/folder/")
s3dir

S3Path('s3://bucket/folder/')

In [17]:
s3dir = S3Path("arn:aws:s3:::bucket/folder/")
s3dir

S3Path('s3://bucket/folder/')

In [18]:
s3dir = S3Path("bucket", "folder").to_dir()
s3dir

S3Path('s3://bucket/folder/')

You can also call the ``is_xyz()`` test methods on an S3 directory.

In [19]:
s3dir.is_dir()

True

In [20]:
s3dir.is_file()

False

In [21]:
s3dir.is_bucket()

False

In [22]:
s3dir.is_void()

False

In [23]:
s3dir.is_relpath()

False

### S3 Bucket

An S3 bucket is a special type of directory that can be thought of as a "root" directory without a key. In other words, it represents the top-level directory of the bucket, and it is both a bucket and a directory in its own right.

In [24]:
s3bkt = S3Path("bucket")
s3bkt

S3Path('s3://bucket/')

In [25]:
s3bkt.is_bucket()

True

In [26]:
s3bkt.is_dir()

True

In [27]:
s3bkt.is_file()

False

In [28]:
s3bkt = S3Path("bucket/folder/file.txt").root
s3bkt

S3Path('s3://bucket/')

### Void Path

A Void path should not be used in your application. It can, however, serve as an indicator that something is wrong if you accidentally attempt to use a Void path to perform an S3 API operation.

In [29]:
s3path = S3Path()
s3path

S3VoidPath()

In [30]:
s3path.is_void()

True

In [31]:
s3path.is_file()

False

In [32]:
s3path.is_dir()

False

In [33]:
s3path.is_bucket()

False

In [34]:
s3path.is_relpath()

True

### Relative Path

Relative paths are very useful for S3 Path calculations. For example, if you want to move all objects in folder ``A`` to another folder ``B``, you can use the relative path from each object ``C`` to ``A`` to calculate the target location in ``B``. Specifically, the target location for each object can be found by joining the relative path from ``C`` to ``A`` with the folder path ``B``. In other words, the formula for the target path is: ``Target = B + (C - A)``.

In [35]:
# The correct way
s3relpath = S3Path("s3://bucket/folder/file.txt").relative_to(S3Path("s3://bucket/folder"))
s3relpath

S3RelPath('file.txt')

In [36]:
# The manual way (NOT RECOMMENDED)
s3relpath = S3Path.make_relpath("file.txt")
s3relpath

S3RelPath('file.txt')

In [37]:
s3path = S3Path("s3://another-bucket/another-folder").to_dir().joinpath(s3relpath)
s3path

S3Path('s3://another-bucket/another-folder/file.txt')
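
The formula ``Target = B + (C - A)`` can also be worked through with plain key strings. The following is a minimal sketch under the assumption that both folder prefixes end with ``/``; ``retarget_key`` is a hypothetical helper and not part of ``s3pathlib``.

```python
def retarget_key(key_c: str, prefix_a: str, prefix_b: str) -> str:
    """Map an object key under folder A to the matching key under folder B."""
    if not key_c.startswith(prefix_a):
        raise ValueError(f"{key_c!r} is not under {prefix_a!r}")
    relative = key_c[len(prefix_a):]  # C - A
    return prefix_b + relative        # B + (C - A)


source_keys = ["A/1.txt", "A/sub/2.txt"]
targets = [retarget_key(key, "A/", "B/") for key in source_keys]
print(targets)  # ['B/1.txt', 'B/sub/2.txt']
```

Note that the nested structure under ``A/`` is preserved under ``B/``, which is the point of going through the relative path.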

### S3 Path Variable Naming Convention

I recommend the following variable naming convention for the different types of S3 Path, so that when you read the code you can easily tell what to expect.

- ``s3path_xyz``: Classic S3 object
- ``s3dir_xyz``: Logical S3 directory
- ``s3bkt_xyz``: S3 bucket
- ``s3void_xyz``: Void Path
- ``s3relpath_xyz``: Relative Path

## S3 Path Attributes

In [38]:
# create an instance
s3path = S3Path("bucket", "folder", "file.txt")

In [39]:
s3path.bucket

'bucket'

In [40]:
s3path.key

'folder/file.txt'

In [41]:
s3path.parts

['folder', 'file.txt']

In [42]:
try:
    s3path.bucket = "new-bucket"
except Exception as e:
    print(e)

can't set attribute S3Path.bucket


In [43]:
s3path.uri

's3://bucket/folder/file.txt'

In [44]:
s3path.arn

'arn:aws:s3:::bucket/folder/file.txt'

In [45]:
s3path.console_url

'https://console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'

In [46]:
s3path.us_gov_cloud_console_url

'https://console.amazonaws-us-gov.com/s3/object/bucket?prefix=folder/file.txt'

In [47]:
s3path.basename

'file.txt'

In [48]:
s3path.fname

'file'

In [49]:
s3path.ext

'.txt'

In [50]:
s3path.dirname

'folder'

In [51]:
s3path.abspath

'/folder/file.txt'

In [52]:
s3path.parent

S3Path('s3://bucket/folder/')

In [53]:
s3path.dirpath

'/folder/'

## S3 Path Methods

### Comparison

Because every ``S3Path`` object corresponds to an S3 URI (except for relative paths), it's often useful to compare these URIs. Therefore, the comparison operator is implemented for ``S3Path``, allowing you to compare one ``S3Path`` to another.

In [54]:
S3Path("bucket/file.txt") == S3Path("bucket/file.txt")

True

In [55]:
S3Path("bucket") == S3Path("bucket")

True

In [56]:
S3Path("bucket1") == S3Path("bucket2")

False

In [57]:
S3Path("bucket1") < S3Path("bucket2")

True

In [58]:
S3Path("bucket1") <= S3Path("bucket2")

True

In [59]:
# right one is a prefix of the left one
S3Path("bucket/a/1.txt") > S3Path("bucket/a/")

True

In [60]:
S3Path("bucket/a/1.txt") < S3Path("bucket/a/2.txt")

True
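
The results above are consistent with ordinary string comparison of the URIs. Assuming that is the ordering in effect (an inference from the examples, not a statement about the library's internals), sorting plain URI strings shows the practical consequence: a prefix always sorts before the paths underneath it.

```python
uris = [
    "s3://bucket/a/2.txt",
    "s3://bucket/a/",
    "s3://bucket/a/1.txt",
]

# Lexicographic order places the folder first, then its contents.
ordered = sorted(uris)
print(ordered)
# ['s3://bucket/a/', 's3://bucket/a/1.txt', 's3://bucket/a/2.txt']
```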

### Hash

``S3Path`` is hashable, so you can use the [set](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset) data structure to deduplicate paths.

In [61]:
p1 = S3Path("bucket", "1.txt")
p2 = S3Path("bucket", "2.txt")
p3 = S3Path("bucket", "3.txt")
set1 = {p1, p2}
set2 = {p2, p3}

In [62]:
# union
set1.union(set2)

{S3Path('s3://bucket/1.txt'),
 S3Path('s3://bucket/2.txt'),
 S3Path('s3://bucket/3.txt')}

In [63]:
# intersection
set1.intersection(set2)

{S3Path('s3://bucket/2.txt')}

In [64]:
# difference
set1.difference(set2)

{S3Path('s3://bucket/1.txt')}
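
Hashing works well for path objects because they are immutable values: two paths with the same bucket and key are equal and hash identically. The sketch below reproduces that behavior with a frozen dataclass from the standard library. ``PathValue`` is a hypothetical stand-in, not the ``S3Path`` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathValue:
    """An immutable (bucket, key) pair; frozen dataclasses are hashable."""
    bucket: str
    key: str


paths = [
    PathValue("bucket", "1.txt"),
    PathValue("bucket", "2.txt"),
    PathValue("bucket", "1.txt"),  # duplicate of the first entry
]
unique = set(paths)
print(len(unique))  # 2
```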

### Mutate the immutable S3Path

**Copy**

In [65]:
s3path1 = S3Path("bucket", "folder", "file.txt")
s3path2 = s3path1.copy()
s3path2

S3Path('s3://bucket/folder/file.txt')

In [66]:
s3path1 == s3path2

True

In [67]:
s3path1 is s3path2

False

**Change**

In [68]:
s3path = S3Path("bkt", "a", "b", "c.jpg")

In [69]:
# only change the bucket
s3path.change(new_bucket="new-bkt")

S3Path('s3://new-bkt/a/b/c.jpg')

In [70]:
# only change the absolute path
s3path.change(new_abspath="x/y/z.png")

S3Path('s3://bkt/x/y/z.png')

In [71]:
# only change the file extension
s3path.change(new_ext=".png")

S3Path('s3://bkt/a/b/c.png')

In [72]:
# only change the file name
s3path.change(new_fname="ddd")

S3Path('s3://bkt/a/b/ddd.jpg')

In [73]:
# only change the base name (file name + file extension)
s3path_new = s3path.change(new_basename="ddd.png")
s3path_new

S3Path('s3://bkt/a/b/ddd.png')

In [74]:
s3path_new.is_file()

True

In [75]:
# only change the base name, but this time it becomes a folder
s3path_new = s3path.change(new_basename="ddd/")
s3path_new

S3Path('s3://bkt/a/b/ddd/')

In [76]:
s3path_new.is_dir()

True

In [77]:
# only change the dir name
s3path.change(new_dirname="ddd/")

S3Path('s3://bkt/a/ddd/c.jpg')

In [78]:
# only change the dir name (the trailing "/" is optional)
s3path.change(new_dirname="ddd")

S3Path('s3://bkt/a/ddd/c.jpg')

In [79]:
s3path.change(new_dirpath="xxx/yyy/")

S3Path('s3://bkt/xxx/yyy/c.jpg')
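
In every ``change`` call above, the original ``s3path`` is left untouched and a new object is returned. The same pattern can be sketched with a frozen dataclass and ``dataclasses.replace``. ``PathValue`` and ``change_ext`` are hypothetical names used only for this illustration.

```python
import posixpath
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PathValue:
    """An immutable (bucket, key) pair."""
    bucket: str
    key: str


def change_ext(path: PathValue, new_ext: str) -> PathValue:
    """Return a copy of ``path`` whose key has a different file extension."""
    stem, _old_ext = posixpath.splitext(path.key)
    # replace() builds a new instance; the original stays as it was.
    return replace(path, key=stem + new_ext)


original = PathValue("bkt", "a/b/c.jpg")
changed = change_ext(original, ".png")
print(original)  # PathValue(bucket='bkt', key='a/b/c.jpg')
print(changed)   # PathValue(bucket='bkt', key='a/b/c.png')
```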

**Join**

``S3Path.joinpath`` appends one or more relative paths or plain strings to a concrete path and returns a new ``S3Path``.

In [80]:
s3path1 = S3Path("bucket", "folder", "subfolder", "file.txt")
s3path1

S3Path('s3://bucket/folder/subfolder/file.txt')

In [81]:
s3path2 = s3path1.parent
s3path2

S3Path('s3://bucket/folder/subfolder/')

In [82]:
relpath1 = s3path1.relative_to(s3path2)
relpath1

S3RelPath('file.txt')

In [83]:
# join concrete path with a relative path
s3path2.joinpath(relpath1)

S3Path('s3://bucket/folder/subfolder/file.txt')

In [84]:
s3path3 = s3path2.parent
s3path3

S3Path('s3://bucket/folder/')

In [85]:
relpath2 = s3path2.relative_to(s3path3)
relpath2

S3RelPath('subfolder/')

In [86]:
s3path3.joinpath(relpath2, relpath1)

S3Path('s3://bucket/folder/subfolder/file.txt')

In [87]:
s3path3.joinpath("subfolder", "file.txt")

S3Path('s3://bucket/folder/subfolder/file.txt')

In [88]:
# it's OK if you mess up with the "/"
s3path3.joinpath("/subfolder/", "/file.txt")

S3Path('s3://bucket/folder/subfolder/file.txt')

The ``/`` operator provides syntactic sugar for the ``joinpath`` method.

In [89]:
s3path = S3Path("bucket")
s3path / "file.txt"

S3Path('s3://bucket/file.txt')

In [90]:
s3path / "folder" / "file.txt"

S3Path('s3://bucket/folder/file.txt')

### Calculate Relative Path

In [91]:
S3Path("bucket", "a/b/c").relative_to(S3Path("bucket", "a")).parts

['b', 'c']

In [92]:
S3Path("bucket", "a").relative_to(S3Path("bucket", "a")).parts

[]

In [93]:
# this won't work
try:
    S3Path("bucket", "a").relative_to(S3Path("bucket", "a/b/c")).parts
except Exception as e:
    print(e)

s3://bucket/a does not start with s3://bucket/a/b/c


The ``-`` operator provides syntactic sugar for the ``relative_to`` method.

In [94]:
(S3Path("bucket", "a/b/c") - S3Path("bucket", "a")).parts

['b', 'c']
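
Operators such as ``/`` and ``-`` are ordinary Python methods (``__truediv__`` and ``__sub__``) under the hood. The toy class below shows how such sugar can be wired up for key strings. It is a sketch only: ``KeyPath`` is hypothetical, ignores buckets entirely, and returns a plain string for the relative part.

```python
class KeyPath:
    """A toy key-only path supporting ``/`` (join) and ``-`` (relative to)."""

    def __init__(self, key: str) -> None:
        self.key = key

    def __truediv__(self, other: str) -> "KeyPath":
        # Join with exactly one "/" regardless of stray slashes.
        pieces = [p for p in (self.key.strip("/"), other.strip("/")) if p]
        return KeyPath("/".join(pieces))

    def __sub__(self, other: "KeyPath") -> str:
        mine, base = self.key.strip("/"), other.key.strip("/")
        if mine == base:
            return ""  # a path relative to itself is empty
        if not mine.startswith(base + "/"):
            raise ValueError(f"{self.key} does not start with {other.key}")
        return mine[len(base) + 1:]


folder = KeyPath("a")
child = folder / "b" / "c"
print(child.key)       # a/b/c
print(child - folder)  # b/c
```

Subtracting in the wrong direction (``folder - child``) raises ``ValueError``, mirroring the "does not start with" error shown earlier.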

## What's Next

Now that we have established the basics of working with ``s3pathlib``, let's explore how to use it to interact with the AWS S3 API.