Copying buckets

Moving content from one S3 bucket to another

AWS S3 Sync

With aws s3 sync, Amazon S3 copies the objects directly between buckets, without downloading them to your local machine. This is useful when:

You want to copy data from one bucket to another
You want to avoid an intermediate download/upload step
You want S3 data to be available in another region
You want to import public datasets into a private bucket that you control
You want to back up an S3 bucket by copying its contents to another bucket

aws s3 sync s3://origin/path s3://destination/path

For more information, see the AWS S3 Sync documentation.
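Before copying a large bucket, it can help to preview what the sync would do. A minimal sketch, assuming the AWS CLI is installed and configured; preview_sync is a hypothetical helper name, and it relies on the CLI's --dryrun flag, which lists the planned operations without performing them:

```shell
# Hypothetical helper: show what `aws s3 sync` would copy, without copying.
# Usage: preview_sync SRC DEST [extra `aws s3 sync` options...]
preview_sync() {
    # --dryrun prints the planned operations and transfers nothing.
    aws s3 sync --dryrun "$@"
}

# Example (placeholder bucket names from the text above):
# preview_sync s3://origin/path s3://destination/path
```

Extra options are passed through unchanged, so the same helper can preview a filtered sync.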

Parallel

If you're copying large amounts of data, you can start multiple aws s3 sync processes from different shells. See below.

# sync files starting with "2014"
aws s3 sync \
    --exclude "*" \
    --include "2014*" \
    s3://origin/path \
    s3://destination/path

# sync files starting with "2015"
aws s3 sync \
    --exclude "*" \
    --include "2015*" \
    s3://origin/path \
    s3://destination/path
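Instead of opening several shells, the same pattern can be run from one script using background jobs. A minimal sketch, assuming the AWS CLI is installed and configured; sync_prefixes is a hypothetical helper name, and the prefixes are whatever partitions your keys naturally fall into:

```shell
# Hypothetical helper: run one `aws s3 sync` per key prefix, in parallel.
# Usage: sync_prefixes SRC DEST PREFIX...
sync_prefixes() {
    src="$1"
    dest="$2"
    shift 2
    pids=""
    status=0
    for prefix in "$@"; do
        # Each prefix gets its own background sync process.
        aws s3 sync \
            --exclude "*" \
            --include "${prefix}*" \
            "$src" "$dest" &
        pids="$pids $!"
    done
    # Wait for every background sync and remember whether any failed.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (placeholder bucket names from the text above):
# sync_prefixes s3://origin/path s3://destination/path 2014 2015
```

The function returns a non-zero status if any of the syncs failed, so a calling script can decide whether to retry.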

The --exclude and --include filters are applied on the client side, so every process still lists all objects under the source path. This can cause problems if you're trying to move a huge number of small files.

You can speed up the transfer by raising the max_concurrent_requests setting (from the default of 10 to 50 or so) in your AWS CLI configuration. Note that it is a configuration value, not a command-line flag.

See: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
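One way to change this value is with aws configure set, which writes it to your AWS CLI config file. A minimal sketch, assuming the default profile; set_s3_concurrency is a hypothetical helper name, and 50 is only the ballpark figure mentioned above, not a tuned recommendation:

```shell
# Hypothetical helper: set the S3 request concurrency for the default profile.
# Usage: set_s3_concurrency <integer>
set_s3_concurrency() {
    # Reject empty or non-numeric input before touching the configuration.
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_s3_concurrency <integer>" >&2
            return 2
            ;;
    esac
    aws configure set default.s3.max_concurrent_requests "$1"
}

# Example:
# set_s3_concurrency 50
```

Higher values use more local CPU and network resources, so it is worth increasing the number gradually.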

More performant (Managed)

Suggested for transfers larger than ~1 TB:

s3-dist-cp (EMR)

s3-batch (Batch Job)


Last updated 5 years ago
