The global data ecosystem has grown rapidly over the last decade, and choosing the right data technology has become challenging. With more than 32% of the world's public cloud market share, Amazon Web Services (AWS) is the leader in this space. It serves customers in almost 190 countries with scalability, durability, and security. Since its launch, Amazon S3 has become an integral part of data storage and data management at thousands of companies.
Amazon Simple Storage Service (S3) is a cloud storage solution provided by AWS. With its key-based object storage architecture, Amazon S3 is well suited to storing massive amounts of structured and unstructured data. Unlike the operating systems we are all familiar with, Amazon S3 does not keep files in a file system; it stores them as objects, each addressed by a key. Users upload files to object storage much as they would to popular cloud storage products such as Dropbox and Google Drive.
What is Amazon S3 Transfer Manager?
Transfer Manager is one of the most useful APIs in the AWS SDK (AWS Software Development Kit). It provides easy and convenient management of uploads and downloads between your application and Amazon S3, hiding the complex process of transferring files behind a simple API. Transfer Manager performs two operations: upload and download. Both operations return an object that you can use to interact with the transfer in progress.
Whenever possible, Transfer Manager uses multiple threads to upload several parts of a single upload at once. When dealing with large data sets, this can significantly increase throughput. Transfer Manager is built on top of the Java bindings of the AWS Common Runtime S3 client.
Parallel Upload via Multipart Upload
Multipart upload lets you upload a single object as a set of smaller parts. You can upload the parts independently and in any order; after all parts are uploaded, Amazon S3 presents the data as a single object. For instance, once your object size reaches 100 MB, you should consider multipart upload instead of a single operation, because it allows the parts to be uploaded in parallel.
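To make the idea of parts concrete, here is a minimal sketch of how an object's byte range could be divided into numbered parts. It does not call Amazon S3 or the AWS SDK; the class name MultipartPlan, the Part type, and the 8 MiB part size are assumptions made for illustration only. As far as the S3 documentation goes, parts other than the last must be at least 5 MiB and an upload may have at most 10,000 parts; the sketch does not enforce those limits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
final class MultipartPlan {

    /** One part: 1-based part number, byte offset into the object, and length in bytes. */
    static final class Part {
        final int number;
        final long offset;
        final long length;

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "part " + number + " [offset=" + offset + ", length=" + length + "]";
        }
    }

    /** Splits objectSize bytes into consecutive parts of at most partSize bytes each. */
    static List<Part> plan(long objectSize, long partSize) {
        if (objectSize <= 0 || partSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Part> parts = new ArrayList<>();
        int number = 1;
        for (long offset = 0; offset < objectSize; offset += partSize) {
            // The last part may be shorter than partSize.
            long length = Math.min(partSize, objectSize - offset);
            parts.add(new Part(number++, offset, length));
        }
        return parts;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // A 100 MiB object with an assumed 8 MiB part size yields 13 parts.
        for (Part part : plan(100 * mib, 8 * mib)) {
            System.out.println(part);
        }
    }
}
```

Because every part carries its own offset and length, each one can be read from the source and sent without waiting for the others.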
Transfer Manager uses the Amazon S3 multipart upload API for upload operations. It converts a single PutObjectRequest into multiple multipart upload requests and sends them simultaneously to achieve high performance.
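The pattern of sending parts simultaneously and then presenting them as one object can be sketched with plain Java concurrency. This is only an in-memory simulation under stated assumptions: the "store" is a map standing in for Amazon S3, and the class and method names (ParallelPartsSketch, uploadInParts) are hypothetical. It is not how Transfer Manager is implemented internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation, not part of the AWS SDK.
final class ParallelPartsSketch {

    /** Copies data into an in-memory store part by part, in parallel, then reassembles it. */
    static byte[] uploadInParts(byte[] data, int partSize, int threads) {
        if (partSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("partSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Stand-in for the remote service: part number -> bytes of that part.
        Map<Integer, byte[]> store = new ConcurrentHashMap<>();
        try {
            List<CompletableFuture<Void>> uploads = new ArrayList<>();
            int partNumber = 1;
            for (int offset = 0; offset < data.length; offset += partSize) {
                final int number = partNumber++;
                final int from = offset;
                final int to = Math.min(offset + partSize, data.length);
                // Each part is sent independently; the completion order is not guaranteed.
                uploads.add(CompletableFuture.runAsync(
                        () -> store.put(number, Arrays.copyOfRange(data, from, to)), pool));
            }
            // Wait until every part has arrived before treating the object as complete.
            CompletableFuture.allOf(uploads.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        // Reassemble by part number so the result is a single object again.
        byte[] result = new byte[data.length];
        int position = 0;
        for (int number = 1; number <= store.size(); number++) {
            byte[] part = store.get(number);
            System.arraycopy(part, 0, result, position, part.length);
            position += part.length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] roundTrip = uploadInParts(data, 128, 4);
        System.out.println(Arrays.equals(data, roundTrip)); // prints "true"
    }
}
```

Keying each part by its number is what lets the parts arrive in any order and still be put back together correctly.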
Parallel Download via Byte-Range Fetches
Transfer Manager uses byte-range fetches for download operations. By setting the Range HTTP header in a GET Object request, you can fetch a byte range from an object and transfer only the desired portion. Transfer Manager splits a GetObjectRequest into multiple smaller requests, each of which retrieves a specific portion of the object, and this yields higher performance than a single whole-object request. Fetching smaller portions of a large object also improves retry times when requests are interrupted, because only the affected range has to be fetched again.
With Transfer Manager 1.x, download speed can only be increased if the object was uploaded using multipart upload; if it was uploaded as a single object, Transfer Manager 1.x cannot speed up the download. This is no longer a limitation in Transfer Manager 2.x, where downloading an object does not depend on how it was originally uploaded.
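As a rough illustration of byte-range fetches, the sketch below computes the Range header values that would cover an object in fixed-size chunks. HTTP byte ranges are inclusive at both ends, so a 10-byte chunk starting at offset 0 is written "bytes=0-9". The class name ByteRanges and the chunk size are assumptions for illustration; the chunk size Transfer Manager actually picks is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
final class ByteRanges {

    /** Returns Range header values covering objectSize bytes in chunks of at most chunkSize bytes. */
    static List<String> rangeHeaders(long objectSize, long chunkSize) {
        if (objectSize <= 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<String> headers = new ArrayList<>();
        for (long start = 0; start < objectSize; start += chunkSize) {
            // Both ends of an HTTP byte range are inclusive, hence the "- 1".
            long end = Math.min(start + chunkSize, objectSize) - 1;
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }

    public static void main(String[] args) {
        // A 25-byte object in 10-byte chunks:
        // prints [bytes=0-9, bytes=10-19, bytes=20-24]
        System.out.println(rangeHeaders(25, 10));
    }
}
```

If one of these requests is interrupted, only its range needs to be requested again, which is where the shorter retry times come from.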
Getting Started
- Add a dependency for the Transfer Manager
First, include the separate dependency in the project.
XML:
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3-transfer-manager</artifactId>
    <version>2.17.123-PREVIEW</version>
</dependency>
- Instantiate the Transfer Manager
You can instantiate the Transfer Manager with its default settings:
Java:
S3TransferManager transferManager = S3TransferManager.create();
- Upload a file to Amazon S3
To upload a file to Amazon S3, provide the source file path along with the PutObjectRequest to use for the upload.
Java:
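// Start the upload of a local file to the given bucket and key
FileUpload upload = transferManager.uploadFile(b -> b.source(Paths.get("myFile.txt"))
                                                     .putObjectRequest(req -> req.bucket("bucket")
                                                                                 .key("key")));
// Block until the upload has finished
upload.completionFuture().join();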
- Download an Amazon S3 Object to a File
To download an object from Amazon S3, provide the destination file path along with the GetObjectRequest to use for the download.
Java:
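// Start the download of the given bucket and key to a local file
FileDownload download =
    transferManager.downloadFile(b -> b.destination(Paths.get("myFile.txt"))
                                       .getObjectRequest(req -> req.bucket("bucket")
                                                                   .key("key")));
// Block until the download has finished
download.completionFuture().join();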
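Both examples above wait with join(), which blocks until the transfer ends and throws if it failed. In an application you may prefer a bounded wait with explicit error handling. The sketch below shows one way to do that with the standard CompletableFuture API. It uses stand-in futures of type CompletableFuture<String> instead of a real transfer, and the class name CompletionHandlingSketch, the method awaitTransfer, and the timeout values are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the futures here are stand-ins, not real S3 transfers.
final class CompletionHandlingSketch {

    /** Waits for a transfer-like future, never longer than timeoutMillis, and reports how it ended. */
    static String awaitTransfer(CompletableFuture<String> transfer, long timeoutMillis) {
        try {
            return "completed: " + transfer.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on the stand-in future instead of waiting forever.
            transfer.cancel(true);
            return "timed out";
        } catch (ExecutionException e) {
            // The original failure is wrapped; the cause carries the real error.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> finished = CompletableFuture.completedFuture("myFile.txt");

        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("access denied"));

        CompletableFuture<String> stuck = new CompletableFuture<>(); // never completes

        System.out.println(awaitTransfer(finished, 100)); // completed: myFile.txt
        System.out.println(awaitTransfer(failed, 100));   // failed: access denied
        System.out.println(awaitTransfer(stuck, 100));    // timed out
    }
}
```

Whether cancelling the future also aborts an in-flight S3 transfer depends on the SDK version, so check the developer guide before relying on it.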
Conclusion:
Customers of all sizes and industries can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data analytics, data lakes, backup, restore, and much more. Transfer Manager 2.x improves on Transfer Manager 1.x in several ways; for example, parallel downloads no longer depend on how an object was uploaded. For complete documentation, see the developer guide and the Transfer Manager source code on GitHub for the AWS SDK for Java 2.x.