Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB Datastore
To work with data in the cloud, you can upload your data to Amazon S3™ and then use datastores to access the data in S3 from MATLAB® or from workers in your cluster.
Set Up Access
To work with remote data in Amazon S3, set up access first:
Create an IAM (Identity and Access Management) user using your AWS® root account. For more information, see Creating an IAM User in Your AWS Account.
Generate an access key to receive an access key ID and a secret access key. For more information, see Managing Access Keys for IAM Users.
Specify the AWS Access Key ID, Secret Access Key, and Region of the bucket (and the Session Token if you are using AWS temporary credentials) as system environment variables in your MATLAB command window using the setenv (MATLAB) command:

setenv("AWS_ACCESS_KEY_ID","YOUR_AWS_ACCESS_KEY_ID")
setenv("AWS_SECRET_ACCESS_KEY","YOUR_AWS_SECRET_ACCESS_KEY")
setenv("AWS_DEFAULT_REGION","YOUR_AWS_DEFAULT_REGION")
To permanently set these environment variables, set them in your user or system environment.
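If you are using AWS temporary credentials, the session token must also be set. A minimal sketch (the values shown are placeholders):

```matlab
% If you use AWS temporary credentials, also set the session token
% (placeholder value shown; replace with your own token).
setenv("AWS_SESSION_TOKEN","YOUR_AWS_SESSION_TOKEN")

% Verify that MATLAB can see the variables.
getenv("AWS_DEFAULT_REGION")
```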
Before R2020a: For MATLAB releases prior to R2020a, use AWS_REGION instead of AWS_DEFAULT_REGION.

If you are using MATLAB Parallel Server™ on Cloud Center, configure your cloud cluster to access S3 services.
After creating a cloud cluster, copy your AWS credentials to your cluster workers. In MATLAB, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile. Scroll to the EnvironmentVariables property and add (environment variable names only) AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. If you are using AWS temporary credentials, also add AWS_SESSION_TOKEN. For more details, see Set Environment Variables on Workers (Parallel Computing Toolbox).
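You can also set the EnvironmentVariables property of a cluster profile programmatically. A sketch, assuming a hypothetical profile name "MyCloudCluster" (replace it with your own profile name):

```matlab
% Sketch: forward the AWS variable names to workers by adding them to a
% cluster profile programmatically. "MyCloudCluster" is a hypothetical
% profile name; replace it with the name of your cloud cluster profile.
c = parcluster("MyCloudCluster");
c.EnvironmentVariables = {'AWS_ACCESS_KEY_ID', ...
                          'AWS_SECRET_ACCESS_KEY', ...
                          'AWS_DEFAULT_REGION'};
saveProfile(c);   % persist the change to the profile
```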
Upload Data to Amazon S3 from Local Machine
You can upload data to Amazon S3 by using the AWS S3 web page. However, for efficient file transfers to and from Amazon S3, use the command line. Follow these steps to upload data:
Download and install the AWS Command Line Interface tool. This tool allows you to use commands specific to AWS in your MATLAB command window.
Create a bucket for your data using the following command in your MATLAB command window:
!aws s3 mb s3://mynewbucket
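As an optional check, you can confirm the bucket was created by listing your S3 buckets from the MATLAB command window:

```matlab
% Optional: list your S3 buckets to confirm the new bucket exists.
!aws s3 ls
```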
Upload your data using the following command in your MATLAB command window:

!aws s3 cp mylocaldatapath s3://mynewbucket --recursive

For example, to upload the CIFAR-10 image data set from your local machine to Amazon S3, use this command:

!aws s3 cp path/to/cifar10/in/the/local/machine s3://MyExampleCloudData/cifar10/ --recursive
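After the upload completes, you can verify the transfer by listing the copied objects. A sketch using the same example bucket:

```matlab
% Verify the upload by recursively listing the objects under the prefix.
!aws s3 ls s3://MyExampleCloudData/cifar10/ --recursive
```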
Read Data from Amazon S3 in MATLAB
After you store your data in Amazon S3, you can use datastores to access the data from MATLAB or from your cluster workers. For example, the following code shows how to use an imageDatastore to read images from an S3 bucket. Replace 's3://MyExampleCloudData/cifar10' with the URL of your S3 bucket.

Create an imageDatastore object pointing to the URL of the S3 bucket.

imds = imageDatastore('s3://MyExampleCloudData/cifar10', ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
Read a specified image from Amazon S3 using the readimage function. For example:

img = readimage(imds, 1);
Display the image to screen using the imshow function in your desktop client MATLAB.

imshow(img)
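Beyond reading one image at a time, you can inspect the labels or iterate over the whole datastore. A sketch using standard datastore functions:

```matlab
% Inspect the label distribution inferred from the folder names.
tbl = countEachLabel(imds);

% Loop over all images in the datastore.
reset(imds);                 % start from the first file
while hasdata(imds)
    img = read(imds);        % reads the next image from the S3 bucket
    % ... process img ...
end
```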
You can use an imageDatastore to read data from the cloud in your desktop client MATLAB, or when running code on your cluster workers. For details, see Work with Remote Data (MATLAB).
For a step-by-step example showing how to train a convolutional neural network using data stored in Amazon S3, see Train Network in the Cloud Using Automatic Parallel Support (Deep Learning Toolbox).
Write Data to Amazon S3 from MATLAB
You can use datastores to write data from MATLAB or cluster workers to Amazon S3. The following example shows how to use a tabularTextDatastore object to read tabular data from Amazon S3 into a tall array, preprocess it, and then write it back to Amazon S3.
Create a datastore object pointing to the URL of the S3 bucket.
ds = tabularTextDatastore('s3://bucketname/dataset/airlinesmall.csv', ...
    'TreatAsMissing', 'NA', 'SelectedVariableNames', {'ArrDelay'});
Read the tabular data into a tall array and preprocess it by removing missing entries and sorting.
tt = tall(ds);
tt = sortrows(rmmissing(tt));
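Tall arrays are evaluated lazily, so no data is read until you request a result. To preview the preprocessed data before writing it back, you can gather a few rows into memory. A sketch:

```matlab
% Trigger evaluation of a small preview; gather brings the result
% from the tall array into memory as an ordinary table.
previewRows = gather(head(tt, 5));
```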
Write the data back to Amazon S3 using the write function.

write('s3://bucketname/preprocessedData/', tt);
To read your tall data back, use the datastore function.

ds = datastore('s3://bucketname/preprocessedData/');
tt = tall(ds);
To use datastores to read and write files or data of other formats, see Getting Started with Datastore (MATLAB).