Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB

To work with data in the cloud, you can upload it to Amazon S3™ and then access the data in Amazon S3 from MATLAB® or from workers in your cluster.

Set up Access

To work with remote data in Amazon S3, first set up access by following these steps:

  1. Create an identity and access management (IAM) user using your AWS® root account. For more information, see Creating an IAM User in Your AWS Account.

  2. Generate an access key to receive an access key ID and a secret access key. For more information, see Managing Access Keys for IAM Users.

  3. Specify your AWS access key ID, secret access key, and the region of the bucket as system environment variables in your MATLAB command window using the setenv (MATLAB) command.

    setenv("AWS_ACCESS_KEY_ID","YOUR_AWS_ACCESS_KEY_ID")
    setenv("AWS_SECRET_ACCESS_KEY","YOUR_AWS_SECRET_ACCESS_KEY")
    setenv("AWS_DEFAULT_REGION","YOUR_AWS_DEFAULT_REGION")
    
    If you are using an AWS temporary token (such as with AWS Federated Authentication), you must also specify your session token.
    setenv("AWS_SESSION_TOKEN","YOUR_AWS_SESSION_TOKEN")

    To permanently set these environment variables, set them in your user or system environment.

    Before R2020a: Use AWS_REGION instead of AWS_DEFAULT_REGION.

  4. If you are using MATLAB Parallel Server on Cloud Center, configure your cloud cluster to access S3 services.

    After you create a cloud cluster, copy your AWS credentials to your cluster workers. In MATLAB, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile and click Edit. Scroll to the EnvironmentVariables property and add these environment variable names: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. If you are using AWS temporary credentials, also add AWS_SESSION_TOKEN. For more details, see Set Environment Variables on Workers (Parallel Computing Toolbox).
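Before you access any remote data, it can help to confirm that the credentials are actually visible to the current MATLAB session. The following is a minimal sketch, not part of the required setup: it checks only the variable names listed above by using getenv, and it assumes R2020a or later (earlier releases use AWS_REGION instead of AWS_DEFAULT_REGION).

```matlab
% Minimal sketch: check that the AWS environment variables used for
% Amazon S3 access are set in this MATLAB session.
requiredVars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"];

% getenv returns an empty value for variables that are not set;
% strlength works whether the result is a char vector or a string.
isMissing = arrayfun(@(name) strlength(getenv(name)) == 0, requiredVars);

if any(isMissing)
    warning("Missing AWS environment variables: %s", ...
        strjoin(requiredVars(isMissing), ", "));
else
    disp("All required AWS environment variables are set.")
end

% AWS_SESSION_TOKEN is needed only when you use temporary credentials.
if strlength(getenv("AWS_SESSION_TOKEN")) == 0
    disp("AWS_SESSION_TOKEN is not set (required only for temporary credentials).")
end
```

On a cloud cluster, the same check can be run on the workers (for example, inside a parfor loop) to confirm that the EnvironmentVariables property copied the values.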

Upload Data to Amazon S3 from Local Machine

This section shows you how to upload two data sets to Amazon S3 from your local machine. Later sections show some ways to work with the remote image and text data. To obtain these data sets on your local machine, use the following commands.

  • The Example Food Images data set contains 978 photographs of food in nine classes. You can download this data set to your local machine using this command in MATLAB.

    fprintf("Downloading Example Food Image data set ... ")
    filename = matlab.internal.examples.downloadSupportFile("nnet","data/ExampleFoodImageDataset.zip");
    fprintf("Done.\n")
    
    unzip(filename,"MyLocalFolder/FoodImageDataset");

  • To obtain the Traffic Signal Work Orders data set on your local machine, use this command.

    fprintf("Downloading Traffic Signal Work Orders data set ... ")
    zipFile = matlab.internal.examples.downloadSupportFile("textanalytics","data/Traffic_Signal_Work_Orders.zip");
    fprintf("Done.\n")
    
    unzip(zipFile,"MyLocalFolder/TrafficDataset");
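Optionally, you can confirm that the archives were extracted where the upload steps expect them. This sketch assumes that the images use the lowercase .jpg extension (as in the file name used later in this topic) and that the work orders are stored as .csv files. If all 978 photographs are JPEG files, the first count should be 978.

```matlab
% Minimal sketch: inspect the extracted data before uploading it.
% Assumes the unzip commands above wrote to MyLocalFolder in the current folder.

% The "**" wildcard searches all subfolders, so the exact folder
% layout inside each zip file does not matter.
foodFiles = dir(fullfile("MyLocalFolder", "FoodImageDataset", "**", "*.jpg"));
fprintf("Food image files (*.jpg) found: %d\n", numel(foodFiles));

trafficFiles = dir(fullfile("MyLocalFolder", "TrafficDataset", "**", "*.csv"));
fprintf("Traffic CSV files found: %d\n", numel(trafficFiles));

% List each CSV file with its size in bytes.
for k = 1:numel(trafficFiles)
    fprintf("  %s (%d bytes)\n", ...
        fullfile(trafficFiles(k).folder, trafficFiles(k).name), trafficFiles(k).bytes);
end
```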

You can upload data to Amazon S3 by using the Amazon S3 console in a web browser. For more efficient file transfers to and from Amazon S3, especially when you transfer many files, use the command line.

To upload the Example Food Images data set and the Traffic Signal Work Orders data set from your local machine to Amazon S3, follow these steps.

  1. Download and install the AWS Command Line Interface (AWS CLI). You can then run AWS CLI commands from the MATLAB command window by prefixing them with the ! operator, which passes the command to the operating system.

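  2. Create a bucket for your data using the following command in your MATLAB command window. Replace MyCloudData with the name of your Amazon S3 bucket. Bucket names must be globally unique and can contain only lowercase letters, numbers, periods, and hyphens, so MyCloudData serves only as a placeholder.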

    !aws s3 mb s3://MyCloudData

  3. Upload your data using the following command in your MATLAB command window.

    !aws s3 cp mylocaldatapath s3://MyCloudData --recursive

    For example, to upload the Example Food Images data set from your local machine to your Amazon S3 bucket, use this command.

    !aws s3 cp MyLocalFolder/FoodImageDataset s3://MyCloudData/FoodImageDataset/ --recursive

    To upload the Traffic Signal Work Orders data set from your local machine to your Amazon S3 bucket, use this command.

    !aws s3 cp MyLocalFolder/TrafficDataset s3://MyCloudData/TrafficDataset/ --recursive
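If you prefer to script the upload rather than type ! commands, you can pass the same AWS CLI commands to the MATLAB system function, which also returns the exit status so that a failed transfer is not silently ignored. This is a sketch under stated assumptions: the aws executable is on the system path, your credentials are configured as described in Set up Access, and MyCloudData is a placeholder for your own (lowercase) bucket name.

```matlab
% Sketch: upload both data sets with the AWS CLI and check each exit status.
% Assumptions: "aws" is on the system path; replace MyCloudData with your bucket.
bucket = "s3://MyCloudData";
localRoot = "MyLocalFolder";
datasets = ["FoodImageDataset", "TrafficDataset"];

% Build one "aws s3 cp <local> <remote> --recursive" command per data set.
% Paths are wrapped in double quotes in case they contain spaces.
commands = "aws s3 cp """ + localRoot + "/" + datasets + """ """ + ...
    bucket + "/" + datasets + "/"" --recursive";

% system returns status 0 when the command succeeds.
[cliStatus, ~] = system("aws --version");
if cliStatus ~= 0
    disp("AWS CLI not found. Install it and add it to the system path first.")
else
    for k = 1:numel(commands)
        [status, cmdout] = system(commands(k));
        if status ~= 0
            error("Upload of %s failed:\n%s", datasets(k), cmdout);
        end
        fprintf("Uploaded %s to %s/%s/\n", datasets(k), bucket, datasets(k));
    end
end
```

Building the commands as a string array keeps the local and remote folder names in one place, so adding another data set only requires extending the datasets array.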

Access Data from Amazon S3 in MATLAB

After you store your data in Amazon S3, you can use Data Import and Export (MATLAB) functions to read data from or write data to the Amazon S3 bucket. MATLAB functions that accept a remote location in their filename input arguments can access remote data. To check whether a specific function supports remote access, refer to its function page.

For example, you can use imread (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.

  1. Read an image from Amazon S3 using the imread (MATLAB) function.

    img = imread("s3://MyCloudData/FoodImageDataset/french_fries/french_fries_90.jpg");

  2. Display the image using the imshow (MATLAB) function.

    imshow(img)

To write data to the Amazon S3 bucket, you can similarly use Data Import and Export (MATLAB) functions that support write access to remote data. To check whether a specific function supports writing to a remote location, refer to its function page.
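As an illustration, writetable and readtable are two functions that accept a remote URL in place of a local file name. The sketch below writes a small table to Amazon S3 and reads it back. The table contents and the examples/labels.csv path are hypothetical placeholders, and the sketch assumes your release supports remote locations in these functions (check their function pages).

```matlab
% Sketch: round-trip a small table through Amazon S3.
% MyCloudData and examples/labels.csv are placeholder names.
T = table(["classA"; "classB"], [1; 2], VariableNames=["Class", "Count"]);
remoteFile = "s3://MyCloudData/examples/labels.csv";

% writetable infers the CSV format from the .csv extension.
writetable(T, remoteFile);

% readtable reads the file back from the same URL; TextType="string"
% returns the text column as strings so it compares equal to T.Class.
T2 = readtable(remoteFile, TextType="string");
disp(T2)
```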

Read Data from Amazon S3 in MATLAB Using Datastores

For large data sets in Amazon S3, you can use datastores to access the data from your MATLAB client or your cluster workers. A datastore is a repository for collections of data that are too large to fit in memory. Datastores allow you to read and process data stored in multiple files in a remote location as a single entity. For example, use an imageDatastore (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.

  1. Create an imageDatastore object that points to the URL of the Amazon S3 bucket.

    imds = imageDatastore("s3://MyCloudData/FoodImageDataset/", ...
     IncludeSubfolders=true, ...
     LabelSource="foldernames");

  2. Read the first image from Amazon S3 using the readimage (MATLAB) function.

    img = readimage(imds,1);

  3. Display the image using the imshow (MATLAB) function.

    imshow(img)
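Beyond reading a single image, a datastore lets you summarize the labels and iterate over the files without loading them all at once. The following sketch recreates the datastore from the steps above, counts the images in each label (one label per subfolder), and then reads a small subset one image at a time. The subset size of five is an arbitrary choice for illustration.

```matlab
% Sketch: summarize and iterate over the remote image datastore.
% Replace s3://MyCloudData with the URL of your Amazon S3 bucket.
imds = imageDatastore("s3://MyCloudData/FoodImageDataset/", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

% Number of images per label; labels come from the subfolder names.
labelCounts = countEachLabel(imds);
disp(labelCounts)

% Work on a small subset; each read call fetches one image from Amazon S3.
sampleImds = subset(imds, 1:min(5, numel(imds.Files)));
while hasdata(sampleImds)
    [img, info] = read(sampleImds);
    % info.Label is categorical, so convert it to string for fprintf.
    fprintf("%s  %s  %d x %d\n", string(info.Label), info.Filename, ...
        size(img, 1), size(img, 2));
end
```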

To use datastores to read files or data of other formats, see Getting Started with Datastore (MATLAB).

For a step-by-step example that shows how to train a convolutional neural network using data stored in Amazon S3, see Train Network in the Cloud Using Automatic Parallel Support (Deep Learning Toolbox).

Write Data to Amazon S3 from MATLAB Using Datastores

You can use datastores to write data from MATLAB or cluster workers to Amazon S3. For example, follow these steps to use a tabularTextDatastore (MATLAB) object to read tabular data from Amazon S3 into a tall array, preprocess it, and then write it back to Amazon S3.

  1. Create a datastore object that points to the URL of the Amazon S3 bucket.

    ds = tabularTextDatastore("s3://MyCloudData/TrafficDataset/Traffic_Signal_Work_Orders.csv");
    
  2. Read the tabular data into a tall array and preprocess it by removing missing entries and sorting the data.

    tt = tall(ds);
    tt = sortrows(rmmissing(tt));

  3. Write the data back to Amazon S3 using the write (MATLAB) function. Because tall arrays use deferred evaluation, this call also runs the preprocessing from step 2.

    write("s3://MyCloudData/TrafficDataset/preprocessedData/",tt);
    

  4. To read your tall data back, use the datastore (MATLAB) function.

    ds = datastore("s3://MyCloudData/TrafficDataset/preprocessedData/");
    tt = tall(ds);
    
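To check the result without gathering the whole data set into memory, you can evaluate only the first few rows. The sketch below also shows an alternative output format: when the location passed to write is a file pattern with a .csv extension, write produces text files instead of the default binary format, with the * replaced by a number for each output file. The preprocessedCSV folder and part_*.csv pattern are hypothetical names chosen for this example.

```matlab
% Sketch: inspect the preprocessed tall data and, optionally, write it
% as CSV files. Replace s3://MyCloudData with the URL of your bucket.
ds = datastore("s3://MyCloudData/TrafficDataset/preprocessedData/");
tt = tall(ds);

% head is deferred; gather evaluates it and returns an in-memory table.
firstRows = gather(head(tt, 5));
disp(firstRows)

% File pattern with a .csv extension: write produces CSV text files.
write("s3://MyCloudData/TrafficDataset/preprocessedCSV/part_*.csv", tt);
```

CSV output is easier to inspect with other tools, while the default binary format is the one that datastore reads back directly, as in step 4.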

To use datastores to write files or data of other formats, see Getting Started with Datastore (MATLAB).

Related Topics