The data collected by the Empatica Health Monitoring Platform is securely stored and encrypted within AWS Cloud Services, specifically in an Amazon S3 bucket, later referred to as the S3 Bucket.
There are multiple ways to access the data stored in the S3 Bucket:
- File Transfer Tools: This option is user-friendly and allows you to easily move data between the S3 Bucket and your computer. Cyberduck is our recommended choice, as it's free, open-source, and compatible with both Windows and macOS.
- AWS CLI v2 (Command Line Interface): With minimal programming/scripting requirements, AWS CLI v2 provides a command-line interface that enables automation and seamless script integration. It might be a bit more technical than your typical file transfer tools, but it comes with added control and flexibility.
- AWS SDKs (Software Development Kit): For a more involved and integrated approach, AWS offers multiple SDKs. These SDKs are designed for various programming languages like Python, Java, JavaScript, and more. They allow for programmatic access to Amazon S3, making it possible to integrate data retrieval and manipulation into your applications. This option is best suited for scenarios where you need advanced control and scalability for managing large amounts of data.
Using any of the methods mentioned above, access to the S3 Bucket is authorized through a set of Data Access Keys. These keys are managed via the Care Portal and must be kept confidential.
You can create a set of Data Access Keys directly from the Data Access Keys page of the Care Portal: Generating and Revoking Data Access Keys
Upon completing the request process, the Data Access Keys will be displayed in the portal, and you will also have the option to download them in a CSV file format. This is what you’ll find in this CSV file:
- ACCESS KEY ID: This is the AWS Access Key ID to be used when connecting to the S3 Bucket.
- SECRET ACCESS KEY: This is the AWS Secret Access Key to be used when connecting to the S3 Bucket.
- S3 ACCESS URL: This is the organization-specific location where your data is stored. It will look like this: s3://bucket-name/version/dir-name/
Access Keys are Confidential: Always treat access keys as secret. Never publish or expose them in any public medium.
Versioning Systems: If you use a versioning system (like Git), ensure that access keys are never saved in plain text or committed. Consider using environment variables or secrets management tools.
Exposing keys can lead to data breaches and unauthorized access.
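For example, in Python you can read the keys from environment variables rather than hard-coding them in a script. This is a minimal sketch assuming you have exported the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in your shell; the AWS CLI and SDKs also read these variables automatically.
import os

# Read the Data Access Keys from the environment instead of hard-coding
# them in a file that could end up committed to version control.
access_key_id = os.environ["AWS_ACCESS_KEY_ID"]
secret_access_key = os.environ["AWS_SECRET_ACCESS_KEY"]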
Cyberduck is a desktop application available for Windows and macOS, enabling easy access to your data.
It can be freely downloaded from here: Download Cyberduck
To facilitate frequent access to the S3 Bucket, you need to create a connection bookmark in Cyberduck.
Using the 'Open Connection' button does not allow you to configure some of the fields required for accessing the S3 Bucket.
- Click on the + icon on the bottom left of the window.
- A pop-up window will appear, where the connection to the S3 Bucket has to be configured as follows:
- Select Amazon S3 as the connection type.
- Give it a Nickname that makes it easy to identify the connection, e.g. Empatica Data Sandbox or Empatica Data Bucket.
- Add your Access Key ID and Secret Access Key.
- Expand the More Options section and enter the Path.
Cyberduck requires the Path field to be entered without the leading s3://
e.g. if the S3 Access URL is s3://bucket-name/version/dir-name/, remove the initial s3:// so that the Path is bucket-name/version/dir-name/
Ensure the Path does not have a leading s3:// and does end with a trailing /
The configuration is automatically saved. You can now close the window, and access to the S3 Bucket is ready. If you ever need to change the Access Key ID and Secret Access Key, simply click on the bookmark once and select the pencil icon from the bottom-left corner.
From now on, you can easily access the S3 Bucket by double-clicking on the created bookmark.
If the connection is successful, a new window will open, granting you access to browse through the folder structure.
Cyberduck will display an error message if no data has been uploaded yet, indicating that the S3 Data Bucket is empty.
You can download the data by simply dragging and dropping it to a folder on your computer.
Please note that S3 Bucket access configuration steps may vary when using other file transfer tools, such as WinSCP. For specific instructions, refer to their respective documentation.
AWS CLI v2 (Command Line Interface) is a command-line tool that enables you to manage AWS services directly from your computer's terminal or command prompt. It facilitates interactions with AWS resources and automates various tasks, including listing data files and synchronizing entire folders.
Empatica is neither affiliated with nor provides support for AWS CLI v2 products.
It can be freely downloaded from here: Install the latest version of the AWS CLI
The following procedure shows an example of how to use AWS CLI v2 on macOS. Further details on how to use it with S3 are available in the official documentation provided by Amazon AWS: Using Amazon S3 with the AWS CLI - AWS Command Line Interface
Credentials Setup
Start by configuring your access credentials using the aws configure command. You will be prompted to provide the Access Key ID and Secret Access Key. You can leave the Default region name and Default output format empty if desired.
- Open a Terminal and enter
aws configure
- Enter the Access Key ID and Secret Access Key, leave region and output format empty
AWS Access Key ID [None]: <YOUR_ACCESS_KEY_ID>
AWS Secret Access Key [None]: <YOUR_SECRET_ACCESS_KEY>
Default region name [None]: <LEAVE_EMPTY>
Default output format [None]: <LEAVE_EMPTY>
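For reference, aws configure stores these values in a local credentials file (~/.aws/credentials on macOS and Linux), which the AWS SDKs also read. Its layout looks like this:
[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>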
For additional information on AWS credentials, please refer to the AWS official documentation.
Sync files locally
To keep a local folder synchronized with the contents of the S3 Bucket, use the 'sync' command. This copies all new and updated folders and files from the S3 Bucket into a local folder of your choice, while unchanged files are skipped. Note that, by default, sync does not delete local files that are no longer present in S3; add the --delete flag to the command if you want those removed as well.
- Set ACCESS_URL and the output LOCAL_PATH as environment variables
export ACCESS_URL=<insert your S3 Access URL here>
# LOCAL PATH - The local folder where to store files retrieved from S3
export LOCAL_PATH=<insert a local path here>
- Run the following command to sync the files locally:
aws s3 sync ${ACCESS_URL} ${LOCAL_PATH}
You need to run this command each time you want to update your local folder. The sync time may vary depending on the number of files in your S3 Bucket.
Amazon Web Services (AWS) offers Software Development Kits (SDKs) for multiple programming languages, providing developers with a unified and consistent interface to interact with the S3 Bucket.
Empatica is neither affiliated with nor endorses AWS SDK products.
The SDKs support multiple programming languages and can be integrated into various types of applications, but they require advanced programming skills.
The utilization of an SDK depends on your chosen programming language:
JavaScript: https://aws.amazon.com/sdk-for-javascript/
Python: https://aws.amazon.com/sdk-for-python/
PHP: https://aws.amazon.com/sdk-for-php/
.NET: https://aws.amazon.com/sdk-for-net/
Ruby: https://aws.amazon.com/sdk-for-ruby/
Java: https://aws.amazon.com/sdk-for-java/
Go: https://aws.amazon.com/sdk-for-go/
C++: https://aws.amazon.com/sdk-for-cpp/
Boto3 example script
While we can't offer a universal example due to language-specific syntax, here's an example using the Python SDK, boto3, to list all objects in the S3 Bucket. Make sure to set your AWS credentials as demonstrated in 'Accessing the data with AWS CLI'.
The values for BUCKET_NAME and PREFIX are taken from the S3 Access URL available in your access keys CSV. In the example below, the S3 Access URL is s3://empatica-us-east-1-prod-data/v2/001/
- BUCKET_NAME is the first part of the URL, right after s3://
- PREFIX is the remaining part of the URL, without the initial /
Depending on your plan, the S3 Access URL might differ from the one in the example.
Ensure BUCKET_NAME does not contain s3:// and PREFIX does end with a trailing /
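If you prefer not to copy the two values by hand, this small sketch (Python 3.9+, using the example URL above) derives them from the S3 Access URL:
# Derive BUCKET_NAME and PREFIX from the S3 Access URL
ACCESS_URL = "s3://empatica-us-east-1-prod-data/v2/001/"
BUCKET_NAME, _, PREFIX = ACCESS_URL.removeprefix("s3://").partition("/")
# BUCKET_NAME == "empatica-us-east-1-prod-data", PREFIX == "v2/001/"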
import boto3

# Bucket name and prefix taken from the S3 Access URL
BUCKET_NAME = "empatica-us-east-1-prod-data"
PREFIX = "v2/001/"

# Uses the credentials configured earlier (e.g. via aws configure)
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(BUCKET_NAME)

# Print the key of every object under the given prefix
for my_bucket_object in bucket.objects.filter(Prefix=PREFIX):
    print(my_bucket_object.key)
When you run this script, the terminal prints the key of each object present in your S3 Data Bucket.
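If you also want to download the listed files programmatically, here is a minimal sketch under the same assumptions; LOCAL_PATH is an illustrative destination folder, and unlike aws s3 sync this re-downloads every file on each run:
import os
import boto3

BUCKET_NAME = "empatica-us-east-1-prod-data"
PREFIX = "v2/001/"
LOCAL_PATH = "./empatica-data"  # illustrative local destination folder

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(BUCKET_NAME)

for obj in bucket.objects.filter(Prefix=PREFIX):
    # Skip zero-byte "folder" placeholder keys, if present
    if obj.key.endswith('/'):
        continue
    target = os.path.join(LOCAL_PATH, obj.key)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    bucket.download_file(obj.key, target)
    print(f"Downloaded {obj.key}")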