Example: Integrating data across multiple S3 accounts

This example shows how to create an API that merges data from two S3 buckets, each owned by a different AWS account.

info

If you are not familiar with RAW, we recommend reading our Getting started guide first. To use RAW, you need an account, which you can create and use for free.

We have been given two sets of credentials for two different S3 buckets, each of which contains JSON files of a specific format.

info

If you want to try this example, you can deploy the following endpoint:

Read data from S3 buckets across separate AWS accounts
Learn how to serve data live from two S3 buckets across two separate AWS accounts.

Usage:

/aws/s3/buckets

For example, here is a file found in the bucket s3://log-server-a.

{
  "creation_date": "2022-04-01",
  "entries": [{"hostname": "host01"}, {"hostname": "host02"}]
}

Here's a file found in the second bucket s3://log-server-b.

{
  "creation_date": "2022-04-03",
  "entries": [{"hostname": "host95"}, {"hostname": "host96"}, {"hostname": "host97"}]
}

We're interested in the content of the entries field of these JSON files. Our goal is to read every JSON file across both buckets and merge the hostnames from their entries lists into a single list.

["host01", "host02", ..., "host95", "host96", "host97"]

The code executed by the REST API works as follows.

The read_logs function computes the list of all hostnames found in a given bucket.

read_logs(bucket: string, path: string, aws_config: record(region: string, accessKey: string, secret: string)) =
    let
        // build the S3 location for the bucket path, using the given credentials
        bucketLocation = S3.Build(
            bucket,
            path,
            region = aws_config.region,
            accessKey = aws_config.accessKey,
            secretKey = aws_config.secret
        ),
        // list all files matching the bucket path
        files = Location.Ls(bucketLocation),
        // open each file as JSON; json_type is not defined in this snippet,
        // it stands for the type describing the JSON files shown above
        contents = List.Transform(files, f -> Json.Read(f, json_type)),
        // `Explode` the entries field: one row per element of each file's entries list
        entries = List.Explode(contents, c -> c.entries)
    in
        // project only the 'hostname' column to obtain the expected list of strings
        entries.hostname
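
To see what the List.Explode and projection steps do without touching S3, the same two steps can be tried on in-memory values. This is only a sketch: the records below are hand-written stand-ins for the two sample files shown earlier, not data read from a bucket, and it relies only on functions already used in read_logs.

```snapi
let
    // In-memory stand-ins for the two sample files shown earlier (illustrative data only).
    contents = [
        {creation_date: "2022-04-01", entries: [{hostname: "host01"}, {hostname: "host02"}]},
        {creation_date: "2022-04-03", entries: [{hostname: "host95"}, {hostname: "host96"}, {hostname: "host97"}]}
    ],
    // Same step as in read_logs: one output row per element of each `entries` list.
    entries = List.Explode(contents, c -> c.entries)
in
    // Keep only the hostname of each row.
    entries.hostname
```

If List.Explode behaves as it does in read_logs, the result contains one hostname per entry, so the two files above should contribute two and three hostnames respectively.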

read_logs is called on both s3://log-server-a and s3://log-server-b with the corresponding set of credentials.

let
    awsAccountA = {region: "eu-west-1", accessKey: "<access-key-for-a>", secret: "<secret-for-a>"},
    awsAccountB = {region: "eu-west-1", accessKey: "<access-key-for-b>", secret: "<secret-for-b>"}
in
    // Union the lists returned by `read_logs` for both buckets/accounts.
    List.Union(
        read_logs("log-server-a", "/*.json", awsAccountA),
        read_logs("log-server-b", "/*.json", awsAccountB)
    )

warning

Never store sensitive information as clear text in the code. Instead use secrets, which are key/value pairs that are stored securely outside of the source code. Secrets can be accessed using the built-in function Environment.Secret.
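
As a sketch of that advice, the credentials records can be built from secrets instead of literals. It assumes that Environment.Secret takes the name of the secret as a string, and the secret names used here (aws-a-access-key and so on) are hypothetical: they would have to be registered under those names before the endpoint is deployed.

```snapi
let
    // Hypothetical secret names; replace them with the names actually registered.
    awsAccountA = {
        region: "eu-west-1",
        accessKey: Environment.Secret("aws-a-access-key"),
        secret: Environment.Secret("aws-a-secret")
    },
    awsAccountB = {
        region: "eu-west-1",
        accessKey: Environment.Secret("aws-b-access-key"),
        secret: Environment.Secret("aws-b-secret")
    }
in
    // Same call as before; only the source of the credentials changes.
    List.Union(
        read_logs("log-server-a", "/*.json", awsAccountA),
        read_logs("log-server-b", "/*.json", awsAccountB)
    )
```

With this layout, read_logs itself does not change, and no access key or secret appears in the source code.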
