Example: Integrating data across multiple S3 accounts
This example illustrates how to create an API that merges data from two S3 buckets, each owned by a separate AWS account.
If you are not familiar with RAW, we recommend checking out our Getting started guide first. To use RAW, you need an account, which you can create and use for free here.
We have been given two sets of credentials for two different S3 buckets, each of which contains JSON files of a specific format.
If you want to try this example, you can deploy the following endpoint:
Read data from S3 buckets across separate AWS accounts
Usage: /aws/s3/buckets
main() =
  let
    // The type of each JSON file
    json_type = type record(creation_date: string, entries: list(record(hostname: string))),
    // Function returning the concatenated list of hostnames for a specific path & aws_config
    read_logs(bucket: string, path: string, aws_config: record(region: string, accessKey: string, secret: string)) =
      let
        // list all files of the bucket path
        bucketLocation = S3.Build(
          bucket,
          path,
          region = aws_config.region,
          accessKey = aws_config.accessKey,
          secretKey = aws_config.secret
        ),
        files = Location.Ls(bucketLocation),
        // open each file as JSON
        contents = List.Transform(files, (f) -> Json.Read(f, json_type)),
        // `Explode` the entries field
        entries = List.Explode(contents, (c) -> c.entries)
      in
        // project only the 'hostname' column to obtain the expected list of strings
        entries.hostname
  in
    let
      awsAccountA = {
        region: "eu-west-1",
        accessKey: "AKIAZ6SK5NCTDAAESLXU",
        secret: "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"
      },
      awsAccountB = {
        region: "eu-west-1",
        accessKey: "AKIAZ6SK5NCTLPK7QE4N",
        secret: "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"
      }
    in
      // Union the lists returned by `read_logs` for both buckets/accounts.
      List.Union(
        read_logs("log-server-a", "/*.json", awsAccountA),
        read_logs("log-server-b", "/*.json", awsAccountB)
      )
Here is, for example, a file found in the bucket s3://log-server-a:
{
"creation_date": "2022-04-01",
"entries": [{"hostname": "host01"}, {"hostname": "host02"}]
}
And here is a file found in the second bucket, s3://log-server-b:
{
"creation_date": "2022-04-03",
"entries": [{"hostname": "host95"}, {"hostname": "host96"}, {"hostname": "host97"}]
}
We're interested in the content of the entries field of these JSON files. Our goal is to read every JSON file across both buckets and merge their entries lists into a single one:
["host01","host02", ...., "host95","host96","host97"]
The code executed by the REST API works as follows. The read_logs function computes the list of all hostnames found in a given bucket.
read_logs(bucket: string, path: string, aws_config: record(region: string, accessKey: string, secret: string)) =
  let
    // list all files of the bucket path
    bucketLocation = S3.Build(
      bucket,
      path,
      region = aws_config.region,
      accessKey = aws_config.accessKey,
      secretKey = aws_config.secret
    ),
    files = Location.Ls(bucketLocation),
    // open each file as JSON
    contents = List.Transform(files, (f) -> Json.Read(f, json_type)),
    // `Explode` the entries field
    entries = List.Explode(contents, (c) -> c.entries)
  in
    // project only the 'hostname' column to obtain the expected list of strings
    entries.hostname
read_logs is called on both s3://log-server-a and s3://log-server-b with the corresponding set of credentials.
let
  awsAccountA = {region: "eu-west-1", accessKey: "<access-key-for-a>", secret: "<secret-for-a>"},
  awsAccountB = {region: "eu-west-1", accessKey: "<access-key-for-b>", secret: "<secret-for-b>"}
in
  // Union the lists returned by `read_logs` for both buckets/accounts.
  List.Union(
    read_logs("log-server-a", "/*.json", awsAccountA),
    read_logs("log-server-b", "/*.json", awsAccountB)
  )
Never store sensitive information as clear text in the code.
Instead use secrets, which are key/value pairs that are stored securely outside of the source code.
Secrets can be accessed using the built-in function Environment.Secret.
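For example, assuming two secrets have been registered for account A (the names aws-account-a-access-key and aws-account-a-secret below are placeholders), the credentials record could be built like this:

awsAccountA = {
  region: "eu-west-1",
  // the secret names are placeholders; use the names you registered in your account
  accessKey: Environment.Secret("aws-account-a-access-key"),
  secret: Environment.Secret("aws-account-a-secret")
}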
Ready to try it out? Register for free and start building today! Otherwise, if you have questions or comments, join us in our Community!