Accessing data from data lakes
Learn how to access data stored in data lakes.
info
If you want to try this example, you can deploy the following endpoint:
Read data from S3 buckets across separate AWS accounts
Learn how to serve data live from two S3 buckets across two separate AWS accounts.
Usage:
/aws/s3/buckets
main() =
    let
        // The type of each JSON file
        json_type = type record(creation_date: string, entries: list(record(hostname: string))),
        // Function returning the concatenated list of hostnames for a specific path & aws_config
        read_logs(path: string, aws_config: record(region: string, accessKey: string, secret: string)) =
            let
                // list all files of the bucket path
                bucket = S3.Build(
                    path,
                    region = aws_config.region,
                    accessKey = aws_config.accessKey,
                    secretKey = aws_config.secret
                ),
                files = Location.Ls(bucket),
                // open each file as JSON
                contents = List.Transform(files, (f) -> Json.Read(f, json_type)),
                // `Explode` the entries field
                entries = List.Explode(contents, (c) -> c.entries)
            in
                // project only the 'hostname' column to obtain the expected list of strings
                entries.hostname
    in
        let
            awsAccountA = {region: "eu-west-1", accessKey: "AKIAZ6SK5NCTDAAESLXU", secret: "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"},
            awsAccountB = {region: "eu-west-1", accessKey: "AKIAZ6SK5NCTLPK7QE4N", secret: "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"}
        in
            // Union the lists returned by `read_logs` for both buckets/accounts.
            List.Union(
                read_logs("s3://log-server-a/*.json", awsAccountA),
                read_logs("s3://log-server-b/*.json", awsAccountB)
            )
Introduction
Snapi supports reading data directly from S3.
To do so, first build a location with S3.Build, then pass that location to a reader such as Csv.InferAndRead. For example, the following code reads a CSV file from S3:
let
    location = S3.Build("s3://my-bucket/data.csv", accessKey = "<AWS ACCESS KEY>", secretKey = "<AWS SECRET KEY>")
in
    Csv.InferAndRead(location)
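You can use wildcards in the S3 location. For example, the path s3://log-server-a/*.json used in the endpoint above refers to the .json files in the log-server-a bucket rather than to a single file.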
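As a rough illustration of the wildcard form, the sketch below lists the files matched by a wildcard path and reads each one as JSON, following the same pattern as the endpoint example. The bucket name, the events/ prefix, and the event_type record layout are hypothetical placeholders, not part of any real dataset; it also assumes each matching file holds a single JSON record of that shape.

```snapi
let
    // Hypothetical layout of each JSON file; adjust to match your data
    event_type = type record(id: int, message: string),
    // The wildcard selects the .json files under the (hypothetical) events/ prefix
    location = S3.Build(
        "s3://my-bucket/events/*.json",
        accessKey = "<AWS ACCESS KEY>",
        secretKey = "<AWS SECRET KEY>"
    ),
    // Expand the wildcard into one location per matching file
    files = Location.Ls(location)
in
    // Read every matching file with the declared type: one record per file
    List.Transform(files, (f) -> Json.Read(f, event_type))
```

Listing the files first keeps each read explicit, so a file that does not match the declared type should be easier to track down than if everything were read in one step.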
note
If you want to access data from other data lakes that are not yet supported, reach out to us.