Skip to main content

Complex Datasets

Learn how to handle complex data.

Let's assume a JSON document, which contains product costs by country. This document contains nested collections: there is a list of country/products entries, and then within each products field, there is a list of category/cost entries. This is a typical example of nested data, where collections can contain other collections nested inside them.

[
{
"country": "CH",
"products": [ { "category": "Keyboard", "cost": 50 }, { "category": "Keyboard", "cost": 70 }, { "category": "Monitor", "cost": 450 } ]
},
{
"country": "US",
"products": [ { "category": "Keyboard", "cost": 20 }, { "category": "Monitor", "cost": 200 } ]
}
]

Sometimes we need to bring items of a nested collection to the top. This is called unnesting a collection.

info

If you want to try this example, you can deploy the following endpoint:

Tutorial 9 - Handling nested data in a collection
Learn how to process data in a collection with other collections inside them.

Sample usage:

/tutorial/9-explode?maxCost=<int>

For instance, to obtain the list products that cost less than 300:

/tutorial/9-explode?maxCost=300

For instance, suppose we want to "move" the fields category and cost of the inner collection items to the parent. Doing so would lead to a collection of flat records, such as:

countrycategorycost
CHKeyboard50
CHKeyboard70
CHMonitor450
USKeyboard20
USMonitor200

To accomplish this, use Collection.Explode for collections, or List.Explode for list. These functions receives two arguments: the collection or list to process, and a function with the inner collection to "explode" to the top-level.

For instance, assume the following code:

let items = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/sales.json")
in Collection.Explode(items, i -> i.products)

This leads to the original dataset augmented with the columns of the nested collection (category, cost):

[
{
"country": "CH",
"products": [ { "category": "Keyboard", "cost": 50 }, { "category": "Keyboard", "cost": 70 }, { "category": "Monitor", "cost": 450 } ],
"category": "Keyboard",
"cost": 50
},
{
"country": "CH",
"products": [ { "category": "Keyboard", "cost": 50 }, { "category": "Keyboard", "cost": 70 }, { "category": "Monitor", "cost": 450 } ],
"category": "Keyboard",
"cost": 70
},
{
"country": "CH",
"products": [ { "category": "Keyboard", "cost": 50 }, { "category": "Keyboard", "cost": 70 }, { "category": "Monitor", "cost": 450 } ],
"category": "Monitor",
"cost": 450
},
{
"country": "US",
"products": [ { "category": "Keyboard", "cost": 20 }, { "category": "Monitor", "cost": 200 } ],
"category": "Keyboard",
"cost": 20
},
{
"country": "US",
"products": [ { "category": "Keyboard", "cost": 20 }, { "category": "Monitor", "cost": 200 } ],
"category": "Monitor",
"cost": 200
}
]

Note that category and cost are now part of the top-level collection.

The function passed as an argument to Collection.Explode can apply arbitary computations. For instance, in the following example the products inner collections are filtered by item price:

main(maxCost: int) =
let items = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/sales.json"),
unnested = Collection.Explode(items, i -> Collection.Filter(i.products, p -> p.cost <= maxCost))
in Collection.Transform(unnested, i -> { i.country, i.category, i.cost })