Producing Complex Data

In the previous notebook we saw how to query complex data and started exploring RAW's data model.

In this notebook we continue this exploration by showing how RAW queries can also output complex data structures, which will be particularly useful when we see how to export RAW results into formats like JSON or XML.

Collections in the output field of a SELECT

We start by the following query:

Recall that products is a "nested table".

Let's analyze the following query:

This query returns two rows: one for CH, another for US.

The first column is the country, and the second column is a list of the cost of the products in that country.

How does this work?

The inner SELECT contains (SELECT p.cost FROM sale.products AS p). We can think of this as a normal query, over a table called sale.products. THe output of that query is a table with the cost of each product.

When we compose it in a single query:

SELECT sale.country, (SELECT p.cost FROM sale.products AS p) AS products_cost
FROM READ("https://raw-tutorial.s3.amazonaws.com/sales.json") AS sale

... then the result is a table, where the second column contains another table.

If we were to represent the output as JSON, it would look like:

[
  {"country": "CH",
   "products_cost": [50, 70, 450]},
  {"country": "US",
   "products_cost": [20, 200]}
]

To further demonstrate that SELECTs are just operations over collections of data, let's add a filter to the inner SELECT:

This query filtered the products in the inner SELECT for those that cost > 60.

We can even do aggregations:

This query counts the number of products in each country that cost over 60.

Extensions to GROUP BY

We start by a traditional aggregation in SQL.

This query lists the number of airports per country.

In RAW, however, GROUP BY produces "groups" that be queried.

When the GROUP BY keyword is used, the * is bound to the group.

To query the entire "group" for a given country - i.e. the airports in each country - we can do:

THE * is a "nested table" containing all rows in the group defined by the GROUP BY clause.

In this example the * is all the airports in a given Country, since the query does GROUP BY Country.

Since * is a table, we can query it as normally:

... or even ...

This groups the airports by Country, and then by City.

The COUNT(*) in the inner SELECT refers to the groups created by GROUP BY City.

Top-Level Collections

Let's look in more detail at the output of the following queries:

This query returns a list of countries. Each row in the output has the column name.

If we were to visualize the output as JSON, it would be:

[
  {"name": "CH"},
  {"name": "US"}
]

Now the following query:

... appears similar but note that AS alias is not included.

The output is different: each row is in fact a string. There is not record.

If we were to visualize the output as JSON, it would be:

  ["CH", "US"]

We can confirm this by asking the output type of the query, with the RAW Jupyter magic %%query_validate.

Note that the first example returns collection(record(name: string)), which is RAW's type representation for a collection of records, each with a single field name of type string.

The second returns collection(string), which is RAW's type representation for a collection of strings.

Top-Level Records

The syntax (field1: "One", field2: 1) is used to create a record with two fields: field1, a string with value "One", and field2, a integer with value 1.

Collections and Records can be nested in RAW, so the following is a valid query:

If we again use %%query_validate to see the output type:

Therefore we confirm the output of this query is a record with two fields: Countries, a collection of strings, and Number_of_Airports, a long.

Next: Advanced Data Discovery