SELECT

SELECT is similar to SQL’s SELECT statement.

Syntax:

SELECT [ ALL | DISTINCT ] exp [AS idn], ...
FROM [ [ exp [AS idn]] | [ idn = exp ] ], ...
[ WHERE exp ]
[ GROUP BY exp [AS idn], ... ]
[ HAVING exp ]
[ ORDER BY exp [ASC | DESC], ... ]
[ LIMIT exp ]

SELECT and SELECT ALL are equivalent. The return type depends on the query. For example:

DATA := [(a: 1), (a: 2), (a: 1)];
SELECT a FROM DATA // Output type is a collection(int)
SELECT SUM(a) FROM DATA // Output type is an int nullable
SELECT SUM(a) AS x FROM DATA // Output type is a record(x: int nullable)

SELECT DISTINCT removes duplicates from the output. For example:

DATA := [(a: 1), (a: 2), (a: 1)];
SELECT DISTINCT a FROM DATA // [2, 1]

Whenever SELECT DISTINCT returns a collection, the order is not guaranteed unless ORDER BY is used.

The execution of a SELECT has an implicit step-by-step evaluation:

  1. First the FROMs are evaluated, as if producing the cross product of all sources;

  2. Followed by applying the WHERE filter;

  3. Followed by applying the GROUP BY aggregations;

  4. Followed by applying the HAVING filter on the aggregations;

  5. Followed by applying the projections;

  6. Followed by applying the DISTINCT, which removes duplicates from the output (if it is a collection);

  7. Followed by applying the ORDER BY, which reorders the output (if it is a collection);

  8. Followed by applying the LIMIT, which reduces the output (if it is a collection).

FROM

The FROM contains a sequence of one or more generators. If a generator is null, there are two possible cases:

  • If the generator is depends on another generator of the same SELECT (unnesting data), then the null generator is equivalent to the collection being empty. For example: SELECT x FROM table1 AS t1, t1.data AS x would skip rows where t1.data is null.

  • Otherwise, then the SELECT expression cannot be evaluated and returns null. For example: SELECT * FROM table1, table2 returns null if either table1 or table2 are null.

Note that generators do not have to be collections. For instance, SELECT * FROM table1 AS t1, 1 AS x is allowed, even though 1 is not a collection. This is equivalent to SELECT * FROM table1 AS t1, [1] AS x. Non-collection generators, however, must be aliased (with AS). For instance, SELECT * FROM table1 AS t1, 1 is not allowed, while SELECT * FROM table1 AS t1, 1 AS x is allowed (note the AS x).

Expressions evaluated after the FROM have the reserved identifier * (star) added to a newly created scope. (Note that if GROUP BY is present, it will redefine star in a nested scope.)

  • In the case of a single generator in the FROM, the star is defined as the inner element of the collection.

  • In the case of multiple generators, the star is defined as a record that merges all inner elements of the generators. For example in SELECT * FROM table1, table2, where table1 has type collection(record(a: int, b: string)) and table2 has type collection(record(c: float)) the star has type record(a: int, b: string, c: float).

The FROM table1 AS t1 assigns each element of collection table1 to the identifier t1. Similarly to SQL, FROM table1 can also be used without AS; in this case, if table1 is a collection of records, identifiers are created per record field automatically, which may lead to ambiguity when using them if the same identifier is used across two tables or is already in scope.

WHERE

The WHERE filters rows. The expression must be of boolean type. If the expression is null then it assumes the value of false, i.e. rows are filtered out.

GROUP BY

The GROUP BY groups values by the given sequence of expressions. If the expressions are given an alias as in <exp> AS <idn> then the identifier is added to a new scope and may be used in subsequent expressions of the SELECT. However, the alias is not in scope in subsequent expressions of the GROUP BY.

Expressions evaluated after the GROUP BY have two reserved identifiers added to the newly created scope: * (star) and partition. Star contains the collection of elements in the group, where the inner elements are a record that merges all inner elements of the FROM generators (similarly to the star defined in the FROM). partition also contains the collection of elements in the group, similarly to star. However, the partition inner elements are:

  • In the case of a single generator in the FROM, the partition is defined as the inner element of the collection.

  • In the case of multiple generators, the partition is defined as a record that merges all generators. For example in SELECT partition FROM table1 AS t1, table2 GROUP BY t1.a, where table1 has type collection(record(a: int, b: string)) and table2 has type collection(record(c: float)) the partition has type collection(record(_1: collection(record(a: int, b: string)), _2: collection(record(c: float)))).

Grouping elements requires an equality comparison defined for each value, so that elements are put in the same group. This equality comparison is the same as the one used by IS / IS NOT DISTINCT FROM keyword.

HAVING

The HAVING filters rows after aggregation. It requires a GROUP BY. The expression must be boolean. If the expression is null then it assumes the value of false, i.e. rows are filtered out.

Projections

Assuming:

table1 := [(first: "John", last: "Doe"), (first: "Jane", last: "Doe")];
  • SELECT first FROM table1 is ["John", "Jane"]. Note that the output type is a collection of strings (no record is created).

  • SELECT first AS name FROM table1 is [(name: "John", name: "Jane")]. Note that the output type is a collection of records with a single field name.

  • SELECT first AS name, last AS name FROM table1 is valid but produces an output type whose record will be ambiguous if projecting field name.

ORDER BY

ORDER BY orders the output of the collection.

Ordering requires an inequality comparison defined for each value. This inequality comparison treats null as a value, rather than unknown, which is smaller than all other values (but equal to null).

As in SQL, DISTINCT cannot be combined with ORDER BY in the same SELECT expression.

LIMIT

The LIMIT requires a integer expression whose evaluation describes the maximum number of records to be returned.

LIMIT NULL is equivalent to omitting the clause.

LIMIT does not necessarily produce the same output across re-executions unless ORDER BY is used with an absolute order.

Attention

LIMIT NULL is not currently implemented.

Aggregations

Similarly to SQL, RAW includes support for aggregations. These aggregations follow the same style as in conventional SQL. In particular:

  • SQL-style aggregations can only be used as part of a SELECT

  • SQL-style aggregations cannot be nested: e.g. SELECT MAX(age + MAX(age)) FROM students is invalid.

The SQL aggregations supported are:

  • MAX(<number type, string type, date type, time type, timestamp type or interval type>)

  • MIN(<number type, string type, date type, time type, timestamp type or interval type>)

  • SUM(<number type or interval type>)

  • AVG(<number type or interval type>)

  • COUNT(<anything>)

  • FIRST(<anything>)

  • LAST(<anything>)