7. SQL Compatibility

Thus far we have focused on RQL (RAW Query Language) queries.

However, RAW users are not forced to use RQL. In fact, RAW also provides SQL as a first-class query language.

This SQL mode is provided for strict compatibility with SQL standards. It is also helpful when submitting queries to RAW from external tools that only “speak” SQL.

In Jupyter, this is available using the %%sql magic.

In SQL mode, however, none of the extensions seen so far are available. Therefore, to use SQL, users have first to define views in RQL, and then consume them from SQL. That’s because keywords like READ, to read data from source, are not part of the SQL specification.

(In later notebooks we will see how to define materialized views and tables in RAW, which provides additional ways to make data available for the SQL mode.)

[1]:
%load_ext raw_magic

We start by defining a view over a CSV file in RQL as before:

[2]:
%%view airports

SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")
View "airports" was replaced

Now that airports is available, we can query it using SQL.

For this we use %%sql instead of %%rql:

[5]:
%%sql

SELECT * FROM airports LIMIT 5
[5]:
AirportIDNameCityCountryIATA_FAAICAOLatitudeLongitudeAltitudeTimezoneDSTTZ
1GorokaGorokaPapua New GuineaGKAAYGA-6.081689145.391881528210.0UPacific/Port_Moresby
2MadangMadangPapua New GuineaMAGAYMD-5.207083145.78872010.0UPacific/Port_Moresby
3Mount HagenMount HagenPapua New GuineaHGUAYMH-5.826789144.295861538810.0UPacific/Port_Moresby
4NadzabNadzabPapua New GuineaLAEAYNZ-6.569828146.72624223910.0UPacific/Port_Moresby
5Port Moresby Jacksons IntlPort MoresbyPapua New GuineaPOMAYPY-9.443383147.2200514610.0UPacific/Port_Moresby

If we filter airports by city, we see the SQL syntax being used:

[6]:
%%sql

SELECT * FROM airports WHERE City = 'Lisbon'
[6]:
AirportIDNameCityCountryIATA_FAAICAOLatitudeLongitudeAltitudeTimezoneDSTTZ
1638LisboaLisbonPortugalLISLPPT38.781311-9.1359193740.0EEurope/Lisbon
7752Lisbon Cruise TerminalLisbonPortugalN38.712606-9.12248300.0EEurope/Lisbon

In RQL, we use "" for strings instead, as in:

[8]:
%%rql

SELECT * FROM airports WHERE City = "Lisbon"
[8]:
AirportIDNameCityCountryIATA_FAAICAOLatitudeLongitudeAltitudeTimezoneDSTTZ
1638LisboaLisbonPortugalLISLPPT38.781311-9.1359193740.0EEurope/Lisbon
7752Lisbon Cruise TerminalLisbonPortugalN38.712606-9.12248300.0EEurope/Lisbon

SQL Limitations

The SQL layer is meant to conform to the SQL 2003 standard.

The major limitation with the SQL layer is that only RAW entities that are tables are made available to the SQL layer.

A “table” is, in RAW, a collection of records, whose fields are all primitive types.

[11]:
%%query_validate

airports
collection(
    record(
        AirportID: int,
        Name: string,
        City: string,
        Country: string,
        IATA_FAA: string,
        ICAO: string,
        Latitude: double,
        Longitude: double,
        Altitude: int,
        Timezone: double,
        DST: string,
        TZ: string))

The view airports is compatible with SQL.

Let’s define a new view that is not SQL compatible:

[12]:
%%view airports_1

SELECT City, * FROM airports GROUP BY City
[15]:
%%query_validate

airports_1
collection(
    record(
        City: string,
        _2: collection(
            record(
                AirportID: int,
                Name: string,
                City: string,
                Country: string,
                IATA_FAA: string,
                ICAO: string,
                Latitude: double,
                Longitude: double,
                Altitude: int,
                Timezone: double,
                DST: string,
                TZ: string))))

This type is not SQL compatible, and therefore, not made available to query from the SQL layer.

[16]:
%%sql

SELECT * FROM airports_1
org.apache.calcite.runtime.CalciteContextException: From line 3, column 15 to line 3, column 24: Object 'AIRPORTS_1' not found. Positions: 3:15 to 3:24
  3: SELECT * FROM airports_1
                   ^^^^^^^^^^