1. First Query¶

Let’s write our first queries in RAW and along the way introduce some basic concepts in the system.

We start by initializing the RAW magic (the RAW client) for Jupyter.

[1]:

%load_ext raw_magic


Our first query uses public data stored on S3, available through https.

Note the use of %%rql, which is the RAW magic command to run an RQL query; RQL stands for the RAW Query Language

[9]:

%%rql

SELECT *
LIMIT 5

[9]:

AirportIDNameCityCountryIATA_FAAICAOLatitudeLongitudeAltitudeTimezoneDSTTZ
1GorokaGorokaPapua New GuineaGKAAYGA-6.081689145.391881528210.0UPacific/Port_Moresby
3Mount HagenMount HagenPapua New GuineaHGUAYMH-5.826789144.295861538810.0UPacific/Port_Moresby
5Port Moresby Jacksons IntlPort MoresbyPapua New GuineaPOMAYPY-9.443383147.2200514610.0UPacific/Port_Moresby

This query is reading a CSV file, projecting all columns in the file, and returning the first 5 rows.

The READ keyword tells RAW to read data from the given URL. This is common in RAW: data sources are specified in the query.

As the query starts to execute, RAW will infer the format and structure of the data. In this case, the data is a CSV file and RAW makes it available as expected: as a table. The column names are retrieved from the CSV file, which contains a header in the first line.

We can execute more complex queries using regular SQL language features. For example:

[11]:

%%rql

SELECT Country, COUNT(*) AS Number_Airports
GROUP BY Country
ORDER BY Number_Airports DESC
LIMIT 3

[11]:

CountryNumber_Airports
United States1697
With the exception of the READ keyword, this query looks like normal SQL. The READ keyword, however, means users do not have to create tables and load data into them, or discover the schema. This is done automatically by RAW.