1. First Query

Let’s write our first queries in RAW and along the way introduce some basic concepts in the system.

We start by initializing the RAW magic (the RAW client) for Jupyter.

[1]:
%load_ext raw_magic

Our first query uses public data stored on S3, available through https.

Note the use of %%rql, which is the RAW magic command to run an RQL query; RQL stands for the RAW Query Language

[9]:
%%rql

SELECT *
FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")
LIMIT 5
[9]:
AirportIDNameCityCountryIATA_FAAICAOLatitudeLongitudeAltitudeTimezoneDSTTZ
1GorokaGorokaPapua New GuineaGKAAYGA-6.081689145.391881528210.0UPacific/Port_Moresby
2MadangMadangPapua New GuineaMAGAYMD-5.207083145.78872010.0UPacific/Port_Moresby
3Mount HagenMount HagenPapua New GuineaHGUAYMH-5.826789144.295861538810.0UPacific/Port_Moresby
4NadzabNadzabPapua New GuineaLAEAYNZ-6.569828146.72624223910.0UPacific/Port_Moresby
5Port Moresby Jacksons IntlPort MoresbyPapua New GuineaPOMAYPY-9.443383147.2200514610.0UPacific/Port_Moresby

This query is reading a CSV file, projecting all columns in the file, and returning the first 5 rows.

The READ keyword tells RAW to read data from the given URL. This is common in RAW: data sources are specified in the query.

As the query starts to execute, RAW will infer the format and structure of the data. In this case, the data is a CSV file and RAW makes it available as expected: as a table. The column names are retrieved from the CSV file, which contains a header in the first line.

We can execute more complex queries using regular SQL language features. For example:

[11]:
%%rql

SELECT Country, COUNT(*) AS Number_Airports
FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")
GROUP BY Country
ORDER BY Number_Airports DESC
LIMIT 3
[11]:
CountryNumber_Airports
United States1697
Canada435
Germany321

Here we see countries with most airports.

With the exception of the READ keyword, this query looks like normal SQL. The READ keyword, however, means users do not have to create tables and load data into them, or discover the schema. This is done automatically by RAW.

For performance, however, RAW will create caches of the data. We will discuss caching in RAW in more details later in this tutorial.