Let's write our first queries in RAW and along the way introduce some basic concepts in the system.
We start by initializing the RAW magic (the RAW client) for Jupyter.
Our first query uses public data stored on S3, available through a public HTTPS URL.
Note the use of %%rql, the RAW magic command that runs an RQL query; RQL stands for the RAW Query Language.
%%rql
SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv") LIMIT 5
| 1 | Goroka | Goroka | Papua New Guinea | GKA | AYGA | -6.081689 | 145.391881 | 5282 | 10.0 | U | Pacific/Port_Moresby |
| 2 | Madang | Madang | Papua New Guinea | MAG | AYMD | -5.207083 | 145.7887 | 20 | 10.0 | U | Pacific/Port_Moresby |
| 3 | Mount Hagen | Mount Hagen | Papua New Guinea | HGU | AYMH | -5.826789 | 144.295861 | 5388 | 10.0 | U | Pacific/Port_Moresby |
| 4 | Nadzab | Nadzab | Papua New Guinea | LAE | AYNZ | -6.569828 | 146.726242 | 239 | 10.0 | U | Pacific/Port_Moresby |
| 5 | Port Moresby Jacksons Intl | Port Moresby | Papua New Guinea | POM | AYPY | -9.443383 | 147.22005 | 146 | 10.0 | U | Pacific/Port_Moresby |
This query reads a CSV file, projects all of its columns, and returns the first 5 rows.
The READ keyword tells RAW to read data from the given URL.
This is common in RAW: data sources are specified in the query.
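For readers more familiar with Python than SQL, the following is a rough sketch of what the first query asks for, written with the standard library only. It is not how RAW works internally. The inline sample stands in for the remote file, and the header names are assumptions: only `Country` appears in this tutorial's queries.

```python
import csv
import io
from itertools import islice

# Inline stand-in for airports.csv. Header names are hypothetical;
# the rows are abbreviated from the query output shown above.
SAMPLE_CSV = """\
AirportID,Name,City,Country
1,Goroka,Goroka,Papua New Guinea
2,Madang,Madang,Papua New Guinea
3,Mount Hagen,Mount Hagen,Papua New Guinea
4,Nadzab,Nadzab,Papua New Guinea
5,Port Moresby Jacksons Intl,Port Moresby,Papua New Guinea
"""


def select_all(source, limit):
    """Rough equivalent of SELECT * FROM READ(source) LIMIT limit."""
    reader = csv.DictReader(source)  # the first line supplies column names
    return list(islice(reader, limit))  # stop after `limit` rows


rows = select_all(io.StringIO(SAMPLE_CSV), limit=3)
for row in rows:
    print(row)
```

The point of the comparison is that the RQL version needs no explicit parsing step: the source is named directly in the query.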
As the query starts to execute, RAW will infer the format and structure of the data. In this case, the data is a CSV file and RAW makes it available as expected: as a table. The column names are retrieved from the CSV file, which contains a header in the first line.
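To make the idea of inference concrete, here is a minimal Python sketch that takes column names from the header line and guesses a type for each column. The approach (try `int`, then `float`, else fall back to `str`) is an illustrative assumption and not RAW's actual inference algorithm. The header names in the sample are also hypothetical.

```python
import csv
import io

# Inline stand-in for a few columns of airports.csv.
# Header names are hypothetical; values come from the output above.
SAMPLE_CSV = """\
AirportID,Name,Country,Latitude,Longitude,Altitude
1,Goroka,Papua New Guinea,-6.081689,145.391881,5282
2,Madang,Papua New Guinea,-5.207083,145.7887,20
3,Mount Hagen,Papua New Guinea,-5.826789,144.295861,5388
"""


def infer_type(values):
    """Return the narrowest of int, float, str that parses every value."""
    for caster in (int, float):
        try:
            for value in values:
                caster(value)
            return caster.__name__
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "str"


def infer_schema(text):
    """Map each header name to the inferred type of its column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into columns
    return {name: infer_type(col) for name, col in zip(header, columns)}


schema = infer_schema(SAMPLE_CSV)
print(schema)
```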
We can execute more complex queries using regular SQL language features. For example:
%%rql
SELECT Country, COUNT(*) AS Number_Airports
FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")
GROUP BY Country
ORDER BY Number_Airports DESC
LIMIT 3
This returns the three countries with the most airports.
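As a point of comparison, the same aggregation can be sketched in plain Python. The sample below is invented purely for illustration (the airport and country names are placeholders, not data from airports.csv), and the function name `top_countries` is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample: placeholder airports in placeholder countries.
SAMPLE_CSV = """\
Name,Country
a1,Alpha
a2,Alpha
a3,Alpha
b1,Beta
b2,Beta
c1,Gamma
c2,Gamma
c3,Gamma
c4,Gamma
d1,Delta
"""


def top_countries(source, n):
    """Rough equivalent of GROUP BY Country, ORDER BY count DESC, LIMIT n."""
    reader = csv.DictReader(source)
    counts = Counter(row["Country"] for row in reader)  # GROUP BY + COUNT(*)
    return counts.most_common(n)  # ORDER BY count DESC + LIMIT n


top = top_countries(io.StringIO(SAMPLE_CSV), 3)
for country, number_airports in top:
    print(country, number_airports)
```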
With the exception of the READ keyword, this query looks like normal SQL.
The READ keyword, however, means users do not have to create tables, load data into them, or discover the schema.
RAW does this automatically.
For performance, however, RAW will create caches of the data. We will discuss caching in RAW in more detail later in this tutorial.
Next: Data Discovery