3. Views

So far we have run queries using the %%rql magic.

As data analysis gets more complex, it is difficult to express all transformations in a single query.

Views are a mechanism in RAW to store and reference queries you’ve done before.

They are conceptually similar to virtual views in classical databases.

[1]:
%load_ext raw_magic

To define a view, we use the %%view magic, as:

%%view <view name>

<query>

Let’s define our first view, which simply list all the contents of a file.

[3]:
%%view airports

SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")

This creates a view named airports, which consists of reading the CSV file.

To use this view, we can just reference it in a query.

[5]:
%%rql

SELECT * FROM airports LIMIT 2
[5]:
AirportIDNameCityCountryIATA_FAAICAOLatitudeLongitudeAltitudeTimezoneDSTTZ
1GorokaGorokaPapua New GuineaGKAAYGA-6.081689145.391881528210.0UPacific/Port_Moresby
2MadangMadangPapua New GuineaMAGAYMD-5.207083145.78872010.0UPacific/Port_Moresby

This query now resembles even more traditional SQL.

In fact, it achieved a form of “virtualization” by hiding the exact location of the original data, and presenting it to the consumer under a name.

If the location of the file were changed, or its query changed to clean some data for instance, the view name would be kept stable.

We can create other views similarly:

[14]:
%%view airports_per_country

SELECT Country, COUNT(*) AS Number_Airports
FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")
GROUP BY Country
ORDER BY Number_Airports DESC
View "airports_per_country" was replaced
[15]:
%%rql

SELECT * FROM airports_per_country ORDER BY Number_Airports DESC LIMIT 5
[15]:
CountryNumber_Airports
United States1697
Canada435
Germany321
Australia263
Russia249

We can also list the views available using a Jupyter magic:

[16]:
%views_list
airports
airports_per_country