So far we have run queries using the
As data analysis gets more complex, it is difficult to express all transformations in a single query.
Views are a mechanism in RAW to store and reference queries you’ve done before.
They are conceptually similar to virtual views in classical databases.
To define a view, we use the
%%view magic, as:
%%view <view name> <query>
Let’s define our first view, which simply list all the contents of a file.
%%view airports SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv")
This creates a view named
airports, which consists of reading the CSV file.
To use this view, we can just reference it in a query.
%%rql SELECT * FROM airports LIMIT 2
|1||Goroka||Goroka||Papua New Guinea||GKA||AYGA||-6.081689||145.391881||5282||10.0||U||Pacific/Port_Moresby|
|2||Madang||Madang||Papua New Guinea||MAG||AYMD||-5.207083||145.7887||20||10.0||U||Pacific/Port_Moresby|
This query now resembles even more traditional SQL.
In fact, it achieved a form of “virtualization” by hiding the exact location of the original data, and presenting it to the consumer under a name.
If the location of the file were changed, or its query changed to clean some data for instance, the view name would be kept stable.
We can create other views similarly:
%%view airports_per_country SELECT Country, COUNT(*) AS Number_Airports FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv") GROUP BY Country ORDER BY Number_Airports DESC
View "airports_per_country" was replaced
%%rql SELECT * FROM airports_per_country ORDER BY Number_Airports DESC LIMIT 5
We can also list the views available using a Jupyter magic: