9. Materialized Views¶
Views in RAW can benefit from RAW’s automated caching described previously.
However, for many use cases such a “best-effort” mechanism, is not sufficient as it does not provide explicit performance guarantees.
Materialized Views are another mechanism in RAW, which provides much stricter guarantees of when the original data source is queried.
Materialized views are similar to views except that the data is forcefully cached in RAW.
This cache is maintained indefinitely and is only refreshed if needed by the query logic.
To define materialized views in RAW, use the
%%materialized_view airports_materialized SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/airports.csv", cache := interval "30 day")
This materialized view will be created the first time it is used, and will be kept.
It will only be refresh when a query refers to it after 30 days.
Note that materialized views are not created at definition time - they are created the first time a query referring to them is executed.
Joins and Duration¶
Suppose we have another materialized view:
%%materialized_view trips_materialized SELECT * FROM READ("https://raw-tutorial.s3.amazonaws.com/trips.json", cache := interval "15 day")
If a query joins both materialized views, it will benefit from materialized input on both sides.
%%rql SELECT * FROM airports_materialized a, trips_materialized t WHERE t.origin=a.IATA_FAA
If we were to define this query as a materialized view, one question is how often would that new materialized view have to be refreshed?
The answer is that in case of joining multiple materialized views, the final materialized view has the smallest duration of each of its constituents parts.
In this example since one materialized view refreshes every 30 days, and another every 15 days, the final materialized view refreshes also every 15 days, when the materialized view of 15 days refreshes itself.
How refreshes happen¶
In general, refreshing materialized views requires executing the original query and storing the new data.
In some situations, depending on the data source, we foresee the possibility to use “deltas” of information to speed up this process. However, this should be considered an internal optimization of RAW.