Architecture

The following diagram describes the main components of RAW (shown in red) along with the other services that RAW interacts with.

[Figure: RAW architecture diagram (architecture.png)]

RAW consists of the following components:

  • “RAW Client”, for instance the command-line client, or the Python APIs;

  • “RAW Frontend” (not shown in the diagram), which is the web-based administration user interface of RAW;

  • “RAW Executor”, which manages user sessions and executes the queries;

  • “RAW Credentials”, which is responsible for holding access credentials to source systems;

  • “RAW Storage”, which is responsible for managing cached data, views, materialized views and other temporary data.

The “RAW Frontend”, “RAW Executor” and “RAW Credentials” services require a backend database to operate. RAW typically deploys this database during installation, but the services can also use an existing database, for example to take advantage of an existing high-availability (HA) setup. Contact us for additional information.

Life of a Query

A user submits a query to RAW using the “RAW Client”. The query then goes through the following steps:

  1. (Optional) The “RAW Client” registers any required source credentials in the “RAW Credentials” service. This is only required if the source system requires credentials and no existing credentials store is available;

  2. The “RAW Client” submits the query to the “RAW Executor” via a REST API;

  3. The “RAW Executor” prepares the query for execution, validating credentials in the process. The query is then submitted for execution to the Kubernetes cluster, where a pod is responsible for managing the user session (a user session is analogous to a database connection);

  4. The query may revalidate credentials during execution;

  5. The query contacts external sources to retrieve data as needed;

  6. The query may cache data in cluster or external storage (e.g. HDFS, S3, Ceph); the bookkeeping metadata is kept in the “RAW Storage”;

  7. Results and/or logs may be collected back into the “RAW Executor”;

  8. Results and/or logs are sent back to the “RAW Client”.
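
The steps above can be illustrated with a small, self-contained Python sketch. It is not the RAW API: every class and function name here (CredentialsService, StorageService, Executor, run_query) is hypothetical, the REST call and the Kubernetes pod are replaced by plain in-process method calls, and the external source is faked. It only mirrors the order in which the components interact.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialsService:
    """Hypothetical stand-in for "RAW Credentials": one secret per source."""
    store: dict = field(default_factory=dict)

    def register(self, source: str, secret: str) -> None:
        self.store[source] = secret

    def validate(self, source: str) -> bool:
        return source in self.store


@dataclass
class StorageService:
    """Hypothetical stand-in for "RAW Storage": bookkeeping of cached data."""
    cache_index: dict = field(default_factory=dict)

    def record_cache(self, source: str, location: str) -> None:
        self.cache_index[source] = location


@dataclass
class Executor:
    """Hypothetical stand-in for "RAW Executor" (no real pod is started)."""
    credentials: CredentialsService
    storage: StorageService

    def submit(self, query: str, sources: list) -> dict:
        log = []
        # Step 3: prepare the query and validate credentials up front.
        missing = [s for s in sources if not self.credentials.validate(s)]
        if missing:
            raise PermissionError(f"no credentials for: {missing}")
        log.append("prepared")
        rows = []
        for source in sources:
            # Step 4: credentials may be revalidated during execution.
            if not self.credentials.validate(source):
                raise PermissionError(f"credentials revoked for: {source}")
            # Step 5: contact the external source (faked here).
            rows.append(f"row from {source}")
            log.append(f"read {source}")
            # Step 6: cache the data and record where it lives.
            self.storage.record_cache(source, f"cache://{source}")
            log.append(f"cached {source}")
        # Steps 7 and 8: collect results and logs, return them to the client.
        return {"query": query, "results": rows, "log": log}


def run_query(query: str, sources: list, secrets: dict, executor: Executor) -> dict:
    """Hypothetical client-side flow."""
    # Step 1 (optional): register source credentials.
    for source, secret in secrets.items():
        executor.credentials.register(source, secret)
    # Step 2: submit the query (a REST call in RAW, a direct call here).
    return executor.submit(query, sources)
```

For example, calling run_query("SELECT 1", ["s3://bucket"], {"s3://bucket": "key"}, Executor(CredentialsService(), StorageService())) returns a dictionary with the query, one faked result row and the step log, while submitting a query against a source with no registered credentials raises PermissionError at step 3.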