As described in Architecture, RAW is composed of the following components:
“RAW Client”, for instance the command-line client, or the Python API;
“RAW Frontend”, which is the Web-based administration user interface of RAW;
“RAW Executor”, which manages user sessions and executes the queries;
“RAW Credentials”, which is responsible for holding access credentials to source systems;
“RAW Storage”, which is responsible for managing cached data, views, materialized views, tables and other temporary data.
RAW can be adapted to multiple deployment configurations. This is achieved by using different implementations of components. For instance, the “RAW Storage” service has multiple implementations, depending on whether the storage location is a local file system, HDFS or S3. The standard product features included in RAW for each of these components are described below.
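The idea of swapping component implementations per deployment can be sketched as follows. This is a minimal illustration only, not RAW's actual code or API: the names `StorageBackend`, `LocalFileStorage`, `InMemoryStorage` and `make_storage` are hypothetical, and the in-memory class merely stands in for a remote backend such as HDFS or S3.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class StorageBackend(ABC):
    """Hypothetical storage interface (illustrative only, not RAW's API)."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None:
        """Store data under the given key."""

    @abstractmethod
    def read(self, key: str) -> bytes:
        """Return the data stored under the given key."""


class LocalFileStorage(StorageBackend):
    """Keeps each entry as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStorage(StorageBackend):
    """Stand-in for a remote backend such as HDFS or S3 (no network calls)."""

    def __init__(self) -> None:
        self._blobs = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]


def make_storage(kind: str, **options) -> StorageBackend:
    """Pick an implementation from a (hypothetical) deployment setting."""
    backends = {"local": LocalFileStorage, "memory": InMemoryStorage}
    if kind not in backends:
        raise ValueError(f"unknown storage kind: {kind!r}")
    return backends[kind](**options)


# Callers depend only on the interface, so the backend can be swapped
# without changing the code that reads and writes cached data.
storage = make_storage("local", root=tempfile.mkdtemp())
storage.write("view-1.bin", b"cached rows")
print(storage.read("view-1.bin"))
```

In this sketch the deployment configuration only decides which class `make_storage` returns; the rest of the system sees the same two methods regardless of where the data lives.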
The “RAW Frontend”, “RAW Executor” and “RAW Credentials” services require a backend database to operate. This database is typically deployed by RAW during installation, but RAW may also use an existing database, for example to leverage an existing high-availability (HA) setup. Contact us for details.
Types of Product Features
The product features in RAW are final, experimental or preview:
Features marked as “experimental” are newer features, whose user interface may still evolve significantly. These need to be enabled at installation time. Care should be taken in using them, due to possible changes. These are provided to obtain early user feedback.
Features marked as “preview” are considered more stable than experimental; in particular, the user interface is considered stable. However, these features are newer and not yet widely used in production. For this reason, they must also be enabled at installation time.
All other features are considered final, stable and ready for production use.
Features marked as “soon” are currently in development and will be provided as “experimental” in upcoming RAW versions.
Contact us directly to discuss new features or their relative priority. Examples include desired query language extensions, input or output format connectors, cache storage locations, BI connectors, or custom credentials management.
Standard Product Features
Pure functional query language, targeting backward-compatibility with SQL
Multiple extensions for:
Rich data model (refer to “Data model support”)
Automated file format discovery and schema inference
Querying data from multiple input formats and locations
Exporting data to multiple formats
Ability to embed Python functions in queries
Data model support
(Multi-dimensional) Array data
Microsoft SQL Server
Local File System
Text files / Log files
Arrow (in development)
Automated caching of intermediate query results
Local File System
Internal RAW format
Parquet format - for tabular data (soon)
Delta format - for tabular data (soon)
Query Engine Features
Tables (in development)
Support for INSERT, UPDATE, DELETE statements for tabular data
Docker Compose (soon)
Google Cloud (soon)
Min: 10 CPU cores w/ 4 GB per core
Max: undefined. Tested up to ~1000 CPU cores
Administration Web Interface with:
Command-line tool (Linux, MacOS, Windows)
Jupyter Notebook Magic for Python
BI Tools Clients
REST API allows integration with multiple BI tools
Excel connector (experimental)
JDBC connector (experimental)
Security and Isolation
User sessions isolated in Docker containers/Kubernetes pods
Credentials managed by a separate Credentials service
Credentials are retrieved by the query engine as needed
Credentials define the cache reuse: caches created for one user may be used by another user who currently has access to the same set of credentials
Allows custom development and integration of a Credentials service, which fetches credentials using custom logic
Included “Credentials Service” provides per-user credentials management: each user supplies their own credentials
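The credential-based cache reuse rule listed above can be illustrated with a small sketch. It is not RAW's implementation: `CacheEntry`, `CredentialAwareCache` and the credential identifiers are hypothetical, and it assumes that "the same set of credentials" means the credentials the cached result was built with, all of which the requesting user must currently hold.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CacheEntry:
    """A cached result plus the credentials that were needed to build it."""
    query: str
    required_credentials: frozenset
    result: tuple


@dataclass
class CredentialAwareCache:
    """Hypothetical cache shared across users (illustrative only)."""
    entries: list = field(default_factory=list)

    def put(self, query, credentials_used, result):
        # Record which credentials the result depended on.
        entry = CacheEntry(query, frozenset(credentials_used), tuple(result))
        self.entries.append(entry)

    def get(self, query, user_credentials):
        # Assumption: an entry is reusable only if the requesting user
        # currently holds every credential the entry was built with.
        held = frozenset(user_credentials)
        for entry in self.entries:
            if entry.query == query and entry.required_credentials <= held:
                return entry.result
        return None


cache = CredentialAwareCache()

# One user runs a query using a single (made-up) credential identifier.
cache.put("example query", {"s3:sales-bucket"}, [10, 20, 30])

# A second user who currently holds that credential can reuse the cache.
hit = cache.get("example query", {"s3:sales-bucket", "db:inventory"})

# A third user without it gets no cached result and must run the query.
miss = cache.get("example query", {"db:inventory"})

print(hit, miss)
```

Because the check uses the credentials a user holds at lookup time, revoking a credential in this sketch immediately stops that user from reading caches built with it.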