Product Features
As described in Architecture, RAW is composed of the following components:
“RAW Client”, for instance the command-line client, or the Python API;
“RAW Frontend”, which is the Web-based administration user interface of RAW;
“RAW Executor”, which manages user sessions and executes the queries;
“RAW Credentials”, which is responsible for holding access credentials to source systems;
“RAW Storage”, which is responsible for managing cached data, views, materialized views, tables and other temporary data.
RAW can be adapted to multiple deployment configurations by using different implementations of its components. For instance, the “RAW Storage” service has multiple implementations, depending on whether the storage location is a local file system, HDFS or S3. The standard product features included in RAW for each of these components are described below.
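The idea of swapping component implementations per deployment can be sketched as follows. This is a minimal illustration only, not RAW's actual API: the names `StorageBackend`, `LocalFileStorage`, `InMemoryStorage`, `BACKENDS` and `make_storage` are all hypothetical, and the in-memory class merely stands in for a remote store such as HDFS or S3 so that the example runs without external services.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class StorageBackend(ABC):
    """Hypothetical storage interface (illustrative only, not RAW's actual API)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under key."""


class LocalFileStorage(StorageBackend):
    """Keeps each entry as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStorage(StorageBackend):
    """Stand-in for a remote store such as HDFS or S3, to keep the sketch runnable."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get(self, key: str) -> bytes:
        return self._data[key]


# A deployment configuration selects an implementation by name (names are assumptions).
BACKENDS = {"local": LocalFileStorage, "memory": InMemoryStorage}


def make_storage(kind: str, **options) -> StorageBackend:
    """Build the storage implementation named by kind."""
    if kind not in BACKENDS:
        raise ValueError(f"unknown storage backend: {kind!r}")
    return BACKENDS[kind](**options)
```

Code that depends only on the `StorageBackend` interface stays unchanged when a deployment switches from, say, `make_storage("local", root="/tmp/raw-cache")` to another backend.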
The “RAW Frontend”, “RAW Executor” and “RAW Credentials” services require a backend database to operate. This database is typically deployed by RAW during installation, but the services may also use an existing database, e.g. to leverage existing HA solutions. Contact us for details.
Types of Product Features
Product features in RAW are classified as final, preview, experimental or soon:
Features marked as “experimental” are newer features whose user interface may still change significantly. They must be enabled at installation time and are provided to obtain early user feedback, so use them with care.
Features marked as “preview” are considered more stable than experimental ones; in particular, their user interface is considered stable. However, these features are newer and not yet widely used in production. For this reason, they must also be enabled at installation time.
All other features are considered final, stable and ready for production use.
Features marked as “soon” are currently in development and will be provided as “experimental” in upcoming RAW versions.
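The classification above can be summarised in a short sketch. The names `Stage` and `is_available` are hypothetical and are not part of RAW; the code only encodes the rules stated in this section: final features are always available, preview and experimental features require opt-in at installation time, and “soon” features are not yet shipped.

```python
from enum import Enum


class Stage(Enum):
    """Maturity stages as described in this section (hypothetical model)."""

    FINAL = "final"
    PREVIEW = "preview"
    EXPERIMENTAL = "experimental"
    SOON = "soon"


def is_available(stage: Stage, enabled_at_install: bool = False) -> bool:
    """Return True if a feature at this stage can be used in a deployment."""
    if stage is Stage.FINAL:
        # Final features are ready for production use without opt-in.
        return True
    if stage in (Stage.PREVIEW, Stage.EXPERIMENTAL):
        # Preview and experimental features must be enabled at installation time.
        return enabled_at_install
    # "Soon" features are still in development and not yet provided.
    return False
```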
Contact us directly to discuss new features or their relative priority, such as query language extensions, input or output format connectors, cache storage locations, BI connectors, or custom credentials management.
Standard Product Features
Query Language
Pure functional query language, targeting backward compatibility with SQL
Multiple extensions for:
Rich data model (refer to “Data model support”)
Automated file format discovery and schema inference
Querying data from multiple input formats and locations
Exporting data to multiple formats
Ability to embed Python functions in queries
Data model support
Tabular data
Hierarchical data
(Multi-dimensional) Array data
Input
Relational Databases
Oracle
Microsoft SQL Server
MySQL
PostgreSQL
SQLite
Teradata
Locations
HDFS
S3
Dropbox
HTTP(S)
Local File System
Ceph (soon)
File Formats
CSV
XML
JSON
HJSON
Excel
Text files / Log files
Parquet
Avro
ORC
Web Services
Arrow (in development)
HDF5 (experimental)
Kafka (experimental)
Output
JSON
HJSON
CSV
SQLite
Parquet
Avro
RAW-format
OpenAPI REST
Web Services
Storage
Automated caching of intermediate query results
Storage Locations
HDFS
S3
Local File System
Ceph (soon)
Storage Formats
Internal RAW format
Parquet format - for tabular data (soon)
Delta format - for tabular data (soon)
Query Engine Features
Queries
Views
Materialized Views
Packages
Tables (in development)
Support for INSERT, UPDATE, DELETE statements for tabular data
MVCC support
Deployment
Kubernetes
AWS (preview)
Docker Compose (soon)
OpenShift (soon)
Azure (soon)
Google Cloud (soon)
Scalability
Min: 10 CPU cores with 4 GB of memory per core
Max: undefined. Tested up to ~1000 CPU cores
User Interface
Administration Web Interface with:
Scratchpad
Data Catalog
Credentials Management
Query Logging
System Monitoring
Command-line tool (Linux, macOS, Windows)
Clients
Python
Jupyter Notebook Magic for Python
Scala/Java
REST API
HTTP
WebSockets
C (experimental)
BI Tools Clients
REST API allows integration with multiple BI tools
Tableau Web Data Connector (preview)
Excel connector (experimental)
JDBC connector (experimental)
Authorization
OAuth 2.0
Auth0 support
Security and Isolation
User sessions isolated in Docker containers/Kubernetes pods
Credentials managed by separate Credentials service
Credentials
Credentials are retrieved by the query engine as needed
Credentials determine cache reuse: a cache created for one user may be reused by another user who currently has access to the same set of credentials
Supports developing and integrating a custom Credentials service that fetches credentials using custom logic
The included “Credentials Service” provides per-user credentials management: each user supplies their own credentials
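The credential-based cache reuse described above could look roughly like the following. This is a sketch under stated assumptions, not RAW's implementation: `cache_key` and `CredentialScopedCache` are hypothetical names, credentials are represented by opaque identifiers rather than secrets, and "the same set of credentials" is interpreted as exact set equality. The actual matching rule in RAW may differ.

```python
import hashlib
from typing import Dict, FrozenSet, Optional


def cache_key(query: str, credential_ids: FrozenSet[str]) -> str:
    """Derive a key from the query text and the set of credential identifiers."""
    digest = hashlib.sha256()
    digest.update(query.encode("utf-8"))
    # Sort so that the key does not depend on the order credentials were added.
    for cred in sorted(credential_ids):
        digest.update(b"\x00")
        digest.update(cred.encode("utf-8"))
    return digest.hexdigest()


class CredentialScopedCache:
    """Hypothetical cache shared by users who hold the same credential set."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def store(self, query: str, credential_ids: FrozenSet[str], result: bytes) -> None:
        self._entries[cache_key(query, credential_ids)] = result

    def lookup(self, query: str, credential_ids: FrozenSet[str]) -> Optional[bytes]:
        # Returns None when no entry exists for this query and credential set.
        return self._entries.get(cache_key(query, credential_ids))
```

In this sketch, a result stored for a user holding `frozenset({"s3:analytics", "pg:sales"})` is found by any other user holding exactly those two identifiers, while a user holding only `"pg:sales"` gets a cache miss. Because the key is computed at lookup time, a user who loses a credential stops matching entries created with it.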