Product Features

As described in Architecture, RAW is composed of the following components:

  • “RAW Client”, for instance the command-line client, or the Python API;

  • “RAW Frontend”, which is the Web-based administration user interface of RAW;

  • “RAW Executor”, which manages user sessions and executes the queries;

  • “RAW Credentials”, which is responsible for holding access credentials to source systems;

  • “RAW Storage”, which is responsible for managing cached data, views, materialized views and other temporary data.

RAW can be adapted to multiple deployment configurations. This is achieved by using different implementations of its components. For instance, the “RAW Storage” service has multiple implementations, depending on whether the storage location is a local file system, HDFS or S3. The standard product features included in RAW for each of these components are described below.
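The idea of swapping component implementations per deployment can be illustrated with a minimal Python sketch. This is not RAW's actual API: the names `StorageBackend`, `LocalFileSystemStorage`, `HdfsStorage`, `S3Storage` and `select_storage`, as well as the use of URL schemes to pick an implementation, are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical common interface shared by all storage implementations."""

    @abstractmethod
    def kind(self) -> str:
        """Return a short label naming the storage location type."""


class LocalFileSystemStorage(StorageBackend):
    def kind(self) -> str:
        return "local file system"


class HdfsStorage(StorageBackend):
    def kind(self) -> str:
        return "HDFS"


class S3Storage(StorageBackend):
    def kind(self) -> str:
        return "S3"


# Assumed mapping from URL scheme to implementation (illustrative only).
_BACKENDS = {
    "file": LocalFileSystemStorage,
    "hdfs": HdfsStorage,
    "s3": S3Storage,
}


def select_storage(location: str) -> StorageBackend:
    """Pick a storage implementation from the scheme of the location URL."""
    scheme = urlparse(location).scheme
    if scheme not in _BACKENDS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return _BACKENDS[scheme]()


if __name__ == "__main__":
    for url in ("file:///var/cache/raw", "hdfs://namenode:8020/raw", "s3://bucket/raw"):
        print(url, "->", select_storage(url).kind())
```

In this sketch the rest of the system depends only on the shared interface, so changing the deployment configuration amounts to choosing a different implementation.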

The “RAW Frontend”, “RAW Executor” and “RAW Credentials” services require a backend database to operate. This database is typically deployed by RAW during installation, but RAW can also use an existing database, for example to leverage an existing high-availability (HA) setup. Contact us for details.

Types of Product Features

The product features in RAW are final, experimental or preview:

  • Features marked as “experimental” are newer features, whose user interface may still evolve significantly. These need to be enabled at installation time. Care should be taken in using them, due to possible changes. These are provided to obtain early user feedback.

  • Features marked as “preview” are considered more stable than experimental ones; in particular, the user interface is considered stable. However, these features are newer and not yet widely used in production. For this reason, they must also be enabled at installation time.

  • All other features are considered final, stable and ready for production use.

Features marked as “soon” are currently in development and will be provided as “experimental” in upcoming RAW versions.

Contact us directly to discuss new features or their relative priority. Examples include query language extensions, input or output format connectors, cache storage locations, BI connectors, and custom credentials management.

Standard Product Features

  • Query Language

    • Pure functional query language, targeting backward-compatibility with SQL

    • Multiple extensions for:

      • Rich data model (refer to “Data model support”)

      • Automated file format discovery and schema inference

      • Querying data from multiple input formats and locations

      • Exporting data to multiple formats

      • Ability to embed Python functions in queries

  • Data model support

    • Tabular data

    • Hierarchical data

    • (Multi-dimensional) Array data

  • Input

    • Relational Databases

      • Oracle

      • Microsoft SQL Server

      • MySQL

      • PostgreSQL

      • SQLite

      • Teradata

    • Locations

      • HDFS

      • S3

      • Dropbox

      • HTTP(S)

      • Local File System

      • Ceph (soon)

    • File Formats

      • CSV

      • XML

      • JSON

      • HJSON

      • Excel

      • Text files / Log files

      • Parquet

      • Avro

      • ORC

      • Web Services

      • Arrow (soon)

  • Output

    • JSON

    • HJSON

    • CSV

    • SQLite

    • Parquet

    • Avro

    • RAW-format

    • OpenAPI REST

    • Web Services

  • Storage

    • Automated caching of intermediate query results

    • Storage Locations

      • HDFS

      • S3

      • Local File System

      • Ceph (soon)

    • Storage Formats

      • Internal RAW format

      • Parquet format - for tabular data (soon)

      • Delta format - for tabular data (soon)

  • Query Engine Features

    • Queries

    • Views

    • Materialized Views

    • Packages

  • Deployment

    • Kubernetes

    • AWS (preview)

    • Docker Compose (soon)

    • OpenShift (soon)

    • Azure (soon)

    • Google Cloud (soon)

  • Scalability

    • Min: 10 CPU cores w/ 4 GB per core

    • Max: no defined limit. Tested up to ~1000 CPU cores

  • User Interface

    • Administration Web Interface with:

      • Scratchpad

      • Data Catalog

      • Credentials Management

      • Query Logging

      • System Monitoring

    • Command-line tool (Linux, macOS, Windows)

  • Clients

    • Python

    • Jupyter Notebook Magic for Python

    • Scala/Java

    • REST API

      • HTTP

      • WebSockets

    • C (experimental)

  • BI Tools Clients

    • REST API allows integration with multiple BI tools

    • Excel connector (experimental)

    • JDBC connector (experimental)

  • Authorization

    • OAuth 2.0

    • Auth0 support

  • Security and Isolation

    • User sessions isolated in Docker containers/Kubernetes pods

    • Credentials managed by separate Credentials service

  • Credentials

    • Credentials are retrieved by the query engine as needed

    • Credentials determine cache reuse: a cache created for one user may be reused by another user who currently has access to the same set of credentials

    • Allows custom development and integration of a Credentials service, which fetches credentials using custom logic

    • The included “Credentials Service” provides per-user credentials management: each user supplies their own credentials
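
The cache-reuse rule listed under Credentials can be sketched in Python as follows. This is only an illustration, not RAW's implementation: the class `CredentialScopedCache`, the credential identifiers and the query string are hypothetical, and the sketch assumes that "the same set of credentials" means exact set equality at lookup time.

```python
class CredentialScopedCache:
    """Hypothetical cache keyed by query text and the set of credentials in use."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(query, credential_ids):
        # A frozenset ignores ordering and duplicates, so two users holding
        # the same credentials produce the same key regardless of who they are.
        return (query, frozenset(credential_ids))

    def put(self, query, credential_ids, result):
        self._entries[self._key(query, credential_ids)] = result

    def get(self, query, credential_ids):
        # The caller passes the credentials the user has access to right now,
        # so a user who lost access no longer matches the cached entry.
        return self._entries.get(self._key(query, credential_ids))


if __name__ == "__main__":
    cache = CredentialScopedCache()
    query = "SELECT * FROM sales"  # hypothetical query text

    # A first user populates the cache while holding two credentials.
    cache.put(query, ["s3:reports", "pg:warehouse"], result=[1, 2, 3])

    # A second user with the same set of credentials reuses the entry.
    print(cache.get(query, ["pg:warehouse", "s3:reports"]))

    # A third user with a different set of credentials gets a cache miss.
    print(cache.get(query, ["s3:reports"]))
```

Note that the user identity is deliberately absent from the key: in this sketch, reuse depends only on the credentials a user currently holds.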