What is Snapi?
Snapi is a new data manipulation language designed to deliver data quickly. It includes a rich built-in library and ready-to-use connectors to the most common data sources. It is designed to be easy to get started with, yet powerful enough to handle the most complex data manipulation use cases.
Snapi's main features are:
- built-in support for querying data directly from databases, files or web services at source;
- support for complex data types and data transformations;
- declarative, type-safe language with dynamic behaviors;
- safe language with strong error handling.
Why another programming language?
Our goal is to enable programmers to query any data, anywhere.
Before building Snapi, we tried using (and even extending!) SQL to comfortably cope with the complex datasets we found in the real world. But despite our best attempts, SQL was difficult to use in practice: handling complex JSON, XML or other data formats requires flattening deep hierarchies, remembering format-specific constructs and learning unique optimization techniques. These hurdles made it impractical to use.
Another challenge was dealing with unknown or messy data. SQL is designed for classical database systems, where data is loaded in advance and schemas are known upfront. This assumption breaks down in the real world, where a lot of developer time can be spent discovering schemas or dealing with messy data.
For these reasons, and to achieve our goal of querying any data, anywhere, we built our own programming language: Snapi.
Show me some code!
Here is a simple Snapi program:
// Let's define a function called 'main' that receives a string called 'name'.
main(name: string) =
    // We use the 'let' keyword to bind identifiers.
    // Here we define 'capitalized_name' and 'prefix'.
    // The 'in' keyword is where the final expression of the function is defined.
    let
        capitalized_name = String.Capitalize(name),
        prefix = "Hello "
    in
        prefix + capitalized_name + "!"

// Let's call this function.
// The result will be "Hello Jane!"
main("jane")
This simple program shows a few features:
- how to define functions and identifiers;
- and how to use built-in libraries (e.g. 'String.Capitalize').
Now let's look at an example that queries data.
main(id: int) =
    let
        machines = Json.InferAndRead("s3://raw-tutorial/ipython-demos/predictive-maintenance/machines.json")
    in
        Collection.Filter(machines, x -> x.machineID == id)
This example reads a JSON file from S3 that contains a collection of records. It then keeps only the records whose 'machineID' field matches the 'id' argument.
We hope you 🤩 it! Keep reading to learn more.
What's new in depth
Let's have a quick overview of the main features in Snapi.
Built-in support for querying data
A key feature of Snapi is the built-in support for querying data directly from databases, files or web services.
Database connectors are included for the most commonly used SQL-based relational databases, with more being added frequently. Storage locations such as S3 are also supported, as well as HTTP(S).
The language also has built-in support for securely storing credentials, so that your source code does not contain sensitive information.
Support for complex data types
Snapi supports primitive data types such as numbers, temporals (date, time, timestamp, interval), strings and booleans. You can find more information on the supported data types here.
Snapi also supports lists and collections, which can be nested arbitrarily. Lists are similar to collections, but their elements are computed when the list is constructed. Lists provide additional capabilities compared to collections. This comes at a cost in performance and memory usage, and with different error-handling behavior.
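As a rough analogy (Python, not Snapi), the difference resembles an eagerly built Python `list` versus a lazily evaluated generator. The sketch assumes, for illustration only, that a collection's elements are computed when the collection is consumed. The helper names `compute`, `eager_list` and `lazy_collection` are hypothetical. The Lists and Collections documentation describes the real semantics.

```python
# Python analogy only (not Snapi): an eager "list" versus a lazy "collection".
# Assumption for illustration: collection elements are computed when consumed.

evaluated = []  # records which inputs have been computed so far


def compute(x):
    # Stand-in for some per-element work.
    evaluated.append(x)
    return x * 10


def eager_list(values):
    # Like a list: every element is computed at construction time.
    return [compute(v) for v in values]


def lazy_collection(values):
    # Like a collection (under the assumption above): nothing is computed
    # until the elements are iterated.
    return (compute(v) for v in values)


lst = eager_list([1, 2, 3])
print(evaluated)         # [1, 2, 3]
print(lst[2], len(lst))  # 30 3  (indexing and len() are available)

evaluated.clear()
coll = lazy_collection([1, 2, 3])
print(evaluated)         # []
print(next(coll))        # 10
print(evaluated)         # [1]
```

The eager version pays the full cost up front. In exchange, it supports indexing and `len()`, which the generator does not. This loosely mirrors the trade-off described above.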
Support for data transformations
Like other query languages, Snapi supports filtering, joining, grouping and ordering data. It also includes additional operations such as unnesting data. The full set of operations is described in the respective documentation for Lists and Collections.
Snapi is a high-level declarative language: you describe the result you want rather than how to compute it. For instance, when you specify a filter and a join over a collection, the optimizer rewrites and optimizes the code and chooses the most appropriate join implementation. Refer to the evaluation model for additional details.
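To make these operation names concrete, here is a minimal Python sketch (not Snapi syntax). It applies each operation to small, hypothetical machine and error records. The field names `model`, `components` and `errorID` are invented for illustration. In Snapi you would use the corresponding List or Collection functions instead.

```python
# Python sketch (not Snapi) of filter, join, group, order and unnest.
# The records and field names are hypothetical, chosen only for illustration.
from collections import defaultdict

machines = [
    {"machineID": 1, "model": "model3", "components": ["comp1", "comp2"]},
    {"machineID": 2, "model": "model4", "components": ["comp3"]},
    {"machineID": 3, "model": "model3", "components": []},
]
errors = [
    {"machineID": 1, "errorID": "error1"},
    {"machineID": 1, "errorID": "error4"},
    {"machineID": 2, "errorID": "error2"},
]

# Filter: keep only machines of one model.
model3 = [m for m in machines if m["model"] == "model3"]

# Join: attach each error's machine model, matching on machineID.
by_id = {m["machineID"]: m for m in machines}
joined = [
    {**e, "model": by_id[e["machineID"]]["model"]}
    for e in errors
    if e["machineID"] in by_id
]

# Group: count errors per machineID.
counts = defaultdict(int)
for e in errors:
    counts[e["machineID"]] += 1

# Order: machines by descending error count (machines without errors count as 0).
ordered = sorted(machines, key=lambda m: counts.get(m["machineID"], 0), reverse=True)

# Unnest: one row per (machine, component) pair; in this sketch,
# a machine with an empty component list produces no rows.
unnested = [
    {"machineID": m["machineID"], "component": c}
    for m in machines
    for c in m["components"]
]

print([m["machineID"] for m in model3])  # [1, 3]
print(dict(counts))                      # {1: 2, 2: 1}
print([m["machineID"] for m in ordered]) # [1, 2, 3]
print(len(unnested))                     # 3
```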
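A declarative join leaves room for the optimizer to pick how it is executed. As a generic illustration, the Python sketch below (not Snapi's optimizer) computes the same equi-join with two physical strategies, a nested-loop join and a hash join. Both return the same rows. The `choose_join` rule and its `threshold` parameter are hypothetical. How Snapi actually picks an implementation is covered by its evaluation model.

```python
# Generic illustration (not Snapi's optimizer): one logical equi-join,
# two physical strategies that return the same rows in the same order.


def nested_loop_join(left, right, left_key, right_key):
    # Compares every pair of rows: O(len(left) * len(right)).
    return [(lrow, rrow) for lrow in left for rrow in right
            if left_key(lrow) == right_key(rrow)]


def hash_join(left, right, left_key, right_key):
    # Builds a hash table on the right input, then probes it once per left row:
    # roughly linear in the input sizes plus the output. Keys must be hashable.
    table = {}
    for rrow in right:
        table.setdefault(right_key(rrow), []).append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in table.get(left_key(lrow), [])]


def choose_join(left, right, threshold=1000):
    # Hypothetical rule: nested loop for tiny inputs, hash join otherwise.
    if len(left) * len(right) <= threshold:
        return nested_loop_join
    return hash_join


def machine_key(row):
    return row["machineID"]


machines = [{"machineID": 1}, {"machineID": 2}, {"machineID": 3}]
errors = [{"machineID": 1, "errorID": "error1"}, {"machineID": 3, "errorID": "error5"}]

join = choose_join(machines, errors)
rows = join(machines, errors, machine_key, machine_key)
print(join.__name__, len(rows))  # nested_loop_join 2
```

Because the query only states *what* to join, either strategy is a valid execution. The choice between them affects cost, not results.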
Type-safe language with dynamic behaviors
Snapi is a type-safe language, which means that most common errors will be detected at compilation time.
However, there is also built-in support for discovering data schemas, which allows users to query data without having to specify the schema. Instead, the compiler runs a background task to discover the schema and infer the types automatically. Refer here for details on how inference works.
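The toy Python sketch below gives a feel for what schema inference involves. It scans sample JSON records and derives a type for each field. Along the way, it widens `int` and `double` to `double`, and it marks a field nullable when it is missing or null in some record. It is an illustration under stated assumptions, not Snapi's inference algorithm. The type names and widening rules are chosen for the example.

```python
# Toy schema inference (not Snapi's algorithm): derive a type per field from
# sample JSON records. Nested lists/records are not inspected in this sketch.
import json


def json_type(value):
    if value is None:
        return None
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "record"


def merge(a, b):
    # Widen int and double to double; any other conflict falls back to string
    # (an arbitrary choice made for this example).
    if a is None:
        return b
    if b is None or a == b:
        return a
    if {a, b} == {"int", "double"}:
        return "double"
    return "string"


def infer_schema(records):
    types, nullable = {}, set()
    for rec in records:
        for field, value in rec.items():
            t = json_type(value)
            if t is None:
                nullable.add(field)
            types[field] = merge(types.get(field), t)
    # A field missing from any record is also nullable.
    for field in types:
        if any(field not in rec for rec in records):
            nullable.add(field)
    return {f: (t or "unknown", f in nullable) for f, t in types.items()}


sample = json.loads(
    '[{"machineID": 1, "age": 18}, {"machineID": 2, "age": 7.5, "model": "model3"}]'
)
print(infer_schema(sample))
# {'machineID': ('int', False), 'age': ('double', False), 'model': ('string', True)}
```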
Safe language with strong error handling
Snapi is a safe language. For instance, Snapi does not require you to open or close files, database connections or other resources. This is all handled for you automatically. That makes the language safer to use and avoids common classes of errors, such as leaked resources.
Moreover, Snapi has a special approach to error handling, in particular for propagating errors, which is both safer and more useful in practice for querying data.
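As a hedged illustration, the Python sketch below assumes one common model of error propagation. Under this model, a failure is captured as an error value in the affected field and carried through later operations, instead of aborting the whole query. The `Error` class and the `parse_int` and `add_one` helpers are hypothetical. See Snapi's error-handling documentation for how it actually behaves.

```python
# Python sketch of "errors as values" (assumed model, not Snapi's implementation):
# a failure in one value is captured and carried along instead of aborting.
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Error:
    message: str


Value = Union[int, Error]


def parse_int(raw: str) -> Value:
    # A failed cast becomes an Error value rather than an exception.
    try:
        return int(raw)
    except ValueError:
        return Error(f"cannot cast {raw!r} to int")


def add_one(v: Value) -> Value:
    # Errors propagate: applying an operation to an error yields the same error.
    return v if isinstance(v, Error) else v + 1


rows = ["41", "oops", "7"]
results = [add_one(parse_int(r)) for r in rows]
print(results)  # [42, Error(message="cannot cast 'oops' to int"), 8]
```

In this model, one bad input spoils only its own result. The other rows are still computed and returned.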
Snapi is designed to scale from a single thread to a large cluster, in a way that is transparent to the user.