Inference

Snapi is a type-safe language, which means that the type of every expression must be well-defined and checked prior to execution. Type-safe languages are generally safer and more efficient than dynamically typed languages, but usually at the cost of flexibility.

However, some operations in Catalog such as Csv.InferAndRead allow the user to read a CSV file whose structure is not known, and then refer to its field names. This behaviour is more typical of dynamic languages, but is achieved in a type-safe manner in Catalog using inference. This means that during compilation, the Catalog engine samples the contents of the CSV file to determine its type.

For instance, suppose a CSV file is stored at http://example.org/file.csv with the following contents:

Name,Age
Miguel,42
Benjamin,45

In Snapi, this data can be queried as:

let
    location = "http://example.org/file.csv",
    data = Csv.InferAndRead(location)
in
    Collection.Filter(data, row -> row.age > 43)
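As a conceptual model only (this is not the Snapi engine's implementation), the Python sketch below imitates sampling-based inference. It reads the header of the CSV text, inspects at most `sample_size` data rows, and picks the narrowest of int, float and string for each column. The function names and the `sample_size` parameter are hypothetical.

```python
import csv
import io
import itertools


def infer_column_type(values):
    """Return the narrowest of "int", "float", "string" that fits every sampled value."""
    if not values:
        return "string"  # nothing sampled: fall back to the most general type

    def fits(parse):
        # True if every sampled value can be parsed by `parse`
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if fits(int):
        return "int"
    if fits(float):
        return "float"
    return "string"


def infer_csv_type(text, sample_size=100):
    """Sample at most sample_size data rows and return a Snapi-style type string."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line holds the field names
    sample = list(itertools.islice(reader, sample_size))  # only these rows are inspected
    fields = []
    for i, name in enumerate(header):
        column = [row[i] for row in sample]
        fields.append(f"{name}: {infer_column_type(column)}")
    return f"collection(record({', '.join(fields)}))"


contents = "Name,Age\nMiguel,42\nBenjamin,45\n"
print(infer_csv_type(contents))  # collection(record(Name: string, Age: int))
```

Because only a sample is inspected, the inferred type can be wrong for data outside it. For example, with `sample_size=1` a column whose first value is `1` but whose second is `abc` is inferred as int.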

When Csv.InferAndRead is analyzed to determine its type, the contents at the location must be sampled, so the value of location itself must be known at compile time. This is achieved with a technique called staged compilation: a Snapi program may trigger the execution of sub-programs that compute the values of arguments. Csv.InferAndRead then knows which location to sample and can determine its own output type. If the staged compiler cannot determine the value, compilation fails with an error.
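The following toy model is written in Python purely for illustration. The tuple encoding of expressions, the in-memory `FILES` table and the function names are hypothetical and are not Snapi internals. Before typing a call to an inferring reader, the "compiler" first evaluates the argument expression as an earlier stage. Constants, and expressions built only from constants, can be evaluated. A function parameter has no value until run time, so typing fails.

```python
class CompileError(Exception):
    """Raised when the toy compiler cannot determine a value it needs."""


def evaluate_statically(expr):
    """Evaluate an expression at compile time (the earlier stage).

    Expressions are tuples: ("const", value), ("param", name) or
    ("concat", left, right). Parameters have no value until run time.
    """
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "concat":
        return evaluate_statically(expr[1]) + evaluate_statically(expr[2])
    if kind == "param":
        raise CompileError(f"value of parameter '{expr[1]}' is not known at compile time")
    raise ValueError(f"unknown expression kind: {kind}")


# Hypothetical stand-in for remote files, so the sketch runs offline.
FILES = {"http://example.org/file.csv": "Name,Age\nMiguel,42\nBenjamin,45\n"}


def type_of_infer_and_read(location_expr):
    """Type a call like InferAndRead(location_expr): first run the sub-program
    that computes the location, then sample that file's header row."""
    location = evaluate_statically(location_expr)
    header = FILES[location].splitlines()[0]
    return header.split(",")


# A location computed from constants can be evaluated, then sampled.
print(type_of_infer_and_read(("concat", ("const", "http://example.org/"), ("const", "file.csv"))))

# A location that depends on a parameter cannot, so "compilation" fails.
try:
    type_of_infer_and_read(("param", "url"))
except CompileError as err:
    print("compilation failed:", err)
```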

For instance, compilation of the following example fails with an error:

main(url: string) =
    let
        data = Csv.InferAndRead(url)
    in
        Collection.Filter(data, row -> row.age > 43)

In this example, url is a parameter of main, so its value is not known at compile time: it can represent any location. The system therefore cannot determine the output type of Csv.InferAndRead and cannot verify that data has a field called age, so compilation fails. Here, the user must specify the type of the data explicitly by using Csv.Read instead, as in:

main(url: string) =
    let
        data = Csv.Read(url, type collection(record(name: string, age: int)))
    in
        Collection.Filter(data, row -> row.age > 43)
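To see why the explicit type makes the difference, here is a small Python model of the field check, for illustration only. The function and exception names are hypothetical. A row type of `None` stands for "could not be inferred at compile time". A declared row type lets the check on `age` succeed without sampling any data.

```python
class TypeCheckError(Exception):
    """Raised when the toy checker rejects a program."""


def check_field_comparison(row_type, field, expected):
    """Type-check `row -> row.<field> > literal` against a row type.

    row_type is a dict of field name -> type name, or None when the type
    could not be determined at compile time.
    """
    if row_type is None:
        raise TypeCheckError(f"cannot verify that rows have a field '{field}'")
    actual = row_type.get(field)
    if actual is None:
        raise TypeCheckError(f"rows have no field '{field}'")
    if actual != expected:
        raise TypeCheckError(f"field '{field}' is {actual}, expected {expected}")
    return "bool"  # result type of the filter predicate


# Inference with an unknown url: no row type is available.
try:
    check_field_comparison(None, "age", "int")
except TypeCheckError as err:
    print("compilation failed:", err)

# Declared type, as in Csv.Read(url, type collection(record(name: string, age: int))):
# the row type comes from the declaration, so no data needs to be sampled.
declared = {"name": "string", "age": "int"}
print(check_field_comparison(declared, "age", "int"))  # bool
```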

Refer to the documentation for CSV, JSON, and most other formats for details on when to use InferAndRead versus Read. In particular, the inference readers accept several optional parameters that control how the data is sampled.