Collections and Lists

Collections and lists are both structures that hold sequences of items of the same type. There are some differences between the two, which are summarized at the end of this document.

Collections

The elements in a collection are not computed until its contents are consumed.

For instance, consider the following:

// Read schema01.table01 from database db01
table = PostgreSQL.InferAndRead("db01", "schema01", "table01")

PostgreSQL.InferAndRead returns a collection. The rows are read from PostgreSQL only as the data is consumed. The table contents are not materialized in memory or on disk.
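The deferred behaviour described above can be sketched by analogy in Python, where a generator plays the role of a collection. This is not Snapi code: `read_rows` and the `reads` log are hypothetical stand-ins for a table read, used only to show that nothing is fetched until an item is consumed.

```python
reads = []  # hypothetical log of the rows that were actually fetched


def read_rows():
    """Hypothetical stand-in for a table read: yields one row at a time."""
    for row_id in range(1, 6):
        reads.append(row_id)  # a row is "fetched" only when it is requested
        yield {"id": row_id}


table = read_rows()            # creating the sequence reads nothing
reads_before = list(reads)     # snapshot of the log before any consumption

first = next(table)            # consuming one item fetches exactly one row
reads_after_one = list(reads)  # snapshot of the log after one item
```

Only the rows that are actually requested are fetched, which is the same trade-off a collection makes.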

Lists

A list is a structure holding a sequence of items of the same type. Lists are similar to collections, but the elements of a list are computed when the list is constructed. Lists provide additional capabilities compared to collections, at a cost in performance and memory usage, and they behave differently when errors occur.

To build a list:

[1, 2, 3]

This example defines a list of three elements of type int.

All elements of a list must be of the same type.
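As a rough analogy in Python (not Snapi), the sketch below shows what "computed when the list is constructed" means in practice. The `square` helper and the `computed` log are hypothetical names introduced only for this illustration.

```python
computed = []  # hypothetical log of the elements that have been computed


def square(n):
    """Hypothetical element computation that records each call."""
    computed.append(n)
    return n * n


# Building the list computes every element immediately.
squares = [square(n) for n in (1, 2, 3)]
computed_at_build = list(computed)

# Elements can then be accessed by position without any recomputation.
second = squares[1]
computed_after_access = list(computed)
```

All the work happens at construction time, so later positional access is cheap, at the price of holding every element in memory.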

When to use collections versus lists

Collections and lists have different characteristics, which determine the better fit for a given task:

  • Because a list is materialized, it is possible to access elements by position using the List.Get operation. This feature does not exist for collections.
  • Collections have a smaller memory footprint than lists, and in general should be preferred for larger datasets.
  • Since a collection isn't materialized, its elements are recomputed from the start every time it is processed with functions like Collection.Contains or Collection.Filter. If the code processing a collection sits inside an outer loop, execution can be slow. In such cases, converting the collection into a list using List.From gives better performance, because the items are materialized only once.
  • Because lists are materialized, a list can be in an error state. That is, it is possible to check if a list failed to compute. This feature does not exist for collections. Instead, a collection will fail while it is being consumed. This can have an impact when developing code that handles errors. Refer to the section on Error handling for additional details.
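The recomputation and error-handling points above can be sketched by analogy in Python. This is not Snapi code, and the analogy is loose: `LazyCollection`, `produce` and `failing_rows` are hypothetical helpers, and Python raises an exception where Snapi would instead yield a list in an error state.

```python
class LazyCollection:
    """Hypothetical lazy sequence: re-runs its producer on every traversal."""

    def __init__(self, producer):
        self._producer = producer
        self.passes = 0  # number of times the source was traversed

    def __iter__(self):
        self.passes += 1
        return self._producer()


def produce():
    yield from range(1000)


lookups = [3, 500, 999]

# Lazy: each membership test restarts the traversal from the beginning.
lazy = LazyCollection(produce)
hits_lazy = [x in lazy for x in lookups]
lazy_passes = lazy.passes

# Materialized once: the source is traversed a single time.
source = LazyCollection(produce)
materialized = list(source)
hits_list = [x in materialized for x in lookups]
list_passes = source.passes


def failing_rows():
    """Hypothetical source whose second row cannot be read."""
    yield 1
    raise ValueError("row 2 is unreadable")


# Eager: the failure is visible as soon as the list is built.
try:
    rows = list(failing_rows())
    build_failed = False
except ValueError:
    build_failed = True

# Lazy: creation succeeds, and the failure surfaces during consumption.
stream = failing_rows()
consumed = []
try:
    for row in stream:
        consumed.append(row)
    stream_failed = False
except ValueError:
    stream_failed = True
```

In the lazy case, part of the data has already been processed by the time the failure appears, which is why error-handling code has to be written differently for collections and for lists.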

Additionally, refer to the section on Evaluation model for more details.