Skip to main content


HJSON is a variation of JSON commonly used in Hadoop where each line contains a JSON object. Therefore, HJSON data is always a collection of JSON objects, where each JSON object must be of a compatible type.

{"id": 1, "description": "First row", "extra": 1.2}
{"id": 2, "description": "Second row", "extra": null}

The type of this data is collection(record(id: int, description: string, extra: double)).

When reading HJSON data without inference, the user is only required to pass the inner type - in the example above record(id: int, description: string, extra: double) - since HJSON data are always collections.

Multiple variants are supported:

  • if connection or other problems trigger a runtime error or return null;
  • if reading a single location or, in the case of file systems read directories or wildcards and merge their contents;
  • if inference is used or whether the type and format properties are specified.