Skip to main content

Joins

Learn how to join data.

A collection can be joined with another collection using Collection.Join or Collection.EquiJoin. A list can be joined with another list using List.Join or List.EquiJoin.

Join vs EquiJoin

An EquiJoin is a special type of join where the matching condition must be an equality, i.e. when the two lists or collections being joined should be matched using an equality. EquiJoins offer better performance and should be used when possible.

Example

For this example, we use two datasets: a CSV file with information on airports, and a JSON file with information on trips.

The CSV file with the dataset on airports looks like:

"AirportID","Name","City","Country","IATA_FAA","ICAO","Latitude","Longitude","Altitude","Timezone","DST","TZ"
1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",-6.081689,145.391881,5282,10,"U","Pacific/Port_Moresby"
2,"Madang","Madang","Papua New Guinea","MAG","AYMD",-5.207083,145.7887,20,10,"U","Pacific/Port_Moresby"
3,"Mount Hagen","Mount Hagen","Papua New Guinea","HGU","AYMH",-5.826789,144.295861,5388,10,"U","Pacific/Port_Moresby"
4,"Nadzab","Nadzab","Papua New Guinea","LAE","AYNZ",-6.569828,146.726242,239,10,"U","Pacific/Port_Moresby"
...

The JSON file with a information on trips looks like:

[
{ "reason": "Holidays", "origin": "GVA", "destination": "KEF", "dates": ... }
{ "reason": "Work", "origin": "GVA", "destination": "SFO", "dates": ... }
{ "reason": "Holidays", "origin": "GVA", "destination": "LCY", "dates": ... }
]

We want to join both collections using the IATA_FAA field of the CSV file with the destination field of the JSON file. Since we want an equality, we will use an equijoin. The Collection.EquiJoin function receives both collections to join, then a function per collection to obtain the join key, as shown:

let trips = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/trips.json"),
airports = Csv.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/airports.csv")
in Collection.EquiJoin(trips, airports, trip -> trip.destination, airport -> airport.IATA_FAA)

We can also express the same join condition but using Collection.Join instead. The Collection.Join function receives both collections to join, then a function with elements from both collection that defines the join condition, as shown:

let airports = Csv.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/airports.csv"),
trips = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/trips.json")
in Collection.Join(trips, airports, (trip, airport) -> trip.destination == airport.IATA_FAA)
info

If you want to try some examples using joins, you can deploy the following endpoints:

Joining two collections
Learn how to join data from two datasets.

Sample usage:

/trips
Join on-prem with on-cloud databases.
Learn how to join data from two different databases.

Performs an equality join between two tables in two different servers:

  • A table 'trips' in a PostgreSQL database, hostname "example-psql.raw-labs.com".
  • A table 'airports' in a MySQL database, hostname "example-mysql.raw-labs.com".

The table 'trips' looks like:

trip_idreasonorigindestinationdeparture_datearrival_date
0HolidaysGVAKEF2016-02-272016-03-06
1WorkGVASFO2016-04-102016-04-17
2HolidaysGVALCY2016-05-222016-05-29

The table 'airports' looks like:

airport_idnamecitycountryiata_faaicaolatitudelongitudealtitudetimezonedsttz
18ReykjavikReykjavikIcelandRKVBIRK64.130000-21.94055648.0000NAtlantic/Reykjavik
19SiglufjordurSiglufjordurIcelandSIJBISI66.133333-18.91666710.0000NAtlantic/Reykjavik
20VestmannaeyjarVestmannaeyjarIcelandVEYBIVM63.424303-20.278875326.0000NAtlantic/Reykjavik
21Sault Ste MarieSault Sainte MarieCanadaYAMCYAM46.485001-84.509445630.000-5AAmerica/Toronto
22Winnipeg St AndrewsWinnipegCanadaYAVCYAV50.056389-97.032500760.000-6AAmerica/Winnipeg

The fields origin and destination in table trips correspond to the 3-letter IATA code stored in table airports, field iata_faa. So in this example we are creating an equality join on trips.destination and airports.iata_faa.

Sample usage:

/trips