Joins
Learn how to join data.
A collection can be joined with another collection using Collection.Join
or Collection.EquiJoin
.
A list can be joined with another list using List.Join
or List.EquiJoin
.
Join vs EquiJoin
An EquiJoin is a special type of join where the matching condition must be an equality, i.e. when the two lists or collections being joined should be matched using an equality. EquiJoins offer better performance and should be used when possible.
Example
For this example, we use two datasets: a CSV file with information on airports, and a JSON file with information on trips.
The CSV file with the dataset on airports looks like:
"AirportID","Name","City","Country","IATA_FAA","ICAO","Latitude","Longitude","Altitude","Timezone","DST","TZ"
1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",-6.081689,145.391881,5282,10,"U","Pacific/Port_Moresby"
2,"Madang","Madang","Papua New Guinea","MAG","AYMD",-5.207083,145.7887,20,10,"U","Pacific/Port_Moresby"
3,"Mount Hagen","Mount Hagen","Papua New Guinea","HGU","AYMH",-5.826789,144.295861,5388,10,"U","Pacific/Port_Moresby"
4,"Nadzab","Nadzab","Papua New Guinea","LAE","AYNZ",-6.569828,146.726242,239,10,"U","Pacific/Port_Moresby"
...
The JSON file with a information on trips looks like:
[
{ "reason": "Holidays", "origin": "GVA", "destination": "KEF", "dates": ... }
{ "reason": "Work", "origin": "GVA", "destination": "SFO", "dates": ... }
{ "reason": "Holidays", "origin": "GVA", "destination": "LCY", "dates": ... }
]
We want to join both collections using the IATA_FAA
field of the CSV file with the destination
field of the JSON file.
Since we want an equality, we will use an equijoin.
The Collection.EquiJoin
function receives both collections to join, then a function per collection to obtain the join key, as shown:
let trips = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/trips.json"),
airports = Csv.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/airports.csv")
in Collection.EquiJoin(trips, airports, trip -> trip.destination, airport -> airport.IATA_FAA)
We can also express the same join condition but using Collection.Join
instead.
The Collection.Join
function receives both collections to join, then a function with elements from both collection that defines the join condition, as shown:
let airports = Csv.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/airports.csv"),
trips = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/trips.json")
in Collection.Join(trips, airports, (trip, airport) -> trip.destination == airport.IATA_FAA)
If you want to try some examples using joins, you can deploy the following endpoints:
Joining two collections
- Overview
- Code
Sample usage:
/trips
main() =
let
trips = Json.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/trips.json"),
airports = Csv.InferAndRead("https://raw-tutorial.s3.eu-west-1.amazonaws.com/airports.csv")
in
Collection.EquiJoin(trips, airports, (trip) -> trip.destination, (airport) -> airport.IATA_FAA)
// The following test will run if you press the [Run Code] button directly.
main()
Join on-prem with on-cloud databases.
- Overview
- Code
Performs an equality join between two tables in two different servers:
- A table 'trips' in a PostgreSQL database, hostname "example-psql.raw-labs.com".
- A table 'airports' in a MySQL database, hostname "example-mysql.raw-labs.com".
The table 'trips' looks like:
trip_id | reason | origin | destination | departure_date | arrival_date |
---|---|---|---|---|---|
0 | Holidays | GVA | KEF | 2016-02-27 | 2016-03-06 |
1 | Work | GVA | SFO | 2016-04-10 | 2016-04-17 |
2 | Holidays | GVA | LCY | 2016-05-22 | 2016-05-29 |
The table 'airports' looks like:
airport_id | name | city | country | iata_faa | icao | latitude | longitude | altitude | timezone | dst | tz |
---|---|---|---|---|---|---|---|---|---|---|---|
18 | Reykjavik | Reykjavik | Iceland | RKV | BIRK | 64.130000 | -21.940556 | 48.000 | 0 | N | Atlantic/Reykjavik |
19 | Siglufjordur | Siglufjordur | Iceland | SIJ | BISI | 66.133333 | -18.916667 | 10.000 | 0 | N | Atlantic/Reykjavik |
20 | Vestmannaeyjar | Vestmannaeyjar | Iceland | VEY | BIVM | 63.424303 | -20.278875 | 326.000 | 0 | N | Atlantic/Reykjavik |
21 | Sault Ste Marie | Sault Sainte Marie | Canada | YAM | CYAM | 46.485001 | -84.509445 | 630.000 | -5 | A | America/Toronto |
22 | Winnipeg St Andrews | Winnipeg | Canada | YAV | CYAV | 50.056389 | -97.032500 | 760.000 | -6 | A | America/Winnipeg |
The fields origin
and destination
in table trips
correspond to the 3-letter IATA code stored in table airports
, field iata_faa
.
So in this example we are creating an equality join on trips.destination
and airports.iata_faa
.
Sample usage:
/trips
main() =
let
trips = PostgreSQL.InferAndRead(
"raw",
"example",
"trips",
host = "example-psql.raw-labs.com",
username = "**************",
password = "**************"
),
airports = MySQL.InferAndRead(
"raw",
"airports",
host = "example-mysql.raw-labs.com",
username = "**************",
password = "**************"
)
in
Collection.EquiJoin(trips, airports, (trip) -> trip.destination, (airport) -> airport.iata_faa)
// The following test will run if you press the [Run Code] button directly.
main()