20. Debugging Queries

In a previous notebook, we showed how to write queries against a relational database system. We mentioned that some operations are pushed down to the source system. But how can we be sure?

This notebook shows how to debug queries in RAW.

This requires us to use the Python API directly.

[1]:
from rawapi import new_raw_client

cli = new_raw_client()

Let’s redo the query that reads data from a PostgreSQL database in the Python API.

We need to use the query_as method. The default query method returns directly Python objects and as we shall see, we need to obtain a stream.

[6]:
result = cli.query_as("""
SELECT * FROM READ_PGSQL("psql01", "public", "movies") WHERE startyear=1935 LIMIT 10
""", "csv")

The result object is an input stream of data that can be consumed.

However, it has an extra field, called query_id.

[10]:
result.query_id
[10]:
'b876bed3-ab38-427e-8632-f1319b13c26f-12'

This is a unique identifier for the query.

It can be used to obtain execution logs of that query.

[11]:
cli.get_query_logs(result.query_id)
[11]:
{'id': 'b876bed3-ab38-427e-8632-f1319b13c26f-12',
 'query': '\nSELECT * FROM READ_PGSQL("psql01", "public", "movies") WHERE startyear=1935 LIMIT 10\n',
 'createdAt': [2021, 1, 10, 16, 14, 1, 495000000],
 'state': None,
 'detail': 'output-format: csv',
 'entries': [{'level': 'INFO',
   'message': 'Finished writing output in 460 ms',
   'timestamp': [2021, 1, 10, 16, 14, 2, 343000000]},
  {'level': 'INFO',
   'message': 'Starting to write output...',
   'timestamp': [2021, 1, 10, 16, 14, 1, 880000000]},
  {'level': 'DEBUG',
   'message': 'Finished preparing query execution in 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 878000000]},
  {'level': 'DEBUG',
   'message': 'Preparing query execution',
   'timestamp': [2021, 1, 10, 16, 14, 1, 876000000]},
  {'level': 'DEBUG',
   'message': 'Finished query compilation in 369 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 875000000]},
  {'level': 'DEBUG',
   'message': 'Finished query compilation in 355 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 873000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for emit: 336 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 870000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for emit: 59 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 857000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for emit: 8 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 821000000]},
  {'level': 'DEBUG',
   'message': 'Query to be pushed down is:\n"""SELECT  "_$248"."tconst" AS "tconst","_$248"."startyear" AS "startyear","_$248"."primarytitle" AS "primarytitle","_$248"."originaltitle" AS "originaltitle","_$248"."primaryname" AS "primaryname" FROM """ + _$247 + """ AS "_$248" WHERE ("_$248"."startyear" = 1935)    LIMIT 10"""',
   'timestamp': [2021, 1, 10, 16, 14, 1, 813000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for transpile: 7 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 811000000]},
  {'level': 'TRACE',
   'message': 'Output program is:\npgsql [_$247 : table_ref(\n    table(\n        `tconst`: varchar null,\n        `startyear`: integer null,\n        `primarytitle`: varchar null,\n        `originaltitle`: varchar null,\n        `primaryname`: varchar null))] select(\n    integer("10"),\n    all,\n    [_$248 <- scan(_$247)],\n    [=(_$248.`startyear`, integer("1935"))],\n    [],\n    [],\n    [tconst : _$248.`tconst`,startyear : _$248.`startyear`,primarytitle : _$248.`primarytitle`,originaltitle : _$248.`originaltitle`,primaryname : _$248.`primaryname`],\n    [])',
   'timestamp': [2021, 1, 10, 16, 14, 1, 809000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for RDBMSOptimizer: 3 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 807000000]},
  {'level': 'DEBUG',
   'message': 'Compiler cache miss/expired: will compile.',
   'timestamp': [2021, 1, 10, 16, 14, 1, 801000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for transpile: 254 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 796000000]},
  {'level': 'TRACE',
   'message': 'Output program is:\nto_iterator(\n    iterator_to_operator(\n        (_$279) <- pmap(\n            parallelize_iterator(\n                get_cache(\n                    "cache:a2e9db36d566b8a7d9b9aff9cc2fc7c3f666433cd2031f9eed1af6fc88202950",\n                    cons(none(interval)),\n                    iterator_to_operator(\n                        (_$276) <- pspawn(\n                            operator_to_iterator(\n                                (_$276) <- write_cache(\n                                    to_operator(\n                                        (_$270) <- with_retry_strategy(\n                                            1,\n                                            some(\n                                                interval(none, none, none, none, none, none, none, 50)),\n                                            with_cache_strategy(\n                                                none(interval),\n                                                read_sql(\n                                                    get_database_connection(\n                                                        "pgsql:psql01"),\n                                                    ["\\"public\\".\\"movies\\""],\n                                                    "pgsql [_$247 : table_ref(\\n    table(\\n        `tconst`: varchar null,\\n        `startyear`: integer null,\\n        `primarytitle`: varchar null,\\n        `originaltitle`: varchar null,\\n        `primaryname`: varchar null))] limit(\\n    project(\\n        _$rdbms3 <- project(\\n            _$rdbms2 <- filter(\\n                _$rdbms1 <- project(\\n                    _$248 <- scan(_$247),\\n                    tconst : _$248.`tconst`,\\n                    startyear : _$248.`startyear`,\\n                    primarytitle : _$248.`primarytitle`,\\n                    originaltitle : _$248.`originaltitle`,\\n                    primaryname : _$248.`primaryname`),\\n                =(_$rdbms1.`startyear`, integer(\\"1935\\"))),\\n            tconst : _$rdbms2.`tconst`,\\n            startyear : _$rdbms2.`startyear`,\\n            primarytitle : _$rdbms2.`primarytitle`,\\n            originaltitle : _$rdbms2.`originaltitle`,\\n            primaryname : _$rdbms2.`primaryname`),\\n        tconst : _$rdbms3.`tconst`,\\n        startyear : _$rdbms3.`startyear`,\\n        primarytitle : _$rdbms3.`primarytitle`,\\n        originaltitle : _$rdbms3.`originaltitle`,\\n        primaryname : _$rdbms3.`primaryname`),\\n    integer(\\"10\\"))",\n                                                    record(\n                                                        tconst:option(\n                                                            string),\n                                                        startyear:option(\n                                                            int),\n                                                        primarytitle:option(\n                                                            string),\n                                                        originaltitle:option(\n                                                            string),\n                                                        primaryname:option(\n                                                            string))))))),\n                                [])),\n                        []))),\n            \\(_$277, _$278) -> operator_to_iterator(\n                (_$279) <- read_cache(\n                    iterator_to_operator((_$280) <- _$278, []),\n                    ,\n                    []),\n                [])),\n        []))',
   'timestamp': [2021, 1, 10, 16, 14, 1, 794000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1PSimplifier: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 788000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1ToL1P: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 786000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1Simplifier2: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 784000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1FilterPushdown: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 782000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1ColumnPruning: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 781000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1AutoAlgebraCache: 15 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 779000000]},
  {'level': 'TRACE',
   'message': 'Caching operator:\nto_operator(\n    (_$270) <- with_retry_strategy(\n        1,\n        some(interval(none, none, none, none, none, none, none, 50)),\n        with_cache_strategy(\n            none(interval),\n            read_sql(\n                get_database_connection("pgsql:psql01"),\n                ["\\"public\\".\\"movies\\""],\n                "pgsql [_$247 : table_ref(\\n    table(\\n        `tconst`: varchar null,\\n        `startyear`: integer null,\\n        `primarytitle`: varchar null,\\n        `originaltitle`: varchar null,\\n        `primaryname`: varchar null))] limit(\\n    project(\\n        _$rdbms3 <- project(\\n            _$rdbms2 <- filter(\\n                _$rdbms1 <- project(\\n                    _$248 <- scan(_$247),\\n                    tconst : _$248.`tconst`,\\n                    startyear : _$248.`startyear`,\\n                    primarytitle : _$248.`primarytitle`,\\n                    originaltitle : _$248.`originaltitle`,\\n                    primaryname : _$248.`primaryname`),\\n                =(_$rdbms1.`startyear`, integer(\\"1935\\"))),\\n            tconst : _$rdbms2.`tconst`,\\n            startyear : _$rdbms2.`startyear`,\\n            primarytitle : _$rdbms2.`primarytitle`,\\n            originaltitle : _$rdbms2.`originaltitle`,\\n            primaryname : _$rdbms2.`primaryname`),\\n        tconst : _$rdbms3.`tconst`,\\n        startyear : _$rdbms3.`startyear`,\\n        primarytitle : _$rdbms3.`primarytitle`,\\n        originaltitle : _$rdbms3.`originaltitle`,\\n        primaryname : _$rdbms3.`primaryname`),\\n    integer(\\"10\\"))",\n                record(\n                    tconst:option(string),\n                    startyear:option(int),\n                    primarytitle:option(string),\n                    originaltitle:option(string),\n                    primaryname:option(string))))))',
   'timestamp': [2021, 1, 10, 16, 14, 1, 776000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1Simplifier: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 761000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L1Inliner: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 759000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L3ToL1: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 756000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for RuleBasedCompOptimizer: 1 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 754000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for CompQualifiersReorderer: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 751000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for TableReaderPruner: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 749000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L3Simplifier: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 746000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L3Inliner: 2 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 744000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ElimUnusedLets: 3 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 739000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L4ToL3: 6 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 733000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for WhenDesugarer: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 725000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for SparkSqlPushdown: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 724000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for RDBMSPushdown: 21 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 717000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for FunDefsToBinds: 5 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 693000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for AnonFromsDesugarer3: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 686000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for AggSimplifier: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 680000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for SelectDesugarer: 3 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 674000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for AnonColumnsDesugarer2: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 669000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for AnonFromsDesugarer2: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 667000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L4Inliner: 5 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 661000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L4Globals: 5 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 653000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ReorderDecls: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 646000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for SelectProjectDesugarer: 3 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 640000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for SplitSelectProjectJoinFilter: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 635000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for StarDesugarer: 8 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 633000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for TableDesugarer: 3 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 623000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for AnonFromsDesugarer1: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 617000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L4AggDesugarer: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 612000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for AnonColumnsDesugarer1: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 610000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for FixNonCollectionFroms: 3 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 607000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for L4Simplifier: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 602000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for TypeAliasesDesugarer: 2 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 600000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ReadersDesugarer: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 597000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ResolveOverloads2: 4 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 591000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ResolvePreImports: 34 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 582000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ResolveImports: 1 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 546000000]},
  {'level': 'TRACE',
   'message': 'Phase timing for ResolveOverloads: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 543000000]},
  {'level': 'DEBUG',
   'message': 'Compiler cache miss/expired: will compile.',
   'timestamp': [2021, 1, 10, 16, 14, 1, 539000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for transpile: 0 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 531000000]},
  {'level': 'DEBUG',
   'message': 'Compiler cache miss/expired: will compile.',
   'timestamp': [2021, 1, 10, 16, 14, 1, 529000000]},
  {'level': 'DEBUG',
   'message': 'Starting query compilation',
   'timestamp': [2021, 1, 10, 16, 14, 1, 516000000]},
  {'level': 'TRACE',
   'message': 'Compiler timing for parseAndValidate: 8 ms',
   'timestamp': [2021, 1, 10, 16, 14, 1, 514000000]},
  {'level': 'DEBUG',
   'message': 'Starting query compilation',
   'timestamp': [2021, 1, 10, 16, 14, 1, 504000000]}]}

Let’s search for a specific message:

[14]:
for entry in cli.get_query_logs(result.query_id)['entries']:
    if 'Query to be pushed down' in entry['message']: print(entry['message'])
Query to be pushed down is:
"""SELECT  "_$248"."tconst" AS "tconst","_$248"."startyear" AS "startyear","_$248"."primarytitle" AS "primarytitle","_$248"."originaltitle" AS "originaltitle","_$248"."primaryname" AS "primaryname" FROM """ + _$247 + """ AS "_$248" WHERE ("_$248"."startyear" = 1935)    LIMIT 10"""

Here we can see the exact query that was sent to the PostgreSQL server, confirming that the filter was indeed pushed down.

The query log is also available in the Web Admin UI:

Query%20Log.png
[ ]: