Regular Expressions

Construction

Example

Remarks

r”<escaped regex syntax>”

r”\w+”

Match a word. Note the \\ to escape the \.

r”””<unescaped regex syntax>”””

r”””\w+”””

Match a word. No need to escape \ since inside triple quotes.

Syntax Regular expression constructs are:

Description

\w

Word

\W

Not a word.

\s

Space.

\S

Not a space.

\d

Digit.

\D

Not a digit.

\t

Tab.

.

Match any character.

<regex>*

Match any number of times.

<regex>?

Match or not.

<regex>+

Match any number of times but at least once.

<regex>{<n>}

Repeat n times.

^

Begin of line.

$

End of line.

[<x>-<y>]

Interval from x to y.

<regex> | <regex>

Or.

TRY_PARSE

Parses a string with a regular expression. Returns either a single element or a record of strings with the number of groups in the regular expression. If the regular expression fails to match, returns null.

Syntax:

PARSE(<string>, <regex>)

For example:

TRY_PARSE("ab cd", r"""(\w+).*""") // "ab"
TRY_PARSE("ab cd", r"""(\w+)\s+(\w+).*""") // ("ab", "cd"), equivalent to (_1: "ab", _2: "cd")

PARSE

Similar to TRY_PARSE but fail with a runtime error if the regular expression fails to match.

Syntax:

PARSE(<string>, <regex>)

PARSE AS

PARSE AS parsers a collection of strings.

Syntax:

Description

<strings> PARSE AS <regex>

Save as NULL ON FAIL.

<strings> PARSE AS <regex> NULL ON FAIL

Sets the value to null for those strings that fail to match the regular expression.

<strings> PARSE AS <regex> SKIP ON FAIL

Skips those strings that fail to match the regular expression.

<strings> PARSE AS <regex> ERROR ON FAIL

Fails with runtime error if any string fails to match the regular expression.

<strings> PARSE AS <regex> <exp> ON FAIL

Replaces the value by <exp> for those strings that fail to match the regular expression.

For example:

["one word", "two words"] PARSE AS r"""(\w+).*""" \\ [("one"), ("two")]

Note

PARSE AS and INTO are often used together as in:

SELECT *
FROM text_file PARSE AS r"""(\w+)\s+(\d+)"""
INTO (first: _1, second: CAST(_2 AS int))

This parses a text_file into an unnamed record with two elements, which is then converted into a named record with field second of type int.