Characters and Strings

Construction

Type

Description

'a'

char

A single character.

"abc"

string

A single-line string.

"""abc"""

string

A multiline string.

char(<int>)

char

Takes an integer and returns the character with the corresponding Unicode code point.

string(<char>)

string

Constructs a string of length 1 with the single character.

Strings or characters can include the following special characters:

  • the escaped special characters \0 (null character), \ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \” (double quote) and \’ (single quote)

  • an arbitrary Unicode scalar, written as \u{n}, where n is a 1-8 digit hexadecimal number with a value equal to a valid Unicode code point.

Attention

Character type is not currently implemented.

Concatenation: +

Concatenates two strings or a string with a value that can be casted to string.

Syntax:

<expression> + <expression>

For example:

"ab" + "cd" // "abcd"
"ab" + 'c' // "abc"
'a' + "bc" // "abc"
"ab" + 123 // "ab123"
"ab" + null // null
null + "ab" // null

CONCAT

Concatenates strings.

Similar to + except that null values are equivalent to an empty string.

Syntax:

CONCAT(<expression1>, <expression2> [..., <expression n>])

For example:

concat("Hello", " ", "World") // "Hello World"
concat("Hello", null, " ", "Jane") // "Hello Jane"
concat("Hello", null, null, null) // "Hello"

TRIM

Removes white space characters from both sides of a string.

Syntax:

trim(<string>)

For example:

trim("  123 ") // "123"
trim("123 ") // "123"
trim(null) // null

LTRIM

Removes white space characters from beginning, left side of a string.

Syntax:

ltrim(<string>)

For example:

ltrim("  123 ") // "123 "
ltrim("123 ") // "123 "
ltrim(null) // null

RTRIM

Removes white space characters from end, right side of a string.

Syntax:

rtrim(<string>)

For example:

rtrim("  123 ") // "  123"
rtrim("  123") // "  123"
rtrim(null) // null

REPLACE

Replace a substring or a regular expression pattern, by a new string.

Syntax:

replace(<string>, <regex string to replace>, <new string>)

For example:

replace("Hello John", "John", "Jane") // "Hello Jane"
replace("Hello John", "o", "+") // "Hell+ J+hn!"
replace("Hello John", "John", null) // null
replace("Hello John", null, "Jane") // null
replace(null, "A", "B") // null

REVERSE

Reverse a string.

Syntax:

reverse(<string>)

For example:

reverse("1234") // "4321"
reverse(null) // null

REPLICATE

Copy a string or character N times.

Syntax:

replicate(<string>, <int>)

For example:

replicate('x', 4) // "xxxx"
replicate("abc,", 2) // "abc,abc,"

UPPER

Convert a string to uppercase.

Syntax:

upper(<string>)

For example:

upper("abC") // "ABC"
upper(null) // null

LOWER

Convert a string to lowercase.

Syntax:

lower(<string>)

For example:

lower("abC") // "abc"
lower(null) // null

SPLIT

Split a string into a collection of strings given a character or a string as separator.

Syntax:

split(<string>, <string or char separator>)

For example:

split("Value1||Value2", "||") // ["Value1","Value2"]
split("Value1|Value2", '|') // ["Value1","Value2"]
split("Value", null) // null
split(null, "1") // null

LENGTH

Also known as LEN.

Get the size of a string.

Syntax:

len(<string>)
length(<string>)

For example:

length("Hello John") // 10
len("Hello") // Same as length("Hello")
len(null) // null

SUBSTRING

Also known as SUBSTR.

Extract a substring from a string, given the start index and length of extraction.

Index starts at 1. A negative number as the length means the remainder of the string.

Syntax:

substr(<string>, <int index>, <int size>)
substring(<string>, <int index>, <int size>)

For example:

substring("Hello John", 7, 4) // "John"
substring("Hello John", 7, -1) // "John"
substr("Hello John", 7, 4) // Same as substring("Hello John", 7, 4)
substr(null, 10, 10) // null

LIKE

Compares a string with a SQL-style pattern or to a regular expression.

A SQL-like pattern is a string where % represents a wildcard.

Syntax:

<string> LIKE <string or regex>

For example:

"Compare two strings" LIKE "%two%" // true
"Compare two strings" NOT LIKE "%two%" // false
"Compare two strings" LIKE r"""\w+\s+.*""" // true

MKSTRING

Merges a collection of strings into a single string, with a separator. Null strings are skipped. Null collections lead to a null result.

Syntax:

MKSTRING(<collection of strings>, <separator string>)

For example:

MKSTRING(["string1", "string2"], " - ") // "string1 - string2"
MKSTRING(["string1"], " - ")            // "string1"
MKSTRING(["12", null, "34"], ";")       //  "12;34"

MKSTRING can be applied any expression, for example a SELECT statement evaluating to a collection of strings.

MKSTRING(countries UNION ALL cities, ";")
MKSTRING((SELECT name FROM people), ";")

Attention

Keep in mind the possibility of a parser ambiguity when passing a complex expression as a parameter to MKSTRING: in the second example above, if not having put parenthesis around the inner SELECT, the separator parameter of MKSTRING would have been parsed as the second generator expression of the SELECT.

COUNT_SUBSTRING

Count the number of occurrences of a substring in a string.

Syntax:

COUNT_SUBSTRING(<string>, <substring>)
COUNT_SUBSTR(<string>, <substring>)

For example:

COUNT_SUBSTRING("aXbX", "X")    // 2
COUNT_SUBSTRING("aXbX", "y")    // 0
COUNT_SUBSTRING("aXbX", "aX")   // 1

LEVENSHTEIN_DISTANCE

Calculates the Levenshtein distance between two strings.

The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

Syntax:

LEVENSHTEIN_DISTANCE(<string>, <string>)

For example:

ORD

ORD takes a character and returns the corresponding Unicode code point value.

Syntax:

ord(<char>)

For example:

ord('a') // 65

Index and Range operations

Index and range operations are used to extract characters of strings from strings.

Index operations have a variant with suffix ? that returns null instead of a runtime error for out-of-bounds accesses.

Syntax:

<string>[<int index>]
<string>[<start index>:<end index>]
<string>[<int index>]?

For example:

"abcd"[1] // 'b'
"abcd"[1:3] // "bc"
"abcd"[-1] // 'd'
"abcd"[5] // Runtime error!
"abcd"[5]? // null

Attention

Index and range operations not currently implemented.

Comparisons

Strings

Comparison binary operators (e.g. =, <>, >, <, …), do a lexicographical string comparison.

For example:

"A" > "a" // true
"aab" > "aaa" // true
"a" > null // null

Characters

Character type supports the comparison binary operators, which compares the underlying Unicode code point value:

'b' > 'a' // true
null > 'a' // null

Casts

Strings can be casted to:

  • number types. In the case of decimal this can be used to convert a user constant (string) into a lossless number.

  • temporal types.

Characters can be casted to strings, which creates a string with a single character.

Attention

Casting strings to temporal types is currently not implemented.