langur

semi-integrated regex

There are currently 2 forms of regex literals, both for re2.

While re2 lacks backreferences and some other features common to regex engines, it guarantees that matching runs in time linear in the size of the input. Backtracking regex engines make no such guarantee.
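The trade-off can be seen in Go's regexp package, which implements RE2 syntax. That langur's re2 patterns behave the same way is an assumption here, and the helper name compiles is hypothetical. A minimal sketch: a pattern with a backreference is rejected when compiled, while a pattern with nested quantifiers is accepted.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's RE2-syntax engine accepts the pattern.
// (Hypothetical helper, for illustration only.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// \1 is a backreference, which RE2 syntax does not support.
	fmt.Println(compiles(`(a+)\1`)) // false
	// Nested quantifiers are accepted; matching time stays linear in the input.
	fmt.Println(compiles(`(a+)+b`)) // true
}
```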

re/pattern/    re2 literal; langur escape codes are interpreted first
RE/pattern/    re2 literal; langur escape codes are not interpreted

As with string literals, the lowercase forms, such as re//, interpret langur escape codes, and the uppercase forms, such as RE//, do not.

Valid quote mark pairs are the same as for strings. Interpolation and langur escape codes also work the same way.

Regex literals can be passed to regex functions, such as match() and replace().

re2 modifiers

The re2 modifiers are s (dot matches newline), m (multi-line), i (case-insensitive), and U (ungreedy).

Langur 0.9.3 adds the x modifier for free-spacing mode, which is not a standard part of re2 so far. In free-spacing mode, you can also write line comments, starting with a # mark.

Modifiers are separated from the re or RE token, and from each other, with a colon.

re:i/abc/ # case insensitive search for "abc"

match(re:s:m:U:i/a.c.*$/, "a\nC\n") == "a\nC"

You can use negated modifiers on a regex literal, such as re:-i/.../. While case-sensitive matching is the default, this can be useful when a regex is interpolated into another regex: it keeps the interpolated section case-sensitive even if the outer section is not.
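For comparison, the same modifiers can be written as inline flags in Go's regexp package, which uses RE2 syntax. This is only a sketch of the underlying re2 behavior, not of langur's implementation; the helper name find is hypothetical. The first call mirrors the match() example above, and the last two show a negated flag keeping an inner group case-sensitive.

```go
package main

import (
	"fmt"
	"regexp"
)

// find returns the leftmost match of pattern in s, or "" if there is none.
// (Hypothetical helper, for illustration only.)
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// s: . matches newline; m: $ matches at line ends;
	// U: ungreedy quantifiers; i: case-insensitive.
	fmt.Printf("%q\n", find(`(?smUi)a.c.*$`, "a\nC\n")) // "a\nC"

	// (?-i:...) keeps the inner group case-sensitive
	// inside an otherwise case-insensitive pattern.
	fmt.Printf("%q\n", find(`(?i)x(?-i:abc)`, "Xabc")) // "Xabc"
	fmt.Printf("%q\n", find(`(?i)x(?-i:abc)`, "XABC")) // ""
}
```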

other modifiers

You can use an any modifier to allow any code point in a string or regex literal.

As of 0.9.3, an esc modifier may be used on an re2 literal to automatically escape all interpolation strings (instead of specifying it on each interpolated value).

regex block quotes

Using a block modifier after an re or RE token, separated by a colon, allows you to use block quotes with a specific marker. A block modifier is always the last modifier on a literal.

Note that this does not make a regex pattern "free-spacing." Use the free-spacing modifier for that.

The ending marker must be on a line of its own, with no trailing spaces. As of 0.9.3, leading spaces and tabs are allowed.

val .re = re:block BLOCK_STRING
some multi-line regex
...
BLOCK_STRING

The line return after the opening marker and the line return before the ending marker are not part of the regex pattern.

escape code resolution

The escape codes of langur will not always match the escape codes of a given regex flavor. For example, the \P code represents a Unicode paragraph separator to langur, but a negated property class to re2 and other regex engines. This is not a conflict, since they are not interpreted together.

Using the lowercase forms, langur escape codes will be interpreted before a pattern is passed to the regex compiler, so that re/\\P{Lu}/ and RE/\P{Lu}/ will produce the same regex.

escaping metacharacters

Using an esc modifier on an interpolation, such as $re/\{.x:esc}/, indicates that you want to escape metacharacters.

You can also use the reEsc() function for re2.

regex functions

The regex matching functions (match(), submatch(), replace(), etc.) understand all regex types available in langur.

For a value to test, these functions accept anything and convert it to a string if necessary (auto-stringification).

in given expressions

In a given expression, a regex that appears in place of a variable or condition, with no explicit operators, is used to test a value against the regex.

Non-string values are converted to strings (auto-stringification) before being compared against the regex.

given .x, .y {
    case re/abc+/: ...
        # both match re2 regex pattern abc+
    case _, re/zzz/: ...
        # .y matches re2 regex pattern zzz
}
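The logic of the two cases above can be spelled out with explicit tests. The following Go sketch is an interpretation based on the comments in the example, not a description of how langur evaluates given expressions; classify and the variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	abc = regexp.MustCompile(`abc+`)
	zzz = regexp.MustCompile(`zzz`)
)

// classify mirrors the two cases: the first requires both values to match abc+,
// the second ignores x and requires only y to match zzz.
// (Hypothetical helper, for illustration only.)
func classify(x, y string) string {
	switch {
	case abc.MatchString(x) && abc.MatchString(y):
		return "both match abc+"
	case zzz.MatchString(y):
		return "y matches zzz"
	default:
		return "no case matched"
	}
}

func main() {
	fmt.Println(classify("abcc", "xabc")) // both match abc+
	fmt.Println(classify("hello", "zzz")) // y matches zzz
	fmt.Println(classify("hello", "abc")) // no case matched
}
```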

given re/a+/ {
    case "abcd": ...
        # "abcd" matches re2 pattern re/a+/
    case re/zzz/: ...
        # re/a+/ same regex as re/zzz/, which it isn't
}