Regular Expression Extended Mode and Comments for ECMAScript

Status

Stage: 1
Champion: Ron Buckton (@rbuckton)

For detailed status of this proposal see TODO, below.

Authors

Ron Buckton (@rbuckton)

Motivations

Extended mode is a feature supported by many regular expression engines. It makes regular expressions easier to read and understand by allowing insignificant white space and comments within a pattern.

Prior Art

Perl: Comments, Line Comments
PCRE: Comments, Line Comments
Boost.Regex: Comments
.NET: Comments, Line Comments
Oniguruma: Comments
Hyperscan: Comments
ICU: Comments, Line Comments
Glib/GRegex: Comments, Line Comments

See https://rbuckton.github.io/regexp-features/features/comments.html and https://rbuckton.github.io/regexp-features/features/line-comments.html for additional information.

Syntax

Flags

Extended mode (`x`)

Prior Art: Perl, PCRE, Boost.Regex, .NET, Oniguruma, Hyperscan, ICU, Glib/GRegex (feature comparison)

The extended mode (x) flag treats unescaped whitespace characters as insignificant, allowing for multi-line regular expressions. It also enables Line Comments.

NOTE: The x-mode flag can be used inside of a Modifier.

NOTE: While the x-mode flag can be used in a RegularExpressionLiteral, it does not permit the use of LineTerminator in RegularExpressionLiteral. For multi-line regular expressions you would need to use the RegExp constructor.

NOTE: Perl's original x-mode treated whitespace as insignificant anywhere within a pattern except within character classes. Perl v5.26 introduced the xx flag, which also ignores non-escaped SPACE and TAB characters within character classes. Should we choose to adopt the x-mode flag, we could opt to treat it as Perl's xx mode at the outset.
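The whitespace rule described above can be approximated in engines that do not yet support the flag. The following is a minimal sketch, not the proposal's algorithm: `stripInsignificantWhitespace` is a hypothetical helper that follows Perl's original x-mode (whitespace inside a character class is kept), and it assumes character classes are not nested.

```javascript
// Hypothetical helper (not part of the proposal): removes unescaped
// whitespace that appears outside of character classes.
function stripInsignificantWhitespace(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\" && i + 1 < source.length) {
      // Keep escape sequences intact, including escaped whitespace.
      out += ch + source[i + 1];
      i++;
    } else if (inClass) {
      // Inside [...], whitespace stays significant.
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (!/\s/.test(ch)) {
      out += ch;
    }
  }
  return out;
}

const spacedSource = String.raw`(foo) (bar) (baz)`;
const emulated = new RegExp(stripInsignificantWhitespace(spacedSource));
console.log(emulated.test("foobarbaz")); // true
```

Because the helper works on the pattern source, it only applies to patterns built with the RegExp constructor.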

Inline Comments

Prior Art: Perl, PCRE, Boost.Regex, .NET, Oniguruma, Hyperscan, ICU, Glib/GRegex (feature comparison)

An inline comment is a sequence of characters that is ignored by pattern matching and can be used to document a pattern.

(?#comment) — The entire expression is removed from the pattern.
- The text of comment may not contain other ( or ) characters (instead, they must be escaped).
- When parsing a RegularExpressionLiteral, the text of comment also may not contain / (unless it is escaped).
  - NOTE: It may be necessary to escape [ and ] as well, unless we are able to change the definition for RegularExpressionBody in the specification so as to permit an unbalanced pair of [ and ] within comment. See Issue #1.

NOTE: This has no conflicts with existing syntax, as ECMAScript currently produces an error for this syntax in both u and non-u modes.
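To illustrate the rules above, here is a rough sketch of removing inline comments from a pattern source before passing it to the RegExp constructor. `stripInlineComments` is a hypothetical name, and the handling of escapes and character classes is an assumption about how such a preprocessor might behave, not specification text.

```javascript
// Hypothetical helper (not part of the proposal): removes (?#...) comments.
function stripInlineComments(source) {
  let out = "";
  let inClass = false;
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\\" && i + 1 < source.length) {
      // Copy escape sequences through unchanged.
      out += ch + source[i + 1];
      i += 2;
    } else if (!inClass && source.startsWith("(?#", i)) {
      i += 3;
      // Skip to the first unescaped ")"; an escaped "\)" stays in the comment.
      while (i < source.length && source[i] !== ")") {
        i += source[i] === "\\" ? 2 : 1;
      }
      if (i >= source.length) {
        throw new SyntaxError("Unterminated (?#...) comment");
      }
      i++; // consume the closing ")"
    } else {
      // Assumption: "(?#" inside a character class is literal text.
      if (ch === "[") inClass = true;
      else if (ch === "]") inClass = false;
      out += ch;
      i++;
    }
  }
  return out;
}

const commented = new RegExp(stripInlineComments("foo(?#comment)bar"));
console.log(commented.test("foobar")); // true
```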

Line Comments

Prior Art: Perl, PCRE, .NET, ICU, Glib/GRegex (feature comparison)

A Line Comment is a sequence of characters starting with # and ending with \n (or the end of the pattern) that is ignored by pattern matching and can be used to document a pattern.

# comment — A line comment in a multi-line RegExp

NOTE: Requires the x-mode flag.

NOTE: Inside of x-mode, the # character must be escaped (using \#) outside of a character class.

NOTE: Line Comments are not supported in a RegularExpressionLiteral, even in x-mode.
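The escaping rule for # can be sketched the same way. `stripLineComments` below is a hypothetical helper, written under the assumption that a comment runs up to (but does not consume) the next "\n" and that # inside a character class is literal.

```javascript
// Hypothetical helper (not part of the proposal): removes "# ..." comments.
function stripLineComments(source) {
  let out = "";
  let inClass = false;
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\\" && i + 1 < source.length) {
      // "\#" is an escaped, literal hash and is copied through.
      out += ch + source[i + 1];
      i += 2;
    } else if (!inClass && ch === "#") {
      // Skip to the end of the line or the end of the pattern.
      while (i < source.length && source[i] !== "\n") i++;
    } else {
      if (ch === "[") inClass = true;
      else if (ch === "]") inClass = false;
      out += ch;
      i++;
    }
  }
  return out;
}

const lineCommented = "[a-z]+ # letters\n\\# # a literal hash";
console.log(JSON.stringify(stripLineComments(lineCommented)));
```

The whitespace left behind would still need to be treated as insignificant for the result to behave like an x-mode pattern.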

Examples

Insignificant Whitespace

const re = /(foo) (bar) (baz)/x;
re.test("foobarbaz"); // true

Comments

const re = /foo(?#comment)bar/;
re.test("foobar"); // true

Line Comments

const re = new RegExp(String.raw`
    # match ASCII alpha-numerics
    [a-zA-Z0-9]
`, "x");

API

Flags

Extended mode (`x`)

API

RegExp.prototype.extended (Boolean) — Indicates the x-mode flag is set.
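Until the flag ships, code may want to detect support before relying on it. The sketch below uses a hypothetical `supportsExtendedFlag` function and relies only on the existing behavior that the RegExp constructor throws a SyntaxError for unrecognized flags; it assumes an engine that accepts "x" implements this proposal's semantics.

```javascript
// Hypothetical feature check (not part of the proposal).
function supportsExtendedFlag() {
  try {
    // Engines without x-mode reject the unknown flag with a SyntaxError.
    new RegExp("", "x");
    return true;
  } catch (e) {
    return false;
  }
}

// Under the proposal, the accessor would exist on RegExp.prototype.
const hasExtendedAccessor = "extended" in RegExp.prototype;
console.log(supportsExtendedFlag(), hasExtendedAccessor);
```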

History

October 27th, 2021 — Proposed for Stage 1 (slides)
- Outcome: Advanced to Stage 1

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

Identified a "champion" who will advance the addition.
Prose outlining the problem or need and the general shape of a solution.
Illustrative examples of usage.
High-level API.

Stage 2 Entrance Criteria

Initial specification text.
Transpiler support (Optional).

Stage 3 Entrance Criteria

Complete specification text.
Designated reviewers have signed off on the current spec text.
The ECMAScript editor has signed off on the current spec text.

Stage 4 Entrance Criteria

Test262 acceptance tests have been written for mainline usage scenarios and merged.
Two compatible implementations which pass the acceptance tests: [1], [2].
A pull request has been sent to tc39/ecma262 with the integrated spec text.
The ECMAScript editor has signed off on the pull request.
