Regular Expression Buffer Boundaries for ECMAScript

Status

Stage: 1
Champion: Ron Buckton (@rbuckton)

For the detailed status of this proposal, see TODO, below.

Authors

Motivations

NOTE: See https://github.com/rbuckton/proposal-regexp-features for an overview of how this proposal fits into other possible future features for Regular Expressions.

Buffer boundaries are supported by many regular expression engines. They match the start or end of the entire input, regardless of whether the m (multiline) flag is set. Combined with the m flag, they also let a single RegExp match both line boundaries (using ^ and $) and input boundaries (using buffer boundaries).

Prior Art

See https://rbuckton.github.io/regexp-features/features/buffer-boundaries.html for additional information.

Syntax

Buffer boundaries are similar to the ^ and $ anchors, except that they are not affected by the m (multiline) flag:

  • \A — Matches the start of the input.
  • \z — Matches the end of the input.
  • \Z — Matches the end of the input, or the position immediately before a single optional newline at the end of the input. Equivalent to (?=\R?\z).
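
For comparison, the behavior described above can be approximated in engines that do not implement this proposal by using lookaround assertions. The following is only a rough sketch: the names START, END, NEWLINE, and END_OR_FINAL_NEWLINE are hypothetical, and the set of characters treated as a newline is an assumption, since \R is defined by a separate proposal.

```javascript
// Approximations using syntax available since ES2018 (lookbehind support).
// All constant names here are illustrative, not part of the proposal.
const START = String.raw`(?<![\s\S])`;  // no character before: behaves like \A
const END = String.raw`(?![\s\S])`;     // no character after: behaves like \z

// Assumed stand-in for \R; the real definition belongs to another proposal.
const NEWLINE = String.raw`(?:\r\n|[\n\r\u2028\u2029])`;

// At most one newline, then the end of the input: behaves like \Z.
const END_OR_FINAL_NEWLINE = `(?=${NEWLINE}?${END})`;

// The m flag changes ^ and $, but not these lookaround-based boundaries.
const exact = new RegExp(`${START}foo${END}`, "um");
const exactAlone = exact.test("foo");          // true
const exactMultiline = exact.test("foo\nbar"); // false

const trailing = new RegExp(`end${END_OR_FINAL_NEWLINE}`, "u");
const oneNewline = trailing.test("end\n");     // true
const twoNewlines = trailing.test("end\n\n");  // false
```

The lookaround forms are harder to read and easy to get subtly wrong, which is part of what dedicated \A, \z, and \Z syntax would avoid.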

NOTE: Buffer boundaries require the u or v flag, because without either flag \A, \z, and \Z are currently identity escapes that match the literal characters A, z, and Z.

NOTE: Not supported inside of a character class.
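
The two notes above can be observed in current engines. The sketch below shows the existing behavior without the u flag and one possible way to feature-detect the new syntax; the compiles helper is hypothetical and not part of the proposal. It also assumes that "not supported" in a character class means the pattern stays a SyntaxError in Unicode mode, as it is today.

```javascript
// Without the u or v flag, \A and \z are identity escapes for literal letters.
const legacy = /\Afoo\z/;
const legacyMatchesLetters = legacy.test("Afooz"); // true
const legacyMatchesFoo = legacy.test("foo");       // false

// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Feature detection: expected to be true only where the proposal is implemented.
const supportsBufferBoundaries = compiles(String.raw`\Afoo\z`, "u");

// In Unicode mode, \A inside a character class is a SyntaxError today.
const classEscapeAllowed = compiles(String.raw`[\A]`, "u"); // false
```

Because the u and v flags already reject unknown escapes, adopting \A, \z, and \Z under those flags does not change the meaning of any pattern that compiles today.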

For more information about the v flag, see https://github.com/tc39/proposal-regexp-set-notation.

For more information about the \R escape sequence, see https://github.com/rbuckton/proposal-regexp-r-escape.

Examples

// without buffer boundaries
const pattern = String.raw`^foo$`;
const re1 = new RegExp(pattern, "u");
re1.test("foo"); // true
re1.test("foo\nbar"); // false

const re2 = new RegExp(pattern, "um");
re2.test("foo"); // true
re2.test("foo\nbar"); // true

// with buffer boundaries
const bufferPattern = String.raw`\Afoo\z`;
const re3 = new RegExp(bufferPattern, "u");
re3.test("foo"); // true
re3.test("foo\nbar"); // false

const re4 = new RegExp(bufferPattern, "um");
re4.test("foo"); // true
re4.test("foo\nbar"); // false

// mixing buffer boundaries and anchors
const re = /\Afoo|^bar$|baz\z/um;
re.test("foo");         // true
re.test("foo\n");       // true
re.test("\nfoo");       // false

re.test("bar");         // true
re.test("bar\n");       // true
re.test("\nbar");       // true

re.test("baz");         // true
re.test("baz\n");       // false
re.test("\nbaz");       // true

// trailing buffer boundary
const reTrailing = /end\Z/u;
reTrailing.test("end");         // true
reTrailing.test("end\n");       // true (optional newline)
reTrailing.test("end\n\n");     // false

History

  • October 28, 2021 — Proposed for Stage 1 (slides)
    • Outcome: Advanced to Stage 1

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

  • Identified a "champion" who will advance the addition.
  • Prose outlining the problem or need and the general shape of a solution.
  • Illustrative examples of usage.
  • High-level API.

Stage 2 Entrance Criteria

Stage 3 Entrance Criteria

Stage 4 Entrance Criteria

  • Test262 acceptance tests have been written for mainline usage scenarios and merged.
  • Two compatible implementations which pass the acceptance tests: [1], [2].
  • A pull request has been sent to tc39/ecma262 with the integrated spec text.
  • The ECMAScript editor has signed off on the pull request.