Legacy RegExp features in JavaScript

Current status

ECMAScript proposal at stage 3 of the TC39 process; see https://github.com/tc39/proposals

Introduction

This is a specification draft for the legacy (deprecated) RegExp features in JavaScript, i.e., static properties of the constructor like RegExp.$1 as well as the RegExp.prototype.compile method.

This does not reflect what current implementations do, but what the editor believes is the least bad behavior they should adopt in order to maintain web compatibility.

RegExp static properties (currently not part of ECMA-262; see tc39/ecma262#137) are specified such that:

  • The values returned by those properties are updated each time a successful match is done.
  • They may be deleted. (This is important for secure environments that want to avoid global side effects.)
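Engines that already ship these properties (for example V8, the engine used by Node.js) exhibit the first behavior today, so it can be observed directly. The snippet below is a sketch of existing engine behavior rather than of this draft; deletion and the subclass restriction are not exercised.

```javascript
// Observes the legacy statics in an engine that provides them (e.g. V8 / Node.js).
const re = /(\d+)-(\d+)/;

re.exec("a 12-34 b"); // successful match: the statics are updated
const afterMatch = {
  $1: RegExp.$1,
  $2: RegExp.$2,
  lastMatch: RegExp.lastMatch,
  leftContext: RegExp.leftContext,
  rightContext: RegExp.rightContext,
};

/zzz/.exec("abc"); // failed match: the statics keep their previous values
const afterFailure = { $1: RegExp.$1 };

console.log(afterMatch, afterFailure);
```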

The proposal includes another feature that needs consensus and implementation experience before being specified:

  • RegExp legacy static properties as well as RegExp.prototype.compile are disabled for instances of proper subclasses of RegExp as well as for cross-realm regexps. See the detailed motivation here.

We have attempted to identify potential risks of the backward-compatibility break introduced by that feature.

See also the differences between this spec and the current implementations.


The amendments are relative to the latest ECMAScript specification draft found at https://tc39.github.io/ecma262/. Changes relative to existing algorithms are marked in bold.

All the amendments are part of Annex B, including those that modify objects or algorithms defined in other parts of the spec.

%RegExp%

The %RegExp% intrinsic object, which is the built-in RegExp constructor, has the following additional internal slots:

  • [[RegExpInput]]
  • [[RegExpLastMatch]]
  • [[RegExpLastParen]]
  • [[RegExpLeftContext]]
  • [[RegExpRightContext]]
  • [[RegExpParen1]]
  • [[RegExpParen2]]
  • [[RegExpParen3]]
  • [[RegExpParen4]]
  • [[RegExpParen5]]
  • [[RegExpParen6]]
  • [[RegExpParen7]]
  • [[RegExpParen8]]
  • [[RegExpParen9]]

The initial value of all these internal slots is the empty String.

RegExpAlloc ( newTarget )

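RegExp instances have two additional internal slots: [[Realm]], which records the realm in which the regexp was created, and [[LegacyFeaturesEnabled]], which records whether it was created directly by that realm's %RegExp% constructor. They are used for deciding whether a nonstandard legacy feature is enabled for that regexp. The RegExpAlloc abstract operation is modified as follows: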

  1. Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExpPrototype%", «[[RegExpMatcher]], [[OriginalSource]], [[OriginalFlags]], [[Realm]], **[[LegacyFeaturesEnabled]]**»).
  2. Let thisRealm be the current Realm Record.
  3. Set the value of obj’s [[Realm]] internal slot to thisRealm.
  4. If SameValue(newTarget, thisRealm.[[Intrinsics]].[[%RegExp%]]) is true, then
    1. Set the value of obj’s [[LegacyFeaturesEnabled]] internal slot to true.
  5. Else,
    1. Set the value of obj’s [[LegacyFeaturesEnabled]] internal slot to false.
  6. Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor {[[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false}).
  7. Return obj.
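The realm and subclass checks in the modified RegExpAlloc can be modelled over plain objects. This is a minimal sketch under stated assumptions: makeRealm, regExpAlloc and the realm, intrinsics and legacyFeaturesEnabled properties are hypothetical stand-ins for Realm Records and internal slots, not engine APIs.

```javascript
// Hypothetical realm record: only the realm's %RegExp% intrinsic matters here.
function makeRealm(name) {
  return { name, intrinsics: { RegExp: function RegExpIntrinsic() {} } };
}

// Sketch of the modified RegExpAlloc: steps 2-5 record the current realm and
// enable legacy features only when newTarget is that realm's %RegExp%.
function regExpAlloc(newTarget, thisRealm) {
  // Simplified OrdinaryCreateFromConstructor: the prototype comes from newTarget.
  const obj = Object.create(newTarget.prototype);
  obj.realm = thisRealm;
  obj.legacyFeaturesEnabled = Object.is(newTarget, thisRealm.intrinsics.RegExp);
  // Step 6: "lastIndex" is writable, non-enumerable and non-configurable.
  Object.defineProperty(obj, "lastIndex", {
    writable: true,
    enumerable: false,
    configurable: false,
  });
  return obj;
}

const realmA = makeRealm("A");
const realmB = makeRealm("B");
class SubRegExp extends realmA.intrinsics.RegExp {}

const direct = regExpAlloc(realmA.intrinsics.RegExp, realmA); // legacy features on
const viaSubclass = regExpAlloc(SubRegExp, realmA); // off: proper subclass
const viaForeignTarget = regExpAlloc(realmB.intrinsics.RegExp, realmA); // off: other realm's %RegExp%
```

Comparing newTarget by identity (SameValue) is what makes both proper subclasses and another realm's constructor fall into the disabled branch.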

RegExpBuiltInExec ( R, S )

In the RegExpBuiltInExec abstract operation, a hook is added for updating the static properties of %RegExp% after a successful match. The last three steps of the algorithm are modified as follows:

  1. ...
  2. (current step 23) Perform ! CreateDataProperty(A, "0", matchedSubstr).
  3. Let capturedValues be a new empty List.
  4. (current step 24) For each integer i such that i > 0 and i ≤ n
    1. ...
    2. (current step 24.e) Perform ! CreateDataProperty(A, ToString(i), capturedValue).
    3. Append capturedValue to the end of capturedValues.
  5. Let thisRealm be the current Realm Record.
  6. Let rRealm be the value of R’s [[Realm]] internal slot.
  7. If SameValue(thisRealm, rRealm) is true, then
    1. If the value of R’s [[LegacyFeaturesEnabled]] internal slot is true, then
      1. Perform UpdateLegacyRegExpStaticProperties(%RegExp%, S, lastIndex, e, capturedValues).
    2. Else,
      1. Perform InvalidateLegacyRegExpStaticProperties(%RegExp%).
  8. (current step 25) Return A.
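The branching added at the end of RegExpBuiltInExec reduces to a small decision. In this sketch, legacyHookFor and the realm and legacyFeaturesEnabled properties are hypothetical stand-ins for the current Realm Record and the [[Realm]] and [[LegacyFeaturesEnabled]] slots.

```javascript
// Returns which hook the modified RegExpBuiltInExec runs after a successful
// match: "update", "invalidate", or "none" when R comes from another realm.
function legacyHookFor(R, thisRealm) {
  if (!Object.is(thisRealm, R.realm)) {
    return "none"; // step 7 has no Else branch: the statics are left untouched
  }
  return R.legacyFeaturesEnabled ? "update" : "invalidate";
}

const realmA = { name: "A" };
const realmB = { name: "B" };
const cases = {
  direct: legacyHookFor({ realm: realmA, legacyFeaturesEnabled: true }, realmA),
  subclass: legacyHookFor({ realm: realmA, legacyFeaturesEnabled: false }, realmA),
  crossRealm: legacyHookFor({ realm: realmB, legacyFeaturesEnabled: true }, realmA),
};
console.log(cases);
```

Invalidating (rather than skipping) on a same-realm subclass match means a later read of RegExp.$1 throws instead of returning stale data from an earlier, unrelated match.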

UpdateLegacyRegExpStaticProperties ( C, S, startIndex, endIndex, capturedValues )

The abstract operation UpdateLegacyRegExpStaticProperties updates the values of the static properties of %RegExp% after a successful match.

  1. Assert: C is an Object that has a [[RegExpInput]] internal slot.
  2. Assert: Type(S) is String.
  3. Let len be the number of code units in S.
  4. Assert: startIndex and endIndex are integers such that 0 ≤ startIndex ≤ endIndex ≤ len.
  5. Assert: capturedValues is a List of Strings.
  6. Let n be the number of elements in capturedValues.
  7. Set the value of C’s [[RegExpInput]] internal slot to S.
  8. Set the value of C’s [[RegExpLastMatch]] internal slot to a String whose length is endIndex - startIndex and that contains the code units from S with indices startIndex through endIndex - 1, in ascending order.
  9. If n > 0, set the value of C’s [[RegExpLastParen]] internal slot to the last element of capturedValues.
  10. Else, set the value of C’s [[RegExpLastParen]] internal slot to the empty String.
  11. Set the value of C’s [[RegExpLeftContext]] internal slot to a String whose length is startIndex and that contains the code units from S with indices 0 through startIndex - 1, in ascending order.
  12. Set the value of C’s [[RegExpRightContext]] internal slot to a String whose length is len - endIndex and that contains the code units from S with indices endIndex through len - 1, in ascending order.
  13. For each integer i such that 1 ≤ i ≤ 9
    1. If i ≤ n, set the value of C’s [[RegExpParenN]] internal slot, where N is i, to the i-th element of capturedValues.
    2. Else, set the value of that [[RegExpParenN]] internal slot to the empty String.
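The slot updates above map naturally onto String.prototype.slice, which indexes UTF-16 code units just as the spec's substring steps do. This is a sketch in which C is a plain object and its properties (input, lastMatch, lastParen, leftContext, rightContext, paren1 through paren9) are hypothetical stand-ins for the internal slots.

```javascript
// Sketch of UpdateLegacyRegExpStaticProperties over a plain object C.
function updateLegacyRegExpStaticProperties(C, S, startIndex, endIndex, capturedValues) {
  const len = S.length; // JavaScript string length counts UTF-16 code units
  if (!(Number.isInteger(startIndex) && Number.isInteger(endIndex) &&
        0 <= startIndex && startIndex <= endIndex && endIndex <= len)) {
    throw new Error("assertion failed: 0 <= startIndex <= endIndex <= len");
  }
  const n = capturedValues.length;
  C.input = S;
  C.lastMatch = S.slice(startIndex, endIndex); // code units startIndex .. endIndex - 1
  C.lastParen = n > 0 ? capturedValues[n - 1] : "";
  C.leftContext = S.slice(0, startIndex);
  C.rightContext = S.slice(endIndex);
  for (let i = 1; i <= 9; i++) {
    // The array is 0-based, so the spec's i-th element is capturedValues[i - 1].
    C[`paren${i}`] = i <= n ? capturedValues[i - 1] : "";
  }
}

// Slot values after /(\d+)-(\d+)/ matches "a 12-34 b" at indices 2 through 6.
const C = {};
updateLegacyRegExpStaticProperties(C, "a 12-34 b", 2, 7, ["12", "34"]);
```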

InvalidateLegacyRegExpStaticProperties ( C )

The abstract operation InvalidateLegacyRegExpStaticProperties marks the values of the static properties of %RegExp% as unavailable.

  1. Assert: C is an Object that has a [[RegExpInput]] internal slot.
  2. Set the value of the following internal slots of C to empty:
    • [[RegExpInput]]
    • [[RegExpLastMatch]]
    • [[RegExpLastParen]]
    • [[RegExpLeftContext]]
    • [[RegExpRightContext]]
    • [[RegExpParen1]]
    • [[RegExpParen2]]
    • [[RegExpParen3]]
    • [[RegExpParen4]]
    • [[RegExpParen5]]
    • [[RegExpParen6]]
    • [[RegExpParen7]]
    • [[RegExpParen8]]
    • [[RegExpParen9]]
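Invalidation needs a value that no match can produce. This sketch assumes a Symbol named EMPTY as a hypothetical stand-in for the spec's empty value, and plain-object properties (LEGACY_SLOTS) as stand-ins for the fourteen internal slots.

```javascript
// Hypothetical stand-in for the spec's "empty" value: a Symbol can never be
// confused with a String produced by a match.
const EMPTY = Symbol("empty");

// Plain-object property names standing in for [[RegExpInput]] ... [[RegExpParen9]].
const LEGACY_SLOTS = [
  "input", "lastMatch", "lastParen", "leftContext", "rightContext",
  ...Array.from({ length: 9 }, (_, k) => `paren${k + 1}`),
];

function invalidateLegacyRegExpStaticProperties(C) {
  if (!("input" in C)) throw new Error("assertion failed: C has no input slot"); // step 1
  for (const slot of LEGACY_SLOTS) {
    C[slot] = EMPTY; // step 2
  }
}

// Start from the initial state (every slot holds the empty String), then invalidate.
const C = Object.fromEntries(LEGACY_SLOTS.map((slot) => [slot, ""]));
invalidateLegacyRegExpStaticProperties(C);
```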

Additional properties of the RegExp constructor

All of the properties below are accessor properties with the attributes { [[Enumerable]]: false, [[Configurable]]: true }. Moreover, for the properties whose setter is not explicitly defined, the [[Set]] attribute is set to undefined.

The accessors check their this value so that the properties do not appear to be inherited by subclasses: reading or assigning them with a subclass of RegExp as the receiver throws a TypeError.
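The effect of the this-value check on subclasses can be sketched with ordinary classes. FakeRegExp, SubRegExp and inputSlot are hypothetical names; the sketch only shows that a subclass constructor still finds the inherited accessor through its prototype chain, but the accessor rejects it as a receiver.

```javascript
// Hypothetical stand-in for %RegExp% with a single legacy slot.
class FakeRegExp {}
let inputSlot = "a 12-34 b";

Object.defineProperty(FakeRegExp, "input", {
  enumerable: false,
  configurable: true,
  get() {
    // The SameValue(C, thisValue) check from GetLegacyRegExpStaticProperty.
    if (!Object.is(this, FakeRegExp)) throw new TypeError("RegExp.input: wrong this value");
    return inputSlot;
  },
  set(val) {
    // The same check from SetLegacyRegExpStaticProperty, then ToString.
    if (!Object.is(this, FakeRegExp)) throw new TypeError("RegExp.input: wrong this value");
    inputSlot = `${val}`;
  },
});

// SubRegExp's [[Prototype]] is FakeRegExp, so SubRegExp.input finds the
// inherited accessor but calls it with this === SubRegExp.
class SubRegExp extends FakeRegExp {}
```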

Abstract operations

GetLegacyRegExpStaticProperty ( C, thisValue, internalSlotName )

The abstract operation GetLegacyRegExpStaticProperty is used when retrieving a value from a legacy RegExp static property.

  1. Assert: C is an Object that has an internal slot named internalSlotName.
  2. If SameValue(C, thisValue) is false, throw a TypeError exception.
  3. Let val be the value of the internal slot of C named internalSlotName.
  4. If val is empty, throw a TypeError exception.
  5. Return val.
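The getter operation translates step by step into a small function. In this sketch C is a plain object whose properties stand in for the internal slots, and EMPTY is a hypothetical Symbol representing the spec's empty value.

```javascript
// Hypothetical stand-in for the spec's "empty" value.
const EMPTY = Symbol("empty");

// Sketch of GetLegacyRegExpStaticProperty over a plain object C.
function getLegacyRegExpStaticProperty(C, thisValue, slotName) {
  if (!(slotName in C)) throw new Error(`assertion failed: no slot ${slotName}`); // step 1
  if (!Object.is(C, thisValue)) throw new TypeError("this value is not the RegExp constructor"); // step 2
  const val = C[slotName]; // step 3
  if (val === EMPTY) throw new TypeError(`${slotName} is not available`); // step 4
  return val; // step 5
}

// Usage: one populated slot and one invalidated slot.
const C = { lastMatch: "12-34", input: EMPTY };
const ok = getLegacyRegExpStaticProperty(C, C, "lastMatch");

let wrongThis = null;
try {
  getLegacyRegExpStaticProperty(C, {}, "lastMatch");
} catch (e) {
  wrongThis = e;
}

let emptySlot = null;
try {
  getLegacyRegExpStaticProperty(C, C, "input");
} catch (e) {
  emptySlot = e;
}
```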

SetLegacyRegExpStaticProperty ( C, thisValue, internalSlotName, val )

The abstract operation SetLegacyRegExpStaticProperty is used when assigning a value to a legacy RegExp static property.

  1. Assert: C is an Object that has an internal slot named internalSlotName.
  2. If SameValue(C, thisValue) is false, throw a TypeError exception.
  3. Let strVal be ? ToString(val).
  4. Set the value of the internal slot of C named internalSlotName to strVal.
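The setter operation can be sketched the same way. One subtlety worth modelling is step 3: a template literal performs ToString, so a Symbol argument throws a TypeError, whereas String(val) would quietly produce "Symbol(...)". The names below are hypothetical and C is a plain object standing in for %RegExp%.

```javascript
// Sketch of SetLegacyRegExpStaticProperty over a plain object C.
function setLegacyRegExpStaticProperty(C, thisValue, slotName, val) {
  if (!(slotName in C)) throw new Error(`assertion failed: no slot ${slotName}`); // step 1
  if (!Object.is(C, thisValue)) throw new TypeError("this value is not the RegExp constructor"); // step 2
  // Step 3: a template literal applies ToString, so a Symbol throws a TypeError
  // (String(val) would instead return "Symbol(...)").
  const strVal = `${val}`;
  C[slotName] = strVal; // step 4
}

const C = { input: "" };
setLegacyRegExpStaticProperty(C, C, "input", 42); // stores the String "42"
```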

RegExp.input

get RegExp.input

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]]).

set RegExp.input = val

  1. Perform ? SetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]], val).

RegExp.$_

get RegExp.$_

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]]).

set RegExp.$_ = val

  1. Perform ? SetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpInput]], val).

get RegExp.lastMatch

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLastMatch]]).

get RegExp.$&

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLastMatch]]).

get RegExp.lastParen

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLastParen]]).

get RegExp.$+

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLastParen]]).

get RegExp.leftContext

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLeftContext]]).

get RegExp.$`

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpLeftContext]]).

get RegExp.rightContext

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpRightContext]]).

get RegExp.$'

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpRightContext]]).

get RegExp.$1

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen1]]).

get RegExp.$2

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen2]]).

get RegExp.$3

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen3]]).

get RegExp.$4

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen4]]).

get RegExp.$5

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen5]]).

get RegExp.$6

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen6]]).

get RegExp.$7

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen7]]).

get RegExp.$8

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen8]]).

get RegExp.$9

  1. Return ? GetLegacyRegExpStaticProperty(%RegExp%, this value, [[RegExpParen9]]).
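Because each long name and its $-alias read the same slot, the whole set of accessors above can be described by one table. This is a sketch under stated assumptions: LEGACY_PROPERTIES, installLegacyAccessors, FakeRegExp, slots and the EMPTY Symbol are hypothetical, and only input and $_ receive a setter, matching the definitions above.

```javascript
// Hypothetical stand-in for the spec's "empty" value.
const EMPTY = Symbol("empty");

// Property name -> slot name; only input and $_ get a setter.
const LEGACY_PROPERTIES = [
  ["input", "input", true], ["$_", "input", true],
  ["lastMatch", "lastMatch", false], ["$&", "lastMatch", false],
  ["lastParen", "lastParen", false], ["$+", "lastParen", false],
  ["leftContext", "leftContext", false], ["$`", "leftContext", false],
  ["rightContext", "rightContext", false], ["$'", "rightContext", false],
  ...Array.from({ length: 9 }, (_, k) => [`$${k + 1}`, `paren${k + 1}`, false]),
];

// Installs the accessors on C; `slots` holds the internal-slot values.
function installLegacyAccessors(C, slots) {
  for (const [name, slot, hasSetter] of LEGACY_PROPERTIES) {
    const checkReceiver = (thisValue) => {
      if (!Object.is(C, thisValue)) throw new TypeError(`RegExp.${name}: wrong this value`);
    };
    Object.defineProperty(C, name, {
      enumerable: false,
      configurable: true,
      get() {
        checkReceiver(this);
        if (slots[slot] === EMPTY) throw new TypeError(`RegExp.${name} is unavailable`);
        return slots[slot];
      },
      set: hasSetter
        ? function (val) {
            checkReceiver(this);
            slots[slot] = `${val}`; // ToString, as in SetLegacyRegExpStaticProperty
          }
        : undefined,
    });
  }
}

// Usage with a hypothetical constructor and the slot state after matching
// /(\d+)-(\d+)/ against "a 12-34 b".
const FakeRegExp = function FakeRegExp() {};
const slots = {
  input: "a 12-34 b", lastMatch: "12-34", lastParen: "34",
  leftContext: "a ", rightContext: " b",
};
for (let i = 1; i <= 9; i++) slots[`paren${i}`] = ["12", "34"][i - 1] ?? "";
installLegacyAccessors(FakeRegExp, slots);

FakeRegExp.$_ = "changed"; // input and $_ share [[RegExpInput]]
```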

RegExp.prototype.compile ( pattern, flags )

The modified algorithm below disables RegExp.prototype.compile for objects that are not direct instances of RegExp, as well as when the realm of the regexp differs from the current realm.

  1. Let O be the this value.
  2. If Type(O) is not Object or Type(O) is Object and O does not have a [[RegExpMatcher]] internal slot, then
    1. Throw a TypeError exception.
  3. Let thisRealm be the current Realm Record.
  4. Let oRealm be the value of O’s [[Realm]] internal slot.
  5. If SameValue(thisRealm, oRealm) is false, throw a TypeError exception.
  6. If the value of O’s [[LegacyFeaturesEnabled]] internal slot is false, throw a TypeError exception.
  7. If Type(pattern) is Object and pattern has a [[RegExpMatcher]] internal slot, then
    1. If flags is not undefined, throw a TypeError exception.
    2. Let P be the value of pattern’s [[OriginalSource]] internal slot.
    3. Let F be the value of pattern’s [[OriginalFlags]] internal slot.
  8. Else,
    1. Let P be pattern.
    2. Let F be flags.
  9. Return ? RegExpInitialize(O, P, F).
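The order of the checks in the modified compile can be exercised with plain objects. In this sketch, legacyCompile, fakeRegExpInitialize and the matcher, realm, legacyFeaturesEnabled, originalSource and originalFlags properties are hypothetical stand-ins for the internal slots and for RegExpInitialize, which is heavily simplified here (no pattern parsing).

```javascript
// A "regexp" is any object with an own `matcher` property (standing in for
// [[RegExpMatcher]]); Object(x) === x is true for objects and functions.
const isObject = (x) => Object(x) === x;
const hasMatcher = (x) => isObject(x) && Object.prototype.hasOwnProperty.call(x, "matcher");

function legacyCompile(O, pattern, flags, thisRealm, regExpInitialize) {
  if (!hasMatcher(O)) throw new TypeError("compile: this value is not a RegExp"); // step 2
  if (!Object.is(thisRealm, O.realm)) throw new TypeError("compile: cross-realm RegExp"); // steps 3-5
  if (O.legacyFeaturesEnabled === false) throw new TypeError("compile: legacy features disabled"); // step 6
  let P, F;
  if (hasMatcher(pattern)) { // step 7
    if (flags !== undefined) throw new TypeError("compile: flags must be undefined");
    P = pattern.originalSource;
    F = pattern.originalFlags;
  } else { // step 8
    P = pattern;
    F = flags;
  }
  return regExpInitialize(O, P, F); // step 9
}

// Hypothetical, simplified RegExpInitialize: records source and flags and
// resets lastIndex, but does not parse or validate the pattern.
function fakeRegExpInitialize(O, P, F) {
  O.originalSource = P === undefined ? "" : `${P}`;
  O.originalFlags = F === undefined ? "" : `${F}`;
  O.lastIndex = 0;
  return O;
}

const realm = { name: "A" };
const re = { matcher: true, realm, legacyFeaturesEnabled: true, originalSource: "a", originalFlags: "" };
legacyCompile(re, "b+", "g", realm, fakeRegExpInitialize);
```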