String dedent

Champions: @jridgewell, @hemanth

Author: @mmkal

Status: draft

Original TC39 thread

Problem

The current syntax forces users to choose between strange-looking unindented template literals, template literals containing unwanted indentation, or third-party dependencies that "fix" template margins. Template literals also require backticks to be escaped, which makes it harder to embed code samples in languages that use backticks (e.g. JavaScript, Markdown, Bash).

The current options:

Strange-looking code

class MyClass {
  print() {
    console.log(`create table student(
  id int primary key,
  name text
)`)
  }
}

Sensible-looking code, with unwanted indents in output string

class MyClass {
  print() {
    console.log(`
      create table student(
        id int primary key,
        name text
      )
    `)
  }
}

This logs:


      create table student(
        id int primary key,
        name text
      )
    

Note the newlines at the start and the end of the string, and a margin on each line.

With a library

A commonly used library is dedent:

import dedent from 'dedent'

class MyClass {
  print() {
    console.log(dedent`
      create table student(
        id int primary key,
        name text
      )
    `)
  }
}

Now both the code and the resulting output look about right, but we had to add a runtime dependency. This makes what could be pure data less portable: you can no longer copy-paste snippets without installing an npm package, possibly installing its types too, and then importing and using it. This solution also cannot be used in ECMAScript-compatible data formats like JSON5.

Similarly, jest-inline-snapshots automatically "dedent" literal snapshots at runtime. There is also a babel plugin which moves the "dedenting" to compile-time.

This presents a problem when other template tags also need to be used:

const raw = dedent`
  query foo {
    bar
  }
`;

gql`${raw}`; // <- this doesn't work right.
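To see why the interpolated form behaves differently, here is a minimal sketch. It uses a hypothetical tag, inspectTag, as a stand-in for gql (it is not part of any library) and simply reports what a tag receives. A tag that parses its static strings, or that treats interpolated values as parameters, will not see the query as template text in the first case.

```javascript
// Hypothetical stand-in for a tag such as gql: it only reports its inputs.
function inspectTag(strings, ...values) {
  return { strings: [...strings], values };
}

// The text a dedenting library would hand back for the query above.
const raw = 'query foo {\n  bar\n}';

// Interpolating pre-dedented text: the tag receives two empty static chunks
// and the whole query as a single dynamic value.
const interpolated = inspectTag`${raw}`;

// Writing the query directly: the tag receives it as static template text
// and no dynamic values at all.
const direct = inspectTag`query foo {
  bar
}`;

console.log(interpolated.strings.length, interpolated.values.length);
console.log(direct.strings.length, direct.values.length);
```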

To get around this, we have to rely on the tagged template being callable as a regular function:

gql(raw)

This doesn't allow for customized behaviour of expressions. For example, slonik protects against SQL injection by automatically sanitizing expression parameters. To solve that, we could extend dedent to compose with other tags:

const query = dedent(sql)`
  select *
  from students
  where name = ${name}
`

But there are negative implications to this. Quote from the original tc39 thread:

[This approach] is not free (some runtime processing and WeakMap lookup), and will conflict with the "templateness" in proposals like https://github.com/tc39/proposal-array-is-template-object.
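As a rough illustration of where those costs come from, the sketch below shows one way a composing wrapper could be written. Everything in it is an assumption made for illustration: composeDedent, dedentChunks and the toy sql tag are hypothetical names, and this is not the API or implementation of the dedent package. The wrapper has to rewrite the static chunks at runtime, cache the result per call site in a WeakMap, and hand the wrapped tag an ordinary array rather than the engine-created template object.

```javascript
// Hypothetical sketch only; not the dedent package's API or implementation.
const cache = new WeakMap();

// Dedent the static chunks of a template, leaving interpolated values alone.
function dedentChunks(strings) {
  // Margin: the whitespace that follows the first newline of the first chunk.
  const match = strings[0].match(/^\n([ \t]*)/);
  const margin = match ? match[1] : '';
  const last = strings.length - 1;
  const cooked = strings.map((chunk, i) => {
    let out = chunk.split('\n' + margin).join('\n');
    if (i === 0) out = out.replace(/^\n/, ''); // drop the opening line
    if (i === last) out = out.replace(/\n[ \t]*$/, ''); // drop a blank closing line
    return out;
  });
  // Simplification: reuse the cooked text for `raw`.
  return Object.freeze(Object.assign(cooked, { raw: Object.freeze([...cooked]) }));
}

function composeDedent(tag) {
  return (strings, ...values) => {
    // The engine reuses one strings object per call site, so it can key a cache.
    let dedented = cache.get(strings);
    if (!dedented) {
      dedented = dedentChunks(strings); // runtime processing on first use
      cache.set(strings, dedented);
    }
    // `dedented` is a plain array, not the original template object.
    return tag(dedented, ...values);
  };
}

// Toy tag standing in for a real SQL tag: placeholders instead of inlined values.
const sql = (strings, ...values) => ({ text: strings.join('?'), values });

const name = 'Ada';
const query = composeDedent(sql)`
  select *
  from students
  where name = ${name}
`;
console.log(query.text);
```

The wrapped tag still receives name as a separate value, so it can sanitize it, but only because the wrapper re-implements part of template handling on every call.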

It'd be better if this could be supported at the language level, to avoid:

  • all of the above complexity
  • the runtime implications of squashing so much parsing responsibility into dedent or similar libraries
  • inconsistencies between implementations of "dedenters" (jest vs dedent vs babel-plugin-dedent)
  • the need for coupling dedent to the implementation particulars of other tagged template literal functions
  • the need to teach formatters like prettier that they can safely adjust the margins of some templates, but not others (and even in these cases, the formatter would need knowledge about the context and implementation details of the various dedenting libraries):
    • dedent`...`
    • dedent(anything)`...`
    • .toMatchInlineSnapshot(`...`)

There are several other libraries which each have a very similar purpose (and each behaves slightly differently).

Proposed solution

Allow template literals delimited by three, five, seven, or any larger odd number of backticks. These behave almost the same as a regular single-backtick template literal, with a few key differences:

  • The string is automatically "dedented", along the lines of what the dedent library does. A simple strawman algorithm:
    • the first line (including the opening delimiter) is ignored
    • the last line (including the closing delimiter) is ignored if it contains only whitespace
    • the "margin" is calculated using the whitespace at the beginning of the first line after the opening delimiter
    • that margin is removed from the start of every line
  • The opening delimiter must be immediately followed by a newline or the closing delimiter
  • Only whitespace may appear between the closing delimiter and the preceding newline
  • Backticks inside the string don't need to be escaped
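The strawman steps above can be sketched as an ordinary function over the text between the delimiters. This is only an illustration of the listed steps, not a proposed implementation: strawmanDedent is a hypothetical name, it ignores interpolated expressions, and it assumes that lines lacking the full margin are left untouched, which the proposal has not decided.

```javascript
// Hypothetical helper, not part of the proposal: applies the strawman steps
// to the text found between the opening and closing delimiters.
function strawmanDedent(text) {
  const lines = text.split('\n');

  // Step 1: ignore the first line (the rest of the opening-delimiter line).
  lines.shift();

  // Step 2: ignore the last line if it contains only whitespace.
  if (lines.length > 0 && /^[ \t]*$/.test(lines[lines.length - 1])) {
    lines.pop();
  }

  if (lines.length === 0) return '';

  // Step 3: the margin is the leading whitespace of the first remaining line.
  const margin = lines[0].match(/^[ \t]*/)[0];

  // Step 4: remove that margin from the start of every line.
  // Assumption: a line that lacks the full margin is left as-is.
  return lines
    .map((line) => (line.startsWith(margin) ? line.slice(margin.length) : line))
    .join('\n');
}

const body = [
  '',
  '      create table student(',
  '        id int primary key,',
  '        name text',
  '      )',
  '    ',
].join('\n');

console.log(strawmanDedent(body));
```

Taking the margin from the first content line, rather than from the least-indented line, keeps the rule easy to explain, at the cost of leaving the under-indented case open.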

The examples above would be solved like this:

class MyClass {
  print() {
    console.log(```
      create table student(
        id int primary key,
        name text
      )
    ```)
  }
}

This will output the template with margins stripped:

create table student(
  id int primary key,
  name text
)

Custom expressions would work without any special composition of tag template functions:

const query = sql```
  select *
  from students
  where name = ${name}
```

We can also avoid the need for escaping backticks when they're needed inside the template:

const printBashCommand = () => {
  console.log(```
    ./some-bash-script.sh `ls`
  ```);
};

Using more backticks allows for triple-backticks inside templates without escaping:

const getMarkdown = () => {
  return `````
    # blah blah

    ```json
    { "foo": "bar" }
    ```

    some _more_ *markdown*
  `````;
};

The behavior when later lines lack the whitespace prefix of the first line is not yet defined:

const tbd = ```
  The first line starts with two spaces
but a later line doesn't.
```

In other languages

  • PHP - <<< heredoc/nowdoc: the indentation of the closing marker dictates the amount of whitespace to strip from each line.

Syntax Alternatives Considered

Some potential alternatives to the multi-backtick syntax:

  • A built-in runtime method along the lines of the dedent library, e.g. String.dedent:
  • Triple-backticks only (not five, or seven, or 2n+1):
    • Pros:
      • Simpler implementation and documentation
    • Cons:
      • Triple-backticks within templates would need to be escaped
  • Using another character for dedentable templates, e.g. |||
    • Pros:
      • Should be easy to select a character which would be a syntax error currently, so the risk even of very contrived breaking changes could go to near-zero
      • Could match existing languages with similar features, e.g. jsonnet
      • More intuitive difference from single-backticks
    • Cons:
      • Less intuitive similarities to single-backticks, wouldn't be as obvious that tagged template literals should work

Q&A

Is this backwards compatible?

This could be partially implemented with no syntax changes, since it's technically already valid syntax:

```abc```

Is equivalent to

((``)`abc`)``

where the empty strings are being used as template tags. That is, when run, this code will try to use the empty string

(``)

as a template tag, passing in 'abc', the return value of which is then used as another template tag that receives the empty string. None of that can work at runtime today, because an empty string is not a function, so no functioning code should be affected by this change.
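This can be checked in any current engine. The sketch below evaluates the explicit, parenthesised form of the expression and reports what happens; runLegacyParse is a hypothetical helper name used only for this illustration.

```javascript
// Hypothetical helper: evaluates the expression the way engines parse it today.
function runLegacyParse() {
  try {
    // Explicit form of three backticks, abc, three backticks.
    return { threw: false, value: ((``)`abc`)`` };
  } catch (error) {
    // Reached today: the empty string is not callable, so tagging with it throws.
    return { threw: true, error };
  }
}

const result = runLegacyParse();
console.log(result.threw, result.error instanceof TypeError);
```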

Some parsing changes would be needed to allow for unescaped backticks inside triple-backticked templates.

Why not use a library?

To summarise the problem section above:

  • avoid a dependency for the desired behaviour of the vast majority of multiline strings (dedent has millions of downloads per week)
  • make code snippets more portable
  • improve performance
  • better discoverability - the feature can be documented publicly, and used in code samples which wouldn't otherwise rely on a package like dedent, which is on major version 0 without an update in three years
  • establish a standard that can be adopted by JSON-superset implementations like json5
  • give code generators a way to output readable code with correct indentation properties (e.g. jest inline snapshots)
  • support "dedenting" tagged template literal functions with customized expression parameter behaviour (e.g. slonik)
  • allow formatters/linters to safely enforce code style without needing to be coupled to the runtime behaviour of multiple libraries in combination