Squarespace has open-sourced its template compiler, written in Java and 20 times faster than the previous Node-based version. It is licensed under the Apache 2.0 License.
The language Squarespace uses to build templates is based on json-template, a minimal declarative template language for Python and JavaScript that was inspired by Google CTemplate.
History
In 2012, a new version of the Squarespace platform was completed and json-template was selected as the template language. The minimal, declarative syntax was attractive due to its simplicity and safety, compared to languages which had a richer set of control-flow instructions, local variables, etc.
Initially, all page templates were rendered using a version of the json-template language written in JavaScript and running in a Node process. However, this Node-based compiler presented some serious operational and maintenance challenges.
I joined Squarespace in late 2012, and in early 2013 began constructing a replacement template compiler in Java, the company's main development language. Given the design problems with the old compiler, a direct port was not feasible. The new compiler was designed and implemented from scratch, which enabled several improvements to be made.
Goals
The original goals for the project were:
- Implement the full json-template language and all plugins.
- Meet or exceed parsing and execution performance of the Node-based compiler.
- Implement a fast tokenizer with a separate instruction tree assembler, to enable other more advanced features, like syntax error recovery, validation mode, etc.
- Minimize avoidable string copies and other overhead.
- Achieve high test coverage.
- Syntax error recovery, which allows the compiler to continue past errors and report them later.
- Add an explicit plugin interface, for extending the compiler with new formatters and predicates.
- Add a validation mode to support displaying interactive syntax feedback in online editors and desktop IDEs.
- Create a command line wrapper for testing, experimentation, and build-phase rendering if needed.

Improvements to the json-template syntax itself could not be introduced, since the new compiler had to maintain compatibility with all existing sites in production.
Internals
The syntax consists of curly brace–delimited instructions which reference variables in a separate context tree (more below). Some instructions are blocks which can contain others. A full description of the template syntax can be found in the Squarespace developer documentation.
<div id="logo" data-content-field="site-title">
{.section website}
	<h1 class="logo{.section logoImageUrl} image{.or} site-title{.end}">
		<a href="/">
		{.if logoImageUrl}
			<img src="{logoImageUrl}?format=750w" alt="{siteTitle}" />
		{.or}
			{siteTitle}
		{.end}
		</a>
	</h1>
	<div class="logo-subtitle">{siteTagLine}</div>
{.end}
</div>
Parsing of the template is split into two separate components: a tokenizer and an assembler. Tokenization is performed in a single scan over the raw template string, locating all start "{" and end "}" delimiters, and producing a stream of instruction tokens as output. A textual representation of this token list would look like this:
TEXT SECTION TEXT SECTION TEXT OR_PREDICATE TEXT END TEXT IF TEXT VARIABLE TEXT VARIABLE TEXT OR_PREDICATE TEXT VARIABLE TEXT END TEXT VARIABLE TEXT END TEXT EOF
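To make the single-scan idea concrete, here is a minimal sketch of such a tokenizer. It is not the Squarespace implementation: the class name `TokenizerSketch`, the `Token` shape, and especially the rule for deciding when "{" opens an instruction are assumptions made for illustration, and only the instruction types that appear in the example template are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a single-pass tokenizer; not the Squarespace implementation. */
final class TokenizerSketch {

    enum Type { TEXT, SECTION, IF, OR_PREDICATE, END, VARIABLE, EOF }

    /** A token records its type and its [start, end) range in the raw template. */
    static final class Token {
        final Type type;
        final int start;
        final int end;
        Token(Type type, int start, int end) { this.type = type; this.start = start; this.end = end; }
    }

    static List<Token> tokenize(String raw) {
        List<Token> tokens = new ArrayList<>();
        int textStart = 0;
        int i = 0;
        while (i < raw.length()) {
            int close = raw.charAt(i) == '{' ? findClose(raw, i) : -1;
            if (close < 0) {
                i++; // ordinary character, or a brace that does not open an instruction
                continue;
            }
            if (textStart < i) {
                tokens.add(new Token(Type.TEXT, textStart, i)); // flush pending text
            }
            tokens.add(new Token(classify(raw.substring(i + 1, close)), i, close + 1));
            i = close + 1;
            textStart = i;
        }
        if (textStart < raw.length()) {
            tokens.add(new Token(Type.TEXT, textStart, raw.length()));
        }
        tokens.add(new Token(Type.EOF, raw.length(), raw.length()));
        return tokens;
    }

    /**
     * Assumed rule: "{" opens an instruction only when followed directly by "." or a
     * letter and closed by "}" before any newline or nested "{". Anything else is text.
     */
    private static int findClose(String raw, int open) {
        if (open + 1 >= raw.length()) return -1;
        char next = raw.charAt(open + 1);
        if (next != '.' && !Character.isLetter(next)) return -1;
        for (int j = open + 1; j < raw.length(); j++) {
            char c = raw.charAt(j);
            if (c == '}') return j;
            if (c == '{' || c == '\n') return -1;
        }
        return -1;
    }

    private static Type classify(String body) {
        if (body.startsWith(".section ")) return Type.SECTION;
        if (body.startsWith(".if ")) return Type.IF;
        if (body.equals(".or")) return Type.OR_PREDICATE;
        if (body.equals(".end")) return Type.END;
        return Type.VARIABLE;
    }
}
```

Run on the `<h1 ...>` line of the template above, this sketch produces TEXT, SECTION, TEXT, OR_PREDICATE, TEXT, END, TEXT, EOF. Tokens store offsets rather than copied strings, so the scan itself allocates very little.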
This instruction stream is fed incrementally to a state machine which assembles instructions into a valid syntax tree. Validity is coded into the rules of the state machine. For example, a START instruction must have a corresponding END instruction.
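One common way to enforce this kind of pairing is to keep a stack of open blocks while building the tree. The sketch below assumes that approach; it is not taken from the Squarespace source, and the `AssemblerSketch` and `Node` names and the simplified handling of OR_PREDICATE are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of stack-based tree assembly; not the Squarespace implementation. */
final class AssemblerSketch {

    enum Type { ROOT, TEXT, VARIABLE, SECTION, IF, OR_PREDICATE, END, EOF }

    /** A tree node; block nodes keep their nested instructions in children. */
    static final class Node {
        final Type type;
        final List<Node> children = new ArrayList<>();
        Node(Type type) { this.type = type; }
    }

    static Node assemble(List<Type> stream) {
        Node root = new Node(Type.ROOT);
        Deque<Node> open = new ArrayDeque<>(); // innermost open block is on top
        open.push(root);
        for (Type type : stream) {
            Node current = open.peek();
            switch (type) {
                case SECTION:
                case IF: {
                    Node block = new Node(type);
                    current.children.add(block);
                    open.push(block); // following instructions nest inside this block
                    break;
                }
                case OR_PREDICATE: {
                    if (current.type == Type.OR_PREDICATE) { // a further alternative
                        open.pop();
                        current = open.peek();
                    }
                    if (current.type == Type.ROOT) {
                        throw new IllegalStateException("OR_PREDICATE outside a block");
                    }
                    Node alternative = new Node(type);
                    current.children.add(alternative);
                    open.push(alternative);
                    break;
                }
                case END: {
                    if (current.type == Type.OR_PREDICATE) { // END also closes the alternative
                        open.pop();
                        current = open.peek();
                    }
                    if (current.type == Type.ROOT) {
                        throw new IllegalStateException("Mismatched END found at ROOT");
                    }
                    open.pop();
                    break;
                }
                case EOF:
                    if (open.size() > 1) {
                        throw new IllegalStateException("Unclosed " + current.type);
                    }
                    return root;
                default:
                    current.children.add(new Node(type)); // TEXT, VARIABLE
            }
        }
        throw new IllegalStateException("Stream ended without EOF");
    }
}
```

Because the assembler consumes one instruction at a time, it can be fed directly from the tokenizer without first materializing the whole token list. This sketch rejects invalid input by throwing.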
A textual representation of the syntax tree is below, listing each instruction's type, its line and character offset, followed by some instruction-specific attributes.
TEXT {1,1} (len=48) "<div id=\"logo\" data-content-field=\"site- ..."
SECTION {2,1} website
TEXT {2,19} (len=17) "\n\t<h1 class=\"logo"
SECTION {3,17} logoImageUrl
TEXT {3,40} (len=6) " image"
OR_PREDICATE {3,46}
TEXT {3,51} (len=11) " site-title"
END {3,62}
TEXT {3,68} (len=20) "\">\n\t\t<a href=\"/\">\n\t\t"
IF {5,3} logoImageUrl
TEXT {5,21} (len=14) "\n\t\t\t<img src=\""
VARIABLE {6,14} logoImageUrl
TEXT {6,28} (len=19) "?format=750w\" alt=\""
VARIABLE {6,47} siteTitle
TEXT {6,58} (len=7) "\" />\n\t\t"
OR_PREDICATE {7,3}
TEXT {7,8} (len=4) "\n\t\t\t"
VARIABLE {8,4} siteTitle
TEXT {8,15} (len=3) "\n\t\t"
END {9,3}
TEXT {9,9} (len=43) "\n\t\t</a>\n\t</h1>\n\t<div class=\"logo-subtitl ..."
VARIABLE {12,29} siteTagLine
TEXT {12,42} (len=7) "</div>\n"
END {13,1}
TEXT {13,7} (len=8) "\n</div>\n"
Once a valid instruction tree has been assembled, the compiler can execute it using a given context tree. The context tree holds all of the data needed to populate the template, and its JSON representation would look like this:
{
  "website": {
    "logoImageUrl": "/images/logo.png",
    "siteTitle": "Squarespace",
    "siteTagLine": "Set Your Website Apart"
  }
}
Once the above template is parsed, assembled, and executed against the JSON context, the final output would look like this:
<div id="logo" data-content-field="site-title">
	<h1 class="logo image">
		<a href="/">
			<img src="/images/logo.png?format=750w" alt="Squarespace" />
		</a>
	</h1>
	<div class="logo-subtitle">Set Your Website Apart</div>
</div>
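In the example, `{siteTitle}` resolves only because it sits inside `{.section website}`. The sketch below shows one way such scoped lookup could work, using a stack of scopes over nested maps. The `ContextSketch` class is hypothetical, and two behaviors are assumptions rather than documented facts about the compiler: a lookup falls back to enclosing scopes, and a section is entered whenever its value is non-null (a fuller version would also treat empty or false values as missing).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/** Hypothetical sketch of scoped variable lookup; not the Squarespace implementation. */
final class ContextSketch {

    // Innermost scope is on top. Scopes may be JSON objects (Map) or scalar values.
    private final Deque<Object> scopes = new ArrayDeque<>();

    ContextSketch(Map<String, Object> root) {
        scopes.push(root);
    }

    /** Enters a section. Returns false, pushing nothing, when the name does not resolve. */
    boolean enterSection(String name) {
        Object value = resolve(name);
        if (value == null) {
            return false; // a caller would render the {.or} branch instead
        }
        scopes.push(value);
        return true;
    }

    /** Leaves the section most recently entered. */
    void leaveSection() {
        scopes.pop();
    }

    /** Looks a name up in the innermost scope first, then in each enclosing scope. */
    Object resolve(String name) {
        for (Object scope : scopes) { // ArrayDeque iterates from the most recent push
            if (scope instanceof Map && ((Map<?, ?>) scope).containsKey(name)) {
                return ((Map<?, ?>) scope).get(name);
            }
        }
        return null;
    }
}
```

With the context above, `resolve("siteTitle")` returns null at the root but "Squarespace" after `enterSection("website")`, which mirrors how the template's variables are populated.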
The compiler also has a syntax error recovery mode. In this mode, any error in the tokenize and assembly phases is collected and the parse continues. The state machine will always produce a valid, executable syntax tree, and all errors that occurred during assembly can be reported later (errors can be appended to the rendered output in an HTML comment, for example).
For example, an additional END instruction was added here to the end of the template:
	<div class="logo-subtitle">{siteTagLine}</div>
{.end}
</div>
{.end}
Tokenization would not see this as an error, but the assembler would catch it.
Below is the textual representation of the error produced by the above typo. If a template compilation encountered such an error in production we would append the error to the rendered output, inside an HTML comment.
<!-- SyntaxError MISMATCHED_END at line 15 character 1: Mismatched END found at ROOT. -->
Errors also have a JSON representation, with individual attributes including the line and column offset where the error occurred, the error's enum type, the message, etc. This information can be used by an interactive editor to place contextual messages near the location of the syntax error.
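The recovery behavior described above can be illustrated with a small validator that records errors instead of throwing. This is a sketch under stated assumptions, not the compiler's code: `RecoverySketch`, `SyntaxError`, and the `EOF_IN_BLOCK` error type are hypothetical names, only `{.section ...}`, `{.if ...}` and `{.end}` are recognized, and no tree is built. The `MISMATCHED_END` type and the comment format are taken from the example error above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of error-collecting validation; not the Squarespace implementation. */
final class RecoverySketch {

    enum ErrorType { MISMATCHED_END, EOF_IN_BLOCK } // EOF_IN_BLOCK is an invented name

    static final class SyntaxError {
        final ErrorType type;
        final int line;
        final int column;
        final String message;

        SyntaxError(ErrorType type, int line, int column, String message) {
            this.type = type;
            this.line = line;
            this.column = column;
            this.message = message;
        }

        /** Formats the error so it can be appended to rendered output. */
        String toHtmlComment() {
            return "<!-- SyntaxError " + type + " at line " + line
                + " character " + column + ": " + message + " -->";
        }
    }

    /** Scans the template, tracking block depth; records errors and keeps going. */
    static List<SyntaxError> validate(String raw) {
        List<SyntaxError> errors = new ArrayList<>();
        int depth = 0;
        int line = 1;
        int column = 1;
        for (int i = 0; i < raw.length(); i++) {
            if (raw.startsWith("{.section ", i) || raw.startsWith("{.if ", i)) {
                depth++;
            } else if (raw.startsWith("{.end}", i)) {
                if (depth == 0) {
                    // Recovery: note the stray END, ignore it, and continue scanning.
                    errors.add(new SyntaxError(ErrorType.MISMATCHED_END, line, column,
                        "Mismatched END found at ROOT."));
                } else {
                    depth--;
                }
            }
            if (raw.charAt(i) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        if (depth > 0) {
            errors.add(new SyntaxError(ErrorType.EOF_IN_BLOCK, line, column,
                depth + " block(s) still open at end of input."));
        }
        return errors;
    }
}
```

The line and column carried by each error are what an interactive editor would need to place a message next to the offending instruction.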
Performance
Performance of the template compiler is a hard requirement since a single page render may require hundreds of separate compilations. Often blocks or fragments of the page are backed by separately rendered templates. Given this, even a tiny performance improvement is multiplied over thousands of compilations on millions of page views. Cutting response times by even a small amount can produce a large increase in overall server page rendering capacity.
Key performance improvements came from minimizing memory allocations and implementing a fast tokenizer and assembler with minimal backtracking. Additional effort was put into hand-coding the tokenizer so it could remain fast while correctly handling the ambiguity of JavaScript and json-template both using a single curly brace as a delimiter.
A large percentage of the text in a template is copied directly to the output unaltered. The compiler uses a lightweight instruction that represents a "view" into the backing template, referencing the start and end positions of the string. When the time comes to execute this instruction, the characters are copied from the backing template directly to the output buffer, avoiding the intermediate string copy.
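A minimal sketch of such a "view" instruction follows. The class and field names are hypothetical; the one library call it relies on, `StringBuilder.append(CharSequence, int, int)`, copies a character range straight into the builder without first creating a substring.

```java
/** Hypothetical sketch of a text instruction that views a range of the template. */
final class TextViewSketch {

    static final class TextInstruction {
        private final String template; // shared backing template, never copied
        private final int start;       // inclusive
        private final int end;         // exclusive

        TextInstruction(String template, int start, int end) {
            this.template = template;
            this.start = start;
            this.end = end;
        }

        /** Copies the viewed characters directly into the output buffer. */
        void execute(StringBuilder out) {
            out.append(template, start, end);
        }
    }
}
```

Each instruction holds only a reference and two integers, so parsing a template with many text runs does not allocate a new string per run.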
Benchmarks have shown the new compiler to be 20 times faster on average compared with the Node-based version. This resulted in a significant reduction in page response times.