Strings are a leaky abstraction for HTML
One piece of feedback I often hear when people see Phlex for the first time is that it’s a “leaky abstraction”. “Why would I write some abstract Ruby DSL[1] when I can just write HTML?” they say.
I’m not convinced these people really understand what “leaky abstraction” means, the advantages of abstractions in general, or the real risks of working with a leaky abstraction, so let’s talk about it.
Phlex is just an abstraction, not a leaky abstraction
Leaky abstractions are abstractions that don’t entirely encapsulate the thing they abstract over. For example, if your abstraction works up to a point but then forces you to break out through some awkward escape hatch, it is a leaky abstraction.
HTML, though, is a simple language. It really only has three parts: tags, attributes and text. There are also comments, doctypes and (inside embedded SVGs) CDATA sections, but we can treat these as special kinds of tags to simplify our model.
These three parts can be entirely described through Ruby constructs:
- tags are methods that take blocks for nested content;
- attributes are keyword arguments; and
- text is just a string (or other string-like primitive).[2]
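To make that mapping concrete, here is a deliberately tiny builder in plain Ruby. It is a hypothetical sketch, not Phlex’s implementation or API: the TinyHTML class and its method names are assumptions for illustration. It only shows tags as methods that take blocks, attributes as keyword arguments and text as escaped strings.

```ruby
require "erb"

# Hypothetical mini builder (NOT Phlex's implementation) illustrating:
# tags -> methods that take blocks, attributes -> keyword arguments,
# text -> strings.
class TinyHTML
  def initialize
    @buffer = +""
  end

  # Define one method per tag. Nested content comes from the block.
  %w[div h1 ul li].each do |tag|
    define_method(tag) do |**attributes, &block|
      @buffer << "<#{tag}"
      attributes.each do |name, value|
        @buffer << %( #{name}="#{ERB::Util.html_escape(value)}")
      end
      @buffer << ">"
      block&.call
      @buffer << "</#{tag}>"
    end
  end

  # Text nodes are escaped before they reach the buffer.
  def plain(content)
    @buffer << ERB::Util.html_escape(content)
  end

  # Run the template block with the builder as self and return the HTML.
  def call(&template)
    instance_exec(&template)
    @buffer
  end
end

fruits = ["Apples", "<script>"]

html = TinyHTML.new.call do
  div(class: "card") do
    h1 { plain "Fruit" }
    ul do
      fruits.each { |fruit| li { plain fruit } }
    end
  end
end

puts html
# prints: <div class="card"><h1>Fruit</h1><ul><li>Apples</li><li>&lt;script&gt;</li></ul></div>
```

Because the loop is ordinary Ruby (fruits.each), nothing has to switch between a template language and a host language.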
Because Phlex perfectly encompasses the entirety of HTML, it is not a leaky abstraction — it’s just an abstraction.
ERB is also an abstraction
To everyone asking “why not just write HTML?”, I would ask: how do you loop over a collection, embed dynamic data or write a conditional in HTML? You can’t, because HTML isn’t a programming language. It’s designed to be a static build target, not the source code.
In order to generate HTML from dynamic data, you need to use a language with at least loops, conditionals and interpolation.
ERB is an abstraction, but it is not an abstraction over HTML. ERB is an abstraction over string interpolation (plus limited HTML escaping). ERB makes no attempt to understand or model HTML. It only models:
- toggling between Ruby code and output strings; and
- toggling between raw output and escaped output.
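Ruby’s standard-library ERB shows this quickly. In the sketch below (the template and variable names are made up), ERB happily renders a misspelled, unclosed tag and drops an unescaped value into an attribute, because it only tracks where Ruby starts and stops. Plain ERB escapes nothing unless you call ERB::Util.html_escape yourself. Rails’ ERB handler escapes <%= %> output by default, which would neutralise the quote here, but it still applies one escaping rule wherever the value lands.

```ruby
require "erb"

# ERB only knows which parts are Ruby and which are output text.
# It cannot tell that "dvi" is a typo, that the tag is never closed,
# or that css_class is being written into an attribute value.
template = ERB.new(<<~TEMPLATE)
  <dvi class="<%= css_class %>"><%= ERB::Util.html_escape(name) %>
TEMPLATE

html = template.result_with_hash(
  # In stdlib ERB, <%= %> inserts values verbatim; escaping is opt-in.
  css_class: %(card"><script>alert(1)</script>),
  name: "<b>Ada</b>"
)

puts html
# prints: <dvi class="card"><script>alert(1)</script>">&lt;b&gt;Ada&lt;/b&gt;
```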
Slim, Haml, etc. are all essentially the same: they just use worse syntax[3] to represent the Ruby bits and the stringy bits.
Strings are a leaky abstraction for HTML
Perhaps “leaky” is too generous here. Strings are a shitty abstraction over HTML. Strings model literally nothing about HTML.
Strings don’t know about HTML elements, they don’t know which elements are void (self-closing) and which are not, they don’t know about attributes, they don’t know which contexts are safe or how to properly escape interpolated content in specific contexts.
It’s incredibly easy to mess up HTML when you write it in a string, sometimes with critical security consequences. You can misspell an element name; forget to close a tag, attribute value or comment; mess up a doctype; or output unsafe user data that makes your application vulnerable to cross-site scripting (XSS).
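The context problem is easy to demonstrate with plain Ruby strings. In this sketch (the inputs stand in for hypothetical user data), HTML-escaping makes the text context safe, but the same escaping does nothing for a javascript: URL in an href, because none of its characters are special in HTML.

```ruby
require "erb"

# Hypothetical user-controlled input.
comment = "<img src=x onerror=alert(1)>"
website = "javascript:alert(document.cookie)"

# 1. Raw interpolation: the comment becomes live markup (XSS).
raw_text = "<p>#{comment}</p>"

# 2. HTML-escaping makes the text context safe...
escaped_text = "<p>#{ERB::Util.html_escape(comment)}</p>"

# 3. ...but the same escaping leaves a javascript: URL untouched, because
#    none of its characters need escaping. Clicking the link runs the script.
escaped_link = %(<a href="#{ERB::Util.html_escape(website)}">My site</a>)

puts raw_text, escaped_text, escaped_link
```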
It’s very difficult to do any of these things in Phlex. If you misspell a tag, you’ll get a NoMethodError. If you don’t close a tag or a comment, you’ll get a syntax error. If you don’t pass a block to a standard element, Phlex inserts the closing tag anyway. If you pass a block to a void element, it raises an error.
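These checks fall out naturally once elements are modelled as methods. The following is a hypothetical sketch, not Phlex’s source: StrictBuilder and its element lists are invented for illustration. Ruby itself raises NoMethodError for the misspelled tag; the builder raises when a void element receives a block and always emits closing tags for standard elements.

```ruby
# Hypothetical sketch (NOT Phlex's source): a builder that knows which
# elements are void can enforce structure that a string cannot.
class StrictBuilder
  STANDARD_ELEMENTS = %w[div section span].freeze
  VOID_ELEMENTS = %w[br img input].freeze

  attr_reader :buffer

  def initialize
    @buffer = +""
  end

  # Standard elements always get a closing tag, block or no block.
  STANDARD_ELEMENTS.each do |tag|
    define_method(tag) do |&block|
      @buffer << "<#{tag}>"
      block&.call
      @buffer << "</#{tag}>"
    end
  end

  # Void elements can never have content, so a block is an error.
  VOID_ELEMENTS.each do |tag|
    define_method(tag) do |&block|
      raise ArgumentError, "<#{tag}> is a void element and cannot have content" if block

      @buffer << "<#{tag}>"
    end
  end
end

builder = StrictBuilder.new
builder.div { builder.br }
builder.section # no block, but the closing tag is still emitted
puts builder.buffer # prints: <div><br></div><section></section>

begin
  builder.img { builder.span }
rescue ArgumentError => e
  puts e.message
end

begin
  builder.dvi { builder.span } # misspelled tag: no such method exists
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"
end
```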
And Phlex knows the precise HTML context at every step, so it can provide better safeguards. It restricts the use of unsafe HTML attributes: you have to explicitly mark a value as safe before you can pass it in. It also checks, for example, that an href attribute doesn’t start with javascript:, which would otherwise open the door to cross-site scripting.
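Here is a rough sketch of what attribute-aware checks can look like. SafeValue and render_attribute are hypothetical names, not Phlex’s API, and treating on* event-handler attributes as the unsafe set is an assumption of this sketch rather than Phlex’s actual list.

```ruby
require "erb"

# Hypothetical sketch of attribute-aware safeguards. SafeValue and
# render_attribute are illustrative names, not Phlex's API, and treating
# on* event handlers as the "unsafe" set is an assumption of this sketch.
SafeValue = Struct.new(:value)

def render_attribute(name, value)
  name = name.to_s.downcase
  safe = value.is_a?(SafeValue)
  raw = safe ? value.value.to_s : value.to_s

  # Event-handler attributes run script, so require an explicit opt-in.
  if name.start_with?("on") && !safe
    raise ArgumentError, "#{name} is unsafe; wrap the value in SafeValue to allow it"
  end

  # Browsers ignore leading whitespace and embedded tabs/newlines in URLs,
  # so normalise roughly before checking the scheme.
  if name == "href" && raw.strip.delete("\t\n\r").downcase.start_with?("javascript:")
    raise ArgumentError, "javascript: URLs are not allowed in href"
  end

  %( #{name}="#{ERB::Util.html_escape(raw)}")
end

puts render_attribute(:href, "/profile")
puts render_attribute(:onclick, SafeValue.new("track()"))

[[:href, " JavaScript:alert(1)"], [:onclick, "alert(1)"]].each do |name, value|
  render_attribute(name, value)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```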
Because Phlex actually models HTML, it’s much more difficult to go wrong. And because the abstraction is implemented in a programming language (Ruby), you don’t need to use interpolation and you don’t need context switching to get loops and conditionals.
You can either use an abstraction over strings or an abstraction over HTML, but you’ll always need to use an abstraction, unless your application is entirely static.
Leaky abstractions can be fine anyway
This is beside the point, but I want to address the idea that leaky abstractions are always bad, because they’re not.
To be effective, we operate on the basis of simplified models. Models don’t need to be perfect, complete or even accurate. They just need to be useful.
To give an example, my mental model of how a car works is “I push this pedal to make it go faster, I push this pedal to make it go slower, etc.” When I’m driving, I’m not thinking about exactly how the engine and braking system work.[4]
A mechanic will have to use a different model when servicing the car. My mental model is a leaky abstraction over cars, but that’s okay because it’s a model for driving.
A leaky abstraction can be useful and that’s fine. Leaky abstractions become a problem when they are not fit for purpose and you are constantly working around them.
To bring this back to the point, Markdown is an example of a very leaky but also very useful abstraction over HTML. It does not even attempt to model all of HTML.
Markdown does fall back to HTML gracefully, since you can write HTML tags directly in Markdown. In practice, though, this escape hatch is rarely enabled, because an abstraction over a specific, limited set of HTML is useful in its own right.
Phlex’ abstraction over HTML is getting better
At present, Phlex’ model of HTML is quite simplistic. When it comes to HTML attributes, there are a lot of very precise rules.
For example, besides actual boolean attributes where you specify the attribute without a value, there are four different types of enumerated string boolean attributes:
- Hatch: "open" / "closed"
- Toggle: "on" / "off"
- Affirmation: "yes" / "no"
- Enumerated Boolean: "true" / "false"
You must use the right one in the right place, depending on the element and attribute you’re passing it to.
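A validator can encode these as a lookup from element and attribute to the allowed keyword pair. The table below is a hypothetical, non-exhaustive sketch: the attributes are real HTML attributes chosen as plausible examples of each category, not necessarily the ones the Phlex validator will use, and check_enumerated is an invented helper.

```ruby
# Hypothetical rule table: a few real attributes paired with the keyword
# pair they accept, one per category above. Not exhaustive; some of these
# also accept an empty string, which this sketch ignores.
ENUMERATED_VALUES = {
  ["template", "shadowrootmode"] => %w[open closed], # Hatch
  ["form", "autocomplete"] => %w[on off],            # Toggle
  [:global, "translate"] => %w[yes no],              # Affirmation
  [:global, "spellcheck"] => %w[true false]          # Enumerated Boolean
}.freeze

def check_enumerated(element, attribute, value)
  allowed = ENUMERATED_VALUES[[element, attribute]] ||
            ENUMERATED_VALUES[[:global, attribute]]
  # No rule in this sketch means nothing to check.
  return true if allowed.nil? || allowed.include?(value)

  raise ArgumentError,
        "#{attribute} on <#{element}> must be #{allowed.map(&:inspect).join(' or ')}, got #{value.inspect}"
end

check_enumerated("p", "translate", "no")        # passes
check_enumerated("form", "autocomplete", "off") # passes

begin
  check_enumerated("div", "spellcheck", "yes") # right idea, wrong keyword pair
rescue ArgumentError => e
  puts e.message # prints: spellcheck on <div> must be "true" or "false", got "yes"
end
```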
Then there are dates and times, which need to be formatted in very particular ways and must also be real dates. There are attributes that must be one of a specific set of enumerated strings. There are rules that say ‘if this attribute on this element is this then this other attribute must be one of these’ or ‘this other attribute is required’.
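As an example of the date rules, HTML defines a “valid date string” (the format an <input type="date"> uses for its value) as YYYY-MM-DD, with a year of at least four digits and greater than zero, and the date must actually exist. The sketch below checks both the shape and the calendar; valid_html_date? is a hypothetical helper, not part of Phlex.

```ruby
require "date"

# Hypothetical helper: an HTML "valid date string" is YYYY-MM-DD with a
# year of at least four digits (and > 0), two-digit month and day, and it
# must name a real day on the proleptic Gregorian calendar.
def valid_html_date?(string)
  match = /\A(\d{4,})-(\d{2})-(\d{2})\z/.match(string)
  return false unless match

  year, month, day = match.captures.map(&:to_i)
  # Date::GREGORIAN avoids Ruby's default Julian handling before 1582.
  year.positive? && Date.valid_date?(year, month, day, Date::GREGORIAN)
end

%w[2024-02-29 2023-02-29 2024-2-9 2024-13-01 0000-01-01].each do |candidate|
  puts "#{candidate}: #{valid_html_date?(candidate)}"
end
# Only 2024-02-29 is valid: 2023 is not a leap year, 2024-2-9 has
# one-digit fields, month 13 does not exist, and year 0000 is not > 0.
```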
Work is ongoing to build these rules into an extended validator for Phlex that will catch all of these problems and explain how to fix them with links to the HTML documentation on MDN.
Footnotes
[1] I don’t know if Phlex is a DSL. It’s just an abstract class with instance methods on it.
[2] In this context, “string-like primitive” includes Integers, Floats, etc., which can easily be represented as text.
[3] It’s the significant whitespace and poor developer tooling that get me.
[4] Thanks to David Thomas for this analogy, which I heard recently on an episode of The Code with Jason Podcast.