More typing with less typing, an introduction to Literal

With Ruby’s extensive meta-programming capabilities and its dynamic runtime, Rubyists can build in a few minutes the kinds of APIs that might take weeks or months in other languages.

But while this dynamism is Ruby’s greatest strength, it can lead to programs that are difficult to maintain — especially over extended periods of time with many programmers.

Looking at a given Ruby class, we know it has various properties by checking the initialiser and attribute accessors, but we don’t know what each of those properties is expected to be without tracking down all the call sites.

Unless we look at every single place where this class is used, we can’t know the limits of any property, because the class itself defines no limits. The class becomes coupled to each implementation that uses it rather than to an interface it declares and owns.

This can be managed to some extent with extensive API documentation, but this kind of documentation is extremely expensive to produce and maintain. It’s also liable to get out of date, at which point it is worse than no documentation.

But even the best, most extensive, up-to-date API documentation can’t enforce the documented rules or catch rule-breakers.

The result of all this is that Ruby applications, particularly older, larger ones, tend to have quite high defect rates. In most places I’ve worked with Ruby, there was a “usual” number of runtime exceptions tolerated and ignored.

The most common exceptions I see are things like NoMethodError, which is typically a sign that the code calling the method didn’t know what it was calling it on. It thought it had a string but it actually had nil.

If you do hit a NoMethodError, consider yourself lucky. In one case at Shopify, we had a variable we expected to be a string turn out to be nil due to a combination of bugs in Rails and Puma. We had used this variable in a query to delete a specific user’s Zendesk support tickets.

query: "email: #{email_address}"

When you interpolate strings like this, Ruby calls to_s on the interpolated object (email_address). Unfortunately, nil is very happy to receive to_s and return an empty string "". The query became "email: ", which Zendesk interpreted to mean it should delete all support tickets.

Many thousands of support tickets were deleted before this code was stopped and luckily we were able to restore them from a backup.
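The failure mode is easy to reproduce in plain Ruby. The sketch below is illustrative only: `build_query` is a hypothetical helper, not the code involved in the incident, and the choice of TypeError is an assumption.

```ruby
# Interpolation calls to_s on the value, and nil.to_s is "".
email_address = nil
query = "email: #{email_address}"
query # => "email: "

# Hypothetical guard: check the value against the String "type" before using it.
def build_query(email_address)
  unless String === email_address
    raise TypeError, "expected a String, got #{email_address.inspect}"
  end

  "email: #{email_address}"
end

build_query("user@example.com") # => "email: user@example.com"
# build_query(nil) raises TypeError instead of silently producing "email: "
```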

Static typing in Ruby

One way to prevent this kind of bug from happening is to explicitly declare expected types. And over the years, there have been various approaches to static type checking in Ruby.

Sorbet lets you define method signatures in Ruby code and enforce them in a static type check step. Steep is another static type checker for Ruby, and it works with RBS type signatures, which are either defined in a separate file or in magic comments just above each method.

The first problem with both of these approaches is that they are quite difficult to introduce. To check types, you need to know the types. And to know the types, you need the rest of your code to check the types.

The utility of a static type checker is limited at best until all of your code and third party code is typed.

I just don’t see this happening in the Ruby ecosystem and part of the reason is the next problem: when Ruby plays to its strengths with dynamism and meta-programming, static typing goes out the window.

Even if you can keep your application code quite plain and static, that’s not going to be the case for all the libraries you depend on.

Introducing static type checking into a Ruby codebase takes a significant amount of time and effort.

Dynamic types in Ruby

There’s another approach that embraces Ruby’s dynamism and meta-programming with equally dynamic type checks at runtime. But what is a runtime type in Ruby? I thought Ruby didn’t have types?

I define a type as a description of a set of objects. What would that look like in Ruby? We’re looking for an interface that when given an object determines if that object would be in the described set.

I would argue that the case-equality method (triple equals, ===) is exactly that interface by convention. And this would mean any object that implements ===(object) is a Ruby type.

Almost every object in Ruby implements this interface. Let’s look at a few examples:

Classes check that the object is an instance of the class.

String  === "hello" # => true
String  === 1       # => false

Integer === "hello" # => false
Integer === 1       # => true

Modules check that the object extends the module (or is an instance of a class that includes the module).

Enumerable === [1, 2, 3] # => true

Ranges check that the object is covered by the range.

(1..10) === 5   # => true
(1..10) === 1.1 # => true
(1..10) === 11  # => false

Regular expressions check that the object matches the pattern.

/\d+/ === "1" # => true
/\d+/ === "a" # => false

Procs alias === to call, which means procs can be used to define predicate types with arbitrary logic.

-> (obj) { obj.length > 5 } === "abc"    # => false
-> (obj) { obj.length > 5 } === "abcdef" # => true

Strings check that the object is a string with the same value.

"hello" === "hello" # => true
"hello" === "world" # => false

Most other objects check that the given object is the same instance as itself, but of course you can override this with your own implementation.

my_object = Object.new
my_object === my_object # => true

Ruby uses the ===(object) ‘type’ interface in case statements and pattern matching as well as methods such as Enumerable#all?(type) and Enumerable#any?(type) so this is already a well established pattern for checking a type in Ruby.

["a", "b", "c"].any?("c") # => true
["a", "b", "c"].any?("d") # => false
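To see several of these types working together, here is a small sketch of a case statement that mixes them. The `describe` method and the `Iterable` constant are made up for this example.

```ruby
# A proc used as a predicate type: matches anything that responds to each.
Iterable = ->(value) { value.respond_to?(:each) }

# case/when calls === on each candidate, passing the subject as the argument.
def describe(value)
  case value
  when nil       then "nothing"
  when /\A\d+\z/ then "a string of digits"
  when String    then "some other string"
  when 1..10     then "a number from 1 to 10"
  when Iterable  then "something iterable"
  else                "unknown"
  end
end

describe(nil)    # => "nothing"
describe("42")   # => "a string of digits"
describe("hi")   # => "some other string"
describe(5)      # => "a number from 1 to 10"
describe([1, 2]) # => "something iterable"
describe(11)     # => "unknown"
```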

And we can take it further by modelling generic types as Ruby classes. Let’s build a generic array type where the ===(object) method checks that the object is an array and that each of its items matches the generic type.

class ArrayType
  def initialize(type)
    @type = type
  end

  def ===(object)
    Array === object && object.all?(@type)
  end
end

With this class in place we can now generate a type for an array of any type we like. Here we’ll make a type for an array of strings:

ArrayType.new(String) === ["a"]    # => true
ArrayType.new(String) === ["a", 1] # => false

Let’s take this one step further and define a constructor function for our generic type. I would usually want to use the function Array but this is taken already (Ruby defines Kernel#Array), so let’s put an underscore before it.

def _Array(t) = ArrayType.new(t)

Now we can build array types with less typing. 🥁

_Array(String) === ["a"] # => true

The Literal Ruby gem has a whole suite of types just like this.
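As a rough illustration of how other generic types could follow the same pattern, here is a hand-rolled hash type. `HashType` and the `_Hash` constructor are written for this sketch and are not taken from Literal's source, so the real implementation may well differ.

```ruby
# Sketch of a generic hash type: every key and every value must match.
class HashType
  def initialize(key_type, value_type)
    @key_type = key_type
    @value_type = value_type
  end

  def ===(object)
    Hash === object &&
      object.all? { |key, value| @key_type === key && @value_type === value }
  end
end

# Constructor function, in the same style as _Array above.
def _Hash(key_type, value_type)
  HashType.new(key_type, value_type)
end

ages = _Hash(String, Integer)

ages === { "alice" => 30 }   # => true
ages === { "alice" => "30" } # => false
ages === [["alice", 30]]     # => false
```

Because a type is just an object that responds to ===, these types nest: a value type of 1..120 or another `HashType` would work equally well.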

Checking the types

Okay, so we can easily create types as Ruby objects — or indeed recognise our existing Ruby objects as the types they already are. But how do we check them? Literal provides a few different tools to do this.

First, let’s look at structured objects.

Structured Objects

Literal has two kinds of structured objects: Literal::Struct and Literal::Data. They are meant to stand in for Ruby’s Struct and Data objects in your application. Literal::Struct is mutable by default, while Literal::Data is immutable.

To define a structured data object, we can inherit from Literal::Data and use the prop macro. The first argument is the name of the property, the second is the type.

class User < Literal::Data
  prop :name, String
  prop :age, Integer
end

Literal generates an initialiser for this class that checks the types and assigns instance variables from keyword arguments. It also generates a reader method for each property.

On Literal::Struct, it generates both a reader and writer method for each property by default.
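To make the saved boilerplate concrete, here is roughly what you might write by hand to get similar behaviour for the User example without Literal. This is a sketch of the idea, not the code Literal actually generates; the `PlainUser` name, the error messages, and the use of TypeError are all assumptions.

```ruby
# Hand-written approximation of a two-property immutable data class.
class PlainUser
  attr_reader :name, :age

  def initialize(name:, age:)
    raise TypeError, "name: expected String, got #{name.inspect}" unless String === name
    raise TypeError, "age: expected Integer, got #{age.inspect}" unless Integer === age

    @name = name
    @age = age
    freeze # mimic the immutability of a data object
  end
end

user = PlainUser.new(name: "Alice", age: 30)
user.name # => "Alice"
user.age  # => 30
# PlainUser.new(name: "Alice", age: "30") raises TypeError
```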

Properties default to using keyword arguments, but you can pass a third argument to prop specifying the kind of argument as :positional, :*, :**, or :&.

class UserGroup < Literal::Data
  prop :name, String, :positional
  prop :users, _Array(User), :*
  prop :options, Hash, :**
  prop :validation, Proc, :&
end

The initialiser this UserGroup generates will be:

def initialize(name, *users, **options, &validation)
  # ...
end

All arguments are required by default. There are two ways you can make them optional. You can provide a default value:

prop :name, String, default: "Unknown"

If the default value isn’t frozen, you’ll need to wrap it in a Proc. Otherwise, if you set a default empty array for example, every instance would share the exact same array, which is probably not what you wanted.

prop :users, _Array(User), default: -> { [] }
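The shared-default problem is not specific to Literal, and it can be shown in plain Ruby. The two classes below are hypothetical and exist only to contrast a shared default object with one built per instance.

```ruby
# Every instance falls back to the very same array object.
class SharedDefault
  DEFAULT_USERS = []

  attr_reader :users

  def initialize(users = DEFAULT_USERS)
    @users = users
  end
end

a = SharedDefault.new
b = SharedDefault.new
a.users << "alice"
b.users # => ["alice"], because a and b hold the same array

# Calling a proc for each instance builds a fresh array every time.
class FreshDefault
  DEFAULT_USERS = -> { [] }

  attr_reader :users

  def initialize(users = DEFAULT_USERS.call)
    @users = users
  end
end

c = FreshDefault.new
d = FreshDefault.new
c.users << "alice"
d.users # => []
```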

The other option is to pass a type that can be nil. Literal exercises the type by calling ===(nil) to check whether it accepts nil. In Literal, the easiest way to make a nilable type is to use the type constructor _Nilable(type).

prop :name, _Nilable(String)

You’ll do well to avoid nilables whenever possible.
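The idea of exercising a type with nil can be sketched in a few lines. `NilableType` and `optional?` below are hypothetical stand-ins written for this example; they are not Literal's internals.

```ruby
# Sketch of a nilable wrapper: accepts nil or anything the inner type accepts.
class NilableType
  def initialize(type)
    @type = type
  end

  def ===(object)
    object.nil? || @type === object
  end
end

# A property could be treated as optional when its type accepts nil.
def optional?(type)
  type === nil
end

optional?(String)                  # => false
optional?(NilableType.new(String)) # => true

NilableType.new(String) === "hello" # => true
NilableType.new(String) === 1       # => false
```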

Regular classes

We’ve looked at structured objects in Literal, but these come with some built in assumptions that they are value objects — they’re comparable by value, they can be converted to a Hash, etc. What about regular classes?

The same prop macro can be adopted by regular Ruby classes by either inheriting from Literal::Object or extending Literal::Properties.

Literal::Object is literally a class that extends Literal::Properties for your convenience.

class Literal::Object
  extend Literal::Properties
end

We can use it like this:

class MyDomainObject < Literal::Object
  prop :name, String, reader: :public, writer: :public
end

Or if we need to inherit from something else, like this:

class Components::Base < Phlex::HTML
  extend Literal::Properties
end

class Components::Button < Components::Base
  prop :type, _Union(:primary, :secondary)
  prop :size, _Union(:small, :medium, :large)
end

We’ve used a new type here, _Union, which means “one of these types”. You can also think of _Nilable as returning a union of nil OR the type you passed in.

The _Union type can be thought of as going through each of the types passed in and calling === on each one until it finds a match.

In reality, there is a special optimisation that means it can check unions of primitives (such as the symbols above) much faster. It could check a value against a union of millions of primitives almost instantly.
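One plausible way to get that behaviour, assumed here rather than taken from Literal's source, is to keep simple values in a Set and only fall back to calling === on the remaining types. `UnionType` is a name made up for this sketch, and it only treats symbols, nil, true and false as primitives.

```ruby
require "set"

class UnionType
  # Values of these classes only ever match themselves, so a Set lookup
  # gives the same answer as calling === on each one in turn.
  SIMPLE = [Symbol, NilClass, TrueClass, FalseClass].freeze

  def initialize(*types)
    primitives, @others = types.partition do |type|
      SIMPLE.any? { |klass| klass === type }
    end
    @primitives = primitives.to_set
  end

  def ===(object)
    # Hash lookup first, then the slower walk through the other types.
    @primitives.include?(object) || @others.any? { |type| type === object }
  end
end

size = UnionType.new(:small, :medium, :large)
size === :small # => true
size === :huge  # => false

mixed = UnionType.new(String, :auto)
mixed === "12px" # => true
mixed === :auto  # => true
mixed === 12     # => false
```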

Performance

Let’s talk about performance for a moment. You might be thinking this is great, but isn’t all this extra work going to slow my application down?

The answer is probably yes, but I bet you couldn’t even measure it in production. Literal’s types are highly optimised and have very little overhead. In fact, all the built-in types do zero allocations at check time. The only allocations happen at boot time when creating the type objects.

Literal types are also immutable. Any memory they do use should be shared between forked processes.

There’s a small chance using Literal could improve the performance of your application. It encourages more thoughtful API design. It produces consistent object shapes by defining instance variables in the same order. It also encourages patterns that could lead to better inline cache utilisation.

A Goldilocks solution

Earlier, I said that from any Ruby class, you could check the initialiser to see what properties it has. What was missing was any information about what those properties themselves are.

We’ve looked at what constitutes a type in Ruby — it’s an object that describes a set of objects via its === method. And we’ve looked at how you can use tools from Literal to generate initialisers and writer methods that enforce these types at runtime. What does that get us?

Less typing

To start with, it gets us less typing. We literally have to type less on the keyboard to define an initialiser that takes properties and assigns them to instance variables. If bugs are a function of code, less code already means fewer bugs.

The concept of a “property”

Where before we had four concepts: instance variable, initialiser, reader, writer — we grouped these into one concept: the property.
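To show how those four concepts can collapse into one declaration, here is a deliberately tiny prop macro in plain Ruby. `MiniProperties` is a hypothetical stand-in for illustration only: it supports keyword arguments and an optional writer, ignores inheritance and unknown keywords, and makes no claim to match how Literal::Properties is implemented.

```ruby
module MiniProperties
  # When a class extends this module, give its instances the initialiser.
  def self.extended(base)
    base.include(InstanceMethods)
  end

  def self.check!(name, type, value)
    return if type === value

    raise TypeError, "#{name}: expected #{type.inspect}, got #{value.inspect}"
  end

  # Property names mapped to their types, stored per class.
  def props
    @props ||= {}
  end

  # One declaration covers the instance variable, initialiser check,
  # reader, and (optionally) a type-checked writer.
  def prop(name, type, writer: false)
    props[name] = type
    attr_reader(name)
    return unless writer

    define_method(:"#{name}=") do |value|
      MiniProperties.check!(name, type, value)
      instance_variable_set(:"@#{name}", value)
    end
  end

  module InstanceMethods
    def initialize(**values)
      self.class.props.each do |name, type|
        value = values[name] # a missing keyword is checked as nil
        MiniProperties.check!(name, type, value)
        instance_variable_set(:"@#{name}", value)
      end
    end
  end
end

class Point
  extend MiniProperties

  prop :x, Integer
  prop :y, Integer, writer: true
end

point = Point.new(x: 1, y: 2)
point.y = 5
point.y # => 5
# Point.new(x: 1, y: "2") raises TypeError: y: expected Integer, got "2"
```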

Local documentation

We can look at the top of any Ruby class using these tools and immediately see what properties that class has and what they are expected to be.

Enforced conventions owned by the class not its users

Each class now owns its interface — what it accepts and what it doesn’t accept. It can’t be used in any other way without raising an exception, which you will usually catch during development or testing.

Compatibility with existing code and meta-programming

Since these conventions are enforced at runtime, they deliver value immediately, not once you’ve annotated every file in your project. They also work with meta-programming.

Your existing tests are more useful

Let’s say you had a test where all it did was hit a controller endpoint and load the view. If that view rendered a hundred components and each component had strict property constraints, you’ve immediately got a lot of coverage.

It’s not a great test, but you’ve significantly increased the odds that if something went wrong, your test would fail when Literal raised an exception.

Even your bad tests do more for you.

Failing fast

When there’s a bug, you want to fail as quickly as possible. If you fail late, you might charge that payment and then fail to deliver the product.

Even if your tests didn’t catch the bug, Literal might catch it in production, raising a clear exception that points to exactly what went wrong. Literal’s exceptions will tell you what it expected, on what method, for what attribute and what it got instead.

With some types like _Array, the exception will even tell you which specific item in the array was wrong.
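As a sketch of how an array type could report the offending item, the version below adds a `check!` method to the ArrayType idea from earlier. The method name and message format are hypothetical and will not match Literal's actual exceptions.

```ruby
class ArrayType
  def initialize(type)
    @type = type
  end

  def ===(object)
    Array === object && object.all?(@type)
  end

  # Hypothetical helper: raise with the index of the first item that fails.
  def check!(object)
    raise TypeError, "expected an Array, got #{object.inspect}" unless Array === object

    index = object.index { |item| !(@type === item) }
    return object if index.nil?

    raise TypeError, "expected #{@type.inspect} at index #{index}, got #{object[index].inspect}"
  end
end

strings = ArrayType.new(String)
strings.check!(["a", "b"]) # => ["a", "b"]
# strings.check!(["a", 1]) raises TypeError: expected String at index 1, got 1
```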

Conclusion

Literal can’t do everything a static type checker can do. It only checks types at object boundaries and it only checks them at runtime — on code that is exercised. But I think it’s the ultimate Goldilocks solution. You get 80% of the value of type checking with very little effort.

One final point: if you’re a die-hard believer in duck-types, you can use Literal for that too. Instead of defining types like String, you could use more specific interfaces such as:

Buffer = _Interface(:bytesize, :<<)

This will check that the object responds to the methods bytesize and <<.
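A duck-type check like this fits the same === convention. The `InterfaceType` class and `BufferLike` constant below are a hand-written sketch under that assumption, not Literal's implementation.

```ruby
# Sketch of an interface type: matches any object that responds to every method.
class InterfaceType
  def initialize(*method_names)
    @method_names = method_names
  end

  def ===(object)
    @method_names.all? { |name| object.respond_to?(name) }
  end
end

BufferLike = InterfaceType.new(:bytesize, :<<)

BufferLike === "hello" # => true, String has both methods
BufferLike === []      # => false, Array has << but not bytesize
BufferLike === 42      # => false, Integer has << but not bytesize
```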

Personally, I do a little of both. Usually, I know I’m expecting a specific class of object. Sometimes I’m expecting an interface that I can detect with a module. Enumerable is a great example and with Literal, I can use _Enumerable(OtherType).

If the expectation truly is any class that implements these methods, I create an _Interface and give it a sensible name.

Over the next few weeks, we’ll explore the kinds of interesting types you can create in Literal. We’ll look at other Literal tools such as Enums, Flags, Values, and Delegators (some of these are still experimental). We’ll look at how types can be compared to each other and how that can help us improve runtime performance with new upcoming Literal collection objects.

If you want to learn more about Literal now, check out the website and GitHub repo.