Comparing Ivar and Strict Ivars
I was excited to see a post in yesterday morning’s RubyWeekly about the release of Avdi Grimm’s Ivar gem. He had sent me a preview a few days ago, but his new post goes into more detail.
Now that there are two gems to help you catch undefined instance variables in Ruby — I wrote about mine a couple of weeks ago — I thought it’d be a good idea to talk about the differences from my perspective and also explain some of the reasoning behind my approach.
You’d be forgiven for thinking these gems might be pretty similar behind the scenes, but in fact the two approaches are essentially opposites. That’s what makes this so interesting and why I wanted to talk about it.
Ivar does static analysis triggered at runtime, while Strict Ivars injects runtime analysis statically.
In this post, we’ll explore:
- Architecture — how do these libraries work?
- Correctness — potential false positives and false negatives
- Consequences — what should happen when an assumption is not met?
- Performance — what are the performance implications?
- Ergonomics — API feels and nice things.
- Integration — what does it cost to adopt one of these gems and what does it cost to back out?
Architecture
Ivar
Ivar can be configured to check specific classes or all classes. When you ask it to check all your classes, it uses a TracePoint to trace the :end event.
This event is triggered after each “first definition” of a class or module, though only if it is defined with the class or module keyword. It will not trigger for Class.new.
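To make the difference between the two definition styles concrete, here is a minimal sketch that uses only TracePoint from Ruby's core. The class and module names are invented for illustration, and this is not Ivar's actual hook.

```ruby
# Record every class or module whose body finishes while the trace is enabled.
finished = []
trace = TracePoint.new(:end) { |tp| finished << tp.self }

trace.enable

# Defined with the `class` keyword: the :end event fires.
class KeywordClassSketch
  def hello; end
end

# Defined with the `module` keyword: the :end event fires.
module KeywordModuleSketch
  def hello; end
end

# Defined with Class.new: the block is just a block, so no :end event fires.
DynamicClassSketch = Class.new do
  def hello; end
end

trace.disable

p finished
```

The two keyword definitions end up in `finished`, while the class built with Class.new does not.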
When the TracePoint is triggered, as long as the class was defined in a file and that file is in your project root, it includes the module Ivar::Checked into the class as if you had done so yourself.
The Ivar::Checked module, when included, then prepends another module, which defines initialize like so.
def initialize(*args, **kwargs, &block)
  if @__ivar_skip_init
    super
  else
    @__ivar_skip_init = true
    manifest = Ivar.get_or_create_manifest(self.class)
    manifest.process_before_init(self, args, kwargs)
    super
    check_ivars
  end
end
This initialize method is what triggers the check and it also sets up a fast path that will skip the check when you initialize the same class again.
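The include-then-prepend pattern can be sketched in isolation. Everything here (CheckedSketch, InitializeWrapper, the per-class `checked` list) is a hypothetical simplification, not Ivar's source. It only shows how a prepended initialize can wrap the real one and run a check once per class.

```ruby
# Hypothetical names throughout; this is not Ivar's implementation.
module CheckedSketch
  @checked = [] # classes whose one-off check has already run
  @reports = [] # what each one-off check observed

  class << self
    attr_reader :checked, :reports

    # Including the module prepends the wrapper, so the wrapper's
    # initialize sits in front of the class's own initialize.
    def included(base)
      base.prepend(InitializeWrapper)
    end
  end

  module InitializeWrapper
    def initialize(*args, **kwargs, &block)
      super # run the class's real initialize with the same arguments
      klass = self.class
      return if CheckedSketch.checked.include?(klass) # fast path

      CheckedSketch.checked << klass
      CheckedSketch.reports << [klass, instance_variables]
    end
  end
end

class PointSketch
  include CheckedSketch

  def initialize(x, y)
    @x = x
    @y = y
  end
end

PointSketch.new(1, 2) # first initialisation runs the check
PointSketch.new(3, 4) # later initialisations take the fast path
```

Note that even the fast path still goes through the wrapper, which matters for the performance discussion later.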
When it comes to doing the check, Ivar first collects all the known instance variables, methods and method locations. Then for each file with methods in it, one Prism visitor finds the method it’s looking for and another Prism visitor takes a method body and looks for instance variable references.
If it finds any unknown instance variable references, it passes them up to the configured policy, which in turn logs, warns, raises, etc.
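As a rough illustration of this kind of static check, the sketch below uses a Prism visitor to collect instance variable reads and plain writes from a source string, then reports reads with no matching write. The rule "known means written somewhere in the file" is an assumption made for brevity. Ivar's real notion of a known variable is richer than this, and IvarReferenceCollector is a made-up name.

```ruby
require "prism"
require "set"

# Hypothetical collector: gathers the names of instance variables that
# are read and the names that are assigned with a plain `@x = ...`.
class IvarReferenceCollector < Prism::Visitor
  attr_reader :reads, :writes

  def initialize
    super()
    @reads = Set.new
    @writes = Set.new
  end

  def visit_instance_variable_read_node(node)
    @reads << node.name
    super # keep walking child nodes
  end

  def visit_instance_variable_write_node(node)
    @writes << node.name
    super
  end
end

# Single-quoted heredoc so the interpolation stays part of the sample source.
source = <<~'RUBY'
  class Person
    def initialize(name)
      @name = name
    end

    def greeting
      "Hello, #{@nmae}"
    end
  end
RUBY

collector = IvarReferenceCollector.new
Prism.parse(source).value.accept(collector)

unknown_reads = collector.reads - collector.writes
p unknown_reads.to_a
```

The misspelled @nmae is flagged without the greeting method ever being called, which is the appeal of the static approach.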
Strict Ivars
In contrast, Strict Ivars takes the opposite approach. Before there is any runtime information to speak of, it uses the require-hooks gem to hook into code loading.
Require Hooks has three different strategies and it picks the most appropriate strategy based on your version of Ruby and whether Bootsnap is enabled.
In the simplest case, Require Hooks patches RubyVM::InstructionSequence.load_iseq, which is expected to receive a file path and return a RubyVM::InstructionSequence object.
When Bootsnap is enabled, Bootsnap defines this method. This is how it is able to load instruction sequences from a cache. In this case, Require Hooks works with Bootsnap so it can influence the instruction sequence before it is cached, while allowing Bootsnap to load existing instruction sequences directly from its cache.
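For a feel of what hooking load_iseq involves, here is a deliberately naive, CRuby-only sketch. It is not how Require Hooks is implemented: it overwrites any existing hook (which is exactly what a real tool must avoid when Bootsnap is present), and the file name and rewrite rule are invented for the example.

```ruby
require "tmpdir"

# Hypothetical rewrite rule for the sketch: swap one string literal.
REWRITE_SKETCH = ->(source) { source.sub('"original"', '"rewritten"') }

class << RubyVM::InstructionSequence
  # CRuby calls this hook, when it is defined, with the path of each file
  # it is about to load. Returning nil means "compile it as usual".
  def load_iseq(path)
    return nil unless File.basename(path) == "rewrite_sketch_target.rb"

    source = REWRITE_SKETCH.call(File.read(path))
    RubyVM::InstructionSequence.compile(source, path, path)
  end
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "rewrite_sketch_target.rb")
  File.write(target, %(REWRITE_SKETCH_VALUE = "original"\n))
  require target # loads the rewritten source, not what is on disk
end

p REWRITE_SKETCH_VALUE
```

The file on disk never changes; only the instruction sequence Ruby executes is built from the modified source.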
Having hooked code loading, Strict Ivars makes a few minor modifications to your code before the Ruby compiler and interpreter see it.
A Prism visitor navigates the syntax tree looking for instance variables. When it finds an instance variable read, it pushes the start and end location into an array of annotations.
@annotations <<
  [location.start_character_offset, :start, name] <<
  [location.end_character_offset, :end, name]
Once it’s gone through the whole file, it sorts these annotations by their location (an offset into the source code) and then iterates over them backwards, mutating a copy of the original source as it goes.
It inserts a tiny bit of code before and after each instance variable. Remember we tagged the annotations as either :start or :end.
annotations.sort_by!(&:first)
annotations.reverse_each do |offset, action, name|
  case action
  when :start
    # insert before an instance variable
  when :end
    # insert after an instance variable
  end
end
Working backwards means we don’t need to make any adjustments to the annotated offsets as we go, since the locations are stored as an offset from the start of the source string.
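A small, self-contained sketch of the back-to-front rewrite follows. The annotations are built by hand rather than by a Prism visitor, and the inserted snippets are simplified guards chosen for readability, not the exact code Strict Ivars injects.

```ruby
source = "def name\n  @name\nend\n"

# Hand-built annotations for the single read of @name (a real
# implementation would take these offsets from the syntax tree).
start_offset = source.index("@name")
annotations = [
  [start_offset, :start, :@name],
  [start_offset + "@name".length, :end, :@name]
]

rewritten = source.dup

# Apply inserts from the end of the file towards the start, so earlier
# offsets are still valid when we reach them.
annotations.sort_by(&:first).reverse_each do |offset, action, name|
  case action
  when :start
    rewritten.insert(offset, "(defined?(#{name}) ? ")
  when :end
    rewritten.insert(offset, " : raise)")
  end
end

puts rewritten
```

Had the inserts been applied front to back, the :end offset would have been stale by the length of the :start insert.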
The specific inserts are a little ugly because they need to work in all contexts including BasicObject and they need to be on a single line to ensure any error locations are consistent with the original source after processing.
You can think of it as taking something like this:
def name
  @name
end
And turning it into this:
def name
  (defined?(@name) ? @name : raise)
end
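To see how such a guard behaves at runtime, here is a hand-written equivalent. The explicit NameError and its message are assumptions made so the example has something readable to rescue; the real generated code differs.

```ruby
class GuardedPersonSketch
  def initialize(name)
    @name = name
  end

  # A hand-written stand-in for a rewritten reader.
  def name
    (defined?(@name) ? @name : raise(NameError, "undefined instance variable @name"))
  end
end

# Normal case: the variable was written in initialize, so the read succeeds.
defined_result = GuardedPersonSketch.new("Ada").name

# allocate creates an instance without running initialize, so @name was
# never written and the guard raises.
undefined_result =
  begin
    GuardedPersonSketch.allocate.name
  rescue NameError => error
    error.message
  end

p defined_result
p undefined_result
```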
In the same way, it also hooks in to code dynamically evaluated via eval, binding.eval, class_eval, module_eval and instance_eval.[1]
That’s it. Aside from a few performance enhancements, which we’ll talk about later, that’s all there is to it. All the checking happens at runtime via the generated code.
Correctness
There’s nothing worse than yet another warning message to ignore. I take alert fatigue very seriously so it’s important for me that there are as few false positives and false negatives as possible.
Ivar
Ivar can catch undefined references even when the code isn’t fully exercised. It can do this because although the check is triggered at runtime on each first initialisation, the checks themselves are static.
This static analysis is both a blessing and a curse. While it’s nice that it can catch errors without the code being exercised, it has to make a lot of assumptions in order to do this statically, and those assumptions lead to false positives and false negatives.
To get around the false positives, there is a special syntax to statically describe the instance variables each class is allowed to reference.
class Example
  ivar :@some_variable
end
Since it knows statically all the permitted instance variables, it will catch instance variable writes as well as reads unless those writes occur during the first initialization.
However, there are some limitations:
- It doesn’t catch undefined references when the class is defined with Class.new, unless that class explicitly includes the Ivar::Checked module.
- It doesn’t catch undefined references on singletons, e.g. when using instance variables on a class singleton.
- It doesn’t catch undefined references in blocks, unless they are inside an instance method defined directly on a class.
- It doesn’t catch undefined references in methods defined on modules, even when those modules are included into a class.
- It doesn’t catch undefined references that appear in dynamically evaluated code via eval, binding.eval, class_eval, module_eval, or instance_eval.
- It doesn’t catch all undefined references on objects with inconsistent shapes, where some instances have an instance variable that others don’t. In fact, inconsistent shapes can lead to false positives and/or false negatives.
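The last point is easier to picture with an example of an inconsistent shape. In this plain-Ruby sketch (the class name is made up), the source contains a write to @token, yet only some instances ever receive it, which is information a purely static check does not have.

```ruby
class ShapeSessionSketch
  def initialize(token = nil)
    @token = token if token # only some instances ever get @token
  end

  def authenticated?
    !@token.nil?
  end
end

with_token = ShapeSessionSketch.new("abc")
without_token = ShapeSessionSketch.new

# Same class, different sets of instance variables.
p with_token.instance_variables    # [:@token]
p without_token.instance_variables # []
```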
Strict Ivars
Strict Ivars does its checks at runtime. The downside here is you have to exercise the code in order to validate it. The upside is it can be extremely accurate. And if you’re shipping code without exercising it — even manually — what are you doing?
One limitation is that Strict Ivars considers all writes to be authoritative. It only verifies reads.
This is a familiar model because it’s how local variables, global variables, constants and methods work already. You can define whatever you like, but when you come to access them, they need to have been defined.
Within this model, runtime checks have no false positives or false negatives because all the information is there at runtime. They work everywhere, even in dynamically evaluated code, even in classes that are allocated without being initialised.
You may think this last point is irrelevant. Who allocates instances of classes without initialising them? It’s actually more common than you think because it happens whenever you Marshal-load an object.
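This is easy to confirm with core Ruby alone. The sketch below counts calls to initialize around a Marshal round trip; the class name and counter are illustrative.

```ruby
class MarshalledAccountSketch
  @initialize_calls = 0

  class << self
    attr_accessor :initialize_calls
  end

  attr_reader :owner

  def initialize(owner)
    self.class.initialize_calls += 1
    @owner = owner
  end
end

original = MarshalledAccountSketch.new("Ada")
copy = Marshal.load(Marshal.dump(original))

# The copy has its instance variables restored from the dump, but
# initialize ran only once, for the original object.
p copy.owner
p MarshalledAccountSketch.initialize_calls
```

Any check that relies on initialize having run will not see objects created this way.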
It’s also worth noting that neither ||= nor &&= is considered a read-followed-by-a-write. Each is instead a special kind of write and therefore authoritative.
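Prism's syntax tree happens to model these operators the same way: each is a single write-flavoured node rather than a read node plus a write node. The sketch below just inspects the node classes. Whether Strict Ivars keys off these exact node types is an assumption on my part here.

```ruby
require "prism"

# Return the class of the first top-level statement in a snippet.
first_statement_type = lambda do |source|
  Prism.parse(source).value.statements.body.first.class
end

or_write_type = first_statement_type.call("@count ||= 0")
and_write_type = first_statement_type.call("@count &&= @count + 1")
plain_read_type = first_statement_type.call("@count")

p or_write_type   # Prism::InstanceVariableOrWriteNode
p and_write_type  # Prism::InstanceVariableAndWriteNode
p plain_read_type # Prism::InstanceVariableReadNode
```

A visitor that only handles instance variable read nodes will therefore skip the left-hand side of ||= and &&= without any special casing.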
Consequences
What should happen when you access an undefined instance variable?
The Ivar gem gives you a number of options. It can warn, warn once, raise, log, or do nothing.
In his post, Avdi said, “I preferred a warning to a hard error, and I didn’t necessarily want to have to change methods that intentionally referenced unset ivars.”
But let’s think about this. In what circumstances do you intentionally reference undefined instance variables? I’ve run Strict Ivars with several Rails test suites and not found any false positives yet.
I guess it is possible to intentionally reference an undefined instance variable but it’s just not something people do, in my experience. If you did, the fix is easy. Set that instance variable to nil first. One way to do this is to replace the read with an “or-write” @var ||= nil since this is an authoritative write.
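The or-write trick works because assigning nil still defines the variable. A quick plain-Ruby sketch, with an invented class name:

```ruby
class LazyCacheSketch
  def warm?
    @entries ||= nil # authoritative write: defines @entries if it was missing
    !@entries.nil?
  end
end

cache = LazyCacheSketch.new
defined_before = cache.instance_variable_defined?(:@entries)
warm = cache.warm?
defined_after = cache.instance_variable_defined?(:@entries)

p defined_before # false
p warm           # false
p defined_after  # true
```

The method's behaviour is unchanged, but every later read of @entries now refers to a defined variable.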
At this point, I can’t think of a single valid reason to reference an undefined instance variable. But let’s move on and talk about what should happen if you do.
Why not raise?
Unlike Ivar, Strict Ivars cannot be configured to log or warn. Imagine if Ruby logged when you referenced an undefined constant or method.
Chances are extremely high that if you read an undefined instance variable, your code is taking a path no human has ever considered it might take, and that can have dire consequences in production.
For this reason Strict Ivars always raises an exception and you are encouraged to run it like this in production.
I cannot fathom how anyone could worry that Strict Ivars might raise an exception in production without worrying a hundred times more that production might be doing things no one expected it to do.
Performance
Let’s move on to performance. Objects and instance variables are the bread-and-butter of Ruby apps. There are three aspects of performance I’d like to look at:
- Boot performance
- Runtime performance
- Copy on Write utilisation
Boot performance
When Ivar boots, it uses a TracePoint to hook each class definition and include the Ivar::Checked module. This module then includes several other modules. I think this has little overhead but it is some overhead as it applies to all classes.
While the initial overhead is minimal, it does add quite a bit of overhead to the first initialisation of each class and each subsequent initialisation.
On first initialisation, Ivar needs to look up all the instance variables and instance methods, load the relevant file(s), parse them and do a static check. This check inevitably has to happen at some point, but the way it is implemented in Ivar means it cannot be cached.
When Strict Ivars boots, it parses all your application’s files, inserting runtime checks where necessary. But this entire process can be cached as compiled instruction sequences by Bootsnap.
When using Bootsnap, there is no additional boot time overhead on a warm cache. Most of the time, you’ll only pay the pre-processing overhead for the one or two files you just modified.
On a cold boot (without a cache), Strict Ivars can process 1 million (1,000,000) lines of Ruby code (sample taken from an average Rails app) in about 2.5 seconds.
Runtime performance
Ivar patches initialize on every class and while this patch does include a fast path, the fast path still allocates an Array and a Hash (*args and **kwargs) and does an instance variable lookup as well as an extra method call to super.
In a benchmark on a class with three parameters and three instance variables, the inclusion of Ivar::Checked brought initialisation down from about 19.323M IPS to just 302k IPS. That’s 63.9x slower and it’s every class all the time.
Strict Ivars in theory adds some runtime overhead because each instance variable read also checks if the instance variable is defined. However, when you benchmark it, the result is “difference falls within error”. It’s so close that sometimes the code with the runtime check is faster than the code without it.
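If you want to poke at this yourself, a crude timing sketch like the following is enough to get started. It is not the benchmark quoted above: the guarded reader is hand-written, the iteration count is arbitrary, and the loop overhead of public_send will dominate, so treat the numbers as a smoke test only. A proper comparison would use a dedicated benchmarking tool.

```ruby
class BenchPersonSketch
  def initialize(name)
    @name = name
  end

  def plain_name
    @name
  end

  # Hand-written stand-in for a reader with a runtime check.
  def guarded_name
    (defined?(@name) ? @name : raise(NameError, "undefined instance variable @name"))
  end
end

# Time `iterations` calls of a method using a monotonic clock.
measure = lambda do |object, method_name, iterations|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { object.public_send(method_name) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

person = BenchPersonSketch.new("Ada")
plain_seconds = measure.call(person, :plain_name, 200_000)
guarded_seconds = measure.call(person, :guarded_name, 200_000)

puts format("plain: %.4fs, guarded: %.4fs", plain_seconds, guarded_seconds)
```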
Additionally, Strict Ivars doesn’t check the same instance variable more than once in the same context. So if you have a method that reads the same instance variable 5 times, only the first read will be re-written with a runtime check.
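One way to picture the "first read only" rule is a visitor that keeps a set of names already seen in the current method. This is a simplified guess at the idea, not the gem's logic: it treats each def as one context and ignores branching, where a first read inside an if would not actually cover a later read outside it.

```ruby
require "prism"
require "set"

# Hypothetical collector: records only the first read of each instance
# variable within a method definition.
class FirstReadCollector < Prism::Visitor
  attr_reader :guarded

  def initialize
    super()
    @guarded = [] # [name, line] pairs that would receive a runtime check
    @seen = Set.new
  end

  # Each method body starts with a fresh set of seen names.
  def visit_def_node(node)
    outer = @seen
    @seen = Set.new
    super
  ensure
    @seen = outer
  end

  def visit_instance_variable_read_node(node)
    # Set#add? returns nil when the name is already present.
    @guarded << [node.name, node.location.start_line] if @seen.add?(node.name)
    super
  end
end

source = <<~'RUBY'
  def full_name
    [@first, @last, @first].join(" ")
  end

  def initials
    @first[0]
  end
RUBY

collector = FirstReadCollector.new
Prism.parse(source).value.accept(collector)

p collector.guarded
```

The second read of @first in full_name is skipped, while the read in initials is guarded again because it lives in a different method.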
Copy on Write utilisation
Because Ivar delays the check until first initialisation and because it stores all kinds of things globally after the first initialisation, it is less able to utilise shared memory via Copy on Write.
When you fork a new process in Ruby, the new process shares memory with the original process, copying only the parts it modifies.
To take advantage of this, you want to load as much stuff as possible up front before forking. This is why we eager load in production. It means when Puma forks your web server, significant chunks of memory are shared.
Since Strict Ivars does all the processing up front and never modifies global objects at runtime, it has no negative effect on Copy on Write utilisation.
Ergonomics
This will be a short section, but I wanted to explain my thoughts on the ergonomics.
I feel like raising an error on undefined instance variable reads should be built right into the language. A gem that solves this issue should be invisible. You should never have to think about it.
This is why Strict Ivars is initialised with two lines in your boot process and beyond that has no configuration or interface. It should feel like part of the language after this.
Just on ergonomics, one thing I love about Ivar is the error message suggests to you a valid instance variable that you probably meant to type. I will unashamedly copy that idea in a future release of Strict Ivars!
Integration
Chances are you’re not starting from scratch and are instead considering adopting one of these libraries in an existing Rails app. Perhaps even a very large one.
So how much will this cost? How much of your code will you have to change to accommodate it?
My goal with Strict Ivars is that this cost is essentially nothing, that you won’t have to change any of your bug-free code.
It’s still early days and we may find exceptions to this rule, but as far as I know, there are no false positives and no false negatives in Strict Ivars.
It shouldn’t have any meaningful impact on warm boots, runtime, or memory and you should never have to think about it.
Additionally, since you never have to modify your code with anything specific to Strict Ivars (aside from the two lines in your boot process), backing out is as simple as uninstalling the gem and removing those two lines.
Conclusion
I’m thrilled that this problem is now solved. It has been my number one wish for Ruby for over eight years and it will probably never be fixed in Ruby itself because it’s an incompatible change. Just look at how long it’s taken to get frozen string literals.
I’m also thrilled to have ventured into a new world of meta-programming via Prism and source transformations.
I encourage you once again to read Avdi’s post and check out both Ivar and Strict Ivars on GitHub.
Footnotes
[1] At the time of writing, Strict Ivars used various patches to do this. It turns out if you pass a block to a class_eval override that then calls super, the block is executed in the wrong context. Now it uses a different approach where arguments to eval methods are pre-processed.