Why does Julia invalidate code?
Julia may be unique among computer languages in supporting all four of the following features:
- interactive development
- "method overloading" by packages that don't own the function
- aggressive compilation
- consistent compilation: same result no matter how you got there
The combination of these features requires that you sometimes "throw away" code that you have previously compiled.
To illustrate: suppose you have a function numchildren
with one method,
numchildren(::Any) = 1
and then write
total_children(list) = sum(numchildren.(list))
Now let list
be a Vector{Any}
. You can compile a fast total_children(::Vector{Any})
(aggressive compilation) by leveraging the fact that you know there's only one possible method of numchildren
, and you know that it returns 1 for every input. Thus, total_children(list)
gives you just length(list)
, which would indeed be a very highly-optimized implementation!
But now suppose you add a second method (interactive development + method overloading)
numchildren(::BinaryNode) = 2
where BinaryNode
is a new type you've defined (so it's not type-piracy). If you want to get the right answer (consistent compilation) from an arbitrary list::Vector{Any}
, there are only two options:
Option A: plan for this eventuality from the beginning, by making every
numchildren(::Any)
be called by runtime dispatch. But when there is only one method ofnumchildren
, forcing runtime dispatch makes the code vastly slower. Thus, this option at least partly violates aggressive compilation.
Option B: throw away the code for
total_children
that you created when there was only one method ofnumchildren
, and recompile it in this new world where there are two.
Julia does a mix of these: it does B up to 3 methods, and then A thereafter. (Recent versions of Julia have experimental support for customizing this behavior with Base.Experimental.@max_methods
.)
This example was framed as an experiment at the REPL, but it is also relevant if you load two packages: PkgX
might define numchildren
and total_children
, and PkgY
might load PkgX
and define a second method of PkgX.numchildren
. Any precompilation that occurs in PkgX
doesn't know what's going to happen in PkgY
. Therefore, unless you want to defer all compilation, including for Julia itself, until the entire session is loaded and then closed to further extension (similar to how compilers for C, Rust, etc. work), you have to make the same choice between options A and B.
Given that invalidation is necessary if Julia code is to be both fast and deliver the answers you expect, invalidation is a good thing! But sometimes Julia "defensively" throws out code that might be correct but can't be proved to be correct by Julia's type-inference machinery; such cases of "spurious invalidation" serve to (uselessly) increase latency and worsen the Julia experience. Except in cases of piracy, invalidation is a risk only for poorly-inferred code. With our example of numchildren
and total_children
above, the invalidations were necessary because list
was a Vector{Any}
, meaning that the elements might be of Any
type and therefore Julia can't predict in advance which method(s) of numchildren
would be applicable. Were one to create list
as, say, list = Union{BinaryNode,TrinaryNode}[]
(where TrinaryNode
is some other kind of object with children), Julia would know much more about the types of the objects to which it applies numchildren
: defining yet another new method like numchildren(::ArbitraryNode)
would not trigger invalidations of code that was compiled for a list::Vector{Union{BinaryNode,TrinaryNode}}
.