Compiler writers have for years used a bag of heuristics to provide error diagnostics. The sophistication of these heuristics increases alongside the sophistication of the type system whose errors they diagnose. Heuristics are imperfect, and this work shows that interactive debugging uncovers the root cause of errors when static diagnostics cannot. However, Argus also falls victim to the heuristic. It is not sufficient to simply hand developers a whole proof tree and say “there, go digging!” We must also tell developers where to start: which failures are most likely the root cause. For this reason Argus comes with its own set of heuristics to rank failures. We want to put developers on the right debugging track with as few interactions as possible.
Identifying the exact root failure is undecidable. The Rust trait system is Turing-complete, and thanks to our friend the Halting Problem we cannot guarantee that a Turing-complete program terminates. Debugging is a human problem, not an algorithmic problem. Therefore we cannot expect Argus, or any tool, to accurately localize arbitrary program errors algorithmically. Indeed, there are even problems where there doesn’t exist a single failure, but a set of failures working together. (See
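To make the non-termination concrete, the following is a minimal sketch, not taken from Argus or any real crate, of a trait whose resolution can never terminate: each solving step spawns a strictly larger goal until rustc gives up.

```rust
// Solving `Wrap<()>: Loop` requires `Wrap<Wrap<()>>: Loop`, which requires
// `Wrap<Wrap<Wrap<()>>>: Loop`, and so on without end. rustc cuts the search
// off at its recursion limit rather than looping forever.
trait Loop {}
struct Wrap<T>(T);

impl<T> Loop for Wrap<T>
where
    Wrap<Wrap<T>>: Loop, // each step produces a strictly larger subgoal
{
}

fn assert_loop<T: Loop>() {}

fn main() {
    assert_loop::<Wrap<()>>(); // error[E0275]: overflow evaluating the requirement
}
```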
Conjunct Analysis
Imagine snipping the And–Or tree along the Or relations. Doing so produces a forest of failed trees. Each tree is a failure; otherwise the initial goal wouldn’t have failed. Each tree in the forest has no branching except for a failed conjunctive set at its leaf. See
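As a rough sketch of the snipping operation, consider a simplified tree in which goals branch over candidates (the Or relations) and candidates branch over their subgoals (the And relations). All names below are illustrative stand-ins, not Argus’s actual data structures:

```rust
// A simplified And-Or proof tree: goals try alternative candidates (Or),
// and each candidate must prove all of its subgoals (And).
struct Goal {
    predicate: String,          // e.g. "Vec<1?>: IntoIterator"
    candidates: Vec<Candidate>, // Or-branching: any candidate may succeed
}

struct Candidate {
    rule: String,        // the impl or built-in rule that was applied
    subgoals: Vec<Goal>, // And-branching: all subgoals must hold
}

// One tree of the forest: a straight-line chain of choices ending in the
// conjunctive set that failed together at the leaf.
struct FailedTree {
    path: Vec<String>,     // the chain from the root; its length is the depth
    conjunct: Vec<String>, // the failed conjunctive set at the leaf
}

// Cut every Or edge: each candidate choice spawns its own tree, leaving no
// branching except the conjunctive set at the leaf.
fn snip(goal: &Goal, path: Vec<String>, forest: &mut Vec<FailedTree>) {
    for cand in &goal.candidates {
        let mut path = path.clone();
        path.push(goal.predicate.clone());
        path.push(cand.rule.clone());
        if cand.subgoals.iter().all(|g| g.candidates.is_empty()) {
            // every subgoal is a leaf: record the conjunctive set
            forest.push(FailedTree {
                conjunct: cand.subgoals.iter().map(|g| g.predicate.clone()).collect(),
                path,
            });
        } else {
            // descend through each nested subgoal, extending the chain
            for sub in &cand.subgoals {
                snip(sub, path.clone(), forest);
            }
        }
    }
}
```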
Conjunct analysis uses three metrics to determine whether or not a set contains interesting failed goals: the depth of the failed conjunct, the number of known principal types, and the ratio of conjunct size to the number of known principal types. A conjunct is interesting if any of these metrics lies more than one standard deviation from the average across all sets.
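A minimal sketch of this outlier test follows. The struct, the field names, and the assumption that scores are precomputed per conjunct are all hypothetical, not Argus’s internals:

```rust
// Scores for one failed conjunct, one field per metric described above.
struct ConjunctScore {
    depth: f64,           // depth of the failed conjunct in its tree
    known_principal: f64, // goals in the conjunct with a known principal type
    size_to_known: f64,   // conjunct size / known principal types
}

// Flag every value lying more than one standard deviation from the mean.
fn flag_outliers(values: &[f64]) -> Vec<bool> {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let sd = (values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n).sqrt();
    values.iter().map(|v| (v - mean).abs() > sd).collect()
}

// A conjunct is interesting if *any* of its three metrics is an outlier.
fn interesting(scores: &[ConjunctScore]) -> Vec<bool> {
    let per_metric = [
        flag_outliers(&scores.iter().map(|s| s.depth).collect::<Vec<_>>()),
        flag_outliers(&scores.iter().map(|s| s.known_principal).collect::<Vec<_>>()),
        flag_outliers(&scores.iter().map(|s| s.size_to_known).collect::<Vec<_>>()),
    ];
    (0..scores.len())
        .map(|i| per_metric.iter().any(|flags| flags[i]))
        .collect()
}
```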
Depth Trait solving is a fail-fast process: if there isn’t enough information at any point, the solver stops along its current branch and returns an ambiguous response. Failures deep in the proof tree indicate that the solver was able to go further along the current branch; progress was made. Intuitively, we can think of the solver as being closer to a solution in a deep failure than in a shallow one. This metric helps combat Prolog-style unification failures when many inference variables remain unsolved. Refer back to the running example using the Axum Handler trait. The library uses macros to generate the trait implementation rule for functions of arity zero to sixteen. If the function used as a handler has arity two, all but one of the candidates have a rule head that does not unify with the function type. Unification failures are shallow. The candidate whose head does unify, in this case the rule for function arity two, has its subgoals solved for at a greater depth. Appealing to this intuition, we conclude the general rule that deep failures are more interesting than shallow ones.
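To see why the mismatched candidates fail shallowly, consider a heavily simplified sketch of arity-indexed impls. This is not Axum’s real Handler trait, which carries extra type parameters and async machinery; only the unification structure matters here:

```rust
// Hypothetical macro-generated impls, one per arity; three shown.
trait Handler<Args> {}

impl<F> Handler<()> for F where F: Fn() {}
impl<F, T1> Handler<(T1,)> for F where F: Fn(T1) {}
impl<F, T1, T2> Handler<(T1, T2)> for F where F: Fn(T1, T2) {}

// For a two-argument function, every candidate except the arity-two impl
// fails immediately: the function type never satisfies `Fn()` or `Fn(T1)`,
// so the solver stops before exploring any deeper subgoals. Only the
// arity-two candidate unifies and descends into its where-clauses, so any
// failure it produces sits deeper in the proof tree.
```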
Number of known principal types A trait goal of the form Type: Trait has principal type Type. Of course every obligation has a left-hand-side type, so why make this distinction? Trait solving in Rust is interleaved with type inference. Many goals start out looking like 0?: Trait, where 0? is an inference variable. Its type is currently unknown; referring back to the goal, this goal has an unknown principal type. These goals are often answered with ‘ambiguous’ by the trait solver, but solving may produce a set of additional constraints on the inference variable. We may learn that 0? = Vec<1?>, meaning the inference variable is a vector whose type parameter remains unknown. Note that the goal Vec<1?>: Trait has a known principal type. As a metric we favor groups that have a higher number of known principal types; they are more interesting than groups with remaining inference variables, which are bound to fail by construction.
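A sketch of the principal-type check over a hypothetical type representation follows; the point is that only the head constructor of the left-hand-side type matters, so Vec<1?> counts as known even though its parameter is still an inference variable:

```rust
// A hypothetical, heavily simplified type representation.
enum Ty {
    Infer(u32),                          // an unsolved inference variable, e.g. 0?
    Adt { name: String, args: Vec<Ty> }, // a named type, e.g. Vec<1?>
}

// A goal of the form `Type: Trait`.
struct TraitGoal {
    self_ty: Ty,
    trait_name: String,
}

// Known exactly when the head of the self type is not an inference
// variable; nested inference variables (as in Vec<1?>) are fine.
fn known_principal_type(goal: &TraitGoal) -> bool {
    !matches!(goal.self_ty, Ty::Infer(_))
}
```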
Ratio of size by known principal types Using the number of known principal types again, we can additionally take the group size into account. Say there exist two groups, each with ten obligations whose principal types are known. Is one more interesting than the other if the groups have sizes 10 and 100? We argue yes: small groups with a high number of known principal types are more interesting. Their data is more concrete; their failures are more concrete. This leads to the third metric, the ratio of group size to the number of known principal types, which favors smaller groups with more concrete information to aid debugging.
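The worked comparison, under the (assumed) convention that the ratio is size divided by known principal types, so lower means more interesting:

```rust
fn size_to_known_ratio(size: usize, known_principal: usize) -> f64 {
    size as f64 / known_principal as f64
}

fn main() {
    // Both groups have ten goals with known principal types;
    // size is what separates them.
    println!("{}", size_to_known_ratio(10, 10));  // 1.0  -> more interesting
    println!("{}", size_to_known_ratio(100, 10)); // 10.0 -> less interesting
}
```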
The proposed list of metrics is by no means exhaustive and will likely be extended in the future to analyze conjunctive sets more thoroughly. See
Subgoal Analysis
Analyzing sets of failed goals together is how we catch all interesting obligations. This does not mean that all members of an interesting set are themselves interesting. To throw out these wolves in sheep’s clothing, we partition the set of interesting groups one more time, filtering out the individual obligations that are uninteresting.
In this context an interesting obligation is one that provides concrete information. This happens in one of two ways: either its principal type is known, or the obligation resulted in “no” or “maybe: overflow.” Ambiguous obligations with unknown principal types are automatically thrown out; they provide the least amount of debugging help. This naïve partition is sufficient to remove obligations that produce visual noise and little useful information. They are, however, still available in the bottom-up debugging list, just at a lower rank.
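A sketch of this two-way test as a predicate; the SolveResult variants mirror the solver answers named above, while the names themselves are illustrative, not Argus’s API:

```rust
// The solver's possible answers for an obligation.
enum SolveResult {
    Yes,            // the goal holds
    No,             // the goal definitively fails
    MaybeAmbiguity, // not enough information, e.g. unsolved inference variables
    MaybeOverflow,  // the recursion limit was hit while solving
}

// Keep an obligation only if it carries concrete information: a known
// principal type, or a definitive "no" / "maybe: overflow" answer.
// Ambiguous obligations with unknown principal types are filtered out.
fn is_interesting(known_principal_type: bool, result: &SolveResult) -> bool {
    known_principal_type
        || matches!(result, SolveResult::No | SolveResult::MaybeOverflow)
}
```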