Pulling a Rabbit Out of the Hat

Compiler writers have for years used a bag of heuristics to provide error diagnostics. The sophistication of these heuristics increases alongside the sophistication of the type system whose errors they are diagnosing. Heuristics are imperfect, and this work shows that interactive debugging uncovers the root cause of errors when static diagnostics cannot. However, Argus also falls victim to heuristics. It is not sufficient to simply give developers a whole proof tree and say “there, go digging!” We must also tell developers where to start, and which failures are most likely the root cause. For this reason, Argus comes with its own set of heuristics to rank failed obligations. We want to put developers on the right debugging track with as few interactions as possible.

Providing the exact root failure is undecidable. The Rust trait system is Turing-complete, and thanks to our friend the Halting Problem we cannot guarantee that running a Turing-complete program terminates. Debugging is a human problem, not an algorithmic problem. Therefore, we cannot expect that Argus, or any tool, be able to accurately localize arbitrary program errors algorithmically. Indeed, there are even problems where there does not exist a single failure, but rather a set of failures working together. (See for an example.) Argus provides a set of failed obligations in its bottom-up view, and heuristics are used to rank them by their likelihood of being a root cause. In this section, obligations with potentially useful data are described as interesting. Identifying interesting failed obligations is a two-step process: conjunctive subgoals are first analyzed together as a set, paring down the set of possibilities; then failed leaf obligations are ranked and partitioned on individual merit.
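
To make the two-step process concrete, here is a minimal sketch in Rust. Every name in it (Obligation, ConjunctSet, rank_failures, and the placeholder set test) is illustrative, not Argus's actual API, and the final sort key is only a stand-in for the ranking described below.

```rust
struct Obligation {
    depth: usize,
    principal_type_known: bool,
}

struct ConjunctSet {
    obligations: Vec<Obligation>,
}

fn rank_failures(sets: Vec<ConjunctSet>) -> Vec<Obligation> {
    // Step 1: pare the conjunctive sets down to the interesting ones
    // (see "Conjunct Analysis").
    let interesting = sets.into_iter().filter(set_is_interesting);

    // Step 2: rank the surviving leaf obligations on individual merit
    // (see "Subgoal Analysis").
    let mut leaves: Vec<Obligation> =
        interesting.flat_map(|set| set.obligations).collect();
    leaves.sort_by_key(|o| std::cmp::Reverse((o.principal_type_known, o.depth)));
    leaves
}

// Placeholder: the real test is the metric-based analysis described next.
fn set_is_interesting(set: &ConjunctSet) -> bool {
    set.obligations.iter().any(|o| o.principal_type_known)
}
```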

Conjunct Analysis

Imagine snipping the And–Or tree along the Or relations. Doing so produces a forest of failed trees. Each tree is a failure; otherwise, the initial goal wouldn't have failed. Each tree in the forest has no branching except for a failed conjunctive set at its leaf. See for the forest that results from snipping the proof tree of the Axum login example. Each conjunctive set is then analyzed to find those with interesting obligations.
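
A sketch of the snipping step, under a deliberately simplified And–Or tree: And nodes are conjunctions, Or nodes are candidate alternatives, and leaves are failed obligations. The shape and names are assumptions for illustration; Argus's real proof trees carry far more information.

```rust
enum Node {
    // A goal proved only if *all* children hold (conjunction).
    And(Vec<Node>),
    // A goal proved if *any* candidate holds (disjunction).
    Or(Vec<Node>),
    // A failed leaf obligation.
    FailedLeaf(String),
}

// Cut every Or edge: each disjunct roots its own failed tree, and each
// tree bottoms out in a conjunctive set of failed leaves.
fn snip<'a>(node: &'a Node, current: &mut Vec<&'a str>, forest: &mut Vec<Vec<&'a str>>) {
    match node {
        Node::FailedLeaf(name) => current.push(name),
        Node::And(children) => {
            for child in children {
                snip(child, current, forest);
            }
        }
        Node::Or(children) => {
            for child in children {
                let mut set = Vec::new();
                snip(child, &mut set, forest);
                if !set.is_empty() {
                    forest.push(set);
                }
            }
        }
    }
}

fn forest_of(root: &Node) -> Vec<Vec<&str>> {
    let mut forest = Vec::new();
    let mut top = Vec::new();
    snip(root, &mut top, &mut forest);
    if !top.is_empty() {
        forest.push(top);
    }
    forest
}
```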


Conjunct analysis uses three metrics to determine whether or not a set contains interesting failed goals: the depth of the failed conjunct, the number of known principal types, and the ratio of conjunct size to the number of known principal types. Conjuncts are interesting if any of these metrics lie more than one standard deviation outside the average across all sets.
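
The standard-deviation test might look like the following sketch, where each set has already been summarized into the three metrics. The field and function names are assumptions for illustration; in particular, this sketch flags deviation in either direction as interesting.

```rust
// Each conjunctive set, summarized by the three metrics above.
struct SetMetrics {
    depth: f64,           // depth of the failed conjunct
    known_principal: f64, // number of known principal types
    size_per_known: f64,  // set size / known principal types
}

fn mean_and_stddev(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

// A set is flagged interesting if *any* metric lies more than one
// standard deviation from the mean over all sets.
fn interesting_sets(sets: &[SetMetrics]) -> Vec<bool> {
    let metrics: [fn(&SetMetrics) -> f64; 3] = [
        |s| s.depth,
        |s| s.known_principal,
        |s| s.size_per_known,
    ];
    let stats: Vec<(f64, f64)> = metrics
        .iter()
        .map(|f| mean_and_stddev(&sets.iter().map(f).collect::<Vec<_>>()))
        .collect();
    sets.iter()
        .map(|s| {
            metrics
                .iter()
                .zip(&stats)
                .any(|(f, (mean, sd))| (f(s) - mean).abs() > *sd)
        })
        .collect()
}
```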

Depth Trait solving is a fail-fast process. This means that if there isn't enough information at any point, the solver stops along its current branch and returns an ambiguous response. Failures deep in the proof tree indicate that the solver was able to go further along the current branch. Progress was made. Intuitively, we can think of the solver as being closer to a solution in a deep failure than in a shallow one. This metric helps combat Prolog-style unification failures when many inference variables remain unsolved. Refer back to the running example using the Axum Handler trait. The library uses macros to generate the trait implementation rule for functions of arity zero to sixteen. If the function used as a handler has arity two, there will be fourteen candidates whose rule head does not properly unify with the function type. Unification failures are shallow. The candidate whose rule head does properly unify, in this case the rule for functions of arity two, will have its subgoals solved at a greater depth. Appealing to this intuition, we conclude the general rule that deep failures are more interesting than shallow ones.
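
The arity intuition can be reconstructed in a few lines. This is a stand-in for Axum's macro-generated impls, not its real definitions; `Handler`, `invoke`, and `route` are simplified placeholders.

```rust
trait Handler<Args> {
    fn invoke(self, args: Args);
}

// Stand-ins for the macro-generated rules, one per arity (the real
// macro covers functions of arity zero through sixteen).
impl<F: Fn()> Handler<()> for F {
    fn invoke(self, _: ()) { self() }
}
impl<F: Fn(A), A> Handler<(A,)> for F {
    fn invoke(self, (a,): (A,)) { self(a) }
}
impl<F: Fn(A, B), A, B> Handler<(A, B)> for F {
    fn invoke(self, (a, b): (A, B)) { self(a, b) }
}

// A hypothetical two-argument handler.
fn login(user: String, password: String) {
    let _ = (user, password);
}

fn route<Args, H: Handler<Args>>(handler: H, args: Args) {
    handler.invoke(args);
}

fn main() {
    // Only the arity-two rule head unifies with `login`'s type; the
    // other candidates fail shallowly at unification, while failures
    // inside the matching candidate (if any) occur deeper, among its
    // subgoals.
    route(login, ("user".to_string(), "hunter2".to_string()));
}
```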

Number of known principal types A trait goal of the form Type: Trait has principal type Type. Of course, all obligations have a left-hand-side type, so why make this distinction? Trait solving in Rust is interleaved with type inference. Many goals may start out looking like 0?: Trait, where 0? is an inference variable. Its type is currently unknown; referring back to the goal, this goal has an unknown principal type. These goals are often answered with ‘ambiguous’ by the trait solver, but may produce a set of additional constraints on the inference variable. We may learn that 0? = Vec<1?>, meaning the inference variable is a vector whose type parameter remains unknown. Note that the goal Vec<1?>: Trait has a known principal type. As a metric, we favor groups that have a higher number of known principal types. They are more interesting than groups with remaining inference variables, which are bound to fail by construction.
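
As a sketch, the test for a known principal type only inspects the head of the left-hand-side type, under a toy representation of partially inferred types; the names here are assumptions, not the compiler's internal types.

```rust
// A toy view of a type during inference.
enum Ty {
    // An unsolved inference variable, e.g. `0?`.
    Infer(u32),
    // A known head constructor applied to arguments, e.g. `Vec<1?>`.
    Adt(&'static str, Vec<Ty>),
}

// `0?: Trait` has an unknown principal type; `Vec<1?>: Trait` has a
// known one, even though its type parameter is still unsolved.
fn principal_type_known(self_ty: &Ty) -> bool {
    !matches!(self_ty, Ty::Infer(_))
}

fn main() {
    assert!(!principal_type_known(&Ty::Infer(0)));
    assert!(principal_type_known(&Ty::Adt("Vec", vec![Ty::Infer(1)])));
}
```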

Ratio of size to known principal types Using the number of known principal types again, we can additionally take the group size into account. Say there exist two groups, each with ten obligations whose principal types are known. Is one more interesting than the other if the groups have sizes 10 and 100? We argue yes. Small groups with a high number of known principal types are more interesting: their data is more concrete, and their failures are more concrete. This leads to the third metric, the ratio of group size to the number of known principal types, which favors smaller groups with more concrete information to aid debugging.
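
In code, the worked example reduces to simple arithmetic; `ratio` is an illustrative helper, and a lower ratio ranks as more interesting.

```rust
fn ratio(group_size: usize, known_principal_types: usize) -> f64 {
    group_size as f64 / known_principal_types as f64
}

fn main() {
    // Two groups, each with ten known principal types.
    assert_eq!(ratio(10, 10), 1.0); // every obligation is concrete
    assert_eq!(ratio(100, 10), 10.0); // mostly unknowns: less interesting
}
```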

The proposed list of metrics is by no means exhaustive and will likely be extended in the future to analyze conjunctive sets more thoroughly. See for ideas lifted from related research areas. Using this list, we filter out conjunctive sets that do not meet our criteria. Next, the failed goals are ranked amongst themselves to lift interesting failures to the top of Argus's proposed list.

Subgoal Analysis

Analyzing sets of failed goals together is how we catch all interesting obligations. This does not mean that all members of an interesting set are themselves interesting. To throw out these wolves in sheep's clothing, we partition the set of interesting groups one more time, discarding individual obligations that are uninteresting.

In this context, an interesting obligation is one that provides concrete information. This happens in one of two ways: either its principal type is known, or the obligation resulted in “no” or “maybe: overflow.” Ambiguous obligations with unknown principal types are automatically thrown out; they provide the least amount of debugging help. This naïve partition is sufficient to remove obligations that produce visual noise and little useful information. They are, however, still available in the bottom-up debugging list, just at a lower rank.
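
A sketch of that final partition, with the solver answers spelled out as an enum; the names are illustrative, not Argus's actual result type.

```rust
// Simplified solver answers named in the text.
enum Answer {
    Yes,
    No,
    MaybeAmbiguous,
    MaybeOverflow,
}

struct Leaf {
    principal_type_known: bool,
    answer: Answer,
}

// A leaf is interesting if it carries concrete information: a known
// principal type, or a definite "no" / "maybe: overflow" answer.
fn is_interesting(leaf: &Leaf) -> bool {
    leaf.principal_type_known
        || matches!(leaf.answer, Answer::No | Answer::MaybeOverflow)
}

// Uninteresting leaves are not deleted, only ranked lower in the
// bottom-up list.
fn partition_leaves(leaves: Vec<Leaf>) -> (Vec<Leaf>, Vec<Leaf>) {
    leaves.into_iter().partition(is_interesting)
}
```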