Diagnosing Type Errors

Hindley-Milner type inference is at once a great triumph of functional programming, and simultaneously a source of unending pain. For 40 years, researchers have proposed increasingly sophisticated methods for diagnosing type inference errors (although none have made it to production, to our knowledge). A variety of strategies have emerged:

Fault Localization

As a result of the global nature of HM type constraints, where the type-checker notices an error is not usually where the real fault lies. The strategy of fault localization is to try and blame the “right” line of code for a type error.

Wand1986 developed an algorithm to track the provenance of unifications made by the type-checker, which could then be presented to the user. Wand provides the intuition for needing this information as a question. “Why did the checker think it was dealing with a list here?” Having access to the provenance of unifications, experts can track the origins of unexpected checker states—the origin of an inconsistent type is likely not intended by the developer.

The usage of the raw proof tree presented by Argus follows the same intuitional logic. Like Wand, we currently rely on the intuition of expert programmers to justify the usefulness of such provenance. A planned extension of our work is to understand what user-friendly or expert-system interactions are desired—for both expert and beginner programmers.

Hage and Heeren2007 developed the Top framework to support customizable type-inference. Traditional HM systems solve constraints as they go, biasing the compiler to report errors towards the end of a program. The Top framework builds type graphs, allowing for the use of heuristics over the global type graph to aid error diagnosis. An example heuristic is a “trust factor,” which sorts constranits from least to most trusted when placing blame. For example, types imported from the language standard are considered high-trust—such functions can only be used incorrectly. Argus benefits from such a rule, and even stronger, Rust requires type declarations on all functions, which can be trusted more than inferred types. For a thorough treatment of the heuristics used in Argus see section .

Described in , oftentimes a subtree fails because of many conjuncts. However, one piece of information may make the entire subtree succeed. This same principle applies in fault localization. Many recent systems look for sets of constraints, that if resolved, the program would type-check. Pavlinovic et al.2014 and Loncaric et al.2016 use an SMT solver, and Zhang et al.2015 use a Bayesian analysis. Seidel et al.2017 go further and use machine learning to predict blame based on a training set of ill-typed programs.

The Argus heuristics use a similar approach by analyzing conjunctions for certain properties. However we do not look at sets of constraints as final. The obligations in a proof tree deviate from a constraint-satisfaction model because they can be ambiguous. Ambiguous obligations, if resolved, could result in further required obligations for the success of the subtree—additional obligations could be unsatisfiable or unintended by the user. A concrete example of this situation was already demonstrated in Compiler developers have operated for many years using a heuristic-based approach to diagnostics. While fault localization techniques can help with diagnosing Rust trait errors, they are unlikely to generalize to all cases. Our experience with Argus corroborates this intuition as each heuristic performs different on different shapes of trait system. Fault localization is usefull, not to determine accurate blame, but to point users to likely points of failure.

Interactive Debuggers

In contrast to fault localization, which attempts to find the root cause of an error, an alternative is to give the programmer an interface into all the information. Chitil2001 designed a compositional explanation graph of principal typings. This use of “compositional” requires that the explanation of a problem have a tree structure, where the type of each node must be uniqely determined by the node’s children’s types. This graph is combined with a command-line interface that asks users about the intended types of expressions in a process called algorithmic debugging, introduced by Shapiro1982. The value of algorithmic debugging is to bring the developer in the loop to understand the mental under which they’re work.

reverse [] = []
reverse (x:xs) = reverse xs ++ x

last xs = head (reverse xs)
init = reverse . tail . reverse

rotateR xs = last xs : init xs

The algorithmic debugging process for this program c
To help the tool place blame a series of questions are posed to the user. Below is an example user-interaction script from the above ill-typed program, with the user responses (y/n) appearing to the right.

Type error in: (last xs) : (init xs)

last :: [[a]] -> a
Is intended type an instance? (y/n) n

head :: [a] -> a
Is intended type an instance? (y/n) y

reverse :: [[a]] -> [a]
Is intended type an instance? (y/n) n

(++) :: [a] -> [a] -> [a]
Is intended type an instance? (y/n) y

Using this script the tool can identify that the error lies within the function reverse, but the inferred type of the append operator ++ is not to be blamed.

Algorithmic debugging is explained in great detail because it provides a promising direction forward for Argus when the tool faces ambiguity. In scenarios of great ambiguity such as human input can help Argus provide better feedback.

Furthermore, interactive debugging in the context of typing has been refined in multiple ways. Tsushima and Asai2013 developed a tool to work with the OCaml compiler rather than requiring a “debugger-friendly” re-implementation of type inference. Similar to some fault localization approaches, Stuckey et al.2003 uses constraint satisfiability and constraint provenance as a debugging aid. Chen and Erwig2014 use a “guided debugging” approach with counter-factual typing. A comprehensive set of type-change suggestions is filtered and presented to the programmer to find correct code changes 3.

Our approach to type class debugging took inspiration from many of the above tools. It distinguishes itself as being a richer 2D graphical interface that integrates into the existing VSCode IDE. This integration is essential to map errors to respective source locations, and does not require users to exit their development environment to run a command-line tool.

The published version does not provide an algorithmic debugging interface to refine error suggestions nor does it use counter-factual typing to propose code changes. An algorithmic debugging interface would certainly be a simple extension to the Argus system and rule out many “trivially false” obligations with a few user questions.

Code suggestions is an oft-requested feature. Understanding the source of an error is great, but wouldn’t it be better to “magically” provide a solution too? This is indeed a motivation behind program synthesis and automated repair.

Automated Repair

Stronger than the counter-factual typing presented in the last section by Chen and Erwig2014, several systems attempt to identify a small change to the input program that causes it to be well-typed 20, 3, 28.
In this thesis the proposed method of type class debugging aims to exhaust other avenues for debugging before reaching to program synthesis—a sledgehammer tool. A direction for exploration would be to present users with a set of types that do satisfy a class bound. Indeed the Rust compiler already attempts to do this under certain circumstances. However, Rust will not compose types for these suggestions, but following the obligations of a failed proof tree provides a recipe with which a tool could compose types to satisfy a given bound. This set of types is potentially empty if no instances are declared or are not visible at the error location. This feature would be more lightweight than full-blown synthesis as only the types are suggested, not an instance of those types.

Domain-specific annotations.

evidenced by the myriad of techniques to generally diagnose type errors, building a fully generic diagnostic system is a tricky task. An alternative approach is to give library authors the necessary tooling to inject domain-specific knowledge into diagnostics.

The Helium subset of Haskell introduced by Heeren et al.2003 took such an approach. Library authors can use a DSL to express custom errors messages that reflect common errors known to library maintainers.

Notably, most of the efforts in the Rust ecosystem towards addressing trait errors also have the shape of domain-specific annotations. RFC #2397 rust-lang/rfcs/2397-do-not-recommend.md describes a #[do_not_recommend] annotation that library authors could place on certain trait implementations. For instance, if a trait is implemented for tuples of length 32, then a library author could mark that implementation to not appear in the suggestions of diagnostics.

At time of publishing, the Rust compiler includes a nightly feature allowing library authors to append additional notes to diagnostic messages. One good example of its usage is in the Axum web framework. The pitfalls of creating a correct Handler have been discussed at lenghts in this written work. A portion of RFC #2397 was to improve error messages and here the following annotation was included on the Handler trait:

#[cfg_attr(
  nightly_error_messages,
  rustc_on_unimplemented(
    note = "Consider using `#[axum::debug_handler]` to improve the error message"
  )
)]

This annotation framework is not as expressive as Helium’s DSL so the authors have created a custom macro that does the heavy lifting, axum::debug_handler. This macro expands into additional code that does a static type assertion for all the Handler constraints. This macro-based approach is convincing and in many cases provides the current most-specific diagnostic. The downside is every library needs to create, and maintain, these additional debugging facilities. One attractive feature of the Helium system is that error notes are statically-checked to ensure compatibility with the current function types.