Fear Not the Association of Types

Gavin Gray

2024-08-11

Rough data hit my eyes recently and the takeaway was unsurprising: Rust associated types are misunderstood. In Chapter 17 of the Rust Book Experiment Will Crichton decided to inject questions on API design. The questions have been up long enough for us to see the preliminary data, and so far, the results are not great. Many questions involving associated types on traits were answered incorrectly. I’ve been knee-deep in Rust traits recently and feel I have enough energy to channel some passion into a post on associated types. This post serves two purposes

to provide a rule-of-thumb to help you determine when a type parameter should be an associated type
to show that associated types are no more than the return of a type-level function

The second purpose, really a goal, may sound scary to you at first, but it’s my job to convince you that it’s true. You may not even know what I mean by “type-level function,” we’ll cover all of this in time — please stay along for the ride.

Associated types are used all over the Rust standard library. If you’re unfamiliar with Rust, it may shock you to find that common operations

multiplication: 2 * 3, or <i32 as Mul<i32>>::mul(2, 3)
indexing: vs[0], or <Vec<i32> as Index<usize>>::index(&vs, 0)
pointer dereferencing: *ptr, or <&i32 as Deref>::deref(ptr)

are implemented with traits and associated types, but let’s not get ahead of ourselves. Before we dig into the broader use of associated types let’s look at one common example, the Mul trait.

pub trait Mul<Rhs> {
  type Output;

  fn mul(self, rhs: Rhs) -> Self::Output;
}

After this post I want you to be able to answer these two questions: “Why is Output an associated type?” and “Why is Rhs a type parameter?” If you can answer these well, the API of your future Rust libraries will be structured well, and I look forward to using them. But let’s start from the beginning and write a version of the trait Mul without an associated type; our version shall, of course, be called BadMul.

pub trait BadMul<Rhs, Output> {
    fn mul(self, rhs: Rhs) -> Output;
}

In the trait definition for BadMul there are three type parameters: Self, Rhs, and Output. Self is an implicit type parameter to every trait and denotes the type for which the trait will be implemented. In the context of multiplication, Self is the type of the left-hand side operand. This means that the method mul has the type signature fn mul(self: Self, rhs: Rhs) -> Output. The implementation for BadMul between two integers is as follows

impl BadMul<i32, i32> for i32 {
  fn mul(self, rhs: i32) -> i32 {
      self * rhs
  }
}

Edit, in Rust, like any language with sized integers, multiplication between two i32s may overflow. I’ve chosen to ignore overflow for this post.

In this trait implementation we specified that all three type parameters, Self, Rhs, and Output, equal i32. In elementary school you memorized integer multiplication tables and as you got older learned that we can multiply much more. Let’s take a breath and jot down some of the types we could substitute for our type parameters.

Self could be one of i32, i64, f32, Matrix<i32>, …
Rhs could be one of i32, i64, f32, Matrix<i32>, …
Output could be one of i32, i64, f32, Matrix<i32>, …

If you looked at the above and nodded your head. If you thought “yeah, that looks about right,” then I’m about to shatter your world. The above is wrong. One of those lines is not like the others. Take a random type from each of the above sets and substitute it for the type in the signature of mul. I’ll even write down a few for you.

✅ fn mul(self: i32, rhs: i32) -> i32
✅ fn mul(self: f32, rhs: i32) -> f32
❌ fn mul(self: i32, rhs: f32) -> Matrix<i32>

The last type signature makes no sense. When you multiply an integer and a float you get a float, not a two-dimensional matrix, at least in traditional multiplication. So what went wrong? The type for Output is dependent on the actual types for Self and Rhs. We aren’t free to choose the type for Output if we’ve chosen Self and Rhs — there exists one natural choice for Output once the others have been fixed. The dependency between type parameters indicates that Output should be an associated type on the trait BadMul. Of course, after moving Output to be an associated type we should also rename the trait to GoodMul, or to std::ops::Mul. I claim that if such a dependency exists between type parameters, the dependent type parameter should become an associated type.

Before moving forward, let’s correct the record by providing a better Mul trait definition with Output as an associated type and an example implementation for multiplying integers and floats.

pub trait Mul<Rhs> {
  type Output;

  fn mul(self, rhs: Rhs) -> Self::Output;
}

impl Mul<f32> for i32 {
  type Output = f32;

  fn mul(self, rhs: f32) -> Self::Output {
    self as f32 * rhs
  }
}

When parameters become associates

If you were to stop reading here the rule-of-thumb for whether or not a type parameter should be an associated type would read

A type parameter dependent on other type parameters should be an associated type.

Though I do believe this is an OK rule-of-thumb, it’s more hand-wavy than I’d like. Let’s make it more specific by considering one more change to the Mul trait definition. Consider the type parameter Rhs, should Rhs be an associated type? Is Rhs dependent on the type of Self? To help you decide, let’s assign Self to be the concrete type Vec<i32>. Provided this assignment to Self, let’s list types we could assign to Rhs.

✅ i32
✅ f32
❌ Vec<i32>

We should not assign Rhs = Vec<i32> because Vec<i32> * Vec<i32> is an ambiguous operation. Multiplication between vectors could mean element-wise multiplication, an inner vector product, or an outer vector product. Though some languages/libraries do choose a default operation for vector multiplication, choosing a default is, in my opinion, poor design and for this post we will not consider it valid. Therefore, the choice to assign Vec<i32> to Self has reduced the possible types assignable to Rhs. So yes, Rhs is dependent on Self, but should it be an associated type? The answer is no. To see why, let’s write another trait definition that declares Rhs an associated type. This version shall be called RestrictedMul.

trait RestrictedMul {
  type Rhs;
  type Output;

  fn mul(self, rhs: Self::Rhs) -> Self::Output;
}

impl RestrictedMul for i32 {
  type Rhs = i32;
  type Output = i32;

  fn mul(self, rhs: Self::Rhs) -> Self::Output {
    self * rhs
  }
}

impl RestrictedMul for i32 {
  type Rhs = f32;
  type Output = f32;

  fn mul(self, rhs: Self::Rhs) -> Self::Output {
    self * rhs
  }
}

The above code fails to type check. See for yourself in this playground. The code fails to type check because there are two implementation blocks with the same header, or as rustc says, there are “conflicting implementations of RestrictedMul for i32”.

error[E0119]: conflicting implementations of trait `RestrictedMul` for type `i32`
  --> src/main.rs:17:1
   |
8  | impl RestrictedMul for i32 {
   | -------------------------- first implementation here
...
17 | impl RestrictedMul for i32 {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation for `i32`

The core issue is that for one left-hand side type, for example Self = i32, there are multiple types we want to assign to the Rhs. After all, you can multiply i32 with many other types: i32, i64, and Matrix<i32>, etc. When a type is a type parameter, it signals that there is more than one way to instantiate the type. (I.e., assign the parameter a type.) When a type is an associated type, it signals that it can be instantiated with exactly one type per implementation block. Associated types aren’t just those dependent on other types, but specify a mapping from many types to one associated type. Our rule-of-thumb should read

A type parameter should be an associated type if for all implementation blocks there exists a relationship mapping the other type parameters to it. (I.e., you can create a table mapping the other parameters to the associated type, like we did for Mul.)

This rule-of-thumb is better than the prior notion of dependence: Not only are associated types dependent on the trait’s type parameters, but they are instantiated with a single type. In a previous post on Rust’s standard library, Pretzel Hammer puts forth a rule-of-thumb that reads

Use associated types when there should only be a single impl of the trait per type. Use generic types when there can be many possible impls of the trait per type.

We are trying to say the same thing. I think their rule lacks a bit of specificity, especially when traits have both type parameters and associated types. Take for example the Mul trait, there are several implementations with i32 as the Self type. Though both of these rules-of-thumb are intuition-based and require foresight into how future developers will use your trait. This foresight is a little scary, especially if you’re new to Rust or traits. But I argue that you engage in this reasoning frequently; every time you write a function you perform similar reasoning. We can piggyback on your engineering instinct about functions to help you reason about associated types.

From types to values, then back to types

I don’t want to get lost in type-theoretic weeds, so let’s allow ourselves to forget static types for a moment. Let’s write a multiplication function in a dynamically typed language, I prefer Julia, so Julia is what I will use. (If you’re unfamiliar, the syntax is quite easy.)

function mul(a, b)
  a * b
end

The function mul has two inputs and one output, C = mul(A, B). I shall ask again, what are some possible values we can pick for A, B, and C? Here’s some examples

A could be one of 0, 1, 2, 2.0, …
B could be one of 0, 1, 4/3, 2, …
C could be one of 0, 1, 2, 3.14, …

Just as before, we find that the set for C is not the same as the others. If you pick a random number from each of the sets and assign the value to our variables, the line 2 = mul(0, 1) would be nonsense; we know that 0 * 1 is equal to 0, not 2. The value of C depends on the values chosen for A and B. You know this because you spent all that time in elementary school memorizing multiplication tables. These tables map two operand values to the multiplied output value.

The row value is your left-hand side (`A`), the column value is the right-hand side (`B`), and the cell value is the return (`C`).
⨉	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`10`	`11`	`12`	`13`
`1`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`10`	`11`	`12`	`13`
`2`	`2`	`4`	`6`	`8`	`10`	`12`	`14`	`16`	`18`	`20`	`22`	`24`	`26`
`3`	`3`	`6`	`9`	`12`	`15`	`18`	`21`	`24`	`27`	`30`	`33`	`36`	`39`
`4`	`4`	`8`	`12`	`16`	`20`	`24`	`28`	`32`	`36`	`40`	`44`	`48`	`52`
`5`	`5`	`10`	`15`	`20`	`25`	`30`	`35`	`40`	`45`	`50`	`55`	`60`	`65`
`6`	`6`	`12`	`18`	`24`	`30`	`36`	`42`	`48`	`54`	`60`	`66`	`72`	`78`
`7`	`7`	`14`	`21`	`28`	`35`	`42`	`49`	`56`	`63`	`70`	`77`	`84`	`91`

In the context of values, my prompting you to “pick a value for C” sounds weird, if not wrong. You don’t pick C you compute it. C is the return value of the function and depends on the two input values: A, and B. We want to transplant this same logic to understand why the trait definition of BadMul was bad. In that definition Output was a type parameter, but it doesn’t behave like a parameter, you can’t pick it’s type. Instead, Output represents a single specific type as the result of the multiplication. Type parameters are input types and associated types are output types. There exists a mapping from input types to output types, just like our elementary school multiplication table for values, we can write a multiplication table for types.

The row type is your left-hand side (`Self`), the column type is the right-hand side (`Rhs`), and the cell type is the return (`Output`). Some entries have a cell type `?` indicating that I have chosen to not define multiplication between the two types.
⨉	`i32`	`i64`	`f32`	`Matrix<i32>`
`i32`	`i32`	`i64`	`f32`	`Matrix<i32>`
`i64`	`i64`	`i64`	`?`	`Matrix<i64>`
`f32`	`f32`	`?`	`f32`	`Matrix<f32>`
`Matrix<i32>`	`Matrix<i32>`	`Matrix<i64>`	`Matrix<f32>`	`Matrix<i32>`

This table shows that if you provide two input types, which are represented as the row and column header, you get back an output type. This behavior is what I call a type-level function: You provide two input types and get back an output type, just how functions on values take input values (parameters) and return an output value (the return).

By now, you may start to see why I describe associated types as the output of a type-level function. A multiplication function on values maps two input numbers to a third and it’s natural to define this mapping as a function. For the same two input values the mul function always returns the same output value.

For traits, the same type parameters always map to the same associated type. One benefit for type parameters is that they guarantee this. For example, multiplying a Matrix<i32> with a i32 always returns Matrix<i32> — no questions asked. In Rust you access the output type with

  <i32 as Mul<i32>>::Output
  /            \        \
Self          Rhs      Output

but if we wanted to mimic the structure of the Julia mul function, we could write

  i32 = Mul<i32, i32>
  /         /     \
Output    Self    Rhs

By moving Output from a type parameter to an associated type in the Mul trait we statically guarantee that for two input types there will exist only one implementation of the multiply method. The definition of BadMul did not guarantee this. Remember that we provided an implementation for fn mul(self: i32, rhs: i32) -> i32, using BadMul we could provide a second implementation that defines multiplication as fn mul(self: i32, rhs: i32) -> f32. This poor decision shall have some negative implications, as we will see.

// A refresher on the definition
pub trait BadMul<Rhs, Output> {
  fn mul(self, rhs: Rhs) -> Output;
}

// The only correct implementation
impl BadMul<i32, i32> for i32 {
  fn mul(self, rhs: i32) -> i32 {
    self * rhs
  }
}

// Some engineer's poor decision to provide another
impl BadMul<i32, f32> for i32 {
  fn mul(self, rhs: i32) -> f32 {
    (self * rhs) as f32
  }
}

In this version of the Rust universe there are now two different implementations for multiplication between integers. The Rust compiler would need to infer the output type to correctly pick which implementation block to use.

Even if there is only a single implementation block, the Rust compiler cannot pick the return type based on the implementation block — it still needs to be inferred. This would require developers to annotate more types than would otherwise be necessary.

Edit, I just learned about the type inference change in Rust 1.80 that broke any crate with time as a dependency. Pre-1.80, Rust allowed inference to resolve a type if there was a single implementation block, despite this being unsound. Post-1.80 disallowed this behavior, thus breaking inference for any crate that relied on the pattern, one example being the time crate.

fn main() {
  println!("{}", 3 * 2);

  println!("{}", <i32 as BadMul<i32, _>>::mul(3, 2));
}

If the definition of Mul in the Rust standard library were written like BadMul, the first line in the above code would behave like the second line of code does. This is undesirable because the code fails to type check.

error[E0283]: type annotations needed
    --> src/main.rs:20:18
    |
 20 |   println!("{}", <i32 as BadMul<i32, _>>::mul(3, 2));
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  cannot infer type of the type parameter `Output`
    |                  declared on the trait `BadMul

The above code doesn’t compile because the Rust compiler doesn’t know how to infer the type parameter Output. It needs to be specified. The broader implication is that type parameters that should be associated types require developers to annotate more types. To recap, associated types guarantee that there exists a single type for the implementation block, thus helping the compiler to infer types rather than requiring it to infer more than necessary.

`<Self as Iterator<Item = Example>>::next(&mut self)`

Reasoning about associated types may be easier with the Mul trait, after all, it fits my proposed mental model very well and multiplication is something with which we’re all familiar. Lots of traits in the Rust standard library use associated types, and some may not be as obvious. Let’s look at two more traits: Iterator, and Deref, and specifically why they have associated types.

Iterators: Even if you don’t use iterators explicitly, for and while loops are syntactic sugar for iterator-based code. Needless to say they’re used quite a bit. Below is the trait definition for Iterator. Read it and ask yourself, why is Item an associated type?

pub trait Iterator {
  type Item;

  fn next(&mut self) -> Option<Self::Item>;
}

Iterators have only one type parameter, Self, the type implementing the iterator. For all of the types that Self could be, each one has a single specific type for Item. OK, not a table in this case, just a plain map. Again, we can write a table describing the relationship between Self and Item.

The first row represents the `Self` type and the second row is the corresponding `Item` type. There isn’t a good operator symbol for “item of” so I chose 👾 because it’s cute.
👾	`Iter<'a, T>`	`IterMut<'a, T>`	`Chars<'a>`	`Chunks<'a, T>`
	`&'a T`	`&'a mut T`	`char`	`&'a [T]`

Item is an associated type because there exists a mapping from Self types to Item types. If you’re new to Rust, you may be asking “would it be wrong to make Item a type parameter?” Just as with the BadMul trait you could write a version with Item as a type parameter, here’s a playground link, but it would affect type inference negatively. The concept of associated types may be tricky to grasp especially if you come from an object-oriented language where the item type is usually a type parameter. For example the Java iterator interface.

However, as the old saying goes, “just because Java does it, doesn’t mean you have to.” As we saw with the BadMul trait, if Item were a type parameter, Rust type inference would not work as well. I will go as far as to assert that iterators and related crates, like itertools, would not be as ergonomic and require developers to insert lots of type annotations. Let’s take a quick detour and look at a small difference between Rust’s and Java’s type systems, and ask, how does Java get away with this design decision?

What Java does that you shouldn’t

So that everyone is on the same page, below is the definition for the Java Iterator interface.

public interface Iterator<Item> {
  // Returns `true` if the iteration has more elements.
  boolean hasNext();

  // Returns the next element in the iteration.
  // @throws `NoSuchElementException` if the iteration has no more elements
  Item next();
}

In Rust the type of iterator elements Item is an associated type and I want you to understand why that needs to be the case in Rust while Java gets away with using a type parameter. A naïve response might be that Java doesn’t have type inference for variable assignments. Historically this was true, but Java 10 introduced the var binding that enables local inference. For example, creating a vector and iterating over the contents is shown below in Rust and Java.

// create a vector with two zeros
let v = vec![0; 2];
// iterate over the vector and print the contents
for i in &v {
  println!("{i}");
}

// create a vector with two zeros
var v = new Vector<>() {{ add(0); add(0); }};
// iterate over the vector and print the contents
for (var i : v) {
  System.out.println(i);
}

Each code snippet behaves the same and neither specifies the type of the iterated elements. So the real reason can’t be for mere type inference. The real reason is that Rust’s type system affects how the code runs in a deeper way than Java’s. And it all boils down to the fact that Java has a runtime system, the Java Virtual Machine (JVM).

The main goal of the Java type system is to make sure nothing will go wrong at runtime. In this context, let’s define “go wrong” as calling a method on an object that the object doesn’t have. For the small code snippet iterating over a vector, Java needs to know just that Vector implements the interface Iterator. If it does, then the methods for iterating will be found at runtime. It turns out that Vector extends the class AbstractList, and through a chain of inheritance Vector inherits the necessary implementation for Iterator. Knowing the exact methods for the iterator isn’t what’s important, Java wants to know only that they exist, not what they are or where they live. This is also true of the Java System.out.println function, which is overloaded to accommodate different types. In the above code the println(int i) overload would be picked, but if the items were instead objects, println(Object o) would be called and runtime dispatch would figure out how to print the object — Java doesn’t statically need to know the type.

Only caring about the existence of a method at runtime differs drastically from Rust, the Rust compiler needs to know the exact address of the functions to be called; there is no runtime to figure this out. In the above code, here are some things Rust needs to know

How to turn a &Vec<i32> into an iterator: using the trait system Rust knows it can call the trait method into_iter defined on the trait IntoIterator for &Vec<T>. Thanks to the associated type in this trait, Item, Rust also knows that the elements iterated over are of type &i32.
How to print a &i32: Rust now knows that the type of i is &i32. Again using the trait system, the compiler finds the fmt method of the Display trait defined for i32.
Rust also needs to know the size of the iterated elements, check lifetimes, yada yada yada. We’ll ignore this.

All of these individual pieces of Rust’s type machinery are in place so that the compiler can replace calls like into_iter and fmt with specific function addresses. Unlike Java, there is no runtime system that will find the methods later. Rust needs to know now at compile time!

This is how Java can get away with using a type parameter for iterator items, and why Rust developers should learn how and when to properly use associated types. Let’s return to the Rust standard library and look at the last example, pointer dereferencing.

Dereferences: Pointer dereferencing is a well-known operation to systems programmers. If you’ve used a C-like language at some point, you’re acquainted with pointer dereferencing and may have even dereferenced the null pointer. One feature of Rust that you may be unfamiliar with is called deref coercion. This means that Rust will automatically insert dereference operations where possible. It’s the feature that enables the following weird piece of code to type check.

use std::sync::Arc;
use std::rc::Rc;

fn takes_a_ref(x: &i32) { /*...*/ }

fn weird_func(y: &&&&&Box<Arc<Rc<&&&&&i32>>>) {
  takes_a_ref(y)
}

If you don’t like counting, there’s five references to the Box, and five inside the Rc to the integer, for a total of thirteen references. However, when we call takes_a_ref(y) we don’t have to insert dereferences, the compiler does it for us. Et voilà, deref coercion in action! If you ever find yourself with thirteen references to an integer, please reevaluate your life decisions that led you to that point. How is the compiler able to do this without accidentally screwing up our types? Let’s look at the trait definition Deref that defines how a type is dereferenced.

pub trait Deref {
  type Target;

  fn deref(&self) -> &Self::Target;
}

// Example implementation for immutable references
impl<T> Deref for &T {
  type Target = T;

  // &&T -> &T
  fn deref(&self) -> &Self::Target {
    *self
  }
}

Observe that Target, the resulting type of the dereference, is an associated type. This is a very important design choice for the trait. With Target as an associated type it is statically guaranteed that a type always dereferences to the same type. If Target were instead a type parameter, deref coercion could potentially insert dereferences resulting in code that doesn’t type check, due to ambiguous types like we saw with the BadMul trait. Just like all the other traits we’ve looked at so far, we can write a table that describes the relationship between the Self type and Target.

The first row represents the `Self` type and the second row is the corresponding `Target` type. The operator ‘`*`’ is dereference, not multiplication.
*	`String`	`&T`	`&mut T`	`Box<T>`	`Arc<T>`	`Vec<T>`
`str`	`T`	`T`	`T`	`T`	`[T]`

Implementing Deref is especially helpful for pointers, or smart pointers like Box and Arc. Implementing Deref allows us programmers to treat these types like pointers and leaves the deref gymnastics for the compiler to insert.

Closing remarks

I’ve provided a general rule-of-thumb to help you determine when a type parameter should be an associated type.

A type parameter should be an associated type if for all implementation blocks there exists a relationship mapping the other type parameters to it. (I.e., you can create a table mapping the other parameters to the associated type, like we did for Mul.)

The important difference is that associated types can only be specified once per implementation, type parameters can vary. This relationship has implications on how it behaves, previously we saw how making Output a type parameter on the BadMul trait would require us to write more annotations — the compiler couldn’t infer the type for us. There are other motivating factors behind associated types, specified well in the RFC, I will summarize a couple here.

Refactoring/Evolution You can add an associated type to an existing trait without breaking clients that don’t care about those types. You can’t add a type parameter without breaking all client code.
Inputs/Outputs An input type parameter is used to determine which implementation to use. Output types, or associated types, play no role in selecting the implementation.
Inference implications Output types (associated types) can be used to resolve inference types as soon as a matching implementation is found. Input types cannot do the same. Even if there is a single matching implementation block, type inference cannot use an input type (type parameter) to resolve inference types. This restriction is required to ensure that a crate added in the future does not break existing code. (This is called coherence.)

If you want to read others’ writing on associated types, consider the following:

Pretzel Hammer’s tour of the Rust standard library.
The Rust Book’s section on Advanced Traits.
Users discussion thread about type parameters vs associated types.
Associated types RFC

If you have questions or would like to correct this post, as always, my email is open.