Object-Orientation in Rust

Everyone comes from somewhere, and the chances are good that your previous programming language implemented Object-Oriented Programming (OOP) in a particular way:

  • 'classes' act as factories for generating objects (often called instances) and define unique types.
  • Classes may inherit from other classes (their parents), inheriting both data (fields) and behaviour (methods).
  • If B inherits from A, then an instance of B can be passed to something expecting A (subtyping).
  • An object should hide its data (encapsulation), which can only be operated on with methods.

Object-oriented design is then about identifying the classes (the 'nouns') and the methods (the 'verbs') and then establishing relationships between them, is-a and has-a.

There was a point in the old Star Trek series where the doctor would say to the captain, "It's Life, Jim, just not Life as we know it". And this applies very much to Rust-flavoured object-orientation: it comes as a shock, because Rust data aggregates (structs, enums and tuples) are dumb. You can define methods on them, and make the data itself private, all the usual tactics of encapsulation, but they are all unrelated types. There is no subtyping and no inheritance of data (apart from the specialized case of Deref coercions.)
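To make the encapsulation point concrete, here is a minimal sketch. The Account type, its module and its methods are all invented for illustration; the only claim is that a private field can be reached solely through the methods defined on the struct.

```rust
// Hypothetical example type: `balance` is private to the `account` module,
// so code outside the module has to go through the public methods.
mod account {
    pub struct Account {
        balance: i64, // no `pub`, so not visible outside `account`
    }

    impl Account {
        pub fn new() -> Account {
            Account { balance: 0 }
        }

        pub fn deposit(&mut self, amount: i64) {
            self.balance += amount;
        }

        pub fn balance(&self) -> i64 {
            self.balance
        }
    }
}

use account::Account;

fn main() {
    let mut acc = Account::new();
    acc.deposit(42);
    println!("balance {}", acc.balance());
    // acc.balance = 1_000_000; // would not compile: the field is private
}
```

Note that nothing here makes Account related to any other type; it is just data with methods attached.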

The relationships between various data types in Rust are established using traits. A large part of learning Rust is understanding how the standard library traits operate, because that's the web of meaning that glues all the data types together.

Traits are interesting because there's no one-to-one correspondence between them and concepts from mainstream languages. It depends on whether you're thinking dynamically or statically. In the dynamic case, they're rather like Java or Go interfaces.

Trait Objects

Consider the example first used to introduce traits:

trait Show {
    fn show(&self) -> String;
}

impl Show for i32 {
    fn show(&self) -> String {
        format!("four-byte signed {}", self)
    }
}

impl Show for f64 {
    fn show(&self) -> String {
        format!("eight-byte float {}", self)
    }
}

Here's a little program with big implications:

fn main() {
    let answer = 42;
    let maybe_pi = 3.14;
    // current Rust spells a trait object type with `dyn`
    let v: Vec<&dyn Show> = vec![&answer,&maybe_pi];
    for d in v.iter() {
        println!("show {}",d.show());
    }
}
// show four-byte signed 42
// show eight-byte float 3.14

This is a case where Rust needs some type guidance - I specifically want a vector of references to anything that implements Show. Now note that i32 and f64 have no relationship to each other, but they both understand the show method because they both implement the same trait. This method is virtual, because the actual method has different code for different types, and yet the correct method is invoked based on runtime information. These references are called trait objects.

And that is how you can put objects of different types in the same vector. If you come from a Java or Go background, you can think of Show as acting like an interface.

A little refinement of this example - we box the values. A box contains a reference to data allocated on the heap, and acts very much like a reference - it's a smart pointer. When boxes go out of scope and Drop kicks in, then that memory is released.

let answer = Box::new(42);
let maybe_pi = Box::new(3.14);

// the boxes are moved into the vector, which now owns them
let show_list: Vec<Box<dyn Show>> = vec![answer,maybe_pi];
for d in &show_list {
    println!("show {}",d.show());
}

The difference is that you can now take this vector, pass it as a reference or give it away without having to track any borrowed references. When the vector is dropped, the boxes will be dropped, and all memory is reclaimed.
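Here is a small sketch of what 'giving it away' can look like. The helper functions make_show_list and show_all are hypothetical names of my own, and the code is written with the dyn keyword that current Rust expects for trait object types.

```rust
trait Show {
    fn show(&self) -> String;
}

impl Show for i32 {
    fn show(&self) -> String {
        format!("four-byte signed {}", self)
    }
}

impl Show for f64 {
    fn show(&self) -> String {
        format!("eight-byte float {}", self)
    }
}

// Hypothetical helper: builds the list and hands ownership to the caller.
// No borrowed references are involved, so no lifetimes need tracking.
fn make_show_list() -> Vec<Box<dyn Show>> {
    let list: Vec<Box<dyn Show>> = vec![Box::new(42_i32), Box::new(3.14_f64)];
    list
}

// Hypothetical helper: takes the vector by value. When this function
// returns, the vector and every box in it are dropped.
fn show_all(list: Vec<Box<dyn Show>>) -> Vec<String> {
    list.iter().map(|d| d.show()).collect()
}

fn main() {
    let list = make_show_list();
    for line in show_all(list) {
        println!("show {}", line);
    }
}
```

The same two functions written with Vec<&dyn Show> would need the referenced values to outlive the vector, which is exactly the bookkeeping the boxes avoid.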

Animals

For some reason, any discussion of OOP and inheritance seems to end up talking about animals. It makes for a nice story: "See, a Cat is a Carnivore. And a Carnivore is an Animal". But I'll start with a classic slogan from the Ruby universe: "if it quacks, it's a duck". All your objects have to do is define quack and they can be considered to be ducks, albeit in a very narrow way.

trait Quack {
    fn quack(&self);
}

struct Duck ();

impl Quack for Duck {
    fn quack(&self) {
        println!("quack!");
    }
}

struct RandomBird {
    is_a_parrot: bool
}

impl Quack for RandomBird {
    fn quack(&self) {
        if ! self.is_a_parrot {
            println!("quack!");
        } else {
            println!("squawk!");
        }
    }
}

let duck1 = Duck();
let duck2 = RandomBird{is_a_parrot: false};
let parrot = RandomBird{is_a_parrot: true};

let ducks: Vec<&dyn Quack> = vec![&duck1,&duck2,&parrot];

for d in &ducks {
    d.quack();
}
// quack!
// quack!
// squawk!

Here we have two completely different types (one is so dumb it doesn't even have data), and yes, they both quack(). One is behaving a little oddly (for a duck) but they share the same method name and Rust can keep a collection of such objects in a type-safe way.

Type safety is a fantastic thing. Without static typing, you could insert a cat into that collection of Quackers, resulting in run-time chaos.

Here's a funny one:

// and why the hell not!
impl Quack for i32 {
    fn quack(&self) {
        for i in 0..*self {
            print!("quack {} ",i);
        }
        println!("");
    }
}

let int = 4;

let ducks: Vec<&dyn Quack> = vec![&duck1,&duck2,&parrot,&int];
...
// quack!
// quack!
// squawk!
// quack 0 quack 1 quack 2 quack 3

What can I say? It quacks, it must be a duck. What's interesting is that you can apply your traits to any Rust value, not just 'objects'. (Since quack is passed a reference, there's an explicit dereference * to get the integer.)

However, you can only implement a trait for a type if either the trait or the type is defined in your own crate, so the standard library cannot be 'monkey patched', which is another piece of Ruby folk practice (and not the most widely admired either.)

Up to this point, the trait Quack was behaving very much like a Java interface, and like modern Java interfaces you can have provided methods which supply a default implementation if you have implemented the required methods. (The Iterator trait is a good example.)

But, note that traits are not part of the definition of a type and you can define and implement new traits on any type, subject to the same-crate restriction.
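A short sketch of where that restriction bites. As I understand the rule, at least one of the trait and the type has to be local to your crate. The names Duck and Numbers below are invented for the example, and this Quack returns a String rather than printing so the result is easy to inspect.

```rust
use std::fmt;

// A local trait can be implemented for a foreign type (String is from std)...
trait Quack {
    fn quack(&self) -> String;
}

impl Quack for String {
    fn quack(&self) -> String {
        format!("{} says quack!", self)
    }
}

// ...and a foreign trait (Display) can be implemented for a local type.
struct Duck;

impl fmt::Display for Duck {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "a duck")
    }
}

// A foreign trait for a foreign type is rejected by the compiler:
// impl fmt::Display for Vec<i32> { ... }
//
// The usual workaround is a local wrapper type (a 'newtype').
struct Numbers(Vec<i32>);

impl fmt::Display for Numbers {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} numbers", self.0.len())
    }
}

fn main() {
    println!("{}", "Donald".to_string().quack());
    println!("{}", Duck);
    println!("{}", Numbers(vec![1, 2, 3]));
}
```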

It's possible to pass a reference to any Quack implementor:

fn quack_ref (q: &dyn Quack) {
    q.quack();
}

quack_ref(&duck1);

And that's subtyping, Rust-style.

Since we're doing Programming Language Comparisons 101 here, I'll mention that Go has an interesting take on the quacking business - if there's a Go interface Quack, and a type has a quack method, then that type satisfies Quack without any need for explicit definition. This also breaks the baked-into-definition Java model, and allows compile-time duck-typing, at the cost of some clarity and type-safety.

But there is a problem with duck-typing. One of the signs of bad OOP is too many methods which have some generic name like run. "If it has run(), it must be Runnable" doesn't sound so catchy as the original! So it is possible for a Go interface to be accidentally valid. In Rust, both the Debug and Display traits define fmt methods, but they really mean different things.
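To see the two fmt methods side by side, here is a minimal sketch with an invented Point type. Each implementation is explicitly tied to its trait, so the shared method name causes no confusion about which meaning is intended.

```rust
use std::fmt;

// Hypothetical type used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

// User-facing formatting, selected by "{}".
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Programmer-facing formatting, selected by "{:?}".
impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Point {{ x: {}, y: {} }}", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{}", p);
    println!("{:?}", p);
}
```

A type never satisfies Display merely by having some method called fmt; the impl block has to name the trait.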

So Rust traits allow traditional polymorphic OOP. But what about inheritance? People usually mean implementation inheritance whereas Rust does interface inheritance. It's as if a Java programmer never used extends and instead used implements. And this is actually recommended practice by Allen Holub. He says:

I once attended a Java user group meeting where James Gosling (Java's inventor) was the featured speaker. During the memorable Q&A session, someone asked him: "If you could do Java over again, what would you change?" "I'd leave out classes," he replied. After the laughter died down, he explained that the real problem wasn't classes per se, but rather implementation inheritance (the extends relationship). Interface inheritance (the implements relationship) is preferable. You should avoid implementation inheritance whenever possible

So even in Java, you've probably been overdoing classes!

Implementation inheritance has some serious problems. But it does feel so very convenient. There's this fat base class called Animal and it has loads of useful functionality (it may even expose its innards!) which our derived class Cat can use. That is, it is a form of code reuse. But code reuse is a separate concern.

Getting the distinction between implementation and interface inheritance is important when understanding Rust.

Note that traits may have provided methods. Consider Iterator - you only have to implement next, but get a whole host of methods free. This is similar to 'default' methods of modern Java interfaces. Here we only define name and upper_case is defined for us. We could override upper_case as well, but it isn't required.

trait Named {
    fn name(&self) -> String;

    fn upper_case(&self) -> String {
        self.name().to_uppercase()
    }
}

struct Boo();

impl Named for Boo {
    fn name(&self) -> String {
        "boo".to_string()
    }
}

let f = Boo();

assert_eq!(f.name(),"boo".to_string());
assert_eq!(f.upper_case(),"BOO".to_string());

This is a kind of code reuse, true, but note that it does not apply to the data, only the interface!

Ducks and Generics

An example of a generic-friendly duck function in Rust would be this trivial one:

fn quack<Q> (q: &Q)
where Q: Quack {
    q.quack();
}

let d = Duck();
quack(&d);

The type parameter is any type which implements Quack. There's an important difference between quack and the quack_ref defined in the last section. The body of this function is compiled for each of the calling types and no virtual method is needed; such functions can be completely inlined. It uses the trait Quack in a different way, as a constraint on generic types.

This is the C++ equivalent to the generic quack (note the const):

template <class Q>
void quack(const Q& q) {
    q.quack();
}

Note that the type parameter is not constrained in any way.

This is very much compile-time duck-typing - if we pass a reference to a non-quackable type, then the compiler will complain bitterly about no quack method. At least the error is found at compile-time, but it's worse when a type is accidentally Quackable, as happens with Go. More involved template functions and classes lead to terrible error messages, because there are no constraints on the generic types.

You could define a function which could handle an iteration over Quacker pointers:

template <class It>
void quack_everyone (It start, It finish) {
    for (It i = start; i != finish; i++) {
        (*i)->quack();
    }
}

This would then be implemented for each iterator type It. The Rust equivalent is a little more challenging:

fn quack_everyone <I> (iter: I)
where I: Iterator<Item=Box<dyn Quack>> {
    for d in iter {
        d.quack();
    }
}

let ducks: Vec<Box<dyn Quack>> = vec![Box::new(duck1),Box::new(duck2),Box::new(parrot),Box::new(int)];

quack_everyone(ducks.into_iter());

Iterators in Rust aren't duck-typed but are types that must implement Iterator, and in this case the iterator provides boxes of Quack. There's no ambiguity about the types involved, and the values must satisfy Quack. Often the function signature is the most challenging thing about a generic Rust function, which is why I recommend reading the source of the standard library - the implementation is often much simpler than the declaration! Here the only type parameter is the actual iterator type, which means that this will work with anything that can deliver a sequence of Box<dyn Quack>, not just a vector iterator.
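As a sketch of that last point, the same function can be fed from a VecDeque or from a mapped range. This variation is my own: quack returns a String and quack_everyone collects the results instead of printing, purely so the output can be checked.

```rust
use std::collections::VecDeque;

trait Quack {
    fn quack(&self) -> String;
}

struct Duck;

impl Quack for Duck {
    fn quack(&self) -> String {
        "quack!".to_string()
    }
}

impl Quack for i32 {
    fn quack(&self) -> String {
        (0..*self)
            .map(|i| format!("quack {}", i))
            .collect::<Vec<_>>()
            .join(" ")
    }
}

// Same bound as before: any iterator that yields boxed Quack trait objects.
fn quack_everyone<I>(iter: I) -> Vec<String>
where
    I: Iterator<Item = Box<dyn Quack>>,
{
    iter.map(|d| d.quack()).collect()
}

fn main() {
    // Source 1: a double-ended queue instead of a vector.
    let mut queue: VecDeque<Box<dyn Quack>> = VecDeque::new();
    queue.push_back(Box::new(Duck));
    queue.push_back(Box::new(2_i32));
    println!("{:?}", quack_everyone(queue.into_iter()));

    // Source 2: an adapter chain that boxes values on the fly.
    let boxed = (1..3).map(|n: i32| Box::new(n) as Box<dyn Quack>);
    println!("{:?}", quack_everyone(boxed));
}
```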

Inheritance

A common problem with object-oriented design is trying to force things into an is-a relationship, and neglecting has-a relationships. The GoF said "Prefer Composition to Inheritance" in their Design Patterns book, back in 1994.

Here's an example: you want to model the employees of some company, and Employee seems a good name for a class. Then, Manager is-a Employee (this is true) so we start building our hierarchy with a Manager subclass of Employee. This isn't as smart as it seems. Maybe we got carried away with identifying important Nouns, maybe we (unconsciously) think that managers and employees are different kinds of animals? It's better for Employee to have a Roles collection (a has-a relationship), and then a manager is just an Employee with more responsibilities and capabilities.
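A minimal sketch of the has-a version in Rust might look like this. The Role enum and the method names are my own inventions for illustration, not a recommended design for a real HR system.

```rust
// Hypothetical roles; a real system would have many more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Engineer,
    Manager,
}

// An Employee has-a collection of roles; there is no Manager type at all.
struct Employee {
    name: String,
    roles: Vec<Role>,
}

impl Employee {
    fn new(name: &str) -> Employee {
        Employee { name: name.to_string(), roles: Vec::new() }
    }

    fn add_role(&mut self, role: Role) {
        if !self.roles.contains(&role) {
            self.roles.push(role);
        }
    }

    fn is_manager(&self) -> bool {
        self.roles.contains(&Role::Manager)
    }
}

fn main() {
    let mut alice = Employee::new("Alice");
    alice.add_role(Role::Engineer);
    println!("{} manages? {}", alice.name, alice.is_manager());

    // A promotion changes the data, not the type of the value.
    alice.add_role(Role::Manager);
    println!("{} manages? {}", alice.name, alice.is_manager());
}
```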

Or consider Vehicles - ranging from bicycles to 300t ore trucks. There are multiple ways to think about vehicles, road-worthiness (all-terrain, city, rail-bound, etc), power-source (electric, diesel, diesel-electric, etc), cargo-or-people, and so forth. Any fixed hierarchy of classes you create based on one aspect ignores all other aspects. That is, there are multiple possible classifications of vehicles!

Composition is more important in Rust for the obvious reason that you can't inherit functionality in a lazy way from a base class.

Composition is also important because the borrow checker is smart enough to know that borrows of different struct fields are separate borrows. You can have a mutable borrow of one field while having an immutable borrow of another field, and so forth. Rust cannot tell that a method only accesses one field, so the fields should be structs with their own methods for implementation convenience. (The external interface of the struct can be anything you like using suitable traits.)

A concrete example of 'split borrowing' will make this clearer. We have a struct that owns some strings, with a method for borrowing the first string mutably.

struct Foo {
    one: String,
    two: String
}

impl Foo {
    fn borrow_one_mut(&mut self) -> &mut String {
        &mut self.one
    }
    ....
}

(This is an example of a Rust naming convention - such methods should end in _mut)

Now, a method for borrowing both strings, reusing the first method:

    // deliberately wrong: the method call borrows all of self mutably
    fn borrow_both(&mut self) -> (&str,&str) {
        (self.borrow_one_mut(), &self.two)
    }

Which can't work! We've borrowed mutably from self and also borrowed immutably from self. If Rust allowed situations like this, then that immutable reference can't be guaranteed not to change.

The solution is simple:

    fn borrow_both(&self) -> (&str,&str) {
        (&self.one, &self.two)
    }

And this is fine, because the borrow checker considers these to be independent borrows. So imagine that the fields were some arbitrary types, and you can see that methods called on these fields will not cause borrowing problems.
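The mixed case, one field borrowed mutably while another is borrowed immutably, can be sketched like this. The function append_two_to_one is a hypothetical helper; the point is only that both borrows are alive at the same time and the compiler accepts it because they touch different fields.

```rust
struct Foo {
    one: String,
    two: String,
}

// Hypothetical helper: appends `two` onto `one`, separated by a space.
fn append_two_to_one(foo: &mut Foo) {
    let one = &mut foo.one; // mutable borrow of the first field
    let two = &foo.two;     // shared borrow of the second field
    one.push(' ');
    one.push_str(two);      // both borrows are in use here
}

fn main() {
    let mut foo = Foo {
        one: "hello".to_string(),
        two: "world".to_string(),
    };
    append_two_to_one(&mut foo);
    println!("{}", foo.one);
}
```

If the first borrow went through a method taking &mut self instead of naming the field directly, the whole struct would be locked and the second borrow would be rejected.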

There is a restricted but very important kind of 'inheritance' with Deref, which is the trait for the 'dereference' operator *. String implements Deref<Target=str> and so all the methods defined on &str are automatically available for String as well! In a similar way, the methods of Foo can be directly called on Box<Foo>. Some find this a little ... magical, but it is tremendously convenient. There is a simpler language inside modern Rust, but it would not be half as pleasant to use. Deref really should only be used for cases where there is an owned, mutable type and a simpler borrowed type.
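Here is a sketch of the same mechanism on a type of our own. Name is an invented wrapper around String, and shout is a hypothetical function that only knows about &str.

```rust
use std::ops::Deref;

// Hypothetical owned wrapper; the borrowed form it 'inherits' from is str.
struct Name(String);

impl Deref for Name {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

// Knows nothing about Name, only about string slices.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let n = Name("pete".to_string());
    // str methods are found through Deref when Name has no such method
    println!("{}", n.len());
    println!("{}", n.starts_with("pe"));
    // &Name is coerced to &str at the call site
    println!("{}", shout(&n));
}
```

This follows the owned-type-plus-borrowed-type pattern; using Deref to fake a general class hierarchy is usually discouraged.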

Generally in Rust there is trait inheritance:

trait Show {
    fn show(&self) -> String;
}

trait Location {
    fn location(&self) -> String;
}

trait ShowTell: Show + Location {}

The last trait simply combines our two distinct traits into one, although it could specify other methods.

Things now proceed as before:

#[derive(Debug)]
struct Foo {
    name: String,
    location: String
}

impl Foo {
    fn new(name: &str, location: &str) -> Foo {
        Foo{
            name: name.to_string(),
            location: location.to_string()
        }
    }
}

impl Show for Foo {
    fn show(&self) -> String {
        self.name.clone()
    }
}

impl Location for Foo {
    fn location(&self) -> String {
        self.location.clone()
    }
}

impl ShowTell for Foo {}

Now, if I have a value foo of type Foo, then a reference to that value will satisfy &dyn Show, &dyn Location or &dyn ShowTell (which implies both.)
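The combined trait does not have to be empty. As a sketch, it could carry a provided method that uses both of its supertraits; show_tell and describe are hypothetical names I've added for the illustration.

```rust
trait Show {
    fn show(&self) -> String;
}

trait Location {
    fn location(&self) -> String;
}

trait ShowTell: Show + Location {
    // Hypothetical provided method: it may call the supertrait methods
    // because every ShowTell implementor must also implement them.
    fn show_tell(&self) -> String {
        format!("{} is in the {}", self.show(), self.location())
    }
}

struct Foo {
    name: String,
    location: String,
}

impl Show for Foo {
    fn show(&self) -> String {
        self.name.clone()
    }
}

impl Location for Foo {
    fn location(&self) -> String {
        self.location.clone()
    }
}

// Still an empty impl: the provided method comes for free.
impl ShowTell for Foo {}

// Works through a trait object as well.
fn describe(st: &dyn ShowTell) -> String {
    st.show_tell()
}

fn main() {
    let foo = Foo {
        name: "Pete".to_string(),
        location: "bathroom".to_string(),
    };
    println!("{}", describe(&foo));
}
```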

Here's a useful little macro:

macro_rules! dbg {
    ($x:expr) => {
        println!("{} = {:?}",stringify!($x),$x);
    }
}

It takes one argument (represented by $x) which must be an 'expression'. We print out its value, and a stringified version of the expression itself. C programmers can be a little smug at this point, but this means that if I passed 1+2 (an expression) then stringify!(1+2) is the literal string "1+2". This will save us some typing when playing with code:

let foo = Foo::new("Pete","bathroom");
dbg!(foo.show());
dbg!(foo.location());

let st: &dyn ShowTell = &foo;

dbg!(st.show());
dbg!(st.location());

fn show_it_all(r: &dyn ShowTell) {
    dbg!(r.show());
    dbg!(r.location());
}

let boo = Foo::new("Alice","cupboard");
show_it_all(&boo);

fn show(s: &dyn Show) {
    dbg!(s.show());
}

show(&boo);

// foo.show() = "Pete"
// foo.location() = "bathroom"
// st.show() = "Pete"
// st.location() = "bathroom"
// r.show() = "Alice"
// r.location() = "cupboard"
// s.show() = "Alice"

This is object-orientation, just not the kind you may be used to.

Please note that the Show reference passed to show cannot be dynamically upgraded to a ShowTell! Languages with more dynamic class systems allow you to check whether a given object is an instance of a class and then to do a dynamic cast to that type. It isn't really a good idea in general, and specifically cannot work in Rust because that Show reference has 'forgotten' that it was originally a ShowTell reference.

You always have a choice: polymorphic, via trait objects, or monomorphic, via generics constrained by traits. Modern C++ and the Rust standard library tend to take the generic route, but the polymorphic route is not obsolete. You do have to understand the different trade-offs - generics generate the fastest code, which can be inlined. This may lead to code bloat. But not everything needs to be as fast as possible - it may only happen a 'few' times in the lifetime of a typical program run.
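The two routes side by side, as a sketch. The names quack_dyn and quack_generic are mine, and quack returns a String here only so that the results can be compared.

```rust
trait Quack {
    fn quack(&self) -> String;
}

struct Duck;

impl Quack for Duck {
    fn quack(&self) -> String {
        "quack!".to_string()
    }
}

impl Quack for i32 {
    fn quack(&self) -> String {
        (0..*self).map(|_| "quack").collect::<Vec<_>>().join(" ")
    }
}

// Polymorphic: one compiled body, the method is looked up at run time.
fn quack_dyn(q: &dyn Quack) -> String {
    q.quack()
}

// Monomorphic: a separate body is generated for each Q it is called with.
fn quack_generic<Q: Quack>(q: &Q) -> String {
    q.quack()
}

fn main() {
    let duck = Duck;
    let n = 2_i32;

    println!("{}", quack_dyn(&duck));
    println!("{}", quack_generic(&duck));
    println!("{}", quack_generic(&n));

    // Only the trait-object form can hold different types in one collection.
    let mixed: Vec<&dyn Quack> = vec![&duck, &n];
    for q in &mixed {
        println!("{}", quack_dyn(*q));
    }
}
```

Both functions accept exactly the same set of types; the difference is in how the call is compiled, not in what it accepts.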

So, here's a summary:

  • the role played by class is shared between data and traits
  • structs and enums are dumb, although you can define methods and do data hiding
  • a limited form of subtyping is possible on data using the Deref trait
  • traits don't have any data, but can be implemented for any type (not just structs)
  • traits can inherit from other traits
  • traits can have provided methods, allowing interface code re-use
  • traits give you both virtual methods (polymorphism) and generic constraints (monomorphism)

Example: Windows API

One of the areas where traditional OOP is used extensively is GUI toolkits. An EditControl or a ListWindow is-a Window, and so forth. This makes writing Rust bindings to GUI toolkits more difficult than it needs to be.

Win32 programming can be done directly in Rust, and it's a little less awkward than the original C. As soon as I graduated from C to C++ I wanted something cleaner and did my own OOP wrapper.

A typical Win32 API function is ShowWindow which is used to control the visibility of a window. Now, an EditControl has some specialized functionality, but it's all done with a Win32 HWND ('handle to window') opaque value. You would like EditControl to also have a show method, which traditionally would be done by implementation inheritance. You do not want to have to type out all these inherited methods for each type! But Rust traits provide a solution. There would be a Window trait:

trait Window {
    // you need to define this!
    fn get_hwnd(&self) -> HWND;

    // and all these will be provided
    fn show(&self, visible: bool) {
        unsafe {
            // the result is discarded, since show returns nothing
            user32_sys::ShowWindow(self.get_hwnd(), if visible {1} else {0});
        }
    }

    // ..... oodles of methods operating on Windows

}

So, the implementation struct for EditControl can just contain a HWND and implement Window by defining one method; EditControl is a trait that inherits from Window and defines the extended interface. Something like ComboBox, which behaves like both an EditControl and a ListWindow, can be easily implemented with trait inheritance.
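Here is a sketch of that arrangement that runs anywhere, with the Win32 calls mocked out. Everything in it is an assumption made for illustration: Hwnd is a plain integer standing in for the real handle type, the methods return a description of what they would do instead of calling the API, and Edit and set_text are invented names.

```rust
// Stand-in for the opaque Win32 handle (assumption: real bindings supply HWND).
type Hwnd = usize;

trait Window {
    // the one required method
    fn get_hwnd(&self) -> Hwnd;

    // Provided method. A real binding would call ShowWindow here;
    // this mock just reports the request.
    fn show(&self, visible: bool) -> String {
        let state = if visible { "shown" } else { "hidden" };
        format!("window {}: {}", self.get_hwnd(), state)
    }
}

// The extended interface inherits from Window, so its provided
// methods can use get_hwnd as well.
trait EditControl: Window {
    fn set_text(&self, text: &str) -> String {
        format!("window {}: text = {:?}", self.get_hwnd(), text)
    }
}

// The implementation struct only holds the handle.
struct Edit {
    hwnd: Hwnd,
}

impl Window for Edit {
    fn get_hwnd(&self) -> Hwnd {
        self.hwnd
    }
}

impl EditControl for Edit {}

fn main() {
    let edit = Edit { hwnd: 7 };
    println!("{}", edit.show(true));
    println!("{}", edit.set_text("hello"));
}
```

The struct defines a single method and picks up everything else from the two traits, which is the convenience implementation inheritance was supposed to provide.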

The Win32 API (the '32' no longer means '32-bit') is in fact object-oriented, but in an older style, influenced by Alan Kay's definition: objects contain hidden data, and are operated on by messages. So at the heart of any Windows application there's a message loop, and the various kinds of windows (called 'window classes') handle these messages with their own switch statements. There is a message called WM_SETTEXT but the implementation can be different: a label's text changes, but a top-level window's caption changes.
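The message style can be imitated in a few lines of Rust. This is only a toy model under my own assumptions: the Message enum, the WindowProc trait and the two window types are invented, and the real API passes integer message codes with untyped parameters rather than an enum.

```rust
// Hypothetical message set standing in for codes like WM_SETTEXT.
enum Message {
    SetText(String),
    Close,
}

trait WindowProc {
    // returns true if the message was handled
    fn handle(&mut self, msg: &Message) -> bool;
}

struct Label {
    text: String,
}

struct TopLevel {
    caption: String,
    open: bool,
}

impl WindowProc for Label {
    fn handle(&mut self, msg: &Message) -> bool {
        match msg {
            // for a label, the text itself changes
            Message::SetText(s) => {
                self.text = s.clone();
                true
            }
            // a label ignores everything else
            _ => false,
        }
    }
}

impl WindowProc for TopLevel {
    fn handle(&mut self, msg: &Message) -> bool {
        match msg {
            // for a top-level window, the caption changes
            Message::SetText(s) => {
                self.caption = s.clone();
                true
            }
            Message::Close => {
                self.open = false;
                true
            }
        }
    }
}

fn main() {
    let mut label = Label { text: String::new() };
    let mut top = TopLevel { caption: String::new(), open: true };
    let msg = Message::SetText("Hello".to_string());
    {
        // the same message goes to every window; each decides what it means
        let windows: Vec<&mut dyn WindowProc> = vec![&mut label, &mut top];
        for w in windows {
            w.handle(&msg);
        }
    }
    println!("label text: {}", label.text);
    println!("caption: {} (open: {})", top.caption, top.open);
}
```

An unhandled message is silently ignored here, which hints at why a loosely typed messaging layer pushes errors to run time.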

NWG is a rather promising minimal Windows GUI framework. But to my taste, there are too many unwrap instances going on - and some of them aren't even errors. This is because it exploits the loose dynamic nature of messaging. With a proper type-safe interface, more errors are caught at compile-time.

The next edition of The Rust Programming Language book has a very good discussion on what 'object-oriented' means in Rust.