Object-Orientation in Rust
Everyone comes from somewhere, and the chances are good that your previous programming language implemented Object-Oriented Programming (OOP) in a particular way:
- 'classes' act as factories for generating objects (often called instances) and define unique types.
- Classes may inherit from other classes (their parents), inheriting both data (fields) and behaviour (methods)
- If B inherits from A, then an instance of B can be passed to something expecting A (subtyping)
- An object should hide its data (encapsulation), which can only be operated on with methods.
Object-oriented design is then about identifying the classes (the 'nouns') and the methods (the 'verbs') and then establishing relationships between them, is-a and has-a.
There was a point in the old Star Trek series where the doctor would say to the captain, "It's Life, Jim, just not Life as we know it". And this applies very much to Rust-flavoured object-orientation: it comes as a shock, because Rust data aggregates (structs, enums and tuples) are dumb. You can define methods on them, and make the data itself private, all the usual tactics of encapsulation, but they are all unrelated types. There is no subtyping and no inheritance of data (apart from the specialized case of Deref coercions.)
The relationships between various data types in Rust are established using traits. A large part of learning Rust is understanding how the standard library traits operate, because that's the web of meaning that glues all the data types together.
Traits are interesting because there's no one-to-one correspondence between them and concepts from mainstream languages. It depends on whether you're thinking dynamically or statically. In the dynamic case, they're rather like Java or Go interfaces.
Trait Objects
Consider the example first used to introduce traits:
trait Show {
    fn show(&self) -> String;
}

impl Show for i32 {
    fn show(&self) -> String {
        format!("four-byte signed {}", self)
    }
}

impl Show for f64 {
    fn show(&self) -> String {
        format!("eight-byte float {}", self)
    }
}
Here's a little program with big implications:
fn main() {
    let answer = 42;
    let maybe_pi = 3.14;
    // `dyn` marks a trait object; current Rust editions require it
    let v: Vec<&dyn Show> = vec![&answer,&maybe_pi];
    for d in v.iter() {
        println!("show {}",d.show());
    }
}
// show four-byte signed 42
// show eight-byte float 3.14
This is a case where Rust needs some type guidance - I specifically want a vector of references to anything that implements Show. Now note that i32 and f64 have no relationship to each other, but they both understand the show method because they both implement the same trait. This method is virtual, because the actual method has different code for different types, and yet the correct method is invoked based on runtime information. These references are called trait objects.
And that is how you can put objects of different types in the same vector. If you come from a Java or Go background, you can think of Show as acting like an interface.
A little refinement of this example - we box the values. A box contains a reference to data allocated on the heap, and acts very much like a reference - it's a smart pointer. When boxes go out of scope and Drop kicks in, then that memory is released.
let answer = Box::new(42);
let maybe_pi = Box::new(3.14);

let show_list: Vec<Box<dyn Show>> = vec![answer,maybe_pi];
for d in &show_list {
    println!("show {}",d.show());
}
The difference is that you can now take this vector, pass it as a reference or give it away without having to track any borrowed references. When the vector is dropped, the boxes will be dropped, and all memory is reclaimed.
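To make the ownership point concrete, here is a minimal sketch. The helper names make_show_list and show_all are my own inventions for illustration; the first builds the vector and gives it away, the second takes it by value, so no borrowed references need tracking.

```rust
trait Show {
    fn show(&self) -> String;
}

impl Show for i32 {
    fn show(&self) -> String {
        format!("four-byte signed {}", self)
    }
}

impl Show for f64 {
    fn show(&self) -> String {
        format!("eight-byte float {}", self)
    }
}

// Hypothetical helper: builds the list and hands ownership to the caller.
fn make_show_list() -> Vec<Box<dyn Show>> {
    let list: Vec<Box<dyn Show>> = vec![Box::new(42_i32), Box::new(3.14_f64)];
    list
}

// Hypothetical helper: takes the vector by value. When it returns,
// the vector and every box in it are dropped.
fn show_all(list: Vec<Box<dyn Show>>) -> Vec<String> {
    list.iter().map(|d| d.show()).collect()
}

fn main() {
    let list = make_show_list();
    for line in show_all(list) {
        println!("show {}", line);
    }
}
```

With plain references, make_show_list could not be written like this, because the values it refers to would not outlive the function.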
Animals
For some reason, any discussion of OOP and inheritance seems to end up talking about animals. It makes for a nice story: "See, a Cat is a Carnivore. And a Carnivore is an Animal". But I'll start with a classic slogan from the Ruby universe: "if it quacks, it's a duck". All your objects have to do is define quack and they can be considered to be ducks, albeit in a very narrow way.
trait Quack {
    fn quack(&self);
}

struct Duck ();

impl Quack for Duck {
    fn quack(&self) {
        println!("quack!");
    }
}

struct RandomBird {
    is_a_parrot: bool
}

impl Quack for RandomBird {
    fn quack(&self) {
        if ! self.is_a_parrot {
            println!("quack!");
        } else {
            println!("squawk!");
        }
    }
}

let duck1 = Duck();
let duck2 = RandomBird{is_a_parrot: false};
let parrot = RandomBird{is_a_parrot: true};

let ducks: Vec<&dyn Quack> = vec![&duck1,&duck2,&parrot];

for d in &ducks {
    d.quack();
}
// quack!
// quack!
// squawk!
Here we have two completely different types (one is so dumb it doesn't even have data), and yes, they all quack(). One is behaving a little odd (for a duck) but they share the same method name and Rust can keep a collection of such objects in a type-safe way.
Type safety is a fantastic thing. Without static typing, you could insert a cat into that collection of Quackers, resulting in run-time chaos.
Here's a funny one:
// and why the hell not!
impl Quack for i32 {
    fn quack(&self) {
        for i in 0..*self {
            print!("quack {} ",i);
        }
        println!("");
    }
}

let int = 4;

let ducks: Vec<&dyn Quack> = vec![&duck1,&duck2,&parrot,&int];
...
// quack!
// quack!
// squawk!
// quack 0 quack 1 quack 2 quack 3
What can I say? It quacks, it must be a duck. What's interesting is that you can apply your traits to any Rust value, not just 'objects'. (Since quack is passed a reference, there's an explicit dereference * to get the integer.)
However, you can only implement a trait for a type if either the trait or the type is defined in your own crate, so the standard library cannot be 'monkey patched', which is another piece of Ruby folk practice (and not the most widely admired either.)
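A small sketch of where that line falls, as I understand the rule: at least one of the trait and the type must belong to your crate. The Quack here is a variant that returns a String (so the result is easy to check), and Numbers is a made-up 'newtype' wrapper.

```rust
use std::fmt;

// Our own trait, so it may be implemented for a standard library type.
trait Quack {
    fn quack(&self) -> String;
}

impl Quack for String {
    fn quack(&self) -> String {
        format!("{} says quack!", self)
    }
}

// Display and Vec both come from the standard library, so
// `impl fmt::Display for Vec<i32>` would be rejected in this crate.
// Wrapping the vector in a local type makes the impl legal.
struct Numbers(Vec<i32>);

impl fmt::Display for Numbers {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let parts: Vec<String> = self.0.iter().map(|n| n.to_string()).collect();
        write!(f, "[{}]", parts.join(" "))
    }
}

fn main() {
    println!("{}", "Donald".to_string().quack());
    println!("{}", Numbers(vec![1, 2, 3]));
}
```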
Up to this point, the trait Quack was behaving very much like a Java interface, and like modern Java interfaces you can have provided methods which supply a default implementation if you have implemented the required methods. (The Iterator trait is a good example.)
But, note that traits are not part of the definition of a type and you can define and implement new traits on any type, subject to the same-crate restriction.
It's possible to pass a reference to any Quack implementor:
fn quack_ref (q: &dyn Quack) {
    q.quack();
}

quack_ref(&d);
And that's subtyping, Rust-style.
Since we're doing Programming Language Comparisons 101 here, I'll mention that Go has an interesting take on the quacking business - if there's a Go interface Quack, and a type has a quack method, then that type satisfies Quack without any need for explicit definition. This also breaks the baked-into-definition Java model, and allows compile-time duck-typing, at the cost of some clarity and type-safety.
But there is a problem with duck-typing. One of the signs of bad OOP is too many methods which have some generic name like run. "If it has run(), it must be Runnable" doesn't sound so catchy as the original! So it is possible for a Go interface to be accidentally valid. In Rust, both the Debug and Display traits define fmt methods, but they really mean different things.
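To see that the two fmt methods really are kept apart, here is a short sketch with a made-up Celsius type that implements both traits by hand. Each fmt belongs to its own trait, and the format string decides which one runs.

```rust
use std::fmt;

// Hypothetical type, used only for illustration.
struct Celsius(f64);

// Display: the user-facing form, chosen by "{}".
impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} C", self.0)
    }
}

// Debug: the programmer-facing form, chosen by "{:?}".
impl fmt::Debug for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Celsius({:?})", self.0)
    }
}

fn main() {
    let t = Celsius(21.5);
    println!("{}", t);
    println!("{:?}", t);
}
```

A type only counts as Display or Debug because of an explicit impl, so a method that happens to be called fmt can never satisfy either trait by accident.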
So Rust traits allow traditional polymorphic OOP. But what about inheritance? People usually mean implementation inheritance whereas Rust does interface inheritance. It's as if a Java programmer never used extends and instead used implements. And this is actually recommended practice by Allen Holub. He says:
I once attended a Java user group meeting where James Gosling (Java's inventor) was the featured speaker. During the memorable Q&A session, someone asked him: "If you could do Java over again, what would you change?" "I'd leave out classes," he replied. After the laughter died down, he explained that the real problem wasn't classes per se, but rather implementation inheritance (the extends relationship). Interface inheritance (the implements relationship) is preferable. You should avoid implementation inheritance whenever possible
So even in Java, you've probably been overdoing classes!
Implementation inheritance has some serious problems. But it does feel so very convenient. There's this fat base class called Animal and it has loads of useful functionality (it may even expose its innards!) which our derived class Cat can use. That is, it is a form of code reuse. But code reuse is a separate concern.
Getting the distinction between implementation and interface inheritance is important when understanding Rust.
Note that traits may have provided methods. Consider Iterator - you only have to implement next, but get a whole host of methods free. This is similar to 'default' methods of modern Java interfaces. Here we only define name and upper_case is defined for us. We could override upper_case as well, but it isn't required.
trait Named {
    fn name(&self) -> String;

    fn upper_case(&self) -> String {
        self.name().to_uppercase()
    }
}

struct Boo();

impl Named for Boo {
    fn name(&self) -> String {
        "boo".to_string()
    }
}

let f = Boo();

assert_eq!(f.name(),"boo".to_string());
assert_eq!(f.upper_case(),"BOO".to_string());
This is a kind of code reuse, true, but note that it does not apply to the data, only the interface!
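For completeness, a sketch of the other option: overriding the provided method. Shouty is a made-up type. Note that the provided upper_case can only get at an implementor's data through the required name method, because the trait itself has no fields.

```rust
trait Named {
    fn name(&self) -> String;

    // Provided method: it only sees the implementor through `name`.
    fn upper_case(&self) -> String {
        self.name().to_uppercase()
    }
}

struct Boo;

impl Named for Boo {
    fn name(&self) -> String {
        "boo".to_string()
    }
}

// Hypothetical type that replaces the provided method with its own.
struct Shouty;

impl Named for Shouty {
    fn name(&self) -> String {
        "shouty".to_string()
    }

    fn upper_case(&self) -> String {
        format!("{}!!!", self.name().to_uppercase())
    }
}

fn main() {
    println!("{}", Boo.upper_case());
    println!("{}", Shouty.upper_case());
}
```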
Ducks and Generics
An example of a generic-friendly duck function in Rust would be this trivial one:
fn quack<Q> (q: &Q)
where Q: Quack {
    q.quack();
}

let d = Duck();
quack(&d);
The type parameter is any type which implements Quack. There's an important difference between quack and the quack_ref defined in the last section. The body of this function is compiled for each of the calling types and no virtual method is needed; such functions can be completely inlined. It uses the trait Quack in a different way, as a constraint on generic types.
This is the C++ equivalent to the generic quack (note the const):
template <class Q>
void quack(const Q& q) {
    q.quack();
}
Note that the type parameter is not constrained in any way.
This is very much compile-time duck-typing - if we pass a reference to a non-quackable type, then the compiler will complain bitterly about no quack method. At least the error is found at compile-time, but it's worse when a type is accidentally Quackable, as happens with Go. More involved template functions and classes lead to terrible error messages, because there are no constraints on the generic types.
You could define a function which could handle an iteration over Quacker pointers:
template <class It>
void quack_everyone (It start, It finish) {
    for (It i = start; i != finish; i++) {
        (*i)->quack();
    }
}
This would then be instantiated for each iterator type It.
The Rust equivalent is a little more challenging:
fn quack_everyone <I> (iter: I)
where I: Iterator<Item=Box<dyn Quack>> {
    for d in iter {
        d.quack();
    }
}

let ducks: Vec<Box<dyn Quack>> = vec![Box::new(duck1),Box::new(duck2),Box::new(parrot),Box::new(int)];

quack_everyone(ducks.into_iter());
Iterators in Rust aren't duck-typed but are types that must implement Iterator, and in this case the iterator provides boxes of Quack. There's no ambiguity about the types involved, and the values must satisfy Quack. Often the function signature is the most challenging thing about a generic Rust function, which is why I recommend reading the source of the standard library - the implementation is often much simpler than the declaration! Here the only type parameter is the actual iterator type, which means that this will work with anything that can deliver a sequence of Box<dyn Quack>, not just a vector iterator.
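As a quick check of that last claim, here is a sketch that feeds the same function from a vector and from a reversed VecDeque. For the sake of testing I've assumed a variant of Quack that returns a String and a quack_everyone that collects the results; from_vec and from_deque_reversed are invented helper names.

```rust
use std::collections::VecDeque;

trait Quack {
    fn quack(&self) -> String;
}

struct Duck;

impl Quack for Duck {
    fn quack(&self) -> String {
        "quack!".to_string()
    }
}

struct Parrot;

impl Quack for Parrot {
    fn quack(&self) -> String {
        "squawk!".to_string()
    }
}

// The only type parameter is the iterator type itself.
fn quack_everyone<I>(iter: I) -> Vec<String>
where
    I: Iterator<Item = Box<dyn Quack>>,
{
    iter.map(|d| d.quack()).collect()
}

// Fed from a plain vector iterator.
fn from_vec() -> Vec<String> {
    let ducks: Vec<Box<dyn Quack>> = vec![Box::new(Duck), Box::new(Parrot)];
    quack_everyone(ducks.into_iter())
}

// Fed from a different container, through an iterator adaptor.
fn from_deque_reversed() -> Vec<String> {
    let mut ducks: VecDeque<Box<dyn Quack>> = VecDeque::new();
    ducks.push_back(Box::new(Duck));
    ducks.push_back(Box::new(Parrot));
    quack_everyone(ducks.into_iter().rev())
}

fn main() {
    println!("{:?}", from_vec());
    println!("{:?}", from_deque_reversed());
}
```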
Inheritance
A common problem with object-oriented design is trying to force things into an is-a relationship, and neglecting has-a relationships. The GoF said "Prefer Composition to Inheritance" in their Design Patterns book, back in 1994.
Here's an example: you want to model the employees of some company, and Employee seems a good name for a class. Then, Manager is-a Employee (this is true) so we start building our hierarchy with a Manager subclass of Employee. This isn't as smart as it seems. Maybe we got carried away with identifying important Nouns, maybe we (unconsciously) think that managers and employees are different kinds of animals? It's better for Employee to have a Roles collection (has-a), and then a manager is just an Employee with more responsibilities and capabilities.
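Here is one way the has-a version might look. This is only a sketch under my own assumptions: the Role enum, its variants and the promoted helper are all invented for illustration.

```rust
// Hypothetical roles; a real system would have its own list.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Engineer,
    Manager,
}

struct Employee {
    name: String,
    roles: Vec<Role>,
}

impl Employee {
    fn new(name: &str) -> Employee {
        Employee { name: name.to_string(), roles: Vec::new() }
    }

    // Adding a role changes what the employee can do, not what they are.
    fn add_role(&mut self, role: Role) {
        if !self.roles.contains(&role) {
            self.roles.push(role);
        }
    }

    fn is_manager(&self) -> bool {
        self.roles.contains(&Role::Manager)
    }
}

// Hypothetical helper: an engineer who also picks up the manager role.
fn promoted(name: &str) -> Employee {
    let mut e = Employee::new(name);
    e.add_role(Role::Engineer);
    e.add_role(Role::Manager);
    e
}

fn main() {
    let alice = Employee::new("Alice");
    let bob = promoted("Bob");
    println!("{} manager? {}", alice.name, alice.is_manager());
    println!("{} manager? {} roles {:?}", bob.name, bob.is_manager(), bob.roles);
}
```

Promotion is now a change to the data, with no second type to migrate to.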
Or consider Vehicles - ranging from bicycles to 300t ore trucks. There are multiple ways to think about vehicles, road-worthiness (all-terrain, city, rail-bound, etc), power-source (electric, diesel, diesel-electric, etc), cargo-or-people, and so forth. Any fixed hierarchy of classes you create based on one aspect ignores all other aspects. That is, there are multiple possible classifications of vehicles!
Composition is more important in Rust for the obvious reason that you can't inherit functionality in a lazy way from a base class.
Composition is also important because the borrow checker is smart enough to know that borrows of different struct fields are separate borrows. You can have a mutable borrow of one field while having an immutable borrow of another field, and so forth. Rust cannot tell that a method only accesses one field, so the fields should be structs with their own methods for implementation convenience. (The external interface of the struct can be anything you like using suitable traits.)
A concrete example of 'split borrowing' will make this clearer. We have a struct that owns some strings, with a method for borrowing the first string mutably.
struct Foo {
    one: String,
    two: String
}

impl Foo {
    fn borrow_one_mut(&mut self) -> &mut String {
        &mut self.one
    }
    ....
}
(This is an example of a Rust naming convention - such methods should end in _mut)
Now, a method for borrowing both strings, reusing the first method:
fn borrow_both(&self) -> (&str,&str) {
    (self.borrow_one_mut(), &self.two)
}
Which can't work! We've borrowed mutably from self and also borrowed immutably from self. If Rust allowed situations like this, then that immutable reference couldn't be guaranteed not to change.
The solution is simple:
fn borrow_both(&self) -> (&str,&str) {
    (&self.one, &self.two)
}
And this is fine, because the borrow checker considers these to be independent borrows. So imagine that the fields were some arbitrary types, and you can see that methods called on these fields will not cause borrowing problems.
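The stronger case mentioned earlier, a mutable borrow of one field alive at the same time as an immutable borrow of another, can be sketched like this. The method append_two_to_one and the helper combined are names I made up for the example.

```rust
struct Foo {
    one: String,
    two: String,
}

impl Foo {
    fn new(one: &str, two: &str) -> Foo {
        Foo { one: one.to_string(), two: two.to_string() }
    }

    fn append_two_to_one(&mut self) {
        let one = &mut self.one; // mutable borrow of one field
        let two = &self.two;     // shared borrow of a different field
        one.push_str(two);
        // Going through a method would borrow the whole of `self` instead:
        //   let one = self.borrow_one_mut();
        //   let two = &self.two;   // rejected while `one` is still in use
    }
}

// Hypothetical helper so the result is easy to check.
fn combined(a: &str, b: &str) -> String {
    let mut foo = Foo::new(a, b);
    foo.append_two_to_one();
    foo.one
}

fn main() {
    println!("{}", combined("hello ", "world"));
}
```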
There is a restricted but very important kind of 'inheritance' with Deref, which is the trait for the 'dereference' operator *. String implements Deref<Target=str> and so all the methods defined on &str are automatically available for String as well! In a similar way, the methods of Foo can be directly called on Box<Foo>. Some find this a little ... magical, but it is tremendously convenient. There is a simpler language inside modern Rust, but it would not be half as pleasant to use. Deref really should be reserved for cases where there is an owned, mutable type and a simpler borrowed type.
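Here's a minimal sketch of that pattern for a type of our own. Name is a made-up wrapper around String whose simpler borrowed form is str, and shout is an invented function that only knows about &str.

```rust
use std::ops::Deref;

// Hypothetical owned type; `str` plays the simpler borrowed type.
struct Name(String);

impl Deref for Name {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

// Knows nothing about Name, only about &str.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let name = Name("pete".to_string());
    // str methods are found on Name through Deref
    println!("{}", name.len());
    println!("{}", name.starts_with("pe"));
    // and &Name is coerced to &str when passed to a function
    println!("{}", shout(&name));
}
```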
Generally in Rust there is trait inheritance:
trait Show {
    fn show(&self) -> String;
}

trait Location {
    fn location(&self) -> String;
}

trait ShowTell: Show + Location {}
The last trait simply combines our two distinct traits into one, although it could specify other methods.
Things now proceed as before:
#[derive(Debug)]
struct Foo {
    name: String,
    location: String
}

impl Foo {
    fn new(name: &str, location: &str) -> Foo {
        Foo{
            name: name.to_string(),
            location: location.to_string()
        }
    }
}

impl Show for Foo {
    fn show(&self) -> String {
        self.name.clone()
    }
}

impl Location for Foo {
    fn location(&self) -> String {
        self.location.clone()
    }
}

impl ShowTell for Foo {}
Now, if I have a value foo of type Foo, then a reference to that value will satisfy &dyn Show, &dyn Location or &dyn ShowTell (which implies both.)
Here's a useful little macro:
macro_rules! dbg {
    ($x:expr) => {
        println!("{} = {:?}",stringify!($x),$x);
    }
}
It takes one argument (represented by $x) which must be an 'expression'. We print out a stringified version of the expression, and its value. C programmers can be a little smug at this point, but this means that if I passed 1+2 (an expression) then stringify!(1+2) is a literal string such as "1 + 2" (the exact spacing is up to the compiler). This will save us some typing when playing with code:
let foo = Foo::new("Pete","bathroom");

dbg!(foo.show());
dbg!(foo.location());

let st: &dyn ShowTell = &foo;

dbg!(st.show());
dbg!(st.location());

fn show_it_all(r: &dyn ShowTell) {
    dbg!(r.show());
    dbg!(r.location());
}

let boo = Foo::new("Alice","cupboard");
show_it_all(&boo);

fn show(s: &dyn Show) {
    dbg!(s.show());
}

show(&boo);

// foo.show() = "Pete"
// foo.location() = "bathroom"
// st.show() = "Pete"
// st.location() = "bathroom"
// r.show() = "Alice"
// r.location() = "cupboard"
// s.show() = "Alice"
This is object-orientation, just not the kind you may be used to.
Please note that the Show reference passed to show can not be dynamically upgraded to a ShowTell! Languages with more dynamic class systems allow you to check whether a given object is an instance of a class and then to do a dynamic cast to that type. It isn't really a good idea in general, and specifically cannot work in Rust because that Show reference has 'forgotten' that it was originally a ShowTell reference.
You always have a choice: polymorphic, via trait objects, or monomorphic, via generics constrained by traits. Modern C++ and the Rust standard library tend to take the generic route, but the polymorphic route is not obsolete. You do have to understand the different trade-offs - generics generate the fastest code, which can be inlined. This may lead to code bloat. But not everything needs to be as fast as possible - it may only happen a 'few' times in the lifetime of a typical program run.
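The two routes side by side, as a sketch. Again I've assumed a Quack that returns a String so the results can be compared; quack_dyn and quack_generic are invented names.

```rust
trait Quack {
    fn quack(&self) -> String;
}

struct Duck;

impl Quack for Duck {
    fn quack(&self) -> String {
        "quack!".to_string()
    }
}

impl Quack for i32 {
    fn quack(&self) -> String {
        (0..*self)
            .map(|i| format!("quack {}", i))
            .collect::<Vec<_>>()
            .join(" ")
    }
}

// Polymorphic: one compiled body; the method is looked up at run time.
fn quack_dyn(q: &dyn Quack) -> String {
    q.quack()
}

// Monomorphic: a separate copy is compiled for each Q it is called with.
fn quack_generic<Q: Quack>(q: &Q) -> String {
    q.quack()
}

fn main() {
    println!("{}", quack_dyn(&Duck));
    println!("{}", quack_dyn(&2_i32));
    println!("{}", quack_generic(&Duck));
    println!("{}", quack_generic(&2_i32));
}
```

The call sites look identical; the difference is in what the compiler generates for each function.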
So, here's a summary:
- the role played by class is shared between data and traits
- structs and enums are dumb, although you can define methods and do data hiding
- a limited form of subtyping is possible on data using the Deref trait
- traits don't have any data, but can be implemented for any type (not just structs)
- traits can inherit from other traits
- traits can have provided methods, allowing interface code re-use
- traits give you both virtual methods (polymorphism) and generic constraints (monomorphism)
Example: Windows API
One of the areas where traditional OOP is used extensively is GUI toolkits. An EditControl or a ListWindow is-a Window, and so forth. This makes writing Rust bindings to GUI toolkits more difficult than it needs to be.
Win32 programming can be done directly in Rust, and it's a little less awkward than the original C. As soon as I graduated from C to C++ I wanted something cleaner and did my own OOP wrapper.
A typical Win32 API function is ShowWindow which is used to control the visibility of a window. Now, an EditControl has some specialized functionality, but it's all done with a Win32 HWND ('handle to window') opaque value.
You would like EditControl to also have a show method, which traditionally would be done by implementation inheritance. You would not want to have to type out all these inherited methods for each type! But Rust traits provide a solution. There would be a Window trait:
trait Window {
    // you need to define this!
    fn get_hwnd(&self) -> HWND;

    // and all these will be provided
    fn show(&self, visible: bool) {
        unsafe {
            // the semicolon discards ShowWindow's return value, since show returns nothing
            user32_sys::ShowWindow(self.get_hwnd(), if visible {1} else {0});
        }
    }

    // ..... oodles of methods operating on Windows

}
So, the implementation struct for EditControl can just contain a HWND and implement Window by defining one method; EditControl is a trait that inherits from Window and defines the extended interface. Something like ComboBox - which behaves like an EditControl and a ListWindow - can be easily implemented with trait inheritance.
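A runnable sketch of that arrangement, with no actual Win32 involved. Everything here is an assumption made for illustration: Hwnd is a plain integer standing in for HWND, and the provided methods return a description of what a real binding would do instead of calling the API.

```rust
// Stand-in for the opaque Win32 HWND, so the sketch runs anywhere.
type Hwnd = usize;

trait Window {
    // the one required method
    fn get_hwnd(&self) -> Hwnd;

    // provided; a real binding would call ShowWindow here
    fn show(&self, visible: bool) -> String {
        format!("window {}: show {}", self.get_hwnd(), visible)
    }
}

// The extended interfaces inherit from Window.
trait EditControl: Window {
    fn set_text(&self, text: &str) -> String {
        format!("window {}: text = {}", self.get_hwnd(), text)
    }
}

trait ListWindow: Window {
    fn add_item(&self, item: &str) -> String {
        format!("window {}: item + {}", self.get_hwnd(), item)
    }
}

// The implementation struct only holds the handle.
struct ComboBox {
    hwnd: Hwnd,
}

impl Window for ComboBox {
    fn get_hwnd(&self) -> Hwnd {
        self.hwnd
    }
}

// Both extended interfaces come for free.
impl EditControl for ComboBox {}
impl ListWindow for ComboBox {}

fn main() {
    let combo = ComboBox { hwnd: 9 };
    println!("{}", combo.show(true));
    println!("{}", combo.set_text("hello"));
    println!("{}", combo.add_item("first"));
}
```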
The Win32 API ('32' no longer means '32-bit') is in fact object-oriented, but an older style, influenced by Alan Kay's definition: objects contain hidden data, and are operated on by messages. So at the heart of any Windows application there's a message loop, and the various kinds of windows (called 'window classes') handle these messages with their own switch statements. There is a message called WM_SETTEXT but the implementation can be different: a label's text changes, but a top-level window's caption changes.
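The message style can be imitated in a few lines of Rust. This is a toy model, not the Win32 API: the Message enum, the WindowProc trait and the dispatch function are all invented for the sketch, with a match playing the part of the switch statement.

```rust
// Hypothetical message set, loosely modelled on messages like WM_SETTEXT.
enum Message {
    SetText(String),
    Show(bool),
}

// Every kind of window handles the same messages in its own way.
trait WindowProc {
    fn handle(&mut self, msg: Message);
}

#[derive(Default)]
struct Label {
    text: String,
    visible: bool,
}

#[derive(Default)]
struct TopLevel {
    caption: String,
    visible: bool,
}

impl WindowProc for Label {
    fn handle(&mut self, msg: Message) {
        match msg {
            Message::SetText(s) => self.text = s,
            Message::Show(v) => self.visible = v,
        }
    }
}

impl WindowProc for TopLevel {
    fn handle(&mut self, msg: Message) {
        match msg {
            // same message, different effect: the caption changes
            Message::SetText(s) => self.caption = s,
            Message::Show(v) => self.visible = v,
        }
    }
}

// A toy stand-in for the message loop.
fn dispatch(target: &mut dyn WindowProc, queue: Vec<Message>) {
    for msg in queue {
        target.handle(msg);
    }
}

// Sends messages to both kinds of window and returns the results.
fn demo() -> (Label, TopLevel) {
    let mut label = Label::default();
    let mut frame = TopLevel::default();
    dispatch(&mut label, vec![Message::SetText("hello".to_string()), Message::Show(true)]);
    dispatch(&mut frame, vec![Message::SetText("hello".to_string())]);
    (label, frame)
}

fn main() {
    let (label, frame) = demo();
    println!("label text = {:?}, visible = {}", label.text, label.visible);
    println!("frame caption = {:?}, visible = {}", frame.caption, frame.visible);
}
```

Because Message is an enum, the compiler insists that every window kind handles every message, which is not something the real message loop can promise.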
Here is a rather promising minimal Windows GUI framework. But to my taste, there are too many unwrap instances going on - and some of them aren't even errors. This is because NWG is exploiting the loose dynamic nature of messaging. With a proper type-safe interface, more errors are caught at compile-time.
The next edition of The Rust Programming Language book has a very good discussion on what 'object-oriented' means in Rust.