The Common Rust Traits
What is a Trait?
In Rust, data types - primitives, structs, enums and any other ‘aggregate’ types like tuples and arrays - are dumb. They may have methods but that is just a convenience (they are just functions). Types have no relationship with each other.
Traits are the abstract mechanism for adding functionality to types and establishing relationships between them.
They operate in two different modes; in their more familiar guise they act
like interfaces in Java or C# (and in fact the keyword was originally interface).
Interface inheritance is supported, but not implementation inheritance.
There is support for object-orientated programming
but it is different enough from the mainstream to cause conceptual confusion.
But, most characteristically, traits act as generic constraints. A generic function is
defined over types that implement specific traits.
That is, the “compile-time duck typing” of C++ templates is avoided. If we are passed a duck, then
it must implement Duck. The quack() method itself is not sufficient, as it is with Go.
Converting Things to Strings
To make this more concrete, consider ToString
which defines a to_string
method.
There are two ways to write functions taking references to types that implement it.
The first is generic or monomorphic:
use std::string::ToString;
fn to_string1<T: ToString> (item: &T) -> String {
item.to_string()
}
println!("{}", to_string1(&42));
println!("{}", to_string1(&"hello"));
item
is a reference to any type which implements ToString
.
The second is dynamic or polymorphic:
fn to_string2(item: &ToString) -> String {
item.to_string()
}
println!("{}", to_string2(&42));
println!("{}", to_string2(&"hello"));
Now, converting numbers and string slices to owned strings are obviously different operations.
In the first case, different code is generated for each distinct type, just like with a C++ template.
This is maximally efficient - to_string
can be inlined.
In the second case, the code is generated once (it’s an ordinary function) but the actual
to_string
is called dynamically. Here &ToString
is behaving much like a Java interface
or C++ base class with virtual methods.
A reference to a concrete type becomes a trait object. It’s non-trivial because the trait object has two parts - the original reference and a ‘virtual method table’ containing the methods of the trait (a so-called “fat pointer”).
let d: &Display = &10;
A little too much magic is happening here, so Rust has moved to a
more explicit notation for trait objects, &dyn ToString etc. The bare form
used above is deprecated and is rejected outright in the 2021 edition.
How to decide between generic and polymorphic? The only honest answer is “it depends”. Bear in mind that the actual cost of using trait objects might be negligible compared to the other work done by a program. (It’s hard to make engineering decisions based on micro-benchmarks.)
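One case where the choice is made for you is a collection of mixed concrete types, which only trait objects can express. Here is a minimal sketch of both styles side by side, using the explicit &dyn notation; the function names describe_generic, describe_dyn and describe_all are made up for illustration.

```rust
use std::fmt::Display;

// Static dispatch: a separate copy is compiled for each T.
fn describe_generic<T: Display>(item: &T) -> String {
    format!("<{}>", item)
}

// Dynamic dispatch: compiled once, the call goes through the vtable.
fn describe_dyn(item: &dyn Display) -> String {
    format!("<{}>", item)
}

// A slice of boxed trait objects can hold different concrete types.
fn describe_all(items: &[Box<dyn Display>]) -> Vec<String> {
    items.iter().map(|item| describe_dyn(item.as_ref())).collect()
}

fn main() {
    println!("{}", describe_generic(&42));
    println!("{}", describe_dyn(&"hello"));
    let mixed: Vec<Box<dyn Display>> = vec![Box::new(42), Box::new("hello"), Box::new(1.5)];
    println!("{:?}", describe_all(&mixed));
}
```

A Vec<T> with a generic T could not hold an integer, a string slice and a float at the same time, so here the small cost of dynamic dispatch buys real flexibility.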
Printing Out: Display and Debug
For a value to be printed out using {}
, it must implement the Display trait.
{:?}
requires that it implement Debug.
Defining Display for your own types is straightforward but needs to be
explicit, since the compiler cannot reasonably guess what the
output format must be (unlike with Debug):
use std::fmt;
// Debug can be auto-generated
#[derive(Debug)]
struct MyType {
x: u32,
y: u32
}
// but not Display
impl fmt::Display for MyType {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "x={},y={}", self.x, self.y)
}
}
let t = MyType{x:1,y:2};
println!("{}", t); //=> x=1,y=2
println!("{:?}", t); //=> MyType { x: 1, y: 2 }
The write!
macro is a relative of our friend println!
where the first
parameter is anything that implements Write
(more about this very important
trait later.)
Debug
is implemented by most standard library types and is a very convenient
way to get a developer-friendly string representation of your types. But note
that you have to ask for Debug
to be implemented - Rust is not going to
make all structs pay the price of the extra code by default.
Any type that implements Display
automatically implements ToString
, so
42.to_string()
, "hello".to_string()
all work as expected.
(Rust traits often hunt in packs.)
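To see the pack in action, here is a small sketch with a made-up Celsius type: only Display is implemented, yet to_string() and any generic function bounded on Display work straight away. The type and the label helper are hypothetical.

```rust
use std::fmt;

struct Celsius(f64);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // One decimal place followed by a unit suffix.
        write!(f, "{:.1}C", self.0)
    }
}

// Accepts anything that can be printed with {}.
fn label<T: fmt::Display>(name: &str, value: T) -> String {
    format!("{}={}", name, value)
}

fn main() {
    // to_string() comes from the blanket ToString implementation for Display types.
    let text = Celsius(21.5).to_string();
    println!("{}", text);
    println!("{}", label("outside", Celsius(0.0)));
}
```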
Default
This expresses the intuitive idea that most types have a sensible default value,
like zero for numbers, empty for vectors, “” for String
, etc.
Most standard library types
implement Default.
Here is a roundabout way to declare an integer variable and set it to zero.
default
is a generic method that returns some T
, so Rust needs to know that
T
somehow:
let n: u64 = Default::default(); // declare type explicitly
Default is easy to implement for your own structs,
provided the type of each field implements Default:
#[derive(Default)]
struct MyStruct {
name: String,
age: u16
}
...
let mine: MyStruct = Default::default();
Rust likes to be explicit so this does not happen automatically, unlike in other
languages. If you said let n: u64;
then Rust would expect a later initialization,
or complain bitterly.
There are no ‘named function parameters’ in Rust, but here is one idiom that achieves
the same thing. Imagine you have a function which could take a large number of
configuration arguments - that’s usually not a good idea, so you make up a big struct
called Config
. If Config
implements Default
, then the function could be called
like so, without having to specify each and every field in Config
.
my_function(Config {
job_name: "test",
output_dir: Path::new("/tmp"),
..Default::default()
})
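Spelled out in full, the idiom might look like the sketch below. The fields of Config are invented for illustration (and use String rather than borrowed types to keep the derive simple); the point is that struct update syntax fills in everything the caller does not mention.

```rust
#[derive(Debug, Default)]
struct Config {
    job_name: String,
    output_dir: String,
    retries: u32,
    verbose: bool,
}

// Stand-in for a function with many optional settings.
fn my_function(config: Config) -> String {
    format!(
        "{} -> {} (retries={}, verbose={})",
        config.job_name, config.output_dir, config.retries, config.verbose
    )
}

fn main() {
    let summary = my_function(Config {
        job_name: "test".into(),
        output_dir: "/tmp".into(),
        // Every field not named above takes its Default value.
        ..Default::default()
    });
    println!("{}", summary); // prints: test -> /tmp (retries=0, verbose=false)
}
```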
Conversion: From and Into
An important pair of traits is From/Into
. The From
trait expresses the conversion
of one value into another using the from
method. So we have String::from("hello")
.
If From
is implemented, then the Into
trait is auto-implemented.
Since String
implements From<&str>
, then &str
automatically implements Into<String>
.
let s = String::from("hello"); // From
let s: String = "hello".into(); // Into
The json crate provides a nice example. A JSON object is indexed with strings,
and new fields can be created by inserting JsonValue
values:
obj["surname"] = JsonValue::from("Smith"); // From
obj["name"] = "Joe".into(); // Into
obj["age"] = 35.into(); // Into
Note how convenient it is to use into()
here, instead of using from()
. We are doing
a conversion which Rust will not do implicitly. But into()
is a small word,
easy to type and read.
From
expresses a conversion that always succeeds. It may be relatively expensive, though:
converting a string slice to a String
will allocate a buffer and copy the bytes. The
conversion always takes place by value.
From/Into
has an intimate relationship with Rust error handling.
This statement in a function returning Result<T,E>
:
let res = returns_some_result()?;
is (in effect) sugar for this:
let res = match returns_some_result() {
Ok(r) => r,
Err(e) => return Err(e.into())
};
That is, any error type which can convert into the returned error type E
works.
A useful strategy for informal error handling is to make the function return
Result<T,Box<Error>>
. Any type that implements Error
can be converted
into the trait object Box<Error>
.
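Here is a sketch of both conversions at work. ConfigError, parse_percent and double_percent are hypothetical names; the first ? relies on a hand-written From implementation, the second on the standard conversion of any Error type into a boxed trait object (written Box<dyn Error> in current Rust).

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
enum ConfigError {
    BadNumber(ParseIntError),
    OutOfRange(i64),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ConfigError::BadNumber(e) => write!(f, "not a number: {}", e),
            ConfigError::OutOfRange(n) => write!(f, "{} is outside 0..=100", n),
        }
    }
}

impl Error for ConfigError {}

// This is what lets ? turn a ParseIntError into a ConfigError.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn parse_percent(text: &str) -> Result<i64, ConfigError> {
    let n: i64 = text.trim().parse()?; // ParseIntError -> ConfigError
    if !(0..=100).contains(&n) {
        return Err(ConfigError::OutOfRange(n));
    }
    Ok(n)
}

fn double_percent(text: &str) -> Result<i64, Box<dyn Error>> {
    let n = parse_percent(text)?; // ConfigError -> Box<dyn Error>
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double_percent("21"));
    if let Err(e) = double_percent("150") {
        println!("error: {}", e);
    }
}
```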
Making Copies: Clone and Copy
From
(and its mirror image Into
) describe how distinct types are converted into
each other. Clone
describes how a new value of the same type can be created.
Rust likes to make any potentially expensive operation obvious, so cloning is always spelled out as val.clone().
This can simply involve moving some bits around (“bitwise copy”). A number is just a bit pattern in memory.
But String
is different, since as well as size and capacity fields,
it has dynamically-allocated string data. To clone a string involves
allocating that buffer and copying the original bytes into it. There’s depth
to the clone operation here.
Making your types cloneable is easy, as long as every type in a struct or enum
implements Clone
:
#[derive(Debug,Clone)]
struct Person {
first_name: String,
last_name: String,
}
Copy
is a marker trait (there are no methods to implement) which says that
a type may be copied by just moving bits. You can define it for your own
structs:
#[derive(Debug,Clone,Copy)]
struct Point {
x: f32,
y: f32,
z: f32
}
Again, only possible if all types implement Copy
. You cannot sneak in a
non-Copy
type like String
here!
This trait interacts with a key Rust feature: moving. Moving a value is always
done by simply moving bits around. If the value is Copy
, then the original
location remains valid. (The implication is that copying is always bitwise.)
let n1 = 42;
let n2 = n1;
// n1 is still fine (i32 is Copy)
let s1 = "hello".to_string();
let s2 = s1;
// value moved into s2, s1 can no longer be used!
Bad things would happen if s1
was still valid - both s1
and s2
would
be dropped at the end of scope and their shared buffer would be deallocated twice!
C++ handles this situation by always copying; in Rust you
must say s1.clone()
.
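The difference shows up clearly at function boundaries. In this sketch (the helper functions shifted and renamed are made up), a Copy value can be passed by value and still used afterwards, while a Clone-only value needs an explicit clone() to get an independent copy.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Person {
    first_name: String,
    last_name: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

// Takes Point by value; the caller keeps its own copy because Point is Copy.
fn shifted(p: Point, dx: f32) -> Point {
    Point { x: p.x + dx, ..p }
}

// Person is not Copy, so we borrow it and clone explicitly.
fn renamed(original: &Person, first_name: &str) -> Person {
    let mut copy = original.clone(); // allocates new string buffers
    copy.first_name = first_name.to_string();
    copy
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    let q = shifted(p, 1.0);
    println!("{:?} {:?}", p, q); // p is still usable

    let joe = Person { first_name: "Joe".into(), last_name: "Smith".into() };
    let jane = renamed(&joe, "Jane");
    println!("{:?} {:?}", joe, jane);
}
```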
Fallible Conversions - FromStr
If I have the integer 42
, then it is safe to convert this to an owned string,
which is expressed by ToString
. However, if I have the string “42” then
the conversion into i32
must be prepared to fail.
To implement FromStr
takes two things; defining the from_str
method
and setting the associated type Err
to the error type returned when the conversion fails.
Usually it’s used implicitly through the string parse
method. This is a method with
a generic output type, which needs to be tied down.
E.g. using the turbofish operator:
let answer = match "42".parse::<i32>() {
Ok(n) => n,
Err(_) => panic!("'42' was not 42!"),
};
Or (more elegantly) in a context where we can use ?
:
let answer: i32 = "42".parse()?;
The Rust standard library defines FromStr
for the numerical types and for network addresses.
It is of course possible for external crates to define FromStr
for their types and then
they will work with parse
as well. This is a cool thing about the standard traits - they
are all open for further extension.
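As a sketch of what that extension looks like, here is FromStr implemented for a made-up Rgb type that parses "r, g, b" text. Both the type and its error type ParseRgbError are hypothetical; the two required pieces are the associated type Err and the from_str method.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

#[derive(Debug, PartialEq)]
struct ParseRgbError(String);

impl FromStr for Rgb {
    type Err = ParseRgbError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split(',').map(str::trim).collect();
        if parts.len() != 3 {
            return Err(ParseRgbError(format!(
                "expected 3 components, got {}",
                parts.len()
            )));
        }
        // Reuse the standard FromStr for u8, mapping its error into ours.
        let channel = |text: &str| {
            text.parse::<u8>()
                .map_err(|e| ParseRgbError(format!("bad component '{}': {}", text, e)))
        };
        Ok(Rgb {
            r: channel(parts[0])?,
            g: channel(parts[1])?,
            b: channel(parts[2])?,
        })
    }
}

fn main() {
    // parse now works for Rgb just as it does for i32.
    let orange: Result<Rgb, _> = "255, 128, 0".parse();
    println!("{:?}", orange);
    println!("{:?}", "1,2,300".parse::<Rgb>());
}
```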
Reference Conversions - AsRef
AsRef expresses the situation where a cheap reference conversion is possible between two types.
The most common place you will see it in action is with &Path
. In an ideal world,
all file systems would enforce UTF-8 names and we could just use String
to
store them. However, we have not yet arrived at Utopia and Rust has a dedicated
type PathBuf
with specialized path handling methods, backed by OsString
,
which represents untrusted text from the OS. &Path
is the borrowed counterpart
to PathBuf
. It is cheap to get a &Path
reference from regular Rust strings
so AsRef
is appropriate:
// asref.rs
fn exists(p: impl AsRef<Path>) -> bool {
p.as_ref().exists()
}
assert!(exists("asref.rs"));
assert!(exists(Path::new("asref.rs")));
let ps = String::from("asref.rs");
assert!(exists(&ps));
assert!(exists(PathBuf::from("asref.rs")));
This allows any function or method working with file system paths to be conveniently
called with any type that implements AsRef<Path>
. From the documentation:
impl AsRef<Path> for Path
impl AsRef<Path> for OsStr
impl AsRef<Path> for OsString
impl AsRef<Path> for str
impl AsRef<Path> for String
impl AsRef<Path> for PathBuf
Follow this pattern when defining a public API, because people are accustomed to this little convenience.
AsRef<str>
is implemented for String
, so we can also say:
fn is_hello(s: impl AsRef<str>) {
assert_eq!("hello", s.as_ref());
}
is_hello("hello");
is_hello(String::from("hello"));
This seems attractive, but using this is very much a matter of taste. Idiomatic Rust code
prefers to declare string arguments as &str
and lean on deref coercion
for convenient passing of &String
references.
Overloading * - Deref
Many string methods in Rust are not actually defined on String
. The methods
explicitly defined typically mutate the string, like push
and push_str
.
But something like starts_with
applies to string slices as well.
At one point in Rust’s history, this had to be done explicitly, so if you
had a String
called s
, you would have to say s.as_str().starts_with("hello")
.
You will occasionally see as_str()
, but mostly method resolution happens
through the magic of deref coercion.
The Deref trait is actually used to implement the “dereference” operator *
.
This has the same meaning as in C - extract the value which the reference is
pointing to - although it doesn’t appear explicitly as much. If r is a reference,
then you say r.foo(), but if you did want the value, you have to say *r.
(In this respect Rust references are more like C pointers than C++ references,
which try to behave like C++ values, leading to hidden differences.)
The most obvious use of Deref
is with “smart pointers” like Box<T>
and Rc<T>
- they behave like references to the values inside them,
so you can call methods of T
on Box<T>
and so forth.
String implements Deref; if s is a String then the type of &*s is &str.
Deref coercion means that &String
will implicitly convert into &str
:
let s: String = "hello".into();
let rs: &str = &s;
“Coercion” is a strong word, but this is one of the few places in Rust
where type conversion happens silently. &String
is a very
different type to &str
! I still remember my
confusion when the compiler insisted that these types were distinct,
especially with operators where the convenience of deref coercion
does not happen. The match operator matches types explicitly
and this is where s.as_str()
is still necessary - &s
would not work here:
let s = "hello".to_string();
...
match s.as_str() {
"hello" => {....},
"dolly" => {....},
....
}
It’s idiomatic to use string slices in function arguments, knowing that
&String
will convert to &str
.
Deref coercion is also used to resolve methods - if the method isn’t defined
on String
, then we try &str
. It acts like a limited kind of inheritance.
A similar relationship holds between Vec<T>
and &[T]
. Likewise, it’s
not idiomatic to have &Vec<T>
as a function argument type, since &[T]
is more flexible and &Vec<T>
will convert to &[T]
.
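You can give your own types the same behaviour. This is a minimal sketch with a hypothetical Name wrapper around String; implementing Deref with Target = str makes the str methods available on it and lets &Name coerce to &str at call sites.

```rust
use std::ops::Deref;

struct Name(String);

impl Deref for Name {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

// An ordinary function taking a string slice.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let name = Name("dolly".to_string());
    // len and starts_with are str methods, found through Deref.
    println!("{}", name.len());
    println!("{}", name.starts_with("do"));
    // &Name coerces to &str when passed as an argument.
    println!("{}", shout(&name));
}
```

This is best kept for wrapper and smart-pointer types; using Deref to fake general inheritance tends to surprise readers.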
Ownership: Borrow
Ownership is an important concept in Rust; we have types like String
that
“own” their data, and types like &str
that can “borrow” data from
an owned type.
The Borrow
trait solves a sticky problem with associative maps and sets.
Typically we would keep owned strings in a HashSet
to avoid borrowing blues.
But we really don’t want to create a String
to query set membership!
let mut set = HashSet::new();
set.insert("one".to_string());
// set is now HashSet<String>
if set.contains("two") {
println!("got two!");
}
The borrowed type &str
can be used instead of &String
here.
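The same bound can be used in your own generic code. The sketch below wraps HashMap::get in a hypothetical helper called lookup_or; the K: Borrow<Q> bound is what allows a map keyed by String to be queried with a plain &str.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

// Looks up a key in any borrowed form Q that the owned key type K can lend out.
fn lookup_or<K, Q>(map: &HashMap<K, i32>, key: &Q, fallback: i32) -> i32
where
    K: Borrow<Q> + Hash + Eq,
    Q: Hash + Eq + ?Sized,
{
    map.get(key).copied().unwrap_or(fallback)
}

fn main() {
    let mut ages = HashMap::new();
    ages.insert("one".to_string(), 1);

    // No String is allocated for either query.
    println!("{}", lookup_or(&ages, "one", 0));
    println!("{}", lookup_or(&ages, "two", -1));
}
```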
I/O: Read and Write
The types std::fs::File
and std::io::Stdin
are very distinct. Rust does not
hack stdin as a kind-of file. What they do share is the trait Read.
The basic method read
will read some bytes into a buffer and return Result<usize>
.
If there was not an error, this will be the number of bytes read.
Read
provides the method read_to_string
which will read all of a file in
as a String
, or read_to_end
which reads the file as Vec<u8>
. (If a file
isn’t guaranteed to be UTF-8, it’s better to use read_to_end
.)
Traits need to be visible to be used, but Read
is not part of the Rust prelude.
Instead use std::io::prelude::*
to get all of the I/O traits in scope.
An important thing to remember is that Rust I/O is unbuffered by default. So a naive Rust program can be outperformed by a script!
For instance, if you want the fastest possible way to read from stdin, lock it first - the currently executing thread now has exclusive access:
let stdin = io::stdin();
let mut lockin = stdin.lock();
// lockin is buffered!
Locked stdin implements BufRead, which defines buffered reading.
There is a lines()
method which iterates over all lines in the input, but
it allocates a new string for each line, which is convenient but inefficient.
For best performance, use read_line
because it allows you to reuse a
single string buffer.
Likewise, to get buffered reading from a file:
let mut rdr = io::BufReader::new(File::open(file)?);
...
This comes across as unnecessarily fiddly at first but bear in mind that Rust is a systems language which aims to make things like buffering and allocation explicit.
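As a sketch of the buffer-reusing style, here is a hypothetical count_nonblank function written against BufRead, so it works equally with locked stdin or a BufReader over a file. In the example a byte slice stands in for the real input, since byte slices implement BufRead too.

```rust
use std::io::{self, BufRead};

// Counts non-blank lines, reusing one String buffer for every line.
fn count_nonblank<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new();
    let mut count = 0;
    // read_line appends to the buffer and returns Ok(0) at end of input.
    while reader.read_line(&mut line)? > 0 {
        if !line.trim().is_empty() {
            count += 1;
        }
        line.clear(); // keep the allocation, drop the contents
    }
    Ok(count)
}

fn main() {
    let input = "alpha\n\nbeta\n".as_bytes();
    println!("{}", count_nonblank(input).unwrap());
}
```

Forgetting the clear() is the classic mistake here: read_line appends, so the buffer would otherwise keep growing with every line.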
For writing, there is the Write trait. Files, sockets and standard streams like
stdout and stderr implement this. Again, this is unbuffered and io::BufWriter
exists
to add buffering to any type that implements Write
.
There is a performance cost with the println! macro. It is
designed for convenient and sensible output, not for speed. It gets an exclusive lock
before writing out so you do not get scrambled text from different threads.
So, if you need speed, wrap the stream in a buffer and use the write! macro.
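A minimal sketch of that advice follows; the function write_table and its row format are made up. It is generic over Write, so the same code can target locked stdout, a file, or an in-memory Vec<u8>.

```rust
use std::io::{self, BufWriter, Write};

// Writes tab-separated rows through a buffer, flushing once at the end.
fn write_table<W: Write>(out: W, rows: &[(&str, u32)]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for (name, value) in rows {
        writeln!(out, "{}\t{}", name, value)?;
    }
    // Flush explicitly so that any write error is reported, not lost on drop.
    out.flush()
}

fn main() {
    let stdout = io::stdout();
    // Lock once up front instead of once per println!.
    write_table(stdout.lock(), &[("one", 1), ("two", 2)]).unwrap();
}
```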
Iteration: Iterator and IntoIterator
The Iterator trait is interesting.
You are only required to implement
one method - next()
- and all that method must do is return an
Option
value each time it’s called. When that value is None
we
are finished.
This is the verbose way to use an iterator:
let mut iter = [10, 20, 30].iter();
while let Some(n) = iter.next() {
println!("got {}", n);
}
The for
statement provides a shortcut:
for n in [10, 20, 30].iter() {
println!("got {}", n);
}
The expression here actually is anything that can convert into an iterator,
which is expressed by IntoIterator.
So for n in &[10, 20, 30] {...}
works
as well - a slice is definitely not an iterator, but it implements
IntoIterator
. Iterators implement IntoIterator
(trivially).
So the for
statement in Rust is specifically tied to a single trait.
Iterators in Rust are a zero-overhead abstraction, which means that usually you do not pay a run-time penalty for using them. In fact, a loop that indexes slice elements explicitly can be slower, because each index is range-checked at run time unless the compiler can prove the check unnecessary.
There are a lot of provided methods which have default
implementations in Iterator. You get map, filter, etc. for free.
I advise people to familiarize themselves with these methods because they are
very useful. Often you do not need an explicit loop at all.
For instance, this is the idiomatic way to sum a sequence of numbers,
and there is no performance penalty whatsoever.
let res: i64 = (0..n).sum();
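To see how little is needed to join the club, here is a sketch of a custom iterator. The Countdown type is invented for illustration; only next() is written by hand, and filter, collect and sum all come from the provided methods.

```rust
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // Yields remaining, remaining - 1, ..., 1 and then None.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let evens: Vec<u32> = Countdown { remaining: 6 }.filter(|n| n % 2 == 0).collect();
    println!("{:?}", evens); // prints: [6, 4, 2]

    let total: u32 = Countdown { remaining: 4 }.sum();
    println!("{}", total); // prints: 10
}
```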
The most general way to pass a sequence of values to a function is
to use IntoIterator
. Just using &[T]
is too limited and requires the caller
to build up a buffer (which could be both awkward and expensive), while Iterator<Item=T>
itself requires the caller to call iter() etc.
fn sum (ii: impl IntoIterator<Item=i32>) -> i32 {
ii.into_iter().sum()
}
println!("{}", sum(0..9));
println!("{}", sum(vec![1,2,3]));
// cloned() here makes an iterator over i32 from an iterator over &i32
println!("{}", sum([1,2,3].iter().cloned()));
Conclusion: Why are there So Many Ways to Create a String?
let s = "hello".to_string(); // ToString
let s = String::from("hello"); // From
let s: String = "hello".into(); // Into
let s = "hello".to_owned(); // ToOwned
This is a common complaint - people like to have one idiomatic way of
doing common operations. And (curiously enough) none of these are actual
String
methods!
But all these traits are needed, since they make generic programming possible; when you create strings in code, just pick one way and use it consistently.
A consequence of Rust’s dependence on traits is that it can take a while
to learn to read the documentation.
Knowing what methods can be called on a type depends on what traits are implemented for that type.
std::fs::File
doesn’t have any methods to actually do I/O - these capabilities come from
implementing Read
and Write
.
However, Rust traits are not sneaky. They have to be brought into scope before they
can be used. For instance, you need use std::error::Error before you can
call source() on a type implementing Error (older code calls description(),
which is now deprecated). The Rust prelude brings in
many common traits so they do not need to be explicitly brought into scope.