
Practice Rust with challenging examples, exercises and projects
This book is designed to help you dive into Rust and become skilled with it quickly. It is very easy to use: all you need to do is make each exercise compile without errors or panics!
Reading online
Running locally
We use mdbook to build our exercises. You can run them locally with the steps below:
$ cargo install mdbook
$ cd rust-by-practice && mdbook serve
Features
Part of our examples and exercises are borrowed from Rust By Example. Thanks for your great work!
Although they are so awesome, we have our own secret weapons :)
- There are three parts in each chapter: examples, exercises and practices
- Besides examples, we have a lot of exercises; you can read, edit and run them ONLINE
- Covering nearly all aspects of Rust, such as async/await, threads, sync primitives, optimizing, standard libraries, tool chain, data structures and algorithms, etc.
- Every exercise has its own solution
- The overall difficulty is a bit higher, ranging from easy to super hard: easy 🌟 medium 🌟🌟 hard 🌟🌟🌟 super hard 🌟🌟🌟🌟
What we want to do is fill in the gap between learning and getting started with real projects.
Small projects with Elegant code base
The following questions come up weekly in online Rust discussions:
- I just finished reading The Book, what should I do next ?
- What projects would you recommend to a Rust beginner?
- Looking for small projects with an elegant code base
- Code that is easy to read and learn from
The answer to these questions is always practice: do some exercises, and then read some small, excellent Rust projects.
This is precisely the goal of this book, so collecting relevant resources and presenting them in Rust By Practice seems like a good idea.
1. Ripgrep
Answers to the above questions usually come with ripgrep. I don't think it is a small project, but yes, go for it if you are not afraid to dig a bit deeper.
2. Building a text editor
The tutorial at https://www.philippflenker.com/hecto/ will lead you through building a text editor from scratch.
3. Ncspot
Ncspot, a terminal Spotify client. Small, simple, well organized and async, it's good for learning.
4. Command Line Rust
This project accompanies the book Command-Line Rust (O'Reilly). It will show you how to write small CLIs (clones of head, cat, ls).
5. pngme book
This book will guide you to make a command line program that lets you hide secret messages in PNG files. The primary goal here is to get you writing code. The secondary goal is to get you reading documentation.
To be continued...
Variables
Binding and mutability
- 🌟 A variable can be used only if it has been initialized.
- 🌟 Use mut to mark a variable as mutable.
Scope
A scope is the range within the program for which the item is valid.
- 🌟
- 🌟🌟
Shadowing
You can declare a new variable with the same name as a previous variable; here we say that the first one is shadowed by the second one.
- 🌟🌟
- 🌟🌟
Unused variables
- Fix the warning below with:
- 🌟 only one solution
- 🌟🌟 two distinct solutions
Note: none of the solutions is to remove the line let x = 1
Destructuring
- 🌟🌟 We can use a pattern with let to destructure a tuple into separate variables.
Tips: you can use Shadowing or Mutability
Destructuring assignments
Introduced in Rust 1.59: you can now use tuple, slice, and struct patterns as the left-hand side of an assignment.
- 🌟🌟
Note: the destructuring assignments feature needs Rust 1.59 or higher
You can find the solutions here(under the solutions path), but only use it when you need it
Basic Types
Learning resources:
- English: Rust Book 3.2 and 3.3
- Simplified Chinese: Rust语言圣经 - Basic Types
Numbers
Integer
- 🌟
Tips: If we don't explicitly give a type to a variable, the compiler will infer one for us
- 🌟
- 🌟🌟🌟
Tips: If we don't explicitly give a type to a variable, the compiler will infer one for us
- 🌟🌟
- 🌟🌟
- 🌟🌟
Floating-Point
- 🌟
- 🌟🌟 make it work in two distinct ways
Range
- 🌟🌟 Two goals: 1. modify assert! to make it work 2. make println! output: 97 - 122
- 🌟🌟
Computations
- 🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Char, Bool and Unit
Char
- 🌟
- 🌟
Bool
- 🌟
- 🌟
Unit type
- 🌟🌟
- 🌟🌟 what's the size of the unit type?
You can find the solutions here(under the solutions path), but only use it when you need it
Statements and Expressions
Examples
Exercises
- 🌟🌟
- 🌟
- 🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Functions
- 🌟🌟🌟
- 🌟
- 🌟🌟🌟
Diverging functions
Diverging functions never return to the caller, so they may be used in places where a value of any type is expected.
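As a minimal sketch of the idea (the names fail and parse_positive are made up for illustration and are not part of the exercises), a function returning ! can be used in a match arm where an i32 is expected:

```rust
// Hypothetical helper: it never returns, so its return type is `!`.
fn fail(msg: &str) -> ! {
    panic!("{}", msg);
}

// The second arm has type `!`, which coerces to `i32`,
// so both arms of the match type-check.
fn parse_positive(input: &str) -> i32 {
    match input.parse::<i32>() {
        Ok(n) if n > 0 => n,
        _ => fail("expected a positive integer"),
    }
}

fn main() {
    println!("{}", parse_positive("42"));
}
```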
- 🌟🌟
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Ownership and Borrowing
Learning resources:
- English: Rust Book 4.1-4.4
- Simplified Chinese: Rust语言圣经 - Ownership and Borrowing
Ownership
- 🌟🌟
- 🌟🌟
- 🌟🌟
- 🌟🌟
- 🌟🌟
Mutability
Mutability can be changed when ownership is transferred.
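A small sketch of what this means in practice; append_world is a hypothetical function used only for illustration. The value starts out behind an immutable binding and becomes mutable once it is moved:

```rust
// The parameter is declared `mut`, so the moved-in String can be modified
// even if the caller's binding was immutable.
fn append_world(mut s: String) -> String {
    s.push_str(", world");
    s
}

fn main() {
    let s = String::from("hello"); // immutable binding
    let mut t = s;                 // ownership moves to a mutable binding
    t.push('!');
    println!("{}", t);
    println!("{}", append_world(String::from("hello")));
}
```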
- 🌟
- 🌟🌟🌟
Partial move
Within the destructuring of a single variable, both by-move and by-reference pattern bindings can be used at the same time. Doing this will result in a partial move of the variable, which means that parts of the variable will be moved while other parts stay. In such a case, the parent variable cannot be used afterwards as a whole, however the parts that are only referenced (and not moved) can still be used.
Example
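A minimal sketch of a partial move, assuming an illustrative Person struct (the type and function names here are assumptions, not taken from the exercises):

```rust
struct Person {
    name: String,
    age: Box<u8>,
}

fn take_name_keep_age(person: Person) -> (String, u8) {
    // `name` is moved out of `person`, while `age` is only borrowed.
    let Person { name, ref age } = person;
    let age_via_ref: u8 = **age;

    // `person` can no longer be used as a whole, but the field that was
    // only referenced is still accessible through the parent.
    let age_via_parent: u8 = *person.age;
    assert_eq!(age_via_ref, age_via_parent);

    (name, age_via_parent)
}

fn main() {
    let person = Person { name: String::from("Alice"), age: Box::new(20) };
    let (name, age) = take_name_keep_age(person);
    println!("{} is {}", name, age);
}
```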
Exercises
- 🌟
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Reference and Borrowing
Reference
- 🌟
- 🌟
- 🌟
- 🌟
- 🌟🌟
ref
ref can be used to take references to a value, similar to &.
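A short sketch of both forms side by side; first_len is a hypothetical helper that uses ref inside a pattern to borrow a field instead of moving it:

```rust
// `ref first` borrows the first element of the tuple rather than moving it,
// which is what makes destructuring `*pair` legal here.
fn first_len(pair: &(String, String)) -> usize {
    let (ref first, _) = *pair;
    first.len()
}

fn main() {
    let c = 'Q';
    let ref r1 = c; // equivalent to `let r1 = &c;`
    let r2 = &c;
    assert_eq!(*r1, *r2);

    let pair = (String::from("abc"), String::from("de"));
    println!("{}", first_len(&pair));
}
```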
- 🌟🌟🌟
Borrowing rules
- 🌟
Mutability
- 🌟 Error: borrow an immutable object as mutable
- 🌟🌟 Ok: Borrow a mutable object as immutable
NLL
- 🌟🌟
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Compound Types
Learning resources:
- English: Rust Book 4.3, 5.1, 6.1, 8.2
- Simplified Chinese: Rust语言圣经 - Compound Types
string
The type of the string literal "hello, world" is &str, e.g. let s: &str = "hello, world".
str and &str
- 🌟 We can't use the str type in normal ways, but we can use &str
- 🌟🌟 We can only use str by boxing it; & can be used to convert Box<str> to &str
String
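The String type is defined in std and stored as a vector of bytes (Vec<u8>), but is guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.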
- 🌟
- 🌟🌟🌟
- 🌟🌟 replace can be used to replace a substring
More String methods can be found under the String module.
- 🌟🌟 You can only concat a String with &str, and String's ownership can be moved to another variable
&str and String
In contrast to the rarely used str, &str and String are used everywhere!
- 🌟🌟 &str can be converted to String in two ways
- 🌟🌟 We can use String::from or to_string to convert a &str to String
string escapes
- 🌟
- 🌟🌟🌟 Sometimes there are just too many characters that need to be escaped or it's just much more convenient to write a string out as-is. This is where raw string literals come into play.
byte string
Want a string that's not UTF-8? (Remember, str and String must be valid UTF-8). Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!
Example:
A more detailed listing of the ways to write string literals and escape characters is given in the 'Tokens' chapter of the Rust Reference.
string index
- 🌟🌟🌟 You can't use an index to access a char in a string, but you can use a slice &s1[start..end].
operate on UTF8 string
- 🌟
utf8_slice
You can use utf8_slice to slice UTF8 string, it can index chars instead of bytes.
Example
use utf8_slice;
fn main() {
    let s = "The 🚀 goes to the 🌑!";
    let rocket = utf8_slice::slice(s, 4, 5);
    // Will equal "🚀"
}
You can find the solutions here(under the solutions path), but only use it when you need it
Array
The type of an array is [T; Length]. As you can see, an array's length is part of its type signature, so the length must be known at compile time.
For example, you can't initialize an array as below:
This will cause an error, because the compiler has no idea of the exact size of the array at compile time.
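The snippet that the text refers to is not present here. As a hedged sketch of the point being made, the commented-out function below uses a runtime value as the array length and is rejected by the compiler, while the version with a const generic parameter is accepted (init_arr is an illustrative name):

```rust
// Does not compile: `n` is only known at run time, so `[1; n]` has no
// length that could be part of the array's type.
// fn init_arr(n: usize) {
//     let arr = [1; n];
// }

// Compiles: `N` is a compile-time constant.
fn init_arr<const N: usize>() -> [i32; N] {
    [1; N]
}

fn main() {
    let arr: [i32; 4] = init_arr::<4>();
    println!("{:?}", arr);
}
```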
- 🌟
- 🌟🌟
- 🌟 All elements in an array can be initialized to the same value at once.
- 🌟 All elements in an array must be of the same type
- 🌟 Indexing starts at 0.
- 🌟 Out-of-bounds indexing causes a panic.
You can find the solutions here(under the solutions path), but only use it when you need it
Slice
Slices are similar to arrays, but their length is not known at compile time, so you can't use slice directly.
- 🌟🌟 Here, both [i32] and str are slice types, but using them directly will cause errors. You have to use a reference to the slice instead: &[i32], &str.
A slice reference is a two-word object (for simplicity, from now on we will say slice instead of slice reference). The first word is a pointer to the data, and the second word is the length of the slice. The word size is the same as usize, determined by the processor architecture, e.g. 64 bits on x86-64. Slices can be used to borrow a section of an array, and have the type signature &[T].
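The two-word layout can be checked with std::mem::size_of. The sketch below also borrows a section of an array; middle is a hypothetical helper name:

```rust
use std::mem::size_of;

// Borrow the three middle elements of a five-element array as a slice.
fn middle(arr: &[i32; 5]) -> &[i32] {
    &arr[1..4]
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let s = middle(&arr);
    println!("{:?}, len = {}", s, s.len());

    // A slice reference is a pointer plus a length: two machine words.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
}
```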
- 🌟🌟🌟
- 🌟🌟
string slices
- 🌟
- 🌟
- 🌟🌟 &String can be implicitly converted into &str.
You can find the solutions here(under the solutions path), but only use it when you need it
Tuple
- 🌟 Elements in a tuple can have different types. A tuple's type signature is (T1, T2, ...), where T1, T2 are the types of the tuple's members.
- 🌟 Members can be extracted from the tuple using indexing.
- 🌟 Long tuples (more than 12 elements) cannot be printed
- 🌟 Destructuring tuple with pattern.
- 🌟🌟 Destructure assignments.
- 🌟🌟 Tuples can be used as function arguments and return values
You can find the solutions here(under the solutions path), but only use it when you need it
Struct
The types of structs
- 🌟 We must specify concrete values for each of the fields in struct.
- 🌟 A unit struct doesn't have any fields. It can be useful when you need to implement a trait on some type but don’t have any data that you want to store in the type itself.
- 🌟🌟🌟 A tuple struct looks similar to a tuple: it has the added meaning that the struct name provides, but has no named fields. It's useful when you want to give the whole tuple a name but don't care about the names of the fields.
Operate on structs
- 🌟 You can make a whole struct mutable when instantiating it, but Rust doesn't allow us to mark only certain fields as mutable.
- 🌟 Use field init shorthand syntax to reduce repetition.
- 🌟 You can create an instance from another instance with struct update syntax
Print the structs
- 🌟🌟 We can use #[derive(Debug)] to make a struct printable.
Partial move
Within the destructuring of a single variable, both by-move and by-reference pattern bindings can be used at the same time. Doing this will result in a partial move of the variable, which means that parts of the variable will be moved while other parts stay. In such a case, the parent variable cannot be used afterwards as a whole, however the parts that are only referenced (and not moved) can still be used.
Example
Exercises
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Enum
- 🌟🌟 Enums can be created with explicit discriminants.
- 🌟 Each enum variant can hold its own data.
- 🌟🌟 We can get the data that an enum variant is holding by pattern matching
- 🌟🌟
- 🌟🌟 As there is no null in Rust, we have to use the enum Option<T> to deal with cases where the value is absent.
- 🌟🌟🌟🌟 Implement a linked-list via enums.
You can find the solutions here(under the solutions path), but only use it when you need it
Flow control
if/else
- 🌟
- 🌟🌟 An if/else expression can be used in assignments.
for
- 🌟 The for in construct can be used to iterate through an Iterator, e.g. a range a..b.
- 🌟🌟
- 🌟
while
- 🌟🌟 The while keyword can be used to run a loop while a condition is true.
continue and break
- 🌟 Use break to break the loop.
- 🌟🌟 continue will skip over the remaining code in the current iteration and go to the next iteration.
loop
- 🌟🌟 loop is usually used together with break or continue.
- 🌟🌟 loop is an expression, so we can use it with break to return a value
- 🌟🌟🌟 It's possible to break or continue outer loops when dealing with nested loops. In these cases, the loops must be annotated with some 'label, and the label must be passed to the break/continue statement.
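As a small illustration of loop labels (find_pair and the label name 'outer are chosen for this sketch only), the inner loop below leaves both loops at once as soon as a match is found:

```rust
// Find the first pair (a, b) with 1 <= a, b < 10 whose product is `target`.
fn find_pair(target: i32) -> Option<(i32, i32)> {
    let mut found = None;
    'outer: for a in 1..10 {
        for b in 1..10 {
            if a * b == target {
                found = Some((a, b));
                // Without the label, this would only leave the inner loop.
                break 'outer;
            }
        }
    }
    found
}

fn main() {
    println!("{:?}", find_pair(12));
}
```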
You can find the solutions here(under the solutions path), but only use it when you need it
Pattern Match
Learning resources:
- English: Rust Book 18
- Simplified Chinese: Rust语言圣经 - Pattern Matching
match, if let
match
- 🌟🌟
- 🌟🌟 match is an expression, so we can use it in assignments
- 🌟🌟 using match to get the data an enum variant holds
matches!
matches! looks like match, but can do something different
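A brief sketch of the difference: matches! evaluates to a bool saying whether the value fits the pattern, instead of running code per arm. The Command enum and both helper functions are assumptions made for this example:

```rust
enum Command {
    Quit,
    Move(i32),
}

// True only for the `Move` variant, whatever its payload.
fn is_move(c: &Command) -> bool {
    matches!(c, Command::Move(_))
}

// Count ASCII letters by testing each char against a pattern.
fn count_letters(s: &str) -> usize {
    s.chars()
        .filter(|c| matches!(*c, 'a'..='z' | 'A'..='Z'))
        .count()
}

fn main() {
    println!("{}", is_move(&Command::Move(3)));
    println!("{}", is_move(&Command::Quit));
    println!("{}", count_letters("a1b2-C"));
}
```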
- 🌟🌟
- 🌟🌟
if let
In some cases, when matching enums, match is too heavy; we can use if let instead.
- 🌟
- 🌟🌟
- 🌟🌟
Shadowing
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Patterns
- 🌟🌟 Use | to match several values, use ..= to match an inclusive range
- 🌟🌟🌟 The @ operator lets us create a variable that holds a value at the same time we are testing that value to see whether it matches a pattern.
- 🌟🌟🌟
- 🌟🌟 A match guard is an additional if condition specified after the pattern in a match arm that must also match, along with the pattern matching, for that arm to be chosen.
- 🌟🌟 Ignoring remaining parts of the value with ..
- 🌟🌟 Using pattern &mut V to match a mutable reference requires great care, because V is a value after matching
You can find the solutions here(under the solutions path), but only use it when you need it
Associated function & Method
Examples
Exercises
Method
- 🌟🌟 Methods are similar to functions: they are declared with fn and have parameters and a return value. Unlike functions, methods are defined within the context of a struct (or an enum or a trait object), and their first parameter is always self, which represents the instance of the struct the method is being called on.
- 🌟🌟 self will take ownership of the current struct instance, whereas &self will only borrow a reference to the instance
- 🌟🌟 &self is actually short for self: &Self. Within an impl block, the type Self is an alias for the type that the impl block is for. Methods must have a parameter named self of type Self for their first parameter, so Rust lets you abbreviate this with only the name self in the first parameter spot.
Associated function
- 🌟🌟 All functions defined within an impl block are called associated functions because they’re associated with the type named after the impl. We can define associated functions that don’t have self as their first parameter (and thus are not methods) because they don’t need an instance of the type to work with.
Multiple impl blocks
- 🌟 Each struct is allowed to have multiple impl blocks.
Enums
- 🌟🌟🌟 We can also implement methods for enums.
Practice
@todo
You can find the solutions here(under the solutions path), but only use it when you need it
Generics and Traits
Learning resources:
- English: Rust Book 10.1, 10.2
- Simplified Chinese: Rust语言圣经 - Pattern Matching
Generics
Functions
- 🌟🌟🌟
- 🌟🌟 A function call with explicitly specified type parameters looks like: fun::<A, B, ...>().
Struct and impl
- 🌟
- 🌟🌟
- 🌟🌟
Method
- 🌟🌟🌟
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it
Const Generics
Const generics are generic arguments that range over constant values, rather than types or lifetimes. This allows, for instance, types to be parameterized by integers. In fact, there has been one example of const generic types since early on in Rust's development: the array types [T; N], for some type T and N: usize. However, there has previously been no way to abstract over arrays of an arbitrary size: if you wanted to implement a trait for arrays of any size, you would have to do so manually for each possible value. For a long time, even the standard library methods for arrays were limited to arrays of length at most 32 due to this problem.
Examples
- Here's an example of a type and implementation making use of const generics: a type wrapping a pair of arrays of the same size.
- Currently, const parameters may only be instantiated by const arguments of the following forms:
- A standalone const parameter.
- A literal (i.e. an integer, bool, or character).
- A concrete constant expression (enclosed by {}), involving no generic parameters.
- Const generics can also let us avoid some runtime checks.
/// A region of memory containing at least `N` `T`s.
pub struct MinSlice<T, const N: usize> {
    /// The bounded region of memory. Exactly `N` `T`s.
    pub head: [T; N],
    /// Zero or more remaining `T`s after the `N` in the bounded region.
    pub tail: [T],
}
fn main() {
    let slice: &[u8] = b"Hello, world";
    let reference: Option<&u8> = slice.get(6);
    // We know this value is `Some(b' ')`,
    // but the compiler can't know that.
    assert!(reference.is_some());
    let slice: &[u8] = b"Hello, world";
    // Length check is performed when we construct a MinSlice,
    // and it's known at compile time to be of length 12.
    // If the `unwrap()` succeeds, no more checks are needed
    // throughout the `MinSlice`'s lifetime.
    let minslice = MinSlice::<u8, 12>::from_slice(slice).unwrap();
    let value: u8 = minslice.head[6];
    assert_eq!(value, b' ')
}
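The first item in the examples list, a type wrapping a pair of arrays of the same size, has no code in this text. A minimal sketch of such a type could look like the following; the ArrayPair name, its field names and the pair_len helper are assumptions made for illustration:

```rust
use std::fmt::Debug;

// Both arrays share the same length `N`, enforced by the type.
struct ArrayPair<T, const N: usize> {
    left: [T; N],
    right: [T; N],
}

// One implementation covers every array length.
impl<T: Debug, const N: usize> Debug for ArrayPair<T, N> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{:?} | {:?}", self.left, self.right)
    }
}

// The const parameter can be used as an ordinary value.
fn pair_len<T, const N: usize>(_pair: &ArrayPair<T, N>) -> usize {
    N
}

fn main() {
    let pair = ArrayPair { left: [1, 2], right: [3, 4] };
    println!("{:?} (len {})", pair, pair_len(&pair));
}
```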
Exercises
- 🌟🌟 <T, const N: usize> is part of the struct type, which means Array<i32, 3> and Array<i32, 4> are different types.
- 🌟🌟
- 🌟🌟🌟 Sometimes we want to limit the size of a variable, e.g. when working in embedded environments; const expressions will then fit your needs.
You can find the solutions here(under the solutions path), but only use it when you need it :)
Traits
A trait tells the Rust compiler about functionality a particular type has and can share with other types. We can use traits to define shared behavior in an abstract way. We can use trait bounds to specify that a generic type can be any type that has certain behavior.
Note: Traits are similar to interfaces in other languages, although with some differences.
Examples
Exercises
- 🌟🌟
Derive
The compiler is capable of providing basic implementations for some traits via the #[derive] attribute. For more info, please visit here.
- 🌟🌟
Operator
In Rust, many of the operators can be overloaded via traits. That is, some operators can be used to accomplish different tasks based on their input arguments. This is possible because operators are syntactic sugar for method calls. For example, the + operator in a + b calls the add method (as in a.add(b)). This add method is part of the Add trait. Hence, the + operator can be used by any implementor of the Add trait.
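A minimal sketch of implementing Add for a custom type; the Point struct is an assumption made for illustration:

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Add for Point {
    type Output = Point;

    // Called whenever two `Point`s are combined with `+`.
    fn add(self, other: Point) -> Point {
        Point { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 3, y: 4 };
    // `a + b` is sugar for `a.add(b)`.
    println!("{:?}", a + b);
    println!("{:?}", a.add(b));
}
```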
- 🌟🌟
- 🌟🌟🌟
Use trait as function parameters
Instead of a concrete type for the item parameter, we specify the impl keyword and the trait name. This parameter accepts any type that implements the specified trait.
- 🌟🌟🌟
Returning Types that Implement Traits
We can also use the impl Trait syntax in the return position to return a value of some type that implements a trait.
However, you can only use impl Trait if you’re returning a single type; use trait objects instead when you really need to return several types.
- 🌟🌟
Trait bound
The impl Trait syntax works for straightforward cases but is actually syntax sugar for a longer form, which is called a trait bound.
When working with generics, the type parameters often must use traits as bounds to stipulate what functionality a type implements.
- 🌟🌟
- 🌟🌟
- 🌟🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it :)
Trait Object
In the traits chapter we have seen that we can't use impl Trait when returning multiple types.
Another limitation of arrays is that they can only store elements of one type. An enum is a reasonable solution when our items are a fixed set of types known at compile time, but trait objects are more flexible and powerful here.
Returning Traits with dyn
The Rust compiler needs to know how much space a function's return type requires. Because the different implementations of a trait will probably need different amounts of memory, a function needs to return either a concrete type (the same single type when using impl Trait) or a trait object with dyn.
- 🌟🌟🌟
Array with trait objects
- 🌟🌟
&dyn and Box<dyn>
- 🌟🌟
Static and Dynamic dispatch
When we use trait bounds on generics, the compiler generates nongeneric implementations of functions and methods for each concrete type that we use in place of a generic type parameter. The code that results from monomorphization is doing static dispatch, which is when the compiler knows what method you’re calling at compile time.
When we use trait objects, Rust must use dynamic dispatch. The compiler doesn’t know all the types that might be used with the code that is using trait objects, so it doesn’t know which method implemented on which type to call. Instead, at runtime, Rust uses the pointers inside the trait object to know which method to call. There is a runtime cost when this lookup happens that doesn’t occur with static dispatch. Dynamic dispatch also prevents the compiler from choosing to inline a method’s code, which in turn prevents some optimizations.
However, we did get extra flexibility when using dynamic dispatch.
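The two styles can be put side by side in a short sketch. The Shape trait and its implementors are assumptions made for illustration:

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
}

// Static dispatch: a separate copy is generated for each concrete `T`.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch: the method is looked up at run time via the trait object.
fn area_dyn(shape: &dyn Shape) -> f64 {
    shape.area()
}

// The flexibility gained: one collection holding different concrete types.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("{}", area_static(&Square(2.0)));
    println!("{}", area_dyn(&Rect(2.0, 3.0)));
    println!("{}", total_area(&shapes));
}
```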
- 🌟🌟
Object safe
You can only make object-safe traits into trait objects. A trait is object safe if all the methods defined in the trait have the following properties:
- The return type isn’t Self.
- There are no generic type parameters.
- 🌟🌟🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it :)
Advance Traits
Associated types
The use of associated types improves the overall readability of code by moving inner types locally into a trait as output types. For example:
Using Address is much clearer and more convenient than AsRef<[u8]> + Clone + fmt::Debug + Eq + Hash.
- 🌟🌟🌟
Default Generic Type Parameters
When we use generic type parameters, we can specify a default concrete type for the generic type. This eliminates the need for implementors of the trait to specify a concrete type if the default type works.
- 🌟🌟
Fully Qualified Syntax
Nothing in Rust prevents a trait from having a method with the same name as another trait’s method, nor does Rust prevent you from implementing both traits on one type. It’s also possible to implement a method directly on the type with the same name as methods from traits.
When calling methods with the same name, we have to use Fully Qualified Syntax.
Example
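A minimal sketch of the situation described above, with two traits and an inherent method that all define fly. The Pilot, Wizard and Human names are assumptions made for illustration:

```rust
trait Pilot {
    fn fly(&self) -> String;
}

trait Wizard {
    fn fly(&self) -> String;
}

struct Human;

impl Pilot for Human {
    fn fly(&self) -> String { String::from("This is your captain speaking.") }
}

impl Wizard for Human {
    fn fly(&self) -> String { String::from("Up!") }
}

impl Human {
    fn fly(&self) -> String { String::from("*waving arms furiously*") }
}

fn main() {
    let person = Human;
    // Plain method syntax picks the inherent method.
    println!("{}", person.fly());
    // Naming the trait selects that trait's method.
    println!("{}", Pilot::fly(&person));
    // Fully qualified syntax spells out both the type and the trait.
    println!("{}", <Human as Wizard>::fly(&person));
}
```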
Exercise
- 🌟🌟
Supertraits
Sometimes, you might need one trait to use another trait’s functionality (like "inheritance" in other languages). In this case, you need to rely on the dependent trait also being implemented. The trait you rely on is a supertrait of the trait you’re implementing.
- 🌟🌟🌟
Orphan Rules
We can’t implement external traits on external types. For example, we can’t implement the Display trait on Vec<T> within our own crate, because Display and Vec<T> are defined in the standard library and aren’t local to our crate.
This restriction is often called the orphan rule, so named because the parent type is not present. This rule ensures that other people’s code can’t break your code and vice versa.
It’s possible to get around this restriction using the newtype pattern, which involves creating a new type in a tuple struct.
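A small sketch of the newtype pattern; the Wrapper name is an assumption. Because Wrapper is local to our crate, implementing Display for it is allowed even though Vec<String> itself is external:

```rust
use std::fmt;

// A tuple struct with a single field wrapping the external type.
struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `self.0` accesses the inner Vec<String>.
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![String::from("hello"), String::from("world")]);
    println!("w = {}", w);
}
```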
- 🌟🌟
You can find the solutions here(under the solutions path), but only use it when you need it :)
Collection Types
Learning resources:
- English: Rust Book Chapter 8
- Simplified Chinese: Rust语言圣经 - Collection Types
String
std::string::String is a UTF-8 encoded, growable string. It is the most common string type used in daily development, and it has ownership over the string contents.
Basic operations
- 🌟🌟
String and &str
A String is stored as a vector of bytes (Vec<u8>), but is guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.
&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>.
- 🌟🌟
- 🌟🌟
UTF-8 & Indexing
Strings are always valid UTF-8. This has a few implications:
- The first is that if you need a non-UTF-8 string, consider OsString. It is similar, but without the UTF-8 constraint.
- The second implication is that you cannot index into a String.
Indexing is intended to be a constant-time operation, but UTF-8 encoding does not allow us to do this. Furthermore, it’s not clear what sort of thing the index should return: a byte, a codepoint, or a grapheme cluster. The bytes and chars methods return iterators over the first two, respectively.
- 🌟🌟🌟 You can't use an index to access a char in a string, but you can use a slice &s1[start..end].
utf8_slice
You can use utf8_slice to slice UTF8 string, it can index chars instead of bytes.
Example
use utf8_slice;
fn main() {
    let s = "The 🚀 goes to the 🌑!";
    let rocket = utf8_slice::slice(s, 4, 5);
    // Will equal "🚀"
}
- 🌟🌟🌟
Tips: maybe you need the from_utf8 method
Representation
A String is made up of three components: a pointer to some bytes, a length, and a capacity.
The pointer points to the internal buffer that String uses to store its data; this buffer is always on the heap. The length is the number of bytes currently stored in the buffer, and the capacity is the size of the buffer in bytes. As such, the length will always be less than or equal to the capacity.
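The three components can be observed directly. In this sketch, len_and_cap_unchanged is a hypothetical helper that reserves exactly enough room up front and then checks that pushing the text does not change the capacity:

```rust
// Returns the final length and whether the capacity stayed the same.
fn len_and_cap_unchanged(text: &str) -> (usize, bool) {
    let mut s = String::with_capacity(text.len());
    let cap_before = s.capacity();
    s.push_str(text); // fits in the reserved buffer, so no reallocation
    (s.len(), s.capacity() == cap_before)
}

fn main() {
    let mut s = String::with_capacity(16);
    s.push_str("hello");
    // Pointer, length and capacity of the same String.
    println!("ptr = {:p}, len = {}, cap = {}", s.as_ptr(), s.len(), s.capacity());
    assert!(s.len() <= s.capacity());
    println!("{:?}", len_and_cap_unchanged("hello"));
}
```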
- 🌟🌟 If a String has enough capacity, adding elements to it will not re-allocate
- 🌟🌟🌟
Common methods
More exercises of String methods can be found here.
You can find the solutions here(under the solutions path), but only use it when you need it
Vector
Vectors are re-sizable arrays. Like slices, their size is not known at compile time, but they can grow or shrink at any time.
Basic Operations
- 🌟🌟🌟
- 🌟🌟 A Vec can be extended with the extend method
Turn X Into Vec
- 🌟🌟🌟
Indexing
- 🌟🌟🌟
Slicing
A Vec can be mutable. Shared slices (&[T]), on the other hand, are read-only views. To get a slice, use &.
In Rust, it’s more common to pass slices as arguments rather than vectors when you just want to provide read access. The same goes for String and &str.
- 🌟🌟
Capacity
The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated.
For example, a vector with capacity 10 and length 0 would be an empty vector with space for 10 more elements. Pushing 10 or fewer elements onto the vector will not change its capacity or cause reallocation to occur. However, if the vector’s length is increased to 11, it will have to reallocate, which can be slow. For this reason, it is recommended to use Vec::with_capacity whenever possible to specify how big the vector is expected to get.
- 🌟🌟
Store distinct types in Vector
The elements in a vector must be of the same type. For example, the code below will cause an error:
fn main() {
    let v = vec![1, 2.0, 3];
}
But we can use enums or trait objects to store distinct types.
- 🌟🌟
- 🌟🌟
HashMap
Where vectors store values by an integer index, HashMaps store values by key. It is a hash map implemented with quadratic probing and SIMD lookup. By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks.
The default hashing algorithm is currently SipHash 1-3, though this is subject to change at any point in the future. While its performance is very competitive for medium sized keys, other hashing algorithms will outperform it for small keys such as integers as well as large keys such as long strings, though those algorithms will typically not protect against attacks such as HashDoS.
The hash table implementation is a Rust port of Google’s SwissTable. The original C++ version of SwissTable can be found here, and this CppCon talk gives an overview of how the algorithm works.
Basic Operations
- 🌟🌟
// FILL in the blanks and FIX the errors
use std::collections::HashMap;
fn main() {
    let mut scores = HashMap::new();
    scores.insert("Sunface", 98);
    scores.insert("Daniel", 95);
    scores.insert("Ashley", 69.0);
    scores.insert("Katie", "58");
    // get returns an Option<&V>
    let score = scores.get("Sunface");
    assert_eq!(score, Some(98));
    if scores.contains_key("Daniel") {
        // indexing returns a value V
        let score = scores["Daniel"];
        assert_eq!(score, __);
        scores.remove("Daniel");
    }
    assert_eq!(scores.len(), __);
    for (name, score) in scores {
        println!("The score of {} is {}", name, score)
    }
}
- 🌟🌟
- 🌟🌟
Requirements of HashMap key
Any type that implements the Eq and Hash traits can be a key in HashMap. This includes:
- bool (though not very useful since there are only two possible keys)
- integer types such as i32 and u64, and all variations thereof
- String and &str (tip: you can have a HashMap keyed by String and call .get() with an &str)
Note that f32 and f64 do not implement Hash, likely because floating-point precision errors would make using them as hashmap keys horribly error-prone.
All collection classes implement Eq and Hash if their contained type also respectively implements Eq and Hash. For example, Vec<T> will implement Hash if T implements Hash.
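Custom types can meet the key requirements by deriving the traits. In this sketch, the Viking struct and the health_of helper are assumptions made for illustration:

```rust
use std::collections::HashMap;

// Deriving works because every field (String) implements Eq and Hash.
#[derive(Debug, PartialEq, Eq, Hash)]
struct Viking {
    name: String,
    country: String,
}

fn viking(name: &str, country: &str) -> Viking {
    Viking { name: name.to_string(), country: country.to_string() }
}

// Look up the stored health value for a given viking, if any.
fn health_of(name: &str, country: &str) -> Option<u32> {
    let mut vikings = HashMap::new();
    vikings.insert(viking("Einar", "Norway"), 25);
    vikings.insert(viking("Olaf", "Denmark"), 24);
    vikings.get(&viking(name, country)).copied()
}

fn main() {
    println!("{:?}", health_of("Einar", "Norway"));
    println!("{:?}", health_of("Einar", "Denmark"));
}
```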
- 🌟🌟
Capacity
Like vectors, HashMaps are growable, but HashMaps can also shrink when they have excess space, for example through shrink_to_fit. You can create a HashMap with a certain starting capacity using HashMap::with_capacity(usize), or use HashMap::new() to get a HashMap with a default initial capacity (recommended).
Example
Ownership
For types that implement the Copy trait, like i32, the values are copied into HashMap. For owned values like String, the values will be moved and HashMap will be the owner of those values.
- 🌟🌟
Third-party Hash libs
If the performance of SipHash 1-3 doesn't meet your requirements, you can find replacements on crates.io or github.com.
The usage of third-party hash looks like this:
Ownership and Borrowing
Learning resources:
- English: Standard library
- Simplified Chinese: Rust语言圣经 - Ownership and Borrowing
Convert by as
Rust provides no implicit type conversion (coercion) between primitive types. But explicit type conversions can be performed using the as keyword.
- 🌟
- 🌟🌟 By default, overflow will cause compile errors, but we can add a global annotation to suppress these errors.
- 🌟🌟 When casting any value to an unsigned type T, T::MAX + 1 is added or subtracted until the value fits into the new type.
- 🌟🌟🌟 Raw pointer can be converted to memory address (integer) and vice versa
- 🌟🌟🌟
From/Into
The From trait allows a type to define how to create itself from another type, hence providing a very simple mechanism for converting between several types.
The From and Into traits are inherently linked, and this is actually part of their implementation. It means that if we write something like impl From<T> for U, then we can use let u: U = U::from(T) or let u: U = T.into().
The Into trait is simply the reciprocal of the From trait. That is, if you have implemented the From trait for your type, then the Into trait will be automatically implemented for the same type.
Using the Into trait will typically require type annotations, as the compiler is unable to determine the target type most of the time.
For example, we can easily convert &str into String:
fn main() {
    let my_str = "hello";
    // String::from and into() below rely on String implementing From<&str>;
    // to_string() comes from the ToString trait instead.
    let string1 = String::from(my_str);
    let string2 = my_str.to_string();
    // An explicit type annotation is required here
    let string3: String = my_str.into();
}
This works because the standard library has already implemented this for us: impl From<&'_ str> for String.
Some implementations of the From trait can be found here.
- 🌟🌟🌟
Implement From for custom types
- 🌟🌟
- 🌟🌟🌟 When performing error handling it is often useful to implement the From trait for our own error type. Then we can use ? to automatically convert the underlying error type to our own error type.
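The error-conversion idea from the last item can be sketched as follows. ConfigError and parse_port are hypothetical names chosen for this example:

```rust
use std::num::ParseIntError;

// Our own error type, wrapping the underlying parse error.
#[derive(Debug)]
enum ConfigError {
    BadNumber(ParseIntError),
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn parse_port(s: &str) -> Result<u16, ConfigError> {
    // `?` calls `ConfigError::from` on the ParseIntError automatically.
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    println!("{:?}", parse_port(" 8080 "));
    println!("{:?}", parse_port("http"));
}
```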
TryFrom/TryInto
Similar to From and Into, TryFrom and TryInto are generic traits for converting between types.
Unlike From/Into, TryFrom and TryInto are used for fallible conversions and return a Result instead of a plain value.
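A minimal sketch of a fallible conversion, using both a standard-library implementation (i32 to u8) and a custom one. EvenNumber is an illustrative type, not part of the exercises:

```rust
use std::convert::{TryFrom, TryInto};

#[derive(Debug, PartialEq)]
struct EvenNumber(i32);

impl TryFrom<i32> for EvenNumber {
    type Error = String;

    // Only even values can be converted; odd values produce an error.
    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if value % 2 == 0 {
            Ok(EvenNumber(value))
        } else {
            Err(format!("{} is odd", value))
        }
    }
}

fn main() {
    // 255 fits into a u8, 256 does not.
    println!("{:?}", u8::try_from(255_i32));
    println!("{:?}", u8::try_from(256_i32).is_err());

    // Implementing TryFrom gives us TryInto for free.
    let n: Result<EvenNumber, String> = 8.try_into();
    println!("{:?}", n);
}
```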
- 🌟🌟
- 🌟🌟🌟
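As a sketch of a fallible conversion, the block below uses the standard integer conversions and a hypothetical EvenNumber type that only accepts even values.

```rust
use std::convert::{TryFrom, TryInto};

// Hypothetical type for illustration: only even numbers are accepted.
#[derive(Debug, PartialEq)]
struct EvenNumber(i32);

impl TryFrom<i32> for EvenNumber {
    type Error = String;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if value % 2 == 0 {
            Ok(EvenNumber(value))
        } else {
            Err(format!("{} is odd", value))
        }
    }
}

fn main() {
    // Standard library conversion: 256 does not fit into a u8.
    println!("{:?}", u8::try_from(255_i32));
    println!("{}", u8::try_from(256_i32).is_err());

    // TryInto is provided automatically, like Into for From.
    let even: Result<EvenNumber, String> = 8_i32.try_into();
    println!("{:?}", even);
    println!("{:?}", EvenNumber::try_from(5_i32));
}
```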
Others
Convert any type to String
To convert any type to String, you can simply implement the ToString trait for that type. Rather than doing that directly, you should implement the fmt::Display trait, which automatically provides ToString and also allows you to print the type with println!.
- 🌟🌟
Parse a String
- 🌟🌟🌟 We can use the parse method to convert a String into an i32 number. This is because FromStr is implemented for the i32 type in the standard library: impl FromStr for i32
- 🌟🌟 We can also implement the FromStr trait for our custom types
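A possible shape for a custom FromStr implementation is sketched below. Point and ParsePointError are hypothetical names, and the "x,y" input format is an assumption chosen for the example.

```rust
use std::num::ParseIntError;
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug, PartialEq)]
enum ParsePointError {
    MissingComma,
    BadNumber(ParseIntError),
}

impl FromStr for Point {
    type Err = ParsePointError;

    // Parses input of the form "x,y".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (x, y) = s.split_once(',').ok_or(ParsePointError::MissingComma)?;
        let x = x.trim().parse::<i32>().map_err(ParsePointError::BadNumber)?;
        let y = y.trim().parse::<i32>().map_err(ParsePointError::BadNumber)?;
        Ok(Point { x, y })
    }
}

fn main() {
    // parse works for i32 because the standard library implements FromStr for it.
    println!("{:?}", "5".parse::<i32>());
    // With our impl, parse works for Point as well.
    println!("{:?}", "3, 4".parse::<Point>());
    println!("{:?}", Point::from_str("34"));
}
```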
Deref
You can find all the examples and exercises of the Deref trait here.
transmute
std::mem::transmute is an unsafe function that reinterprets the bits of a value of one type as another type. The original and the result types must have the same size, and neither of them can be invalid.
transmute is semantically equivalent to a bitwise move of one type into another. It copies the bits from the source value into the destination value, then forgets the original, which is roughly equivalent to C's memcpy under the hood.
So transmute is incredibly unsafe! The caller has to uphold all the safety guarantees themselves.
Examples
- transmute can be used to turn a pointer into a function pointer. This is not portable to machines where function pointers and data pointers have different sizes.
- Extending a lifetime, or shortening an invariant lifetime, is an advanced usage of transmute. Yes, this is very unsafe Rust!
- Rather than using transmute, you can use some alternatives instead.
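The first and last points can be sketched like this. The function names are hypothetical, and f32::to_bits is shown as one example of a safe alternative, not the only one.

```rust
use std::mem;

fn answer() -> i32 {
    42
}

// Turns a raw pointer back into a function pointer with transmute.
fn call_through_raw_pointer() -> i32 {
    let pointer = answer as *const ();
    // SAFETY: `pointer` was created from a `fn() -> i32`, so reinterpreting
    // it as that function pointer type is valid on this target.
    let function = unsafe { mem::transmute::<*const (), fn() -> i32>(pointer) };
    function()
}

// A safe alternative for reading the bits of a float: no transmute needed.
fn float_bits(f: f32) -> u32 {
    f.to_bits()
}

fn main() {
    println!("{}", call_through_raw_pointer());
    println!("{:#x}", float_bits(1.0));
}
```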
Result and panic
Learning resources:
- English: Rust Book 9.1, 9.2
- Simplified Chinese: Rust语言圣经 (Rust Course) - Return values and error handling
panic!
The simplest error handling mechanism is to use panic!. It prints an error message, starts unwinding the stack, and finally exits the current thread:
- if the panic occurred in the main thread, the program exits.
- if it occurred in a spawned thread, that thread is terminated, but the program is not.
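The second case can be observed with a small sketch. The helper run_worker is hypothetical; it relies on JoinHandle::join returning Err when the spawned thread panicked.

```rust
use std::thread;

// Runs a worker thread and reports whether it panicked.
fn run_worker(should_panic: bool) -> Result<i32, String> {
    let handle = thread::spawn(move || {
        if should_panic {
            panic!("worker failed");
        }
        7
    });
    // join returns Err if the spawned thread panicked.
    handle
        .join()
        .map_err(|_| String::from("worker thread panicked"))
}

fn main() {
    println!("{:?}", run_worker(false));
    // The panic terminates only the spawned thread; main keeps running.
    println!("{:?}", run_worker(true));
    println!("main is still alive");
}
```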
- 🌟🌟
common panic cases
- 🌟🌟
Detailed call stack
By default the stack unwinding will only give something like this:
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 99', src/main.rs:4:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Although the panic reason and the code line are shown here, sometimes we want more information about the call stack.
- 🌟
## FILL in the blank to display the whole call stack
## Tips: you can find the clue in the default panic info
$ __ cargo run
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `[97, 98, 99]`,
right: `[96, 97, 98]`', src/main.rs:3:5
stack backtrace:
0: rust_begin_unwind
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:154:5
4: study_cargo::main
at ./src/main.rs:3:5
5: core::ops::function::FnOnce::call_once
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
unwinding and abort
By default, when a panic occurs, the program starts unwinding, which means Rust walks back up the stack and cleans up the data from each function it encounters.
But this walking back and cleaning up is a lot of work. The alternative is to immediately abort the program without cleaning up.
If you need to make the resulting binary as small as possible, you can switch from unwinding to aborting by adding the content below to Cargo.toml:
[profile.release]
panic = 'abort'
result and ?
Result<T, E> is an enum that describes either a success or an error. It has two variants:
- Ok(T): a value T was found
- Err(e): an error was found with a value e
In short, the expected outcome is Ok, while the unexpected outcome is Err.
- 🌟🌟
?
? is almost exactly equivalent to unwrap, but ? returns from the enclosing function instead of panicking on Err.
- 🌟🌟
- 🌟🌟
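To make the early return concrete, here is a sketch that writes the same function twice, once with an explicit match and once with ?. The function names are hypothetical.

```rust
use std::num::ParseIntError;

// Without `?`: every error is matched and returned by hand.
fn multiply_with_match(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let a = match a.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    let b = match b.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(a * b)
}

// With `?`: on Err the function returns early, on Ok the value is unwrapped.
fn multiply(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let a = a.parse::<i32>()?;
    let b = b.parse::<i32>()?;
    Ok(a * b)
}

fn main() {
    println!("{:?}", multiply("3", "4"));
    println!("{:?}", multiply("x", "4"));
    println!("{:?}", multiply_with_match("3", "4"));
}
```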
map & and_then
map and and_then are two common combinators for Result<T, E> (and also for Option<T>).
- 🌟🌟
- 🌟🌟🌟
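A short sketch of the two combinators, with hypothetical helper names: map transforms the Ok value, while and_then chains another operation that can itself fail.

```rust
use std::num::ParseIntError;

// map: transform the Ok value, pass an Err through unchanged.
fn add_two(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>().map(|n| n + 2)
}

fn parse(s: &str) -> Result<i32, String> {
    s.parse::<i32>().map_err(|e| e.to_string())
}

// and_then: chain a second fallible step onto the first one.
fn divide(a: &str, b: &str) -> Result<i32, String> {
    parse(a).and_then(|x| {
        parse(b).and_then(|y| {
            if y == 0 {
                Err(String::from("division by zero"))
            } else {
                Ok(x / y)
            }
        })
    })
}

fn main() {
    println!("{:?}", add_two("40"));
    println!("{:?}", divide("10", "2"));
    println!("{:?}", divide("10", "0"));
}
```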
Type alias
Using std::result::Result<T, ParseIntError> everywhere is verbose and tedious, so we can use a type alias for this purpose.
At the module level, creating aliases can be particularly helpful. Errors found in a specific module often have the same Err type, so a single alias can succinctly define all associated Results. This is so useful that the std library even supplies one: io::Result.
- 🌟
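As a small sketch, an alias like the one described could look as follows. The alias name Res and the function double are hypothetical.

```rust
use std::num::ParseIntError;

// One alias fixes the error type for the whole module.
type Res<T> = std::result::Result<T, ParseIntError>;

fn double(s: &str) -> Res<i32> {
    s.parse::<i32>().map(|n| n * 2)
}

fn main() {
    println!("{:?}", double("21"));
    println!("{:?}", double("twenty-one"));
}
```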
Using Result in fn main
Typically the main function will look like this:
fn main() {
    println!("Hello World!");
}
However, main is also able to have a return type of Result. If an error occurs within the main function, it will return an error code and print a debug representation of the error (using the Debug trait).
The following example shows such a scenario:
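A minimal sketch of such a main function, assuming a hypothetical helper parse_number. With the input "10" the program succeeds; changing it to something like "ten" would make main return the Err, and the process would exit with a non-zero code after printing the error's Debug output.

```rust
use std::num::ParseIntError;

fn parse_number(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>()
}

fn main() -> Result<(), ParseIntError> {
    let number_str = "10";
    // On Err, `?` returns the error from main itself.
    let number = parse_number(number_str)?;
    println!("{}", number);
    Ok(())
}
```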
Crate and module
Learning resources:
- English: Rust Book Chapter 7
- Simplified Chinese: Rust语言圣经 (Rust Course) - Packages and modules
Package and Crate
A package is a project which you create with Cargo (in most cases), so it contains a Cargo.toml file.
- 🌟 Create a package with the layout below:
.
├── Cargo.toml
└── src
    └── main.rs
1 directory, 2 files
# in Cargo.toml
[package]
name = "hello-package"
version = "0.1.0"
edition = "2021"
Note! We will use this package across the whole chapter as a practice project.
- 🌟 Create a package with the layout below:
.
├── Cargo.toml
└── src
    └── lib.rs
1 directory, 2 files
# in Cargo.toml
[package]
name = "hello-package1"
version = "0.1.0"
edition = "2021"
Note! This package could be safely removed due to the first one's existence.
- 🌟
Crate
A crate is a binary or library. The crate root is a source file that the Rust compiler starts from and makes up the root module of the crate.
In package hello-package, there is a binary crate with the same name as the package: hello-package, and src/main.rs is the crate root of this binary crate.
Similar to hello-package, hello-package1 also has a crate in it. However, this package doesn't contain a binary crate but a library crate, and src/lib.rs is the crate root.
- 🌟
- 🌟🌟 Add a library crate for hello-package and describe its file tree below:
After this step, there should be two crates in package hello-package: a binary crate and a library crate, both with the same name as the package.
- 🌟🌟🌟 A package can contain at most one library crate, but it can contain as many binary crates as you would like by placing files in the src/bin directory: each file will be a separate binary crate with the same name as the file.
Yep, as you can see, the above package structure is very standard and is widely used in many Rust projects.
You can find the solutions here (under the solutions path), but only use it when you need it :)
Module
Modules let us organize the code within a crate into groups for readability and easy reuse. Modules also control the privacy of items, which is whether an item can be seen by outside code (public) or is an internal implementation detail not available to outside code (private).
We created a package named hello-package in the previous section, and it looks like this:
.
├── Cargo.toml
├── src
│ ├── lib.rs
│ └── main.rs
Now it's time to create some modules in the library crate and use them in the binary crate, let's start.
- 🌟🌟 Implement module
front_of_house
based on the module tree below:
library crate root
 └── front_of_house
     ├── hosting
     │   ├── add_to_waitlist
     │   └── seat_at_table
     └── serving
         ├── take_order
         ├── serve_order
         ├── take_payment
         └── complain
- 🌟🌟 Let's call add_to_waitlist from a function eat_at_restaurant, which is within the library crate root.
- 🌟🌟 You can use super to import items within the parent module
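A condensed sketch of the module tree, using inline modules in a single file. The function bodies and return values are invented for illustration, and only part of the tree is shown.

```rust
mod front_of_house {
    pub mod hosting {
        pub fn add_to_waitlist() -> &'static str {
            "added to waitlist"
        }
    }

    pub mod serving {
        pub fn complain() -> String {
            // `super` refers to the parent module, front_of_house.
            format!("{}, then a complaint", super::hosting::add_to_waitlist())
        }
    }
}

// Lives in the crate root and reaches into the module tree by path.
pub fn eat_at_restaurant() -> &'static str {
    front_of_house::hosting::add_to_waitlist()
}

fn main() {
    println!("{}", eat_at_restaurant());
    println!("{}", front_of_house::serving::complain());
}
```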
Separating modules into different files
- 🌟🌟🌟🌟 Please separate the modules and code above into files residing in the directory tree below:
.
├── Cargo.toml
├── src
│ ├── back_of_house.rs
│ ├── front_of_house
│ │ ├── hosting.rs
│ │ ├── mod.rs
│ │ └── serving.rs
│ ├── lib.rs
│ └── main.rs
Accessing code in the library crate from the binary crate
Please ensure you have completed the 4th exercise before making further progress.
You should have the structure below, with the corresponding code in it, when you reach this point:
.
├── Cargo.toml
├── src
│ ├── back_of_house.rs
│ ├── front_of_house
│ │ ├── hosting.rs
│ │ ├── mod.rs
│ │ └── serving.rs
│ ├── lib.rs
│ └── main.rs
- 🌟🌟🌟 Now we will call a few library functions from the binary crate.
You can find the solutions here (under the solutions path), but only use it when you need it :)
use and pub
- 🌟 We can bring two types of the same name into the same scope with use, but we need the as keyword.
- 🌟🌟 If we are using multiple items defined in the same crate or module, then listing each item on its own line will take up too much vertical space.
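Both points can be sketched in one file: two Result types renamed with as, and a nested use path that imports two items in one line. The helper functions are hypothetical.

```rust
// Nested path: two items from the same module in a single `use`.
use std::collections::{BTreeMap, HashMap};
// Two types named Result, disambiguated with `as`.
use std::fmt::Result as FmtResult;
use std::io::Result as IoResult;

fn fmt_ok() -> FmtResult {
    Ok(())
}

fn io_ok() -> IoResult<u8> {
    Ok(1)
}

// Counts words with a HashMap, then returns them sorted in a BTreeMap.
fn word_counts(text: &str) -> BTreeMap<String, usize> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(word, n)| (word.to_string(), n))
        .collect()
}

fn main() {
    println!("{:?} {:?}", fmt_ok(), io_ok());
    println!("{:?}", word_counts("a b a"));
}
```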
Re-exporting names with pub use
- 🌟🌟🌟 In our recently created package hello-package, add something to make the code below work
pub(in Crate)
Sometimes we want an item to be public only within a certain crate or module path. Then we can use the pub(in Crate) syntax.
Example
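A sketch of restricted visibility, under the assumption that the item should be visible only inside module a. It uses super-relative paths so that it does not depend on where a sits in the crate; when a is declared at the crate root, pub(in crate::a) expresses the same restriction.

```rust
pub mod a {
    pub const I: i32 = 3;

    fn semisecret(x: i32) -> i32 {
        // Allowed: we are inside `a`, where both `c` and `J` are visible.
        use self::b::c::J;
        x + J
    }

    pub fn bar(z: i32) -> i32 {
        semisecret(I) * z
    }

    pub fn foo(y: i32) -> i32 {
        semisecret(I) + y
    }

    mod b {
        // Visible only within `a` (the parent of `b`).
        pub(in super) mod c {
            // Visible only within `a` (two levels up from `c`).
            pub(in super::super) const J: i32 = 4;
        }
    }
}

fn main() {
    println!("{}", a::bar(2)); // (3 + 4) * 2
    println!("{}", a::foo(2)); // (3 + 4) + 2
}
```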
Full Code
The full code of hello-package
is here.
You can find the solutions here (under the solutions path), but only use it when you need it :)
Comments and Docs
Every program requires comments:
Comments
- Regular comments which are ignored by the compiler:
// Line comment, which goes to the end of the line
/* Block comment, which goes to the end of the closing delimiter */
Examples
fn main() {
// This is an example of a line comment
// There are two slashes at the beginning of the line
// And nothing written inside these will be read by the compiler
// println!("Hello, world!");
// Run it. See? Now try deleting the two slashes, and run it again.
/*
* This is another type of comment, a block comment. In general,
* line comments are the recommended comment style. But
* block comments are extremely useful for temporarily disabling
* chunks of code. /* Block comments can be /* nested, */ */
* so it takes only a few keystrokes to comment out everything
* in this main() function. /*/*/* Try it yourself! */*/*/
*/
/*
Note: The previous column of `*` was entirely for style. There's
no actual need for it.
*/
}
Exercises
- 🌟🌟
Doc Comments
- Doc comments, which are parsed into HTML and support Markdown
/// Generate library docs for the following item
//! Generate library docs for the enclosing item
Before starting, we need to create a new package for practice: cargo new --lib doc-comments
.
Line doc comments ///
Add docs for function add_one
Cargo doc
We can use cargo doc --open
to generate html files and open them in the browser.
Block doc comments /** ... */
Add docs for function add_two
:
Doc comments for crate and module
We can also add doc comments for our crates and modules.
Firstly, let's add some doc comments for our library crate:
Note: We must place crate and module comments at the top of the crate root or module file.
You can also use block comments to achieve this:
Next, create a new module file src/compute.rs
, and add following comments to it:
Then run cargo doc --open
and see the results.
Doc tests
The doc comments of add_one and add_two contain two example code blocks.
The examples not only demonstrate how to use your library, but also run as tests with the cargo test command.
- 🌟🌟 But there are errors in the two examples. Please fix them, and run cargo test to get the following result:
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests doc-comments
running 2 tests
test src/lib.rs - add_one (line 11) ... ok
test src/lib.rs - add_two (line 26) ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.55s
- 🌟🌟 Sometimes we expect an example to panic. Add the following code to src/compute.rs and make cargo test pass.
You can only modify the comments, DON'T modify fn div
- 🌟🌟 Sometimes we want to hide the doc comments, but keep the doc tests.
Add the following code to src/compute.rs,
// in src/compute.rs
/// ```
/// # fn try_main() -> Result<(), String> {
/// let res = doc_comments::compute::try_div(10, 0)?;
/// # Ok(()) // returning from try_main
/// # }
/// # fn main() {
/// # try_main().unwrap();
/// #
/// # }
/// ```
pub fn try_div(a: i32, b: i32) -> Result<i32, String> {
if b == 0 {
Err(String::from("Divide-by-zero"))
} else {
Ok(a / b)
}
}
and modify this code to achieve two goals:
- The doc comments must not be presented in the HTML files generated by cargo doc --open
- Run the tests, and you should see results as below:
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests doc-comments
running 4 tests
test src/compute.rs - compute::div (line 7) ... ok
test src/lib.rs - add_two (line 27) ... ok
test src/lib.rs - add_one (line 11) ... ok
test src/compute.rs - compute::try_div (line 20) ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.51s
Code navigation
Rust provides a very powerful feature for us: code navigation in doc comments.
Add the following code to src/lib.rs:
Besides jumping into the standard library, you can also jump to another module in the package.
Doc attributes
Below are a few examples of the most common #[doc]
attributes used with rustdoc
.
inline
Used to inline docs, instead of linking out to separate page.
#[doc(inline)]
pub use bar::Bar;
/// bar docs
mod bar {
/// the docs for Bar
pub struct Bar;
}
no_inline
Used to prevent linking out to separate page or anywhere.
// Example from libcore/prelude
#[doc(no_inline)]
pub use crate::mem::drop;
hidden
Using this tells rustdoc
not to include this in documentation:
For documentation, rustdoc
is widely used by the community. It's what is used to generate the std library docs.
Full Code
The full code of package doc-comments
is here.
Formatted output
std::fmt contains many traits which govern the display of text. The base forms of two important ones are listed below:
- fmt::Debug: Uses the {:?} marker. Formats text for debugging purposes.
- fmt::Display: Uses the {} marker. Formats text in a more elegant, user-friendly fashion.
Here, we used fmt::Display because the std library provides implementations for these types. To print text for custom types, more steps are required.
Implementing the fmt::Display trait automatically implements the ToString trait, which allows us to convert the type to String.
println! and format!
Printing is handled by a series of macros defined in std::fmt, some of which include:
- format!: write formatted text to String
- print!: same as format! but the text is printed to the console (io::stdout).
- println!: same as print! but a newline is appended.
- eprint!: same as format! but the text is printed to the standard error (io::stderr).
- eprintln!: same as eprint! but a newline is appended.
All parse text in the same fashion. As a plus, Rust checks formatting correctness at compile time.
format!
1.🌟
print!
, println!
2.🌟
Debug and Display
All types which want to be printable must implement a std::fmt formatting trait: std::fmt::Debug or std::fmt::Display.
Automatic implementations are only provided for types such as those in the std library. All others have to be manually implemented.
Debug
The implementation of Debug is very straightforward: all types can derive the std::fmt::Debug implementation. This is not true for std::fmt::Display, which must be manually implemented.
{:?} must be used to print out a type which has implemented the Debug trait.
- 🌟
- 🌟🌟 So fmt::Debug definitely makes a type printable, but sacrifices some elegance. Maybe we can get something more elegant by replacing {:?} with something else (but not {}!)
- 🌟🌟 We can also manually implement the Debug trait for our types
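The three variants above (derived Debug, the pretty-printing {:#?} marker, and a manual Debug implementation) can be sketched as follows. Person and Secret are hypothetical types.

```rust
use std::fmt;

// Debug can simply be derived.
#[derive(Debug)]
struct Person {
    name: String,
    age: u8,
}

// A manual implementation lets us control what gets printed.
struct Secret(String);

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Secret(<{} bytes>)", self.0.len())
    }
}

fn main() {
    let p = Person {
        name: String::from("Ann"),
        age: 30,
    };
    println!("{:?}", p); // single line
    println!("{:#?}", p); // pretty-printed, one field per line
    println!("{:?}", Secret(String::from("hunter2")));
}
```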
Display
Yeah, Debug is simple and easy to use. But sometimes we want to customize the output appearance of our type. This is where Display really shines.
Unlike Debug, there is no way to derive the implementation of the Display trait; we have to implement it manually.
Another thing to note: the placeholder for Display is {}, not {:?}.
- 🌟🌟
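A minimal sketch of a manual Display implementation, using a hypothetical Point2D type. Note that to_string comes for free once Display is implemented.

```rust
use std::fmt;

struct Point2D {
    x: f64,
    y: f64,
}

impl fmt::Display for Point2D {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // We decide exactly how the type looks when printed with {}.
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point2D { x: 3.3, y: 7.2 };
    println!("{}", p);
    // ToString is provided automatically by Display.
    let s: String = p.to_string();
    println!("{}", s);
}
```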
? operator
Implementing fmt::Display for a structure whose elements must be handled separately is tricky. The problem is that each write! generates a fmt::Result, which must be handled in the same place.
Fortunately, Rust provides the ? operator to help us eliminate some unnecessary code for dealing with fmt::Result.
- 🌟🌟
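As a sketch of the pattern, the hypothetical List wrapper below writes its elements one at a time and uses ? after each write! so that any formatting error is returned immediately.

```rust
use std::fmt;

struct List(Vec<i32>);

impl fmt::Display for List {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[")?;
        for (index, value) in self.0.iter().enumerate() {
            // Separate all elements except the first with a comma.
            if index != 0 {
                write!(f, ", ")?;
            }
            write!(f, "{}: {}", index, value)?;
        }
        // The last write! is returned as the overall fmt::Result.
        write!(f, "]")
    }
}

fn main() {
    println!("{}", List(vec![1, 2, 3]));
}
```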
Formatting
Positional arguments
1.🌟🌟
/* Fill in the blanks */
fn main() {
println!("{0}, this is {1}. {1}, this is {0}", "Alice", "Bob");// => Alice, this is Bob. Bob, this is Alice
assert_eq!(format!("{1}{0}", 1, 2), __);
assert_eq!(format!(__, 1, 2), "2112");
println!("Success!");
}
Named arguments
2.🌟🌟
Padding with string
3.🌟🌟 By default, you can pad a string with spaces
4.🌟🌟🌟 Left align, right align, pad with specified characters.
5.🌟🌟 You can pad numbers with extra zeros.
precision
6.🌟🌟 Floating point precision
7.🌟🌟🌟 string length
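The padding and precision rules from the exercises above can be summarised in a short sketch. The helper function samples is hypothetical; it only collects a few formatted strings so they can be inspected.

```rust
fn samples() -> Vec<String> {
    vec![
        format!("{:5}!", "x"),       // strings are left-aligned by default
        format!("{:>5}", "x"),       // right align
        format!("{:*^7}", "mid"),    // center, pad with '*'
        format!("{:05}", 42),        // pad a number with zeros
        format!("{:.2}", 3.14159),   // two digits after the decimal point
        format!("{:8.2}", 3.14159),  // width 8, numbers are right-aligned
        format!("{:.3}", "abcdef"),  // precision truncates a string
    ]
}

fn main() {
    for s in samples() {
        println!("[{}]", s);
    }
}
```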
binary, octal, hex
- format!("{}", foo) -> "3735928559"
- format!("0x{:X}", foo) -> "0xDEADBEEF"
- format!("0o{:o}", foo) -> "0o33653337357"
8.🌟🌟
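A sketch of the radix markers, reusing the value 3735928559 from the list above. The function in_radixes is a hypothetical helper; the # flag adds the 0b, 0o or 0x prefix automatically.

```rust
fn in_radixes(n: u32) -> (String, String, String) {
    (
        format!("{:b}", n), // binary
        format!("{:o}", n), // octal
        format!("{:X}", n), // upper-case hex
    )
}

fn main() {
    let foo: u32 = 3735928559;
    let (bin, oct, hex) = in_radixes(foo);
    println!("{} {} {}", bin, oct, hex);
    // `#` adds the prefix; the width of 10 includes it.
    println!("{:#x} {:#010b}", 255, 5);
}
```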
Capturing the environment
9.🌟🌟🌟
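Since Rust 1.58, format strings can capture variables from the surrounding scope by name. A small sketch, with a hypothetical describe function:

```rust
fn describe(name: &str, width: usize, value: f64) -> String {
    // `name`, `value` and `width` are captured from the enclosing scope;
    // `width$` uses the captured variable as the field width.
    format!("{name}: {value:>width$.2}")
}

fn main() {
    println!("{}", describe("ratio", 8, 1.23456));
}
```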
Others
Example
Lifetime
Learning resources:
- English: Rust Book 10.3
- Simplified Chinese: Rust语言圣经 (Rust Course) - Lifetimes
Lifetime
- 🌟
- 🌟
&'static and T: 'static
advance
Functional programming
Closure
The code below is the answer to the in-class exercise from the closures chapter of the Rust Course (Rust语言圣经):
struct Cacher<T, E>
where
    T: Fn(E) -> E,
    E: Copy,
{
    query: T,
    value: Option<E>,
}
impl<T, E> Cacher<T, E>
where
    T: Fn(E) -> E,
    E: Copy,
{
    fn new(query: T) -> Cacher<T, E> {
        Cacher {
            query,
            value: None,
        }
    }
    // Runs the closure on the first call only; later calls return the cached value
    fn value(&mut self, arg: E) -> E {
        match self.value {
            Some(v) => v,
            None => {
                let v = (self.query)(arg);
                self.value = Some(v);
                v
            }
        }
    }
}
fn main() {
}
#[test]
fn call_with_different_values() {
    let mut c = Cacher::new(|a| a);
    let v1 = c.value(1);
    let v2 = c.value(2);
    // The first result is cached, so the second call still returns 1
    assert_eq!(v2, 1);
}
Iterator
https://doc.rust-lang.org/stable/rust-by-example/flow_control/for.html#for-and-iterators
newtype and Sized
Smart pointers
Box
Deref
Drop
Rc and Arc
Cell and RefCell
Weak and Circle reference
Self referential
Threads
Basic using
Message passing
Sync
Atomic
Send and Sync
Global variables
Errors
Unsafe todo
Inline assembly
Rust provides support for inline assembly via the asm!
macro.
It can be used to embed handwritten assembly in the assembly output generated by the compiler.
Generally this should not be necessary, but might be where the required performance or timing
cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.
Note: the examples here are given in x86/x86-64 assembly, but other architectures are also supported.
Inline assembly is currently supported on the following architectures:
- x86 and x86-64
- ARM
- AArch64
- RISC-V
Basic usage
Let us start with the simplest possible example:
This will insert a NOP (no operation) instruction into the assembly generated by the compiler.
Note that all asm!
invocations have to be inside an unsafe
block, as they could insert
arbitrary instructions and break various invariants. The instructions to be inserted are listed
in the first argument of the asm!
macro as a string literal.
Inputs and outputs
Now inserting an instruction that does nothing is rather boring. Let us do something that actually acts on data:
This will write the value 5
into the u64
variable x
.
You can see that the string literal we use to specify instructions is actually a template string.
It is governed by the same rules as Rust format strings.
The arguments that are inserted into the template however look a bit different than you may
be familiar with. First we need to specify if the variable is an input or an output of the
inline assembly. In this case it is an output. We declared this by writing out
.
We also need to specify in what kind of register the assembly expects the variable.
In this case we put it in an arbitrary general purpose register by specifying reg
.
The compiler will choose an appropriate register to insert into
the template and will read the variable from there after the inline assembly finishes executing.
Let us see another example that also uses an input:
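A sketch of such a block, wrapped in a hypothetical function add_five. It assumes an x86-64 target; a plain Rust fallback is included only so that the sketch still compiles on other architectures.

```rust
#[cfg(target_arch = "x86_64")]
fn add_five(i: u64) -> u64 {
    use std::arch::asm;

    let o: u64;
    // SAFETY: the assembly only touches the two registers chosen by the
    // compiler for `o` and `i`.
    unsafe {
        asm!(
            "mov {0}, {1}", // copy the input into the output register
            "add {0}, 5",   // add 5 to it
            out(reg) o,
            in(reg) i,
        );
    }
    o
}

// Fallback for other architectures (assumption for portability only).
#[cfg(not(target_arch = "x86_64"))]
fn add_five(i: u64) -> u64 {
    i + 5
}

fn main() {
    println!("{}", add_five(3));
}
```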
This will add 5
to the input in variable i
and write the result to variable o
.
The particular way this assembly does this is first copying the value from i
to the output,
and then adding 5
to it.
The example shows a few things:
First, we can see that asm!
allows multiple template string arguments; each
one is treated as a separate line of assembly code, as if they were all joined
together with newlines between them. This makes it easy to format assembly
code.
Second, we can see that inputs are declared by writing in
instead of out
.
Third, we can see that we can specify an argument number, or name as in any format string. For inline assembly templates this is particularly useful as arguments are often used more than once. For more complex inline assembly using this facility is generally recommended, as it improves readability, and allows reordering instructions without changing the argument order.
We can further refine the above example to avoid the mov
instruction:
We can see that inout
is used to specify an argument that is both input and output.
This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.
It is also possible to specify different variables for the input and output parts of an inout
operand:
Late output operands
The Rust compiler is conservative with its allocation of operands. It is assumed that an out
can be written at any time, and can therefore not share its location with any other argument.
However, to guarantee optimal performance it is important to use as few registers as possible,
so they won't have to be saved and reloaded around the inline assembly block.
To achieve this Rust provides a lateout
specifier. This can be used on any output that is
written only after all inputs have been consumed.
There is also an inlateout variant of this specifier.
Here is an example where inlateout
cannot be used:
Here the compiler is free to allocate the same register for inputs b and c since it knows they have the same value. However, it must allocate a separate register for a since it uses inout and not inlateout. If inlateout were used, then a and c could be allocated to the same register, in which case the first instruction would overwrite the value of c and cause the assembly code to produce the wrong result.
However the following example can use inlateout
since the output is only modified after all input registers have been read:
As you can see, this assembly fragment will still work correctly if a
and b
are assigned to the same register.
Explicit register operands
Some instructions require that the operands be in a specific register.
Therefore, Rust inline assembly provides some more specific constraint specifiers.
While reg
is generally available on any architecture, explicit registers are highly architecture specific. E.g. for x86 the general purpose registers eax
, ebx
, ecx
, edx
, ebp
, esi
, and edi
among others can be addressed by their name.
In this example we call the out
instruction to output the content of the cmd
variable to port 0x64
. Since the out
instruction only accepts eax
(and its sub registers) as operand we had to use the eax
constraint specifier.
Note: unlike other operand types, explicit register operands cannot be used in the template string: you can't use
{}
and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
Consider this example which uses the x86 mul
instruction:
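A sketch of the mul example, assuming an x86-64 target. The fallback for other architectures is an addition for portability and computes the same product in plain Rust.

```rust
#[cfg(target_arch = "x86_64")]
fn mul(a: u64, b: u64) -> u128 {
    use std::arch::asm;

    let lo: u64;
    let hi: u64;
    // SAFETY: mul only reads its operand and rax, and writes rax and rdx,
    // all of which are declared below.
    unsafe {
        asm!(
            // The x86 mul instruction takes rax as an implicit input and
            // writes the 128-bit result of the multiplication to rdx:rax.
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi,
        );
    }
    ((hi as u128) << 64) + lo as u128
}

// Fallback for other architectures (assumption for portability only).
#[cfg(not(target_arch = "x86_64"))]
fn mul(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}

fn main() {
    println!("{}", mul(u64::MAX, 2));
}
```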
This uses the mul
instruction to multiply two 64-bit inputs with a 128-bit result.
The only explicit operand is a register, that we fill from the variable a
.
The second operand is implicit, and must be the rax
register, which we fill from the variable b
.
The lower 64 bits of the result are stored in rax
from which we fill the variable lo
.
The higher 64 bits are stored in rdx
from which we fill the variable hi
.
Clobbered registers
In many cases inline assembly will modify state that is not needed as an output. Usually this is either because we have to use a scratch register in the assembly or because instructions modify state that we don't need to further examine. This state is generally referred to as being "clobbered". We need to tell the compiler about this since it may need to save and restore this state around the inline assembly block.
use core::arch::asm;
fn main() {
// three entries of four bytes each
let mut name_buf = [0_u8; 12];
// String is stored as ascii in ebx, edx, ecx in order
// Because ebx is reserved, we get a scratch register and move from
// ebx into it in the asm. The asm needs to preserve the value of
// that register though, so it is pushed and popped around the main asm
// (in 64 bit mode for 64 bit processors, 32 bit processors would use ebx)
unsafe {
asm!(
"push rbx",
"cpuid",
"mov [{0}], ebx",
"mov [{0} + 4], edx",
"mov [{0} + 8], ecx",
"pop rbx",
// We use a pointer to an array for storing the values to simplify
// the Rust code at the cost of a couple more asm instructions
// This is more explicit with how the asm works however, as opposed
// to explicit register outputs such as `out("ecx") val`
// The *pointer itself* is only an input even though it's written behind
in(reg) name_buf.as_mut_ptr(),
// select cpuid 0, also specify eax as clobbered
inout("eax") 0 => _,
// cpuid clobbers these registers too
out("ecx") _,
out("edx") _,
);
}
let name = core::str::from_utf8(&name_buf).unwrap();
println!("CPU Manufacturer ID: {}", name);
}
In the example above we use the cpuid instruction to read the CPU manufacturer ID.
This instruction writes to eax with the maximum supported cpuid argument and ebx, edx, and ecx with the CPU manufacturer ID as ASCII bytes in that order.
Even though eax is never read, we still need to tell the compiler that the register has been modified so that the compiler can save any values that were in these registers before the asm. This is done by declaring it as an output but with _ instead of a variable name, which indicates that the output value is to be discarded.
This code also works around the limitation that ebx is a register reserved by LLVM. That means that LLVM assumes that it has full control over the register and it must be restored to its original state before exiting the asm block, so it cannot be used as an input or output. To work around this we save the register via push, write ebx from inside the asm block into the buffer through a pointer held in a register allocated with in(reg), and then restore ebx to its original state via pop. The push and pop use the full 64-bit rbx version of the register to ensure that the entire register is saved. On 32-bit targets the code would instead use ebx in the push/pop.
This can also be used with a general register class (e.g. reg
) to obtain a scratch register for use inside the asm code:
Symbol operands and ABI clobbers
By default, asm!
assumes that any register not specified as an output will have its contents preserved by the assembly code. The clobber_abi
argument to asm!
tells the compiler to automatically insert the necessary clobber operands according to the given calling convention ABI: any register which is not fully preserved in that ABI will be treated as clobbered. Multiple clobber_abi
arguments may be provided and all clobbers from all specified ABIs will be inserted.
Register template modifiers
In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
By default the compiler will always choose the name that refers to the full register size (e.g. rax
on x86-64, eax
on x86, etc).
This default can be overridden by using modifiers on the template string operands, just like you would with format strings:
In this example, we use the reg_abcd
register class to restrict the register allocator to the 4 legacy x86 registers (ax
, bx
, cx
, dx
) of which the first two bytes can be addressed independently.
Let us assume that the register allocator has chosen to allocate x
in the ax
register.
The h
modifier will emit the register name for the high byte of that register and the l
modifier will emit the register name for the low byte. The asm code will therefore be expanded as mov ah, al
which copies the low byte of the value into the high byte.
If you use a smaller data type (e.g. u16) with an operand and forget to use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.
Memory address operands
Sometimes assembly instructions require operands passed via memory addresses/memory locations.
You have to manually use the memory address syntax specified by the target architecture.
For example, on x86/x86_64 using Intel assembly syntax, you should wrap inputs/outputs in []
to indicate they are memory operands:
Labels
Any reuse of a named label, local or otherwise, can result in an assembler or linker error or may cause other strange behavior. Reuse of a named label can happen in a variety of ways including:
- explicitly: using a label more than once in one asm! block, or multiple times across blocks.
- implicitly via inlining: the compiler is allowed to instantiate multiple copies of an asm! block, for example when the function containing it is inlined in multiple places.
- implicitly via LTO: LTO can cause code from other crates to be placed in the same codegen unit, and so could bring in arbitrary labels.
As a consequence, you should only use GNU assembler numeric local labels inside inline assembly code. Defining symbols in assembly code may lead to assembler and/or linker errors due to duplicate symbol definitions.
Moreover, on x86 when using the default Intel syntax, due to an LLVM bug, you shouldn't use labels exclusively made of 0
and 1
digits, e.g. 0
, 11
or 101010
, as they may end up being interpreted as binary values. Using options(att_syntax)
will avoid any ambiguity, but that affects the syntax of the entire asm!
block. (See Options, below, for more on options
.)
This will decrement the {0}
register value from 10 to 3, then add 2 and store it in a
.
This example shows a few things:
- First, that the same number can be used as a label multiple times in the same inline block.
- Second, that when a numeric label is used as a reference (as an instruction operand, for example), the suffixes “b” (“backward”) or “f” (“forward”) should be added to the numeric label. It will then refer to the nearest label defined by this number in this direction.
Options
By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However, in many cases it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
Let's take our previous example of an add
instruction:
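A sketch of that add block with options attached, assuming an x86-64 target. The wrapper function add and the non-x86 fallback are additions for illustration.

```rust
#[cfg(target_arch = "x86_64")]
fn add(a: u64, b: u64) -> u64 {
    use std::arch::asm;

    let mut a = a;
    // SAFETY: the instruction only reads and writes the two operand registers.
    unsafe {
        asm!(
            "add {0}, {1}",
            inlateout(reg) a,
            in(reg) b,
            // No side effects, no memory access, no stack usage.
            options(pure, nomem, nostack),
        );
    }
    a
}

// Fallback for other architectures (assumption for portability only).
#[cfg(not(target_arch = "x86_64"))]
fn add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

fn main() {
    println!("{}", add(4, 5));
}
```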
Options can be provided as an optional final argument to the asm!
macro. We specified three options here:
- pure means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
- nomem means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
- nostack means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86-64 to avoid stack pointer adjustments.
These allow the compiler to better optimize code using asm!
, for example by eliminating pure asm!
blocks whose outputs are not needed.
See the reference for the full list of available options and their effects.
macro
Tests
Write Tests
Benchmark
https://doc.rust-lang.org/unstable-book/library-features/test.html
Unit and Integration
Assertions
Async/Await
async and await!
Future
Pin and Unpin
Stream
Standard Library todo
String
Fighting with Compiler
Fighting with the compiler is very common in our daily coding, especially for those unfamiliar with Rust.
This chapter provides some exercises to help us avoid such cases and flatten the steep learning curve.
Borrowing
- 🌟🌟