Having spent a number of Codeforces rounds practicing the ins and outs of Rust, I think it's finally safe to encourage wider participation in the language! This guide is intended primarily for C++ programmers who may have taken interest in Rust, but had doubts regarding its feasibility in timed contests. On Quora, I summarized why I think Rust is suitable for contests in 2019 onward. Granted, the learning curve is substantial enough that it may not seem that way at first. To make the transition easier, for anyone who's interested, here I'll go over some aspects of my Codeforces submissions in more detail, with some tips along the way.
My solution to 1168C: And Reachability
My solution to 1158D: Winding polygonal line
Right away, you'll notice certain characteristics:
The only global variable is a compile-time constant. By scoping things appropriately, I don't have to worry about accidentally forgetting to reset data.
Very few mutable variables. This makes code easier to reason about, and mistakes less likely.
Very strict type-checking. For example, I might use the
usize
type for array indices, andi64
for geometric coordinates. Conversions between the two must be made explicit. This seems annoying at first, but it caught quite a few of my bugs.A polymorphic
Scanner.next()
method. It can read space-separated tokens of any type that implements the traitFromStr
.Output via
BufWriter
. This is needed for speed, if you want to write a large number of lines.BufWriter
flushes automatically when it goes out of scope, but you'll probably want toflush()
manually on interactive problems. Further I/O optimizations are possible (see comments section), but this is the major one and is sufficient for contests.A mix of imperative-style and functional-style constructions, depending on which is clearer.
In Rust, you can read a Vec
(i.e., vector
in C++, ArrayList
in Java) of floats from standard input in imperative style:
let mut v = Vec::with_capacity(n);
for _ in 0..n {
let elem = scan.next::<f64>();
v.push(elem)
}
Or you can do it in functional style, rendering the result immutable:
let v: Vec<f64> = (0..n).map(|_| scan.next()).collect();
Both versions are very efficient: the Vec
initially allocates space for n
elements, similar to v.reserve(n)
in C++!
You can "consume" (i.e., move) a Vec
if you won't need it anymore:
for elem in v { // equivalent to v.into_iter().for_each(|elem| ...)
// do something with elem
}
Or borrow its contents mutably, to change some of its elements:
for elem in &mut v { // equivalent to v.iter_mut().for_each(|elem| ...)
// do something with *elem
}
Or borrow its contents immutably:
for elem in &v { // equivalent to v.iter().for_each(|elem| ...)
// do something with *elem
}
If the elements implement Copy
, the dereference can be copied into the variable elem
by pattern matching. This time, let's also keep track of the index:
for (i, &elem) in v.iter().enumerate() {
// do something with elem
}
Rust String
s are UTF-8. To get random access, you'll have to convert them to .bytes()
or .chars()
. And if you're reading a String
made entirely of 0s and 1s? Convert them to bool
s as follows:
let s: String = scan.next();
let v: Vec<bool> = s.chars().map(|ch| ch == ‘1’).collect();
My 1168C submission features the following rather magical line:
let (zero_bits, one_bits): (Vec<usize>, Vec<usize>) =
(0..BITS).partition(|b| (ai & (1usize << b)) == 0);
Where did I get the idea to do this? Well, I wanted to know which positions of the binary number ai
contain a 0, and which contain a 1. I knew that the Range
object 0..BITS
implements the Iterator
trait, so I Googled "rust iterator", landed on a doc page, and browsed through the list of methods. (0..BITS).filter().collect()
seems close to what I wanted: it selects the bits which satisfy a given condition. However, (0..BITS).partition()
does more, giving both the passing and the failing bits! Note that collect()
and partition()
are polymorphic, so I could just as easily have obtained HashSet
s instead.
As you can see, the language is very expressive, and the standard library quite flexible. One very interesting finding from my experience with Rust is that "production-quality" code and "quick hacky" code look much more alike than they do in C++. This is because Rust not only makes it harder to do things the wrong way, but also makes it much easier to do things the right way. As a result, I naturally find myself coding in a clearer style, even under time pressure.
Overall, my solutions attain much fewer WA verdicts in Rust than they did in C++. Development time is sometimes more, sometimes less, but it gets better with practice. Best of all, I now find coding to be a less paranoid experience: I can have fun with problems instead of being hyper-alert to programming errors. Try it out and see!
As additional resources, feel free to check out my past submissions on Codeforces, my competitive programming codebook full of algorithms and data structures you can use, and Rust's new dbg!() macro for a more informative way to inspect run-time values. Leave your competitive Rust questions in the comments, and good luck!
there's my IO template for rust: 53175781. Yeah, it's not pretty, but it is very, very fast, on par with fastest C++ templates (
getchar
etc).Also rust compiler version on codeforces is a bit outdated, so you need to put
into your
Cargo.toml
The Rust version on Codeforces is 1.31, which is exactly the release of 2018 edition!
Just for fun, I re-submitted your solution with my Scanner. It seems to work well enough, though indeed yours is faster: https://mirror.codeforces.com/contest/1151/submission/55073106
codeforces compiler can't deal with "non-lexical lifetimes".
- compilation error. Hence
edition = "2015"
Actually you can use
getchar
from Rust.https://mirror.codeforces.com/contest/947/submission/41213727
If I understand correctly, at linking stage the program is linked to the symtem's libc. Hence that
extern { fn getchar() -> i32 }
works. Tho there's nogetchar_unlocked
since it's not compiled on POSIX system.what a dirty hack! I like it. However, my template can read from any
BufReader
. For example fromVec<u8>
, which is good for testing.Looks like my unsafe
Scanner
is now even faster thangetchar()
!https://mirror.codeforces.com/contest/947/submission/55268770
Thanks for the discussion everyone. It took a while to get to this point, but I think Rust is seriously competitively viable now!!
What I'd like to know is which part of
Scanner<StdinLock>
makes it slower, and what can we do to get optimal performance while being generic inT: FromStr
?Could it be that
read_line
reads too little at a time? But it seems wrapping theStdinLock
in aBufReader
doesn't help much...Is it the fact that Scanner reads to a
String
, which must be checked to be valid UTF-8? If so, that's a waste since it seems the source code forparse::<i64>()
ends up calling.as_bytes()
anyway.Is it because I'm allocating new
String
s for each word and storing them in aVec
? If so, maybe we can use theSplitWhitespace
object directly without the allocations.Is the
parse()
method itself much slower than your custom digit multiplication? It shouldn't be, since the implementation seems similar, just with a few extra checks for the sign and such.yes, it copies bytes from buffer to string, then validates this string. This leads to serious slowdown.
Thanks. Alright, time to bring this to the attention of the Rust community: https://stackoverflow.com/questions/56449027
Gosh I had never been active on sites like Quora, Reddit, Stackoverflow before...
I don't know how to profile properly, but my unscientific measurements suggest the
Scanner<StdinLock>
(maybe even without the lock) is at most twice as slow asFastInput
, which is probably good enough!More data would be nice though.
Hey check this out! By avoiding the
Vec<String>
allocations, I made it almost as fast as yours! It's ugly though because it has to read line by line, passing around references to aString
and aStdinLock
. I commented the nicer version which, unfortunately, doesn't compile. Would you care to play against the lifetime checker to see if it can be made to work?Edit: improved, but with an unsafe operation...
Edit2: and the really unsafe dark arts version! It's as fast as you could ever want, I think, and generic so you can use it with any
FromStr
type.hello, is their a way to something like "has_next()" to get whether there is non-empty and non-whitespace input remaining or not...???
If yes, please share it, thanks :D
If you just want to know if the current line has more tokens, you can check the buffer's status. I didn't implement this, but you can derive it from any of the existing
Scanner
implementations.However, checking whether there's a next line might stall if
stdin
is waiting for the user to type something.Regarding
C++ has
const
, but it's not the default, and it's inconvenient to use them because ofstd::cin
.g++ also have
-Wconversion
for warning for e.g. conversion fromlong long
toint
. I often use it and it's quite useful (although it's annoying that it doesn't recognize that(long long)a*b%MOD
can be converted toint
safely ifMOD
is anint
.I tried using Rust one year ago. Here's my template: https://mirror.codeforces.com/contest/1036/submission/42644334. It's written for Rust 1.15 (atCoder still hasn't updated since) that's why it contains some weird workarounds. It supports reading in tuples, which makes the input very nice in my opinion, especially together with automatic type inference:
I had to do my own I/O implementation anyways, since stdin() and stdout() are impossible to use efficiently (every time you read or write something, a mutex gets locked and I tested that that's already too slow for some tasks).
Note that I have written my own
debug!()
macro beforedbg!()
even existed (or was discussed about).My two main reason against Rust are the lack of a good codebook (there's this one which is pretty good though: https://github.com/hatoo/competitive-rust-snippets -- it even comes with its snippet library: https://github.com/hatoo/cargo-snippet) and that I am turned down by the old compilers available on CF and atCoder. Now that CF has updated, I might give it another try.
The current command line to compile Rust solutions is
rustc --edition=2018 -O -C link-args=/STACK:268435456 --cfg ONLINE_JUDGE %1
It'll be really nice if you add text_io crate within compilation so we can use easier io more easily without the need of writing it down by ourselves.
Why no second level compiler optimization is used here similar to C/C++?
My bad got it..
rng_58 it would be great to upgrade version of Rust in Atcoder to 1.35.0 as in Codeforces. The current version (1.15.0), is practically useless and the language has evolved a lot since that.
cc: snuke, chokudai
We are currently discussing language updates. If you can read Japanese, you can get information about language updates here. (Discussed by Japanese AtCoder participants) https://docs.google.com/spreadsheets/d/1PmsqufkF3wjKN6g1L0STS80yP4a6u-VdGiEv5uOHe0M/edit#gid=0
I modified your code and added a
reader
field forScanner
And my test for p71a looks like
The stdin can be replaced with a string reader, so easier to test
Yep! My contest submissions are a bit abbreviated, but you can find my full Scanners here.
I had to use
BufRead
because interactive problems need to read line-by-line. Reading from Strings is still possible (see unit tests), meeting theBufRead
requirement using eitheras_bytes
orCursor
.Would it be possible to add itertools to the available crates when compiling? It's pretty much a must-have, especially when writing algorithms.
Anyone uses Rust to solve SGU problems? My solutions in Rust even fail at the simplest ones, e.g., SGU 118 (Wrong answer on test 2). I compared my code with others' in C/C++ but couldn't find any substantial differences. My code:
Could anyone take a look? Thanks.
It seems that you must print "\r\n" as newline for the problem.
https://mirror.codeforces.com/problemsets/acmsguru/submission/99999/71428058
[submission:https://mirror.codeforces.com/contest/793/submission/77476009]
Byte by byte reading using macros.
Reads
n
andVec<i32>
of sizen
. That's all what you need to write to read an input.MikeMirzayanov, can codeforces add
proconio
crate for rust just like what atcoder does? This will be useful for competitive programming in rust.I am developing a proconio like web application (in Rust). It only supports Python and C++ at the moment but if there is some need for Rust supports among CF users, I am willing to do some works.
https://akiradeveloper.github.io/procon-input/
When I solved this problem, my optimal program in Rust fails because of TLE.
I tried fast I/O from here, and found that fast writing save more time than fast reading, so I suggest to use
write!()
instead ofprintln!()
because it's faster. Use it this way:My solution with
println!()
works more than 1000ms and gets TLE, but the same solution withwrite!()
works 171ms.Also, when I added the fast read from this comment, I get 140ms instead of 171ms.
How do you deal with random values for several tasks like maintaining treaps, or just randomized solutions?
I usually use my own implementation of xorshift random number generator. For example 125871940
ekzhang also contributed a random number generator to rust-algorithms!
Any insights about Rust-vs-C++ performance from people who've tried it out in programming contests?
As far as I know, in theory either can happen: Rust can outperform C++ in some aspects (based on the compiler having better understanding of lifetimes, ranges, etc.), but can also be behind in others (e.g., in case one uses array indices extensively and the compiler can't optimize away the boundary checks). Also the differences can stem from different standard library algorithms: Rust's BTreeMap B-Tree vs. C++'s std::map red-black tree (usually).
But I'm curious whether anyone could share real-world, non-synthetic data, e.g., from solutions with segment trees, complex DP, etc.?
Some time ago I was also interested in whether Rust can indeed outperform C++ in lifetimes and so on, so I decided to test and came up with a benchmark. The benchmark idea was to take $$$10^7$$$ roughly random numbers (powers of 3 modulo $$$2^{32}$$$), push them all into skew heap and pop in sorted order, while computing some hash. The idea was that Rust's
mem::replace
and similar functions that are useful when you rotate trees should do better job (in theory) than what you usually do in C++, and therefore Rust should be slightly faster.And it turned out that it is not. Moreover, my first Rust implementation works ~27s while my sample C++ implementation (with
unique_ptr
inside) works ~17s (10s of difference, not a typo). Then, after some experiments, I found an implementation in Rust that works ~17s. The difference was using manual casework instead ofmem::swap
inside merge procedure (see the code below, functionsubheap_merge
). The interesting thing is that both manual casework andstd::swap
work the same time in C++! I spent a lot of time trying to understand whymem::swap
gives 60% slowdown in Rust, but I still have no clue (and thus, I would be very grateful if someone could explain this to me).Here is the minimal working example of the effect. The difference between fast and slow versions of subheap_merge is only one line.
As for the comparison C++ vs Rust. The fastest Rust implementation I came up with was slightly faster than sample C++ implementation, with the difference less than the average fluctuation. However, I am more concerned about the unexpected 60% slowdown which is not present in C++. All the results (and the slowdown) are consistent on both my PCs.
Do you still have the C++ code that was used for this comparison? Was it compiled by Clang++ using the same LLVM version as Rust or have you used GNU C++?
GCC and LLVM are both modern state of the art code generation backends and generally provide similar performance in practice, but in each individual case one of them may be faster than the other. By the way, GCC will eventually have Rust as one of the supported languages too: https://www.phoronix.com/news/GCC-Rust-v4-Cleared-For-Landing
I don't, but it was pure line-to-line translation with
std::unique_ptr
playing the role ofOption<Box<Node<T>>>
. I think, I can reimplement it, if it is really necessary. Or you can even do it yourself.For most of my tests I used g++, but I ran clang a couple of times and it was giving roughly the same results. Not sure if LLVM backends had exactly the same versions, I was simply using the most modern versions of compilers that stable Debian was able to provide at the moment.
Anyway, I believe that the effect persists across the versions of the compilers, so you can choose whatever you want for your investigation.
I have profiled your fast/slow Rust code with rustc 1.66.0 on Intel Core i7 860 and compared them side by side.
The IPC is low and this is caused by a lot of cache misses. An interesting thing is that the results are rather unusual, because the "slow" version was supposed to be actually better by all metrics when looking at the hardware performance counters (fewer mispredicted branches, fewer executed instructions, fewer cache misses). But cachegrind shows nearly identical number of cache misses for both slow and fast versions on an emulated processor and this is what we would normally expect.
So for the "fast" version LLVM generates a conditional jump and for the "slow" version it generates a pair of CMOVs. The choice between conditional branches and CMOVs in the LLVM generated code is a rather hot topic and in some cases CMOVs are preferable.
Why are CMOVs causing a slowdown in this test even though they reduce the number of mispredicted branches? The answer is that CMOV instructions block speculative execution. So the processor is just waiting in the case if a CMOV needs some value that is still being fetched from memory. Whereas a conditional branch gets predicted and the execution continues doing various stuff and reading data from memory. If branch prediction was wrong and the execution took a wrong path, then the results of this speculative execution are discarded. But the data that had been fetched into the CPU cache while executing the wrong path still remains in the cache and manifests itself as more cache misses reported by
perf stat
. This also enables Spectre.The problem is not in the Rust frontend, but in the LLVM backend. How can we tell LLVM not to use CMOVs here? I tried to find an answer, but the only usable workaround that had any effect was to compile it as
rustc --target=i586-unknown-linux-gnu -O test_slow.rs
. The old 32-bit i586 processors did not support CMOV instructions.Cool! And why it did not happen with clang then? Pure luck? Some tricky C++ optimization?
The version of LLVM used by Clang could be possibly different or it was indeed just pure luck. But just like Rust, without the
-mllvm -x86-cmov-converter-force-all=true
option Clang tends to generate CMOV instructions too. See the code generated for functions foo1 and foo2 at https://godbolt.org/z/3TPhjf1fP (the whole program can be used as a benchmark).It's hard to say as you usually won't have both c++ and rust solutions naturally. I had not had any unexpected problems with getting tl in rust where I had not expected it
Although one time I had problem with fitting my solution with segment tree and treap into tl I wrote same in C++ and it was 1 second (out of 7) slower than my rust solution
I saw a couple of times when Rust is slower than C++ if you are not careful enough (but could be optimized to have the same speed as C++ if you actually receive TL).
For example if you try to implement the Floyd-Warshall algorithm with
Vec<Vec<_>>
, it will be ~1.5 slower than C++ version just because on everyg[i][j]
Rust does additional bounds-checks. Usually Rust can eliminate such checks, so they don't add almost any overhead, but in case of 2-dimensional arrays it doesn't work very well. But it is possible to "prove" to the compiler that checks are not neccessary.Another example — if you have a lot of short strings, C++ will use small string optimization, while Rust doesn't have it, so you potentially could get ML (or TL because it doesn't fit into the cache of some level).
One more difference — by default in Rust you need to use usize (which is usually 64 bits) when you want to store some index, while in C++ you probably use just int (which is 32 bits). For example it is natural to store graphs as
Vec<Vec<usize>>
in Rust, which takes x2 memory compared tovector<vector<int>>
in C++.While I haven't studied this scientifically, my experience on Codeforces suggests that "natural" Rust code (meaning, without crazy/unsafe low-level optimizations) is about equally likely to be faster or slower than natural C++, so that overall it's about as competitive as C++. Some of the standard library algorithms (like
sort_unstable
) and data structures (sets and maps) are really fast!The only bit of
unsafe
I actually use in contests is my I/OScanner
: I don't trust myself to write newunsafe
code under time pressure.Cool. As a Rust fan, I'm really thankful for this blog. By the way, I have a question.
I had this question but I couldn't get any response from the community. If anyone knows, please help https://mirror.codeforces.com/blog/entry/109692
The I/O in my codebook reads and writes only one line at a time, so I think it'll handle interactive problems. Did you try it that way? It's important for the output to flush after every line, which
println!
will do for you.Cool. That sounds reasonable. Do you have any example or a template for interactive problems?
I think you can use either of the two Scanners from https://github.com/EbTech/rust-algorithms/blob/master/src/scanner.rs for input, and regular
println!
for output.Or you can do it exactly as in this post, using
flush()
after an interactive round of outputs. Let me know if it doesn't work!That's an example https://mirror.codeforces.com/contest/1918/submission/246092663