cargo-call-stack
Static, whole program stack analysis
Other examples: Embedded CoAP / IPv4 server (source) "Hello, world!"
HEADS UP: This tool relies on an experimental feature (-Z stack-sizes
)
and implementation details of rustc
(like symbol mangling) and could stop
working with a nightly toolchain at any time. You have been warned!
The tool produces the full call graph of a program as a [dot file].
A start point can be specified to analyze only the call graph that begins at that function.
Each node (function) in the call graph includes the local stack usage of the
function, if available (see [-Z emit-stack-sizes
]).
If there's at least some stack usage information available then the maximum stack usage of each function is also computed, or at least a lower bound is provided. Maximum stack usage of a function here refers to the stack usage that includes the stack used by functions that the function may invoke.
The tool has imperfect support for calls through function pointers (fn()
)
and dynamic dispatch (dyn Trait
). You will get a call graph from programs
that do indirect calls but it will likely be missing edges or contain
incorrect edges. It's best to use this tool on programs that only do direct
function calls.
console
$ # NOTE always use the latest stable release
$ cargo +stable install cargo-call-stack
The tool builds your program in release mode with LTO enabled, analyses it and
then prints a dot file to stdout. See cargo call-stack -h
for a list of build
options (e.g. --features
).
IMPORTANT: As the author of the program you must ensure that the
.stack_sizes
section survives the linker (doesn't get GC-ed). Unfortunately we
can't do that for you because tweaking the linking process from Cargo is tricky
and error prone. If you are using the [cortex-m-rt
] crate then everything will
Just Work. But if you are not then you'll need to use a linker script; see the
documentation on [-Z emit-stack-sizes
] for more information. If you don't
preserve the .stack_sizes
then the tool will have zero information about the
stack usage of your program.
console
$ cargo +nightly call-stack --example app > cg.dot
warning: assuming that asm!("") does *not* use the stack
warning: assuming that asm!("") does *not* use the stack
Graphviz's dot
can then be used to generate an image from this dot file.
console
$ dot -Tsvg -Nfontname='Fira Code' -Nshape=box cg.dot > cg.svg
Each node in this graph represents a function, which could be a free function,
an inherent method or a trait method. Each directed edge indicates a "calls"
relationship. For example, in the above graph Reset
calls both main
and
DefaultPreInit
.
Each node also contains its local
stack usage in bytes and its max
-imum
stack usage, also in bytes. The maximum stack usage includes the stack usage of
all the other functions that the function could invoke.
This is the no_std
program used to generate the call graph shown above.
``` rust
extern crate panic_halt;
use core::ptr;
use cortexmrt::{entry, exception};
fn main() -> ! { foo();
bar();
loop {}
}
fn foo() { // spill variables onto the stack unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5)) } }
fn bar() { unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7)) } }
fn SysTick() { bar(); }
fn baz() {
let x = 0;
unsafe {
// force x
to be on the stack
ptr::read_volatile(&&x);
}
} ```
In the previous example the call graph contained disconnected subgraphs. The reason for that is exceptions (also known as interrupts).
SysTick
, for example, is an exception handler that can preempt any function called fromReset
. This exception handler is never called from software but can be invoked by the hardware at any time. These exception handlers can appear as the roots of disconnected subgraphs.
In some cases you may be interested in the maximum stack usage of a particular function. The tool lets you specify a start point which will be used to filter the call graph to only include nodes reachable from that function.
If we invoke the tool on the previous program but select main
as the start
point we get this call graph:
console
$ cargo +nightly call-stack --example app main > cg.dot
warning: assuming that asm!("") does *not* use the stack
warning: assuming that asm!("") does *not* use the stack
Notice that SysTick
and baz
don't appear in this call graph since they are
not reachable from main
.
The tool can, in some cases, compute the maximum stack usage of programs that involve recursion. Recursion appears as cycles in the call graph. Consider the following example:
``` rust
extern crate panic_halt;
use core::sync::atomic::{AtomicBool, Ordering};
use cortexmrt::{entry, exception};
static X: AtomicBool = AtomicBool::new(true);
fn main() -> ! { foo();
quux();
loop {}
}
// these three functions form a cycle that breaks when SysTick
runs
fn foo() { if X.load(Ordering::Relaxed) { bar() } }
fn bar() { if X.load(Ordering::Relaxed) { baz() } }
fn baz() { if X.load(Ordering::Relaxed) { foo() } }
fn quux() { // spill variables onto the stack unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5)) } }
fn SysTick() { X.store(false, Ordering::Relaxed); } ```
It produces the following call graph:
The functions foo
, bar
and baz
use zero stack space thus the cycle formed
by them also uses zero stack space. In this particular case the maximum stack
usage of main
can be computed.
For the curious this is the disassembly of the "cyclic" program:
``` armasm
08000400
08000414
08000428
0800043c
08000450
And yes, the estimated maximum stack usage is correct as shown in this debug session:
``` console (gdb) b app::foo
(gdb) b app::bar
(gdb) b app::baz
(gdb) c Continuing.
Breakpoint 3, main () at src/main.rs:16 16 foo();
(gdb) p $sp $1 = (void *) 0x20005000
(gdb) c Continuing. halted: PC: 0x08000400
Breakpoint 4, app::foo () at src/main.rs:31 31 if X.load(Ordering::Relaxed) {
(gdb) p $sp $2 = (void *) 0x20005000
(gdb) c Continuing. halted: PC: 0x0800040c
Breakpoint 5, app::bar () at src/main.rs:38 38 if X.load(Ordering::Relaxed) {
(gdb) p $sp $3 = (void *) 0x20005000
(gdb) c Continuing. halted: PC: 0x08000420
Breakpoint 6, app::baz () at src/main.rs:45 45 if X.load(Ordering::Relaxed) {
(gdb) p $sp $4 = (void *) 0x20005000
(gdb) c Continuing. halted: PC: 0x08000434
Breakpoint 4, app::foo () at src/main.rs:31 31 if X.load(Ordering::Relaxed) {
(gdb) p $sp $5 = (void *) 0x20005000 ```
In some cases the tool can produce correct call graphs for programs that use trait objects -- more details about where and how it fails in the "Known limitations" section. Here's an example:
``` rust
extern crate panic_halt;
use cortexmrt::{entry, exception}; use spin::Mutex; // spin = "0.5.0"
static TO: Mutex<&'static (dyn Foo + Sync)> = Mutex::new(&Bar);
fn main() -> ! { // trait object dispatch (*TO.lock()).foo();
Quux.foo();
loop {}
}
trait Foo { // default implementation of this method fn foo(&self) -> bool { // spill variables onto the stack unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5)) }
false
}
}
struct Bar;
// uses the default method implementation impl Foo for Bar {}
struct Baz;
impl Foo for Baz { // overrides the default method fn foo(&self) -> bool { unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7)) }
true
}
}
struct Quux;
impl Quux { // not a trait method! #[inline(never)] fn foo(&self) -> bool { // NOTE(asm!) side effect to preserve function calls to this method unsafe { asm!("NOP" : : : : "volatile") }
false
}
}
// this handler can change the trait object at any time
fn SysTick() { *TO.lock() = &Baz; } ```
The tool produces the following call graph:
Here i1 ({}*)
denotes dynamic dispatch of a method with (Rust) signature
fn(&[mut] self) -> bool
. The dynamic dispatch can invoke either Bar.foo
,
which boils down to the default method implementation (app::Foo::foo
in the
graph), or Baz.foo
(<app::Baz as app::Foo>::foo
in the graph). In this case
the tool does not a draw an edge between i1 ({}*)
and Quux::foo
, whose
signature is also fn(&self) -> bool
, so the call graph is accurate.
If you are wondering why we use LLVM notation for the function signature of the
trait method: that's because the tool operates on LLVM-IR where there's no
bool
primitive and most of Rust's type information has been erased.
In some cases the tool can produce correct call graphs for programs that
invoke functions via pointers (e.g. fn()
). Here's an example:
``` rust
extern crate panic_halt;
use core::sync::atomic::{AtomicPtr, Ordering};
use cortexmrt::{entry, exception};
static F: AtomicPtr
fn main() -> ! { if let Some(f) = unsafe { F.load(Ordering::Relaxed).as_ref() } { // call via function pointer f(); }
loop {}
}
fn foo() -> bool { // spill variables onto the stack unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5)) }
false
}
fn bar() -> bool { unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7)) }
true
}
// this handler can change the function pointer at any time
fn SysTick() { F.store(bar as *mut _, Ordering::Relaxed); } ```
The tool produces the following call graph:
The node i1 ()*
represents a call via function pointer -- the LLVM type i1
()*
is equivalent to Rust's fn() -> bool
. This indirect call could invoke
foo
or bar
, the only functions with signature fn() -> bool
.
To reason about indirect function calls the tool uses the type information available in the LLVM-IR of the program. This information does not exactly match Rust's type information and leads to mislabeling of functions. For example, consider this program:
``` rust
extern crate panic_halt;
use core::{ ptr, sync::atomic::{AtomicPtr, Ordering}, };
use cortexmrt::{entry, exception};
static F: AtomicPtr
fn main() -> ! { if let Some(f) = unsafe { F.load(Ordering::Relaxed).as_ref() } { // call via function pointer f(); }
let x = baz();
unsafe {
// NOTE(volatile) used to preserve the return value of `baz`
ptr::read_volatile(&x);
}
loop {}
}
// this handler can change the function pointer at any time
fn SysTick() { F.store(bar as *mut _, Ordering::Relaxed); }
fn foo() -> u32 { // spill variables onto the stack unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5)) }
0
}
fn bar() -> u32 { 1 }
fn baz() -> i32 { unsafe { asm!("" : : "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7)) }
F.load(Ordering::Relaxed) as usize as i32
} ```
The tool produces the following call graph:
Note that the node that represents the indirect function call has type i32 ()*
(fn() -> i32
), not u32 ()*
. The reason is that there's no u32
type in
LLVM, there are only signed integers. This leads the tool to wrongly add an edge
between i32 ()*
and baz
. If the tool had Rust's type information then this
edge would have not been added.
Due to how LLVM works all compiler intrinsics, software implementations of functionality not available as instructions in the target ISA (e.g. multiplication of 64-bit integers), need to be a separate object file that gets linked into the final binary.
In Rust all these compiler intrinsics are packed in the
libcompiler_builtins
rlib. This rlib is distributed via rustup
and always
compiled without -Z emit-stack-sizes
so it contains no stack usage
information. Furthermore, the metadata in the compiler-builtins
crate is never
accessed when compiling a crate so no LLVM-IR is ever produced from it thus the
tool has no information about the call dependencies of the compiler intrinsics.
All these unknowns are currently papered over in the tool using "ad hoc
knowledge". For example, we now that __aeabi_memclr4
invokes
__aeabi_memset4
and that __aeabi_memset4
uses 8 bytes of stack on
thumbv7m-none-eabi
as of Rust 1.33.0 so the tool uses this information when
building the call graph. Obviously, this approach doesn't scale and this ad hoc
knowledge is likely to get outdated as compiler intrinsics are modified (to
optimize them) over time.
The tool assumes that all instances of inline assembly (asm!
) use zero bytes
of stack. This is not always the case so the tool prints a warning message
for each asm!
string it encounters.
The tool assumes that branching (calling a function) does not use the stack (i.e. no register is pushed onto the stack when branching). This may not be true on all the architectures that Rust supports -- it is true on ARM Cortex-M.
The tool only supports ELF binaries because -Z emit-stack-sizes
only supports
the ELF format.
Licensed under either of
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.