thumb2-stack-size

A tool for estimating Thumb/Thumb2 binary stack usage.

Microcontroller development often requires explicit management of stacks. Microcontrollers by and large do not have virtual addressing capabilities, thus their stacks must be pre-allocated entirely in advance. They also mostly feature only small quantities of RAM, thus stacks on microcontrollers are usually small. This present a bit of a dilemma: enough stack space must be allocated to ensure that no overruns occur, but as little memory as possible should be wasted on stacks.

This tool can estimate how much stack space is used by a given Thumb or Thumb2 binary.

Prerequisited

thumb2-stack-size supports only 32 bit statically linked ARM executable ELF files at this time. The ELF files must have symbols information, debug information is not necessary. The symbol table are used to reconstruct functions from the input binary.

Estimation algorithm

The input file is read and split into functions using the symbol table to locate functions and their sizes. Each function, at this point a stream of instructions, is then parsed instruction for instruction. If any instructions are encountered that are not defined by the ARM ISA an error is raised and estimation is aborted. ("Defined" includes the permanently undefined instructions udf and udf.w).

The resulting functions are run through a virtual machine that tracks stacks manipulations done by each instruction. Stack manipulations can be either static (eg push, pop, reserving large stack chunks at once) or dynamic (eg alloca). Traces can also be static or dynamic, but may also be divergent: if two traces lead to the same instruction, but reach that instruction with different stack pointer values, the function cannot be analyzed fully. The tool will attempt a best-effort analysis and calculate a lower bound on the functions stack requirements.

All possible traces through the function are simulated to give three possible results for a given function:

This tool notable does not support analysis of table branches (ie the tbb and tbh instructions). Table branches are often emitted by the Rust formatting code. Whenever such a branch is discovered a warning is issued. The binary can be recompile with -C llvm-args=-min-jump-table-entries=99999 and reanalyzed; the -min-jump-table-entries option is set to such a high value here that LLVM will not allowed to emit table branches. The resulting binary is functionally identical with the same stack properties, but likely larger.

Output format

When run without options the tool will output all warnings, followed by a table that contains one line for each function. The table lists

Each warning is annotated with the (demangled) name of the symbol that caused the warning and its address. The address is useful to disambiguate functions that have the same demangled name, as is common for generic functions that are called with different type signatures.

A small program that contains indirect branches (in this case calls through a function pointer) could produce the following output:

``` warning: function __reset @ 8000040 cannot be statically analyzed * indirect branches found warning: function hcl::timer::TimerSet::tick @ 80001cc cannot be statically analyzed * indirect branches found warning: function timer::systick @ 8000080 cannot be statically analyzed * calls to dynamic functions found: 80001cc,

8000040 0 dynamic _reset 8000080 48 dynamic timer::systick 8000084 128 static timer::main 8000134 136 static _hclentry 800013e 0 static hcl::qlist::QlistData::unlink 800015c 0 static hcl::qlist::QlistData::popfront 8000168 0 static >::unsafeinsertbefore 80001cc 48 dynamic hcl::timer::TimerSet::tick 800023c 32 static hcl::timer::TimerSet::addstub 8000272 0 static hcl::timer::invokeindirect 8000296 16 static hcl::timer::invokeindirect 80002b6 32 static hcl::timer::invokeindirect ```

Additionally to the table the tool can also output the call graph of all functions. The call graph may be used to track down indirect function calls if the developer has knowledge about certain functions that is not available to the estimation algorithm.

The call graph is split into newline-separated sections, one section for each function in the input. Each function is traversed in depth-first order, all callees of the function are listed recursively in the process. Mutually recursive functions are also printed, but the call graph is truncated with an ellipsis marker at the first function that appears twice in the graph.

Each line of a call graph contains the address of the called function, beginning with the function for which the call graph is printed. For each function the analysis result is printed as well, followed by the function name.

The same binary as above produces the following call graph:

``` 8000040 0 dynamic __reset

8000080 48 dynamic timer::systick 80001cc 48 dynamic └ hcl::timer::TimerSet::tick 800015c 0 static ├ hcl::qlist::QlistData::popfront 800013e 0 static │ └ hcl::qlist::QlistData::unlink 8000168 0 static └ >::unsafeinsert_before

8000084 128 static timer::main 800023c 32 static └ hcl::timer::TimerSet::addstub 8000168 0 static └ >::unsafeinsert_before

8000134 136 static _hclentry 8000084 128 static └ timer::main 800023c 32 static └ hcl::timer::TimerSet::addstub 8000168 0 static └ >::unsafeinsert_before

800013e 0 static hcl::qlist::QlistData::unlink

800015c 0 static hcl::qlist::QlistData::pop_front 800013e 0 static └ hcl::qlist::QlistData::unlink

8000168 0 static >::unsafeinsertbefore

80001cc 48 dynamic hcl::timer::TimerSet::tick 800015c 0 static ├ hcl::qlist::QlistData::popfront 800013e 0 static │ └ hcl::qlist::QlistData::unlink 8000168 0 static └ >::unsafeinsert_before

800023c 32 static hcl::timer::TimerSet::addstub 8000168 0 static └ >::unsafeinsert_before

8000272 0 static hcl::timer::invoke_indirect

8000296 16 static hcl::timer::invoke_indirect 800013e 0 static └ hcl::qlist::QlistData::unlink

80002b6 32 static hcl::timer::invokeindirect 800013e 0 static ├ hcl::qlist::QlistData::unlink 8000168 0 static └ >::unsafeinsert_before

8000040 0 dynamic _reset 8000080 48 dynamic timer::systick 8000084 128 static timer::main 8000134 136 static _hclentry 800013e 0 static hcl::qlist::QlistData::unlink 800015c 0 static hcl::qlist::QlistData::popfront 8000168 0 static >::unsafeinsertbefore 80001cc 48 dynamic hcl::timer::TimerSet::tick 800023c 32 static hcl::timer::TimerSet::addstub 8000272 0 static hcl::timer::invokeindirect 8000296 16 static hcl::timer::invokeindirect 80002b6 32 static hcl::timer::invokeindirect ```

Note that hcl::timer::invoke_indirect appears multiple times, bit with different addresses.

This view may be used to deduce more information about stack usage. In the example case it is known that systick calls hcl::timer::TimerSet::tick, which in turn may call any of the hcl::timer::invoke_indirect functions. Here tick uses at least 48 bytes, plus at most 32 bytes for any of the invoke_indirect functions. With this we can deduce that tick will not require more than 80 bytes of stack space.