Microcontroller startup code written in C++

The startup code of a microcontroller is the first code executed after a reset. It performs microcontroller-specific initialisation, zero-clears RAM, performs the ROM-to-RAM initialisation of static variables and calls the constructors of static objects before calling main, as shown in the next figure:

Startup code is mostly written in assembly (sometimes in C) and usually requires detailed knowledge of the specific microcontroller as well as of C/C++ initialisation. This is why in most cases the startup code is provided by the compiler vendor or the microcontroller manufacturer, and the firmware application developer does not need to bother with those details. However, in certain cases (e.g. when one wants to develop one’s own RTOS) it can be necessary to write one’s own startup code.

When I started to work on the Snowfox port for the SiFive FE310-G002 found on the HiFive1 Rev. B board, I noticed that only hard-to-read assembly startup files were available, and that they were tightly integrated with the hardware abstraction layer provided by SiFive. This is quite at odds with the Snowfox design philosophy of providing clean and testable abstractions and highly readable and maintainable embedded code.

Since I happened to be reading Real-Time C++ by Christopher Kormanyos at about the same time, which has a chapter about writing your own startup code, I decided to give it a try myself. Here’s the final result (don’t worry, I’ll talk in detail about what happens below; the complete startup code can be found here):

extern "C" void __start(void)
{
  /* Load global pointer */
  asm volatile("la gp, __global_pointer$");

  /* Initialize stack pointer */
  asm volatile("la sp, __stack_end");
  asm volatile("andi sp, sp, -16");

  /* Clear the BSS segment */
  init_bss();

  /* Perform the ROM-to-RAM initialisation */
  init_data();

  /* Perform preinit/init steps - call static ctors */
  preinit_array();
  init_array();

  /* Jump to main */
  asm volatile("call main");

  /* In case we should return from main perform deinitialisation - calling static dtors */
  fini_array();

  /* Loop forever and don't return */
  for(;;) { }
}

The startup code works closely with the linker script which defines the locations of the various memory types (Flash, RAM, …). The complete linker script can be found here.

MEMORY
{
  flash (rxai!w) : ORIGIN = 0x20010000, LENGTH = 0x0006a120
  ram   (wxa!ri) : ORIGIN = 0x80000000, LENGTH = 0x00004000
  itim  (wxa!ri) : ORIGIN = 0x08000000, LENGTH = 0x00002000
}
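
To make the numbers in the MEMORY block a little more tangible, here is a small sketch that redoes the address arithmetic at compile time. The MemoryRegion type is a hypothetical helper introduced only for this illustration (it is not part of the startup code); the origins and lengths are copied from the linker script excerpt above.

```cpp
#include <cstdint>

// Hypothetical helper type mirroring one entry of the MEMORY command.
struct MemoryRegion
{
  std::uint32_t origin;
  std::uint32_t length;

  // First address past the end of the region.
  constexpr std::uint32_t end() const { return origin + length; }
};

// Values copied from the MEMORY block above.
constexpr MemoryRegion flash{0x20010000U, 0x0006a120U};
constexpr MemoryRegion ram  {0x80000000U, 0x00004000U};
constexpr MemoryRegion itim {0x08000000U, 0x00002000U};

// The checks below are evaluated by the compiler, not at run time.
static_assert(ram.length  == 16U * 1024U, "RAM region is 16 KiB");
static_assert(itim.length ==  8U * 1024U, "ITIM region is 8 KiB");
static_assert(ram.end()   == 0x80004000U, "RAM region ends at 0x8000 4000");
static_assert(flash.end() == 0x2007a120U, "flash region ends at 0x2007 A120");
```

With only 16 KiB of RAM shared between the data section, the bss section and the stack, it is worth keeping an eye on these sizes.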

Furthermore, the function to be executed after reset (__start) is defined in the linker script using the ENTRY command.

ENTRY( __start )

Since __start is located in a C++ file (startup.cpp) it will be compiled by the C++ compiler. C++ compilers mangle function names, which is why the function declaration has to be prefixed with extern "C": this gives the function C language linkage, so its symbol keeps the plain name __start that the linker script refers to.

extern "C" void __start(void) { /* ... */

We need to make sure that the first instruction of the startup code is placed on the first address of the flash memory (0x2001 0000). This is achieved by a combination of linker script magic and function attributes. In the linker script a .text section for storing code is defined.

SECTIONS
{
  .text           :
  {
    *(.text.startup .text.startup.*)
    *(.text .text.*)
    /* ... */
  } >flash AT>flash :flash

By appending the section attribute to the declaration of __start we enforce the placement of the function in the desired section of the flash memory (which is at the beginning of the .text section which starts at the beginning of the flash at the address 0x2001 0000). The noreturn attribute informs the compiler that this function will never return and allows for certain optimisations.

extern "C" void __start(void) __attribute__ ((noreturn)) __attribute__ ((section (".text.startup")));

Now that we have ensured that the startup code is placed at the right place, we need to take care of CPU-specific initialisation. In the case of the SiFive FE310-G002 processor this is achieved by the following assembly statements (sorry, there’s a need for a tiny bit of assembly):

asm volatile("la gp, __global_pointer$");
asm volatile("la sp, __stack_end");
asm volatile("andi sp, sp, -16");
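
The third statement rounds the stack pointer down to a multiple of 16 bytes, which is the stack alignment expected by the standard RISC-V calling convention. The following sketch reproduces the same bit trick in plain C++ so it can be checked on a host machine. The function name align_down_16 and the sample addresses are made up for this illustration.

```cpp
#include <cstdint>

// Round an address down to a multiple of 16, like `andi sp, sp, -16` does.
constexpr std::uint32_t align_down_16(std::uint32_t addr)
{
  // -16 as a 32-bit two's complement value is 0xFFFFFFF0: all bits set
  // except the lowest four, so the AND clears exactly those four bits.
  return addr & static_cast<std::uint32_t>(-16);
}

static_assert(static_cast<std::uint32_t>(-16) == 0xFFFFFFF0U, "mask value");
static_assert(align_down_16(0x80004000U) == 0x80004000U, "already aligned, unchanged");
static_assert(align_down_16(0x80003FFCU) == 0x80003FF0U, "rounded down");
static_assert(align_down_16(0x80003FF1U) == 0x80003FF0U, "rounded down");
```

Rounding down (rather than up) is the safe direction here, since the stack grows towards lower addresses and the aligned pointer therefore stays inside the reserved stack area.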

The next step is to zero-clear all static and global variables which have not been explicitly initialised by the programmer. These variables are stored in RAM and must start out as zero (for arithmetic types) or as a null pointer (for pointer types), as required by C and C++. In order to conveniently zero-clear them in one fell swoop, the linker aggregates all those variables within a contiguous memory region known as the bss section. The start and end addresses of the bss section are provided via the symbols __bss_start and __bss_end, which are defined in the linker script and made available to the compiler.

SECTIONS
{
  /* ... */
  PROVIDE( __bss_start = . );
  .bss            :
  {
    *(.bss .bss.*)
  } >ram AT>ram :ram
  PROVIDE( __bss_end = . );
  /* ... */

Zero-clearing those variables is then as easy as zeroing every value between the start and the end address of the bss section.

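#include <algorithm> /* std::fill */
#include <cstdint>   /* std::uintptr_t */

/* Symbols provided by the linker script */
extern std::uintptr_t __bss_start;
extern std::uintptr_t __bss_end;

void init_bss()
{
  /* Zero every word between the start and the end of the bss section */
  std::fill(&__bss_start, &__bss_end, 0U);
}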
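
Since the linker symbols only exist on the target, the same technique can be tried out on a host machine with an ordinary array standing in for the bss section. This is only a sketch: fake_bss, clear_range, is_all_zero and demo_clear_bss are hypothetical names introduced for the illustration. Like init_bss it clears word by word, so it assumes that the range is a whole number of words long, which on the target the linker script has to guarantee through alignment.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the RAM between __bss_start and __bss_end,
// pre-filled with garbage as RAM would be after power-up.
std::array<std::uintptr_t, 4> fake_bss = {0xDEADBEEFU, 0x12345678U, 0xFFFFFFFFU, 0x1U};

// Same technique as init_bss(), but with the range passed in explicitly.
void clear_range(std::uintptr_t * first, std::uintptr_t * last)
{
  std::fill(first, last, 0U);
}

// Helper used to check the result on the host.
bool is_all_zero(std::uintptr_t const * first, std::uintptr_t const * last)
{
  return std::all_of(first, last, [](std::uintptr_t const v) { return v == 0U; });
}

// Clears the fake bss section and verifies that every word is zero.
void demo_clear_bss()
{
  std::uintptr_t * const first = fake_bss.data();
  std::uintptr_t * const last  = fake_bss.data() + fake_bss.size();

  clear_range(first, last);
  assert(is_all_zero(first, last));
}
```

Passing the range as parameters is what makes the logic testable off-target; on the target the two arguments would simply be &__bss_start and &__bss_end.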

Now we have to perform the ROM-to-RAM initialisation, which concerns all static and global variables which have been explicitly initialised by the programmer. These variables are placed in a contiguous section of the RAM known as the data segment, similar to the bss segment for zero-cleared variables. The start and end addresses of the data segment are provided via the symbols __data_dst_start and __data_dst_end.

SECTIONS
{
  /* ... */
  PROVIDE( __data_dst_start = . );
  .data          :
  {
    *(.data .data.*)
  } >ram AT>flash :ram_init
  PROVIDE( __data_dst_end = . );
  /* ... */

The values with which those variables are initialized are stored in the flash in a section known as rodata.

SECTIONS
{
  /* ... */
  PROVIDE( __data_src_start = . );
  .rodata         :
  {
    *(.rodata .rodata.*)
  } >flash AT>flash :flash
  /* ... */

Since both the data and rodata sections have the same length and internal structure, performing the ROM (flash)-to-RAM initialisation is simply a matter of copying from the flash to the RAM.

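/* Symbols provided by the linker script (type assumed to be the
 * same as for __bss_start and __bss_end).
 */
extern std::uintptr_t __data_src_start;
extern std::uintptr_t __data_dst_start;
extern std::uintptr_t __data_dst_end;

void init_data()
{
  /* Calculate the size of the data section (in words) */
  std::size_t const cnt = (&__data_dst_end - &__data_dst_start);

  /* Copy the initial values from flash to RAM */
  std::copy(&__data_src_start,
            &__data_src_start + cnt,
            &__data_dst_start);
}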
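
The copy step can be exercised on a host machine in the same spirit, with two ordinary arrays playing the roles of the flash image and the RAM destination. Again this is just a sketch under assumptions: fake_flash_image, fake_ram_data, copy_image and demo_copy_data are hypothetical names, and the three sample values are arbitrary.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: the initial values stored in "flash" and the
// not yet initialised variables in "RAM".
std::array<std::uintptr_t, 3> const fake_flash_image = {42U, 7U, 0xCAFEU};
std::array<std::uintptr_t, 3>       fake_ram_data    = {0U, 0U, 0U};

// Same technique as init_data(), with the three addresses passed in explicitly.
void copy_image(std::uintptr_t const * src_start,
                std::uintptr_t       * dst_start,
                std::uintptr_t       * dst_end)
{
  // The destination range determines how many words are copied.
  std::size_t const cnt = static_cast<std::size_t>(dst_end - dst_start);

  std::copy(src_start, src_start + cnt, dst_start);
}

// Copies the fake flash image to the fake RAM and verifies the result.
void demo_copy_data()
{
  copy_image(fake_flash_image.data(),
             fake_ram_data.data(),
             fake_ram_data.data() + fake_ram_data.size());

  assert(fake_ram_data == fake_flash_image);
}
```

Note that the word count is derived from the destination range only, so the scheme relies on the source image in flash being at least that long.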

Now it’s time to call the constructors for static or global objects. The function pointers for those constructors are stored in the flash, in the preinit_array and init_array sections defined in the linker script. These are basically function pointer arrays where every entry points to a constructor for a static or global object.

SECTIONS
{
  /* ... */
  .preinit_array  :
  {
    PROVIDE (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE (__preinit_array_end = .);
  } >flash AT>flash :flash

  .init_array     :
  {
    PROVIDE (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE (__init_array_end = .);
  } >flash AT>flash :flash
  /* ... */

Calling all those static constructors is now simply a matter of iterating over the function pointer array and calling every function.

typedef void(*FuncType)(void);

extern FuncType __preinit_array_start[];
extern FuncType __preinit_array_end  [];
extern FuncType __init_array_start   [];
extern FuncType __init_array_end     [];

void preinit_array()
{
  std::for_each(__preinit_array_start,
                __preinit_array_end,
                [](FuncType const func)
                {
                  func();
                });
}

void init_array()
{
  std::for_each(__init_array_start,
                __init_array_end,
                [](FuncType const func)
                {
                  func();
                });
}
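
To see the iteration at work without a target board, the following sketch walks a hand-written function pointer table on the host and records the call order. Everything except FuncType is hypothetical and only serves the illustration: the three ctor_* functions, call_log, fake_init_array, run_table and demo_init_array.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

typedef void (*FuncType)(void);

// Records the order in which the hypothetical "constructors" run.
std::vector<int> call_log;

void ctor_a() { call_log.push_back(1); }
void ctor_b() { call_log.push_back(2); }
void ctor_c() { call_log.push_back(3); }

// Hypothetical stand-in for the table between __init_array_start
// and __init_array_end.
FuncType const fake_init_array[] = {ctor_a, ctor_b, ctor_c};

// Same technique as init_array(), with the table bounds passed in explicitly.
void run_table(FuncType const * first, FuncType const * last)
{
  std::for_each(first, last, [](FuncType const func) { func(); });
}

// Runs the fake table and verifies that the entries were called in table order.
void demo_init_array()
{
  call_log.clear();
  run_table(std::begin(fake_init_array), std::end(fake_init_array));
  assert((call_log == std::vector<int>{1, 2, 3}));
}
```

The entries are called front to back, which is why the order in which the linker script sorts and places them in the section matters.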

With all initialisation complete we now have to call main. Since calling main directly from within C++ code is forbidden by the language standard, we have to use assembly, which looks like this for RISCV64-GCC:

asm volatile("call main");

Usually the main function does not return, but in case it should, the destructors for all static and global objects should be called. The function pointers for those destructors are stored in the flash, in the fini_array section defined in the linker script.

SECTIONS
{
  /* ... */
  .fini_array     :
  {
    PROVIDE (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE (__fini_array_end = .);
  } >flash AT>flash :flash
  /* ... */

In a similar way to the constructors for static and global objects, the destructors can be called by iterating over this function pointer array:

typedef void(*FuncType)(void);

extern FuncType __fini_array_start[];
extern FuncType __fini_array_end  [];

void fini_array()
{
  std::for_each(__fini_array_start,
                __fini_array_end,
                [](FuncType const func)
                {
                  func();
                });
}

After calling the destructors for all static and global objects I’ve chosen to run an infinite loop. Other possibilities are triggering a reset or calling an error handler function.