Archive for November, 2014

Notes To Myself: EFM32 and heaps of external SRAM

Goal:
Use the EFM32 microcontroller’s External Bus Interface (EBI) to attach a large external SRAM and work with data larger than the chip’s internal memory allows. Support dynamic memory allocation via the standard malloc()/calloc() calls probably present in whatever 3rd-party code-snarfed-from-the-internet you are trying to integrate.

Solution:
First off, ignore any notes about needing to ground the 0th address bit on the memory and shift the remaining address lines, as stated in the EFM32 appnotes/manuals. Unless very explicitly stated otherwise, 1 address increment == 1 address change at the memory’s word size. For example, changing A[0] on a 16-bit SRAM generally addresses the next 16-bit memory location.

Sidenote about external memory address lines: If they are actually numbered in the RAM’s datasheet, this is an extremely polite suggestion only. In practice, it doesn’t matter if A[0..n] from the MCU map to A[0..n] of the memory in order; if the address lines are swapped around, they are swapped around for both read and write, so it doesn’t matter one bit (har!) to the MCU. Incidentally, same goes for the data lines. So feel free to run them however makes the PCB routing easier.
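To convince yourself, here is a small PC-side simulation of a scrambled address bus. It is only a sketch: the wiring table is an arbitrary made-up permutation, and bus_write()/bus_read() are hypothetical stand-ins for EBI accesses, not anything from the EFM32 SDK.

```c
#include <stdint.h>

#define ADDR_BITS 4u
#define MEM_WORDS (1u << ADDR_BITS)

/* Hypothetical board wiring: MCU address line i is routed to SRAM address
   pin wiring[i]. Any permutation of 0..ADDR_BITS-1 behaves the same way. */
static const unsigned wiring[ADDR_BITS] = { 2, 0, 3, 1 };

/* The SRAM's physical cells, indexed by the address the chip actually sees. */
static uint16_t sram_cells[MEM_WORDS];

/* Translate the address the MCU drives into the address the chip sees. */
static unsigned scramble(unsigned mcu_addr)
{
    unsigned chip_addr = 0;
    for (unsigned i = 0; i < ADDR_BITS; i++) {
        if (mcu_addr & (1u << i))
            chip_addr |= 1u << wiring[i];
    }
    return chip_addr;
}

/* Reads and writes go through the same scrambled wiring. */
void bus_write(unsigned mcu_addr, uint16_t value)
{
    sram_cells[scramble(mcu_addr)] = value;
}

uint16_t bus_read(unsigned mcu_addr)
{
    return sram_cells[scramble(mcu_addr)];
}
```

Writing a distinct value to every MCU address and reading them all back returns exactly what was written, because a permutation maps each MCU address to its own unique cell. This covers plain asynchronous SRAM, where every address is independent; I would not assume it for memories that treat some address bits specially (page or burst modes, for instance).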

Setting up the heap in external memory:
You probably want bulk data to go to the external RAM, but keep your stack and most of your code’s internal housekeeping in the internal memory, which is faster and likely eats less juice. If the code uses malloc() and friends to get at that bulk memory, this means creating the heap in external RAM.

The EFM32’s internal RAM starts at 0x20000000. Unless you do something funky, memory on the EBI maps in starting at 0x80000000.

This means tweaks to the vendor-supplied linker file (*.ld) to…

a) Tell it about the memory:
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 262144
  RAM (rwx)   : ORIGIN = 0x20000000, LENGTH = 32768
  EXRAM (rwx) : ORIGIN = 0x80000000, LENGTH = 0x00200000
  /* Add the EXRAM line above. Don't touch the CPU-specific FLASH/RAM
     base address or length from the original linker file. */
}

b) Tell it to place the heap there:

 .heap :
 {
   __end__ = .;
   end = __end__;
   _end = __end__;
   *(.heap*)
   __HeapLimit = .;
 } > EXRAM /* Change 'RAM' to the 'EXRAM' section you just defined */

BUT… As mentioned above, the external RAM has a much higher physical address than the internal RAM. This will confuse a check later in the vendor linker file, which assumes that all the memory is allocated in the same segment, that the stack is allocated starting from the end of RAM (growing downward), and thus that the stack sits at the highest RAM address anywhere. Since this is no longer true, this check needs to be modified so as not to generate a false stack collision warning:

Change this line

 /* ASSERT(__StackLimit >= __HeapLimit, "region RAM overflowed with stack") */

to this:
 /* The above assumes heap will always be at the top of (same) RAM section.
    Since it's now in its own section, simply check that the STACK did not
    overflow RAM. This modified check assumes the '.bss' section is the last
    one allocated (i.e. highest non-stack allocation) in main RAM. */
 ASSERT(__StackLimit >= __bss_end__, "region RAM overflowed with stack")

Step 2: Tell the Compiler.
Now that we’ve told the linker, we need to tell the compiler/assembler. If you just build the code now, you will get a heap starting at 0x80000000 as expected, but with some tiny default size chosen by the vendor. This magic value is defined in the ‘startup_efm32wg.S’ (or part-specific equivalent) file buried in the SDK. This will be at e.g. “X:\path\to\SDK\version\developer\sdks\efm32\v2\Device\SiliconLabs\EFM32WG\Source\GCC\startup_efm32wg.S” . What’s the difference between the ‘.S’ file here (uppercase S) and the ‘.s’ (lowercase s) file located in ‘g++’? Don’t ask me. What’s the difference between either of these in /Device/SiliconLabs vs. the same files in /Device/EnergyMicro ? Don’t ask me. There are also compiler-specific variants (Atollic, etc.) and an ‘ARM’ version. Don’t ask me…

Anyway, once you figure out which one your project is actually using, open it and you should find a line like:
    .section .heap
    .align 3
#ifdef __HEAP_SIZE
    .equ    Heap_Size, __HEAP_SIZE
#else
    .equ    Heap_Size, 0xC00
#endif

The specifics might vary depending on your exact CPU and its memory size of course (assuming the vendor selects a larger default value for those with larger internal memory, but I could be wrong.) So we just have to define __HEAP_SIZE somewhere and bob’s your uncle, right?

Er, sort of. There are two nuances to notice, in case your situation slightly differs from mine. One is that the double underscore before HEAP_SIZE looks like a standard compiler-added decoration (i.e. name mangling), which raises the question of whether the tools expect you to supply the mangled, unmangled or some semi-mangled version of this name. (Strictly speaking, a symbol tested with #ifdef is a plain preprocessor macro and is matched exactly as written; the leading underscores are just a naming convention for reserved names.) The other is that the ‘.s’ (or ‘.S’) file is an assembler file, not a C file. So in this case you actually need to pass the magic value to the assembler, not the compiler (and beware that the two may in fact be set up with different symbol definitions). What a mess!

I figured the easiest way to figure out exactly what was expected was experimentally. If using the Simplicity Studio GUI/IDE, you can mousedance your way into Project -> Properties -> Settings -> Tool Settings -> toolname -> Symbols -> Defined symbols and add the symbol definitions there. So I created six versions in total: all three mangling permutations (HEAP_SIZE, _HEAP_SIZE and __HEAP_SIZE) for both the assembler and the compiler, with a different size value for each, then fished in the .map file after compilation to see which one ‘took’. In my particular case, it was the version passed to the assembler, with the fully mangled (double underscore) name. YMMV. Are there any cases where it must be passed to both the compiler and the assembler? Don’t ask me. When you find out which your particular setup is expecting, set the value to match the external memory size and delete the extra definitions.
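For what it's worth, GCC runs ‘.S’ (uppercase) files through the C preprocessor before assembling them, while ‘.s’ files are assembled as-is, so the #ifdef in the startup file behaves just like one in C. Below is a minimal sketch of the same fallback pattern in plain C, assuming nothing beyond what the startup file shows. HEAP_SIZE_BYTES and configured_heap_size() are made-up names for illustration.

```c
/* Same pattern as the startup file: use __HEAP_SIZE if the build defines it,
   otherwise fall back to the vendor default of 0xC00 bytes. */
#ifdef __HEAP_SIZE
#define HEAP_SIZE_BYTES __HEAP_SIZE
#else
#define HEAP_SIZE_BYTES 0xC00
#endif

/* Hypothetical helper: reports which heap size this build ended up with. */
unsigned long configured_heap_size(void)
{
    return HEAP_SIZE_BYTES;
}
```

Built with no extra flags, this reports the 0xC00 default. Building with GCC's -D option, e.g. -D__HEAP_SIZE=0x200000, should make it report the external RAM size instead. Whether your IDE actually hands its "Defined symbols" to the assembler step as -D flags is worth confirming in the build log.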

Step 3: Fix any remaining braindead checks.
When you use dynamic memory allocation, malloc() and friends (usually, probably) call a deep internal library function called _sbrk. Among other things, this function performs a check similar to the one we just fixed in the EFM32 linker file, failing nastily if it ever allocates heap memory with a higher address than the lowest stack allocation (at least in GCC). So to get around this, you have to override the builtin _sbrk with a fixed copy. If you are using the vendor’s ‘retargetio.c’ for anything (e.g. delivering printf output to the SWO debug pin), this file redefines a bunch of internal functions including _sbrk. Failing that, is ‘just’ creating a function any-old-place with the same name sufficient to guaranteeably override the internal function in all cases? Don’t ask me.

The vendor-supplied copy in retargetio.c looks like the below. Here I’ve modified it crudely to just remove the check entirely. In my case, the external RAM contains only the heap and nothing else, so this should be OK.

caddr_t _sbrk(int incr)
{
  static char *heap_end;
  char *prev_heap_end;
  static const char heaperr[] = "Heap and stack collision\n";

  if (heap_end == 0)
  {
    heap_end = &_end;
  }

  prev_heap_end = heap_end;

  // HACK HACK HACK: This check assumes stack and heap in same memory segment; remove it...
  //if ((heap_end + incr) > (char*) __get_MSP())
  //{
  //  _write(fileno(stdout), heaperr, strlen(heaperr));
  //  exit(1);
  //}

  heap_end += incr;
  return (caddr_t) prev_heap_end;
}
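If removing the check entirely feels too crude, one alternative is to bound the heap by its own region instead of by the stack pointer. The sketch below is not the vendor's code: bounded_sbrk() is a hypothetical name, and a static array stands in for the external RAM so the logic can be tried on a PC. On the target, the bounds would presumably come from the __end__ and __HeapLimit symbols in the .heap section of the linker file, but that mapping is an assumption I have not tested.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the external RAM so the sketch runs on a PC. On the target
   the bounds would instead come from linker symbols (assumption). */
static char fake_exram[1024];

/* Hypothetical replacement for _sbrk(): grows or shrinks the heap, but only
   within the bounds of its own region. */
void *bounded_sbrk(ptrdiff_t incr)
{
    static char *heap_end = NULL;
    char *const heap_start = fake_exram;
    char *const heap_limit = fake_exram + sizeof fake_exram;

    if (heap_end == NULL)
        heap_end = heap_start;

    /* Refuse to grow past the end of the region or shrink below its start. */
    if (incr > heap_limit - heap_end || incr < heap_start - heap_end) {
        errno = ENOMEM;
        return (void *)-1;
    }

    char *prev_heap_end = heap_end;
    heap_end += incr;
    return prev_heap_end;
}
```

Returning (void *)-1 with errno set to ENOMEM follows the usual sbrk convention, which should let malloc() return NULL when the external RAM is exhausted instead of quietly running off the end of it.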

Now your malloc() calls should stop failing! After performing the above steps, I was able to get a ‘complex’ piece of code with dynamic memory allocation (the SHINE mp3 encoder) running on an EFM32 microcontroller, with a few changes to be reported soon…
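A quick way to confirm the heap really ended up in external RAM is to look at where malloc() pointers land. Here is a minimal sketch, assuming the EXRAM origin and length from the linker file above. EXRAM_BASE, EXRAM_SIZE and addr_in_exram() are hypothetical names, not part of the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Must match the EXRAM line in the linker file (ORIGIN and LENGTH). */
#define EXRAM_BASE 0x80000000u
#define EXRAM_SIZE 0x00200000u

/* Hypothetical helper: true if addr falls inside the external RAM window. */
bool addr_in_exram(uintptr_t addr)
{
    return addr >= EXRAM_BASE && (addr - EXRAM_BASE) < EXRAM_SIZE;
}
```

On the target, allocating a small block and passing the pointer (cast to uintptr_t) to addr_in_exram() should come back true once all three steps are in place. A pointer somewhere around 0x20000000 means the heap is still living in internal RAM.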

BONUS: SHINE particulars:
The encodeSideInfo() function in l3bitstream.c appears to build the mp3 header incorrectly. Try…
 //shine_putbits( &config->bs, 0x7ff, 11 ); // wrong
 shine_putbits( &config->bs, 0xfff, 12 ); // right
 //shine_putbits( &config->bs, config->mpeg.version, 2 ); //wrong
 shine_putbits( &config->bs, 1, 1 ); //right
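For what it's worth, the two versions write the same 13 bits whenever the version field is 3 (binary 11), which is the MPEG-1 code in the MPEG audio frame header (11 sync bits followed by a 2-bit version ID). So the change above appears to amount to hardcoding MPEG-1. The sketch below checks that arithmetic with a hypothetical MSB-first bit writer standing in for shine_putbits(). The names bw_put(), header_original() and header_patched() are made up for illustration and are not part of SHINE; the MSB-first bit order is also an assumption.

```c
#include <stdint.h>

/* Hypothetical MSB-first bit writer; buf must start out zeroed. */
typedef struct {
    uint8_t  buf[4];
    unsigned bitpos;   /* number of bits written so far */
} bitwriter;

/* Append the low nbits of value, most significant bit first. */
static void bw_put(bitwriter *bw, uint32_t value, unsigned nbits)
{
    while (nbits-- > 0) {
        if ((value >> nbits) & 1u)
            bw->buf[bw->bitpos / 8] |= (uint8_t)(0x80u >> (bw->bitpos % 8));
        bw->bitpos++;
    }
}

/* Original form: 11 sync bits, then a 2-bit version ID. */
void header_original(bitwriter *bw, unsigned version_id)
{
    bw_put(bw, 0x7ff, 11);
    bw_put(bw, version_id, 2);
}

/* Patched form from the post: 12 one bits, then a single 1 bit. */
void header_patched(bitwriter *bw)
{
    bw_put(bw, 0xfff, 12);
    bw_put(bw, 1, 1);
}
```

With version_id set to 3 both functions produce the bytes 0xFF 0xF8; with any other value the original form produces something different. That would be consistent with config->mpeg.version holding something other than 3 for MPEG-1 input in the SHINE copy used here, though I have not confirmed that.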

It also seems to fail outright (generate incorrect, unplayable bitstreams) for certain input files, depending (probably) on mono vs. stereo and/or bitrate. A stereo .wav file (PCM 16-bit signed LE) at 44100Hz worked.