Embedded µC 32: 2020

2020-12-20

Does an NVMe Drive Require a Heatsink?

1 Hardware Setup

The test system was built with:

Main board: ASUS PRIME H470M-PLUS
CPU: Intel® Core™ i5-10600T
CPU fan: Zalman CNPS 2X
Case: Antec NSK1380
Disk: Samsung SSD 980 PRO 250GB
NVMe heatsink: Jonsbo M.2-3

The heatsink was mounted with a thin (grey) silicone pad at the bottom and a thick (white) silicone pad at the top.

2 Test Load

To create disk activity, I read the whole disk in one go. Although reading the disk in smaller chunks is much slower, the power consumption is of the same order. The size of the SSD is 250059350016 bytes. Type as root:

dd if=/dev/nvme0n1 of=/dev/null bs=16777216 count=14904

14904+0 records in
14904+0 records out
250047627264 bytes (250 GB, 233 GiB) copied, 89.0268 s, 2.8 GB/s

During disk activity the power consumption rises from 27.0 W to 49.8 W, i.e. an additional 22.8 W is dissipated in the power supply, chipset, CPU, and SSD. Let us see what this power does to the SSD.

The datasheet of the Samsung 980 PRO 250GB (https://s3.ap-northeast-2.amazonaws.com/global.semi.static/Samsung_NVMe_SSD_980_PRO_Data_Sheet_Rev.1.2.pdf) specifies 5 W for reading at 6400 MB/s.

3 Measuring the Temperature

The temperature is monitored with the sensors utility from the lm_sensors 3.6.0-2 package and filtered by awk:

#!/bin/bash
#
# $Source: /home/cvsroot/blogspot/heatsink_for_nvme/index.html,v $
# $Revision: 1.1 $
# $Date: 2020/12/20 23:56:04 $
# $Author: h $
# $State: Exp $
#
# Measure the NVMe temperature using the sensors utility
#
# While temperature logging is running, start the load (dd) on a second console. The power consumption rises by 16 W while reading.
#
# Note: The temperatures "Composite" and "Sensor 1" always(?) match.
# Output columns: elapsed time [s], Composite, Sensor 1, Sensor 2 [°C].
# Requires GNU awk: match() with a third (array) argument is a gawk extension.

d0=$(date +"%s.%N")

while true ; do
    d=$(date +"%s.%N")
    sensors |\
        grep -E '^Composite|^Sensor [12]' |\
        gawk -v d="${d}" -v d0="${d0}" '
             {# a[1]: temperature value, e.g. "+45.9"
              if (!match($0, /[[:space:]]+([+-]?[[:digit:]]{1,3}\.[[:digit:]])/, a))
                {print "cannot parse temperature: " $0 > "/dev/stderr";
                 exit 1;}
              # b[1]: sensor label, mapped to the column index 0, 1 or 2
              if (!match($0, /^([^:]+):/, b))
                {print "cannot parse sensor label: " $0 > "/dev/stderr";
                 exit 1;}
              gsub(/Composite/, "0", b [1]);
              gsub(/Sensor 1/,  "1", b [1]);
              gsub(/Sensor 2/,  "2", b [1]);
              t [b [1]] = a [1];
              # print one row once "Sensor 2" has been read
              if (b [1] == "2")
                 {printf("%10.3f", d - d0);
                  for (i = 0; i < 3; ++i)
                      printf("%6.1f", t [i]);
                  printf("\n");}}'
    sleep 0.072
done

4 Results

4.1 Large Block Read

Without the heatsink, the controller chip of the NVMe reaches its peak temperature within 50 s (τ ≈ 28 s). The heatsink is much bigger than the bare NVMe board and thus triples the heat capacity of the assembly. This increases the time constant of the thermal system to about 90 s. However, the thermal resistance to ambient is not reduced by the same factor. For a short time this appears to be a reasonable cooling system, but in the long run it fails, because the steady-state temperature rise (ΔT = P·R) depends only on the thermal resistance and not on the heat capacity. See also “The Great Raspberry Pi Cooling Bake-Off”.

In practice, huge amounts of data are rarely read from or written to the disk in one go; most accesses are short bursts. Their temperature peaks are flattened by the heat capacity of the heatsink.

4.2 A Heavy Use Case

Copying the whole disk to /dev/null is simply unrealistic. Compiling gcc 10.2 from source is a more realistic load. The build process spreads its disk usage over three top-level directories:

Name	Size [MB]	Access	Usage
gcc-10.2.0	900	ro	Source Files
build	5266	rw	Compiled Files
/tmp	?	rw	Temporary Files

Three cases were observed:

/tmp and build on tmpfs
/tmp on tmpfs
Neither /tmp nor build on tmpfs

/tmp on tmpfs	build on tmpfs	real [s]	user [s]	sys [s]	Units read	Units written
y	y	1,481.127	10,520.575	523.041	448	92
y	n	1,323.177	9,570.220	365.872	1,333	182
n	n	1,303.015	9,553.401	366.698	1,592	12078

1 unit = 1000 × 512 bytes = 512 000 bytes (NVMe “Data Units Read/Written” counters)

2020-06-12

RISC-V. Part 5: J-Link

1 Download and Install the J-Link Software

Download and install an appropriate version of the J-Link software. It installs to /opt/SEGGER/JLink, which is a symbolic link to /opt/SEGGER/JLink_V672d.

2 Firmware Upgrade

Before using the probe, update its firmware to the most recent version using /opt/SEGGER/JLink/JLinkConfigExe.

3 Connecting with Outdated Probes

When J-Trace revision 3.2 and J-Link EDU 8 were built, RISC-V was not supported, and these probes do not work with it. An attempt to connect with a J-Trace 3.2 ends with these messages:

⋮
Detected: RV32 core
CSR access via abs. commands: No
ConfigTargetSettings() start
ConfigTargetSettings() end
TotalIRLen = 10, IRPrint = 0x0021
JTAG chain detection found 2 devices:
 #0 Id: 0x1000563D, IRLen: 05, RV32
 #1 Id: 0x790007A3, IRLen: 05, Unknown device
Cannot connect to target.

4 Connecting with Recent Probes

Connect a J-Link EDU version 11.0 to the host and start the JLinkExe utility:

./JLinkExe

SEGGER J-Link Commander V6.72d (Compiled May 15 2020 16:50:13)
DLL version V6.72d, compiled May 15 2020 16:50:03

Connecting to J-Link via USB...O.K.
Firmware: J-Link V11 compiled Apr 23 2020 16:49:23
Hardware version: V11.00
S/N: 261005761
License(s): FlashBP, GDB
OEM: SEGGER-EDU
VTref=0.000V


Type "connect" to establish a target connection, '?' for help

VTref=0.000V shows that the target is not powered yet. To power it from the probe, enter:

power on

The “Target power” LED turns on and the development board powers up.

device risc-v
speed 1000

Selecting 1000 kHz as target interface speed

si jtag

Selecting JTAG as current target interface.

con

Device position in JTAG chain (IRPre,DRPre) <Default>: -1,-1 => Auto-detect
JTAGConf>
Device "RISC-V" selected.


Connecting to target via JTAG
ConfigTargetSettings() start
ConfigTargetSettings() end
TotalIRLen = 10, IRPrint = 0x0021
JTAG chain detection found 2 devices:
 #0 Id: 0x1000563D, IRLen: 05, RV32
 #1 Id: 0x790007A3, IRLen: 05, Unknown device
Debug architecture:
  RISC-V debug: 0.13
  AddrBits: 7
  DataBits: 32
  IdleClks: 7
Memory access:
  Via system bus: No
  Via ProgBuf: Yes (2 ProgBuf entries)
DataBuf: 4 entries
  autoexec[0] implemented: Yes
Detected: RV32 core
CSR access via abs. commands: No
Temp. halted CPU for NumHWBP detection
HW instruction/data BPs: 4
Support set/clr BPs while running: No
HW data BPs trigger before execution of inst
RISC-V identified.

5 Firmware Backup

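At the J-Link> prompt, back up the internal FLASH to a binary file. Address and length are hexadecimal, so 20000 means 0x20000 = 131072 bytes, the full 128 KiB FLASH: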

savebin /tmp/rv32.bin,0,20000

Opening binary file for writing... [/tmp/rv32.bin]
Reading 131072 bytes from addr 0x00000000 into file...O.K.

verifybin /tmp/rv32.bin ,0

Loading binary file /tmp/rv32.bin
Reading 131072 bytes data from target memory @ 0x00000000.
Verify successful.

6 What’s Next?

To really start developing, we need the following setup:

We have made a giant step. All the blue parts have been verified:

The hardware (PC, Probe and Target) is running.
The interfacing (USB and JTAG) works.
The J-Link firmware connects to the RISC-V target and we can read and write FLASH memory. DFU — as a workaround — is no longer required.

What is still missing is the glue between the debugger (gdb) and the probe: OpenOCD. This will be solved in the next blog.

2020-06-11

RISC-V. Part 4: JTAG Cabling

1 The SeeedStudio Connector

Compared to the development board, the standard 20-pin 0.1″ JTAG connector is huge. Therefore, SeeedStudio created another proprietary 0.1″ JTAG 10-pin connector. ☹!

The pin assignment is documented in the schematics and in the pinout. The same signals are also available on the GPIO headers:

Signal	Port	GPIO Header Pin	JTAG Signal	JTAG Header Pin
GND	GND	H2/41
GND	GND	H2/42
JTDI	PA15	H3/1	TDI	5
JTMS	PA13	H3/3	TMS	7
NJTRST	PB4	H3/5	TRST	3
JTCK	PA14	H3/2	TCK	9
JTDO	PB3	H3/4	TDO	6
NRST	NRST	H3/6	NRST	4
GND	GND	H3/34	GND	10
3V3	3V3	H2/{43…48}¹	3V3	1…2
5V		H3/{45…48}²

¹ Continuity tested to AMS1117 voltage regulator (pin 2).

² Continuity tested to AMS1117 voltage regulator (pin 3).

2 The J-Link Connector

J-Link and J-Trace use a popular 20-pin connector. It is documented in the J-Link / J-Trace User Guide (software version 6.70, May 7, 2020), section 18.1.1 “Pinout for JTAG”:

PIN	SIGNAL	TYPE	Description
1	VTref	Input	This is the target reference voltage. It is used to check if the target has power, to create the logic-level reference for the input comparators and to control the output logic levels to the target. It is normally fed from VDD of the target board and must not have a series resistor.
2	Not connected	NC	This pin is not connected in J-Link.
3	nTRST	Output	JTAG Reset. Output from J-Link to the Reset signal of the target JTAG port. Typically connected to nTRST of the target CPU. This pin is normally pulled HIGH on the target to avoid unintentional resets when there is no connection.
5	TDI	Output	JTAG data input of target CPU. It is recommended that this pin is pulled to a defined state on the target board. Typically connected to TDI of the target CPU.
7	TMS	Output	JTAG mode set input of target CPU. This pin should be pulled up on the target. Typically connected to TMS of the target CPU.
9	TCK	Output	JTAG clock signal to target CPU. It is recommended that this pin is pulled to a defined state of the target board. Typically connected to TCK of the target CPU.
11	RTCK	Input	Return test clock signal from the target. Some targets must synchronize the JTAG inputs to internal clocks. To assist in meeting this requirement, you can use a returned, and retimed, TCK to dynamically control the TCK rate. J-Link supports adaptive clocking, which waits for TCK changes to be echoed correctly before making further changes. Connect to RTCK if available, otherwise to GND.
13	TDO	Input	JTAG data output from target CPU. Typically connected to TDO of the target CPU.
15	nRESET	I/O	Target CPU reset signal. Typically connected to the RESET pin of the target CPU, which is typically called “nRST”, “nRESET” or “RESET”. This signal is an active low signal.
17	DBGRQ	NC	This pin is not connected in J-Link. It is reserved for compatibility with other equipment to be used as a debug request signal to the target system. Typically connected to DBGRQ if available, otherwise left open.
19	5V-Supply	Output	This pin can be used to supply power to the target hardware. Older J-Links may not be able to supply power on this pin. For more information about how to enable/disable the power supply, please refer to Target power supply.
4, 6, 8, 10, 12	GND		GND pins connected to GND in J-Link. They should also be connected to GND in the target system.

3 Wiring

Missing interfaces always cause wasted effort. Although I have a box full of JTAG/SWD cables and adapters with 0.05″/0.1″ pitch and 10/16/20/38 pins, none of them fits. Later I came across an adapter that can be found by searching for “TQ2440 mini2440” (or “ULink2 JTAG ARM Adapter 20Pin 2+2.54 mm 14Pin 2.54mm 10P 2+2.54mm 6P + 10P XH2.54”). It almost does the job.

J-Link Signal	J-Link Pin	Wire Color	Dev Board Signal	Dev Board Pin	Adapter (TQ2440 mini2440) Pin	JTAG Signal
GND	4	blue	GND	H2/41	10	GND
GND	6	blue	GND	H2/42	10	GND
5V	19	red	5V	H3/48
VTref	1	white	3V3	H2/47	1…2	3V3
nTRST	3	brown	NJTRST	H3/5	3	TRST
TDI	5	yellow	JTDI	H3/1	5	TDI
TMS	7	orange	JTMS	H3/3	7	TMS
TCK	9	green	JTCK	H3/2	9	TCK
TDO	13	grey	JTDO	H3/4	6	TDO
nRESET	15	violet	NRST	H3/6	4	NRST

Until the adapter arrived I had to do it the hard way:

Debugging the SeeedStudio GD32 RISC-V Dev Board

4 Power-Up the Target

“Almost” (in the last section) means that the adapter does not power the target. For that, the red 5V wire must still be plugged in and, as described in the next blog, the supply must be enabled.

RISC-V. Part 3: The Test Program

1 The Source (src)

1.1 System API

To set input parameters and to get results from a program, we need I/O. Peripherals are mapped into the address space of the CPU they belong to. Therefore, a CPU can only access its own hardware:

CPU	Access to PC Hardware	Access to Target Hardware	Basic Program	Description
PC	✔	✘	`hello`	Prints “hello world”.
Target	✘	✔	`blinky`	Blinks an LED periodically.

There are techniques to work around this limitation, but they take some effort and, in particular with analog I/O, introduce nasty side effects. In other words, they are far beyond the scope of this blog. Because the simulator (riscv32-unknown-elf-run) can only access the PC peripherals, we use a hello-type program.

1.2 Program hello

Jim Wilson set up a variant of a hello-type program (https://github.com/riscv/riscv-gnu-toolchain/pull/295). It’s in the file hello.c:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  char *string = malloc (1000);
  if (string == 0)
    return 1;
  strcpy (string, "Hello world\n");
  printf ("%s", string);
  return 0;
}

2 Compile the Test Program (make)

make is the utility which almost always does what it is meant to do. E.g., if we type make hello, it finds the file hello.c in the working directory and invokes the C-compiler to build an executable named “hello” which can be run on the PC locally. To run the cross-compiler instead of the native gcc, we could define the environment variable CC=/opt/riscv/bin/riscv32-unknown-elf-gcc beforehand and type make hello afterwards. This would do the trick.

However, it is easier to write a Makefile. This file contains everything which differs from the default rules. In our case the (preliminary) Makefile reads:

CC        :=   /opt/riscv/bin/riscv32-unknown-elf-gcc

# Recipe lines must start with a TAB character.
%.elf: %.c
	$(LINK.c) $^ $(LOADLIBES) $(LDLIBS) -o $@

CFLAGS    +=   --specs=nano.specs
LDFLAGS   +=   -Wl,--gc-sections,-Map,$*.map

hello.elf:

The specifications are for:

CC := /opt/riscv/bin/riscv32-unknown-elf-gcc: Invoke the RISC-V cross-compiler.
%.elf: %.c with the recipe $(LINK.c) $^ $(LOADLIBES) $(LDLIBS) -o $@: Because bare-metal programming sometimes requires binary files in addition to ELF files, I prefer an explicit .elf extension. The price for that is this rule. $@ is the name of the target (here: hello.elf) and $^ is the list of all prerequisites (here: hello.c).
CFLAGS += --specs=nano.specs: Use the include directories for newlib-nano and link against libc_nano.a, a variant of newlib for small embedded systems.
--gc-sections: Let the linker garbage-collect unused input sections, which removes unreferenced modules. Finer, per-function granularity additionally requires compiling with -ffunction-sections -fdata-sections.
-Map,$*.map: Write a linker map. $* is a placeholder for the stem name of the target, which eventually expands to “hello”.
hello.elf: This is the first explicit target in the file and therefore the default goal, which is built if no other target is given.

The program compiles effortlessly by entering make:

/opt/riscv/bin/riscv32-unknown-elf-gcc --specs=nano.specs  -Wl,--gc-sections,-Map,hello.map  hello.c   -o hello.elf

3 Run the Test Program

3.1 Traced Run

When starting the program in the simulator I added two trace options:

--trace=on: Print all executed assembler instructions.
--trace-register: Prints all changed register values.

We’ll look into the trace file later.

/opt/riscv/bin/riscv32-unknown-elf-run --trace=on --trace-register hello.elf 2> hello.trace

yields

Hello world

“Yawn!” you might think. But there is more to it than meets the eye.

3.2 Single Stepping

3.2.1 Startup Code

C programs start with crt0 (C runtime 0), the startup code that runs before main. The module can be found with

find /opt/riscv/ -type f -name 'crt0*'

/opt/riscv/riscv32-unknown-elf/lib/crt0.o

The symbols in this tiny module are:

/opt/riscv/bin/riscv32-unknown-elf-nm -a /opt/riscv/riscv32-unknown-elf/lib/crt0.o

         w atexit
00000000 b .bss
00000000 d .data
         U _edata
         U _end
         U exit
         U __global_pointer$
00000008 t .L0 
00000010 t .L0 
00000024 t .L0 
0000002e t .L0 
00000000 t .L11
         w __libc_fini_array
         U __libc_init_array
0000003e t .Lweak_atexit
         U main
         U memset
00000000 n .riscv.attributes
00000000 T _start
00000000 t .text

_start is the only exported symbol (T) in the text section. It must therefore be the entry point.

The references to _edata, _end, and memset look like a bss initialization (zeroing global variables). That’s fine.

However, there is no sign of a variable pointing to the start of the data segment, nor is there a reference to memcpy. Presumably the binary assumes that the data segment (initialized data) lives in writable memory. This is fine in an OS-based environment but infeasible in a bare-metal system, where initialized data must be copied from FLASH to SRAM upon start. Put it on the to-do list (item 1).

There is no reference to the end of the SRAM. This is required to initialize the stack pointer. In an OS-based environment the OS sets the stack pointer before the program is started. But again, this is infeasible for a bare-metal system. Put it on the to-do list (item 2).

3.2.2 Connect gdb and the Simulator

The simulator can be connected to gdb. I prefer to start gdb from emacs because, just like overweight IDEs but much more agile, the editor keeps the source view in sync with the debugger. Within emacs, the debugger is started by typing M-x gdb. emacs proposes gdb -i=mi hello.elf. The binary (hello.elf) is correct, but the native gdb is the wrong debugger and must be replaced once by /opt/riscv/bin/riscv32-unknown-elf-gdb. After confirming with ↵ Enter, gdb fires up. To connect to the simulator, type at the (gdb) prompt:

target sim

Connected to the simulator.

load

This loads the sections of the file hello.elf into the simulator:

Loading section .text, size 0x1410 lma 0x10074
Loading section .rodata, size 0x108 lma 0x11484
Loading section .sdata2, size 0x4 lma 0x1158c
Loading section .eh_frame, size 0x4 lma 0x12590
Loading section .init_array, size 0x4 lma 0x12594
Loading section .fini_array, size 0x4 lma 0x12598
Loading section .data, size 0x60 lma 0x1259c
Loading section .sdata, size 0x4 lma 0x125fc
Start address 0x1009e
Transfer rate: 44128 bits in <1 sec.

The start address is “_start”. Set a breakpoint there …

b _start

Breakpoint 1 at 0x100ae

… and start the program:

run

Starting program: /home/h/cvs/risc_v/src/testcase/hello.elf 

Breakpoint 1, 0x000100ae in _start ()

Because newlib has no debugging information yet(!), emacs does not display the source code automatically. Instead, we must type:

disass

Dump of assembler code for function _start:
   0x0001009e <+0>:	auipc	gp,0x3
   0x000100a2 <+4>:	addi	gp,gp,-770 # 0x12d9c
   0x000100a6 <+8>:	addi	a0,gp,-1948
   0x000100aa <+12>:	addi	a2,gp,-1904
=> 0x000100ae <+16>:	sub	a2,a2,a0
   0x000100b0 <+18>:	li	a1,0
   0x000100b2 <+20>:	jal	0x101e6 <memset>
   0x000100b4 <+22>:	li	a0,0
   0x000100b8 <+26>:	beqz	a0,0x100c6 <_start+40>
   0x000100ba <+28>:	li	a0,0
   0x000100be <+32>:	auipc	ra,0x0
   0x000100c2 <+36>:	jalr	zero # 0x0
   0x000100c6 <+40>:	jal	0x1016a <__libc_init_array>
   0x000100c8 <+42>:	lw	a0,0(sp)
   0x000100ca <+44>:	addi	a1,sp,4
   0x000100cc <+46>:	li	a2,0
   0x000100ce <+48>:	jal	0x1011e <main>
   0x000100d0 <+50>:	j	0x10074 <exit>
End of assembler dump.

Except for the missing debug information, this looks quite promising, just like a real debug session. We are running RISC-V code and we can now check the sp (Stack Pointer) register for the location of the stack segment.

p/x $sp

$1 = 0x3fff170

4 Debriefing

Although the program is the classic first example of an OS (operating system) based application, it poses some challenges for a bare-metal system. So let us see what the program looks like from the bare-metal perspective.

4.1 Program Size

Type

/opt/riscv/bin/riscv32-unknown-elf-size hello.elf

to list the program size:

text    data     bss     dec     hex filename
5404     112      44    5560    15b8 hello.elf

The program is compact. The GD32VF103VBT6 has 128 KiB FLASH (for text and data) and 32 KiB SRAM (for data and bss). The data segment occupies space in both FLASH and SRAM, because its initial values are copied from FLASH to SRAM at startup.

4.2 Segment Locations

The simulator has memory in abundance.

fgrep -e'Memory Configuration' -A3 hello.map

yields:

Memory Configuration

Name             Origin             Length             Attributes
*default*        0x0000000000000000 0xffffffffffffffff

This is not too bad for a 32-bit system ☺. But where are the segments? To find out, type

/opt/riscv/bin/riscv32-unknown-elf-readelf --sections hello.elf

There are 16 section headers, starting at offset 0x2408:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00010074 000074 001410 00  AX  0   0  2
  [ 2] .rodata           PROGBITS        00011484 001484 000108 00   A  0   0  4
  [ 3] .sdata2           PROGBITS        0001158c 00158c 000004 00   A  0   0  4
  [ 4] .eh_frame         PROGBITS        00012590 001590 000004 00  WA  0   0  4
  [ 5] .init_array       INIT_ARRAY      00012594 001594 000004 04  WA  0   0  4
  [ 6] .fini_array       FINI_ARRAY      00012598 001598 000004 04  WA  0   0  4
  [ 7] .data             PROGBITS        0001259c 00159c 000060 00  WA  0   0  4
  [ 8] .sdata            PROGBITS        000125fc 0015fc 000004 00  WA  0   0  4
  [ 9] .sbss             NOBITS          00012600 001600 000010 00  WA  0   0  4
  [10] .bss              NOBITS          00012610 001600 00001c 00  WA  0   0  4
  [11] .comment          PROGBITS        00000000 001600 000012 01  MS  0   0  1
  [12] .riscv.attributes RISCV_ATTRIBUTE 00000000 001612 00002b 00      0   0  1
  [13] .symtab           SYMTAB          00000000 001640 000860 10     14  68  4
  [14] .strtab           STRTAB          00000000 001ea0 0004e1 00      0   0  1
  [15] .shstrtab         STRTAB          00000000 002381 000086 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

The memory areas of the GD32VF103VBT6 are:

Type	Start	End
FLASH	`0x0800 0000`	`0x0801 ffff`
SRAM	`0x2000 0000`	`0x2000 7fff`

That’s quite a mismatch, which must be resolved. Put it on the to-do list (item 3).

4.3 System API

How did the program allocate memory? And how did it print “Hello world”? Spoiler: the program executes ecall (environment call) instructions, which the simulator handles. Let us look at the trace hello.trace by typing fgrep -B1 ecall hello.trace:

reg:      0x0113ba ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113be ---   _sbrk     -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x0112fc ---   _fstat    -wrote a7 = 0x50
insn:     0x011300 ---   _fstat    -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x01140e ---   _write    -wrote a7 = 0x40
insn:     0x011412 ---   _write    -ecall;
--
reg:      0x0112c8 ---   _exit     -wrote a7 = 0x5d
insn:     0x0112cc ---   _exit     -ecall;

a7 contains the system call number, which follows the Linux numbering. The numbers are defined in the file /usr/include/asm-generic/unistd.h. In summary, the program uses 4 different system calls:

Newlib Function	System Call Number	`unistd.h` Symbol
_sbrk	0xd6	`__NR_brk`
_fstat	0x50	`__NR3264_fstat`
_write	0x40	`__NR_write`
_exit	0x5d	`__NR_exit`

This unmasks the magic and adds another issue to the to-do list (item 4).

4.4 The Stack

Where is the stack? How much stack is used? The trace file answers the questions:

fgrep 'wrote sp = ' hello.trace  | awk '{print $NF;}' | sort | uniq -c

      1 0x3fffd60
      4 0x3fffdf0
      4 0x3fffe00
      4 0x3fffe10
      1 0x3fffe20
      4 0x3fffe30
      4 0x3fffe40
      4 0x3fffe50
      6 0x3fffe70
      2 0x3fffe90
      3 0x3fffea0
      1 0x3fffee0
      2 0x3fffef0
      2 0x3ffff00
      2 0x3ffff20
      7 0x3ffff30
      7 0x3ffff40
      3 0x3ffff50
      4 0x3ffff60
      4 0x3ffff70
      6 0x3ffff80
      3 0x3ffffa0
      3 0x3ffffb0
      2 0x3ffffc0

Again, this tells us that the stack is not located in the SRAM area. The stack pointer values range from 0x3fffd60 to 0x3ffffc0, i.e. 0x260 = 608 bytes, so the stack size is below 0.75 KiB.

5 To-do List

Initialize the data segment.
Load the SP-register.
Fix the segment offsets to match the memory areas of the MCU.
Replace the system calls.

In this blog we have colored src and make. Because we did not take care of lib, we cannot debug the library code. And because we did not provide a linker script (ld), the segments are not located in the MCU memory.

GNU Toolchain Provided

RISC-V. Part 2: The GNU Toolchain

1 Download

There is a well-maintained git repository with a RISC-V cross-gcc. It was updated from gcc 9 to gcc 10 in the days just before 2020-06-11. Download the current toolchain from git:

git clone --recursive https://github.com/riscv/riscv-gnu-toolchain

Cloning into 'riscv-gnu-toolchain'...
remote: Enumerating objects: 1, done.        
remote: Counting objects: 100% (1/1), done.        
remote: Total 7704 (delta 0), reused 0 (delta 0), pack-reused 7703        
Receiving objects: 100% (7704/7704), 4.72 MiB | 1.07 MiB/s, done.
Resolving deltas: 100% (3903/3903), done.
Submodule 'qemu' (https://git.qemu.org/git/qemu.git) registered for path 'qemu'
Submodule 'riscv-binutils' (https://github.com/riscv/riscv-binutils-gdb.git) registered for path 'riscv-binutils'
Submodule 'riscv-dejagnu' (https://github.com/riscv/riscv-dejagnu.git) registered for path 'riscv-dejagnu'
Submodule 'riscv-gcc' (https://github.com/riscv/riscv-gcc.git) registered for path 'riscv-gcc'
Submodule 'riscv-gdb' (https://github.com/riscv/riscv-binutils-gdb.git) registered for path 'riscv-gdb'
Submodule 'riscv-glibc' (https://github.com/riscv/riscv-glibc.git) registered for path 'riscv-glibc'
Submodule 'riscv-newlib' (https://github.com/riscv/riscv-newlib.git) registered for path 'riscv-newlib'
Cloning into '/home/h/riscv/riscv-gnu-toolchain/qemu'...
⋮

git downloads 3.26 GB over a 12 Mbit/s DSL line within an hour.

2 Compile the Toolchain

I prefer to compile the toolchain in a dedicated directory. For that, I create a directory build, change into it, and run configure from there. The prefix parameter sets the installation directory. The architecture is rv32imac because the GD32VF103 is a 32-bit MCU (RV32) with a file of 32 general-purpose registers (I), integer multiply/divide (M), atomics (A), and the compressed 16-bit instruction set (C). The ABI ilp32 means 32-bit int, long, and pointers without hardware floating-point registers for argument passing.

cd riscv-gnu-toolchain/
mkdir build
cd build
../configure --prefix=/opt/riscv --with-arch=rv32imac --with-abi=ilp32

checking for gcc... /usr/local/gcc-9.2/bin/gcc-9.2
checking whether the C compiler works... yes
⋮
configure: creating ./config.status
config.status: creating Makefile
config.status: creating scripts/wrapper/awk/awk
config.status: creating scripts/wrapper/sed/sed

To build and install the toolchain, type

make

An hour later the cross-compiler is installed at /opt/riscv/bin/riscv32-unknown-elf-gcc.

3 Next Steps

GNU Toolchain Provided

The left branch (gcc and sim via elf) of the system is complete. To test it, we must provide input to gcc. This will be done in the next blog.

RISC-V. Part 1: The Board

1 The Hardware

For years I have heard stories about the legendary open RISC-V architecture. Is it just a toy for geeks locked in the ivory towers? Haven’t we got enough CPU cores already? Don’t they work reliably?

To be honest, it rankled me that I failed to deactivate Intel’s Management Engine, which is most certainly an easier target for malicious hackers than the speculative-execution bugs (Meltdown, Spectre).

It’s time to think about a more transparent core design. And that’s where RISC-V steps in.

Recently I came across a 10 € board.

Debugging the SeeedStudio GD32 RISC-V Dev Board

2 The Course Of Action

2.1 An Overview

Where do we begin? Let’s have a look at a development system for embedded systems.

Developing Embedded Systems

Greyed-out objects are terra incognita that we will have to discover. At this time the only known thing is the PC which should have a recent Linux variant installed. Blog by blog we will see how the objects will be colored.

The objectives of this blog series are to set up a practicable development system and to benchmark RISC-V. Therefore, the implementations of the subsystems will be kept as simple as possible. But still, they will be expandable and an appropriate starting point for production-quality code.

2.2 IDEs

“Stop. Keep it simple!” you might argue. “SeeedStudio delivers not only the board, but it also provides a display and a software example. And Seeedstudio recommends platformIO as an IDE.”

IDEs (Integrated Development Environments), like platformIO or Eclipse, just run the tools under the hood. Wrapping stand-alone binaries hardly increases functionality but adds complexity and ruins efficiency.
As we will see, some tools are not mature yet. The documented example (see https://wiki.seeedstudio.com/SeeedStudio-GD32-RISC-V-Dev-Board/#platforms-supported) downloads the binaries via the DFU (Device Firmware Upgrade) interface. However, DFU was never designed for development and only conceals a flaw in FLASH handling. Common practice today is to use the debug probe for flashing and debugging the application.
Eventually, we need the knowledge for more advanced uses.

2020-05-30

Finally, the RPi4B with 8 GBytes arrived

I upgraded to the new 8 GiB board.

It booted without any problems:

RPi4B showing 8GB of RAM

“But, there is nothing like a free lunch,” I thought. “What about power consumption?”

I fired up my ELV EA 8000 power meter and compared the new system to the old one. By system I mean

the Raspberry Pi official USB-C power supply,
the RPi4 board,
an X825 expansion board (for SATA),
a 256 GB SLC SSD (Super Talent FTM25JB25I), and
a 1 Gbit/s Ethernet connection.

The headless system booted from SD-card and the root file system is on the SSD. For the measurement, the system was idling.

Because the EA 8000 has no interface and the display updates every second, I recorded it with a webcam for a minute, took stills every second (with ffmpeg), and typed the values into text files.

Power consumption as a function of time

RAM Size [GB]	Power Consumption [W]
4	4.7177 ± 0.0007
8	4.2309 ± 0.0010

Of course, the “±” values do not represent the accuracy of the measurement; the EA 8000 is not accurate enough for that. Instead, they denote the standard deviation of the mean (standard error) of the readings. This means that despite adding 4 GB of RAM, the board designers have lowered the input power by almost 0.5 W (4.7177 W − 4.2309 W = 0.4868 W).