2020-12-20

Does an NVMe Drive Require a Heatsink?

1 Hardware Setup

The test system was built with:

  • Main board: ASUS PRIME H470M-PLUS
  • CPU: Intel® Core™ i5-10600T
  • CPU fan: Zalman CNPS 2X
  • Case: Antec NSK1380
  • Disk: Samsung SSD 980 PRO 250GB
  • NVMe heatsink: Jonsbo M.2-3

The heatsink was mounted with a thin (grey) silicone pad at the bottom and a thick (white) silicone pad at the top.

2 Test Load

To create disk activity, I read the whole disk in one go. Reading the disk in smaller chunks is much slower, but the power consumption is of the same order. The size of the SSD is 250059350016 bytes. Type as root:

dd if=/dev/nvme0n1 of=/dev/null bs=16777216 count=14904
14904+0 records in
14904+0 records out
250047627264 bytes (250 GB, 233 GiB) copied, 89.0268 s, 2.8 GB/s
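The block count passed to dd follows from the disk size and the block size. A minimal sketch of that arithmetic (Python is used only for illustration; the variable names are mine, the numbers come from the text and the dd output above):

```python
# Arithmetic behind the dd invocation (sizes and timing taken from the post).
DISK_BYTES = 250_059_350_016      # size of the SSD
BLOCK_BYTES = 16 * 1024 * 1024    # bs=16777216 (16 MiB)
ELAPSED_S = 89.0268               # run time reported by dd

count = DISK_BYTES // BLOCK_BYTES     # number of whole blocks on the disk
copied = count * BLOCK_BYTES          # bytes actually read
tail = DISK_BYTES - copied            # bytes at the end that are not read
rate_gb_s = copied / ELAPSED_S / 1e9  # dd reports decimal gigabytes per second

print(count, copied, tail, round(rate_gb_s, 1))
```

With a whole number of 16 MiB blocks, the last 11722752 bytes of the disk are not read, which does not matter for a thermal load test.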

During disk activity the power consumption rises from 27.0 W to 49.8 W. The additional 22.8 W are dissipated in the power supply, chipset, CPU, and SSD. Let us see what this power does to the SSD.

The datasheet of the Samsung 980 PRO 250GB (https://s3.ap-northeast-2.amazonaws.com/global.semi.static/Samsung_NVMe_SSD_980_PRO_Data_Sheet_Rev.1.2.pdf) specifies 5 W for reading at 6400 MB/s.

3 Measuring the Temperature

The temperature is monitored with the sensors utility from the lm_sensors 3.6.0-2 package and filtered by awk:

#!/bin/bash
#
# $Source: /home/cvsroot/blogspot/heatsink_for_nvme/index.html,v $
# $Revision: 1.1 $
# $Date: 2020/12/20 23:56:04 $
# $Author: h $
# $State: Exp $
#
# Measure the NVMe temperature using the sensors utility
#
# While temperature logging is running, start the load on a 2nd console. The power consumption rises by 16 W while reading.
#
# Note: The temperatures "Composite" and "Sensor 1" always(?) match

d0=$(date +"%s.%N")

while true ; do
    d=$(date +"%s.%N")
    sensors |\
        grep -E '^Composite|^Sensor [12]' |\
        awk -v d="${d}" -v d0="${d0}" '
             # The three-argument match() is a gawk extension
             {if (!match($0, /[[:space:]]+([+-]?[[:digit:]]{1,3}\.[[:digit:]])/, a))
                {print "No temperature value in: " $0 > "/dev/stderr";
                 exit 1;}
              if (!match($0, /^([^:]+):/, b))
                {print "No sensor label in: " $0 > "/dev/stderr";
                 exit 1;}
              gsub(/Composite/, "0", b [1]);
              gsub(/Sensor 1/,  "1", b [1]);
              gsub(/Sensor 2/,  "2", b [1]);
              t [b [1]] = a [1];
              if (b [1] == "2")
                 {printf("%10.3f", d - d0);
                  for (i = 0; i < 3; ++i)
                      printf("%6.1f", t [i]);
                  printf("\n");}}'
    sleep 0.072
done
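For readers who do not use gawk, the same filtering step can be sketched in Python. This is not the author's script, only an illustration of the parsing; the sample text imitates the layout of sensors output for an NVMe drive, and its values are made up.

```python
import re

# Sample text in the style of `sensors` output for an NVMe drive.
# The values are made up for illustration; they are not measurements from the post.
SAMPLE = """\
nvme-pci-0100
Adapter: PCI adapter
Composite:    +38.9°C  (low  = -273.1°C, high = +81.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +45.9°C  (low  = -273.1°C, high = +65261.8°C)
"""

# Label at the start of the line, then the first signed reading with one decimal.
LINE = re.compile(r"^(Composite|Sensor [12]):\s+([+-]?\d{1,3}\.\d)")

def parse_temperatures(text):
    """Return {label: temperature in °C} for the three NVMe readings."""
    temps = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            temps[m.group(1)] = float(m.group(2))
    return temps

print(parse_temperatures(SAMPLE))
```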

4 Results

4.1 Large Block Read

Without the heatsink, the controller chip of the NVMe drive reaches its peak temperature within 50 s (τ≈28 s). The heatsink is much bigger than the bare NVMe board and thus triples the heat capacity of the assembly. This increases the time constant of the thermal system to about 90 s. However, the thermal resistance to the ambient is not improved by the same factor. For a short load this appears to be a reasonable cooling system, but under sustained load it fails. See The Great Raspberry Pi Cooling Bake-Off.

In practice, huge amounts of data are rarely read from or written to the disk in one go. Most accesses are short bursts. Their temperature peaks are flattened by the heat capacity of the heatsink.
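The effect of the two time constants can be illustrated with a first-order thermal model, in which the temperature rise follows 1 − exp(−t/τ) with τ = R·C (thermal resistance times heat capacity). The model is my assumption, not something measured in this post; only the two values of τ are taken from the text.

```python
import math

def fraction_of_final_rise(t_s, tau_s):
    """First-order step response: share of the steady-state temperature rise reached after t_s."""
    return 1.0 - math.exp(-t_s / tau_s)

TAU_BARE = 28.0      # s, time constant without heatsink (from the text)
TAU_HEATSINK = 90.0  # s, time constant with heatsink (from the text)

for t in (10, 50, 90, 300):
    print(t,
          round(fraction_of_final_rise(t, TAU_BARE), 2),
          round(fraction_of_final_rise(t, TAU_HEATSINK), 2))

# tau = R_th * C_th: if C_th triples while tau grows from 28 s to about 90 s,
# the implied thermal resistance stays roughly the same (factor near 1).
r_ratio = (TAU_HEATSINK / TAU_BARE) / 3.0
print(round(r_ratio, 2))
```

Under this model a short burst reaches a much smaller share of the final temperature rise with the heatsink, while a sustained load ends up near the same steady state because the thermal resistance has hardly changed.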

4.2 A Heavy Use Case

Copying the whole disk to /dev/null is simply unrealistic. Compiling gcc 10.2 from the sources is more realistic. The make process organizes its disk usage in three top-level directories:

Name        Size [MBytes]  Access  Usage
gcc-10.2.0  900            ro      Source Files
build       5266           rw      Compiled Files
/tmp        ?              rw      Temporary Files

Three cases were observed:

  • /tmp and build on tmpfs
  • /tmp on tmpfs
  • Neither /tmp nor build on tmpfs

/tmp  build  real [s]   user [s]    sys [s]  Units read  Units written
y     y      1,481.127  10,520.575  523.041  448         92
y     n      1,323.177  9,570.220   365.872  1,333       182
n     n      1,303.015  9,553.401   366.698  1,592       12,078

1 Unit = 512 KB
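To get a feeling for the magnitudes, the unit counts can be converted to bytes and to an average write rate. A minimal sketch, assuming that "512 KB" means 512,000 bytes (decimal kilobytes); the run times and unit counts are copied from the table above.

```python
UNIT_BYTES = 512 * 1000  # "1 Unit = 512 KB", read here as decimal kilobytes (assumption)

# (/tmp on tmpfs, build on tmpfs) -> (real time [s], units read, units written), from the table
runs = {
    ("y", "y"): (1481.127, 448, 92),
    ("y", "n"): (1323.177, 1333, 182),
    ("n", "n"): (1303.015, 1592, 12078),
}

def to_gb(units):
    """Convert a unit count to decimal gigabytes."""
    return units * UNIT_BYTES / 1e9

for (tmp, build), (real_s, rd, wr) in runs.items():
    avg_write_mb_s = to_gb(wr) * 1000 / real_s   # average over the whole build
    print(tmp, build, round(to_gb(rd), 2), round(to_gb(wr), 2), round(avg_write_mb_s, 2))
```

Even in the case with the most disk traffic, the average write rate over the build is only a few MB/s, far below the 2.8 GB/s of the large block read.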

2020-06-12

RISC-V. Part 5: J-Link

1 Download and Install the J-Link Software

Download and install an appropriate version of the J-Link software. It installs to /opt/SEGGER/JLink_V672d; /opt/SEGGER/JLink is a symbolic link to that directory.

2 Firmware Upgrade

Before using the probe, update its firmware to the most recent version using /opt/SEGGER/JLink/JLinkConfigExe.

3 Connecting with Outdated Probes

J-Trace revision 3.2 and J-Link EDU 8 were built before RISC-V support existed, and they do not work with RISC-V targets. An attempt to connect with a J-Trace 3.2 ends with the messages:

⋮
Detected: RV32 core
CSR access via abs. commands: No
ConfigTargetSettings() start
ConfigTargetSettings() end
TotalIRLen = 10, IRPrint = 0x0021
JTAG chain detection found 2 devices:
 #0 Id: 0x1000563D, IRLen: 05, RV32
 #1 Id: 0x790007A3, IRLen: 05, Unknown device
Cannot connect to target.

4 Connecting with Recent Probes

Connect a J-Link EDU version 11.0 to the host and start the JLinkExe utility:

./JLinkExe
SEGGER J-Link Commander V6.72d (Compiled May 15 2020 16:50:13)
DLL version V6.72d, compiled May 15 2020 16:50:03

Connecting to J-Link via USB...O.K.
Firmware: J-Link V11 compiled Apr 23 2020 16:49:23
Hardware version: V11.00
S/N: 261005761
License(s): FlashBP, GDB
OEM: SEGGER-EDU
VTref=0.000V


Type "connect" to establish a target connection, '?' for help

To turn the power on, enter:

power on

The “Target power” LED turns on and the development board powers up. Then select the device, the interface speed, and the JTAG interface, and connect:

device risc-v
speed 1000
Selecting 1000 kHz as target interface speed
si jtag
Selecting JTAG as current target interface.
con
Device position in JTAG chain (IRPre,DRPre) <Default>: -1,-1 => Auto-detect
JTAGConf>
Device "RISC-V" selected.


Connecting to target via JTAG
ConfigTargetSettings() start
ConfigTargetSettings() end
TotalIRLen = 10, IRPrint = 0x0021
JTAG chain detection found 2 devices:
 #0 Id: 0x1000563D, IRLen: 05, RV32
 #1 Id: 0x790007A3, IRLen: 05, Unknown device
Debug architecture:
  RISC-V debug: 0.13
  AddrBits: 7
  DataBits: 32
  IdleClks: 7
Memory access:
  Via system bus: No
  Via ProgBuf: Yes (2 ProgBuf entries)
DataBuf: 4 entries
  autoexec[0] implemented: Yes
Detected: RV32 core
CSR access via abs. commands: No
Temp. halted CPU for NumHWBP detection
HW instruction/data BPs: 4
Support set/clr BPs while running: No
HW data BPs trigger before execution of inst
RISC-V identified.
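The two Id values in the chain detection are JTAG IDCODEs. Their fields can be separated according to the IEEE 1149.1 layout (version, part number, manufacturer, and a marker bit that is always 1). A minimal sketch; looking up the manufacturer code in the JEDEC list is left out.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into the fields defined by IEEE 1149.1."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,  # JEDEC bank and ID code
        "marker": idcode & 0x1,                 # always 1 for a valid IDCODE
    }

for idcode in (0x1000563D, 0x790007A3):
    f = decode_idcode(idcode)
    print(f"{idcode:#010x}: version={f['version']:#x} part={f['part']:#06x} "
          f"manufacturer={f['manufacturer']:#05x}")

IR_LENGTHS = [5, 5]           # IRLen of device #0 and #1 in the log
assert sum(IR_LENGTHS) == 10  # matches "TotalIRLen = 10"
```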

5 Firmware Backup

At the J-Link> prompt, back up the content of the internal FLASH to a binary file and verify it:

savebin /tmp/rv32.bin,0,20000
Opening binary file for writing... [/tmp/rv32.bin]
Reading 131072 bytes from addr 0x00000000 into file...O.K.
verifybin /tmp/rv32.bin ,0
Loading binary file /tmp/rv32.bin
Reading 131072 bytes data from target memory @ 0x00000000.
Verify successful.
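The length argument of savebin is hexadecimal, as the "Reading 131072 bytes" message shows. A short check of the numbers (the 128 KiB FLASH size is the one given for the GD32VF103VBT6 elsewhere in this series):

```python
# savebin /tmp/rv32.bin,0,20000: address and length are hexadecimal.
length = int("20000", 16)
print(length, length // 1024)   # bytes, KiB

FLASH_KIB = 128                 # FLASH size of the GD32VF103VBT6
assert length == FLASH_KIB * 1024
```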

6 What’s Next?

To really start developing, we need the following setup:

We have made a giant step. All the blue parts have been verified:

  • The hardware (PC, Probe and Target) is running.
  • The interfacing (USB and JTAG) works.
  • The J-Link firmware connects to the RISC-V target and we can read and write FLASH memory. DFU — as a workaround — is no longer required.

What is still missing is the glue between the debugger (gdb) and the probe: OpenOCD. This will be solved in the next blog.

2020-06-11

RISC-V. Part 4: JTAG Cabling

1 The SeeedStudio Connector

Compared to the development board, the standard 20-pin 0.1″ JTAG connector is huge. Therefore, SeeedStudio created another proprietary 0.1″ JTAG 10-pin connector. ☹!

The pin assignment is documented in the schematics and in pinout. The same pins are duplicated on the GPIO headers:

GPIO Header                         JTAG Header
Signal  Port  PIN                   Signal  PIN
GND     GND   H2/41
GND     GND   H2/42
JTDI    PA15  H3/1                  TDI     5
JTMS    PA13  H3/3                  TMS     7
NJTRST  PB4   H3/5                  TRST    3
JTCK    PA14  H3/2                  TCK     9
JTDO    PB3   H3/4                  TDO     6
NRST    NRST  H3/6                  NRST    4
GND     GND   H3/34                 GND     10
3V3     3V3   H2/{43…48} [1]        3V3     1…2
5V            H3/{45…48} [2]

[1] Continuity tested to AMS1117 voltage regulator (pin 2).

[2] Continuity tested to AMS1117 voltage regulator (pin 3).

2 The J-Link Connector

J-Link / J-Trace use a popular connector. It is documented in J-Link / J-Trace User Guide, Software Version: 6.70, Date: May 7, 2020, section 18.1.1 Pinout for JTAG:

PIN  SIGNAL  TYPE  Description
1 VTref Input This is the target reference voltage. It is used to check if the target has power, to create the logic-level reference for the input comparators and to control the output logic levels to the target. It is normally fed from VDD of the target board and must not have a series resistor.
2 Not connected NC This pin is not connected in J-Link.
3 nTRST Output JTAG Reset. Output from J-Link to the Reset signal of the target JTAG port. Typically connected to nTRST of the target CPU. This pin is normally pulled HIGH on the target to avoid unintentional resets when there is no connection.
5 TDI Output JTAG data input of target CPU. It is recommended that this pin is pulled to a defined state on the target board. Typically connected to TDI of the target CPU.
7 TMS Output JTAG mode set input of target CPU. This pin should be pulled up on the target. Typically connected to TMS of the target CPU.
9 TCK Output JTAG clock signal to target CPU. It is recommended that this pin is pulled to a defined state of the target board. Typically connected to TCK of the target CPU.
11 RTCK Input Return test clock signal from the target. Some targets must synchronize the JTAG inputs to internal clocks. To assist in meeting this requirement, you can use a returned, and retimed, TCK to dynamically control the TCK rate. J-Link supports adaptive clocking, which waits for TCK changes to be echoed correctly before making further changes. Connect to RTCK if available, otherwise to GND.
13 TDO Input JTAG data output from target CPU. Typically connected to TDO of the target CPU.
15 nRESET I/O Target CPU reset signal. Typically connected to the RESET pin of the target CPU, which is typically called “nRST”, “nRESET” or “RESET”. This signal is an active low signal.
17 DBGRQ NC This pin is not connected in J-Link. It is reserved for compatibility with other equipment to be used as a debug request signal to the target system. Typically connected to DBGRQ if available, otherwise left open.
19 5V-Supply Output This pin can be used to supply power to the target hardware. Older J-Links may not be able to supply power on this pin. For more information about how to enable/disable the power supply, please refer to Target power supply.
4, 6, 8, 10, 12 GND GND pins connected to GND in J-Link. They should also be connected to GND in the target system.

3 Wiring

Waste of effort is always brought about by missing interfaces. Although I have a box full of JTAG/SWD cables and adapters with 0.05″/0.1″ spacing and 10/16/20/38 pins, none of them fits. Later I came across an adapter that can be found by searching for TQ2440 mini2440 (or “ULink2 JTAG ARM Adapter 20Pin 2+2.54 mm 14Pin 2.54mm 10P 2+2.54mm 6P + 10P XH2.54”). It almost does the job.

J-Link          Color   Dev Board        Adapter (TQ2440 mini2440 JTAG)
Signal  Pin             Signal  Pin      Pin   Signal
GND     4       blue    GND     H2/41    10    GND
GND     6       blue    GND     H2/42    10    GND
5V      19      red     5V      H3/48
VTref   1       white   3V3     H2/47    1…2   3V3
nTRST   3       brown   NJTRST  H3/5     3     TRST
TDI     5       yellow  JTDI    H3/1     5     TDI
TMS     7       orange  JTMS    H3/3     7     TMS
TCK     9       green   JTCK    H3/2     9     TCK
TDO     13      grey    JTDO    H3/4     6     TDO
nRESET  15      violet  NRST    H3/6     4     NRST
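Hand wiring is error-prone, so it can help to keep the wiring table as data and cross-check it against the pin table of the board. A minimal sketch with the values of the two tables; the ground wires are left out, and the names WIRING, BOARD_PINS, and check_wiring are mine.

```python
# J-Link signal -> (J-Link pin, wire color, board signal, board header pin), from the wiring table
WIRING = {
    "VTref":  (1,  "white",  "3V3",    "H2/47"),
    "nTRST":  (3,  "brown",  "NJTRST", "H3/5"),
    "TDI":    (5,  "yellow", "JTDI",   "H3/1"),
    "TMS":    (7,  "orange", "JTMS",   "H3/3"),
    "TCK":    (9,  "green",  "JTCK",   "H3/2"),
    "TDO":    (13, "grey",   "JTDO",   "H3/4"),
    "nRESET": (15, "violet", "NRST",   "H3/6"),
    "5V":     (19, "red",    "5V",     "H3/48"),
}

# GPIO header pins of the JTAG signals, from the SeeedStudio pin table
BOARD_PINS = {"JTDI": "H3/1", "JTMS": "H3/3", "NJTRST": "H3/5",
              "JTCK": "H3/2", "JTDO": "H3/4", "NRST": "H3/6"}

def check_wiring(wiring, board_pins):
    """Return the J-Link signals whose header pin disagrees with the board pin table."""
    return [sig for sig, (_, _, board_sig, pin) in wiring.items()
            if board_sig in board_pins and board_pins[board_sig] != pin]

print(check_wiring(WIRING, BOARD_PINS))   # an empty list means the tables agree
```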

Until the adapter arrived I had to do it the hard way:

[Figure: Debugging the SeeedStudio GD32 RISC-V Dev Board]

4 Power-Up the Target

“Almost” (in the last section) means that the target is not powered by the adaptor. For that, the red 5V cable must still be plugged in and, as described in the next blog, the line must be enabled.

RISC-V. Part 3: The Test Program

1 The Source (src)

1.1 System API

To set input parameters and to get the results from a program, we require I/O. Peripherals and CPU share one address space. Therefore, a CPU can only access hardware of its own:

CPU     Hardware Access to   Basic Program
        PC      Target
PC      yes     no           hello: Prints “hello world”.
Target  no      yes          blinky: Blinks an LED periodically.

There are techniques to circumvent the limitation. But they require some effort and, in particular if there is analog I/O, they introduce nasty side-effects. In other words: They are far beyond the scope of this blog. Because the simulator (riscv32-unknown-elf-run) is limited to the PC peripherals, we use a hello-type program.

1.2 Program hello

Jim Wilson set up a variant of a hello-type program (https://github.com/riscv/riscv-gnu-toolchain/pull/295). It’s in the file hello.c:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  char *string = malloc (1000);
  if (string == 0)
    return 1;
  strcpy (string, "Hello world\n");
  printf ("%s", string);
  return 0;
}

2 Compile the Test Program (make)

make is the utility which almost always does what it is meant to do. E.g., if we type make hello, it finds the file hello.c in the working directory and invokes the C-compiler to build an executable named “hello” which can be run on the PC locally. To run the cross-compiler instead of the native gcc, we could define the environment variable CC=/opt/riscv/bin/riscv32-unknown-elf-gcc beforehand and type make hello afterwards. This would do the trick.

However, it is easier to write a Makefile. This file contains everything which differs from the default rules. In our case the (preliminary) Makefile reads:

CC        :=   /opt/riscv/bin/riscv32-unknown-elf-gcc

%.elf: %.c
               $(LINK.c) $^ $(LOADLIBES) $(LDLIBS) -o $@

CFLAGS    +=   --specs=nano.specs
LDFLAGS   +=   -Wl,--gc-sections,-Map,$*.map

hello.elf:

The specifications are for:

CC := /opt/riscv/bin/riscv32-unknown-elf-gcc
Invoke the RISC-V cross-compiler.
%.elf: %.c
     $(LINK.c) $^ $(LOADLIBES) $(LDLIBS) -o $@
Because bare-metal programming sometimes requires binary files in addition to elf-files, I prefer to use an explicit elf extension. The price for that is this rule. $@ is the name of the target (here: hello.elf) and $^ is the list of all prerequisites (here: hello.c).
CFLAGS += --specs=nano.specs
Use include directories for nano and link against libc_nano.a. This is a variant of newlib for small embedded systems.
--gc-sections
Run a garbage collection for unused sections to remove redundant modules.
-Map,$*.map
Write a linker map. $* is a placeholder for the stem name of the target which eventually expands to “hello”.
hello.elf:
This is the primary target of the file which is built if no other target has been given.

The program compiles effortlessly by entering make:

/opt/riscv/bin/riscv32-unknown-elf-gcc --specs=nano.specs  -Wl,--gc-sections,-Map,hello.map  hello.c   -o hello.elf

3 Run the Test Program

3.1 Traced Run

When starting the program in the simulator I added two trace options:

--trace=on
Print all executed assembler instructions.
--trace-register
Print all changed register values.

We’ll look into the trace file later.

/opt/riscv/bin/riscv32-unknown-elf-run --trace=on --trace-register hello.elf 2> hello.trace

yields

Hello world

“Yawn!” you might think. But there is more to it than meets the eye.

3.2 Single Stepping

3.2.1 Startup Code

C programs start with crt0 (C runtime zero), the startup code that runs before main. The module can be found with

find /opt/riscv/ -type f -name 'crt0*'
/opt/riscv/riscv32-unknown-elf/lib/crt0.o

The symbols in this tiny module are:

/opt/riscv/bin/riscv32-unknown-elf-nm -a /opt/riscv/riscv32-unknown-elf/lib/crt0.o
         w atexit
00000000 b .bss
00000000 d .data
         U _edata
         U _end
         U exit
         U __global_pointer$
00000008 t .L0 
00000010 t .L0 
00000024 t .L0 
0000002e t .L0 
00000000 t .L11
         w __libc_fini_array
         U __libc_init_array
0000003e t .Lweak_atexit
         U main
         U memset
00000000 n .riscv.attributes
00000000 T _start
00000000 t .text

_start is the only exported symbol (T) from the text segment. It must be the start address.

The references to _edata, _end, and memset look like a bss initialization (zeroing global variables). That’s fine.

However, there is no sign of a variable pointing to the start of the data segment, neither is there a reference to memcpy. Presumably the binary assumes that the data segment (initialized data) lives in writable memory. This is fine in an OS-based environment but infeasible in a bare-metal system, where initialized data must be copied from FLASH to SRAM upon start. Put it on the to-do list (item 1).

There is no reference to the end of the SRAM. This is required to initialize the stack pointer. In an OS-based environment the OS sets the stack pointer before the program is started. But again, this is infeasible for a bare-metal system. Put it on the to-do list (item 2).
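The two missing steps and the existing bss initialization can be illustrated with a small simulation. This is a sketch of what a bare-metal startup routine typically does, not newlib's crt0; memory is modeled with bytearrays, and all argument names stand in for linker symbols that are hypothetical here.

```python
def startup(flash, sram, data_lma, data_start, data_end, bss_end):
    """Simulate the memory setup a bare-metal startup routine performs before main().

    All arguments are offsets into the bytearrays and stand in for linker symbols;
    the names are hypothetical, not taken from newlib.
    """
    size = data_end - data_start
    # 1. Copy initialized data from its load address in FLASH to its run address in SRAM.
    sram[data_start:data_end] = flash[data_lma:data_lma + size]
    # 2. Zero the bss, which directly follows the data segment.
    sram[data_end:bss_end] = bytes(bss_end - data_end)
    # 3. The initial stack pointer is the end of SRAM; the stack grows downwards.
    return len(sram)

flash = bytearray(b"CODE" + b"\x01\x02\x03\x04")   # 4 bytes of text, 4 bytes of data image
sram = bytearray(b"\xff" * 16)                     # SRAM content is undefined after power-up
sp = startup(flash, sram, data_lma=4, data_start=0, data_end=4, bss_end=8)
print(sram.hex(), sp)
```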

Connect gdb and the simulator

The simulator can be connected with gdb. I prefer to start gdb from emacs because, just like overweight IDEs but much more agile, the editor synchronizes the debugging information with the sources. Within emacs, the debugger is started by typing M-x gdb. emacs proposes gdb -i=mi hello.elf. The binary (hello.elf) is correct, but the native gdb is wrong and must be replaced by /opt/riscv/bin/riscv32-unknown-elf-gdb; this needs to be done only once. After confirming with ↵ Enter, gdb fires up. To connect with the simulator, type at the (gdb) prompt:

target sim
Connected to the simulator.
load

This loads the sections of the file hello.elf into the simulator:

Loading section .text, size 0x1410 lma 0x10074
Loading section .rodata, size 0x108 lma 0x11484
Loading section .sdata2, size 0x4 lma 0x1158c
Loading section .eh_frame, size 0x4 lma 0x12590
Loading section .init_array, size 0x4 lma 0x12594
Loading section .fini_array, size 0x4 lma 0x12598
Loading section .data, size 0x60 lma 0x1259c
Loading section .sdata, size 0x4 lma 0x125fc
Start address 0x1009e
Transfer rate: 44128 bits in <1 sec.

The start address is “_start”. Set a breakpoint there …

b _start
Breakpoint 1 at 0x100ae

… and start the program

r
Starting program: /home/h/cvs/risc_v/src/testcase/hello.elf 

Breakpoint 1, 0x000100ae in _start ()

Because the newlib-library has no debugging information yet(!), emacs does not display the source code automatically. Instead we must type:

disass
Dump of assembler code for function _start:
   0x0001009e <+0>:	auipc	gp,0x3
   0x000100a2 <+4>:	addi	gp,gp,-770 # 0x12d9c
   0x000100a6 <+8>:	addi	a0,gp,-1948
   0x000100aa <+12>:	addi	a2,gp,-1904
=> 0x000100ae <+16>:	sub	a2,a2,a0
   0x000100b0 <+18>:	li	a1,0
   0x000100b2 <+20>:	jal	0x101e6 <memset>
   0x000100b4 <+22>:	li	a0,0
   0x000100b8 <+26>:	beqz	a0,0x100c6 <_start+40>
   0x000100ba <+28>:	li	a0,0
   0x000100be <+32>:	auipc	ra,0x0
   0x000100c2 <+36>:	jalr	zero # 0x0
   0x000100c6 <+40>:	jal	0x1016a <__libc_init_array>
   0x000100c8 <+42>:	lw	a0,0(sp)
   0x000100ca <+44>:	addi	a1,sp,4
   0x000100cc <+46>:	li	a2,0
   0x000100ce <+48>:	jal	0x1011e <main>
   0x000100d0 <+50>:	j	0x10074 <exit>
End of assembler dump.

Except for the missing debug information, this looks quite promising, just like a real debug session. We are running RISC-V code and we can now check the sp (Stack Pointer) register for the location of the stack segment.

p/x $sp
$1 = 0x3fff170
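The first instructions of _start can be checked by hand. The following sketch redoes the address arithmetic of the disassembly above; the comparison values are the section addresses and the bss size listed in sections 4.1 and 4.2.

```python
def auipc(pc, imm20):
    """auipc rd, imm: rd = pc + (imm << 12), truncated to 32 bits."""
    return (pc + (imm20 << 12)) & 0xFFFFFFFF

gp = auipc(0x1009E, 0x3) - 770   # auipc gp,0x3 ; addi gp,gp,-770
a0 = gp - 1948                   # addi a0,gp,-1948  -> first argument of memset
a2 = gp - 1904                   # addi a2,gp,-1904
length = a2 - a0                 # sub a2,a2,a0      -> third argument of memset

print(hex(gp), hex(a0), hex(a2), length)
```

The result for gp agrees with the comment in the disassembly (0x12d9c). a0 is the start of .sbss (0x12600), a2 is the end of .bss (0x12610 + 0x1c), and the length of 44 bytes is the bss size reported by the size utility. So the memset call indeed zeroes the bss.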

4 Debriefing

Though the program is the classical entry for an OS (operating system) based application, it poses some challenges for a bare-metal system. So, let us see how the program looks from the bare-metal perspective.

4.1 Program Size

Type

/opt/riscv/bin/riscv32-unknown-elf-size hello.elf

to list the program size:

text    data     bss     dec     hex filename
5404     112      44    5560    15b8 hello.elf

This is quite handy. The GD32VF103VBT6 has 128 KiB FLASH (for text and data) and 32 KiB SRAM (for data and bss). The data segment appears in both FLASH and SRAM because it is copied from FLASH to SRAM at startup.
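The occupancy of the two memories follows directly from these numbers. A minimal sketch (stack and heap are not part of the size output and are therefore not included):

```python
TEXT, DATA, BSS = 5404, 112, 44   # from riscv32-unknown-elf-size
FLASH_BYTES = 128 * 1024          # GD32VF103VBT6
SRAM_BYTES = 32 * 1024

flash_used = TEXT + DATA          # code plus the load image of the initialized data
sram_used = DATA + BSS            # initialized plus zeroed variables

print(flash_used, round(100 * flash_used / FLASH_BYTES, 1))   # bytes, percent of FLASH
print(sram_used, round(100 * sram_used / SRAM_BYTES, 1))      # bytes, percent of SRAM
```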

4.2 Segment Locations

The simulator has memory in abundance.

fgrep -e'Memory Configuration' -A3 hello.map

yields:

Memory Configuration

Name             Origin             Length             Attributes
*default*        0x0000000000000000 0xffffffffffffffff

This is not too bad for a 32-bit system ☺. But where are the segments? To find out, type:

/opt/riscv/bin/riscv32-unknown-elf-readelf --sections hello.elf
There are 16 section headers, starting at offset 0x2408:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00010074 000074 001410 00  AX  0   0  2
  [ 2] .rodata           PROGBITS        00011484 001484 000108 00   A  0   0  4
  [ 3] .sdata2           PROGBITS        0001158c 00158c 000004 00   A  0   0  4
  [ 4] .eh_frame         PROGBITS        00012590 001590 000004 00  WA  0   0  4
  [ 5] .init_array       INIT_ARRAY      00012594 001594 000004 04  WA  0   0  4
  [ 6] .fini_array       FINI_ARRAY      00012598 001598 000004 04  WA  0   0  4
  [ 7] .data             PROGBITS        0001259c 00159c 000060 00  WA  0   0  4
  [ 8] .sdata            PROGBITS        000125fc 0015fc 000004 00  WA  0   0  4
  [ 9] .sbss             NOBITS          00012600 001600 000010 00  WA  0   0  4
  [10] .bss              NOBITS          00012610 001600 00001c 00  WA  0   0  4
  [11] .comment          PROGBITS        00000000 001600 000012 01  MS  0   0  1
  [12] .riscv.attributes RISCV_ATTRIBUTE 00000000 001612 00002b 00      0   0  1
  [13] .symtab           SYMTAB          00000000 001640 000860 10     14  68  4
  [14] .strtab           STRTAB          00000000 001ea0 0004e1 00      0   0  1
  [15] .shstrtab         STRTAB          00000000 002381 000086 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)

The memory areas of the GD32VF103VBT6 are:

Type   Start        End
SRAM   0x2000 0000  0x2000 7fff
FLASH  0x0800 0000  0x0801 ffff

That’s quite a mismatch, which must be resolved. Put it on the to-do list (item 3).

4.3 System API

How did the program allocate memory? And how did it print “Hello world”? Spoiler: The simulator uses ecall (Environment call) instructions. Let us look at the trace hello.trace and type fgrep -B1 ecall hello.trace:

reg:      0x0113ba ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113be ---   _sbrk     -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x0112fc ---   _fstat    -wrote a7 = 0x50
insn:     0x011300 ---   _fstat    -ecall;
--
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
insn:     0x0113f0 ---   _sbrk     -ecall;
--
reg:      0x01140e ---   _write    -wrote a7 = 0x40
insn:     0x011412 ---   _write    -ecall;
--
reg:      0x0112c8 ---   _exit     -wrote a7 = 0x5d
insn:     0x0112cc ---   _exit     -ecall;

a7 contains the system call number, which follows the Linux numbering. The numbers are defined in the file /usr/include/asm-generic/unistd.h. In summary, the program uses four different system calls:

Newlib Function  System Call Number  unistd.h Symbol
_sbrk            0xd6                __NR_brk
_fstat           0x50                __NR3264_fstat
_write           0x40                __NR_write
_exit            0x5d                __NR_exit

This unmasks the magic and adds another issue to the to-do list (item 4).
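The table can also be derived mechanically from the trace. A minimal sketch that counts the values written to a7; the embedded text consists of the "reg:" lines of the fgrep output above, and the regular expression is tailored to exactly that layout.

```python
import re
from collections import Counter

# The "reg:" lines of the fgrep output.
TRACE = """\
reg:      0x0113ba ---   _sbrk     -wrote a7 = 0xd6
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
reg:      0x0112fc ---   _fstat    -wrote a7 = 0x50
reg:      0x0113ec ---   _sbrk     -wrote a7 = 0xd6
reg:      0x01140e ---   _write    -wrote a7 = 0x40
reg:      0x0112c8 ---   _exit     -wrote a7 = 0x5d
"""

SYSCALL = re.compile(r"^reg:\s+0x[0-9a-f]+\s+---\s+(\S+)\s+-wrote a7 = (0x[0-9a-f]+)$")

def tally(trace):
    """Count (newlib function, system call number) pairs loaded into a7."""
    calls = Counter()
    for line in trace.splitlines():
        m = SYSCALL.match(line)
        if m:
            calls[(m.group(1), int(m.group(2), 16))] += 1
    return calls

for (func, number), n in tally(TRACE).items():
    print(f"{func:8s} {number:#04x} {number:3d} x{n}")
```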

4.4 The Stack

Where is the stack? How much stack is used? The trace file answers the questions:

fgrep 'wrote sp = ' hello.trace  | awk '{print $NF;}' | sort | uniq -c
      1 0x3fffd60
      4 0x3fffdf0
      4 0x3fffe00
      4 0x3fffe10
      1 0x3fffe20
      4 0x3fffe30
      4 0x3fffe40
      4 0x3fffe50
      6 0x3fffe70
      2 0x3fffe90
      3 0x3fffea0
      1 0x3fffee0
      2 0x3fffef0
      2 0x3ffff00
      2 0x3ffff20
      7 0x3ffff30
      7 0x3ffff40
      3 0x3ffff50
      4 0x3ffff60
      4 0x3ffff70
      6 0x3ffff80
      3 0x3ffffa0
      3 0x3ffffb0
      2 0x3ffffc0

Again, this tells us that the stack is not located in the SRAM area. The stack size is below 0.75 KiB.
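The stack usage can be estimated from the extreme values in this list. A minimal sketch that parses an excerpt of the uniq -c output; the initial value of sp is set before the first traced write and is not in the list, so the real depth is slightly larger than the span computed here.

```python
# Excerpt of the `uniq -c` output: the lowest, one intermediate, and the highest sp value.
UNIQ_OUTPUT = """\
      1 0x3fffd60
      4 0x3fffdf0
      2 0x3ffffc0
"""

def stack_span(text):
    """Return the distance in bytes between the highest and the lowest sp value."""
    values = [int(line.split()[1], 16) for line in text.splitlines() if line.strip()]
    return max(values) - min(values)

span = stack_span(UNIQ_OUTPUT)
print(span, span < 0.75 * 1024)
```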

5 To-do List

  1. Initialize the data segment.
  2. Load the SP-register.
  3. Fix the segment offsets to match the memory areas of the MCU.
  4. Replace the system calls.

In this blog we have colored src and make. Because we did not care about lib we cannot debug the library code. And because we did not define the linker script (ld) the segments are not in the MCU memory.

[Figure: Developing embedded systems, GNU Toolchain Provided]

RISC-V. Part 2: The GNU Toolchain

1 Download

There is a well maintained git repository with a RISC-V cross-gcc. It has been updated from gcc 9 to gcc 10 in the days just before 2020-06-11. Download the current toolchain from git:

git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
Cloning into 'riscv-gnu-toolchain'...
remote: Enumerating objects: 1, done.        
remote: Counting objects: 100% (1/1), done.        
remote: Total 7704 (delta 0), reused 0 (delta 0), pack-reused 7703        
Receiving objects: 100% (7704/7704), 4.72 MiB | 1.07 MiB/s, done.
Resolving deltas: 100% (3903/3903), done.
Submodule 'qemu' (https://git.qemu.org/git/qemu.git) registered for path 'qemu'
Submodule 'riscv-binutils' (https://github.com/riscv/riscv-binutils-gdb.git) registered for path 'riscv-binutils'
Submodule 'riscv-dejagnu' (https://github.com/riscv/riscv-dejagnu.git) registered for path 'riscv-dejagnu'
Submodule 'riscv-gcc' (https://github.com/riscv/riscv-gcc.git) registered for path 'riscv-gcc'
Submodule 'riscv-gdb' (https://github.com/riscv/riscv-binutils-gdb.git) registered for path 'riscv-gdb'
Submodule 'riscv-glibc' (https://github.com/riscv/riscv-glibc.git) registered for path 'riscv-glibc'
Submodule 'riscv-newlib' (https://github.com/riscv/riscv-newlib.git) registered for path 'riscv-newlib'
Cloning into '/home/h/riscv/riscv-gnu-toolchain/qemu'...
⋮

git downloads 3.26 GB over a 12 Mbit/s DSL line within an hour.

2 Compile the Toolchain

I prefer to compile the toolchain in a dedicated directory. For that, I create the directory build, enter it, and configure the compilation from there. The prefix parameter sets the installation directory. The architecture is rv32imac because the GD32VF103 is a 32-bit MCU with the base integer ISA and 32 GPRs (General Purpose Registers), integer multiply/divide (M), atomics (A), and compressed 16-bit instructions (C).

cd riscv-gnu-toolchain/
mkdir build
cd build
../configure --prefix=/opt/riscv --with-arch=rv32imac --with-abi=ilp32
checking for gcc... /usr/local/gcc-9.2/bin/gcc-9.2
checking whether the C compiler works... yes
⋮
configure: creating ./config.status
config.status: creating Makefile
config.status: creating scripts/wrapper/awk/awk
config.status: creating scripts/wrapper/sed/sed
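The value of --with-arch is a compact code: the register width followed by one letter per extension. A minimal sketch that spells it out; it only covers the four letters used here and is not a complete parser for RISC-V architecture strings.

```python
import re

EXTENSIONS = {
    "i": "base integer ISA (32 general-purpose registers)",
    "m": "integer multiply/divide",
    "a": "atomic instructions",
    "c": "compressed (16-bit) instructions",
}

def decode_arch(arch):
    """Split a simple architecture string such as 'rv32imac' into XLEN and extension letters."""
    m = re.fullmatch(r"rv(32|64)([imac]+)", arch)
    if m is None:
        raise ValueError(f"not covered by this sketch: {arch}")
    return int(m.group(1)), list(m.group(2))

xlen, letters = decode_arch("rv32imac")
print(xlen)
for letter in letters:
    print(letter, EXTENSIONS[letter])
```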

To build and install the toolchain, type:

make

An hour later the cross-compiler is installed at /opt/riscv/bin/riscv32-unknown-elf-gcc.

3 Next Steps

[Figure: Developing embedded systems, GNU Toolchain Provided]

The left branch (gcc and sim via elf) of the system is complete. To test it, we must provide input to gcc. This will be done in the next blog.

RISC-V. Part 1: The Board

1 The Hardware

For years I have heard stories about the legendary open-source RISC-V core. Is it just a toy for geeks locked in the ivory towers? Haven’t we got enough CPU cores already? Don’t they work reliably?

To be honest, it rankled me that I failed to de-activate Intel’s management engine which is most certainly an easier prey for malicious hackers than speculative execution bugs (Meltdown, Spectre).

It’s time to think about a more transparent core design. And that’s where RISC-V steps in.

Recently I came across a 10 € board.

[Figure: Debugging the SeeedStudio GD32 RISC-V Dev Board]

2 The Course Of Action

2.1 An Overview

Where do we begin? Let’s have a look at a development system for embedded systems.

[Figure: Developing Embedded Systems]

Greyed-out objects are terra incognita that we will have to discover. At this time the only known thing is the PC which should have a recent Linux variant installed. Blog by blog we will see how the objects will be colored.

The objectives of this blog series are to set up a practicable development system and to benchmark RISC-V. Therefore, the implementations of the subsystems will be kept as simple as possible. But still, they will be expandable and an appropriate starting point for production-quality code.

2.2 IDEs

“Stop. Keep it simple!” you might argue. “SeeedStudio delivers not only the board, but it also provides a display and a software example. And SeeedStudio recommends platformIO as an IDE.”

  1. IDEs (Integrated Development Environments) like platformIO or Eclipse just run the tools under the hood. Wrapping stand-alone binaries hardly increases the functionality but adds complexity and ruins efficiency.
  2. As we will see, some tools are not mature yet. The documented example (see https://wiki.seeedstudio.com/SeeedStudio-GD32-RISC-V-Dev-Board/#platforms-supported) downloads the binaries via the DFU (Device Firmware Upgrade) interface. However, DFU was never designed for development and only conceals a flaw in FLASH handling. Common practice today is to use the debug probe for flashing and debugging the application.
  3. Eventually, we need the knowledge for more advanced uses.

2020-05-30

Finally, the RPi4B with 8 GBytes arrived

I updated to the new 8 GiB board.

It booted without any problems:

[Figure: RPi4B showing 8GB of RAM]

“But, there is nothing like a free lunch,” I thought. “What about power consumption?”

I fired up my ELV EA 8000 power meter and compared the new system to the old one. By system I mean

  • the Raspberry Pi official USB-C power supply,
  • the RPi4 board,
  • an X825 expansion board (for SATA),
  • a 256 GB SLC SSD (Super Talent FTM25JB25I), and
  • a 1 Gbit/s Ethernet connection.

The headless system booted from SD-card and the root file system is on the SSD. For the measurement, the system was idling.

Because the EA 8000 has no interface and the display updates every second, I recorded it with a webcam for a minute, took stills every second (with ffmpeg), and typed the values into text files.

[Figure: Power consumption as a function of time]

RAM Size [GB]  Power Consumption [W]
4              4.7177 ± 0.0007
8              4.2309 ± 0.0010

Of course, the “±” values do not represent the precision of the measurement; the EA 8000 is not precise enough for that. Instead, they denote the standard deviation of the mean value of the measurement. This means that, despite adding 4 GB of RAM, the board designers have lowered the input power by almost 0.5 W.
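The standard deviation of the mean is the sample standard deviation divided by the square root of the number of samples. A minimal sketch of the calculation; the readings in the list are hypothetical placeholders, because the values typed in from the video stills are not published here, and only the two mean values are taken from the table.

```python
import math
import statistics

def mean_and_sem(samples):
    """Return the mean and the standard deviation of the mean (standard error)."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem

# Hypothetical readings in watts, for illustration only.
readings = [4.23, 4.24, 4.22, 4.23, 4.23, 4.24]
mean, sem = mean_and_sem(readings)
print(f"{mean:.4f} ± {sem:.4f} W")

# Difference between the two published mean values
saving_w = 4.7177 - 4.2309
print(f"{saving_w:.4f} W")
```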