From the Cradle to the OS 3: RISC-V Conventions

mnoureddine

Introduction

Welcome to the third installment of “From the Cradle to the OS”! This series will explore how a processor boots an operating system (OS) that can run many programs for multiple users. We will also explore how to write a simple OS that will run a few programs and let them interact. In the previous article, we explored how to set up the toolchain to compile C code into RISC-V assembly and then observed the execution of a simple program using the spike RISC-V simulator.

In this article, we will dive deeper into RISC-V assembly to understand conditional branching and unconditional jump instructions. This will set us up to discuss procedure calling and the calling convention in the following article.

Conditionals

So far, we have treated our program as a list of instructions that execute sequentially. However, writing meaningful programs requires us to take different paths (i.e., branch) in the list of instructions depending on certain register values or conditions. In other words, we’d like to be able to look at the state of our processor and then decide which instruction to execute next. Essentially, this is how the conditional statements of higher-level programming languages are implemented.

It is worth mentioning that when a program gets loaded into memory for execution, a special register called the program counter (pc) holds the address of the instruction that the processor is currently executing. In our study so far, we have assumed that pc will move from one instruction to the next sequentially. Since each instruction is 32 bits wide, this means that pc will be incremented by 4 bytes to point to the next instruction to be executed.

To support conditionals, RISC-V provides a set of conditional branching instructions that would update the value of the pc register based on a certain condition in relation to the processor’s registers. The conditional branching instructions are:

Instruction	Format	Description
`beq`	`beq rs1, rs2, TARGET`	Move `pc` to the `TARGET` code location if `rs1` is equal to `rs2`
`bne`	`bne rs1, rs2, TARGET`	Move `pc` to the `TARGET` code location if `rs1` and `rs2` are not equal
`blt`	`blt rs1, rs2, TARGET`	Move `pc` to the `TARGET` code location if `rs1` is less than `rs2` using signed comparison
`bltu`	`bltu rs1, rs2, TARGET`	Move `pc` to the `TARGET` code location if `rs1` is less than `rs2` using unsigned comparison
`bge`	`bge rs1, rs2, TARGET`	Move `pc` to the `TARGET` code location if `rs1` is greater than or equal to `rs2` using signed comparison
`bgeu`	`bgeu rs1, rs2, TARGET`	Move `pc` to the `TARGET` code location if `rs1` is greater than or equal to `rs2` using unsigned comparison
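
To make the difference between the signed and unsigned variants concrete, here is a minimal C sketch that models when each branch would be taken. The names branch_op and branch_taken are hypothetical and only serve this illustration, and we assume 64-bit registers since we are using a riscv64 toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum naming the six conditional branches from the table. */
typedef enum { BEQ, BNE, BLT, BLTU, BGE, BGEU } branch_op;

/* Returns true when the branch would be taken for the given register
 * contents. Registers are passed as raw 64-bit patterns. */
bool branch_taken(branch_op op, uint64_t rs1, uint64_t rs2) {
    switch (op) {
    case BEQ:  return rs1 == rs2;
    case BNE:  return rs1 != rs2;
    case BLT:  return (int64_t)rs1 <  (int64_t)rs2;  /* signed view   */
    case BGE:  return (int64_t)rs1 >= (int64_t)rs2;  /* signed view   */
    case BLTU: return rs1 <  rs2;                    /* unsigned view */
    case BGEU: return rs1 >= rs2;                    /* unsigned view */
    }
    return false;
}
```

With a register holding all ones (that is, -1 in the signed view) compared against 1, blt would be taken while bltu would not, because the same bit pattern is the largest possible unsigned value.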

Let’s consider a simple example of a loop in C:

// for simplicity, we will assume that each variable represents a register.
int x5 = 0, x6 = 100;
for(; x5 < 10; x5++) {
  x6 = x6 - x5;
}

Here’s our first attempt to translate this small loop into an equivalent RISC-V assembly program:

# initialize the variables
add x5, x0, x0
addi x6, x0, 100
# add a temporary register to hold the immediate 10
addi x7, x0, 10

# although we know it, we should make sure that x5 is less than 10 to enter
# the loop
bge x5, x7, LOOP_EXIT

# we add a label that represents the body of the loop
LOOP_BODY:
sub x6, x6, x5
# increment the loop iterator
addi x5, x5, 1

# check if we should continue with the loop
blt x5, x7, LOOP_BODY

LOOP_EXIT:
# code outside the loop starts here.

You can see that we have the lines LOOP_BODY and LOOP_EXIT as labels in the code. From a high-level perspective, these labels simply correspond to locations in the assembly code, which will then correspond to locations in memory where the instructions after them start. In other words, LOOP_BODY should map to the location in memory where the sub x6, x6, x5 instruction will be located, and similarly for LOOP_EXIT. In practice, these labels will be translated by the assembler into immediates (i.e., signed integers) that become an offset from the current value in the pc register. This is referred to as pc-relative addressing and is a cornerstone for supporting position-independent code. For more information on pc-relative addressing in RISC-V, please consult the RISC-V specifications, page 31 of the 20250508 Version.
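
The label-to-offset translation can be sketched in a few lines of C. This is only an illustration of the arithmetic, not how any particular assembler is written; the function names are hypothetical. The range check relies on the fact that conditional branches encode a 13-bit signed offset in multiples of 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset the assembler would store for a branch located at branch_addr
 * whose label resolves to label_addr (both are hypothetical inputs). */
int64_t pc_relative_offset(uint64_t branch_addr, uint64_t label_addr) {
    return (int64_t)(label_addr - branch_addr);
}

/* Conditional branches encode a 13-bit signed offset in multiples of 2,
 * i.e., an even value in the range [-4096, 4094]. */
bool fits_conditional_branch(int64_t offset) {
    return (offset % 2 == 0) && offset >= -4096 && offset <= 4094;
}
```

Because only the distance between the branch and its label is stored, the same encoded instruction keeps working if the whole program is loaded at a different address.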

Visually, the control flow of the loop program looks like the following:

[Figure: control-flow diagram of the loop program]

After initializing our variables, we first check to make sure that we should enter the body of the loop with the bge x5, x7, LOOP_EXIT. If the bge check fails (i.e., x5 < x7), then pc will move to the instruction after bge, which is nothing but the body of the loop. In the LOOP_BODY block, we perform the loop statements as well as increment the loop counter and then check if we should stay in the loop (blt x5, x7, LOOP_BODY) or exit to LOOP_EXIT.
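
One way to convince ourselves that this control flow matches the original C loop is to rewrite the loop in C using goto, so that each statement lines up with one assembly instruction. This is a sketch for illustration only; the function name loop_with_goto is hypothetical.

```c
/* The for loop restructured with goto so that each C statement lines up
 * with one assembly instruction. Returns the final value of x6. */
int loop_with_goto(void) {
    int x5 = 0, x6 = 100;            /* same initial values as the C loop */
    int x7 = 10;                     /* addi x7, x0, 10        */

    if (x5 >= x7) goto LOOP_EXIT;    /* bge x5, x7, LOOP_EXIT  */
LOOP_BODY:
    x6 = x6 - x5;                    /* sub x6, x6, x5         */
    x5 = x5 + 1;                     /* addi x5, x5, 1         */
    if (x5 < x7) goto LOOP_BODY;     /* blt x5, x7, LOOP_BODY  */
LOOP_EXIT:
    return x6;                       /* 100 - (0 + 1 + ... + 9) */
}
```

The body runs for x5 = 0 through 9, so the function returns 100 - 45 = 55, the same value that the for loop leaves in x6.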

Note that there are many ways to write this simple loop in RISC-V assembly, each with its own impact on performance and code size. It is usually the job of the compiler to analyze the source code and generate optimal assembly instructions that take into consideration the user’s requirements. Additionally, hardware considerations such as branch prediction and out-of-order execution can also impact the structure and the order of the assembly instructions. From an architecture point of view, the RISC-V specifications simply state:

Software should be optimized such that the sequential code path is the most common path, with less frequently taken code paths placed out of line.

Unconditional Jumps

While conditional branching instructions are great for writing loops and conditional statements, it is sometimes necessary to jump to a code location unconditionally. We can practically achieve that using a conditional branch that always evaluates to true (e.g., beq x0, x0, TARGET), but that is not enough to support programming with functions (and is also against the recommendation of the RISC-V specifications, see the specifications, page 32).

RISC-V supports two unconditional jumping instructions:

jal rd, TARGET: The jump and link instruction will always place the address of the TARGET code location into the pc register. We will defer the discussion of the “and link” portion of this instruction to the next article.
jalr rd, offset(rs1): The jump and link register instruction will always jump to the instruction at address (rs1 + offset)&~1. This simply adds an offset to the address located in the register rs1 and then zeroes out the least significant bit. This allows a wider jump range than jal but requires a bit more involvement from the programmer.
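
The address computation performed by jalr can be written down directly in C. The following is a minimal sketch of the formula (rs1 + offset)&~1, assuming 64-bit registers; the function name jalr_target is hypothetical.

```c
#include <stdint.h>

/* Target address computed by jalr rd, offset(rs1): add the signed offset
 * to the value in rs1, then clear the least significant bit. */
uint64_t jalr_target(uint64_t rs1, int64_t offset) {
    return (rs1 + (uint64_t)offset) & ~(uint64_t)1;
}
```

For example, with rs1 holding 0x10530 and an offset of 3, the sum is 0x10533 and clearing the lowest bit gives a target of 0x10532.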

To illustrate the use of unconditional jumps, let’s consider the following simple if-else C snippet:

int x5; // assume x5 is set somewhere else
int x6;

if(x5 & 1){
  x6 = 1; // x5 is odd
} else {
  x6 = 2; // x5 is even
}
x5 = 0;

This simple example checks if the contents of x5 are even or odd, sets the register x6 to 2 or 1, respectively, and then resets x5. Let’s try to write this in RISC-V assembly. Note that in this case, we cannot rely on conditional branches alone since both conditional paths will converge to the statement x5 = 0;.

# compute the condition using andi to perform bitwise and.
andi x7, x5, 1

beq x7, x0, ELSE_BODY

# this is the effective body of the if branch
addi x6, x0, 1
# done with the if branch, need to skip over the else branch
jal x0, IF_EXIT

# this is the else branch
ELSE_BODY:
addi x6, x0, 2 # the else branch will now directly go into IF_EXIT

IF_EXIT:
add x5, x0, x0

Visually, the control flow of the if-else program looks like the following:

[Figure: control-flow diagram of the if-else program]

Note that in this case, after executing the if branch of the conditional statement, we would need to skip over the else branch of the statement, so we would use an unconditional jump to transfer control to the instruction at the location IF_EXIT.

The simplest form of an unconditional jump is the infamous goto statement in C. However, unconditional jumps are the cornerstone for supporting procedure calling in RISC-V as we shall explore in the next article.
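
To see the correspondence with goto, here is a sketch of the if-else snippet rewritten in C so that each statement mirrors one instruction of the assembly version. The function name classify_with_goto is hypothetical, and x5 is passed in as a parameter since the original snippet assumes it is set elsewhere.

```c
/* The if-else snippet written with goto, mirroring the assembly version.
 * Returns the value that x6 ends up with. */
int classify_with_goto(int x5) {
    int x6;
    int x7 = x5 & 1;              /* andi x7, x5, 1        */

    if (x7 == 0) goto ELSE_BODY;  /* beq x7, x0, ELSE_BODY */
    x6 = 1;                       /* addi x6, x0, 1        */
    goto IF_EXIT;                 /* jal x0, IF_EXIT       */
ELSE_BODY:
    x6 = 2;                       /* addi x6, x0, 2        */
IF_EXIT:
    x5 = 0;                       /* add x5, x0, x0        */
    (void)x5;                     /* x5 is not used after the reset */
    return x6;
}
```

The unconditional goto IF_EXIT plays the role of jal x0, IF_EXIT: without it, the odd case would fall through into ELSE_BODY and overwrite x6 with 2.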

Practical Example

Let’s now consider a practical example using the toolchain that we have set up in the previous article. We will consider the following snippet of C code:

int x5 = 0;

do {
  x5 = x5 + 1;
}while (x5 < 10);

x5 = x5 * 4;

Let’s practice using conditional branch instructions to translate this C snippet into RISC-V assembly. Here’s one possible representation:

add x5, x0, x0
addi x6, x0, 10

LOOP:
addi x5, x5, 1
blt x5, x6, LOOP

slli x5, x5, 2

Note that in this case we are dealing with a do-while loop, so the first iteration always happens regardless of the initial value of x5. Therefore, we do not have a conditional instruction that guards the entry into the LOOP area of the code. We only branch back to the top of the loop after performing the increment.

After the loop, we have introduced a new instruction that we have not used before, slli which stands for shift left logical immediate. This instruction will shift the bits of the x5 register by two positions to the left, thus multiplying the value in x5 by 4.
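
As a quick sanity check of both observations, here is a small C sketch of the same computation. The function name dowhile_then_shift is hypothetical, and the initial value of x5 is taken as a parameter (the snippet above fixes it to 0) so that we can see the effect of the unguarded first iteration.

```c
/* The do-while loop followed by the shift, as in the assembly version.
 * The initial value of x5 is a parameter for illustration only; it is
 * assumed to be non-negative. */
int dowhile_then_shift(int x5) {
    do {
        x5 = x5 + 1;          /* addi x5, x5, 1   */
    } while (x5 < 10);        /* blt x5, x6, LOOP */
    return x5 << 2;           /* slli x5, x5, 2   */
}
```

Starting from 0, the loop stops at x5 = 10 and the shift yields 40, the same as multiplying by 4. Starting from 20, the body still runs once, so the result is 21 shifted left by two, i.e., 84.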

Compiling the Assembly

Let’s now generate an executable file from this piece of assembly code. For that, we would need to add a few instructions that will become clearer as we continue through this series. The complete source would look like the following:

# This declares that the main symbol should be exported globally.
# This is used to declare that the main function will start from
# here.
.global main

# This line is a label that indicates the start of the main function. This way,
# the runtime environment would know to execute our code
main:

# initialize registers
add x5, x0, x0
addi x6, x0, 10

LOOP:
addi x5, x5, 1
blt x5, x6, LOOP

# loop ends here
slli x5, x5, 2

# main is done here, so we need to return. We will use the ret
# pseudo-instruction.
ret

As you notice, we have added a label main to indicate the start of the main function so that the C runtime environment can figure out how to get to our code and execute it. We have also asked for the main label to be exported globally. At the end of our code, we have the ret pseudo-instruction to return from main to whichever runtime function had set things up for us. We refer to ret as a pseudo-instruction since it is not a native RISC-V instruction; the assembler replaces it with the appropriate native instructions according to the RISC-V specifications. Using ret is simply more readable than writing those native instructions directly.

Please note that this code is not correct: there are a few more things that we need to do to adhere to the RISC-V ABI. However, since we have not discussed calling conventions yet, we will leave that for the next article, where we will discuss how this code can lead to unexpected outcomes.

Let’s now turn to assembling and linking this piece of code to generate an executable that we can simulate using spike. I will assume that you have saved this source code in a file called dowhile.S. We can then use the toolchain to generate the executable as follows:

$ riscv64-unknown-linux-gnu-gcc -static -o a.out dowhile.S

This will generate the a.out executable that we can run using spike as follows:

$ spike pk ./a.out

Naturally, nothing will show up on the console since we have not called any printing routines, so the code would execute and then simply exit. To visualize the flow of the code better, we’d need to invoke the debugging feature of the spike simulator.

Debugging with spike

We will start by invoking the spike debugging utility using:

$ spike -d pk ./a.out

This will drop you into an interactive shell that starts with the (spike) prompt. You can then issue commands to read register values, advance through the code, set breakpoints, and examine memory. To check on the available commands in the spike debugger, you can issue the h or help command. Of concern to us in this case are the following commands:

reg <core> [reg]: This command will display the content of the register [reg] on the CPU core <core>. In our case, we’re only running on a single core, so the value of <core> would always be 0.
pc <core>: This command will show the current value in the pc register on the core <core>.
insn <core>: This command will show you the instruction under the pc register on core <core>.
until pc <core> <val>: This command will resume execution of the simulation until the value of pc reaches <val>.
until reg <core> <reg> <val>: This command will resume execution of the simulation until the value of the register <reg> on core <core> reaches <val>.
Hitting newline (i.e., enter on the keyboard) would execute the current instruction that pc points to, and move pc to the next instruction.

Note that you can replace until with untiln to run the spike debugger in noisy mode, i.e., with it printing each instruction that it is currently executing.

So now let’s use these commands to examine the state of the simulated processor when executing this piece of code. If you try to step through a few instructions, you will see that we do not recognize any of those! That is the runtime library taking care of setting everything up for our main function to run correctly. So now how do we know when we reach our main function so we can debug the code that is relevant for us?

Normally, in a debugger, we’d ask our tool to set a breakpoint at our main function and then let it run until it hits that point. However, spike unfortunately does not support setting breakpoints with respect to symbols, so we’ll have to find the address of main ourselves.

There are many ways to do so; we will use the combination of the RISC-V toolchain with the Unix grep utility. For that, we will use riscv64-unknown-linux-gnu-objdump to read the content of the executable and then pipe that into grep to search for the location of the main symbol.

Please note that in real examples, the address you find there would just be an offset. However, for our purposes, it will suffice to take this address at face value.

Running the combination of commands would look like the following:

$ riscv64-unknown-linux-gnu-objdump -D a.out | grep '<main>:'
0000000000010532 <main>:

This reveals the address 0x0000000000010532 for the main function and we can use that in the until pc command of the spike debugger.

$ spike -d pk ./a.out
(spike) until pc 0 0x0000000000010532
(spike) insn 0
0x00000000000002b3 add     t0, zero, zero

As you can see above, after issuing the until command, the simulation continued and then stopped when we reached the instruction add t0, zero, zero. Note that spike uses register names instead of numbers for their representation, so it shows t0 instead of x5 and zero instead of x0.
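
Since spike prints ABI names, it can be handy to translate between the two naming schemes. The table below follows the standard RISC-V register naming (x0 is zero, x5 to x7 are t0 to t2, and so on); the helper functions abi_name and reg_number are hypothetical names for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* ABI names for x0..x31, indexed by register number. */
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* Returns the ABI name for register number n, or NULL if out of range. */
const char *abi_name(int n) {
    return (n >= 0 && n < 32) ? ABI_NAMES[n] : NULL;
}

/* Returns the register number for an ABI name, or -1 if it is unknown. */
int reg_number(const char *name) {
    for (int i = 0; i < 32; i++) {
        if (strcmp(ABI_NAMES[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

With this table, the t1 register that we will see in the traces maps back to x6, the register that holds the constant 10 in our code.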

Moving on to the next instruction, we can see that we have hit our main function and we can explore our code as it executes.

(spike)
core   0: 0x0000000000010532 (0x000002b3) add     t0, zero, zero
(spike)
core   0: 0x0000000000010536 (0x00004329) c.li    t1, 10
(spike)

After executing the add instruction at the top of main, we can then see that spike has moved to the next instruction addi x6, x0, 10 or addi t1, zero, 10. However, we notice that the instruction appears differently as c.li t1, 10. The c prefix at the start of the instruction stands for the compressed version of the instruction. li stands for load immediate, which does exactly what our addi instruction is trying to do, namely put the constant 10 in the t1 or x6 register. The benefit of compressed instructions is that they allow for a shorter representation of the instructions, thus reducing our program size.

We can then enter the body of the do-while loop in our code and examine how the value of the t0 (or x5) register is being updated. Here’s a sample of doing that in the spike debugger for a few iterations.

(spike)
core   0: 0x000000000001053a (0xfe62cfe3) blt     t0, t1, pc - 2
(spike)
core   0: 0x0000000000010538 (0x00000285) c.addi  t0, 1
(spike)
core   0: 0x000000000001053a (0xfe62cfe3) blt     t0, t1, pc - 2
(spike)
core   0: 0x0000000000010538 (0x00000285) c.addi  t0, 1
(spike)
core   0: 0x000000000001053a (0xfe62cfe3) blt     t0, t1, pc - 2
(spike) reg 0 t0
0x0000000000000003

Three things we can note from this small execution trace:

The assembler has replaced the LOOP label with pc - 2, indicating that if the branch is to be taken, pc would be set to pc - 2, which corresponds to the instruction right above the blt instruction in our code sample. Based on that, can you guess how wide the representation of a compressed instruction in RISC-V is?
Each time the branch at blt is taken, the value in the pc register jumps back to where the LOOP label is in our code, i.e., at the c.addi t0, 1 compressed instruction (addi x5, x5, 1 in our original sample).
After three iterations of the loop, inspecting the content of t0 using the command reg 0 t0 shows that t0 now holds the value 0x03, as we intended.
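
If you are curious where the pc - 2 comes from, the offset can be recovered from the raw instruction word shown in the trace (0xfe62cfe3). The sketch below follows the bit layout of conditional branch instructions in the RISC-V specifications, where the offset bits are scattered across the instruction word; the function name b_type_offset is hypothetical.

```c
#include <stdint.h>

/* Extracts the signed branch offset from a 32-bit conditional branch
 * instruction (beq, bne, blt, bge, bltu, bgeu). */
int32_t b_type_offset(uint32_t insn) {
    uint32_t imm = 0;
    imm |= ((insn >> 31) & 0x1)  << 12;  /* offset bit 12    from bit 31     */
    imm |= ((insn >> 7)  & 0x1)  << 11;  /* offset bit 11    from bit 7      */
    imm |= ((insn >> 25) & 0x3f) << 5;   /* offset bits 10:5 from bits 30:25 */
    imm |= ((insn >> 8)  & 0xf)  << 1;   /* offset bits 4:1  from bits 11:8  */
    /* Bit 0 of the offset is always zero. Sign-extend from 13 bits. */
    return (imm & 0x1000) ? (int32_t)imm - 0x2000 : (int32_t)imm;
}
```

Feeding it 0xfe62cfe3 gives -2, matching what spike printed for the blt instruction.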

To continue the simulation until the end of the loop, we can issue the until command but now targeting the t0 register as follows:

(spike) until reg 0 t0 0x0a
(spike) reg 0 t0
0x000000000000000a

Note that spike only accepts values in hexadecimal, so using until reg 0 t0 10 would not deliver the intended result: 10 would be read as 0x10, which is 16 in decimal.
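
The effect is the same as parsing the text in base 16, which we can reproduce with the standard C library. This sketch only illustrates the interpretation of the digits and makes no claim about how spike parses its input internally; the function name parse_as_hex is hypothetical.

```c
#include <stdlib.h>

/* Parses text as a base-16 number; an optional "0x" prefix is accepted. */
unsigned long long parse_as_hex(const char *text) {
    return strtoull(text, NULL, 16);
}
```

Here parse_as_hex("10") yields 16 while parse_as_hex("0x0a") yields 10, which is why we wrote 0x0a in the until command.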

Finally, let’s move on with the simulation to see how the loop exits:

(spike)
core   0: 0x000000000001053a (0xfe62cfe3) blt     t0, t1, pc - 2
(spike)
core   0: 0x000000000001053e (0x0000028a) c.slli  t0, 2
(spike)
core   0: 0x0000000000010540 (0x00008082) ret

We can see now that when t0 is at the value 10, the branch is not taken and the pc moves sequentially to point to the c.slli t0, 2 instruction, as we intended in our code sample.
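
Looking back at the traces, the addresses advance by 4 bytes after some instructions and by 2 bytes after the compressed ones. The length can be read off the two lowest bits of the instruction: standard 32-bit encodings have both set, while any other value marks a 16-bit compressed instruction. Here is a minimal sketch with a hypothetical function name, which ignores the longer encodings that the specifications reserve.

```c
#include <stdint.h>

/* Length in bytes of the instruction whose lowest 16 bits are `low`.
 * Standard 32-bit encodings have their two lowest bits set to 11; any
 * other value in those bits marks a 16-bit compressed instruction.
 * Encodings longer than 32 bits are ignored in this sketch. */
unsigned insn_length(uint16_t low) {
    return (low & 0x3) == 0x3 ? 4 : 2;
}
```

Applied to the raw encodings in our traces, 0x000002b3 (add) and 0xfe62cfe3 (blt) are 4 bytes long, while 0x4329 (c.li), 0x0285 (c.addi), 0x028a (c.slli), and 0x8082 (ret) are 2 bytes long.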

Feel free to continue simulating the code sample after the ret instruction, but that is beyond the scope of this article.

Conclusion and Next Steps

In this article, we explored new RISC-V instructions that allow us to alter the control flow of a program. We introduced conditional branching instructions to alter the value of the pc register based on the state of the processor and a programmer-provided Boolean constraint. We also examined unconditional jumps that allow us to jump to different locations in the code regardless of the state of the processor. We finally made our first steps into debugging our programs using the debugging feature of the spike simulator.

In the next article, we will complete the picture by putting together our jumping instructions to allow for programming with functions. This will require us to also understand memory organization, addressing, and the memory instructions that allow for moving data between the processor registers and main memory.

Source Code and Errors

The source code that I used in this article can be found in this repo. If you find any typos or errors in this article, or if you just have some suggestions or questions, the best thing you can do is to open an issue on the same repo and I will get to it as soon as possible.

About the Author

I am an assistant professor of Computer Science and Software Engineering at the Rose-Hulman Institute of Technology. I was born in Kherbet Selem, a small village in southern Lebanon, attended college in Beirut, and then moved to the US to complete my PhD at the University of Illinois at Urbana-Champaign. My enjoyment of operating systems started early in my childhood, when formatting my tiny hard drive and reinstalling Windows 95 (multiple times a day) was the only way to get any video game to run on my aging PC; I might have enjoyed typing the fdisk command more than I did the games themselves!