Introduction
Welcome to the second part of “From the Cradle to the OS”! This series explores how a processor boots an operating system (OS) that can run many programs for multiple users. We will also explore how to write a simple OS that runs a few programs and lets them interact. In the previous article, we introduced what an ISA is and explored the exciting RISC-V open source project. We illustrated the concepts behind an ISA using a short RISC-V assembly program that adds two integers using the add and addi instructions.
In this article, we will look at how instructions are represented in binary, set up our toolchain, and then write a simple program that adds two integers and observe its behavior through the spike RISC-V simulator.
Representing Instructions
You might find yourself wondering: how does a processor understand instructions written using English characters? The simple answer is that it doesn’t. A processor speaks only one language, and that is 1s and 0s, a.k.a. binary! We must therefore translate assembly language into machine instructions that a processor can parse, understand, and execute; that is the job of an assembler.

Each ISA provides a set of rules for representing its instructions, and hardware is then built around this representation. Consulting the RISC-V specification, we see that add is classified as an R-type instruction, a 32-bit word with the following format, listed from the most significant bit to the least: f7 (bits 31-25), rs2 (bits 24-20), rs1 (bits 19-15), f3 (bits 14-12), rd (bits 11-7), and opcode (bits 6-0).

Here is the breakdown of each field (starting from right to left):
- opcode, or operation code, is a 7-bit number that identifies a class of instructions. For example, the register-to-register integer instructions such as add, sub, and and all share the opcode 0110011.
- rd is a 5-bit destination register number for the operation in question (what’s so unique about 5? Well, how many registers do we have?).
- f3 is a 3-bit function identifier (see f7 below for the full explanation).
- rs1 is the first 5-bit operand register number.
- rs2 is the second 5-bit operand register number.
- f7 is a 7-bit function identifier. The combination of <opcode, f3, f7> allows us to identify an instruction uniquely.
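To make the field layout concrete, here is a minimal sketch that splits a 32-bit R-type word back into the six fields listed above using shell arithmetic. The helper name decode_r_type is made up for this illustration, and the sample word 0x006283B3 is the encoding of add t2, t0, t1 worked out by hand from the field positions (t0, t1, and t2 are registers x5, x6, and x7).

```shell
# Split a 32-bit R-type instruction word into its six fields.
# decode_r_type is a hypothetical helper name used only in this sketch.
# Usage: decode_r_type WORD   (WORD may be decimal or 0x-prefixed hex)
decode_r_type() {
    word=$(( $1 ))
    printf 'opcode=0x%02x rd=%d f3=%d rs1=%d rs2=%d f7=%d\n' \
        $((  word        & 0x7f )) \
        $(( (word >> 7)  & 0x1f )) \
        $(( (word >> 12) & 0x07 )) \
        $(( (word >> 15) & 0x1f )) \
        $(( (word >> 20) & 0x1f )) \
        $(( (word >> 25) & 0x7f ))
}

# add t2, t0, t1
decode_r_type 0x006283B3
# prints: opcode=0x33 rd=7 f3=0 rs1=5 rs2=6 f7=0
```

Each shift moves a field down to bit 0 and each mask keeps exactly as many bits as the field is wide, which is all a hardware decoder has to do for this format.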
Knowing that add t2, t0, t1 is an R-type instruction, we can again consult the RISC-V specification to figure out how to represent it. It boils down to looking up the value of each field and placing those bits in their correct location. The registers t0, t1, and t2 are x5, x6, and x7, so the fields are f7 = 0000000, rs2 = 00110 (t1), rs1 = 00101 (t0), f3 = 000, rd = 00111 (t2), and opcode = 0110011. Concatenated, they give the hexadecimal representation 0x006283B3.
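As a cross-check, the same packing can be sketched with shell arithmetic. This is only an illustration of what an assembler does for this one format; encode_r_type is a hypothetical helper, and the register numbers assume the standard RISC-V names (t0 = x5, t1 = x6, t2 = x7).

```shell
# Pack the six R-type fields into one 32-bit word.
# encode_r_type is a hypothetical helper name used only in this sketch.
# Arguments, in left-to-right field order: f7 rs2 rs1 f3 rd opcode
encode_r_type() {
    printf '0x%08X\n' \
        $(( ($1 << 25) | ($2 << 20) | ($3 << 15) | ($4 << 12) | ($5 << 7) | $6 ))
}

# add t2, t0, t1: f7=0, rs2=t1=x6, rs1=t0=x5, f3=0, rd=t2=x7, opcode=0110011 (0x33)
encode_r_type 0 6 5 0 7 0x33
# prints: 0x006283B3
```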

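Similarly, we can do the same for the addi t0, x0, 3 instruction, which is an I-type instruction. The I-type format replaces f7 and rs2 with a single 12-bit immediate: imm (bits 31-20), rs1 (bits 19-15), f3 (bits 14-12), rd (bits 11-7), and opcode (bits 6-0). Here imm = 000000000011, rs1 = 00000 (x0), f3 = 000, rd = 00101 (t0), and opcode = 0010011, which gives 0x00300293.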
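The I-type packing can be sketched the same way. Again, encode_i_type is a hypothetical helper for illustration only; the immediate is masked to 12 bits so that a negative value ends up in its two’s complement form.

```shell
# Pack the five I-type fields into one 32-bit word.
# encode_i_type is a hypothetical helper name used only in this sketch.
# Arguments, in left-to-right field order: imm rs1 f3 rd opcode
encode_i_type() {
    printf '0x%08X\n' \
        $(( (($1 & 0xfff) << 20) | ($2 << 15) | ($3 << 12) | ($4 << 7) | $5 ))
}

# addi t0, x0, 3: imm=3, rs1=x0, f3=0, rd=t0=x5, opcode=0010011 (0x13)
encode_i_type 3 0 0 5 0x13
# prints: 0x00300293

# addi t0, x0, -1: the immediate field becomes twelve 1 bits
encode_i_type -1 0 0 5 0x13
# prints: 0xFFF00293
```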

Finally, looking at the representation of an I-type instruction, you can see where the limitation on the immediate comes from. The instruction has room for only 12 bits of immediate, which limits it to the two’s complement range from -2048 to 2047.
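The bounds of that range follow directly from the field width. Here is a small sketch that derives them and tests a few candidate immediates; fits_imm12 is a made-up helper name.

```shell
# A signed field of N bits holds values from -2^(N-1) to 2^(N-1) - 1.
bits=12
min=$(( -(1 << (bits - 1)) ))
max=$(( (1 << (bits - 1)) - 1 ))
echo "12-bit signed immediate range: $min to $max"
# prints: 12-bit signed immediate range: -2048 to 2047

# fits_imm12 is a hypothetical helper: succeeds if its argument fits in the field.
fits_imm12() { [ "$1" -ge "$min" ] && [ "$1" -le "$max" ]; }

fits_imm12 2047 && echo "2047 fits"
fits_imm12 2048 || echo "2048 does not fit"
```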
While this series is not intended to be a tutorial on writing RISC-V assembly, we will introduce relevant instructions as we dive deeper into the boot process. For now, let’s turn our attention to setting up the tools that will allow us to run such a small piece of code in the spike simulator.
Setting Up the Playground
We will start by exploring a simulated RISC-V processor and how it can run a simple C program. We will need to install three tools that we will carry throughout this series. These tools are:
- The RISC-V GNU Compiler Toolchain contains a cross-compiler for C and C++. Since our code will run on a different processor architecture than the machine we compile on, we need to cross-compile for RISC-V.
- Spike is a RISC-V processor simulator allowing us to explore RISC-V without running code on physical RISC-V hardware.
- pk, or the RISC-V proxy kernel, is a small application execution environment that allows us to run a single program on our simulated processor. Essentially, pk will replace our operating system kernel until we install one or write our own.
Installing Prerequisites
For a variety of reasons, including getting some practice with configuring and building packages, we will manually configure, build, and install each of these tools independently. Our target installation directory will be $HOME/.riscv/ where $HOME expands to your home directory. However, we will first need a number of prerequisite system-wide packages that we will install using the distribution’s package manager (apt in my case):
$ sudo apt install autoconf automake autotools-dev curl python3 python3-pip libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git cmake libglib2.0-dev libslirp-dev device-tree-compiler libboost-regex-dev libboost-system-dev
Then, let’s create the directory that will host the installed tools and add it to our path. To do so, first create the .riscv/ directory under your home directory using the following command:
$ mkdir -p $HOME/.riscv
And finally, add $HOME/.riscv/bin to your PATH environment variable so that the tools can be found later on:
$ export PATH="$HOME/.riscv/bin:$PATH"
To make this more permanent, you can add it to your shell’s run control (rc) script (e.g., .bashrc for BASH, .zshrc for ZSH, and so on).
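One way to do that without ending up with duplicate lines is to append the export only when it is missing. The sketch below assumes a helper called add_path_line (a name invented here) and leaves the actual call commented out so that you can choose the right rc file for your shell.

```shell
# Append the PATH line to an rc file only if it is not already there.
# add_path_line is a hypothetical helper name used only in this sketch.
add_path_line() {
    rc_file=$1
    # Single quotes keep $HOME and $PATH unexpanded, so the rc file
    # stores the variables rather than their current values.
    line='export PATH="$HOME/.riscv/bin:$PATH"'
    touch "$rc_file"
    grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example for BASH (use "$HOME/.zshrc" for ZSH):
# add_path_line "$HOME/.bashrc"
```

Running it twice leaves a single line in the file, since grep -x only matches when the whole line is already present.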
Compiling and Installing the Toolchain
This is by far the most time-consuming step we will encounter in this entire series (aside from compiling the Linux kernel later on), so be ready to launch this build task and do something else in the meantime.
First, grab a copy of the toolchain from its GitHub repository (repo) using:
$ git clone https://github.com/riscv-collab/riscv-gnu-toolchain
Then, navigate into the cloned directory and configure it:
$ cd riscv-gnu-toolchain/
$ ./configure --prefix=$HOME/.riscv --with-arch=rv64gc_zifencei --with-abi=lp64d
Let’s try to understand the options here:
- --prefix=$HOME/.riscv tells the configure script to use our local .riscv/ directory as the installation target (no need to install the toolchain for all users of our machine!).
- --with-arch=rv64gc_zifencei tells the configure script which base RISC-V architecture to use, along with any extensions to include. For now, we’ll take these extensions at face value and reveal their uses as they come up.
- --with-abi=lp64d specifies the application binary interface (ABI) that we will be using. For a nice discussion of ABIs, please refer to this discussion at CppCon 2020. For our purposes, this is defined as:
- l for long integers, set to 64 bits.
- p for pointers, set to 64 bits.
- d for double-precision floating point: floating-point values up to 64 bits wide are passed in floating-point registers.
Now, we are ready to compile the toolchain using make. This will take a while to complete, so it helps to know how many cores your machine has so that you can parallelize the build. Luckily, make allows us to specify the number of jobs to run in parallel using the -j switch. A common rule of thumb is to use the number of processor cores + 1, so for my little virtual server with 16 cores, I would use 17, but please adjust that to the capabilities of your machine.
$ make -j 17 linux
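If you would rather compute the job count than hard-code it, here is a small sketch of the cores + 1 rule of thumb. It assumes nproc (from GNU coreutils) is available and falls back to getconf otherwise; it only prints the make command so that you can inspect it before running it.

```shell
# Count the available cores; nproc is part of GNU coreutils, so fall back
# to getconf on systems that do not ship it.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Rule of thumb from above: one more job than there are cores.
jobs=$(( cores + 1 ))

echo "make -j $jobs linux"
```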
Note that we use the linux target since we are attempting to run this as a general purpose RISC-V processor. Alternatively, we could target newlib which is intended for embedded systems.
Once the installation is complete, you will notice a few new directories under $HOME/.riscv/ and you can now use several tools from the toolchain, most notably riscv64-unknown-linux-gnu-gcc, which is our cross-compiler for RISC-V.
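A quick way to confirm that the shell can actually find the new tools is to look them up on the PATH. This is a minimal sketch; have_tool is a made-up helper, and the two tool names are the ones the Linux target of the toolchain is expected to install.

```shell
# have_tool is a hypothetical helper: succeeds if a command is on the PATH.
have_tool() { command -v "$1" >/dev/null 2>&1; }

for tool in riscv64-unknown-linux-gnu-gcc riscv64-unknown-linux-gnu-objdump; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool (is \$HOME/.riscv/bin on your PATH?)"
    fi
done
```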
Compiling and Installing pk
Next, let’s install the proxy kernel (or pk) to be able to run our programs. Thankfully, this should be much faster than compiling the toolchain. Start by obtaining the source code for pk using:
$ git clone https://github.com/riscv-software-src/riscv-pk.git
Then, configure pk by setting $HOME/.riscv as the target prefix using:
$ cd riscv-pk
$ export RISCV=$HOME/.riscv
$ mkdir build
$ cd build
$ ../configure --prefix=$RISCV --host=riscv64-unknown-linux-gnu
Please note that passing the --host configuration option is necessary to tell pk that we will be using the Linux version of the RISC-V cross-compiler.
Finally, let’s compile and install pk using:
$ make -j17 && make install
Compiling and Installing spike
The final step in this process is to build and install the spike simulator, following a similar process to that of pk.
First, obtain the source code from GitHub using:
$ git clone https://github.com/riscv-software-src/riscv-isa-sim.git
Then, configure the build and specify that we are using the Linux cross-compiler:
$ cd riscv-isa-sim
$ export RISCV=$HOME/.riscv
$ mkdir build
$ cd build
$ ../configure --prefix=$RISCV --with-target=riscv64-unknown-linux-gnu
And finally compile and install spike using:
$ make -j 17 && make install
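Before moving on, it can be worth checking that both installs landed where we expect. The sketch below is an assumption-laden sanity check: the two relative paths are my understanding of where make install places spike and pk for the prefix and target used above, so adjust them if your layout differs. check_install is a made-up helper name.

```shell
# Report which of the expected files exist under an install prefix.
# check_install is a hypothetical helper; the relative paths below are
# assumptions about the install layout, not something the tools guarantee.
check_install() {
    prefix=$1
    rc=0
    for rel in bin/spike riscv64-unknown-linux-gnu/bin/pk; do
        if [ -e "$prefix/$rel" ]; then
            echo "ok      $rel"
        else
            echo "missing $rel"
            rc=1
        fi
    done
    return $rc
}

check_install "${RISCV:-$HOME/.riscv}" || echo "some files were not found"
```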
At this stage, we are ready to start having some fun with the RISC-V simulator!
Running a Simple C Program
Now, we are ready to start running some programs targeting our built RISC-V environment. We will first start off with some simple C code, then move on to assembly in the next article in the series.
Here’s a simple C version of the assembly code we created in a previous section (with some printing logic). Let’s call this file add.c.
#include <stdio.h>

int
main(int argc, char **argv) {
    int x = 3, y = 5, z;

    /* Add the two integers, mirroring the add instruction from earlier. */
    z = x + y;
    printf("x + y = %d\n", z);
    return 0;
}
The first thing we need to do is compile the code using the gcc cross-compiler that we have just built. We can do so as follows:
$ riscv64-unknown-linux-gnu-gcc -static -o add add.c
You might notice that we have passed the -static flag to gcc. This tells gcc to link the program statically (as opposed to dynamically). In short, a statically linked executable has the code of all its required libraries copied into it. A dynamically linked executable, on the other hand, leaves out the code of the external libraries, which are loaded into memory at run time, when the program needs them.
We can observe this difference between statically and dynamically linked executables by comparing the size (in KB) of the executable in each case. Compiling the same add.c file into a dynamically linked executable generates a file of size 8.6 KB, while linking it statically generates an executable of size 710 KB, an increase by a factor of roughly 83!
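To reproduce the comparison on your own build, here is a small sketch that computes the size ratio of two files. size_ratio is a made-up helper, and add_dyn is a hypothetical name for a second, dynamically linked build of the same source (for example, one produced by compiling without -static); your exact sizes will depend on the toolchain version.

```shell
# Print how many times larger the first file is than the second.
# size_ratio is a hypothetical helper name used only in this sketch.
size_ratio() {
    big=$(wc -c < "$1")
    small=$(wc -c < "$2")
    awk -v b="$big" -v s="$small" 'BEGIN { printf "%.1f\n", b / s }'
}

# Usage, assuming both builds exist in the current directory:
#   size_ratio add add_dyn

# The same arithmetic with the sizes quoted above:
awk 'BEGIN { printf "%.1f\n", 710 / 8.6 }'
# prints: 82.6
```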
Executable size is of course not the only consideration when deciding between static linking and dynamic linking, but this short summary will suffice for now; we will explore linking further once we discuss linker scripts. In our case, we chose static linking simply because running the program under spike with pk requires us to do so (the best choice is the one we don’t have to make!).
Running the Program
Now that we generated our executable, it is time to invoke spike and ask it to use pk as the kernel so that it can load and run our add executable.
First, let’s verify that our generated executable is indeed what we want it to be (i.e., it targets the right architecture). One easy way to do so is to use the file command as follows:
$ file add
add: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), statically linked, for GNU/Linux 4.15.0, with debug_info, not stripped
file tells us that our executable is in the ELF format (more about this in later posts) and it targets the 64-bit RISC-V architecture with the double-float ABI, exactly as we want it to be.
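If you are curious where file gets this information, the first bytes of an ELF file already tell most of the story. The sketch below reads them with od; elf_summary is a made-up helper. It relies on three facts from the ELF format: the file starts with the bytes 0x7f 'E' 'L' 'F', the fifth byte is 1 for 32-bit and 2 for 64-bit files, and the two bytes at offset 18 hold the machine number, which is 243 for RISC-V. The sketch assumes a little-endian file, which matches the LSB in the output above.

```shell
# Summarize the class and target machine of an ELF file from its header.
# elf_summary is a hypothetical helper name used only in this sketch.
elf_summary() {
    # Read the first 20 bytes as unsigned decimals into $1..$20.
    set -- $(od -An -tu1 -N20 "$1")
    # Magic number: 0x7f 'E' 'L' 'F'
    [ "$1" = 127 ] && [ "$2" = 69 ] && [ "$3" = 76 ] && [ "$4" = 70 ] || {
        echo "not an ELF file"
        return 1
    }
    case $5 in
        1) class=32-bit ;;
        2) class=64-bit ;;
        *) class=unknown ;;
    esac
    # e_machine lives at offsets 18-19 (little-endian assumed).
    machine=$(( ${19:-0} + 256 * ${20:-0} ))
    if [ "$machine" -eq 243 ]; then arch=RISC-V; else arch="machine $machine"; fi
    echo "ELF $class $arch"
}

if [ -f add ]; then elf_summary add; fi
```

For the add executable built above, this should report a 64-bit RISC-V file, in agreement with the file command.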
Now, let’s run add through the spike simulator:
$ spike pk add
x + y = 8
Congratulations, you have just run your first C program on a simulated RISC-V processor.
Conclusion and Next Steps
In this article, we took our first plunge into RISC-V by looking at how instructions are encoded, then writing a simple program that adds two integers and observing its behavior through the spike RISC-V simulator. We have spent a significant amount of time setting up our toolchain, but now that we have that locked in, we can turn our attention to more interesting topics!
In the next article, we will remain within RISC-V assembly for some time. We will introduce a program’s memory organization, the RISC-V memory instructions, its calling conventions, and then learn how we can make calls to assembly functions from C code (and vice versa). For those of you who already know how to write assembly, you might read the next article as a quick refresher before we dive into the RISC-V privilege levels and explore the boot process.
Source Code and Errors
The source code that I used in this article can be found in this repo. If you find any typos or errors in this article, or if you just have some suggestions or questions, the best thing you can do is to open an issue on the same repo and I will get to it as soon as possible.
About the Author
I am an assistant professor of Computer Science and Software Engineering at the Rose-Hulman Institute of Technology. I was born in Kherbet Selem, a small village in southern Lebanon, attended college in Beirut, and then moved to the US to complete my PhD at the University of Illinois at Urbana-Champaign. My enjoyment of operating systems started early in my childhood, when formatting my tiny hard drive and reinstalling Windows 95 (multiple times a day) was the only way to get any video game to run on my aging PC; I might have enjoyed typing the fdisk command more than I did the games themselves!
