
Conditionals
It's hard to imagine a program that doesn't contain a conditional statement. Checking a function's input arguments to make sure it can execute safely is almost a habit. For example, the divide() function takes two arguments, divides one by the other, and returns the result. Clearly, we need to make sure that the divisor is not zero:
#include <stdexcept> // std::invalid_argument

int divide(int a, int b) {
    if (b == 0) {
        throw std::invalid_argument("The divisor is zero");
    }
    return a / b;
}
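A caller has to be prepared for the exception that the check throws. The following is a minimal sketch of one way to handle it; the helper name divide_or and its fallback parameter are hypothetical and were chosen only for illustration:

```cpp
#include <stdexcept>

// Same divide() as in the text: rejects a zero divisor.
int divide(int a, int b) {
    if (b == 0) {
        throw std::invalid_argument("The divisor is zero");
    }
    return a / b;
}

// Hypothetical helper (not part of the original text): returns
// `fallback` when divide() reports an invalid divisor.
int divide_or(int a, int b, int fallback) {
    try {
        return divide(a, b);
    } catch (const std::invalid_argument&) {
        return fallback;
    }
}
```

With this helper, divide_or(10, 2, -1) yields 5, while divide_or(10, 0, -1) yields the fallback value -1 instead of terminating the program.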
Conditionals are at the core of programming languages; after all, a program is a collection of actions and decisions. For example, the following code uses conditional statements to find the maximum value out of two input arguments:
int max(int a, int b) {
    int max;
    if (a > b) {
        // the if block
        max = a;
    } else {
        // the else block
        max = b;
    }
    return max;
}
The preceding example is oversimplified on purpose to show the if-else statement as is. However, what interests us the most is the implementation of such a conditional statement. What does the compiler generate when it encounters an if statement? The CPU executes instructions sequentially, one by one, and each instruction is a simple command that does exactly one thing. In a high-level programming language such as C++, we can write a complex expression in a single line, while each assembly instruction performs only one basic operation: move, add, subtract, and so on.
The CPU fetches the instruction from the code memory segment, decodes it to find out exactly what it should do (move data, add numbers, subtract them), and executes the command.
To run at its fastest, the CPU stores the operands and the result of the execution in storage units called registers. You can think of registers as temporary variables of the CPU. Registers are physical storage units located within the CPU, so accessing them is much faster than accessing RAM. To access registers from an assembly language program, we use their names, such as rax, rbx, rdx, and so on. In this simplified picture, CPU commands operate on registers rather than RAM cells; that's why the CPU has to copy the contents of a variable from memory into a register, execute the operation and store the result in a register, and then copy the value of the register back to the memory cell.
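The fetch, decode, and execute steps can be imitated in a few lines of C++. The following is only a toy sketch: the instruction set (Op), the Instruction layout, the four-register array, and the run() function are all invented for illustration and do not correspond to any real CPU:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy instruction set, invented for this sketch.
enum class Op { LoadImm, Add, Sub, Halt };

struct Instruction {
    Op op;
    std::size_t dst;   // destination register index (0..3)
    std::size_t src;   // source register index, used by Add and Sub
    std::int64_t imm;  // immediate value, used by LoadImm
};

// Runs a program on a toy CPU with four registers and returns register 0.
std::int64_t run(const std::vector<Instruction>& program) {
    std::array<std::int64_t, 4> regs{};  // the "registers", all zero at start
    std::size_t pc = 0;                  // program counter
    while (pc < program.size()) {
        const Instruction& ins = program[pc++];  // fetch
        switch (ins.op) {                        // decode
            case Op::LoadImm:                    // execute
                regs.at(ins.dst) = ins.imm;
                break;
            case Op::Add:
                regs.at(ins.dst) += regs.at(ins.src);
                break;
            case Op::Sub:
                regs.at(ins.dst) -= regs.at(ins.src);
                break;
            case Op::Halt:
                return regs[0];
        }
    }
    return regs[0];
}
```

A program that loads 5 into register 0, loads 7 into register 1, and then adds register 1 to register 0 leaves 12 in register 0. Every step touches only the register array, which is the point of the analogy: the operands have to be in registers before the toy CPU can work on them.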
For example, the following C++ expression takes just a single line of code:
a = b + 2 * c - 1;
It would look similar to the following assembly representation (comments are added after semicolons):
mov rax, b          ; copy the contents of "b", located in memory,
                    ; to the register rax
mov rbx, c          ; the same for "c", to be able to calculate 2 * c
imul rbx, rbx, 2    ; multiply the value of the rbx register by the
                    ; immediate value 2 (2 * c)
add rax, rbx        ; add rbx (2 * c) to rax (b) and store the sum in rax
sub rax, 1          ; subtract 1 from rax
mov a, rax          ; copy the contents of rax to "a", located in memory
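The same decomposition can be written in C++ itself, one elementary step per line. This is just a sketch: the function name evaluate_stepwise is hypothetical, and the local variables rax and rbx are ordinary C++ variables named after registers for illustration, not real CPU registers:

```cpp
// Evaluates b + 2 * c - 1 one elementary operation at a time,
// mirroring the register-level steps described in the text.
long long evaluate_stepwise(long long b, long long c) {
    long long rax = b;  // load b
    long long rbx = c;  // load c
    rbx = rbx * 2;      // 2 * c
    rax = rax + rbx;    // b + 2 * c
    rax = rax - 1;      // b + 2 * c - 1
    return rax;         // the caller stores this value in "a"
}
```

For b = 3 and c = 4, the steps produce 8, then 11, then 10, which matches evaluating 3 + 2 * 4 - 1 directly.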
A conditional statement suggests that a portion of the code should be skipped. For example, calling max(11, 22) means the if block will be omitted. To express this in assembly language, the idea of jumps is used. We compare two values and, based on the result, we jump to a specified portion of the code. We label that portion so that the set of instructions can be found. For example, to skip adding 42 to the register rbx, we can jump to the portion labeled UNANSWERED using the unconditional jump instruction jmp, as shown:
    mov rax, 2
    mov rbx, 0
    jmp UNANSWERED
    add rbx, 42     ; will be skipped
UNANSWERED:
    add rax, 1
    ; ...
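C++ has a direct counterpart to the unconditional jump: the goto statement. The following sketch mirrors the listing above; the function name skip_with_goto is hypothetical, and the variables rax and rbx are plain integers named after the registers. Returning their sum is an assumption made here only so that the effect of the skipped addition can be observed:

```cpp
// Mirrors the assembly listing: goto plays the role of jmp,
// so the addition of 42 is never executed.
int skip_with_goto() {
    int rax = 2;
    int rbx = 0;
    goto UNANSWERED;  // unconditional jump
    rbx += 42;        // skipped
UNANSWERED:
    rax += 1;
    return rax + rbx; // rbx is still 0 because the addition was skipped
}
```

The function returns 3: rax becomes 3 and rbx stays 0. Outside of illustrations like this one, structured statements such as if and while are usually preferred to goto in C++ code.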
The jmp instruction performs an unconditional jump; that means it starts the execution of the first instruction at a specified label without any condition check. The good news is that the CPU provides conditional jumps as well. The body of the max() function will translate into the following assembly code (simplified), where the jg and jle commands are interpreted as jump if greater than and jump if less than or equal, respectively (based on the results of the comparison using the cmp instruction):
    mov rax, max        ; copy "max" into the rax register
    mov rbx, a
    mov rdx, b
    cmp rbx, rdx        ; compare the values of rbx and rdx (a and b)
    jg GREATER          ; jump if rbx is greater than rdx (a > b)
    jle LESSOREQUAL     ; jump if rbx is less than or equal to rdx (a <= b)
GREATER:
    mov rax, rbx        ; max = a
    jmp DONE            ; skip the else part so it does not overwrite rax
LESSOREQUAL:
    mov rax, rdx        ; max = b
DONE:
    ; ... rax now holds the maximum
In the preceding code, the labels GREATER and LESSOREQUAL represent the if and else clauses of the max() function implemented earlier.
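To see the compare-and-jump pattern without leaving C++, the same logic can be sketched with goto statements. The function name max_with_jumps and the DONE label are assumptions introduced for this illustration. The explicit jump to DONE matters: without it, execution would fall through from the GREATER part into the LESSOREQUAL part and the result would always be b:

```cpp
// max() rewritten with explicit jumps to imitate cmp/jg/jle/jmp.
int max_with_jumps(int a, int b) {
    int result;
    if (a > b) goto GREATER;  // like cmp followed by jg
    goto LESSOREQUAL;         // like jle: taken when a <= b
GREATER:
    result = a;               // the if block
    goto DONE;                // skip the else block
LESSOREQUAL:
    result = b;               // the else block
DONE:
    return result;
}
```

Calling max_with_jumps(11, 22) takes the LESSOREQUAL path and returns 22; calling max_with_jumps(22, 11) takes the GREATER path and also returns 22.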