Understanding assembly deepens your understanding of the “stack” as a software engineer.

Unlike statements in higher-level programming languages, an assembly instruction usually maps directly to a single cpu instruction (identified by its “opcode”), so understanding assembly lets you understand what a cpu actually does.

Moving Values Between Memory and Registers

A cpu has a set of registers. These are locations where it can store some information (bits). It can then do stuff with this information (e.g. “add the two numbers stored in registerA and registerB”).

Usually, information that a cpu wants to operate on is stored in memory. Additionally, when a cpu is done operating on some information, it generally wants to store the result back in memory (so that the registers are free to operate on other information).

Thus, the cpu needs a way to 1) take register values and put them in memory and 2) take memory values and put them in registers.

; move stuff from memory to a register (comments in assembly start with a semicolon)
; square brackets mean: the value stored at this memory address
mov <destination_register>, [<source_memory_address>]

; move stuff from a register to memory
mov [<destination_memory_address>], <source_register>
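The two directions of mov can be sketched with a toy model in Python. This is only an illustration, not how a real cpu is built: the names `memory`, `registers`, `load` and `store` are made up for this sketch, and memory is just a small list of cells.

```python
# Toy model (an assumption for illustration): memory is a list of cells,
# registers are named slots in a dictionary.
memory = [0] * 16
registers = {"rax": 0, "rbx": 0}

def load(dest_register, address):
    """Like the memory-to-register mov: copy a memory cell into a register."""
    registers[dest_register] = memory[address]

def store(address, src_register):
    """Like the register-to-memory mov: copy a register into a memory cell."""
    memory[address] = registers[src_register]

memory[3] = 42       # pretend some earlier code put 42 at address 3
load("rax", 3)       # rax now holds 42
store(7, "rax")      # address 7 now holds 42 as well
print(registers["rax"], memory[7])  # prints "42 42"
```

Note that a mov copies: after the load, address 3 still holds 42.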

Doing Arithmetic

A cpu also needs to do arithmetic on the values in the registers.

; add the contents of two registers (result is stored in the first register)
add <register1>, <register2>

; subtract register2 from register1 (again, the result is stored in the first, but even
; if it was stored in the second, not a big deal, you just have to know where it's stored)
sub <register1>, <register2>

; similar instructions exist for multiplication and division

; the key thing here is: you can do arithmetic on registers. The results are usually stored 
; in one of the two operand registers or some other register, the only thing that matters is 
; that you know where your cpu is putting the results
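Here is a minimal Python sketch of that behavior, assuming 64 bit registers whose results wrap around when they no longer fit. The `registers` dictionary and the `add`/`sub` helpers are hypothetical names for this illustration only.

```python
MASK64 = (1 << 64) - 1  # assumption: registers are 64 bits wide, so results wrap

registers = {"rax": 5, "rbx": 7}

def add(reg1, reg2):
    """reg1 = reg1 + reg2 (the result overwrites the first operand)."""
    registers[reg1] = (registers[reg1] + registers[reg2]) & MASK64

def sub(reg1, reg2):
    """reg1 = reg1 - reg2 (the result overwrites the first operand)."""
    registers[reg1] = (registers[reg1] - registers[reg2]) & MASK64

add("rax", "rbx")   # rax = 5 + 7 = 12, rbx is unchanged
sub("rax", "rbx")   # rax = 12 - 7 = 5
sub("rax", "rbx")   # rax = 5 - 7, which wraps around to 2**64 - 2
print(registers["rax"] == 2**64 - 2)  # prints "True"
```

The last line shows why register width matters: a register cannot hold -2 as such, only a 64 bit pattern, and it is up to the program to interpret that pattern as signed or unsigned.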

Registers

Different cpus can have different sized registers. Cpus often adhere to an “instruction set architecture” (ISA), which standardizes things like the instructions and registers that programs can rely on.

For example, all cpus that adhere to the x64 instruction set have the following registers (among others):

  • register “rax” (64 bits wide)
  • register “rbx” (64 bits wide)
  • register “rcx” (64 bits wide)

All cpus that adhere to the x86 ISA have the following registers (among others):

  • register “eax” (32 bits wide)
  • register “ebx” (32 bits wide)
  • register “ecx” (32 bits wide)

The x64 ISA is backwards compatible with x86 (it is a superset), so it still allows programs to reference the x86 registers. If a program references the “eax” register, the x64 cpu just uses the lower 32 bits of its rax register. In other words, whenever a program references a 32 bit register, the x64 cpu uses the lower 32 bits of the corresponding 64 bit register. (One detail: in 64 bit code, writing to a 32 bit register such as eax also clears the upper 32 bits of rax.) That is how the x64 ISA maintains backwards compatibility with the x86 ISA.
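The relationship between eax and rax is just bit masking, which a few lines of Python can show. The helper names `read_eax` and `write_eax` are invented for this sketch, and the zeroing of the upper half on a write reflects how x64 behaves in 64 bit code.

```python
MASK32 = 0xFFFFFFFF  # the lower 32 bits of a 64 bit value

def read_eax(rax):
    """Reading eax gives the lower 32 bits of rax."""
    return rax & MASK32

def write_eax(value):
    """Writing eax (in 64 bit code) leaves rax with zeroed upper 32 bits."""
    return value & MASK32

rax = 0x1122334455667788
print(hex(read_eax(rax)))           # prints "0x55667788"

rax = write_eax(0xDEADBEEF)
print(hex(rax))                     # prints "0xdeadbeef" (upper 32 bits are 0)
```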

Compare Instruction

The compare instruction of a cpu allows you to…well, compare two values. The result of the comparison is usually stored in a third, special-purpose register (on x86, the flags register). The result encodes how the two values relate. For example, a cpu could store 0 if the 2 values are the same and 1 if the first value is greater. That is just one convention a cpu could use; x86 instead sets individual flag bits, such as the zero flag when the two values are equal.

cmp <register1>, <register2>
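As a rough sketch of the flag-based approach: x86 compares by subtracting the second value from the first, throwing the difference away, and keeping only a few facts about it. The Python below models two such facts. It is a simplification (a real cpu works on fixed-width values and also tracks carry and overflow), and the function names are made up for this example.

```python
def cmp(a, b):
    """Compare by subtracting; keep only facts about the difference."""
    difference = a - b
    return {"zero": difference == 0, "negative": difference < 0}

def first_is_greater(flags):
    """True if the comparison found the first value to be the greater one."""
    return not flags["zero"] and not flags["negative"]

print(cmp(5, 5))                     # prints "{'zero': True, 'negative': False}"
print(first_is_greater(cmp(9, 2)))   # prints "True"
print(first_is_greater(cmp(2, 9)))   # prints "False"
```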

Jump Instruction

The jump instruction of a cpu allows you to start executing instructions at a different address. Note that all your instructions are loaded somewhere in the address space of your process. You can use the jump instruction to continue execution at any address in your process that houses a cpu instruction.

; jump to the address specified in a particular register
jmp <register>

The conditional jump instruction allows you to jump to a different address only if a certain comparison is true. For example, “jump to spotA if register1 is greater than register2”. This allows you to implement if statements and loops.

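; jump to the target address if register 2 is greater than register 3
; note: this means you first compared register 2 and register 3 with cmp!
; (on x86, the target of a conditional jump is a label/address, not a register)
cmp <register2>, <register3>
jg <target_address>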

Instruction Pointer

The “instruction pointer” register of cpus houses the address of the next instruction to execute. It’s kind of like a bookmark of where the cpu is at in the code (binary code).

Some instructions, like the jump instruction, change the value of the instruction pointer.
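To see compare, conditional jump and the instruction pointer work together, here is a tiny interpreter sketched in Python. Everything about it is an assumption made for illustration: the instruction names “set” and “halt”, the tuple encoding, and the fact that jump targets are list indexes rather than real addresses. The program it runs is a loop that adds 5 + 4 + 3 + 2 + 1.

```python
def run(program):
    """Execute a list of toy instructions and return the final registers."""
    registers = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0}
    greater = False   # result of the most recent cmp
    ip = 0            # instruction pointer: index of the next instruction
    while True:
        op, *args = program[ip]
        ip += 1       # by default, continue with the following instruction
        if op == "set":
            registers[args[0]] = args[1]
        elif op == "add":
            registers[args[0]] += registers[args[1]]
        elif op == "sub":
            registers[args[0]] -= registers[args[1]]
        elif op == "cmp":
            greater = registers[args[0]] > registers[args[1]]
        elif op == "jmp":
            ip = args[0]              # unconditional: overwrite ip
        elif op == "jg":
            if greater:
                ip = args[0]          # conditional: overwrite ip only if greater
        elif op == "halt":
            return registers

program = [
    ("set", "rax", 0),      # 0: running total
    ("set", "rbx", 1),      # 1: amount to count down by
    ("set", "rcx", 5),      # 2: loop counter
    ("set", "rdx", 0),      # 3: value to compare the counter against
    ("add", "rax", "rcx"),  # 4: total += counter   (loop body starts here)
    ("sub", "rcx", "rbx"),  # 5: counter -= 1
    ("cmp", "rcx", "rdx"),  # 6: is counter > 0?
    ("jg", 4),              # 7: if so, go back to instruction 4
    ("halt",),              # 8: otherwise fall through and stop
]

print(run(program)["rax"])  # prints "15"
```

The only thing that makes this a loop is instruction 7 writing a smaller value into the instruction pointer.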

One Routine Calling Another

You can group a set of instructions and call it a “routine”. A routine can take arguments and provide a return value. A routine may in turn call other routines.

As one routine calls another, which then calls another, which might call yet another, you need to keep track of the arguments and return values of each. This is done with the help of a stack, which is located in memory.

The “esp” register always points to the top of the stack. The “ebp” register points to the area of the stack where the arguments/locals of the currently executing routine start.

When the currently executing routine is done, the esp register is set back to where the ebp register was pointing, and the ebp register is restored so that it points to the area of the stack where the args/locals of the calling routine start.

This is what allows a routine to call another, which calls another, which calls another, etc.: every time a routine returns, the ebp register once again points to the area of the stack that houses the args/locals of the calling routine (the routine that called the returning routine).
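The bookkeeping with esp and ebp can be sketched in Python. This is a toy model with several assumptions: memory is a list of word-sized cells (so addresses move in steps of 1 instead of 4), the stack grows toward lower addresses as it does on x86, and `enter_routine`/`leave_routine` are invented names for what a routine typically does at its start and end.

```python
memory = [0] * 32
regs = {"esp": 28, "ebp": 30}   # pretend the caller's frame starts at address 30

def push(value):
    regs["esp"] -= 1                 # the stack grows toward lower addresses
    memory[regs["esp"]] = value

def pop():
    value = memory[regs["esp"]]
    regs["esp"] += 1
    return value

def enter_routine(num_locals):
    push(regs["ebp"])                # remember where the caller's frame starts
    regs["ebp"] = regs["esp"]        # the new frame starts here
    regs["esp"] -= num_locals        # reserve room for this routine's locals

def leave_routine():
    regs["esp"] = regs["ebp"]        # drop the locals: esp goes back to ebp
    regs["ebp"] = pop()              # ebp points at the caller's frame again

before = dict(regs)
enter_routine(3)
print(regs)                          # prints "{'esp': 24, 'ebp': 27}"
leave_routine()
print(regs == before)                # prints "True"
```

Because each routine saves the caller's ebp on the stack before setting its own, the same two steps work no matter how deeply the calls are nested.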

In order for one routine to call another, both routines have to agree on some conventions. Where are the arguments of the callee stored? On the stack? In registers? Which registers? Where is the return address stored, i.e. the address that the callee is supposed to return to? In other words, when the callee is done, what should the value of the instruction pointer be? Obviously this needs to be the address in the caller right after the point where it called the callee, but where was this address stored? If it was stored in some agreed-upon place (a register or a spot on the stack), the callee can look there and set the instruction pointer accordingly once it's done.

The set of conventions that a caller and a callee routine agree to is called a “calling convention”. One example of a calling convention is the cdecl calling convention, which states:

  • the caller pushes the arguments onto the stack (in right-to-left order)
  • the caller pushes the return address onto the stack (the call instruction does this)
  • after the callee returns, the caller removes the arguments from the stack

Check out cdecl on Wikipedia. It has a great example!

The following instructions are relevant for routines calling other routines:

  • push/pop (pushes a value onto the stack/pops a value off the stack)
  • call (pushes the address of the next instruction onto the stack, then jumps to a new address)
  • ret (pops from the stack, sets IP to the popped value…this is how a routine “returns”)
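To make push, call and ret concrete, here is a cdecl-style call traced by hand in Python. It is a sketch under stated assumptions: memory cells are word-sized (addresses move in steps of 1), the addresses 13 and 100 are made up, and the callee is a hypothetical routine that adds its two arguments and leaves the result in eax, which is where cdecl puts integer return values.

```python
memory = [0] * 32
regs = {"esp": 32, "eax": 0, "ip": 0}

def push(value):
    regs["esp"] -= 1
    memory[regs["esp"]] = value

def pop():
    value = memory[regs["esp"]]
    regs["esp"] += 1
    return value

def call(target, return_address):
    push(return_address)     # remember where to continue in the caller
    regs["ip"] = target      # then jump to the callee

def ret():
    regs["ip"] = pop()       # jump back to the remembered address

ADD_ROUTINE = 100            # hypothetical address of the callee

# --- caller ---
push(4)                      # second argument (pushed first: right to left)
push(3)                      # first argument
call(ADD_ROUTINE, return_address=13)

# --- callee: the stack now holds return address, first arg, second arg ---
regs["eax"] = memory[regs["esp"] + 1] + memory[regs["esp"] + 2]
ret()

# --- back in the caller ---
regs["esp"] += 2             # the caller removes its two arguments
print(regs)                  # prints "{'esp': 32, 'eax': 7, 'ip': 13}"
```

After the cleanup, esp is back where it started, so the caller's own stack contents are untouched by the call.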

System Calls

Making a system call is a bit different than calling another routine that is a part of your program. You make a system call when you need to access resources (disk, network, etc) that you can only access by asking the operating system.

For example, let’s say you need to read 1000 bytes of an opened file called hamburgers.txt. Reading a file means accessing the disk, so you need to ask the operating system to do the read for you. In other words, you need to make a system call.

First, you look up the system call you need to make. There is a system call for reading a file. The documentation for this system call will tell you which registers to put your arguments in (which file you want to read, the offset you want to read from, how much you want to read, where in your address space the read bytes should go, etc.) and in which register you can expect the result (for a read, typically the number of bytes that were actually read).

So you put the arguments of the system call in the proper registers, then you execute a trap instruction. When a trap instruction is executed, the cpu goes into privileged mode and looks into the “interrupt vector table” to find the operating system code that handles traps. That code looks up your system call number and decides whether to do the system call or not. If it does, it will execute your read, then store the results in some area of your address space. It will then put the cpu back into non-privileged mode and set the instruction pointer to the instruction right after the trap instruction.

So you basically make a request, then the cpu goes into privileged mode, and the operating system looks at your request and decides whether it wants to do it or not. If it does, it puts your results somewhere, puts the cpu back in non-privileged mode, and only then sets the instruction pointer to the instruction right after the trap instruction.
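You rarely execute the trap instruction by hand; languages wrap it for you. As a small illustration, Python's `os.open`, `os.read` and `os.close` are thin wrappers around the corresponding system calls, so the hamburgers.txt read from above can be sketched like this. The file contents and the temporary directory are made up so that the example is self-contained.

```python
import os
import tempfile

# Set up a hypothetical hamburgers.txt with 1500 bytes in it.
directory = tempfile.mkdtemp()
path = os.path.join(directory, "hamburgers.txt")
with open(path, "wb") as f:
    f.write(b"x" * 1500)

fd = os.open(path, os.O_RDONLY)  # system call: ask the OS to open the file
data = os.read(fd, 1000)         # system call: ask the OS for up to 1000 bytes
os.close(fd)                     # system call: tell the OS we are done

print(len(data))                 # how many bytes the OS actually handed back

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(directory)
```

Each of those three calls ends with the arguments placed in registers and a trap into the operating system, exactly as described above; the wrapper just hides it.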