A Beginner's Guide to Shellcode

One of my favorite quotes of all time comes from Jon Erickson's Hacking: The Art of Exploitation:

"Shellcode is injected into a running program, where it takes over like a biological virus inside a cell."

I studied molecular biology in college, and this description is fairly accurate. DNA is the code of life, and a virus is nothing more than a hacker using a living being's operating system to replicate its own code. Many other concepts from information security and molecular biology overlap significantly because, at the end of the day, our bodies are deterministic self-replicating computers.

What exactly is shellcode? Well, it's a small piece of machine code, usually written in assembly language, that serves as the payload of a software exploit, often one that achieves Remote Code Execution, or RCE for short. Assembly language is the lowest level of code that a programmer can write without actually writing out binary. When we use assembly, we're telling the CPU what to do one instruction at a time.

A lot goes into writing shellcode exploits, like triggering a stack overflow, removing null bytes, and slimming down the instructions so that the payload isn't so large as to draw unwanted attention. But before we do anything, we need to understand basic assembly language.
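To give a taste of the null-byte problem: shellcode often gets delivered through string functions that stop copying at the first zero byte, so any 0x00 inside the payload cuts it short. The sketch below loads the value 1 into a register in two ways, one whose encoding contains zero bytes and one whose encoding doesn't. The file name and labels are placeholders of mine, and the registers and the int 0x80 instruction are explained further down, so treat this as a preview rather than something to memorize.

```nasm
; nullbytes.asm: two ways to put the value 1 into RAX (illustrative sketch)
section .text
global _start

_start:
with_nulls:
    mov eax, 1      ; encodes as B8 01 00 00 00, three zero bytes in the immediate

without_nulls:
    xor eax, eax    ; encodes as 31 C0, clears the register without any zero bytes
    mov al, 1       ; encodes as B0 01, sets only the low byte

    xor ebx, ebx    ; encodes as 31 DB, exit status 0
    int 0x80        ; encodes as CD 80, RAX is 1, which is exit in the int 0x80 table
```

Running objdump -d on the linked binary prints the bytes next to each instruction, which is an easy way to check a payload for zeros.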

There are a variety of computer processors, and the assembly language will be different depending on which one you're using. In this tutorial, we're going to write some vanilla 64-bit x86 assembly to familiarize ourselves with registers and other basic concepts. I'll be using a 64-bit Ubuntu system running on AWS Lightsail, so don't be surprised if you use a different system and get different results. If you're running a 32-bit system, either get an upgrade or change all of the R prefixes in my registers to E.

As with every programming tutorial on the internet, and by sacred tradition, we'll start by writing something that prints "Hello, World!". As in every other programming language, there are functions that we can use to do stuff. The ones provided by the operating system are called system calls, or syscalls, because they are passed down to the kernel for execution.

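Each syscall has a convenient name like exit, but we all know that computers talk in numbers. We can't just tell the kernel "Hey, call exit, and be snappy about it because the weekend's almost over!" No, instead each syscall has a corresponding magic number that we load into the RAX register before triggering the call with the int 0x80 instruction. Note that int 0x80 is the legacy 32-bit entry point into the Linux kernel. It still works on typical 64-bit Linux systems, but native 64-bit code normally uses the syscall instruction, which has its own set of numbers. More on all this later.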
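As a minimal sketch of that idea, the program below does nothing but call exit. It assumes a Linux system where the int 0x80 interface is available and uses 1, the exit number from that table. The file name and the status value 7 are arbitrary choices of mine, and the RBX argument register is covered a little further down.

```nasm
; exit_demo.asm: load a syscall number into RAX and hand it to the kernel
section .text
global _start

_start:
    mov rax, 1      ; 1 is exit in the int 0x80 syscall table
    mov rbx, 7      ; first argument: the exit status (arbitrary value for the demo)
    int 0x80        ; trap into the kernel, which runs exit(7)
```

After assembling and linking it the same way as the Hello World program later on, running it followed by echo $? should show the status that was placed in RBX.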

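Syscall numbers depend on the operating system, the architecture, and the calling interface, so you'll have to look them up for your own setup. On Linux, the numbers for the int 0x80 interface are listed in the kernel header asm/unistd_32.h. If we want to know more about a specific syscall, for example the arguments exit takes, we can look it up in section 2 of the man pages via its human-readable name.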

Earlier we talked about loading the syscall number into the RAX register, but what exactly is a register? Registers are hardware variables: small storage slots inside the CPU that simply hold information. The only difference is that, unlike variables in traditional programming languages, we don't get to define the names of registers.

The four classic general-purpose registers are RAX, RBX, RCX and RDX (x86-64 has more, but these will do for now). They can hold whatever data the CPU needs them to, but when we make a system call through int 0x80, RAX holds the syscall number and the rest hold the arguments in alphabetical order: RBX stores the first argument, RCX the second, and RDX the third.

Aside from the general-purpose registers there are some special registers. RSP, the stack pointer, and RBP, the base pointer, are both pointers, i.e. variables that store a memory address. The stack is an abstract structure in memory that stores the variables for each of our function calls. RSP marks the top of the stack, and RBP stores the memory address of the base of the current function's stack frame. We don't need to worry about the stack too much for now; it's enough to glance over it here, and we'll explore it in another tutorial.
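For the curious, here is a tiny sketch of RSP at work, assuming the same int 0x80 exit call as before (exit is number 1 in that table). The file name and the value 42 are placeholders of mine. It pushes a value onto the stack, pops it back into a register, and exits with it.

```nasm
; stack_demo.asm: push a value, pop it back, and use it as the exit status
section .text
global _start

_start:
    push 42         ; RSP moves down by 8 bytes and 42 is stored at the new top
    pop rbx         ; the top value is copied into RBX and RSP moves back up
    mov rax, 1      ; exit in the int 0x80 table
    int 0x80        ; the exit status is whatever RBX holds
```

Running it and then echo $? should report 42, the value that made the round trip through the stack.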

The very last thing we need to know is that registers can be accessed at various widths. The R prefix means the full 64 bits, and the E prefix means the lower 32 bits of the same register. With no prefix, for example AX, we get the lower 16 bits. Lastly, those 16 bits split into a high half and a low half of 8 bits each, abbreviated with an H and an L, so AH and AL. It is probably easier to visualize register widths, so here is a picture from Wikipedia's list of x86 registers:

x86 general purpose registers
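The same idea as a rough sketch in code: one 64-bit value is loaded into RAX, and the comments show what each narrower name of the register sees. The constant and the file name are arbitrary choices of mine, and the program exits through int 0x80 (exit is number 1 in that table) with the AH byte as its status so there is something to observe.

```nasm
; widths_demo.asm: one register seen at several widths
section .text
global _start

_start:
    mov rax, 0x1122334455667788 ; fill all 64 bits of RAX
    ; EAX now reads 0x55667788 (lower 32 bits)
    ; AX  now reads 0x7788     (lower 16 bits)
    ; AH  now reads 0x77       (bits 8 to 15)
    ; AL  now reads 0x88       (bits 0 to 7)

    movzx ebx, ah   ; copy AH into EBX, zero-extended, so RBX = 0x77
    mov eax, 1      ; exit in the int 0x80 table (writing EAX also clears the upper half of RAX)
    int 0x80        ; exit status is 0x77, which is 119 in decimal
```

Running it and then echo $? should print 119.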

Now it's finally time to write some assembly!

We're going to use the write syscall, so now it's time to figure out what arguments it takes by running man 2 write or checking out the man7.org pages:

#include <unistd.h>
ssize_t write(int fd, const void buf[.count], size_t count);

As you can see, write takes 3 arguments: a file descriptor to write to, a buffer holding the bytes to print, and the number of bytes to write from that buffer. Here is our first basic assembly program, which technically isn't shellcode quite yet, but we're getting there slowly but surely.

section .data
; Tell the assembler that we want this to be the data section

txt db "Hello, World!", 0x0a ; create txt variable within data, 0x0a is a newline

section .text ; start to define the code section
global _start ; entry point for ELF linking. ELF is simply a format for executable files

_start:
; SYSCALL WRITE(1,txt,14) for more on write see 'man 2 write'
    mov rax, 4 ; 4 is the write syscall number in the int 0x80 table
    mov rbx, 1 ; 1 is the file descriptor for STDOUT
    mov rcx, txt ; address of the message we want to print
    mov rdx, 14 ; and 14 is the length of the message in bytes
    int 0x80 ; send syscall to the kernel

; SYSCALL EXIT(exit_code)
    mov rax, 1 ; exit has a syscall number of 1 in the int 0x80 table
    mov rbx, 0 ; 0 is the exit code for success
    int 0x80 ; send the call to the kernel, congrats you're a hacker....

Assemble the program, link it and lastly run it:

# -f elf64 specifies the output format, in this case a 64-bit
# ELF (Executable and Linkable Format) object file
# hello_world.o is that object file, we link it with ld into the final binary
$ nasm -f elf64 hello_world.asm && ld hello_world.o -o hello_world

$ ls
hello_world.asm  hello_world.o hello_world

$ ./hello_world
Hello, World!
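For comparison, here is a sketch of the same program using the native 64-bit syscall instruction instead of int 0x80. This is a variant, not the version used above. On x86-64 Linux the syscall numbers are different (write is 1 and exit is 60), and the arguments go in RDI, RSI and RDX rather than RBX, RCX and RDX. The file name and the len constant are names I picked for illustration; len lets the assembler work out the message length instead of hardcoding 14.

```nasm
; hello_syscall.asm: Hello, World! through the native 64-bit syscall instruction
section .data
    txt db "Hello, World!", 0x0a
    len equ $ - txt ; current position minus start of txt = 14 bytes

section .text
global _start

_start:
    mov rax, 1      ; write is 1 in the x86-64 syscall table
    mov rdi, 1      ; first argument: file descriptor 1, STDOUT
    mov rsi, txt    ; second argument: address of the message
    mov rdx, len    ; third argument: number of bytes to write
    syscall         ; native 64-bit entry into the kernel

    mov rax, 60     ; exit is 60 in the x86-64 syscall table
    xor rdi, rdi    ; first argument: exit status 0
    syscall
```

It assembles and links with the same nasm and ld commands as before, with the file names swapped.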
