Writing a Web Server in ARM64 Assembly

Introduction

In this post, we are going to learn ARM64 assembly the hard way: by writing a basic web server in assembly for Linux systems. If this seems long or intimidating, don’t worry, we’ll start with the basics and work our way up to the web server, explaining every step.

Syscalls

Syscalls are how user-space programs interact with the kernel. They are the interface between your application and the operating system, allowing you to perform operations like reading and writing files, creating processes, and interacting with hardware.

Application --(syscall)--> Kernel --(return)--> Application

For example, to print something to the console without using printf, you use the write syscall:

write(fd, buffer, size);
  • fd: file descriptor (1 for stdout)
  • buffer: the string to print
  • size: length of the string

You can find syscall arguments in the Linux manual. For example, run man 2 write to see the arguments for the write syscall.

Other syscalls essential for network programming (and for our web server) include socket, bind, listen, accept, and send.

Hello World in C Using Syscalls

Most beginners use printf in C:

#include <stdio.h>
int main() {
    printf("Hello, World!\n");
    return 0;
}

But printf formats and buffers its output and ultimately hands the bytes to the write syscall. You can use write directly:

#include <unistd.h>
int main() {
    const char *message = "Hello, World!\n";
    write(1, message, 14); // 1 is stdout
    return 0;
}

Here, 1 is the file descriptor for stdout, message is the string, and 14 is the length (including the newline).
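Going one level lower, glibc also exposes a generic syscall() function that takes the syscall number followed by its arguments. That mirrors what the assembly version does with w8 and svc #0. This is a sketch assuming glibc on Linux (syscall() needs _GNU_SOURCE); raw_puts is a hypothetical helper name, not a libc function.

```c
#define _GNU_SOURCE         /* makes glibc declare syscall() in <unistd.h> */
#include <string.h>         /* strlen */
#include <sys/syscall.h>    /* SYS_write */
#include <unistd.h>         /* syscall */

/* Hypothetical helper: write a NUL-terminated string to fd by invoking
   the write syscall directly instead of calling libc's write().
   Returns the number of bytes written, or -1 with errno set on failure. */
static long raw_puts(int fd, const char *msg) {
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_puts(1, "Hello, World!\n") from main prints the greeting. SYS_write expands to the number for the architecture you compile on (64 on ARM64).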

ARM Architecture Basics

Before writing assembly, you need to understand ARM64 basics. ARM64 is a 64-bit architecture, so it has 64-bit wide registers and can address more memory than 32-bit architectures.

Registers

There are 31 general-purpose registers: x0 to x30.

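  • x0 to x7: used for passing arguments and returning values
  • x8: indirect result register; on Linux it also holds the syscall number
  • x9 to x15: caller-saved temporaries
  • x16 and x17: intra-procedure-call scratch registers (IP0, IP1)
  • x18: platform register (reserved on some operating systems)
  • x19 to x28: callee-saved (persistent)
  • x29: frame pointer (FP)
  • x30: link register (LR)
  • xzr: zero register (always reads as 0; writes are discarded)
  • sp: stack pointer (a separate register, not one of x0 to x30)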

The lower 32 bits of each register are accessible as w0 to w30. Writing to a w register zeroes the upper 32 bits of the corresponding x register.
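The 32-bit view follows two rules, modeled below in C as a sketch: reading wN gives the low half of xN, and writing wN replaces all of xN with the zero-extended 32-bit value. The function names are illustrative only.

```c
#include <stdint.h>

/* Reading wN: the low 32 bits of xN. */
static uint32_t read_w(uint64_t x) {
    return (uint32_t)x;
}

/* Writing wN: the 32-bit value is zero-extended, so the old upper half
   of xN is discarded rather than preserved. */
static uint64_t write_w(uint64_t old_x, uint32_t value) {
    (void)old_x;
    return (uint64_t)value;
}
```

This is why mov w8, #64 is enough to set the whole of x8 before a syscall.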

Memory and Pages

Linux uses virtual memory. Each process has its own virtual address space, and the kernel maps virtual to physical addresses. Memory is divided into pages (usually 4KB). You never deal with physical memory directly.
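A small C sketch of the same idea: sysconf reports the page size the running kernel uses, and because page sizes are powers of two, any address splits into a page base and an offset with simple masks. page_base and page_offset are illustrative names.

```c
#include <stdint.h>
#include <unistd.h>

/* Page size used by the running kernel (commonly 4096; some ARM64
   kernels are built with 16 KB or 64 KB pages). */
static long page_size(void) {
    return sysconf(_SC_PAGESIZE);
}

/* Start of the page containing addr; page_sz must be a power of two. */
static uintptr_t page_base(uintptr_t addr, uintptr_t page_sz) {
    return addr & ~(page_sz - 1);
}

/* Position of addr inside its page. */
static uintptr_t page_offset(uintptr_t addr, uintptr_t page_sz) {
    return addr & (page_sz - 1);
}
```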

Using the Stack

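The stack is used for local variables and passing arguments. It grows downward: reserving space means subtracting from sp (the stack pointer), which points to the top of the stack, i.e. its lowest address in use. You use the stack to save register values, store local variables, and keep return addresses. On ARM64, sp must be 16-byte aligned whenever it is used to access memory, so stack adjustments are made in multiples of 16.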
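On ARM64, sp must stay 16-byte aligned when it is used to access memory, so frame sizes are rounded up to a multiple of 16 before being subtracted from sp. A minimal C sketch of that rounding (align_up is an illustrative name):

```c
#include <stddef.h>

/* Round n up to the next multiple of align; align must be a power of two.
   With align = 16 this gives a stack adjustment that keeps sp aligned. */
static size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```

For example, 20 bytes of locals would become sub sp, sp, #32, since align_up(20, 16) is 32.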

Register Conventions and Common Instructions

Register(s)   Purpose
x0–x7         Arguments/Results
x8            Syscall number (Linux)
x9–x15        Caller-saved (temporary)
x19–x28       Callee-saved (persistent)
x29           Frame Pointer (FP)
x30           Link Register (LR)
xzr           Zero Register

Common instructions:

  • ldr: load from memory
  • str: store to memory
  • mov: move value
  • bl: branch with link (call)
  • svc: supervisor call (syscall)
  • ret: return
  • add: add values
  • adrp: PC-relative address of the 4 KB page containing a label (pair it with add ..., :lo12:label for the full address)

Sections

  • .text: code
  • .data: initialized data
  • .bss: uninitialized data

Example:

.data
helloworld:
    .ascii "Hello, ARM64!\n"
helloworld_len = . - helloworld
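Here . is the assembler's current location counter, so . - helloworld is the number of bytes emitted since the label, computed at assembly time. The closest C analog is sizeof on a character array, minus the terminating NUL that .ascii does not emit:

```c
/* C analog of:  helloworld: .ascii "Hello, ARM64!\n"
                 helloworld_len = . - helloworld            */
static const char helloworld[] = "Hello, ARM64!\n";

/* sizeof counts the trailing '\0'; .ascii emits no terminator, so subtract 1. */
enum { helloworld_len = sizeof helloworld - 1 };
```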

Hello World in ARM64 Assembly Using Syscalls

.data
helloworld:
    .ascii "Hello, ARM64!\n"
helloworld_len = . - helloworld

.text
.globl _start
_start:
    mov x0, #1              // fd = stdout
    ldr x1, =helloworld     // buf
    ldr x2, =helloworld_len // count
    mov w8, #64             // write syscall
    svc #0

    mov x0, #0
    mov w8, #93             // exit syscall
    svc #0

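Arguments are passed in x0–x7, and the syscall number goes in w8 (writing w8 also zeroes the upper half of x8). svc #0 traps into the kernel, which leaves the return value in x0. Syscall numbers are architecture-specific (see the syscall table below).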

Compiling ARM64 Assembly

Use as and ld:

as -o hello.o hello.s
ld -o hello hello.o

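Or use gcc, telling it not to link the C runtime (its startup code defines its own _start, which would clash with ours):

gcc -nostdlib -static -o hello hello.s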

Run with:

./hello

Debugging with GDB

GDB lets you set breakpoints and inspect registers/memory:

gdb ./hello
(gdb) break _start
(gdb) tui enable
(gdb) run
(gdb) info registers
(gdb) print/x $x0

Web Server in C

A simple TCP server in C:

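#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int server_fd, client_fd;
    struct sockaddr_in address;
    socklen_t addrlen = sizeof(address);
    /* HTTP lines end in \r\n; the body "Hello Libc!\n" is 12 bytes */
    const char *response = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 12\r\n\r\nHello Libc!\n";

    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) { perror("socket"); exit(1); }
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(8080);
    int opt = 1;
    if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) { perror("setsockopt"); exit(1); }
    if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) { perror("bind"); exit(1); }
    if (listen(server_fd, 3) < 0) { perror("listen"); exit(1); }
    while (1) {
        addrlen = sizeof(address);
        client_fd = accept(server_fd, (struct sockaddr *)&address, &addrlen);
        if (client_fd < 0) { perror("accept"); exit(1); }
        char buffer[1024] = {0};
        read(client_fd, buffer, sizeof(buffer));
        send(client_fd, response, strlen(response), 0);
        close(client_fd);
    }
}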

This server creates a socket, binds to a port, listens, accepts connections, reads requests, sends responses, and closes connections.

Syscalls Used in the Web Server

Syscall      Purpose
socket       Create a TCP socket
setsockopt   Set socket options (e.g., SO_REUSEADDR)
bind         Bind socket to address/port
listen       Listen for incoming connections
accept       Accept a new connection
read         Read data from client
write        Echo the request to stdout (assembly version only)
send         Send data to client
close        Close client connection
exit         Exit on error

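All of these map to kernel syscalls. On ARM64 there is no separate send syscall: libc implements send with sendto, which is what the assembly version calls. We can implement the same logic in ARM64 assembly.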

HTTP Response Format

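A minimal HTTP response (every line, including the blank one, ends with \r\n):

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12

Hello Libc!

This is the minimum for a valid HTTP response: status line, headers, blank line, and body. Content-Length must equal the exact number of body bytes: "Hello Libc!" plus its trailing newline is 12 bytes.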
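Getting Content-Length right by hand is error-prone, so here is a sketch that computes it from the body with snprintf. build_response is a hypothetical helper; it uses \r\n line endings as HTTP/1.1 specifies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a minimal HTTP/1.1 response for a text/plain
   body. Returns the number of bytes written (excluding the NUL), or -1 if
   the output buffer is too small. */
static int build_response(char *out, size_t out_sz, const char *body) {
    int n = snprintf(out, out_sz,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"
                     "%s",
                     strlen(body), body);
    if (n < 0 || (size_t)n >= out_sz) return -1;
    return n;
}
```

With the body "Hello Libc!\n" the header reads Content-Length: 12, matching the example above.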

Syscall Table for ARM64

Reference: https://arm64.syscall.sh/

Syscall      Number   Arguments (x0, x1, x2, …)
socket       198      domain, type, protocol
bind         200      fd, addr*, addrlen
listen       201      fd, backlog
accept       202      fd, addr*, addrlen*
sendto/send  206      fd, buf, len, flags, dest_addr*, addrlen
setsockopt   208      fd, level, optname, optval*, optlen
read         63       fd, buf, count
write        64       fd, buf, count
close        57       fd
exit         93       status

Defining Global Strings and Structs

Define error messages and HTTP payload in .data:

.data
payload:
    .ascii "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 16\n\nHello from ARM!\n"
payload_len = . - payload
socket_err:
    .ascii "socket failed\n"
socket_err_len = . - socket_err
// ... (other error messages)
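Before assembling, a hand-written payload like this can be sanity-checked on any machine. The sketch below (content_length_matches is a hypothetical helper) compares the declared Content-Length with the number of bytes after the header/body separator, accepting either \r\n\r\n or a bare \n\n.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if the Content-Length header of a raw
   HTTP response equals the number of bytes after the blank line. */
static int content_length_matches(const char *resp, size_t resp_len) {
    const char *hdr = strstr(resp, "Content-Length: ");
    const char *sep = strstr(resp, "\r\n\r\n");
    size_t sep_len = 4;
    if (sep == NULL) {              /* fall back to LF-only line endings */
        sep = strstr(resp, "\n\n");
        sep_len = 2;
    }
    if (hdr == NULL || sep == NULL || hdr > sep) return 0;
    long declared = strtol(hdr + strlen("Content-Length: "), NULL, 10);
    size_t actual = resp_len - (size_t)(sep + sep_len - resp);
    return declared >= 0 && (size_t)declared == actual;
}
```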

Define sockaddr_in struct:

.align 4
sockaddr_in:
    .hword 2          // AF_INET
    .hword 0x901f     // port 8080 in network byte order (big-endian)
    .word 0x0100007f  // 127.0.0.1
    .zero 8           // padding
sockaddr_in_len:
    .word 16

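Note: 8080 (decimal) is 0x1F90 in hex. Network byte order is big-endian, so the two port bytes must appear in memory as 1F 90. ARM64 Linux runs little-endian, and .hword stores the least significant byte first, so we write the byte-swapped value 0x901F. This replaces the need for htons in C.

The same applies to the IP address in the .word just below: 127.0.0.1 must appear in memory as the bytes 7F 00 00 01, so the little-endian .word value is 0x0100007F. Binding to 127.0.0.1 means the server only listens on the loopback interface, so only clients on the same machine can connect; use 0 (INADDR_ANY) to listen on all interfaces.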
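To double-check these hand-encoded values, a C sketch can build the same address with htons/htonl and inspect the raw bytes. On a little-endian machine, the port bytes 1F 90 read back as the 16-bit value 0x901F. loopback_8080_bytes is an illustrative name, and the 16-byte size assumes the Linux layout of struct sockaddr_in.

```c
#include <arpa/inet.h>   /* htons, htonl */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK, AF_INET */
#include <stdint.h>
#include <string.h>

/* Fill a sockaddr_in for 127.0.0.1:8080 and copy its raw bytes out, so
   they can be compared with the .hword/.word values in the assembly. */
static void loopback_8080_bytes(uint8_t out[16]) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    memcpy(out, &addr, sizeof addr);
}
```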

Storing the Server FD in Memory

To store the server file descriptor in memory (for educational purposes):

.section .bss
.align 2
server_fd:
    .skip 4 // 4 bytes, zero-initialized

To save the value:

adrp x1, server_fd
add  x1, x1, :lo12:server_fd
str  w0, [x1]

To load it back:

adrp x1, server_fd
add  x1, x1, :lo12:server_fd
ldr  w0, [x1]

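We can also do this with the ldr pseudo-instruction we used in the hello world program and for the strings:

ldr x1, =server_fd // Load the address
ldr w0, [x1]       // Load the value

The = form is not the same as the adrp/add pair: the assembler places the full 64-bit address of server_fd in a nearby literal pool and emits a PC-relative ldr that loads it from there. It costs an extra memory load, but it is not limited to the ±4 GB range of adrp.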
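For comparison, the adrp/add pair can be modeled in C. adrp xN, sym computes the 4 KB page of sym relative to the page of the instruction's own address (PC), using a signed 21-bit page delta encoded in the instruction (hence the range of about ±4 GB); add xN, xN, :lo12:sym then supplies the low 12 bits. The functions and example addresses are illustrative.

```c
#include <stdint.h>

/* Model of "adrp xN, sym": clear the low 12 bits of pc, then add the
   distance in 4 KB pages between pc's page and sym's page. Unsigned
   arithmetic wraps modulo 2^64, so a negative page delta works too. */
static uint64_t model_adrp(uint64_t pc, uint64_t sym) {
    uint64_t page_delta = (sym >> 12) - (pc >> 12);
    return (pc & ~UINT64_C(0xFFF)) + (page_delta << 12);
}

/* Model of "add xN, xN, :lo12:sym": add sym's offset within its page. */
static uint64_t model_add_lo12(uint64_t page, uint64_t sym) {
    return page + (sym & 0xFFF);
}
```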

Writing the Syscalls

Each syscall routine:

  • Put arguments in registers
  • Set syscall number in w8
  • Call svc #0
  • Read the result from x0 (a negative value is -errno on failure)

Example:

socket:
    mov x0, #2      // AF_INET
    mov x1, #1      // SOCK_STREAM
    mov x2, #0      // default protocol (TCP)
    mov w8, #198    // socket syscall
    svc #0          // x0 = new fd, or -errno
    ret

For syscalls needing pointers to global data, use adrp and add :lo12:.
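The kernel reports failure by returning a negative errno value in x0. libc's syscall() hides this by turning it into -1 plus errno, so the sketch below converts back to show what x0 actually holds after svc #0. It assumes glibc on Linux; raw_close is a hypothetical helper name.

```c
#define _GNU_SOURCE         /* makes glibc declare syscall() */
#include <errno.h>
#include <sys/syscall.h>    /* SYS_close */
#include <unistd.h>

/* Hypothetical helper: close fd with the raw syscall and return what the
   kernel itself reports: 0 on success or -errno on failure. */
static long raw_close(int fd) {
    long ret = syscall(SYS_close, fd);
    return ret == -1 ? -errno : ret;
}
```

For an invalid descriptor this yields -EBADF, a negative value of exactly the kind the err_check routine catches with cmp x0, #0 and b.lt.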

Main Routine

The main routine calls each step and checks for errors:

.text
.globl _start
_start:
    bl  socket
    ldr x1, =socket_err
    ldr x2, =socket_err_len
    bl  err_check
    adrp x1, server_fd
    add  x1, x1, :lo12:server_fd
    str  w0, [x1]
    bl  setsockopt
    ldr x1, =setsockopt_err
    ldr x2, =setsockopt_err_len
    bl  err_check
    bl  bind
    ldr x1, =bind_err
    ldr x2, =bind_err_len
    bl  err_check
    bl  listen
    ldr x1, =listen_err
    ldr x2, =listen_err_len
    bl  err_check
    sub sp, sp, #1040
    stp x29, x30, [sp]
    mov x29, sp
    bl  loop
    bl  exit

The Loop

The loop accepts a connection, reads the request, echoes it to stdout, sends the response, and closes the connection:

loop:
    bl  accept
    ldr x1, =accept_err
    ldr x2, =accept_err_len
    bl  err_check
    mov w19, w0
    bl  read
    ldr x1, =read_err
    ldr x2, =read_err_len
    bl  err_check
    bl  write
    ldr x1, =write_err
    ldr x2, =write_err_len
    bl  err_check
    bl  send
    ldr x1, =send_err
    ldr x2, =send_err_len
    bl  err_check
    bl  close
    ldr x1, =close_err
    ldr x2, =close_err_len
    bl  err_check
    b   loop

Buffer and Stack

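We allocate 1040 bytes on the stack: 1024 bytes for the request buffer plus 16 bytes for the saved x29/x30 pair written by stp. 1040 is also a multiple of 16, so sp stays 16-byte aligned:

sub sp, sp, #1040
stp x29, x30, [sp]
mov x29, sp

This is like char buffer[1024]; in C, plus room for the saved frame pointer and link register.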
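One way to picture those 1040 bytes is as a C struct that keeps the saved x29/x30 pair and the request buffer in separate, non-overlapping fields. start_frame and its field names are hypothetical; the struct only documents offsets from sp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of the region reserved by "sub sp, sp, #1040":
   the frame record stored by stp at [sp], then the buffer at [sp + 16]. */
struct start_frame {
    uint64_t saved_fp;      /* x29, stored at [sp]      */
    uint64_t saved_lr;      /* x30, stored at [sp + 8]  */
    char     buffer[1024];  /* request buffer at [sp + 16] */
};

_Static_assert(sizeof(struct start_frame) == 1040, "frame is 1040 bytes");
_Static_assert(offsetof(struct start_frame, buffer) == 16, "buffer follows x29/x30");
_Static_assert(sizeof(struct start_frame) % 16 == 0, "sp stays 16-byte aligned");
```

With this layout, the buffer address to pass to read is sp + 16 (add x1, sp, #16), so incoming data never overwrites the saved registers.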

Full code

The full code should look like this:

.data

// Defining the address struct
.align 4

sockaddr_in:
	.hword 2
	.hword 0x901f
	.word  0x0100007f // Change to 0x0 to accept any connections
	.zero  8

sockaddr_in_len:
	.word 16

payload:
	.ascii "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 16\n\nHello from ARM!\n"
	payload_len = . - payload

socket_err:
	.ascii "socket failed\n"
	socket_err_len = . - socket_err

setsockopt_err:
	.ascii "setsockopt failed\n"
	setsockopt_err_len = . - setsockopt_err

bind_err:
	.ascii "bind failed\n"
	bind_err_len = . - bind_err

listen_err:
	.ascii "listen failed\n"
	listen_err_len = . - listen_err

accept_err:
	.ascii "accept failed\n"
	accept_err_len = . - accept_err

read_err:
	.ascii "read failed\n"
	read_err_len = . - read_err

write_err:
	.ascii "write failed\n"
	write_err_len = . - write_err

send_err:
	.ascii "send failed\n"
	send_err_len = . - send_err

close_err:
	.ascii "close failed\n"
	close_err_len = . - close_err

	// Here we store uninitialized (zero-filled) data
	.section .bss
	.align   2

server_fd:
	.skip 4 // 4 bytes

	.text
	.globl _start

_start:
	bl  socket
	ldr x1, =socket_err
	ldr x2, =socket_err_len
	bl  err_check

	// Save server_fd in memory
	adrp x1, server_fd
	add  x1, x1, :lo12:server_fd
	str  w0, [x1]

	bl  setsockopt
	ldr x1, =setsockopt_err
	ldr x2, =setsockopt_err_len
	bl  err_check

	bl  bind
	ldr x1, =bind_err
	ldr x2, =bind_err_len
	bl  err_check

	bl  listen
	ldr x1, =listen_err
	ldr x2, =listen_err_len
	bl  err_check

	// Reserve 1040 bytes on the stack: 16 for the saved x29/x30 pair
	// and 1024 for the request buffer (1040 keeps sp 16-byte aligned)
	sub sp, sp, #1040
	stp x29, x30, [sp]
	mov x29, sp
	bl  loop

	bl exit

loop:
	bl  accept
	ldr x1, =accept_err
	ldr x2, =accept_err_len
	bl  err_check

	// Store client_fd in w19
	mov w19, w0

	// Read client_fd
	bl  read
	ldr x1, =read_err
	ldr x2, =read_err_len
	bl  err_check

	// Write the buffer
	bl  write
	ldr x1, =write_err
	ldr x2, =write_err_len
	bl  err_check

	// Send response
	bl  send
	ldr x1, =send_err
	ldr x2, =send_err_len
	bl  err_check

	// Close connection
	bl  close
	ldr x1, =close_err
	ldr x2, =close_err_len
	bl  err_check
	b   loop

socket:
	mov x0, #2   // AF_INET
	mov x1, #1   // SOCK_STREAM
	mov x2, #0   // default protocol (TCP)
	mov w8, #198 // socket
	svc #0
	ret

setsockopt:
	// Get fd from memory
	adrp x1, server_fd
	add  x1, x1, :lo12:server_fd
	ldr  w0, [x1]

	mov w1, #0x1 // SOL_SOCKET
	mov w2, #0x2 // SO_REUSEADDR

	mov w20, #1          // Needs to be a pointer (&opt)
	str w20, [sp, #-16]!
	mov x3, sp
	mov x4, #4           // 32 bits - 4 bytes

	mov w8, #208
	svc #0

	add sp, sp, #16 // Return stack
	ret

bind:
	// Get fd from memory
	adrp x1, server_fd
	add  x1, x1, :lo12:server_fd
	ldr  w0, [x1]

	ldr x1, =sockaddr_in
	ldr x2, =sockaddr_in_len
	ldr w2, [x2]

	mov w8, #200
	svc #0
	ret

listen:
	// Get fd from memory
	adrp x1, server_fd
	add  x1, x1, :lo12:server_fd
	ldr  w0, [x1]

	mov x1, #3
	mov w8, #201
	svc #0
	ret

accept:
	// Get fd from memory
	adrp x1, server_fd
	add  x1, x1, :lo12:server_fd
	ldr  w0, [x1]

	ldr  x1, =sockaddr_in              // kernel overwrites this with the client's address
	adrp x2, sockaddr_in_len           // adrp only loads top bits (page)
	add  x2, x2, :lo12:sockaddr_in_len // (so we need to add the last 12 as offset)

	mov w8, #202
	svc #0
	ret

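read:
	mov w0, w19        // client_fd
	add x1, sp, #16    // buffer starts after the saved x29/x30 pair
	mov x2, #1024      // buffer size

	mov w8, #63
	svc #0
	mov x20, x0        // remember how many bytes were read
	ret

write:
	mov w0, #1         // stdout
	add x1, sp, #16    // same buffer
	mov x2, x20        // echo only the bytes read() returned

	mov w8, #64
	svc #0
	ret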

send:
	mov w0, w19
	ldr x1, =payload
	mov x2, #payload_len

	// Clear old values from args since we don't need them.
	mov x3, #0 // flags
	mov x4, #0 // dest_addr (NULL)
	mov x5, #0 // addrlen (0)

	mov w8, #206
	svc #0
	ret

close:
	mov w0, w19
	mov w8, #57
	svc #0
	ret

err_check:
	cmp  x0, #0
	b.lt err_handler
	ret

err_handler:
	// Write the message in x1/x2 to stderr
	mov w0, #2
	mov w8, #64
	svc #0

	// Exit with status 1 so the failure is visible to the shell
	mov x0, #1
	mov w8, #93
	svc #0

exit:
	mov x0, #0
	mov w8, #93
	svc #0
	ret

Testing

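Save the full code as server.s, then assemble, link, and run it on an ARM64 Linux system (on an x86-64 host, a cross toolchain such as aarch64-linux-gnu-as and aarch64-linux-gnu-ld together with qemu-aarch64 user-mode emulation should also work):

as -o server.o server.s
ld -o server server.o
./server

From another terminal, test with: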

curl localhost:8080

References