Web Server in ARM64 Assembly
Writing a Web Server in ARM64 Assembly⌗
Introduction⌗
In this post, we are going to learn ARM64 assembly the hard way: by writing a basic web server in assembly for Linux systems. If this seems long or intimidating, don’t worry: we’ll start with the basics and work our way up to the web server, explaining every step.
Syscalls⌗
Syscalls are how user-space programs interact with the kernel. They are the interface between your application and the operating system, allowing you to perform operations like reading and writing files, creating processes, and interacting with hardware.
For example, to print something to the console without using printf, you use the write syscall:
write(fd, buffer, size);
- fd: file descriptor (1 for stdout)
- buffer: the string to print
- size: length of the string
You can find syscall arguments in the Linux manual. For example, run man 2 write to see the arguments for the write syscall.
Other syscalls essential for network programming (and for our web server) include socket, bind, listen, accept, and send.
Hello World in C Using Syscalls⌗
Most beginners use printf in C:
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
But printf does not talk to the terminal itself: after formatting and buffering, it ends up handing the bytes to the write syscall. You can call write directly:
#include <unistd.h>

int main() {
    const char *message = "Hello, World!\n";
    write(1, message, 14); // 1 is stdout
    return 0;
}
Here, 1 is the file descriptor for stdout, message is the string, and 14 is the length (including the newline).
ARM Architecture Basics⌗
Before writing assembly, you need to understand ARM64 basics. ARM64 is a 64-bit architecture, so it has 64-bit wide registers and can address more memory than 32-bit architectures.
Registers⌗
There are 31 general-purpose registers: x0 to x30.
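- x0 to x7: used for passing arguments and returning values
- x8 to x15: general-purpose (Linux also expects the syscall number in x8)
- x16 to x18: reserved for the linker and the platform; best left alone
- x19 to x28: callee-saved (persistent)
- x29: frame pointer (FP)
- x30: link register (LR)
- xzr: zero register (always 0)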
The lower 32 bits of each register are accessible as w0 to w30.
Memory and Pages⌗
Linux uses virtual memory. Each process has its own virtual address space, and the kernel maps virtual to physical addresses. Memory is divided into pages (usually 4KB). You never deal with physical memory directly.
Using the Stack⌗
The stack is used for local variables and passing arguments. It grows downward. The sp (stack pointer) register points to the top of the stack. You use the stack to save register values, store local variables, and keep return addresses.
Register Conventions and Common Instructions⌗
| Register(s) | Purpose |
|---|---|
| x0–x7 | Arguments/Results |
| x9–x15 | Caller-saved (temporary) |
| x19–x28 | Callee-saved (persistent) |
| x29 | Frame Pointer (FP) |
| x30 | Link Register (LR) |
| xzr | Zero Register |
Common instructions:
- ldr: load from memory
- str: store to memory
- mov: move value
- bl: branch with link (call)
- svc: supervisor call (syscall)
- ret: return
- add: add values
- adrp: address of page
Sections⌗
- .text: code
- .data: initialized data
- .bss: uninitialized data
Example:
.data
helloworld:
.ascii "Hello, ARM64!\n"
helloworld_len = . - helloworld
Hello World in ARM64 Assembly Using Syscalls⌗
.data
helloworld:
.ascii "Hello, ARM64!\n"
helloworld_len = . - helloworld
.text
.globl _start
_start:
mov x0, #1 // fd = stdout
ldr x1, =helloworld // buf
ldr x2, =helloworld_len // count
mov w8, #64 // write syscall
svc #0
mov x0, #0
mov w8, #93 // exit syscall
svc #0
Syscall arguments are passed in x0–x5 (ordinary function calls can use up to x7), and the result comes back in x0. The syscall number goes in w8. These numbers are architecture-specific (see the syscall table below).
Compiling ARM64 Assembly⌗
Use as and ld:
as -o hello.o hello.s
ld -o hello hello.o
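Or use gcc, telling it not to link the C runtime (which brings its own _start and expects a main):
gcc -nostdlib -static -o hello hello.s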
Run with:
./hello
Debugging with GDB⌗
GDB lets you set breakpoints and inspect registers/memory:
gdb ./hello
(gdb) break _start
(gdb) tui enable
(gdb) run
(gdb) info registers
(gdb) print/x $x0
Web Server in C⌗
A simple TCP server in C:
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int server_fd, client_fd;
    struct sockaddr_in address;
    socklen_t addrlen = sizeof(address);
    // The body "Hello Libc!\n" is 12 bytes, matching Content-Length
    char *response = "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 12\n\nHello Libc!\n";

    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(8080);

    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    bind(server_fd, (struct sockaddr *)&address, sizeof(address));
    listen(server_fd, 3);

    while (1) {
        client_fd = accept(server_fd, (struct sockaddr *)&address, &addrlen);
        char buffer[1024] = {0};
        read(client_fd, buffer, 1024);
        send(client_fd, response, strlen(response), 0);
        close(client_fd);
    }
}
This server creates a socket, binds to a port, listens, accepts connections, reads requests, sends responses, and closes connections.
Syscalls Used in the Web Server⌗
| Syscall | Purpose |
|---|---|
| socket | Create a TCP socket |
| setsockopt | Set socket options (e.g., SO_REUSEADDR) |
| bind | Bind socket to address/port |
| listen | Listen for incoming connections |
| accept | Accept a new connection |
| read | Read data from client |
| send | Send data to client |
| close | Close client connection |
| exit | Exit on error |
All of these map to syscalls provided by the kernel (on ARM64, send is a libc wrapper around the sendto syscall), so we can implement the same logic in ARM64 assembly.
HTTP Response Format⌗
A minimal HTTP response:
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12

Hello Libc!
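These are the minimum parts of an HTTP response: status line, headers, blank line, and body. Strictly speaking, HTTP/1.1 ends each header line with CRLF (\r\n); the examples in this post use a bare \n, which lenient clients generally accept.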
Syscall Table for ARM64⌗
Reference: https://arm64.syscall.sh/
| Syscall | Number | Arguments (x0, x1, x2, …) |
|---|---|---|
| socket | 198 | domain, type, protocol |
| bind | 200 | fd, addr*, addrlen |
| listen | 201 | fd, backlog |
| accept | 202 | fd, addr*, addrlen* |
| sendto/send | 206 | fd, buf, len, flags, dest_addr*, addrlen |
| setsockopt | 208 | fd, level, optname, optval*, optlen |
| read | 63 | fd, buf, count |
| write | 64 | fd, buf, count |
| close | 57 | fd |
| exit | 93 | status |
Defining Global Strings and Structs⌗
Define error messages and HTTP payload in .data:
.data
payload:
.ascii "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 16\n\nHello from ARM!\n"
payload_len = . - payload
socket_err:
.ascii "socket failed\n"
socket_err_len = . - socket_err
// ... (other error messages)
Define sockaddr_in struct:
.align 4
sockaddr_in:
.hword 2 // AF_INET
.hword 0x901f // port 8080 in network byte order (big-endian)
.word 0x0100007f // 127.0.0.1
.zero 8 // padding
sockaddr_in_len:
.word 16
Note: port 8080 is 0x1f90 in hex. Network byte order is big-endian, so the bytes must sit in memory as 1f 90. ARM64 Linux is little-endian and stores the low byte of a .hword first, so we write 0x901f to get that layout. This replaces the call to htons in C.
The same applies to the IP address in the word just below: 127.0.0.1 must sit in memory as 7f 00 00 01, so the little-endian .word is written as 0x0100007f. Binding to 127.0.0.1 means the server only accepts connections that arrive on the loopback interface, that is, from the same machine.
Storing the Server FD in Memory⌗
To store the server file descriptor in memory (for educational purposes):
server_fd:
.skip 4 // 4 bytes
To save the value:
adrp x1, server_fd
add x1, x1, :lo12:server_fd
str w0, [x1]
To load it back:
adrp x1, server_fd
add x1, x1, :lo12:server_fd
ldr w0, [x1]
The same thing can be done with the ldr =label pseudo-instruction that we used for the strings in the hello world program:
ldr x1, =server_fd // Load the address
ldr w0, [x1] // Load to the register
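The = form is a pseudo-instruction: the assembler stores the address in a nearby literal pool and emits a PC-relative ldr that loads it from there. The register ends up with the same address as with the adrp and add pair, at the cost of one extra memory load.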
Writing the Syscalls⌗
Each syscall routine:
- Put arguments in registers
- Set syscall number in w8
- Call svc #0
Example:
socket:
mov x0, #2
mov x1, #1
mov x2, #0
mov w8, #198
svc #0
ret
For syscalls needing pointers to global data, use adrp and add :lo12:.
Main Routine⌗
The main routine calls each step and checks for errors:
.text
.globl _start
_start:
bl socket
ldr x1, =socket_err
ldr x2, =socket_err_len
bl err_check
adrp x1, server_fd
add x1, x1, :lo12:server_fd
str w0, [x1]
bl setsockopt
ldr x1, =setsockopt_err
ldr x2, =setsockopt_err_len
bl err_check
bl bind
ldr x1, =bind_err
ldr x2, =bind_err_len
bl err_check
bl listen
ldr x1, =listen_err
ldr x2, =listen_err_len
bl err_check
sub sp, sp, #1040
stp x29, x30, [sp]
mov x29, sp
bl loop
bl exit
The Loop⌗
The loop accepts a connection, reads the request, prints the buffer to stdout, sends the response, and closes the connection:
loop:
bl accept
ldr x1, =accept_err
ldr x2, =accept_err_len
bl err_check
mov w19, w0
mov x1, sp
mov x2, #1040
bl read
ldr x1, =read_err
ldr x2, =read_err_len
bl err_check
bl write
ldr x1, =write_err
ldr x2, =write_err_len
bl err_check
bl send
ldr x1, =send_err
ldr x2, =send_err_len
bl err_check
bl close
ldr x1, =close_err
ldr x2, =close_err_len
bl err_check
b loop
Buffer and Stack⌗
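We reserve 1040 bytes on the stack for the request buffer: 1024 bytes for the data plus 16 bytes that hold the saved x29/x30 pair. Both sizes are multiples of 16, so sp stays 16-byte aligned as AArch64 requires: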
sub sp, sp, #1040
stp x29, x30, [sp]
mov x29, sp
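This is roughly char buffer[1040]; in C. Because the routines pass sp itself as the buffer address, incoming request data overwrites the x29/x30 pair stored at [sp]. That is harmless here because _start never returns, but a routine that does return would need its buffer to start at sp + 16.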
Full code⌗
The full code should look like this:
.data
// Defining the address struct
.align 4
sockaddr_in:
.hword 2
.hword 0x901f
.word 0x0100007f // 127.0.0.1; change to 0x0 (INADDR_ANY) to listen on all interfaces
.zero 8
sockaddr_in_len:
.word 16
payload:
.ascii "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 16\n\nHello from ARM!\n"
payload_len = . - payload
socket_err:
.ascii "socket failed\n"
socket_err_len = . - socket_err
setsockopt_err:
.ascii "setsockopt failed\n"
setsockopt_err_len = . - setsockopt_err
bind_err:
.ascii "bind failed\n"
bind_err_len = . - bind_err
listen_err:
.ascii "listen failed\n"
listen_err_len = . - listen_err
accept_err:
.ascii "accept failed\n"
accept_err_len = . - accept_err
read_err:
.ascii "read failed\n"
read_err_len = . - read_err
write_err:
.ascii "write failed\n"
write_err_len = . - write_err
send_err:
.ascii "send failed\n"
send_err_len = . - send_err
close_err:
.ascii "close failed\n"
close_err_len = . - close_err
// Uninitialized memory lives in .bss
.section .bss
.align 2
server_fd:
.skip 4 // 4 bytes
.text
.globl _start
_start:
bl socket
ldr x1, =socket_err
ldr x2, =socket_err_len
bl err_check
// Save server_fd in memory
adrp x1, server_fd
add x1, x1, :lo12:server_fd
str w0, [x1]
bl setsockopt
ldr x1, =setsockopt_err
ldr x2, =setsockopt_err_len
bl err_check
bl bind
ldr x1, =bind_err
ldr x2, =bind_err_len
bl err_check
bl listen
ldr x1, =listen_err
ldr x2, =listen_err_len
bl err_check
// Reserve 1040 bytes on the stack for the request buffer (a multiple of 16 keeps sp aligned)
// and store x29/x30 at the bottom of that area
sub sp, sp, #1040
stp x29, x30, [sp]
mov x29, sp
bl loop
bl exit
loop:
bl accept
ldr x1, =accept_err
ldr x2, =accept_err_len
bl err_check
// Store client_fd in w19
mov w19, w0
// Create buffer from read()
mov x1, sp
mov x2, #1040
// Read client_fd
bl read
ldr x1, =read_err
ldr x2, =read_err_len
bl err_check
// Write the buffer
bl write
ldr x1, =write_err
ldr x2, =write_err_len
bl err_check
// Send response
bl send
ldr x1, =send_err
ldr x2, =send_err_len
bl err_check
// Close connection
bl close
ldr x1, =close_err
ldr x2, =close_err_len
bl err_check
b loop
socket:
mov x0, #2
mov x1, #1
mov x2, #0
mov w8, #198
svc #0
ret
setsockopt:
// Get fd from memory
adrp x1, server_fd
add x1, x1, :lo12:server_fd
ldr w0, [x1]
mov w1, #0x1 // SOL_SOCKET
mov w2, #0x2 // SO_REUSEADDR
mov w20, #1 // Needs to be a pointer (&opt)
str w20, [sp, #-16]!
mov x3, sp
mov x4, #4 // 32 bits - 4 bytes
mov w8, #208
svc #0
add sp, sp, #16 // Return stack
ret
bind:
// Get fd from memory
adrp x1, server_fd
add x1, x1, :lo12:server_fd
ldr w0, [x1]
ldr x1, =sockaddr_in
ldr x2, =sockaddr_in_len
ldr w2, [x2]
mov w8, #200
svc #0
ret
listen:
// Get fd from memory
adrp x1, server_fd
add x1, x1, :lo12:server_fd
ldr w0, [x1]
mov x1, #3
mov w8, #201
svc #0
ret
accept:
// Get fd from memory
adrp x1, server_fd
add x1, x1, :lo12:server_fd
ldr w0, [x1]
ldr x1, =sockaddr_in
adrp x2, sockaddr_in_len // adrp only loads top bits (page)
add x2, x2, :lo12:sockaddr_in_len // (so we need to add the last 12 as offset)
mov w8, #202
svc #0
ret
read:
mov w0, w19
mov x1, sp
mov x2, #1040
mov w8, #63
svc #0
ret
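write:
// x0 still holds the byte count returned by read (err_check leaves it untouched)
mov x2, x0 // count = bytes actually read, not the whole buffer
mov w0, #1 // fd = stdout
mov x1, sp
mov w8, #64
svc #0
ret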
send:
mov w0, w19
ldr x1, =payload
mov x2, payload_len
// Clear old values from args since we don't need them.
mov x3, #0 // flags
mov x4, #0 // dest_addr (NULL)
mov x5, #0 // addrlen (0)
mov w8, #206
svc #0
ret
close:
mov w0, w19
mov w8, #57
svc #0
ret
err_check:
cmp x0, #0
b.lt err_handler
ret
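err_handler:
// Write the message (x1 = buffer, x2 = length) to stderr
mov w0, #2
mov w8, #64
svc #0
// Exit with a non-zero status so the failure is visible to the caller
mov x0, #1
mov w8, #93
svc #0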
exit:
mov x0, #0
mov w8, #93
svc #0
ret
Testing⌗
Build and run the program on an ARM64 Linux system or emulator. Test with:
curl localhost:8080
References⌗
- https://syscalls.mebeim.net/?table=arm64/64/aarch64/latest
- https://arm64.syscall.sh/
- RFC 2616 (HTTP/1.1)
- ARM64 Assembly Cheat Sheet