Building a Tetris Clone in x86 Assembly, pt. Ⅱ: I/O

By the end of part Ⅰ, the classic phrase, “Hello, world!”, had appeared on screen in beautiful bright green and all was well. It was, though, not at all close to being any sort of game. To get closer to being able to implement Tetris, more I/O functions are necessary. And in implementing these functions, it will be useful to be able to debug them.


Assembly is just as easy to debug as C. In fact, the same tools work for both.

As with C, the first thing to do is compile with debugging information. The flag for NASM is the same as for GCC, -g. It’s also useful to set a flag for conditional compilation with -d DEBUG. This can be tested in NASM very similarly to C.

%ifdef DEBUG
; ...

To have QEMU listen for a debugger, use the -s flag. To have it pause execution until a debugger is attached (and the continue command issued), use the -S flag as well.

After starting GDB with the path to the kernel binary, it can be attached to a listening QEMU instance with target remote localhost:1234. This command can be executed on start with the -ex command-line option. To have GDB output the current instruction at each step, use display/i $pc and set disassembly-flavor intel.

Most GDB commands can be used as normal. However, there are alternate commands to step by instruction: nexti and stepi (or ni and si). Register values can be printed with p $eax, for example. Memory can be printed with x.

(gdb) break main
Breakpoint 1 at 0x101a00
(gdb) c

Breakpoint 1, 0x00101a00 in main ()
1: x/i $pc
=> 0x101a00 <main>:	push   0x20
(gdb) si
0x00101a02 in main ()
1: x/i $pc
=> 0x101a02 <main+2>:	call   0x101d97 <clear>
(gdb) p $esp
$1 = (void *) 0x1040fc
(gdb) x $esp
0x1040fc:	0x00000020

Number formatting

The puts function from last time puts strings on the screen, but provides no way to print numbers. The traditional name for a function that formats numbers as strings is itoa.

The function will take a double-word number and a word containing the radix in the upper byte and the desired width in the lower byte. It will return a pointer into a static string buffer. Since the game will be single-threaded, a static buffer works fine here.

Since the longest possible output is a 32-bit double-word formatted as binary, the static buffer needs to be 32 bytes long with a null-terminator.

The function will also need a list of digit characters to use.

section .data
output db '00000000000000000000000000000000', 0
digits db '0123456789ABCDEF'

The implementation will be using all of the callee-saved registers,1 so its prologue and epilogue look like this.

section .text
  push ebp
  mov ebp, esp
  push ebx
  push esi
  push edi

  pop edi
  pop esi
  pop ebx
  mov esp, ebp
  pop ebp


The algorithm for formatting numbers is the same as for converting numbers between radixes.

  1. Divide the number by the radix
  2. The remainder is the next digit, starting from the least significant
  3. Repeat with the quotient until it is zero

For example, the remainders from dividing 26 by 2 give its binary representation, 11010.

  1. 26 = 2 × 13 + 0
  2. 13 = 2 × 6 + 1
  3. 6 = 2 × 3 + 0
  4. 3 = 2 × 1 + 1
  5. 1 = 2 × 0 + 1

To use this to format numbers, the remainders will be used as offsets into the list of digit characters. Additionally, since itoa takes a width, it will simply loop that number of times, rather than stopping at a quotient of zero.


Since the digits are calculated least-significant first, itoa needs to start at the end of the string buffer. To work backwards using string operations, it uses the std instruction.2

lea edi, [output + 32]

Next, the number is loaded into eax and the base into ebx. The width is loaded into ecx for use with loop.3

mov eax, [ebp + 8]
movzx ebx, byte [ebp + 13]
movzx ecx, byte [ebp + 12]

The loop itself first clears edx on each iteration, since 32-bit division treats it as the high bits of the dividend. It then divides, indexes into the digit characters, and writes to the buffer. The quotient remains in eax for further division.

  xor edx, edx
  div ebx
  lea esi, [digits + edx]
  loop .loop

At the end of this loop, edi has actually been decremented one too many times, so one is added when setting eax, the return value. According to the calling convention, the direction flag is also reset with cld.

lea eax, [edi + 1]

Hello, numbers

Now that itoa is implemented, it can be combined with puts to print numbers.

  push word BG.BLACK | ' '
  call clear

  push word 0x0208
  push dword 26
  call itoa

  push dword 0x0101 << 16 | FG.BRIGHT | FG.GREEN
  push eax
  call puts

  add esp, 16
  jmp halt

Number formatting QEMU screenshot

Way better than “Hello, world!”, right?

Keyboard input

That covers the O of “I/O,” now for the I. There are two ways to read keyboard input on x86: with interrupts and through polling. Interrupts are complicated to set up, and they aren’t needed4 for this particular case, so polling will be used.

Keyboard input comes in the form of byte-sized “scancodes.” There are key-down and key-up scancodes for each key, which are read from port 0x60. To be useful, the scan function will keep track of the previously read scancode to detect when it changes.

section .data
key db 0

Since the function won’t be taking any parameters, it doesn’t need a prologue or epilogue. It will, however, start by zeroing eax, since it will set only al for its return value.

section .text
  xor eax, eax

Then, the current scancode is read and compared to the previous code. If the scancode has not changed, zero is returned. Otherwise, the previous scancode is updated and returned.

in al, 0x60
cmp al, [key]
je .zero
mov [key], al

  xor al, al

Input loop

Putting everything together, an input loop can show the last scancode on the screen. This is useful finding scancodes of new keys.

  push BG.BLACK | ' '
  call clear
  add esp, 2

    call scan
    test al, al
    jz .loop

    push word 0x0A02
    push eax
    call itoa
    push dword 0x0101 << 16 | FG.BRIGHT | FG.GREEN
    push eax
    call puts
    add esp, 16

    jmp .loop

This is what holding the left shift key looks like.

Input loop QEMU screenshot

To reset using the keyboard, the returned scancode can be compared to the R key-down scancode, 0x13.

call scan
test al, al
jz .loop

cmp al, 0x13
je reset

Next time…

Speaking of time, the next set of basic functions to implement will be for timing. After that, it should be possible to start making the actual game.

  1. Calling convention, Building a Tetris Clone in x86 Assembly 

  2. STD, x86 Instruction Set Reference 

  3. LOOP/LOOPcc, x86 Instruction Set Reference 

  4. Stay tuned for the absolutely terrible way in which timing can be achieved without interrupts!