Crackme-style challenge programs often incorporate techniques designed to resist or slow down analysis; one such technique - quite familiar by now - is corruption of the header of the binary, but many other techniques exist as well. For example, a program may be designed to deliberately perform overly complex operations that are difficult for a human to follow during analysis, increasing the time required to sufficiently comprehend program behavior. The keygenme binary that will be analyzed here is an example of this. It will be demonstrated that in this case, a viable approach to overcoming the challenge posed for analysis by some of the program’s rather opaque internal operations is to devise a method that will automatically generate inputs that solve the binary using the angr binary analysis toolkit.
Overview
The following will be discussed:
- how to repair a corrupted ELF header such that tools such as angr or gdb can be used to interface with the binary
- using Cutter to analyze program flow-of-control and understand program behavior
- how to automatically generate correct inputs to the program with angr using symbolic variables and files, file system emulation, program state space exploration, and constraint solving
Prerequisites
- a basic understanding of some of angr’s underlying principles and concepts, such as what symbolic variables and constraint solving are
- a basic understanding of the various components of angr that are used to load and analyze the binary e.g. program states and simulation managers
All of these are explained in the “Core Concepts” section of the angr documentation. Various example programs leveraging angr to solve challenge programs can be found in the documentation as well in the Examples section.
Summary
Rather than first sequentially walking through the steps taken to solve the challenge in fine detail and then presenting the results at the very end, a relatively concise overview will be provided first, with the details discussed afterward. The analysis may be easier to follow if a clear path to a solution is presented beforehand.
The binary:
- The challenge page can be found on crackmes.one.
- Download link:
- https://crackmes.one/static/crackme/5d7c66d833c5d46f00e2c45b.zip
- password to unzip the zip archive:
crackmes.one
Challenge parameters:
If you can take a problem and wrangle it into a form where it has defined and tractable inputs and outputs, you can absolutely use angr to achieve your goals, given that these goals involve analyzing binaries. [1]
This is exactly the case here.
A common form of crackme program is one in which a specific input to the program (such as a secret password or numeric value) solves the challenge, and the goal of the analysis is to discover what that input is. The classic IOLI challenges[3] are examples of this type of design. However, since the program we are dealing with here is a keygenme, rather than a single correct input, there is actually a large set of inputs that will solve the challenge. Therefore, the goal is slightly different - the aim is to understand the logic of the program such that we can devise a method of generating multiple inputs that are part of the set of solutions. In our case, the set of solutions consists of the intersection of the set of strings that are valid usernames with the set of strings that are valid passwords.
With the right constraints applied, angr will compute solutions automatically; our task is to interface angr with the binary such that the necessary constraints are discovered.
The binary is listed as a level 3 challenge (“medium”) on a scale of 1 to 6, where 6 is maximum difficulty (“insane”).
To get the G00d P422w0rd
message, a correct username and password must be passed as arguments on the command line to the program.
While the username is passed as a string as the first argument, the password must be in a file, and the path to the file passed as the second argument:
$ ./keygenme
Usage ./keygenme <username> <filepath>
Important: If a text editor is used to write a password to the input file,
or if echo PASSWORD_STRING > INPUT_FILE.txt
is used to pipe the password string into the file, it will usually be evaluated as incorrect even if it is
correct, or cause the program to segfault, due to a trailing 0x0a
byte being included after the string. It does not seem like the author of the crackme took
this into consideration, and this particular problem resulted in several hours being wasted trying to determine why valid passwords were being rejected as invalid.
To guarantee that the submitted password is correctly parsed by the program, the following method can be used:
$ python -c 'with open("INPUT_FILE.txt", "w") as f: f.write("PASSWORD_STRING")'
It should also be noted that the messages indicating success and failure - G00d P422w0rd
and B4d P422w0RD
- are
written to the file containing the password rather than to stdout.
In order to use angr to automatically generate valid usernames and passwords that solve the challenge, the following must be known:
- what form the inputs must take in order to produce the desired output:
- constraints on the length and format of the username
- constraints on the length and format of the password
- the location of the code that outputs the message indicating that the inputs were correct, as well as the format of this message
- the path that must be taken through the program logic to reach the target code that writes the desired output, as well as which parts of the program to avoid
- how to successfully interface angr with the binary, such that the input is symbolic rather than concrete. Since the password is in a file passed as an argument on the command line, this is somewhat complicated.
Cutter was used to perform the initial analysis. In particular, control-flow graphs of various functions were analyzed to understand how the program worked overall.
Here is a simple summary of the program’s flow of control. The red path leads to the function which outputs G00d P422w0rd
,
which indicates the inputs were correct, so this is the target code for angr to reach when exploring paths through the binary.
Solutions:
After determining the length and format requirements for usernames and passwords, as well as which path to take throught the program logic to arrive at the code responsible for indicating success, angr can be directed to explore the state space of the program by emulating it symbolically, discovering the conditions necessary to reach the target code along the way. This information can then used by a constraint solver to generate username and password strings such that if those strings are entered as input, the success message will be written as output (in this case to the password file, not stdout). Here is an example of what finding a solution with angr can look like:
$ ./autosolve_keygenme.py
[ 23:12:36.438534 ] Exploration started...
WARNING | 2020-01-15 23:14:59,613 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.
WARNING | 2020-01-15 23:14:59,613 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2020-01-15 23:14:59,613 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial state
WARNING | 2020-01-15 23:14:59,613 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2020-01-15 23:14:59,613 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.
WARNING | 2020-01-15 23:14:59,613 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0000 with 204 unconstrained bytes referenced from 0x1000220 (fopen+0x0 in extern-address space (0x220))
WARNING | 2020-01-15 23:15:10,161 | angr.state_plugins.symbolic_memory | Filling memory at 0xc0001040 with 240 unconstrained bytes referenced from 0x1000218 (__printf_chk+0x0 in extern-address space (0x218))
[ 23:17:30.706189 ] Finished...
[ 23:17:30.706266 ] Computing valid username...
[ 23:17:30.706540 ] Username: eGMu3oCgcS
[ 23:17:30.706579 ] Computing valid password...
[ 23:17:30.707143 ] Password: NZKIPPOPQSMSACP
[ 23:17:30.707182 ] Checking...
[ 23:17:30.789811 ] NZKIPPOPQSMSACP: 'G00d P422w0rd\n\x00'
After running for approximately 5 minutes (on not very good hardware) a valid username and password are found, resulting in the program outputting the message that indicates a solution was found.
Here is the program producing the above output:
#!/usr/bin/python3 | |
# crackme page: https://crackmes.one/crackme/5d7c66d833c5d46f00e2c45b | |
# download link: https://crackmes.one/static/crackme/5d7c66d833c5d46f00e2c45b.zip | |
# zip archive password: crackmes.one | |
import angr | |
import claripy | |
import subprocess | |
from datetime import datetime | |
def skip_banner(state): | |
pass | |
def emulate(): | |
proj = angr.Project("keygenme", | |
main_opts = {"base_addr": 0}, | |
auto_load_libs = False) | |
# skip code | |
proj.hook(0x00001166, skip_banner, length = 0x00001177 - 0x00001166) | |
# create symbolic variables to solve for | |
username = claripy.BVS("username", 10 * 8) | |
password_bytes = [claripy.BVS("byte_%d" % i, 8) for i in range(15)] | |
password_bytes_ast = claripy.Concat(*password_bytes) | |
password_file = angr.storage.file.SimFile("password_file", content = password_bytes_ast) | |
state = proj.factory.entry_state(args = [ "keygenme", username, "password_file"]) | |
state.fs.insert("password_file", password_file) | |
# add some constraints | |
for byte in password_bytes: | |
state.solver.add(byte >= 0x41) | |
state.solver.add(byte <= 0x5a) | |
# explore | |
sim_mgr = proj.factory.simulation_manager(state) | |
print("[ %s ] Exploration started..." % datetime.now().time()) | |
sim_mgr.explore(find = 0x00001560, # good password | |
avoid = [ 0x000012c2, # in main, argc != 3, exit | |
0x000012ff, # in main, bad username | |
0x00001b8a, # in 0x1a00, can't open file | |
0x00001b50, # in 0x1a00, empty file | |
0x00001b10, # in 0x1a00, wrong password length or invalid char | |
0x00001aef, # in 0x1a00, bad password | |
0x00001ba8, # in 0x1a00, __stack_check_fail | |
0x00001e45, # in 0x1bb0, avoid to reduce complexity | |
0x00001c85 ]) # in 0x1bb0, bad password | |
print("[ %s ] Finished..." % datetime.now().time()) | |
# compute 1 valid username | |
# for that username compute 1 valid password | |
if len(sim_mgr.found) > 0: | |
print("[ %s ] Computing valid username... " % datetime.now().time()) | |
found = sim_mgr.found[0] | |
valid_username = found.solver.eval(username, cast_to = bytes) # compute username | |
print("[ %s ] Username: %s" % (datetime.now().time(), valid_username.decode("ASCII"))) | |
print("[ %s ] Computing valid password... " % datetime.now().time()) | |
valid_password = found.solver.eval(password_bytes_ast, cast_to = bytes) # compute password | |
print("[ %s ] Password: %s" % (datetime.now().time(), valid_password.decode("ASCII"))) | |
print("[ %s ] Checking..." % datetime.now().time()) | |
with open("key.txt", "w") as f: | |
f.write(valid_password.decode("ASCII")) # write generated password to input file | |
_ = subprocess.run(["./keygenme", valid_username, "key.txt"], stdout = subprocess.PIPE) # execute crackme, discard output | |
with open("key.txt", "r") as f: # check message written to file | |
message = f.read() | |
print("[ %s ] %s: %s" % (datetime.now().time(), valid_password.decode("ASCII"), repr(message))) | |
else: | |
print("No solution found") | |
if __name__ == "__main__": | |
emulate() |
The purpose of this script is to demonstrate the viability of the approach taken here to solving the challenge and represents a realization of the entire sequence of steps required to automatically find solutions with angr.
An explanation of the various components of this script is given in the “Analysis” section. I will state here though that the most conceptually difficult part of the challenge was determining how to use angr to read a symbolic variable representing the password from inside a symbolic file. This is handled in lines 29 - 35 in the script. The trick is to pass the name of the symbolic file as an argument when creating the initial program state instead of the pathname of the real file, and then insert this symbolic file into the emulated filesystem prior to emulation. When angr emulates the program, the symbolic file will be opened instead of the real password file. The symbolic variable representing the password will then be read and eventually have solutions computed for it.
The file containing the password is called “key.txt”, as stated in the code comments. The G00d P422w0rd
and B4d P422w0RD
messages are written to this same file,
so this file is checked after the username and password are computed and the keygenme is run with these as inputs to verify that G00d P422w0rd
has been written.
On rare occasions, a solution will not be found.
If this is the case, the script can simply be run again; angr computes a different username and password each time.
The script can be modified to compute more than just one username and password if so desired. A password length of 15 characters was arbitrarily chosen - any length from 4 through 49 should work. Computing shorter passwords is faster and consumes less RAM.
Here is another example solution, in which 10 valid 8-character passwords are computed for a known good username:
$ ./find_valid_passwords.py
[ 11:37:54.515668 ] Exploration started...
WARNING | 2020-01-17 11:37:55,618 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.
WARNING | 2020-01-17 11:37:55,618 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2020-01-17 11:37:55,618 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial state
WARNING | 2020-01-17 11:37:55,618 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2020-01-17 11:37:55,618 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.
WARNING | 2020-01-17 11:37:55,618 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0000 with 204 unconstrained bytes referenced from 0x1000220 (fopen+0x0 in extern-address space (0x220))
WARNING | 2020-01-17 11:38:00,187 | angr.state_plugins.symbolic_memory | Filling memory at 0xc0001039 with 247 unconstrained bytes referenced from 0x1000218 (__printf_chk+0x0 in extern-address space (0x218))
[ 11:42:11.144212 ] Finished...
[ 11:42:11.144274 ] Computing valid passwords...
[ 11:43:11.670569 ] Finished. Checking passwords for username 24T5JFN9fU:
[ + ] AZWPEPEM: 'G00d P422w0rd\n\x00'
[ + ] AZWPEPMY: 'G00d P422w0rd\n\x00'
[ + ] AZWPEPMU: 'G00d P422w0rd\n\x00'
[ + ] AZWPEPEU: 'G00d P422w0rd\n\x00'
[ + ] AZWPEPEA: 'G00d P422w0rd\n\x00'
[ + ] CZCVWPCH: 'G00d P422w0rd\n\x00'
[ + ] AZWPEPMI: 'G00d P422w0rd\n\x00'
[ + ] EFGPALED: 'G00d P422w0rd\n\x00'
[ + ] OLCTKRUD: 'G00d P422w0rd\n\x00'
[ + ] AZWPEPEE: 'G00d P422w0rd\n\x00'
[ 11:43:12.312680 ] Finished.
The code that produced this output will be discussed below in the “Analysis” section.
The summary concludes here.
Analysis
Contents:
- Dealing with the corrupt ELF header
- Recovering the correct values
- Zeroing out the corrupted fields
- Analyzing the program with Cutter
- Using angr to automatically generate valid usernames based on the information in main()
- Using angr to automatically generate valid passwords for a given valid username
- Examples of the keygenme mishandling input
1) Dealing with the corrupted header
The binary is a Position Independent Executable (PIE). The crackme author has corrupted some of the fields having to do with sections:
$ readelf -h keygenme
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1320
Start of program headers: 64 (bytes into file)
Start of section headers: 65535 (bytes into file) <----------- e_shoff
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 11
Size of section headers: 64 (bytes)
Number of section headers: 65535 <----------- e_shnum
Section header string table index: 65535 <corrupt: out of range> <----------- e_shstrndx
readelf: Error: Reading 4194240 bytes extends past end of file for section headers
readelf: Error: Reading 14312 bytes extends past end of file for dynamic string table
The corruption seen here is identical to the corruption induced by Julien Voisin’s elfscrewer tool. This is significant because this technique does not involve stripping section information from the binary, meaning if the ELF header fields are changed back to the correct values, section information can be used again. Tools such as gdb rely on intact section information in order to parse and load a binary.
CLE, the loader used by angr, is not able to load the binary due the invalid values in these fields:
.
< backtrace snipped >
.
File "/usr/local/lib/python3.6/dist-packages/elftools/elf/elffile.py", line 81, in __init__
self._file_stringtable_section = self._get_file_stringtable()
File "/usr/local/lib/python3.6/dist-packages/elftools/elf/elffile.py", line 573, in _get_file_stringtable
header=self._get_section_header(stringtable_section_num),
File "/usr/local/lib/python3.6/dist-packages/elftools/elf/elffile.py", line 468, in _get_section_header
stream_pos=self._section_offset(n))
File "/usr/local/lib/python3.6/dist-packages/elftools/common/utils.py", line 42, in struct_parse
raise ELFParseError(str(e))
elftools.common.exceptions.ELFParseError: expected 4, found 0
There are 2 approaches that can be taken to addressing this:
- Recovering the correct values
- Zeroing out the corrupted fields
Recovering the correct values
The section header table is appended to the end of the file after the very last section, which is .shstrtab
- the section header string table:
00002fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00003000 00 00 00 00 00 00 00 00 08 40 00 00 00 00 00 00 |.........@......|
00003010 47 43 43 3a 20 28 55 62 75 6e 74 75 20 38 2e 33 |GCC: (Ubuntu 8.3|
00003020 2e 30 2d 36 75 62 75 6e 74 75 31 29 20 38 2e 33 |.0-6ubuntu1) 8.3| <-----\
00003030 2e 30 00 00 2e 73 68 73 74 72 74 61 62 00 2e 69 |.0...shstrtab..i| |
00003040 6e 74 65 72 70 00 2e 6e 6f 74 65 2e 67 6e 75 2e |nterp..note.gnu.| |
00003050 62 75 69 6c 64 2d 69 64 00 2e 6e 6f 74 65 2e 41 |build-id..note.A| |
00003060 42 49 2d 74 61 67 00 2e 67 6e 75 2e 68 61 73 68 |BI-tag..gnu.hash| |
00003070 00 2e 64 79 6e 73 79 6d 00 2e 64 79 6e 73 74 72 |..dynsym..dynstr| |
00003080 00 2e 67 6e 75 2e 76 65 72 73 69 6f 6e 00 2e 67 |..gnu.version..g| \
00003090 6e 75 2e 76 65 72 73 69 6f 6e 5f 72 00 2e 72 65 |nu.version_r..re| .shstrtab
000030a0 6c 61 2e 64 79 6e 00 2e 72 65 6c 61 2e 70 6c 74 |la.dyn..rela.plt| /
000030b0 00 2e 69 6e 69 74 00 2e 70 6c 74 2e 67 6f 74 00 |..init..plt.got.| |
000030c0 2e 74 65 78 74 00 2e 66 69 6e 69 00 2e 72 6f 64 |.text..fini..rod| |
000030d0 61 74 61 00 2e 65 68 5f 66 72 61 6d 65 5f 68 64 |ata..eh_frame_hd| |
000030e0 72 00 2e 65 68 5f 66 72 61 6d 65 00 2e 69 6e 69 |r..eh_frame..ini| |
000030f0 74 5f 61 72 72 61 79 00 2e 66 69 6e 69 5f 61 72 |t_array..fini_ar| |
00003100 72 61 79 00 2e 64 79 6e 61 6d 69 63 00 2e 64 61 |ray..dynamic..da| |
00003110 74 61 00 2e 62 73 73 00 2e 63 6f 6d 6d 65 6e 74 |ta..bss..comment| <----/
00003120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| <------------------- section header table
* starts at offset
00003160 00 00 00 00 00 00 00 00 0b 00 00 00 01 00 00 00 |................| <---------\ 0x00003128
00003170 02 00 00 00 00 00 00 00 a8 02 00 00 00 00 00 00 |................| \-------- the space between 0x0b
00003180 a8 02 00 00 00 00 00 00 1c 00 00 00 00 00 00 00 |................| and the beginning of the
00003190 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................| section header table is
000031a0 00 00 00 00 00 00 00 00 13 00 00 00 07 00 00 00 |................| 0x40 (64) bytes
000031b0 02 00 00 00 00 00 00 00 c4 02 00 00 00 00 00 00 |................|
A tool called lepton[3] can be used to repair the ELF header:
#!/usr/bin/python3 | |
from lepton import * | |
from struct import pack | |
def main(): | |
with open("keygenme", "rb") as f: | |
elf_file = ELFFile(f) | |
elf_file.ELF_header.fields["e_shoff"] = pack("Q", 0x00003128) | |
elf_file.ELF_header.fields["e_shnum"] = pack("H", 27) | |
elf_file.ELF_header.fields["e_shstrndx"] = pack("<H", 26) | |
# output to file | |
binary = elf_file.ELF_header.to_bytes() + elf_file.file_buffer[64:] | |
with open("fixed_keygenme", "wb") as f: | |
f.write(binary) | |
if __name__ == "__main__": | |
main() | |
ELF header after repair:
$ readelf -h fixed_keygenme
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1320
Start of program headers: 64 (bytes into file)
Start of section headers: 12584 (bytes into file) <----- e_shoff
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 11
Size of section headers: 64 (bytes)
Number of section headers: 27 <----- e_shnum
Section header string table index: 26 <----- e_shstrndx
Section information can now be displayed:
$ readelf -SW fixed_keygenme
There are 27 section headers, starting at offset 0x3128:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 00000000000002a8 0002a8 00001c 00 A 0 0 1
[ 2] .note.gnu.build-id NOTE 00000000000002c4 0002c4 000024 00 A 0 0 4
[ 3] .note.ABI-tag NOTE 00000000000002e8 0002e8 000020 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000308 000308 000030 00 A 5 0 8
[ 5] .dynsym DYNSYM 0000000000000338 000338 000240 18 A 6 1 8
[ 6] .dynstr STRTAB 0000000000000578 000578 00012d 00 A 0 0 1
[ 7] .gnu.version VERSYM 00000000000006a6 0006a6 000030 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 00000000000006d8 0006d8 000050 00 A 6 1 8
[ 9] .rela.dyn RELA 0000000000000728 000728 0000f0 18 A 5 0 8
[10] .rela.plt RELA 0000000000000818 000818 000180 18 AI 5 22 8
[11] .init PROGBITS 0000000000001000 001000 000017 00 AX 0 0 4
[12] .plt PROGBITS 0000000000001020 001020 000110 10 AX 0 0 16
[13] .plt.got PROGBITS 0000000000001130 001130 000008 08 AX 0 0 8
[14] .text PROGBITS 0000000000001140 001140 000db1 00 AX 0 0 16
[15] .fini PROGBITS 0000000000001ef4 001ef4 000009 00 AX 0 0 4
[16] .rodata PROGBITS 0000000000002000 002000 000310 00 A 0 0 16
[17] .eh_frame_hdr PROGBITS 0000000000002310 002310 00007c 00 A 0 0 4
[18] .eh_frame PROGBITS 0000000000002390 002390 000268 00 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000003d40 002d40 000008 08 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000003d48 002d48 000008 08 WA 0 0 8
[21] .dynamic DYNAMIC 0000000000003d50 002d50 0001f0 10 WA 6 0 8
[22] .got PROGBITS 0000000000003f40 002f40 0000c0 08 WA 0 0 8
[23] .data PROGBITS 0000000000004000 003000 000010 00 WA 0 0 8
[24] .bss NOBITS 0000000000004020 003010 000030 00 WA 0 0 32
[25] .comment PROGBITS 0000000000000000 003010 000023 01 MS 0 0 1
[26] .shstrtab STRTAB 0000000000000000 003033 0000ee 00 0 0 1
Since the section information is now present and correct, gdb can be used to debug the binary. This approach is not taken here, because relying on section information is for babies and using gdb to analyze this program is very inefficient compared with using Cutter, which is vastly more powerful and its integrated debugger does not need section information to load the binary.
Zeroing out the affected fields
angr can load the binary without any problems after the corrupted fields are zeroed out. Patching the header can be done
with lepton
, but this snippet also does the job:
with open("keygenme", "rb+") as f:
f.seek(0x28)
f.write(b'\x00\x00')
f.seek(0x3a)
f.write(b'\x00\x00\x00\x00\x00\x00')
After patch:
$ readelf -h keygenme
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1320
Start of program headers: 64 (bytes into file)
Start of section headers: 0 (bytes into file) <--------- e_shoff
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 11
Size of section headers: 0 (bytes)
Number of section headers: 0 <--------- e_shnum
Section header string table index: 0 <--------- e_shstrndx
readelf: Error: Reading 14312 bytes extends past end of file for dynamic string table <--- irrelevant
2. Analyzing the program with Cutter
Though the decompiler from Ghidra has been integrated with Cutter, it was not used in the analysis, since I did not find the decompiled code to be particularly helpful. Some decompiled code will be included here to show why this was the case. I found that reading control-flow graphs of disassembled functions was clearer and more suitable for the purpose of using angr to compute solutions.
main()
Here the bytes of the username are iterated over and checked (basic blocks 0x000011c7
- 0x00001255
).
- 10 bytes are checked in total, which tells us the expected length of the username.
- If the comparisons all succeed,
fcn.0x00001a00
is called at offset0x00001277
, which reads the content of the password file.
This is already enough information to use angr to compute valid usernames, since we now know the length of the username, as well as the path that is taken
through main
to fcn.00001a00
if the username is valid.
- target address:
0x00001264
- this offset is for an instruction that is executed after all the checks of the bytes in the username have succeeded
- avoid:
0x000012c2
- argc != 30x000012ff
- bad username
- skip:
- instructions in range
0x00001166
to0x00001177
. The code in this range calls a function which prints some statements by the crackme author. It has no bearing on the rest of the program, so it can be skipped to save time
- instructions in range
The comments in the script below explain the code.
#!/usr/bin/python3 | |
import angr | |
import claripy | |
from datetime import datetime | |
# called in the hook to skip instructions | |
def skip_banner(state): | |
pass | |
def find_valid_usernames(): | |
# since the crackme binary is a Position Independent Executable, setting the base address to 0 | |
# saves us the trouble of having to calculate offsets of any kind | |
# | |
proj = angr.Project("keygenme", main_opts = {"base_addr": 0}, auto_load_libs = False) | |
# skip code that has no bearing on the rest of the program | |
proj.hook(0x00001166, skip_banner, length = 0x00001177 - 0x00001166) | |
# symbolic variable (bitvector) compute values for username | |
# 10 characters in length * 8 bits per byte | |
# | |
# https://docs.angr.io/core-concepts/solver | |
# https://github.com/angr/angr-doc/blob/master/examples/whitehat_crypto400/solve.py | |
username = claripy.BVS("username", 10 * 8) | |
# the arg3 string is just a placeholder here, since exploration ends before the code | |
# opening the password file is reached | |
# | |
# the username is checked in the main() function, so the simulation manager does not | |
# have to explore very deep into the program. The target address is located in main(), | |
# after the checks of the bytes the username is composed of | |
# | |
state = proj.factory.entry_state(args = ["keygenme", username, "arg3"]) | |
sim_mgr = proj.factory.simulation_manager(state) | |
print("[ %s ] Exploration started..." % datetime.now().time()) | |
sim_mgr.explore(find = 0x00001264, avoid=[0x000012c2, 0x000012ff]) | |
print("[ %s ] Finished..." % datetime.now().time()) | |
# when a path to the target address is found, compute usernames | |
# using the constraint solver, then write them to a file | |
if len(sim_mgr.found) > 0: | |
print("[ %s ] Computing valid usernames... " % datetime.now().time()) | |
found = sim_mgr.found[0] | |
valid_usernames = found.solver.eval_upto(username, 100, cast_to = bytes) # <--------- | |
print("[ %s ] Complete. Writing to file..." % datetime.now().time()) | |
with open("100_usernames.txt", "w") as f: | |
for name in valid_usernames: | |
print(name.decode("ASCII")) | |
f.write(name.decode("ASCII") + "\n") | |
else: | |
print("No valid usernames found.") # should never happen | |
if __name__ == "__main__": | |
find_valid_usernames() |
Output:
$ ./find_valid_usernames.py
[ 14:05:21.280678 ] Exploration started...
[ 14:07:51.566523 ] Finished...
[ 14:07:51.566622 ] Computing valid usernames...
[ 14:07:59.849391 ] Complete. Writing to file...
AyAAS9rjDV
Z36HKLfBA3
Z36HiLfBA3
GhMmh2DnQq
GhMmh2DnQX
.
< output of 95 more usernames snipped>
.
fcn.00001a00
This function checks if the password file can be read, and if so reads the content of the file and performs checks of the length and byte values.
The string must be shorter than 50 characters (check performed at offset 0x00001a9b
) and must not contain special characters such as {}, [], (), -, etc.
Looking at the CFG of this function makes it straightforward to identify what branches in flow-of-control to avoid, which is very useful to know if the plan is to use angr to explore the binary via symbolic execution. In a nutshell, we want to avoid basic blocks which perform the following operations:
- write 0 to
[r12]
,[r12 + 0x18]
and[r12 + 8]
- call
exit
,fclose
,free
, or__stack_check_fail
If all is well, this function returns to the main function.
fcn.0001bb0
Some of the operations performed in here were rather hard to follow, especially the code in very large basic block at 0x00001ce8
.
The decompiled code for this function also was not particularly helpful:
void fcn.00001bb0(int64_t arg1, int64_t arg2)
{
uint8_t *puVar1;
uint8_t *puVar2;
uint8_t uVar3;
uint32_t uVar4;
uint64_t uVar5;
uint64_t uVar6;
uint64_t uVar7;
uint32_t uVar8;
uint8_t uVar9;
uint32_t uVar10;
uint8_t uVar11;
uint8_t uVar12;
uint8_t uVar13;
uint8_t uVar14;
uint32_t uVar15;
uint8_t uVar16;
int64_t iVar17;
uint8_t uVar18;
int64_t iVar19;
uint8_t uVar20;
uint64_t uVar21;
int64_t in_XMM0_Qa;
undefined8 in_XMM1_Qa;
undefined8 in_XMM2_Qa;
undefined8 in_XMM3_Qa;
undefined8 in_XMM4_Qa;
undefined8 in_XMM5_Qa;
undefined8 in_XMM6_Qa;
undefined8 in_XMM7_Qa;
uint64_t uStack104;
uVar21 = *(uint64_t *)(arg1 + 8);
uVar7 = *(uint64_t *)(arg2 + 8);
if (uVar21 == uVar7) {
puVar1 = *(uint8_t **)arg2;
puVar2 = *(uint8_t **)arg1;
if (uVar21 == 0) {
code_r0x00001e88:
// WARNING: Subroutine does not return
fcn.00001830(in_XMM0_Qa);
}
if (*puVar1 == *puVar2) {
uVar7 = 0;
do {
uVar7 = (uint64_t)((int32_t)uVar7 + 1);
if (uVar21 <= uVar7) {
if (uVar21 == uVar7) goto code_r0x00001e88;
break;
}
} while (puVar2[uVar7] == puVar1[uVar7]);
}
uStack104 = 0;
iVar17 = 0;
uVar21 = 1;
do {
uVar3 = puVar2[iVar17];
uVar11 = (*puVar1 ^ uVar3) & 0xf;
uVar12 = (puVar1[1] ^ uVar3 ^ 1) & 0xf;
uVar20 = (puVar1[2] ^ uVar3 ^ 2) & 0xf;
uVar18 = (puVar1[3] ^ uVar3 ^ 3) & 0xf;
uVar16 = (puVar1[4] ^ uVar3 ^ 4) & 0xf;
uVar14 = (puVar1[5] ^ uVar3 ^ 5) & 0xf;
uVar15 = (int32_t)(char)uVar3 + (uint32_t)iVar17 ^ (uint32_t)iVar17;
uVar13 = (puVar1[6] ^ uVar3 ^ 6) & 0xf;
uVar9 = (puVar1[7] ^ uVar3 ^ 7) & 0xf;
uVar3 = (uVar3 ^ puVar1[8] ^ 8) & 0xf;
iVar17 = iVar17 + 1;
uStack104 = uStack104 +
(int64_t)(char)uVar9 +
(int64_t)(char)uVar14 +
(int64_t)(char)uVar18 + (int64_t)(char)uVar20 + (int64_t)(char)uVar12 + (int64_t)(char)uVar11 +
(int64_t)(char)uVar16 + (int64_t)(char)uVar13 + (int64_t)(char)uVar3;
uVar21 = uVar21 * (uint64_t)((int32_t)(char)uVar12 | uVar15) * (uint64_t)((int32_t)(char)uVar20 | uVar15) *
(uint64_t)((int32_t)(char)uVar11 | uVar15) * (uint64_t)((int32_t)(char)uVar18 | uVar15) *
(uint64_t)((int32_t)(char)uVar16 | uVar15) * (uint64_t)((int32_t)(char)uVar14 | uVar15) *
(uint64_t)((int32_t)(char)uVar13 | uVar15) * (uint64_t)((int32_t)(char)uVar9 | uVar15) *
(uint64_t)((int32_t)(char)uVar3 | uVar15);
} while (iVar17 != 10);
if (((uVar21 % uStack104) * 5 & 0xf) == 0) {
code_r0x00001e3d:
// WARNING: Subroutine does not return
fcn.00001560(in_XMM0_Qa, in_XMM1_Qa, in_XMM2_Qa, in_XMM3_Qa, in_XMM4_Qa, in_XMM5_Qa, in_XMM6_Qa, in_XMM7_Qa
, (char **)arg2);
}
} else {
if (uVar7 != 0) {
uVar6 = 0;
iVar19 = 1;
iVar17 = 0;
do {
if (uVar21 != 0) {
uVar5 = 0;
uVar15 = (int32_t)*(char *)(*(int64_t *)arg2 + uVar6);
do {
uVar8 = (uint32_t)uVar5;
uVar4 = (int32_t)*(char *)(*(int64_t *)arg1 + uVar5) +
((int32_t)*(char *)(*(int64_t *)arg2 + uVar6) ^ uVar8) & 0xf;
iVar17 = iVar17 + (uint64_t)(uVar4 * 2);
uVar10 = uVar8 ^ (uint32_t)uVar6 | uVar15;
uVar15 = uVar15 + 1;
iVar19 = iVar19 * (uint64_t)(uVar4 | uVar10);
uVar5 = (uint64_t)(uVar8 + 1);
} while (uVar5 < uVar21);
}
uVar6 = (uint64_t)((uint32_t)uVar6 + 1);
} while (uVar6 < uVar7);
uVar21 = (uint64_t)(iVar19 - iVar17) % (iVar17 + iVar19);
if (((int32_t)(9 % (uint64_t)((uint32_t)uVar21 & 0xf)) == 0) && ((uVar21 & 0xf) != 0))
goto code_r0x00001e3d;
}
}
// WARNING: Subroutine does not return
fcn.000016d0(in_XMM0_Qa, in_XMM1_Qa, in_XMM2_Qa, in_XMM3_Qa, in_XMM4_Qa, in_XMM5_Qa, in_XMM6_Qa, in_XMM7_Qa,
(char **)arg2);
}
Fortunately, it is not necessary to understand everything that is happening here. In fact, there is no need to analyze this complicated code at all -
we only need to find the path to the code that prints G00d P422w0rd
.
What we do know:
- This function is called from
main
afterfcn.00001a00
returns. - As indicated in the decompilation, this function does not return. This is clear from not just from the CFG of
main
but also from its own CFG displayed above - Rather than return, 1 of 3 functions is called:
fcn.00001830
fcn.00001560
fcn.000016d0
- Looking at the CFG of the current function, we can see that
fcn.00001560
andfcn.000016d0
take a pointer to a file name as an argument butfcn.00001830
does not. This eliminatesfcn.00001830
from consideration as a function containing the code that writesG00d P422w0rd
to the password file.
fcn.000016d0
After looking at fcn.00001560
and fcn.000016d0
using Cutter, we see that fwrite
is called, as well as the string [+] Check your file
.
This indicates that one of these two functions must write G00d P422w0rd
to the password file, but we don’t yet know which one.
After we choose one of the username strings that was computed by the angr script earlier - Z36HiLfBA3
, for example - and create a file called
“key.txt” containing test input in the form of the string “AAAA” to use as our password file, we can fire up Cutter’s debugger.
When asked for command line arguments, we will enter Z36HiLfBA3
and key.txt
like so:
Then we will set 2 breakpoints: one where fwrite
is called in fcn.00001560
and one where fwrite
is called in fcn.000016d0
. A pointer to
the output that be written to the password file will be contained in the RDI
register once either breakpoint is hit:
The string pointed to in this case is B4d P422w0rd
; therefore, the target code printing G00d P422w0rd
lies in the other function, fcn.00001560
.
Now we have all the information we need in order to use angr to compute usernames and passwords in a single program, thereby automatically generating solutions
for us. Example code accomplishing this was provided in the “Summary” section above, in the autosolve_keygenme.py
script.
3. Using angr to generate valid passwords for a given valid username
Let’s see what some passwords for the username Z36HiLfBA3
look like:
$ ./find_valid_passwords.py
[ 18:31:11.987510 ] Exploration started...
WARNING | 2020-01-17 18:31:13,136 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.
WARNING | 2020-01-17 18:31:13,136 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2020-01-17 18:31:13,136 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial state
WARNING | 2020-01-17 18:31:13,136 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2020-01-17 18:31:13,136 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.
WARNING | 2020-01-17 18:31:13,137 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0000 with 204 unconstrained bytes referenced from 0x1000220 (fopen+0x0 in extern-address space (0x220))
WARNING | 2020-01-17 18:31:17,668 | angr.state_plugins.symbolic_memory | Filling memory at 0xc0001039 with 247 unconstrained bytes referenced from 0x1000218 (__printf_chk+0x0 in extern-address space (0x218))
[ 18:35:10.435142 ] Finished...
[ 18:35:10.435242 ] Computing valid passwords...
[ 18:36:06.310322 ] Finished. Checking passwords for username Z36HiLfBA3:
[ + ] QPUBKPAM: 'G00d P422w0rd\n\x00'
[ + ] KRWFOBEM: 'G00d P422w0rd\n\x00'
[ + ] KRWFOBEA: 'G00d P422w0rd\n\x00'
[ + ] YBYLYVWH: 'G00d P422w0rd\n\x00'
[ + ] KRWFOBEH: 'G00d P422w0rd\n\x00'
[ + ] KRWFOBEI: 'G00d P422w0rd\n\x00'
[ + ] APUBKPQE: 'G00d P422w0rd\n\x00'
[ + ] APUBKPAE: 'G00d P422w0rd\n\x00'
[ + ] QPUBKPAE: 'G00d P422w0rd\n\x00'
[ + ] KRWFOBEL: 'G00d P422w0rd\n\x00'
[ 18:36:07.466093 ] Finished.
We can manually verify that the results are correct:
Here is the code:
#!/usr/bin/python3 | |
import angr | |
import claripy | |
import subprocess | |
from datetime import datetime | |
# argument to hook | |
def skip_banner(state): | |
pass | |
# since the crackme binary is a Position Independent Executable, setting the base address to 0 | |
# saves us the trouble of having to calculate offsets of any kind | |
# | |
def find_valid_passwords(): | |
proj = angr.Project("keygenme", | |
main_opts = {"base_addr": 0}, | |
auto_load_libs = False) | |
# skip meaningless code | |
proj.hook(0x00001166, skip_banner, length = 0x00001177 - 0x00001166) | |
# generate valid passwords for a given known good username | |
#username = "6WhOlwIq7K" | |
#username = "24T5JFN9fU" | |
username = "Z36HiLfBA3" | |
# figuring out how to do this was the most difficult aspect of solving the challenge | |
# | |
# - create a symbolic variable representing the password, then create a SimFile containing this variable | |
# - load and read this SimFile during emulation instead of a real file | |
# | |
# A password length of 4 is quite short and up to 49 characters is acceptable; however, the longer the AST representing the password, | |
# the greater the RAM consumed by the solver when computing passwords | |
# | |
# the following examples were helpful for this task: | |
# | |
# https://docs.angr.io/advanced-topics/file_system - creating the symbolic variable representing the password | |
# https://gist.github.com/inaz2/c812671841f97804c24ba6650b1b2500 - handling command line arguments | |
# https://github.com/angr/angr-doc/blob/master/examples/asisctffinals2015_license/solve.py - creating and loading the SimFile | |
# | |
password_bytes = [claripy.BVS("byte_%d" % i, 8) for i in range(8)] | |
password_bytes_ast = claripy.Concat(*password_bytes) | |
password_file = angr.storage.file.SimFile("password_file", content = password_bytes_ast) | |
state = proj.factory.entry_state(args = [ "keygenme", username, "password_file"]) | |
state.fs.insert("password_file", password_file) | |
# constraints restrict possible characters to range A - Z. Done to reduce complexity | |
# https://docs.angr.io/core-concepts/solver | |
# | |
for byte in password_bytes: | |
state.solver.add(byte >= 0x41) | |
state.solver.add(byte <= 0x5a) | |
# the goal is to find a path to the function that writes the "good password" message | |
# and avoid all other possible paths | |
# | |
# https://docs.angr.io/core-concepts/pathgroups | |
# | |
sim_mgr = proj.factory.simulation_manager(state) | |
print("[ %s ] Exploration started..." % datetime.now().time()) | |
sim_mgr.explore(find = 0x00001560, # good password | |
avoid = [ 0x000012c2, # in main, argc != 3, exit | |
0x000012ff, # in main, bad username | |
0x00001b8a, # in 0x1a00, can't open file | |
0x00001b50, # in 0x1a00, empty file | |
0x00001b10, # in 0x1a00, wrong password length or invalid char | |
0x00001aef, # in 0x1a00, bad password | |
0x00001ba8, # in 0x1a00, __stack_check_fail | |
0x00001e45, # in 0x1bb0, avoid to reduce complexity | |
0x00001c85 ]) # in 0x1bb0, bad password | |
print("[ %s ] Finished..." % datetime.now().time()) | |
# if path to 0x00001560 is found, use solver to generate valid passwords | |
# https://docs.angr.io/core-concepts/solver | |
# | |
if len(sim_mgr.found) > 0: | |
print("[ %s ] Computing valid passwords... " % datetime.now().time()) | |
found = sim_mgr.found[0] | |
passwords = found.solver.eval_upto(password_bytes_ast, 10, cast_to = bytes) | |
# verify that the generated passwords are correct. | |
# | |
# https://github.com/angr/angr-doc/blob/764f9b37003052b449d255a6a880ae8111ebcd06/examples/whitehat_crypto400/solve.py#L87 | |
# was helpful for this | |
# | |
valid_passwords = [] | |
print("[ %s ] Finished. Checking passwords for username %s:" % (datetime.now().time(), username)) | |
for password in passwords: | |
with open("key.txt", "w") as f: f.write(password.decode("ASCII")) # write generated password to input file | |
_ = subprocess.run(["./keygenme", username, "key.txt"], stdout = subprocess.PIPE) # execute crackme, discard output | |
with open("key.txt", "r") as f: # check message written to file | |
message = f.read() | |
print("[ + ] %s: %s" % (password.decode("ASCII"), repr(message))) | |
if 'G00d' in message: | |
valid_passwords.append(password) | |
# if valid password found, write username and password(s) to file | |
if len(valid_passwords) > 0: | |
with open("valid_passwords.txt", "a") as f: | |
f.write("Username: %s\n" % username) | |
for password in valid_passwords: | |
f.write(password.decode("ASCII") + "\n") | |
print("[ %s ] Finished." % datetime.now().time()) | |
else: | |
print("No valid password found") # this should never happen | |
if __name__ == "__main__": | |
find_valid_passwords() |
4. Examples of the keygenme mishandling input
The program can crash due to a segmentation fault if the password is piped into the file with the echo
command. The username and password are the same as above:
Another example - instead of segfaulting, the program outputs B4d P422w0rd
first, then G00d P422w0rd
next:
Dealing with this problem before the cause was found was rather frustrating. This behavior was observed with many different correct username and password combinations.
Conclusion
angr and Cutter are outstanding tools for reverse engineering and binary analysis. angr in particular is incredibly powerful, allowing us to automatically solve this keygenme with just a few lines of code and approximately 5 minutes of running time, in spite of how complex some portions of the keygenme’s code was. We could essentially skip a great deal of the analysis that would have been required to solve the keygenme if we were to rely on a tool like gdb.