A simple but often effective way to complicate or prevent analysis of an ELF binary by many common tools (gdb, readelf, pyelftools, etc.) is to mangle, damage or otherwise manipulate values in the ELF header so that the tool parses the header incorrectly, perhaps even failing or crashing. Common techniques include overlapping the ELF header with the program header table and writing non-standard values to ELF header fields that are not needed for composing the process image of the binary in memory. In addition to some programs designed for criminal purposes (e.g. the “mumblehard” family of malware), a few code-golf and proof-of-concept programs have been created that employ these techniques. Examples include Brian Raiter’s “teensy” files and @netspooky’s “golfclub” programs. This post demonstrates how emulation can be used to trace the execution of these types of binaries.
The following will be discussed:
- how header mangling works as an anti-analysis technique
- how to use the Unicorn Engine to analyze minimalist binaries with malformed headers
Tools used:
- Capstone Engine
- Unicorn Engine
- Python3
Malformed ELF Headers
This technique has already been covered in depth elsewhere[1][2][3][4][5][6], so the discussion here will be brief. The main reason mangling the ELF header complicates analysis is that the kernel reads only a specific subset of the ELF header fields when loading a program into memory, whereas most ELF parsers do not parse the header the way the kernel loader does and are thus prone to malfunction when they encounter unexpected or garbage values in the fields that are extraneous from the perspective of loading. The most typical examples are gdb, objdump and the rest of the libbfd-based binutils tools, which will not even read an object file unless its section information is present and intact.
The specially-crafted minimalist ELF programs - those that push the limits of how few bytes a file can consist of and still execute successfully - take advantage of the fact that not all ELF header fields are needed for loading and executing the program and can therefore be used to hold code or other non-standard values; as a case in point, the entry points of these programs often lie inside their ELF header. On the one hand, even though complicating analysis is not an explicit goal of their design, these programs highlight the limitations of many common tools designed to work with the ELF format.
On the other hand, since these minimalist binaries typically contain so little code, using fully-featured debuggers and other tools of this class for analysis would actually be overkill. One may have a good laugh about how NSA’s Ghidra cannot properly load their tiny ELF file, but attempting to use such a tool to analyze an extremely minimalist binary is akin to trying to shoot down a fruit fly with a railgun: heavyweight frameworks packaged with disassemblers plus debuggers and/or decompilers are unsuitable and unnecessary for analyzing the runtime behavior of executables literally 45 or 62 bytes in size. If there are 10 bytes of code in a program, does it make sense to try to load it into a decompiler? Probably not. A simple script emulating the execution of these programs may be a more appropriate approach.
muppetlabs’ tiny-i386: 45 bytes total, 7 bytes of code
This is the “Tiny” program from A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. This program, as well as the rest of the “Teensy” ELF files, can be downloaded from the muppetlabs site.
The approach taken here to analyzing this file is as follows:
- attempting analysis with standard tools
- looking at the source code
- emulation
Using Standard Tools to Parse the file
It should be noted at the outset that the binary can be loaded and executed without any problems:
$ strace ./tiny-i386
execve("./tiny-i386", ["./tiny-i386"], 0x7ffc2a421f60 /* 52 vars */) = 0
strace: [ Process PID=30049 runs in 32 bit mode. ]
exit(42) = ?
+++ exited with 42 +++
However, when readelf is used to try to read the program’s ELF header, it fails:
$ readelf -h tiny-i386
readelf: Error: tiny-i386: Failed to read file header
gdb likewise fails to recognize that it is indeed an ELF file:
$ gdb -q tiny-i386
GEF for linux ready, type `gef' to start, `gef config' to configure
80 commands loaded for GDB using Python engine 3.6
"home/reversing/tiny-i386": not in executable format: File format not recognized
gef➤ info file
gef➤ run
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
In a pleasant surprise, we are able to debug and disassemble the code using r2:
$ r2 -d tiny-i386
Process with PID 29855 started...
= attach 29855 29855
bin.baddr 0x00010000
Using 0x10000
Warning: Cannot initialize program headers
Warning: Cannot initialize section headers
Warning: Cannot initialize strings table
Warning: Cannot initialize dynamic strings
Warning: Cannot initialize dynamic section
Warning: read (init_offset)
asm.bits 32
[0x00010020]> ds
[0x00010020]> ds
[0x00010020]> ds
[0x00010020]> ds
child exited with status 42
==> Process finished
Stepping failed!
Step failed
[0x00010020]> pd 10
0x00010020 b32a mov bl, 0x2a ; ebx
0x00010022 31c0 xor eax, eax
0x00010024 40 inc eax
;-- eip:
0x00010025 cd80 int 0x80 <---------- execution ends here
0x00010027 003400 add byte [eax + eax], dh
0x0001002a 2000 and byte [eax], al
0x0001002c ~ 0100 add dword [eax], eax
;-- section_end.ehdr:
0x0001002d 0000 add byte [eax], al
0x0001002f 0000 add byte [eax], al
0x00010031 0000 add byte [eax], al
However, when radare2 is used to parse the binary, some of the field values look strange:
$ r2 -nn tiny-i386
[0x00000000]> pf.elf_header @ elf_header
ident : 0x00000000 = .ELF.
type : 0x00000010 = type (enum elf_type) = 0x2 ; ET_EXEC
machine : 0x00000012 = machine (enum elf_machine) = 0x3 ; EM_386
version : 0x00000014 = 0x00010020
entry : 0x00000018 = 0x00010020
phoff : 0x0000001c = 0x00000004
shoff : 0x00000020 = 0xc0312ab3
flags : 0x00000024 = 0x0080cd40
ehsize : 0x00000028 = 0x0034
phentsize : 0x0000002a = 0x0020
phnum : 0x0000002c = 0xff01
shentsize : 0x0000002e = 0xffff
shnum : 0x00000030 = 0xffff
shstrndx : 0x00000032 = 0xffff
There do seem to be quite a few odd-looking values mixed together with ones that appear similar to what we are accustomed to seeing. What is happening here? Examining the source code will help explain some of this output.
A Look at the Source Code
; tiny.asm
org 0x00010000
db 0x7F, "ELF" ; e_ident
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dw 2 ; e_type ; p_paddr
dw 3 ; e_machine
dd _start ; e_version ; p_filesz
dd _start ; e_entry ; p_memsz
dd 4 ; e_phoff ; p_flags
mov bl, 42 ; e_shoff ; p_align
xor eax, eax
inc eax ; e_flags
int 0x80
db 0
dw 0x34 ; e_ehsize
dw 0x20 ; e_phentsize
db 1 ; e_phnum
; e_shentsize
; e_shnum
; e_shstrndx
filesize equ $ - $$
A few observations:
- the program header overlaps with the ELF header
- the entry point is inside the ELF header
- The implication is that there is executable code inside the header
- the fields having to do with sections are empty
- it is actually more precise to say that since the file is 45 bytes in size but the ELF header of a 32-bit binary should be 52 bytes in total, those fields are simply not there.
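The overlap described in these observations can be checked with a few lines of Python. This is only a minimal sketch and not part of the original analysis: the 45 bytes are copied from the hex dump of tiny-i386 shown later in this post, and the helper name parse_phdr32 is made up for illustration. It decodes the 32-byte Elf32_Phdr that begins at file offset 4 (the value stored in e_phoff), i.e. on top of the ELF header itself.

```python
import struct

# The 45 bytes of tiny-i386, copied from the hex dump in this post.
TINY = bytes.fromhex(
    "7f454c46010000000000000000000100"
    "02000300200001002000010004000000"
    "b32a31c040cd80003400200001"
)

PHDR32 = struct.Struct("<8I")  # Elf32_Phdr: eight little-endian 32-bit words
PHDR_FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
               "p_filesz", "p_memsz", "p_flags", "p_align")


def parse_phdr32(image: bytes, offset: int) -> dict:
    """Decode one Elf32_Phdr located at `offset` (hypothetical helper)."""
    return dict(zip(PHDR_FIELDS, PHDR32.unpack_from(image, offset)))


# e_phoff lives at offset 0x1c of the ELF header; here it points into e_ident.
e_phoff = struct.unpack_from("<I", TINY, 0x1C)[0]
phdr = parse_phdr32(TINY, e_phoff)

for name, value in phdr.items():
    print(f"{name:9s} = {value:#010x}")
```

Read this way, p_paddr is simply e_type and e_machine taken as one 32-bit word, and p_align is the first four bytes of the program's code.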
As it turns out, the subset of fields that must contain correct values in order to be loaded by the kernel consists of the following:
- The first 4 bytes of e_ident, which hold the magic number:
- EI_MAG0 - EI_MAG3: 0x7f, 'E', 'L', 'F'
- e_type
- e_machine
- e_entry
- e_phoff
- e_phnum
Summary from “A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux” (bolding added):
So: Here’s what is and isn’t essential in the ELF header. The first four bytes have to contain the magic number, or else Linux won’t touch it. The other three bytes in the e_ident field are not checked, however, which means we have no less than twelve contiguous bytes we can set to anything at all. e_type has to be set to 2, to indicate an executable, and e_machine has to be 3, as just noted. e_version is, like the version number inside e_ident, completely ignored. (Which is sort of understandable, seeing as currently there’s only one version of the ELF standard.) e_entry naturally has to be valid, since it points to the start of the program. And clearly, e_phoff needs to contain the correct offset of the program header table in the file, and e_phnum needs to contain the right number of entries in said table. e_flags, however, is documented as being currently unused for Intel, so it should be free for us to reuse. e_ehsize is supposed to be used to verify that the ELF header has the expected size, but Linux pays it no mind. e_phentsize is likewise for validating the size of the program header table entries. This one was unchecked in older kernels, but now it needs to be set correctly. Everything else in the ELF header is about the section header table, which doesn’t come into play with executable files.
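To make the summary concrete, here is a hedged Python sketch that reads only the fields the loader is said to care about. It is not taken from the original post: the bytes come from the hex dump of tiny-i386 shown later, the function name parse_ehdr32_lenient is hypothetical, and zero-filling the missing tail of the header is an assumption about how a lenient reader would treat a 45-byte file.

```python
import struct

# The 45 bytes of tiny-i386, copied from the hex dump in this post.
TINY = bytes.fromhex(
    "7f454c46010000000000000000000100"
    "02000300200001002000010004000000"
    "b32a31c040cd80003400200001"
)

EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
EHDR_FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
               "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
               "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")
# Fields from the list above, plus e_phentsize as mentioned in the quote.
LOADER_FIELDS = ("e_type", "e_machine", "e_entry",
                 "e_phoff", "e_phentsize", "e_phnum")


def parse_ehdr32_lenient(image: bytes) -> dict:
    """Zero-fill a short file up to 52 bytes, then decode (assumed behaviour)."""
    padded = image[:EHDR32.size].ljust(EHDR32.size, b"\x00")
    return dict(zip(EHDR_FIELDS, EHDR32.unpack(padded)))


ehdr = parse_ehdr32_lenient(TINY)
has_magic = ehdr["e_ident"][:4] == b"\x7fELF"
print("magic ok:", has_magic)
for name in LOADER_FIELDS:
    print(f"{name:11s} = {ehdr[name]:#x}")
```

Under the zero-fill assumption e_phnum reads as 1. The radare2 dump above reports 0xff01 for the same field, presumably because it fills the bytes past the end of the file with 0xff instead.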
Given that the program contains only 7 bytes of instructions and has a malformed header, emulation is a good alternative to heavyweight tools like radare2, IDA, Ghidra, etc. for analyzing, tracing and logging the runtime behavior of this kind of program. The program’s code can be emulated via a small Python script that utilizes the Unicorn Engine (at time of writing, the Qiling emulation framework is still in alpha and the code is not available). For our purposes right now, it does not matter that the ELF header is malformed, as the only information needed to emulate the binary is its architecture and the file offsets at which to begin and end emulation; this information can be retrieved from a hex dump of the binary without needing to parse the ELF header.
The approach to emulating the tiny-i386 binary is as follows:
First, retrieve the start and end points for emulation from a hex dump. Then, when writing the script to emulate the program:
- read the file and map it to memory
- set up the stack
- initialize the emulation engine
- implement a hook that allows each executed instruction to be traced and logged to STDOUT
- a Capstone disassembly engine object will be passed to this hook so that each instruction can be disassembled and its disassembly logged as well
- implement a hook that handles system calls
We know from the source code that the first instruction is mov bl, 42 and the final instruction is int 0x80. We can find these easily in a dump of tiny-i386:
$ hexdump -C tiny-i386
00000000 7f 45 4c 46 01 00 00 00 00 00 00 00 00 00 01 00 |.ELF............|
00000010 02 00 03 00 20 00 01 00 20 00 01 00 04 00 00 00 |.... ... .......|
00000020 b3 2a 31 c0 40 cd 80 00 34 00 20 00 01 |.*1.@...4. ..|
Narrowing down the output:
$ hexdump -C -s 0x20 -n 7 tiny-i386
00000020 b3 2a 31 c0 40 cd 80 |.*1.@..|
There we have it: emulation begins at offset 0x20 and ends at offset 0x27, the first byte past the int 0x80 instruction. Since the architecture is already known to us, this is all the information required to emulate the program.
When the emulation script is executed, we get a trace plus disassembly:
$ ./emulate_tiny-i386.py
>>> Tracing instruction at 0x100020, instruction size = 0x2, disassembly: mov bl, 0x2a
>>> Tracing instruction at 0x100022, instruction size = 0x2, disassembly: xor eax, eax
>>> Tracing instruction at 0x100024, instruction size = 0x1, disassembly: inc eax
>>> Tracing instruction at 0x100025, instruction size = 0x2, disassembly: int 0x80
>>> 0x100025: INTERRUPT: 0x80, EAX = 0x1
>>> Emulation Complete.
Looks good. No debugger needed.
netspooky’s bye: 84 bytes total, 23 bytes of code
An advantage of emulation over debugging is that the emulated instructions (should) have no effect on the host system. Even if the program being emulated contains code that could potentially damage the system it runs on, its instructions are not executed directly by the host CPU, so emulation poses much less risk than debugging (unless there is some way to escape from the emulator, e.g. a QEMU VM escape). This is useful for analyzing viruses, crimeware, etc. and, in this particular case, @netspooky’s bye binary, which executes the reboot syscall:
On a desktop system, this binary will shut down your computer abruptly. There are some potential side effects from a shutdown like this, but personally I haven’t experienced any issues with it. However, on a VPS, this specific syscall proves to be a bit of a problem. Since the virtual machine doesn’t actually have any of it’s own physical hardware (it’s either virtualized or shared with the host), the power button on a VPS isn’t really a thing. By executing a syscall the effectively “shuts off the power” to the operating system, this puts the VM in an unknown state. So far, whenever this is run on a VPS, it seemingly wipes out the entire instance.
Obviously it is advantageous to be able to analyze the runtime behavior of such a program without having to actually load it into memory and execute it, since we do not want our machine to be shut down, least of all in a way that may damage the system.
The script used to analyze tiny-i386 can be modified to support emulation of x86-64 code and of the reboot syscall. The same approach will be followed as before, with minor adjustments.
Before we begin, however, we can first try to read the file’s ELF header with readelf, take a look at the source code, and then disassemble its code with Capstone to get a sense of what to expect from emulation.
Parsing the Header with readelf
$ readelf -h bye
ELF Header:
Magic: 7f 45 4c 46 ba dc fe 21 43 be 69 19 12 28 eb 3c
Class: <unknown: ba>
Data: <unknown: dc>
Version: 254 <unknown: %lx>
OS/ABI: <unknown: 21>
ABI Version: 67
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4
Start of program headers: 1 (bytes into file)
Start of section headers: 28 (bytes into file)
Flags: 0x0
Size of this header: 0 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 0 (bytes)
Number of section headers: 1
Section header string table index: 0
readelf: Warning: possibly corrupt ELF header - it has a non-zero program header offset, but no program headers
The ELF header is clearly malformed. At least we can see the entry point is at offset 0x4.
The Source Code
The bytes that the instructions are composed of are not contiguous: rather than forming a single stream of bytes, some code resides at the beginning and some at the end of the ELF header, with data and 00 bytes in between. In addition to accounting for the output produced by readelf above, this may pose a challenge for correct disassembly.
Disassembly with Capstone and Radare2
According to the comments in the source code of the file, the last instruction is located at the last byte of the file - offset 0x53. Using this information, a simple script to disassemble the code with Capstone can be written.
Running it produces the following disassembly:
$ ./disassemble_bye.py
0x1000: mov edx, 0x4321fedc
0x1005: mov esi, 0x28121969
0x100a: jmp 0x1048
0x100c: add al, byte ptr [rax]
0x100e: add byte ptr ds:[rcx], al
0x1011: add byte ptr [rax], al
0x1013: add byte ptr [rax + rax], al
0x1016: add byte ptr [rax], al
0x1018: add dword ptr [rax], eax
0x101a: add byte ptr [rax], al
0x101c: sbb al, 0
0x101e: add byte ptr [rax], al
0x1020: add byte ptr [rax], al
0x1022: add byte ptr [rax], al
0x1024: add byte ptr [rax], al
0x1026: add byte ptr [rax], al
0x1028: add byte ptr [rax], al
0x102a: add byte ptr [rax], al
0x102c: add dword ptr [rax], eax
0x102e: add byte ptr [rax], al
0x1030: add byte ptr [rax], dil
0x1033: add byte ptr [rcx], al
0x1035: add byte ptr [rdx], al
0x1037: add byte ptr [rax + 0x50fa9], dh
0x103d: add byte ptr [rax], al
0x103f: add byte ptr [rax + 0x50fa9], dh
0x1045: add byte ptr [rax], al
0x1047: add byte ptr [rdi - 0x11e2153], bh
0x104d: jmp 0x1038
0x104f: nop
This is clearly incorrect. What happened?
As it turns out, Capstone is a linear sweep-based disassembler (as opposed to recursive traversal-based, like radare2)[7][8]. This means that, beginning at the start address, it disassembles all bytes as code until the end address, ignoring control flow. In the disassembly above, quite a few null bytes and data bytes are being decoded as instructions. We can compensate for this manually to some extent by ignoring the bytes between the jmp at offset 0xa and the cya at offset 0x3c (see the source code, lines 11 and 27 in particular):
The disassembly produced after these adjustments is less egregiously erroneous (but still not quite correct):
$ ./disassemble_bye_2.py
0x1000: mov edx, 0x4321fedc
0x1005: mov esi, 0x28121969
0x100a: jmp 0x1048 <------- jumps beyond the end of the buffer
0x100c: mov al, 0xa9
0x100e: syscall
0x1010: add byte ptr [rax], al <------- error
0x1012: add byte ptr [rax], al <------- error
0x1014: mov al, 0xa9
0x1016: syscall
0x1018: add byte ptr [rax], al <------- error
0x101a: add byte ptr [rax], al <------- error
0x101c: mov edi, 0xfee1dead
0x1021: jmp 0x100c
0x1023: nop
At least it somewhat resembles the source code.
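The difference between linear sweep and recursive traversal mentioned above can be illustrated with a toy decoder. The sketch below is neither Capstone nor radare2: it is a hand-written decoder that models only the four opcodes appearing in the listings in this post, and it runs on a stand-in for the 84-byte file in which only the instruction bytes visible in those listings (plus the magic number) are filled in and every other byte is left as zero. The names decode and recursive_traversal are made up for the example.

```python
# Stand-in for the bye file: instruction bytes taken from the listings in
# this post; all remaining bytes are zero, which is NOT true of the real file.
IMAGE = bytearray(84)
IMAGE[0x00:0x04] = b"\x7fELF"
IMAGE[0x04:0x10] = bytes.fromhex("badcfe2143" "be69191228" "eb3c")
IMAGE[0x3C:0x40] = bytes.fromhex("b0a9" "0f05")   # first mov al / syscall
IMAGE[0x44:0x48] = bytes.fromhex("b0a9" "0f05")   # second mov al / syscall
IMAGE[0x4C:0x53] = bytes.fromhex("bfaddee1fe" "ebe9")

REG32 = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def decode(image, off):
    """Return (length, text, jump_target), or None for unmodelled opcodes."""
    op = image[off]
    if 0xB8 <= op <= 0xBF:                        # mov r32, imm32
        imm = int.from_bytes(image[off + 1:off + 5], "little")
        return 5, f"mov {REG32[op - 0xB8]}, {imm:#x}", None
    if op == 0xB0:                                # mov al, imm8
        return 2, f"mov al, {image[off + 1]:#x}", None
    if op == 0xEB:                                # jmp rel8
        rel = int.from_bytes(image[off + 1:off + 2], "little", signed=True)
        return 2, f"jmp {off + 2 + rel:#x}", off + 2 + rel
    if image[off:off + 2] == b"\x0f\x05":         # syscall
        return 2, "syscall", None
    return None


def recursive_traversal(image, entry):
    """Decode from `entry`, following jumps instead of sweeping linearly."""
    listing, seen, work = [], set(), [entry]
    while work:
        off = work.pop()
        while off not in seen and off < len(image):
            insn = decode(image, off)
            if insn is None:          # toy decoder gives up on this path
                break
            length, text, target = insn
            seen.add(off)
            listing.append((off, text))
            if target is not None:    # unconditional jump: no fall-through
                work.append(target)
                break
            off += length
    return listing


listing = recursive_traversal(IMAGE, 0x04)
for off, text in listing:
    print(f"{off:#04x}: {text}")
```

Because each jmp is followed rather than stepped over, the header bytes between the code fragments are never decoded, and the second mov al / syscall pair at 0x44 is never reached. A real recursive-traversal disassembler would keep decoding after the syscall; this toy version stops there only because it does not model the bytes that follow.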
How does radare2 fare in disassembling this binary? Not well at all. In fact, it completely fails (maybe I am not using the correct flags?):
$ r2 bye
Warning: Cannot initialize program headers
Warning: Cannot initialize dynamic strings
Warning: Cannot initialize dynamic section
[0x00000004]> pd
;-- entry0:
;-- eip:
0x00000004 ff invalid
0x00000005 ff invalid
0x00000006 ff invalid
0x00000007 ff invalid
0x00000008 ff invalid
0x00000009 ff invalid
0x0000000a ff invalid
0x0000000b ff invalid
0x0000000c ff invalid
0x00000031 ff invalid
0x00000032 ff invalid
0x00000033 ff invalid
;-- section_end.ehdr:
0x00000034 ff invalid
0x00000035 ff invalid
0x00000036 ff invalid
0x00000040 ff invalid
0x00000041 ff invalid
0x00000042 ff invalid
0x00000043 ff invalid
Looks like disassembly is not particularly helpful here.
Emulation seems to be the most reasonable option. The script responsible for emulating bye includes code for handling x86-64 syscalls, allowing us to see the arguments in the registers when the syscall is made.
Emulated execution trace:
$ ./emulate_bye.py
>>> Tracing instruction at 0x100004, instruction size = 0x5, disassembly: mov edx, 0x4321fedc
>>> Tracing instruction at 0x100009, instruction size = 0x5, disassembly: mov esi, 0x28121969
>>> Tracing instruction at 0x10000e, instruction size = 0x2, disassembly: jmp 0x10004c
>>> Tracing instruction at 0x10004c, instruction size = 0x5, disassembly: mov edi, 0xfee1dead
>>> Tracing instruction at 0x100051, instruction size = 0x2, disassembly: jmp 0x10003c
>>> Tracing instruction at 0x10003c, instruction size = 0x2, disassembly: mov al, 0xa9
>>> Tracing instruction at 0x10003e, instruction size = 0x2, disassembly: syscall
>>> got SYSCALL with RAX = 169
>>> SYSCALL: reboot
>>> ARGUMENTS: RDI = 0xfee1dead RSI = 0x28121969 RDX = 0x4321fedc
>>> Emulation Complete.
Very nice. Not only do we see the runtime behavior of the program without executing it, but we get essentially correct disassembly as well.
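The three register values in the trace are not arbitrary. A small lookup, sketched below, shows how they map onto the constants of the Linux reboot interface; the constant values are those defined in linux/reboot.h, while the function name describe_reboot and the choice to list only three commands are assumptions made for this example.

```python
# Constants as defined in <linux/reboot.h>.
LINUX_REBOOT_MAGIC1 = 0xFEE1DEAD
LINUX_REBOOT_MAGIC2 = 0x28121969
REBOOT_COMMANDS = {
    0x01234567: "LINUX_REBOOT_CMD_RESTART",
    0xCDEF0123: "LINUX_REBOOT_CMD_HALT",
    0x4321FEDC: "LINUX_REBOOT_CMD_POWER_OFF",
}


def describe_reboot(rdi: int, rsi: int, rdx: int) -> str:
    """Name the reboot command encoded in the syscall arguments."""
    # The kernel also accepts alternate MAGIC2 values that are not listed here.
    if rdi != LINUX_REBOOT_MAGIC1 or rsi != LINUX_REBOOT_MAGIC2:
        return "reboot: magic values do not match MAGIC1/MAGIC2"
    command = REBOOT_COMMANDS.get(rdx, f"unknown command {rdx:#x}")
    return f"reboot: {command}"


# Arguments copied from the emulated trace above.
summary = describe_reboot(0xFEE1DEAD, 0x28121969, 0x4321FEDC)
print(summary)
```

In other words, the emulated call asks the kernel to power the machine off, which matches the behavior described for this binary.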
According to the source code and the attempt at disassembly using Capstone, the reboot syscall is made twice, but obviously only the first one would ever be executed, meaning the instructions following the first reboot syscall are unreachable. Perhaps emulation is also useful for analysing obfuscated assembly code? ;)
As we can see, emulation via Unicorn is a very powerful method for analyzing programs that can’t be properly parsed or disassembled with the usual tools. However, the difficulty of writing the program that performs the emulation scales with the complexity of the program being emulated. An example of this is the need to manually implement support for interrupts and syscalls. In the next post, somewhat larger programs with a greater range of functionality will be analyzed. Up to this point the start and end addresses of emulation have been manually retrieved from a dump of the target binary; a method of robustly parsing malformed ELF headers will also be explored so that the code start and end offsets can be retrieved in an automated fashion.
Links and References
- [1] ELF Crafting: Advanced Anti-analysis techniques for the Linux Platform
- [2] Striking Back GDB and IDA debuggers through malformed ELF executables
- [3] Screwing elf header for fun and profit
- [4] Modern Linux Malware Exposed
- [5] Understanding Linux Malware
- [6] Linux process execution and the useless ELF header fields
- [7] Disassembly of Executable Code Revisited - discusses linear sweep and recursive traversal disassembly algorithms
- [8] On Disassembling Obfuscated Assembly
Muppetlabs’ Tiny Binaries:
netspooky’s Experiments:
- source code of “golfclub” binaries on github
- ELF Binary Mangling Part 1 — Concepts
- Elf Binary Mangling Pt. 2: Golfin’
- Elf Binary Mangling Part 3 — Weaponization
Unicorn Engine materials: