In this post, we announce the release of a small library for disassembling Dalvik bytecode. This serves as a foundation for building static analysis tooling for Android applications and system services in Rust. Read on for an example graphview application, or just check out the crate’s source and documentation to get started with your own tooling!
Background
Android uses a custom runtime called ART to execute user applications and background system services on a handset. The bytecode for this runtime is called Dalvik. The bytecode is bundled into one or more dex files per application. These files store constant data and class and type metadata, and they can reference methods defined in other dex files (a feature called multidex).
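To make the container format a little more concrete, here is a minimal sketch of reading the fixed header at the start of a dex file. It is an independent illustration, not the crate's API: the `DexHeader` struct and `parse_dex_header` function are hypothetical names, and the field offsets follow the published dex format (an 8-byte magic, a 4-byte checksum, a 20-byte signature, then the file size and header size as little-endian 32-bit values).

```rust
/// Minimal view of the fixed header at the start of a dex file.
/// (Hypothetical type for illustration; not the crate's API.)
#[derive(Debug, PartialEq)]
pub struct DexHeader {
    /// Format version digits taken from the magic, e.g. "035".
    pub version: String,
    /// Total file size in bytes, as recorded in the header.
    pub file_size: u32,
    /// Header size in bytes as recorded in the file.
    pub header_size: u32,
}

/// Reads a little-endian u32 at `off`, or returns None if out of bounds.
fn read_u32(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the magic and two size fields; returns None for non-dex input.
pub fn parse_dex_header(data: &[u8]) -> Option<DexHeader> {
    // The magic is "dex\n", three ASCII version digits, then a NUL byte.
    let magic = data.get(0..8)?;
    if &magic[0..4] != b"dex\n" || magic[7] != 0 {
        return None;
    }
    if !magic[4..7].iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let version = String::from_utf8(magic[4..7].to_vec()).ok()?;
    Some(DexHeader {
        version,
        // Offset 32 = magic (8) + checksum (4) + signature (20).
        file_size: read_u32(data, 32)?,
        header_size: read_u32(data, 36)?,
    })
}
```

Passing the bytes of a classes.dex file to `parse_dex_header` should yield the version and sizes, while other input, such as the APK's zip container itself, returns `None`.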
Reverse engineers use a multitude of tools to analyze Android code. Bytecode Viewer is a great choice for quickly switching between multiple high-quality Java/Dalvik decompilers. When one decompiler fails, another may succeed, or simply show different but semantically equivalent high-level code.
For example, this snippet of Dalvik bytecode (disassembled to Smali):
.method public toDumpFormat()Ljava/lang/String;
.registers 10
.line 100
new-instance v0, Ljava/text/SimpleDateFormat;
sget-object v1, Ljava/util/Locale;->ENGLISH:Ljava/util/Locale;
const-string v2, "MM/dd HH:mm:ss.SSS"
invoke-direct {v0, v2, v1}, Ljava/text/SimpleDateFormat;-><init>(Ljava/lang/String;Ljava/util/Locale;)V
decompiles to this Java code:
Locale locale = Locale.ENGLISH;
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("MM/dd HH:mm:ss.SSS", locale);
try {
return String.format(locale, "%s, %s, %s, %s, %d, %d, %d, %s", this.mType, this.mPackageName, simpleDateFormat.format(new Date(this.mStartTime)), this.mResultTime == 0 ? "-----------" : simpleDateFormat.format(new Date(this.mResultTime)), Long.valueOf(this.mLatency), Integer.valueOf(this.mExtra), Integer.valueOf(this.mBadQualityCount), this.mResult);
} catch (Exception e) {
Slog.w(SemBioLoggingManager.TAG, "toDumpFormat: " + e.getMessage());
return "formatting error";
}
The higher-level representation is extremely helpful when analyzing methods of any size. The larger the method, the more difficult and time-consuming it is to pick apart the disassembly directly.
So when decompilation failed on a fairly large method I needed to reverse, I knew I needed a new solution:
Nothing in Bytecode Viewer could decompile the method in question!
So I reached for another tool, Ghidra. Ghidra can also unpack APK files and decompile the dex files within. And the decompiler view produced readable code, even for the method that had failed previously.
Here’s what the example method looks like in Ghidra, with the Dalvik listing view on the left and the pseudo-code on the right:
But unfortunately it has a fatal flaw! When I click on an exception handling block in the listing view, I expect the cursor in the decompilation to jump to the respective high-level code in a try/catch block. This happens instead:
The related Dalvik bytecode in the handler does decompile; however, it is an Undefined Function, which means it was detached from the method that could jump to the catch block. Dalvik exception handling is essentially invisible in Ghidra's decompiler view. This may be a fundamental issue with Ghidra's pseudo code view, as the target high-level language is C, which does not have exception handling.
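One reason these edges are easy to lose is that Dalvik does not encode try/catch as branch instructions. Each method instead carries a side table of try ranges that point at handler lists. The sketch below illustrates the lookup a tool has to perform to recover the implicit edge. `TryItem` mirrors the three fields of the dex format's `try_item` entry, but the type and the `find_try` helper are hypothetical names for illustration, not taken from Ghidra or from the crate.

```rust
/// Mirrors the fields of a dex `try_item`. Addresses and counts are in
/// 16-bit code units. (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TryItem {
    /// Address of the first covered instruction.
    pub start_addr: u32,
    /// Number of code units covered by this entry.
    pub insn_count: u16,
    /// Offset of the associated handler list within the handler table.
    pub handler_off: u16,
}

impl TryItem {
    /// True if `addr` lies in [start_addr, start_addr + insn_count).
    pub fn covers(&self, addr: u32) -> bool {
        // Widen to u64 so the end address cannot overflow.
        let start = self.start_addr as u64;
        let end = start + self.insn_count as u64;
        let addr = addr as u64;
        addr >= start && addr < end
    }
}

/// Returns the try range covering `addr`, if any. An instruction that can
/// throw inside this range has an implicit edge to each of its handlers.
pub fn find_try(tries: &[TryItem], addr: u32) -> Option<&TryItem> {
    tries.iter().find(|t| t.covers(addr))
}
```

A control flow graph built only from branch targets never consults this table, which would leave handler blocks looking unreachable.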
Understanding control flow is crucial for security researchers, perhaps even more so in memory-safe languages like Java and Kotlin, where logic bugs may be the most common form of security flaw.
Given the shortcomings of the prior bytecode analysis tools, I needed a unique solution.
Building my own tool
I needed a higher-level representation of the bytecode of some sort. The top priority was to show faithful semantics (e.g. no hiding exception handling code). The other goal was to have an interface better than that of the raw Smali output of apktool d example.apk.
While “better” is entirely subjective, I especially needed good support for try/catch, because I could defer to Ghidra’s decompilation for the rest. So I decided to write my own Dalvik disassembler in Rust, with explicit support for control flow visualization of exception handling.
Early in the development of the disassembler, I closely matched the output of baksmali so I could easily diff against a known-good representation. This surfaced a few bugs, but in the end it didn’t help much with the secondary goal: the interface.
Exporting a directed graph with Graphviz was the obvious next step because we can visualize control flow without needing to decompile Dalvik to a Java-like language with if/else and try/catch. Dalvik can resolve a lot of high level information when paired with dex metadata, such as function arguments and string references, and I think it would be a great base to explore future decompilation ideas. But for now, this humble graph view helpfully stands in when other existing decompilers fail. It’s a decent middle ground between readability and reliability.
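As a rough illustration of the idea, the sketch below emits Graphviz DOT text for a list of basic blocks and draws exception edges differently from ordinary ones. It is not the graphview example from the repository: the `Block`, `EdgeKind`, and `to_dot` names are hypothetical, and the blocks are assumed to have been computed already by a disassembler.

```rust
use std::fmt::Write;

/// How control reaches a successor block (hypothetical classification).
#[derive(Clone, Copy, PartialEq)]
pub enum EdgeKind {
    /// Fall-through or an explicit branch.
    Normal,
    /// Transfer to a catch handler from inside a try range.
    Exception,
}

/// A basic block with its disassembly and outgoing edges.
pub struct Block {
    pub id: usize,
    /// Disassembled instructions, one per entry.
    pub insns: Vec<String>,
    /// Successor block ids with the kind of edge leading to each.
    pub succs: Vec<(usize, EdgeKind)>,
}

/// Escapes characters that are special inside a quoted DOT string.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Renders the blocks as a DOT digraph; catch edges are dashed and red.
pub fn to_dot(name: &str, blocks: &[Block]) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so unwrap is safe here.
    writeln!(out, "digraph \"{}\" {{", escape(name)).unwrap();
    writeln!(out, "  node [shape=box, fontname=\"monospace\"];").unwrap();
    for b in blocks {
        // "\l" ends each line and left-justifies it within the node.
        let label: String = b
            .insns
            .iter()
            .map(|i| format!("{}\\l", escape(i)))
            .collect();
        writeln!(out, "  b{} [label=\"{}\"];", b.id, label).unwrap();
        for &(to, kind) in &b.succs {
            let attrs = match kind {
                EdgeKind::Normal => "",
                EdgeKind::Exception => " [style=dashed, color=red, label=\"catch\"]",
            };
            writeln!(out, "  b{} -> b{}{};", b.id, to, attrs).unwrap();
        }
    }
    out.push_str("}\n");
    out
}
```

Writing the returned string to a file and running it through the Graphviz `dot` tool (for example `dot -Tsvg cfg.dot -o cfg.svg`) should produce a picture in which try/catch flow is visible alongside ordinary branches.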
If you also have stubborn methods that refuse to decompile, check out the graphview example in our GitHub repo, and if you wish to build other tools in Rust for Dalvik analysis, check out the crates.io page. We’re excited to see what you make!