By Sean Deaton in binaryninja — Dec 27, 2024

Trying Out Binary Ninja's new WARP Signatures with IPSW Diff'ing

Binary diff'ing is pretty complex, but being able to apply markup from one binary to another is quite powerful. Binary Ninja's new WARP extends previous efforts, which used SigKit, to quickly identify library functions.

Applying Binary Ninja markup from one binary to another.

With the 4.2 release of Binary Ninja (Frogstar), the Vector35 team released the alpha version of WARP. WARP is an open source, first-party plugin that generates signatures for disassembled functions, making it easy to transfer information to other tools or to different versions of the same binary. For those familiar with IDA F.L.I.R.T. signatures, the concept is similar.

Comparison to IDA F.L.I.R.T.

The documentation for IDA's F.L.I.R.T. excellently summarizes the need for tools like this. If you compile a simple C or C++ Hello World executable, you may get 58 library functions needed to support one single user-generated function. Does a reverse engineer really want to RE those library functions? No. They rarely change and they don't add any new information; we're not fishing for bugs in puts.

IDA's approach with F.L.I.R.T. creates a database of all functions from recognized libraries (think libc, libxml, etc.) and checks each byte of the program, at disassembly time, to see if that byte marks the start of a known standard library function.

Each function is represented as a hexadecimal pattern made up of the first 32 bytes of the function. To reduce false negatives, the algorithm also takes into account variant bytes. These are sequences of instructions that effectively perform the same actions but with different instructions. Consider the following, which effectively do the same thing.

    call    mylibfunc
    ret
or
    jmp     mylibfunc

These instructions are represented as wildcards in the byte pattern.
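To make the wildcard idea concrete, here is a minimal sketch of matching a byte sequence against a pattern in which `..` stands for "any byte". It is not IDA's implementation; the function names are made up for illustration, and the pattern is a shortened form of the `_setdta` signature that appears in this post.

```python
from typing import List, Optional


def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn a hex pattern such as '558BEC..C3' into byte values; '..' becomes None."""
    if len(pattern) % 2:
        raise ValueError("pattern must have an even number of characters")
    pairs = [pattern[i:i + 2] for i in range(0, len(pattern), 2)]
    return [None if pair == ".." else int(pair, 16) for pair in pairs]


def matches(pattern: str, code: bytes) -> bool:
    """Return True if code starts with the pattern; wildcard positions accept any byte."""
    parsed = parse_pattern(pattern)
    if len(code) < len(parsed):
        return False
    return all(want is None or want == got for want, got in zip(parsed, code))


# Shortened _setdta pattern: 14 fixed bytes followed by two wildcard bytes.
SETDTA = "558BEC1EB41AC55604CD211F5DC3...."

print(matches(SETDTA, bytes.fromhex("558BEC1EB41AC55604CD211F5DC39090")))  # True
print(matches(SETDTA, bytes.fromhex("558BEC1EB42FCD210653B41A8B5606CD")))  # False
```

The second call fails at the sixth byte (`2F` instead of `1A`), while the trailing `9090` in the first call is accepted because those positions are wildcards.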

They take this a step further and reduce disk space (and in-memory size) required to store these patterns by correctly identifying that many library functions start with the exact same instructions. This lends itself well to a tree structure, where, as an example, these four signatures can be represented as the following tree:

558BEC0EFF7604..........59595DC3558BEC0EFF7604..........59595DC3 _registerbgidriver
558BEC1E078A66048A460E8B5E108B4E0AD1E9D1E980E1C0024E0C8A6E0A8A76 _biosdisk
558BEC1EB41AC55604CD211F5DC3.................................... _setdta
558BEC1EB42FCD210653B41A8B5606CD21B44E8B4E088B5604CD219C5993B41A _findfirst

558BEC
      0EFF7604..........59595DC3558BEC0EFF7604..........59595DC3 _registerbgidriver
      1E
        078A66048A460E8B5E108B4E0AD1E9D1E980E1C0024E0C8A6E0A8A76 _biosdisk
        B4
          1AC55604CD211F5DC3                                       _setdta
          2FCD210653B41A8B5606CD21B44E8B4E088B5604CD219C5993B41A   _findfirst

This also speeds up matching, which is no longer linear in the number of functions in the executable, but logarithmic.
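The tree can be sketched as a nested dictionary keyed by one byte (two hex characters) per level. This is only an illustration of the shared-prefix idea using the four signatures above; IDA's real on-disk format and lookup code are not shown in this post, and `build_tree` and `lookup` are hypothetical names.

```python
def build_tree(signatures):
    """Build a nested-dict prefix tree keyed by two-character hex tokens ('..' is a wildcard)."""
    root = {}
    for pattern, name in signatures:
        node = root
        for i in range(0, len(pattern), 2):
            node = node.setdefault(pattern[i:i + 2], {})
        # The five-character key "names" cannot collide with a two-character token.
        node.setdefault("names", []).append(name)
    return root


def lookup(tree, code):
    """Walk the tree along code, following exact-byte and wildcard edges; return matched names."""
    results = []
    stack = [(tree, 0)]
    while stack:
        node, offset = stack.pop()
        results.extend(node.get("names", []))
        if offset >= len(code):
            continue
        for token in (f"{code[offset]:02X}", ".."):
            child = node.get(token)
            if child is not None:
                stack.append((child, offset + 1))
    return results


FINDFIRST = "558BEC1EB42FCD210653B41A8B5606CD21B44E8B4E088B5604CD219C5993B41A"
SIGNATURES = [
    (("558BEC0EFF7604" + ".." * 5 + "59595DC3") * 2, "_registerbgidriver"),
    ("558BEC1E078A66048A460E8B5E108B4E0AD1E9D1E980E1C0024E0C8A6E0A8A76", "_biosdisk"),
    ("558BEC1EB41AC55604CD211F5DC3" + ".." * 18, "_setdta"),
    (FINDFIRST, "_findfirst"),
]

tree = build_tree(SIGNATURES)
print(lookup(tree, bytes.fromhex(FINDFIRST)))  # ['_findfirst']
```

Because all four patterns share the `55 8B EC` prefix, those bytes are stored and compared once; the work done per lookup is bounded by the pattern length rather than by the number of stored signatures.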

What if two functions share the same leading 32 bytes and thus have the same byte pattern? Or what if two leaves of the tree match? Such a tree would look like the following, where _chmod and _access share the same first 32 bytes.

558BEC
      56
        1E
          B8....8ED8
                   33C050FF7608FF7606..........83C406
                                                      8BF083FEFF
                    0. _chmod   (20 5F33)
                    1. _access  (18 9A62)

They've considered this. In this case, they take bytes 33 through n, where n is the index of the first variant byte, and generate a cyclic redundancy check (CRC16) to distinguish between functions.

This isn't infallible either, as the first variant byte may occur at index 33, leaving an empty byte sequence for the CRC. It's also possible, especially for smaller byte sequences, for the CRC to exactly match. They have some other algorithms in cases such as this, but that's beyond the scope of this article.
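A rough sketch of this tie-breaker follows. It uses Python's `binascii.crc_hqx` (a CRC-CCITT variant) purely as a stand-in; the exact CRC16 variant and parameters that F.L.I.R.T. uses are not stated here, so treat the polynomial, the initial value, and the helper name `tail_crc` as assumptions. The function bytes are made up.

```python
import binascii
from typing import Tuple

PREFIX_LEN = 32  # bytes already covered by the leading pattern


def tail_crc(code: bytes, first_variant: int) -> Tuple[int, int]:
    """Return (length, crc16) for the bytes from offset 32 up to the first variant byte."""
    tail = code[PREFIX_LEN:first_variant]
    return len(tail), binascii.crc_hqx(tail, 0)


# Two made-up functions with identical first 32 bytes and tails that differ in one byte.
shared = bytes(range(32))
func_a = shared + bytes.fromhex("8BF083FEFF7505")
func_b = shared + bytes.fromhex("8BF083FEFF7405")

print(tail_crc(func_a, len(func_a)) != tail_crc(func_b, len(func_b)))  # True

# If the first variant byte sits right after the prefix, nothing is left to hash.
print(tail_crc(func_a, PREFIX_LEN))  # (0, 0)
```

The last call shows the weakness described above: an empty tail always produces the same length and CRC, so it cannot tell the two functions apart.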

The point is, function signature matching is difficult, but IDA at least claims that their approach results in very few false recognitions.

Binary Ninja's WARP

Being an alpha release, WARP is a little less documented and I'm not as confident in describing its algorithm as I am IDA's, but let's give it a shot.

The premise of the algorithm is the function globally unique identifier (GUID) calculated as follows.

First, the function is disassembled and the basic blocks making up the function are identified. Then, for each basic block, instructions are zeroed out if they contain a relocatable operand. This covers, for example, calls to subroutines whose addresses are not known at compile time; the linker usually fills in these bytes using the relocation table. Additionally, all NOPs and NOP-like instructions (such as mov edi, edi, which sets a register to itself) are removed.

The byte sequence of all instructions in the basic block is passed to the UUIDv5 function, which just SHA-1 hashes the function namespace together with the bytes and builds a UUID from the first 16 bytes of the digest. This results in the basic block GUID.

import uuid
from hashlib import sha1

def uuid5(namespace, name_bytes):
  """Generate a UUID from the SHA-1 hash of a namespace UUID and the name bytes."""
  # Hash the 16 namespace bytes followed by the name, then keep the first 16 digest bytes.
  digest = sha1(namespace.bytes + name_bytes).digest()
  return uuid.UUID(bytes=digest[:16], version=5)

The function namespace is an implementation-specific constant uuid.UUID('0192a179-61ac-7cef-88ed-012296e9492f'), meant to describe the current function hashing algorithm. You could, in practice, use the same hashing algorithm in Ghidra and IDA, and use the same function namespace ID, and have WARP signatures match between tools.
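Putting the basic block steps together, a minimal sketch might look like the following. The `Instruction` class and its `relocatable` and `nop_like` flags are hypothetical stand-ins for what a disassembler would report, "zeroing" is assumed to mean replacing the instruction with the same number of zero bytes, and the namespace constant is reused as this post describes. The real plugin may differ on any of these points.

```python
import uuid
from dataclasses import dataclass
from hashlib import sha1

NAMESPACE = uuid.UUID("0192a179-61ac-7cef-88ed-012296e9492f")  # constant quoted in this post


def uuid5(namespace, name_bytes):
    """UUIDv5 over raw bytes: SHA-1 of namespace + name, truncated to 16 bytes."""
    digest = sha1(namespace.bytes + name_bytes).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


@dataclass
class Instruction:
    """Hypothetical container; not a Binary Ninja type."""
    raw: bytes
    relocatable: bool = False
    nop_like: bool = False


def basic_block_guid(instructions):
    """Drop NOP-like instructions, zero relocatable ones, hash what remains."""
    data = bytearray()
    for insn in instructions:
        if insn.nop_like:
            continue
        data += bytes(len(insn.raw)) if insn.relocatable else insn.raw
    return uuid5(NAMESPACE, bytes(data))


# Two made-up x86 blocks that differ only in a call target and a padding NOP.
block_a = [
    Instruction(b"\x55"),
    Instruction(b"\xe8\x10\x00\x00\x00", relocatable=True),
    Instruction(b"\x90", nop_like=True),
    Instruction(b"\xc3"),
]
block_b = [
    Instruction(b"\x55"),
    Instruction(b"\xe8\x44\x33\x22\x11", relocatable=True),
    Instruction(b"\xc3"),
]

print(basic_block_guid(block_a) == basic_block_guid(block_b))  # True
```

The two blocks hash to the same GUID because the only differences between them are exactly the bytes the normalization step throws away.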

Finally, the function GUID is the UUIDv5 of all of the basic block GUIDs (sorted in order of highest to lowest starting address). An example from the documentation looks like the following:

function_namespace = uuid.UUID('0192a179-61ac-7cef-88ed-012296e9492f')
bb1 = uuid.UUID("036cccf0-8239-5b84-a811-60efc2d7eeb0")
bb2 = uuid.UUID("3ed5c023-658d-5511-9710-40814f31af50")
bb3 = uuid.UUID("8a076c92-0ba0-540d-b724-7fd5838da9df")
function = uuid5(function_namespace, bb1.bytes + bb2.bytes + bb3.bytes)
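The documentation example concatenates the block GUIDs directly. A small sketch that also performs the sort described above could look like this; the start addresses are invented for illustration, and `function_guid` is a hypothetical helper rather than part of the plugin's API.

```python
import uuid
from hashlib import sha1

FUNCTION_NAMESPACE = uuid.UUID("0192a179-61ac-7cef-88ed-012296e9492f")


def uuid5(namespace, name_bytes):
    """UUIDv5 over raw bytes: SHA-1 of namespace + name, truncated to 16 bytes."""
    digest = sha1(namespace.bytes + name_bytes).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


def function_guid(blocks):
    """blocks is an iterable of (start_address, basic_block_guid) pairs."""
    # Sort from highest to lowest start address, as described in the text.
    ordered = sorted(blocks, key=lambda block: block[0], reverse=True)
    return uuid5(FUNCTION_NAMESPACE, b"".join(guid.bytes for _, guid in ordered))


# Basic block GUIDs from the documentation example, paired with made-up addresses.
blocks = [
    (0x1000, uuid.UUID("8a076c92-0ba0-540d-b724-7fd5838da9df")),
    (0x1020, uuid.UUID("3ed5c023-658d-5511-9710-40814f31af50")),
    (0x1040, uuid.UUID("036cccf0-8239-5b84-a811-60efc2d7eeb0")),
]

# The result does not depend on the order in which the blocks were discovered.
print(function_guid(blocks) == function_guid(list(reversed(blocks))))  # True
```

Sorting before hashing is what makes the GUID independent of the order in which analysis happens to visit the blocks.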

This algorithm does not guarantee unique function GUIDs: two different functions can hash to the same value. When more than one signature shares a GUID, WARP uses function constraints to identify the correct one. For instance, a constraint may include one or more called functions, caller functions, or adjacent functions. The use of constraints is left to the user.
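As a purely illustrative sketch of how constraints could break a tie, the snippet below scores each candidate by how many of its expected callees are actually observed. This is not WARP's matching logic; the data layout, the scoring rule, and every name in it are assumptions made for the example.

```python
def pick_candidate(candidates, observed_callees):
    """Among signatures sharing one GUID, return the name whose callee constraints fit best.

    candidates maps a signature name to the set of callee names it expects.
    Returns None when nothing overlaps or when the best score is tied.
    """
    scored = [(len(constraints & observed_callees), name)
              for name, constraints in candidates.items()]
    best_score = max(score for score, _ in scored)
    winners = [name for score, name in scored if score == best_score]
    if best_score == 0 or len(winners) != 1:
        return None
    return winners[0]


# Two hypothetical signatures with the same function GUID but different callees.
candidates = {
    "wrap_malloc": {"malloc", "abort"},
    "wrap_calloc": {"calloc", "abort"},
}

print(pick_candidate(candidates, {"calloc", "abort"}))  # wrap_calloc
print(pick_candidate(candidates, {"abort"}))            # None
```

Returning `None` on a tie reflects a conservative choice: it is usually better to leave a function unnamed than to apply the wrong markup.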

Testing Out WARP

To test out WARP, I downloaded the iOS 18.0.0 and 18.0.1 IPSW files for the iPhone 15 (iPhone15,4). The goal is to mark up a subroutine in an executable from the 18.0.0 build and apply that markup to the same executable in the 18.0.1 build.

ipsw download ipsw --device "iPhone15,4" --build 22A3354 # 18.0.0
ipsw download ipsw --device "iPhone15,4" --build 22A3370 # 18.0.1

Sort of at random, I picked /usr/libexec/anomalydetectiond.

Then, I created a Binary Ninja project (with version 4.3.6599-dev) and enabled WARP in the Project settings before importing both versions of the executable. I left the remaining WARP settings on default.

WARP analysis options in Binary Ninja. — Binary Ninja > Preferences > WARP

I found an unnamed subroutine (sub_10000690c) by just looking at cross references to strings. This one looked interesting. It's used in all sorts of places. The first argument is always a pointer to an uninitialized piece of memory. The second argument is always the result of the call to [NSString UTF8String], which is a const char *.

HLIL in Binary Ninja showing a call to a function. — A call to the function we're going to mark up.

The function looks to be creating a new C++ object with the call to operator new and assigning that to the first argument. So I made a typedef void * std::string type and the following function signature with an appropriate comment.

Though, to be honest, it doesn't really matter if I'm right or wrong. We just want to see if I can apply these same changes to the next binary automatically.

A HLIL function in Binary Ninja. — The function after I've marked it up and created a function signature.

Before moving on, let's also check the function GUID with WARP\\Copy Function GUID. Using that, we get the GUID af762eb7-d683-5bf6-b18c-dc83b9a6dca7.

To produce the WARP signature, the documentation just says I need to run WARP\\Generate Signature File. I ran that for both the 18.0.0 and 18.0.1 versions. Of course, only the former had any meaningful markup. I saved the resulting signature files (.sbin) within the Binary Ninja project in the same directories as the analysis files.

The Binary Ninja WARP plugin items. — Generating a signature file using the plugin menubar item.

I'm not sure if this step was fully necessary, as I also discovered the option WARP\\Add Function Signature to File, which just creates a signature for one function.

You can run both of these commands from the Plugins menu, from the command palette (⌘P, then type "WARP" to filter), or by right-clicking a function and selecting the plugin menu. After selecting WARP\\Add Function Signature to File, I saved the signature file (another .sbin) in the ios-aarch64 folder of the signature file directory.

Alright, time to switch back over to the second executable and apply the changes.

I analyzed the 18.0.1 version of anomalydetectiond and, using the only cross-reference to "com.apple.anomalydetectiond.kappa.signal.test", found the same subroutine (though they're actually loaded at the same address).

First, I wanted to check the function GUID using WARP\\Copy Function GUID. Boom, it's an exact match: af762eb7-d683-5bf6-b18c-dc83b9a6dca7.

Looking at the HLIL, we can see there's no markup. The subroutine is unnamed and the parameter types are all wrong. That is, until we run WARP\\Load Signature File and WARP\\Run Matcher.

The function before it's marked up and the plugin selection to load the signature file.

Select these two options and watch the magic.

After loading the signature file and running the matcher, we have some markup.

We now have a positive function signature match. It even created the appropriate type for us (std::string).
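Conceptually, the transfer can be pictured as a lookup keyed by function GUID. The sketch below is plain Python and does not use the Binary Ninja or WARP APIs; the dictionaries, the `transfer_markup` helper, the function name `make_string_from_cstr`, and the second address and GUID are all invented for illustration. Only the first address and GUID come from this walkthrough.

```python
from collections import Counter


def transfer_markup(signatures, target_functions):
    """Map addresses in the target binary to names from a signature set.

    signatures maps a function GUID to a name; target_functions maps an address
    to the GUID computed for the function at that address. GUIDs that occur more
    than once in the target are skipped because the match would be ambiguous.
    """
    guid_counts = Counter(target_functions.values())
    applied = {}
    for address, guid in target_functions.items():
        if guid in signatures and guid_counts[guid] == 1:
            applied[address] = signatures[guid]
    return applied


# Signature created from the 18.0.0 binary (the name is hypothetical).
signatures = {"af762eb7-d683-5bf6-b18c-dc83b9a6dca7": "make_string_from_cstr"}

# GUIDs computed for the 18.0.1 binary; the second entry is made up.
target = {
    0x10000690C: "af762eb7-d683-5bf6-b18c-dc83b9a6dca7",
    0x100007000: "00000000-0000-5000-8000-000000000001",
}

for address, name in transfer_markup(signatures, target).items():
    print(hex(address), name)
```

Keying on the GUID rather than the address is why the approach would still work if the function had moved between builds, even though in this case it did not.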

Since we only applied one function signature, it was pretty easy to see the changes. However, in instances where you have multiple matches, it might be handy to examine the WARP tag that's automatically generated for you. This tags each function that was matched so you can immediately jump to it.

The bookmarks and tags side bar in Binary Ninja. — The WARP tag.

At the time of writing, this only applies function typing information. However, the main developer of the plugin mentioned that applying comments is in the pipeline. Applying variable definitions/names is a little more difficult and may require implementation-specific algorithms.

Conclusion

This was a pretty simplistic example; it was essentially the same version of the executable (with one minor change, and not to this function). The function was even loaded at the same address. The real magic would come from accruing several years of reverse engineering and having a database of signatures to instantly apply to new analysis efforts.

I'm excited to see where WARP ends up and I'll certainly continue to use it. Thanks to the Binja team, specifically Mason, for answering my barrage of questions about the plugin.
