- Sample:
  - 13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05 (Bazaar, VT)
- Infection chain:
  - Excel spreadsheet -> Office equation -> Shellcode (downloader) -> DBatLoader stage 1 (stegano dropper) -> DBatLoader stage 2 (discord downloader) -> DBatLoader stage 3 (resource dropper) -> Stone packed -> Formbook
- Tools used:
  - Malcat, Speakeasy emulator, Hex Rays
- Difficulty:
  - Intermediate
Introduction
If you do cyber threat research on the internet, chances are you will find a ton of papers documenting malicious RATs, APTs and state-sponsored campaigns. This is indeed interesting (and it makes cyber security folks feel like James Bond), but sadly little attention is given to what makes up most of the threat landscape: the packers, droppers and other downloaders at the front of the infection chain. They may be less sophisticated, but they are what the user encounters first.
The truth is, if an antivirus successfully detects and blocks an advanced RAT on a system, it has already failed and the system is compromised, because advanced RATs sit at the end of the infection chain.
To illustrate our point, we will inspect a Formbook sample and we won't talk about Formbook at all. Instead we will dissect the infection chain which leads to the installation of Formbook. As you will see, it is actually more complex than one might think.
Exploiting CVE-2018-0798
Excel document
The malware we are analyzing today is an encrypted OpenXML Excel document that came as an email attachment. OpenXML documents are usually just ZIP archives containing XML files and are easy to analyze, but encrypted documents like this one are not.
In fact, when a user chooses to protect their Excel sheet, Microsoft Excel will encrypt it (using the magical password VelvetSweatshop) and store it inside an OLE container. And when the user opens the document, Office will transparently decrypt it without any user interaction.
Malware authors are well aware of that fact and tend to abuse Excel encryption in order to evade antivirus detection.
Fortunately, this is an old technique and tools exist to decrypt this kind of file. In fact, it is as simple as a few lines of Python:
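One way to do this is with the third-party msoffcrypto-tool package. The sketch below assumes that package is installed. The helper names is_ole_container and decrypt_office are made up for illustration, and the file names in the usage note are placeholders.

```python
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # signature of an OLE compound file
DEFAULT_PASSWORD = "VelvetSweatshop"           # Excel's built-in default password


def is_ole_container(data: bytes) -> bool:
    """Return True if the buffer starts like an OLE container."""
    return data[:8] == OLE_MAGIC


def decrypt_office(path_in: str, path_out: str,
                   password: str = DEFAULT_PASSWORD) -> None:
    """Decrypt an encrypted Office document (assumes msoffcrypto-tool)."""
    import msoffcrypto  # third-party: pip install msoffcrypto-tool

    with open(path_in, "rb") as fin, open(path_out, "wb") as fout:
        office = msoffcrypto.OfficeFile(fin)
        office.load_key(password=password)
        office.decrypt(fout)
```

Calling decrypt_office("sample.xlsx", "decrypted.zip") should then leave a regular ZIP archive on disk, provided the document really uses the default password.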
This gives us an OpenXML ZIP archive. Browsing the content, we can see a few things of interest:
- the document contains pictures baiting the user to deactivate safe mode (see screenshot below)
- there is no vbaProject.bin file in the archive, meaning no VBA macro
- there is no Excel macro sheet
- there are two embedded objects:
  - a Word document at xl/embeddings/Microsoft_Office_Word_Macro-Enabled_Document1.docm
  - an OLE container at xl/embeddings/oleObject1.bin
Besides these elements, the document looks pretty clean. The Word document only contains a single picture, but the OLE container seems promising since its doctype GUID is 0002CE02-0000-0000-C000-000000000046 (Microsoft Equation 3.0 object). Equation objects have seen several vulnerabilities in the past years and are actively exploited in the wild. Let us dive in.
Buggy equation
If we open the oLE10NATive stream of the OLE container xl/embeddings/oleObject1.bin inside Malcat, we can see a very bare-bones Equation 3.0 object which has been stripped to the minimum, leaving just enough to trigger the exploit. But which exploit? VirusTotal tends to detect it as CVE-2017-11882, but not all engines agree. Let us have a look at the data:
Using the documentation of the MTEF format found here, we can make sense of most of the stream:
Offset | Size | Meaning |
---|---|---|
00 | 4 | The OLE1 header specifying the size of the data in the stream. Office seems to ignore this value and use the stream size from the OLE container instead |
04 | 5 | MTEF header. Only the MTEF version (3) and MTEF product (1 = Equation Editor) seem to have valid values. The rest is most likely ignored by Office and has been randomized. |
09 | 2 | First MTEF record: 0x0A = FULL SIZE record |
0B | 6-? | Second MTEF record: 0x05 = MATRIX record |
The MATRIX record seems to be the culprit here, which would mean that we are facing CVE-2018-0798. CVE-2018-0798 is sometimes confused with CVE-2018-0802 since Microsoft originally allocated the same CVE for two different vulnerabilities. But it is quite different from CVE-2017-11882, which exploits the FONT record: funny how most antivirus engines got it wrong.
According to this document, the MATRIX record triggers the exploit by setting the field NumberOfRows too high. Only 8 bytes are reserved in eqnedt32.exe for the array RowPartitionLineTypes, but (2 * 0xec + 9) / 8 = 0x3c bytes are copied instead, leading to a stack buffer overflow:
Knowing this, we can now start looking for a shellcode.
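To make the size of the overflow concrete, here is the arithmetic from the paragraph above as a small sketch. It assumes that 0xec is the NumberOfRows value stored in the malicious record and that the division is an integer division; the function name is made up for illustration.

```python
RESERVED_BYTES = 8      # stack space reserved for RowPartitionLineTypes
NUMBER_OF_ROWS = 0xEC   # assumed NumberOfRows value of the malicious MATRIX record


def partition_bytes(rows: int) -> int:
    """Size expression quoted above: (2 * rows + 9) / 8, rounded down."""
    return (2 * rows + 9) // 8


copied = partition_bytes(NUMBER_OF_ROWS)
overflow = copied - RESERVED_BYTES
print(f"copied={copied:#x} bytes, {overflow} bytes past the buffer")
```

With the same expression, any row count above 31 already needs more than the 8 reserved bytes.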
The shellcode
By quickly inspecting what follows the MATRIX record (so starting at offset 0x4D), we notice that offset 0x50 looks like the start of a shellcode. Indeed, the push/pop/jmp chain tends to indicate a meterpreter-generated shellcode.
Judging by the high entropy of the rest of the stream, the shellcode is most likely encrypted. We could of course reverse it, but it is faster to emulate the code. We will use the Speakeasy emulator from FireEye on the content of the oLE10NATive stream. You can use the following script:
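A minimal sketch of such an emulation run could look as follows. It assumes the speakeasy Python package is installed and uses the calls shown in its README (load_shellcode, run_shellcode, mem_read). The function names, the default entry offset and the string helper are illustrative choices, and the code has not been checked against this sample.

```python
import re


def ascii_strings(blob: bytes, min_len: int = 6) -> list:
    """Return printable ASCII runs, handy for spotting URLs in decrypted memory."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]


def emulate_shellcode(path: str, entry_offset: int = 0x50) -> bytes:
    """Emulate a 32-bit shellcode blob and return its memory after the run."""
    import speakeasy                        # third-party, assumed installed
    import speakeasy.winenv.arch as e_arch

    with open(path, "rb") as f:
        size = len(f.read())
    se = speakeasy.Speakeasy()
    base = se.load_shellcode(path, e_arch.ARCH_X86)
    se.run_shellcode(base, offset=entry_offset)   # start at the shellcode entry
    return se.mem_read(base, size)                # stream after self-decryption
```

Running ascii_strings(emulate_shellcode("ole10native.bin")) on the dumped stream (the file name is a placeholder) should list whatever strings the decryption loop revealed.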
If you are using Malcat, you can alternatively force a function declaration at offset 0x50 (start of the shellcode) and then run the script speakeasy_shellcode.py. The shellcode gets decrypted and strings are now in plain text:
No need to analyze the shellcode in depth. Judging by the strings, it is a simple downloader that fetches and runs a file from the url hxxp://104.168.32.50/009/vbc.exe (still online at time of writing). So let us fetch the data and move on.
First stage: a bit of steganography
The file vbc.exe is a 937KB Delphi application of sha256 3045902d7104e67ca88ca54360d9ef5bfe5bec8b575580bc28205ca67eeba96d (Bazaar, VT). Because of its size, reversing the complete application is out of the question. We could send it to a sandbox, but our goal is to analyze and understand the dropper. So let us try to locate the payload instead by looking for anomalies.
Locating the payload
Sweeping quickly through the binary, we find two points of interest:
- A huge string (104427 bytes) at address 0x0046f718
- A resource bitmap named BBTREX which does not look like the standard one (size is different, resource language too). Visually, the resource is a picture and definitely not an icon like the rest. It has most likely been patched post-compilation.
These two objects are referenced by the same function at offset 0x46D330, which is quite convenient. This function is located near the end of the CODE section, which also matters. Delphi applications are structured in units, and the linker tends to put library units at the start of the code section and user units at the end. So everything at the end of the CODE section is likely to be user code and thus interesting. Let us have a look at the function using HexRays:
The function RunPayload at address 0x46cdf0 makes use of VirtualAlloc and VirtualProtect, which suggests that at this point the dropper already decrypted its payload. And just before the call, we can see that the program loads the patched bitmap resource BBTREX into a TBitmap and calls the function that we named SteganoUnpack. So let us have a look at SteganoUnpack.
Decrypting the bitmap
The function SteganoUnpack at address 0x46C8F8 is a bit harder to understand. But using IDA's Delphi FLIRT signatures, we can get most of it:
In a nutshell, the function reads the bitmap line by line, and each line pixel by pixel. For every byte of the bitmap, some bits (the least significant bits) are extracted and concatenated in order to assemble the final payload. This is textbook steganography. The first line is a bit special since it contains additional info:
- The first 3 bytes (so the first RGB pixel) encode a 4-bit integer (2 bits of the red component, 1 bit of green and 1 bit of blue). This integer, which we named stegano_num_lsb_bits, tells the software how many bits of each bitmap byte it should extract from the image (3 in our case)
- Then the software jumps to the 4th byte and reads 32 bits from the bitmap into an integer. This integer is the number of bytes which should be extracted from the image (the payload size, in other words)
- Finally the software starts the payload extraction process
So let us check whether we got it right. We will open the bitmap BBTREX (which is a DIB bitmap, meaning the BITMAPFILEHEADER is missing) in a hexadecimal editor and try to manually decode the first bytes. We first have to locate the first bitmap row. Good to know: bitmaps are stored upside down, i.e. the top-most line is actually the last one in the file. So knowing that our bitmap is 588 pixels wide and is an RGB bitmap (so 3 bytes per pixel), the first line should start at EndOfFile - 588*3 = 0x44ea8:
First things first: we will decode the first 4-bit integer (aka stegano_num_lsb_bits). The first line starts with the 3 bytes 03 02 02, which gives us the binary number 1100 (in LSB display) = 3. Ok.
Next, the algorithm moves to the second pixel and reads 32 bits. 32 bits / 3 bits per byte means it will read 10 bytes and 2 bits of the 11th byte. The next 11 bytes are: 00 00 00 03 06 02 00 00 00 00 04, which gives us the binary number 000 000 000 110 011 010 000 000 000 000 00(1) (in LSB display) = 91648. Ok. The 11th byte contains an additional bit, a (1), which we have not consumed yet.
Next we can start reading 2 bytes of the payload, which is 16 bits. Since we still have one unread bit from 04, we just have to read 15 additional bits, or 5 bytes. The next five bytes are: 06 04 04 06 02, which gives us the binary numbers (1) 011 001 0--01 011 010 or 0x4d--0x5a ... looks like the start of a PE, great!
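The manual decoding above can be replayed in a few lines of Python. This sketch only uses the bytes quoted in the text; the helper names lsb_bits and read_uint are made up for illustration.

```python
def lsb_bits(data: bytes, nbits: int):
    """Yield the nbits low bits of every byte, least significant bit first."""
    for byte in data:
        for i in range(nbits):
            yield (byte >> i) & 1


def read_uint(bits, width: int) -> int:
    """Assemble width bits (LSB first) into an integer."""
    value = 0
    for i in range(width):
        value |= next(bits) << i
    return value


# First pixel: 2 bits from the first byte, 1 bit from each of the next two.
header = bytes.fromhex("030202")
num_lsb_bits = (header[0] & 3) | ((header[1] & 1) << 2) | ((header[2] & 1) << 3)

# The 11 + 5 bytes that follow the first pixel, as quoted in the text.
stream = bytes.fromhex("0000000306020000000004" "0604040602")
bits = lsb_bits(stream, num_lsb_bits)
payload_size = read_uint(bits, 32)
first_two = bytes(read_uint(bits, 8) for _ in range(2))
print(num_lsb_bits, payload_size, first_two)
```

The unread bit of the 11th byte is carried over automatically because the size field and the payload share one continuous bit stream.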
So let us put everything together and write a small extraction script using python. The following script should be run inside Malcat's script editor with the bitmap open:
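Outside Malcat, the same extraction can be approximated in plain Python on the raw DIB resource. This is a sketch under several assumptions: the bitmap is a bottom-up 24-bit DIB, the pixel array sits at the very end of the buffer, row padding bytes carry no payload, and the bit stream simply continues from one row to the next. All function names are illustrative.

```python
import struct


def lsb_bits(data: bytes, nbits: int):
    """Yield the nbits low bits of every byte, least significant bit first."""
    for byte in data:
        for i in range(nbits):
            yield (byte >> i) & 1


def read_uint(bits, width: int) -> int:
    """Assemble width bits (LSB first) into an integer."""
    value = 0
    for i in range(width):
        value |= next(bits) << i
    return value


def row_stride(width: int, bpp: int = 24) -> int:
    """Bytes per bitmap row, padded to a multiple of 4 as in the BMP format."""
    return ((width * bpp + 31) // 32) * 4


def rows_top_down(dib: bytes):
    """Yield pixel rows from top to bottom for a bottom-up 24-bit DIB."""
    _size, width, height, _planes, bpp = struct.unpack_from("<IiiHH", dib, 0)
    if bpp != 24 or height <= 0:
        raise ValueError("sketch only handles bottom-up 24-bit bitmaps")
    stride = row_stride(width, bpp)
    start = len(dib) - stride * height      # assumed: pixels end the buffer
    for y in range(height - 1, -1, -1):     # last stored row is the top line
        row = dib[start + y * stride: start + (y + 1) * stride]
        yield row[: width * 3]              # drop padding bytes


def extract_payload(dib: bytes) -> bytes:
    """Read the bit count, the payload size, then the payload itself."""
    data = b"".join(rows_top_down(dib))
    nbits = (data[0] & 3) | ((data[1] & 1) << 2) | ((data[2] & 1) << 3)
    bits = lsb_bits(data[3:], nbits)
    size = read_uint(bits, 32)
    return bytes(read_uint(bits, 8) for _ in range(size))
```

For a 588-pixel-wide bitmap, row_stride returns 588*3 bytes, so no padding is involved and the first row starts exactly where the text computes it.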
Running the script gives us the second stage of DBatLoader.
Second stage: cloud download
We are now looking at a Delphi binary of sha256 e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40 (VT). Given its small size (91KB) and API usage (wininet) it definitely looks like a downloader. So let us dive into this new binary.
Retrieving the url
A downloader needs a download URL, but no URL can be found in the second stage. If the URL is not hard-coded in this binary, it has to be somewhere else. Remember the big string that we identified as suspicious in the previous binary (address 0x0046f718)? It is mostly composed of uppercase letters, except for a short substring:
And the delimiter ^^Nc can be found as a referenced string in the second stage binary at address 0x413f58, so could it be our URL? At this point we should look for decrypting functions inside one of the two binaries. But let us be smart. See how the string prefix ammil3(( has repeated characters. The encryption is most likely a weak one-byte cipher. And we know that we are looking for a URL, so the plain text string could definitely start with https://. So let us try a few usual ciphers:
- XOR: the key would be 0x09 and give us hdd... -> no
- ROT13: ROT13 does not encode non-letter characters so not likely since the slash has been encrypted
- ADD: the key would be 0x7 and give us https://cdn.discordapp.com/attachments/902132472924479511/902136733435592744/Wbjhzkbevojgqfhfalbqxnykvunmobi ... bingo!
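The same known-plaintext guess can be scripted. The sketch below derives each candidate key from the first byte of the prefix ammil3(( and the expected https:// and then applies it to the whole prefix; the helper names are made up for illustration.

```python
CIPHERTEXT_PREFIX = b"ammil3(("   # start of the odd substring in the big string
KNOWN_PLAINTEXT = b"https://"     # what we hope it decodes to


def try_xor(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)


def try_add(data: bytes, key: int) -> bytes:
    return bytes((b + key) & 0xFF for b in data)


# Derive each key from the first byte only, then test it on the full prefix.
xor_key = CIPHERTEXT_PREFIX[0] ^ KNOWN_PLAINTEXT[0]
add_key = (KNOWN_PLAINTEXT[0] - CIPHERTEXT_PREFIX[0]) & 0xFF

print("XOR", hex(xor_key), try_xor(CIPHERTEXT_PREFIX, xor_key))
print("ADD", hex(add_key), try_add(CIPHERTEXT_PREFIX, add_key))
```

Only the ADD candidate reproduces the full expected prefix, which is why the remaining bytes of the substring can be decoded the same way.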
Sometimes, being lazy pays off. Note that the url is not reachable anymore at the time of writing, so I have attached a copy of the file at this address. But the work is not over yet: the downloaded packet looks encrypted:
Decrypting the file
So before going further, we have to locate the function responsible for decrypting the downloaded Discord attachment inside the binary. While the binary is relatively small, Malcat helps us save some time by locating two candidate functions featuring a XOR opcode inside a loop:
The function sub_413b14 seems to be the most promising of the two, so let us have a look. This function is quite simple, and takes as input a single number in ecx and a Delphi string in edx. The number is kind of the decryption key, and will be used to generate three variables:
- [ebp-0C] which is initialized with 0x833e - number
- [ebp-10] which is initialized with 0x5e9b - number
- [ebp-14] which is initialized with 0x41d6 - number
This input number is hard-coded. If we look at the decryption function's caller, we can see that this number stems from an atoi(0x41414c) call at address 0x41408d. The atoi parameter at address 0x41414c is the string "328", so the first mystery has been solved.
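With the number known, the three initial values can be computed directly. The sketch below covers only this initialization; how the function turns these values into a key stream is not spelled out here, so it is not reproduced. The function name is made up for illustration.

```python
def keystream_seeds(number: int) -> tuple:
    """Initial values of [ebp-0C], [ebp-10] and [ebp-14] for a given key number."""
    return (0x833E - number, 0x5E9B - number, 0x41D6 - number)


number = int("328")   # the hard-coded string passed to atoi
seeds = keystream_seeds(number)
print([hex(s) for s in seeds])
```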
Now we just have to figure out how the key stream is generated from these three variables. The assembly code of the function body is relatively simple. We converted it to a Python script that can be run inside Malcat, with the downloaded file open. Running the script will decrypt the packet:
After decryption, we obtain yet another Delphi program, which would make it the third stage of the malware.
Third stage: resource dropper
We are now looking at a Delphi binary of sha256 f8fc925d89baa140c9cb436f158ec91209789e9f8e82a0b7252f05587ce8e06f (VT). It looks more like a dropper this time, since most of its size (269KB) is taken by a single resource entry named YAK.
The YAK resource is a well-known artifact of the DBatLoader malware family. Note that Malcat does not identify it as a Delphi program because section names have been modified post-compilation and replaced with dots. Why, that's a very good question, since it only makes the binary more suspicious.
Making sense of the YAK resource
The program contains all its logic inside the main function, located at the program's entry point. It performs a lot of unnecessary and over-complicated operations in order to decrypt the resource. Here is a summary:
- call to function 0x416004: loads the content of resource YAK into memory
- call to function 0x416408: the resource bytes get "decrypted" using the following algorithm: for every byte b, if 0x21 <= b <= 0x7e: b = ((b + 0xe) % 0x5e) + 0x21. I know, it does not make a lot of sense.
- the first 36 bytes of the decrypted resource are a delimiter (*()%@5YT!@#G__T@#$%^&*()__#@$#57$#!@). This delimiter is used to separate different fields in the decrypted YAK data:
  - the use of the first field (7826546) is unknown
  - the second field is a XOR key used to decrypt the payload data
  - the third field is used to generate the filename and RunKey name used by the dropper to save and persist the dropped payload data
  - the 4th field is the encrypted payload data
  - ... other fields of lesser importance follow
  - the last field is another decryption key and has the value 328 (remember stage 2? Looks like the author really likes this number)
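As a side note, the byte formula above is arithmetically the same as the classic ROT47 substitution, which is its own inverse. The sketch below applies it and then splits the result on the 36-byte delimiter. The function names are made up for illustration, and taking the first 36 decoded bytes as the delimiter is an assumption based on the description above.

```python
DELIMITER_LEN = 36  # the decoded resource starts with the delimiter itself


def rot47(data: bytes) -> bytes:
    """Apply b = ((b + 0xe) % 0x5e) + 0x21 to printable bytes, keep the rest."""
    return bytes(((b + 0x0E) % 0x5E) + 0x21 if 0x21 <= b <= 0x7E else b
                 for b in data)


def split_fields(resource: bytes) -> list:
    """Decode the raw YAK resource and return its delimiter-separated fields."""
    decoded = rot47(resource)
    delimiter = decoded[:DELIMITER_LEN]
    return decoded[DELIMITER_LEN:].split(delimiter)
```

Under this layout, split_fields(yak_bytes)[3] would be the encrypted payload and the last element the numeric key, where yak_bytes stands for the raw resource content.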
This is how the YAK resource looks after the initial decryption done by function 0x416408. We have highlighted the delimiter to make the different fields easier to spot:
Does it sound overly complicated? Wait until you have seen how the resource payload data is decrypted.
Decrypting the payload data
Now that we know the structure of the YAK resource, it is time to decrypt the payload data (aka the 4th field), which makes up most of the YAK resource. The decryption process happens in four steps:
- function 0x415c40 XORs the data using the second field (ipnwxoenebxarqdhdiseentqdtfigqgzpuxlxi) as key. But every byte is not only XORed with a byte of the key, but also with the size of the payload AND the size of the key.
- the result is reversed.
- function 0x416368 decrypts the result using the last field (328) as key. The value 335 % 328 = 7 is added to every byte.
- the result is finally decrypted using function 0x416408, the same algorithm that was used to perform the initial decryption of the YAK resource
At this stage, I have a lot of questions for the programmer who wrote this. The main one is: why oh god why? Why add so much complexity to the payload extraction process? The added measures don't help evade detection:
- manual reversers don't care about the extra layers. Most of them would just use a debugger and go through the decryption process in one pass.
- for reversers who like to do everything statically (hi!), the added code is too simple to be considered obfuscation.
- antivirus programs don't care about the resource, they would just put a signature on the decryption code. Or even better, create a heuristic on the binary (which would be very easy considering Delphi programs with dots as section names are pretty rare :)
- "next-gen" machine-learning based antivirus also have a very easy time there
- sandboxes go directly through the decryption process and would grab the payload at injection time
On the other hand, it makes the dropping code quite a bit harder to maintain. I am a bit puzzled to be honest. Anyway, let us write the decryption algorithm in Python. This Python script must be run inside Malcat, with the third stage binary open:
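Outside Malcat, the four steps can be approximated as below. This is a sketch, not verified against the sample, and it rests on several assumptions: the key byte is picked as key[i % len(key)], the two sizes are XORed together and truncated to one byte, and the addition wraps modulo 256. The function names are made up, and the origin of the constant 335 is not explained in the text.

```python
ADD_CONSTANT = 335  # value quoted in the text; its origin is not explained there


def rot47(data: bytes) -> bytes:
    """Function 0x416408: b = ((b + 0xe) % 0x5e) + 0x21 on printable bytes."""
    return bytes(((b + 0x0E) % 0x5E) + 0x21 if 0x21 <= b <= 0x7E else b
                 for b in data)


def xor_layer(data: bytes, key: bytes) -> bytes:
    """Step 1: XOR with the key, the payload size and the key size (assumed)."""
    mix = (len(data) ^ len(key)) & 0xFF
    return bytes(b ^ key[i % len(key)] ^ mix for i, b in enumerate(data))


def add_layer(data: bytes, numeric_key: int) -> bytes:
    """Step 3: add 335 % key (7 for key 328) to every byte, modulo 256."""
    delta = ADD_CONSTANT % numeric_key
    return bytes((b + delta) & 0xFF for b in data)


def decrypt_payload(field: bytes, xor_key: bytes, numeric_key: int = 328) -> bytes:
    step1 = xor_layer(field, xor_key)        # XOR with key and sizes
    step2 = step1[::-1]                      # reverse the buffer
    step3 = add_layer(step2, numeric_key)    # byte-wise addition
    return rot47(step3)                      # same routine as the first layer
```

A call such as decrypt_payload(fields[3], fields[1], int(fields[-1])) would chain the layers, where fields stands for the delimiter-separated YAK fields.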
Running this script, we obtain another PE file. Do you think it is the final malware? Do you? Of course not :)
Fourth stage: Stone's packer
This time, believe it or not, we are not facing a Delphi program, but a packed 164KB binary featuring a weird .Stone section and a huge encrypted .text section. The sha256 of the binary is b0b4a3897ef76dfebc9ccdc9b83b49cb6d23c08a5b010bf8960c0bb82d48c4bc. How do we know it is packed you may ask? Well, it could be because the entropy is high, or maybe because it is written in the binary:
Yes, sometimes it is this easy :) Also the word PowerLame seems to imply we won't have a hard time cracking this one.
Unpacking Stone's packer
Instead of diving into the code, let us have a quick sweep through the file. The .text section displays interesting properties, in particular the beginning of the section:
The end of the first section is also interesting. Sections are usually padded with zeroes (or PADDINGXXX for the resource section), but here we got ones instead:
Knowing that the most frequent x86 function prologue is 55 8B EC (aka push ebp; mov ebp, esp), it looks like all byte values are just off by one. So let us test our hypothesis and just subtract 1 from every byte of the .text section. This can be done easily using Malcat's transforms, as we can see below:
After reanalyzing the file, we can see that our hypothesis holds and the .text section has been successfully decrypted. Several functions are now visible, even if most of them are obfuscated and part of the binary seems to remain encrypted. But anyway, we are now facing the last stage of the malware, and what we see should be enough to identify the malware.
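The same transform is trivial to reproduce outside Malcat. In the sketch below, the encoded prologue 56 8C ED is inferred from the off-by-one observation rather than copied from the file, and the function name is made up for illustration.

```python
def sub_one(data: bytes) -> bytes:
    """Subtract 1 from every byte, wrapping 0x00 around to 0xFF."""
    return bytes((b - 1) & 0xFF for b in data)


encoded_prologue = bytes.fromhex("568ced")   # inferred: 55 8B EC with 1 added
encoded_padding = b"\x01" * 4                # the "ones" seen at the section end

print(sub_one(encoded_prologue).hex(), sub_one(encoded_padding).hex())
```

Applied to the raw bytes of the .text section, this would give the same result as the transform, assuming the whole section uses the same encoding.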
Identifying the malware family
Using the TLP:white Yara rule set from Malpedia, the decrypted binary is detected by Malpedia's Formbook rule:
Formbook is a well-known stealer-as-a-service used by a variety of threat actors for over five years. It is designed to steal personal information and allow remote control via commands issued from a C2 server. It can steal passwords from locally installed software (browsers, chat clients, email clients and FTP clients), or directly from the user using keylogger and form-grabber components. After submitting the sample to Joe Sandbox, we get access to the Formbook configuration data and the address of its C2 server:
And ... that's the end of the infection chain and the end of this article.
Conclusion
While entry-level malware does not make headlines, that does not mean it should be ignored altogether. Some samples are more than mere droppers and feature multi-staged architectures. In this article, we have dissected a grand total of 4 intermediate malicious binaries that were used between the initial infection (an armed Excel spreadsheet) and the final malware (Formbook).
Each of them used different techniques, from exploits to cloud-based downloaders and even a bit of steganography. We developed Python scripts to extract and decrypt the payload of each of them. These scripts can be applied to other instances of DBatLoader, like this other Excel document, which downloads another DBatLoader first stage using yet another picture for its steganography.