Exploit, steganography and Delphi: unpacking DBatLoader

Sample:
13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05 (Bazaar, VT)
Infection chain:
Excel stylesheet -> Office equation -> Shellcode (downloader) -> DBatLoader stage 1 (stegano dropper) -> DBatLoader stage 2 (discord downloader) -> DBatLoader stage 3 (resource dropper) -> Stone packed -> Formbook
Tools used:
Malcat, Speakeasy emulator, Hex Rays
Difficulty:
Intermediate

Introduction

If you are doing cyber threat research on the internet, chances are you will find a ton of papers documenting malicious RATs, APTs and state-sponsored campaigns. It is indeed interesting (and it makes cyber security folks feel like James Bond), but sadly little attention is given to what makes most of the threat landscape: the packers, droppers and other downloaders at the front of the infection chain. They may be less sophisticated, but it is what the user first encounters, and what makes most of the threat landscape.

The truth is, if an antivirus successfully detects and blocks an advanced RAT on a system, it means that it already failed and that the system is compromised, because advanced RAT are at the end of the infection chain.

To illustrate our point, we will inspect a Formbook sample and we won't talk about Formbook at all. Instead we will dissect the infection chain which leads to the installation of Formbook. As you will see, it is actually more complex than one might think.

Exploiting CVE-2018-0798

Excel document

The malware we are analyzing today is an encrypted OpenXML Excel document that came as email attachment. OpenXML documents are usually just ZIP archives containing XML files and are easy to analyze, but not encrypted documents like this one. In fact, when a user chose to protect its Excel sheet, Microsoft Excel will encrypt it (using the magical password VelvetSweatshop) and store it inside an OLE container. And when the user opens the document, Office will transparently decrypt it without any user interaction. Malware authors are well aware of that fact and tend to abuse Excel encryption in order to evade antivirus detection. Fortunately, this is an old technique and tools exist to decrypt this kind of files. In fact, it is as simple as a few lines of python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import msoffcrypto
"""
NOTE: for this script to work, you will have to install msoffcrypto: pip3 install msoffcrypto-tool
"""

with open("13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05.xlsx", "rb") as f_in:
    doc = msoffcrypto.OfficeFile(f_in)
    doc.load_key(password="VelvetSweatshop")
    with open("file0_stage0.xlsx.dec", "wb") as f_out:
        doc.decrypt(f_out)

This gives us an OpenXML ZIP archive. Browsing the content, we can see a few things worth of interest:

  • the document contains pictures baiting the user to deactivate safe mode (see screenshot below)
  • there is no vbaProject.bin file in the archive, meaning no VBA macro
  • there is no Excel macro sheet
  • there are two embedded objects:
    • a Word document at xl/embeddings/Microsoft_Office_Word_Macro-Enabled_Document1.docm
    • an OLE container at xl/embeddings/oleObject1.bin

Beside these elements, the document looks pretty clean. The Word document only contains a single picture, but the OLE container seems promising since its doctype GUID is 0002CE02-0000-0000-C000-000000000046 (Microsoft Equation 3.0 object). Equation objects have seen several vulnerabilities in the past years and are actively exploited in the wild. Let us dive in.

Excel sheet baiting the user to deactivate safe mode
Figure 1: Excel sheet baiting the user to deactivate safe mode

Buggy equation

If we open the oLE10NATive stream of the OLE container xl/embeddings/oleObject1.bin inside Malcat, we can see a very bare bone Equation 3.0 object which has been stripped to the minimal, leaving just enough to target the exploit. But which exploit? VirusTotal tends to detect it as CVE-2017-11882, but not all engines agree. Let us have a look at the data:

Embedded OLE object
Figure 2: Embedded OLE object

Using the documentation of the MTEF format found here, we can make sense of most of the stream:

Offset Size Meaning
00 4 The OLE1 header specifying the size of the data in the stream. Office seems to ignore this value and use the stream size from the OLE container instead
04 5 MTEF header. Only the MTEF version (3) and MTEF product(1 = Equation Editor) seem to have valid values. The rest is most likely ignored by Office and has been randomized.
09 2 First MTEF record: 0x0A = FULL SIZE record
0B 6-? Second MTEF record: 0x05 = MATRIX record

The MATRIX record seem to be the culprit there, and it would mean that we are facing CVE-2018-0798. CVE-2018-0798 is sometimes confused with CVE-2018-0802 since Microsoft originally allocated the same CVE for two different vulnerabilities. But it is quite different from CVE-2017-11882 which exploits the FONT record: funny how most antivirus got it wrong.

According to this document, the MATRIX record triggers the exploit by setting the field NumberOfRows too high. Only 8 bytes are reserved in eqnedt32.exe for the array RowPartitionLineTypes, but (2 * 0xec + 9) / 8 = 0x3c bytes are copied instead, leading to a stack overflow:

The equation object explained
Figure 3: The equation object explained

Knowing this, we can now start looking for a shellcode.

The shellcode

By quickly inspecting what follows the MATRIX record (so starting at offset 0x4D), we notice that offset 0x50 looks like the start of a shellcode. Indeed, the push/pop/jmp chain tends to indicate a meterpreter-generated shellcode.

meterpreter-generated shellcode are easy to spot
Figure 4: meterpreter-generated shellcode are easy to spot

Judging by the high entropy of the rest of the stream, the shellcode is most likely encrypted. We could of course reverse it, but it is faster to emulate the code. We will use the Speakeasy emulator from FireEye on the content of the oLE10NATive stream. You can use the following script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import speakeasy
import speakeasy.winenv.arch as e_arch

unpacker = speakeasy.Speakeasy()
with open("olenative10_stream.bin", "rb") as ole10native:
    data = ole10native.read()
address = unpacker.load_shellcode("", e_arch.ARCH_X86, data=data)
unpacker.run_shellcode(address, 0x50)   # shellcode starts at offset 0x50

with open("shellcode_decrypted.bin", "wb") as f:
    f.write(unpacker.mem_read(address, len(data)))

If you are using Malcat, you can alternatively force a function declaration at offset 0x50 (start of the shellcode) and then run the script speakeasy_shellcode.py. The shellcode gets decrypted and strings are now in plain text:

Decrypting the shellcode with speakeasy
Figure 5: Decrypting the shellcode with speakeasy

No need to analyze the shellcode in depth. Judging by the strings, it is a simple downloader that fetches and runs a file from the url hxxp://104.168.32.50/009/vbc.exe (still online at time of writing). So let us fetch the data and move on.

First stage: a bit of steganography

The file vbc.exe is a 937KB Delphi application of sha256 3045902d7104e67ca88ca54360d9ef5bfe5bec8b575580bc28205ca67eeba96d (Bazaar, VT). Because of its size, reversing the complete application is out of question. We could send it to a sandbox, but our goal is to analyze and understand the dropper. So let us try to locate the payload instead by looking at anomalies.

Locating the payload

Sweeping quickly through the binary, we find two points of interest:

  • A huge string (104427 bytes) at address 0x0046f718

  • A resource bitmap named BBTREX which does not look like the standard one (size is different, resource language too). Visually, the resource is a picture and definitely not an icon like the rest. It has most likely been patched post-compilation.

Weird bitmap resource BBTREX
Figure 6: Weird bitmap resource BBTREX

These two objects are referenced by the same function at offset 0x46D330, which is quite convenient. This function is located near the end of the CODE section, which is also of importance. Delphi application are structured in Units, and the linker tends to put library units at the start of the code section, and user units at the end. So everything at the end of the CODE section is likely to be user code and thus interesting. Let us have a look at the function using HexRays:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#! cpp
int DropAndRun() {   // 46D330  
  int *v0; // eax
  DWORD v3; // [esp-18h] [ebp-20h]
  int *v4; // [esp-14h] [ebp-1Ch]
  LPURL_COMPONENTS v5; // [esp-10h] [ebp-18h]
  unsigned int v6; // [esp-Ch] [ebp-14h]
  void *v7; // [esp-4h] [ebp-Ch]
  __int32 unpacked_bitmap; // [esp+0h] [ebp-8h] BYREF
  int v9; // [esp+4h] [ebp-4h] BYREF
  int savedregs; // [esp+8h] [ebp+0h] BYREF

  TimeGetTickCount(100);
  sub_406F00(dword_48ACB0, dword_48AD7C, 4);
  if ( InetIsOffline(0) )
    System::__linkproc__ LStrAsg(&payload, &str_A[1]);
  else
    System::__linkproc__ LStrAsg(&payload, &str_a[1]);
  System::__linkproc__ LStrCatN(&v9, 6);
  GetApiAddress(v9, &str_RasClearLinkSta[1], &p_RasClearLinkStatistics);
  if ( dword_48ACC0 <= 0x87A68E ) { // always true
    GetApiAddress(&str_amsi[1], &str_amsiamsiScanBuf[1], &p_amsiScanBuffer);
    Patch(p_amsiScanBuffer, WinHttpCrackUrl);
    bitmap = Graphics::TBitmap::TBitmap(&cls_Graphics_TBitmap);
    LoadResourceIntoBitmap(bitmap, Y, &str_BBTREX + 8);// load resource bitmap BBTREX
    SteganoUnpack(bitmap, &unpacked_bitmap);    // extract payload from bitmap
    System::__linkproc__ LStrAsg(&payload, unpacked_bitmap);
    RunPayload(payload);                        // run payload in memory
    v0 = j_unknown_libname_57_0(&dword_48AD44);
    System::Move(aGlojdxoscdjtlq, v0 + 2, 0);   // append very long string in memory
  }
  else {
    // call WinHttpCrackUrl and exits
  }
  __writefsdword(0, v6);
  v7 = &loc_46D4A5;
  return System::__linkproc__ LStrArrayClr(&unpacked_bitmap, 2);
}

The function RunPayload at address 0x46cdf0 makes use of VirtualAlloc and VirtualProtect, which suggests that at this point the dropper already decrypted its payload. And just before the call, we can see that the program loads the patched bitmap resource BBTREX into a TBitmap and calls the function that we named SteganoUnpack. So let us have a look at SteganoUnpack.

Decrypting the bitmap

The function SteganoUnpack at address 0x46C8F8 is a bit harder to understand. But using IDA's Delphi FLIRT signatures, we can get most of it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
int __fastcall SteganoUnpack(Graphics::TBitmap *bitmap, BYTE *output)
{
  char is_bit_set; // al

  payload_data = new_string(&off_415BF8, output, 0);
  line_content = Graphics::TBitmap::GetScanline(bitmap, 0);// read first bitmap line

  // in the first 3 bytes is an integer encoded that is the number of lsb bits that should be extracted from each byte to get the payload.
  // (only saw the value 3 in the wild)
  stegano_num_lsb_bits = 0;                     
  is_bit_set = IsBitSet(*line_content, 0);
  SetBit(&stegano_num_lsb_bits, 0, is_bit_set);
  is_bit_set = IsBitSet(*line_content, 1u);
  SetBit(&stegano_num_lsb_bits, 1, is_bit_set);
  is_bit_set = IsBitSet(*(line_content + 1), 0);
  SetBit(&stegano_num_lsb_bits, 2, is_bit_set);
  is_bit_set = IsBitSet(*(line_content + 2), 0);
  SetBit(&stegano_num_lsb_bits, 3, is_bit_set);

  bitmap_width = (*(*bitmap + 44))(bitmap);
  bitmap_height = (*(*bitmap + 32))(bitmap) - 1;
  bitmap_line_index = 0;
  bit_index = 0;
  bitmap_row_index = 2;                         // X position on current bitmap line (1-based)
  rgb_index = 1;                                // which color component are we reading: 1 = RED, 2 = GREEN, B = BLUE
  line_content += 3;                            // advance file pointer by 3 bytes
  payload_bit_index = 0;

  do {
    is_bit_set = IsBitSet(*(line_content + rgb_index - 1), bit_index);
    SetBit(&payload_size, payload_bit_index, is_bit_set);
    AdvanceToNextBit();     // will update bit_index, rgb_index, bitmap_row_index and line_content as needed
    ++payload_bit_index;
  } while ( payload_bit_index != 32 );                // first 32 payload bits = payload size

  if ( payload_size > 0 ) {     // start of the payload extraction process
    do {
      payload_bit_index = 0;
      do {
        is_bit_set = IsBitSet(*(line_content + rgb_index - 1), bit_index);
        SetBit(&payload_byte, payload_bit_index, is_bit_set);
        AdvanceToNextBit(); // will update bit_index, rgb_index, bitmap_row_index and line_content as needed
        ++payload_bit_index;
      }
      while ( payload_bit_index != 8 );             // extract 8 bits from bitmap
      (*(*payload_data + 16))(payload_data, &payload_byte, 1);// append byte to payload data
      System::__linkproc__ LStrAsg(output, *(payload_data + 1));
    }
    while ( --payload_size );                   // read <payload_size> bytes into <output>
  }
  System::TObject::Free(payload_data);
  return System::TObject::Free(bitmap);
}

In a nutshell, the function reads the bitmap line by line, and each line pixel by pixel. For every byte of the bitmap, some bits (the lowest significant bits) are extracted and concatenated in order to assemble the final payload. This is textbook steganography. The first line is a bit special since it contains additional info:

  • The first 3 bytes (so the first RGB pixel) encodes a 4 bits integer (2 bits of red component, 1 bit of green and 1 bit of blue). This integer that we named stegano_num_lsb_bits tells the software how many bits of each bitmap byte it should extract from the image (3 in our case)
  • Then the software jumps to the 4th byte and reads 32 bits from the bitmap into an integer. This integer is the number of bytes which should be extracted from the image (the payload size in other words)
  • Finally the software starts the payload extraction process

So let us try if we got it right. We will open the bitmap BBTREX (which is a DIB bitmap, meaning the BITMAPFILEINFOHEADER is missing) in an hexadecimal editor and try to manually decode the first bytes. We first have to locate the first bitmap row. Good to know: bitmaps are stored upside down, i.e the top-most line is actually the last one in the file. So knowing that our bitmap is 588 pixels wide and is a RGB bitmap (so 3 bytes per pixel), the first line should start at EndOfFile - 588*3 = 0x44ea8:

The first bitmap line
Figure 7: The first bitmap line

So first thing first, we will decrypt the first 4 bits integer (aka stegano_num_lsb_bits). The first line starts with the 3 bytes 03 02 02, which gives us the binary number 1100 (in LSB display) = 3. Ok.

Next, the algorithm moves to the second pixel and reads 32 bits. 32 bits / 3 bits per byte means it will read 10 bytes and 2 bits of the 11th byte. The next 11 bytes are: 00 00 00 03 06 02 00 00 00 00 04, which gives us the binary number 000 000 000 110 011 010 000 000 000 000 00(1) (in LSB display) = 91648 ok. The 11th byte contains an additional bit which we did not read which was a (1).

Next we could start reading 2 bytes of the payload, which is 16 bits. Since we still have a bit unread from 04, we just have to read 15 additional bits or 5 bytes. The next five bytes are: 06 04 04 06 02, which gives us the binary numbers (1) 011 001 0--01 011 010 or 0x4d--0x5a ... looks like the start of a PE, great!

So let us put everything together and write a small extraction script using python. The following script should be run inside Malcat's script editor with the bitmap open:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
def next_byte(malcat):
    width = malcat.struct["BitmapInfoHeader"]["biWidth"]
    height = malcat.struct["BitmapInfoHeader"]["biHeight"]
    bpp = malcat.struct["BitmapInfoHeader"]["biBitCount"]
    data = malcat.struct["biImageData"]
    line_width = width * (bpp // 8)
    if line_width % 4:
        line_width += 4 - (line_width % 4)
    for y in range(height, 0, -1):
        ptr = line_width * (y - 1)
        for x in range(width * (bpp // 8)):
            yield data[ptr + x]

def next_bit(data):
    for c in data:
        for i in range(num_bits):
            yield (c >> i) & 1


byte_iterator = next_byte(malcat)
for i in range(3):
    next(byte_iterator)
bit_iterator = next_bit(byte_iterator)


res = bytearray()

# read size of payload
payload_size = 0
for i in range(32):
    payload_size |= next(bit_iterator) << i

cur_i = 0
cur_val = 0
for bit in bit_iterator:
    if bit:
        cur_val = cur_val | (1 << cur_i)
    cur_i += 1
    if cur_i >= 8:
        res.append(cur_val)
        if len(res) == payload_size:
            break
        cur_val = 0
        cur_i = 0


gui.open_after(bytes(res), "decrypted")

Running the script gives us the second stage of DBatLoader.

Second stage: cloud download

We are now looking at a Delphi binary of sha256 e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40 (VT). Given its small size (91KB) and API usage (wininet usage) it definitely looks like a downloader. So let us dive in this new binary.

Retrieving the url

Who says downloader says download url, but no URL can be found in the second stage. If the url is not hard-coded in this binary, it has to be somewhere else. Remember the big big string that we've identified as suspicious in the previous binary (address 0x0046f718)? It is mostly composed of uppercase letter, except for a short substring:

Huge string composed almost exclusively of capital letters
Figure 8: Huge string composed almost exclusively of capital letters

And the delimiter ^^Nc can be found as referenced string in the second stage binary at address 0x413f58, so could it be our url? At this point we should look for decrypting functions inside one of the two binaries. But let us be smart. See how the string prefix ammil3(( has repeated characters. Encryption is must likely a weak one-byte cipher. And we know that we are looking for an url, so the plain text string could definitely start with https://. So let us try a few usual cipher:

  • XOR: the key would be 0x09 and give us hdd... -> no
  • ROT13: ROT13 does not encode non-letter characters so not likely since the slash has been encrypted
  • ADD: the key would be 0x7 and give us https://cdn.discordapp.com/attachments/902132472924479511/902136733435592744/Wbjhzkbevojgqfhfalbqxnykvunmobi ... bingo!

Sometimes, being lazy pays off. Note that the url is not reachable anymore at the time of writing, so I have attached a copy of the file at this address. But the work is not over yet: the downloaded packet looks encrypted:

Repeating sequence in downloaded buffer
Figure 9: Repeating sequence in downloaded buffer

Decrypting the file

So before going further, we have to locate function responsible for decrypting the downloaded discord attachment inside the binary. While the binary is relatively small, Malcat helps us saving some time by locating two candidates functions featuring a XOR opcode inside a loop:

any XOR in a loop is a good decryption routine candidate
Figure 10: any XOR in a loop is a good decryption routine candidate

The function sub_413b14 seems to be the most promising of the two, so let us have a look. This function is quite simple, and takes as input a single number in ecx and a Delphi string in edx. The number is kind of the decryption key, and will be used to generate three variables:

  • [ebp-0C] which is initialized with 0x833e - number
  • [ebp-10] which is initialized with 0x5e9b - number
  • [ebp-14] which is initialized with 0x41d6 - number

This input number is hard-coded. If we look to the decryption function's caller code, we can see that this numbers stems from an atoi(0x41414c) call at address 0x41408d. The atoi parameter at address 0x41414c is the string "328", so the first mystery has been solved.

decryption function sub_413b14
Figure 11: decryption function sub_413b14

Now we just have to figure how the key stream is generated from these three variables. The assembly code of the function body is relatively simple. We converted it to a Python script that can be run inside Malcat, with the downloaded file open. Running the script will decrypt the packet:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
def decrypt_stage2(data, number):
    res = bytearray(len(data))
    ebp_c = 0x833e - number
    ebp_10 = 0x5e9b - number
    ebp_14 = 0x41d6 - number

    di = ebp_14

    for i, c in enumerate(data):
        e = c ^ (di >> 8)
        res[i] = e
        di = c
        di = (di + ebp_14) & 0xffff
        di = (di * ebp_c) & 0xffff
        di = (di + ebp_10) & 0xffff
    return res

decrypted = decrypt_stage2(malcat.file[:], 328)
decrypted = decrypted[::-1]                 # payload is stored reversed, don't ask me why
gui.open_after(bytes(decrypted), "decrypted")

After decryption, we obtain yet another Delphi program, which would make it the third stage of the malware.

Third stage: resource dropper

We are now looking at a Delphi binary of sha256 f8fc925d89baa140c9cb436f158ec91209789e9f8e82a0b7252f05587ce8e06f (VT). It looks more like a dropper this time, since most of its size (269KB) is taken by a single resource entry named YAK.

Third stage of the malware: a dropper
Figure 12: Third stage of the malware: a dropper

The YAK resource is a well-known artifact of the DBatLoader malware family. Note that Malcat does not identify it as a Delphi program because section names have been modified post-compilation and replaced with dots. Why, that's a very good question, since it only makes the binary more suspicious.

Making sense of the YAK resource

The program contains all his logic inside the main function, located at the program's entry point. It performs a lot of unnecessary and over-complicated operations in order to decrypt the resource. Here is a summary:

  • call to function 0x416004: loads content of resource YAK into memory
  • call to function 0x416408: the resource bytes gets "decrypted" using the following algorithm: for every byte b, if 0x21 <= b <= 0x7e: b = ( (c + 0xe) % 0x5e) + 0x21. I know, it does not make a lot of sense.
  • the first 36 bytes of the decrypted resource is a delimiter (*()%@5YT!@#G__T@#$%^&*()__#@$#57$#!@). This delimiter is used to separate different fields in the decrypted YAK data:
    • the first field (7826546) use is unknown
    • the second field is a XOR key used to decrypt the payload data
    • the third field is used to generate the filename and RunKey name used by the dropper to save and persist the dropped payload data
    • the 4th field is the encrypted payload data
    • ... other field of lesser importance follow
    • the last field is another decryption key and has the value 328 (remember stage 1? Looks like the author really likes this number)

This is how the YAK resource looks after the first initial decryption done by function 0x416408. We have highlighted the delimiter to better highlight the different fields:

The decrypted YAK resource first 4 fields
Figure 13: The decrypted YAK resource first 4 fields

Does it sound overly complicated? Wait until you have seen how the resource payload data is decrypted.

Decrypting the payload data

Now that we know the structure of the YAK resource, it is time to decrypt the payload data (aka the 4th field), which makes most of the YAK resource. The decryption process happens in four steps:

  • function 0x415c40 decrypts Xor the data using the second field (ipnwxoenebxarqdhdiseentqdtfigqgzpuxlxi) as key. But every byte is not only XORed with a byte of the key, but also with the size of the payload AND the size of the key.
  • the result is reversed.
  • function 0x416368 decrypts the final result using the last field (328) as key. Every byte is added with the value 335 % 328 = 7.
  • the result is finally decrypted using function 0x416408, the same algorithm that was used to perform the initial decryption of the YAK resource

At this stage, I have a lot of questions to the programmer who wrote this. The main one is: why oh god why? Why adding so much complexity to the payload extraction process. The added measures don't help evading detection:

  • manual reversers don't care about the extra layers. Most of them youl just use a debugger and go through the decryption process in one pass.
  • for reversers who likes to do everything statically (hi!), the added code is too simple to be considered as obfuscation.
  • antivirus programs don't care about the resource, they would just put a signature on the decryption code. Or even better, create an heuristic on the binary (which would be very easy considering Delphi program with dots as section names are pretty rare :)
  • "next-gen" machine-learning based antivirus also have a very easy time there
  • sandbox directly go through the decryption process and would grab the payload at injection time

On the other hand, it makes the dropping code quite harder to maintain. I am a bit puzzled to be honest. Anyway, let us write the decryption algorithm in python. This python scripts must be run inside Malcat, with the third stage binary open:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
import itertools

def decrypt_yak(data):
    """
    implements the first decryption layer of function 0x416408
    """
    res = bytearray(data)
    for i, c in enumerate(data):
        if 0x21 <= c <= 0x7e:
            res[i] = ((((c + 0xe) % 0x5e) + 0x21) & 0xff)
    return res


def xor_payload(data, key):
    """
    custom XOR, function 0x415c40
    """
    res = bytearray()
    for data_byte, key_byte in zip(data, itertools.cycle(key)):
        c = data_byte ^ key_byte ^ len(data) ^ len(key)
        res.append( c & 0xff )
    return res

def add_payload(data, key):
    """
    custom ADD, function 0x416368
    """
    res = bytearray(len(data))
    val = 335 % key
    for i, c in enumerate(data):
        res[i] = (c + val) & 0xff
    return res

#######################################


yak_resource = malcat.struct["Resources.RCDATA.YAK.unk.Data"]

decrypted = decrypt_yak(yak_resource)

# split resource into fields
delimiter = decrypted[:36]
fields = decrypted[len(delimiter):].split(delimiter)

#get important fields
payload_data = fields[3]
xor_key = fields[1]
add_key = int(fields[-1])

print("Decrypting payload data ({} bytes) with XOR key '{}' and ADD key {:d}".format(
    len(payload_data), 
    xor_key.decode("ascii"), 
    add_key))

step1 = xor_payload(payload_data, xor_key)

step2 = step1[::-1]     # reverse

step3 = add_payload(step2, add_key)

decrypted = decrypt_yak(step3)

gui.open_after(bytes(decrypted), "yak_payload_plaintext")

Running this script, we obtain another PE file. Do you think it is the final malware? Do you? Of course not :)

Fourth stage: Stone's packer

This time, believe it or not, we are not facing a Delphi program, but a packed 164KB binary featuring a weird .Stone section and a huge encrypted .text section. The sha256 of the binary is b0b4a3897ef76dfebc9ccdc9b83b49cb6d23c08a5b010bf8960c0bb82d48c4bc. How do we know it is packed you may ask? Well, it could be because the entropy is high, or maybe because it is written in the binary:

That is one hell of a stealth crypter
Figure 14: That is one hell of a stealth crypter

Yes, sometimes it is this easy :) Also the word PowerLame seems to imply we won't have a hard time cracking this one.

Unpacking Stone's packer

Instead of diving into the code, let us have a quick sweep through the file. The .text section displays interesting properties, in particular the beginning of the section:

Start of encrypted .text section
Figure 15: Start of encrypted .text section

The end of the first section is also interesting. Sections are usually padded with zeroes (or PADDINGXXX for the resource section), but here we got ones instead:

End of encrypted .text section
Figure 16: End of encrypted .text section

Knowing that the most frequent x86 function prologue is 55 8B EC (aka push ebp; mov ebp, esp), it looks like all bytes value are just one off. So let us try our hypothesis and just subtract 1 to the complete .text section. This can be done easily using Malcat's transforms, as we can see below:

Decrypting the .text section
Figure 17: Decrypting the .text section

After reanalyzing the file, we can see that our hypothesis holds and the .text section has been successfully decrypted. Several functions are now visible, even if most of theme are obfuscated and part of the binary seem to remain encrypted. But anyway, we are now facing the last stage of the malware, and what we see should be enough to identify the malware.

Identifying the malware family

Using the TLP:white Yara rule set from Malpedia, the decrypted binary is detected by Malpedia's Formbook rule:

Formbook detection
Figure 18: Formbook detection

Formbook is a well-known stealer-as-a-service used by a variety of threat actors for over five years. It is designed to steal personal information and allow remote control via commands issued from a C2 server. It can steal passwords from locally installed software (browsers, chat clients, email clients and FTP clients), or directly from the user using keylogger and form-grabber components. After submitting the sample toe Joe sandbox, we get access to the Formbook configuration data and the address of its C2 server:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
  "C2 list": [
    "www.mgav26.xyz/n8rn/"
  ],
  "decoy": [
    "jlvip1066.com",
    "gconsultingfirm.com",
    "foundergomwef.xyz",
    "bredaslo.com",
    // ... (truncated)
    "counterpokemon.com",
    "beyerenterprisestreeservice.com",
    "phorganicfoods.com",
    "hermespros.com"
  ]
}

And ... that's the end of the infection chain and the end of this article.

Conclusion

While entry-level malware do not make headlines, it does not mean that they should be ignored altogether. Some of them are more than just mere droppers and feature multi-staged architectures. In this article, we have dissected a gran total of 4 intermediate malicious binaries that were used between the initial infection (an armed Excel spreadsheet) and the final malware (Formbook).

Each of them used different techniques, from exploits to cloud-based downloaders and event a bit of steganography. We developed python scripts to extract and decrypt the payload of each of them. These scripts can be applied to other instances of DBatLoader, like this other excel document, which downloads another DBatLoader first stage using yet another picture for its steganography.