Today we are happy to announce the release of version 0.9.4. This version was released a bit faster than usual since we wanted to push support for Ubuntu 23. As a consequence, it is also a bit light on the features side, and packs the following improvements to the software:

Added a parser + a disassembler for python 3.11 .PYC files and bytecode
Added a new magic selection mask feature for easy code signatures
Slightly improved stack string detection algorithm
Added a proper non-default context menu to the pure text view
Improved CP437, CP850, CP866 and CP1252 charsets to make them compatible with older DOS graphical extensions
Changing the font in Malcat is a bit easier thanks to a new font auto-check
.. and the usual doc / anomalies / yara signatures updates

Ubuntu 23.04 support

It was about time: we now provide binaries for the recent Ubuntu 23.04 distribution! The binaries should also be compatible with other python 3.11-based Linux distributions. This was not an easy task since it required us to update the excellent pybind11 library to its latest version (2.11.1), which itself comes with a few breaking changes, in particular in how the GIL is managed from within a C++ program. On the plus side, we should now be ready to integrate python 3.12 when the day comes.

Figure 1: Malcat running in Ubuntu 23.04

If you own a purchased version of Malcat and want to test on Ubuntu 23, just visit the download link you got in the email: the Ubuntu 23.04 archive has been added to the bundle. For free users, it is available on the download page.

Python 3.11 disassembler

Since Malcat can now run on python 3.11-based systems, it would be nice if it also could analyse python 3.11 code, right? Right, that's why we have added a proper disassembler for python 3.11 bytecode, and a parser for python 3.11 .PYC files and PYINST archives as well. This was not as easy as we initially thought since python 3.11 comes with a lot of changes:

The file format of .PYC files has changed
A lot of new bytecodes have been added
Some bytecodes are now followed by a variable number of CACHE entries (inline caches used by the specializing adaptive interpreter), which we have chosen to hide
The way jump targets are encoded has changed (again and again ...)
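To get a feel for how many CACHE entries sit in a 3.11 code object, here is a minimal sketch that only uses CPython's standard `dis` module (it is not Malcat code). The `sample` function is a made-up example. On interpreters older than 3.11 there are no CACHE entries, so the hidden count is simply zero there.

```python
import dis


def sample(a, b):
    # Attribute access, indexing and addition are typical operations that
    # carry inline cache entries on python 3.11 and later.
    return a.value + b[0]


code = sample.__code__

# Since python 3.6, every code unit in co_code is exactly 2 bytes wide.
code_units = len(code.co_code) // 2

# By default, dis does not list the CACHE entries following an instruction.
visible = list(dis.get_instructions(code))

# Whatever is left over are the code units that dis chose to hide.
hidden_units = code_units - len(visible)

print(f"{code_units} code units, {len(visible)} listed, {hidden_units} hidden")
```

The gap between the raw code size and the number of listed instructions is the part of the stream that a disassembler has to skip over in order to produce a readable listing.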

In order to get things right, we have developed an internal regression test suite specific to python bytecode, where Malcat's disassembly is compared to the output of the dis.dis() function for each supported python version. This allowed us to spot a couple of bytecode changes that we had overlooked in python 3.10, for instance.
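The idea behind such a regression test can be sketched in a few lines of Python. This is not our actual test suite, only an illustration under stated assumptions: `reference_listing` and `diff_listings` are hypothetical helper names, and the "candidate" listing stands in for whatever a third-party disassembler would produce for the same code object.

```python
import dis


def reference_listing(code):
    """Reference disassembly, as produced by CPython's own dis module."""
    return [(ins.offset, ins.opname, ins.arg) for ins in dis.get_instructions(code)]


def diff_listings(candidate, reference):
    """Return (index, candidate_entry, reference_entry) for every mismatch."""
    mismatches = []
    for i in range(max(len(candidate), len(reference))):
        ours = candidate[i] if i < len(candidate) else None
        theirs = reference[i] if i < len(reference) else None
        if ours != theirs:
            mismatches.append((i, ours, theirs))
    return mismatches


SAMPLE = "def f(x):\n    return x + 1 if x else 0\n"
code = compile(SAMPLE, "<sample>", "exec")
reference = reference_listing(code)

# Simulate a disassembler bug: the first instruction gets the wrong mnemonic.
candidate = list(reference)
candidate[0] = (candidate[0][0], "BOGUS_OP", candidate[0][2])

for index, ours, theirs in diff_listings(candidate, reference):
    print(f"instruction #{index}: got {ours}, expected {theirs}")
```

Running the same comparison under each supported interpreter version is what makes version-specific encoding changes show up as test failures.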

Last but not least, we have added an instruction decomposer for all supported python versions, which exposes the details of a python opcode. This is similar to what we already have for x86, x64 and .NET.

Magic code mask feature

When writing a new Yara rule for a malware family, detection engineers have to find a subtle balance between:

being specific enough, i.e. does my rule detect only what I want to detect? (decreases false positive ratio)
being generic enough, i.e. will my rule also match a different build/version of the malware? (decreases false negative ratio)

When using code patterns in a Yara rule, it is usually not so hard to be specific enough: just use a big-enough non-library function in your signature. Being generic enough using code patterns is a bit more tricky though, since the slightest change in code, or even just a recompilation of the same source code, can break your carefully chosen pattern. Things that the compiler likes to change are:

Offsets: even if the function that you made a signature on stays untouched by the malware author, the compiler can always move it, any function it calls, or any variable it references around in memory. This leads to different offsets in the opcodes stream and will thus break your pattern.
Registers: compilers are responsible for register allocation, which means that the register holding a given variable at a given program point can in theory change after each recompilation of the malware.
Stack offsets: the layout of a function frame can also change over time (for instance a variable becomes bigger), and it has a big impact on the opcode, since potentially all stack offsets are going to change if you are unlucky.

A good signature should be resilient against such minor compiler-induced changes, and a tool to achieve this goal is to mask out in your pattern all the parts of the opcode that are subject to change. This can be done manually, but it quickly becomes tedious. So in order to help our fellow detection engineers, we have added the magic mask feature!
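To make the idea of masking concrete, here is a minimal sketch in Python. It is not how Malcat implements the magic mask: the `to_yara_hex` helper is hypothetical, and the keep/mask decision for each byte is written by hand here, whereas the magic mask derives it from the disassembly. The example bytes are an x86 `call rel32` followed by `mov eax, [ebp-8]`.

```python
def to_yara_hex(data: bytes, keep: list) -> str:
    """Render bytes as a Yara hex string, replacing masked bytes with ??."""
    if len(data) != len(keep):
        raise ValueError("data and keep must have the same length")
    tokens = [f"{byte:02X}" if kept else "??" for byte, kept in zip(data, keep)]
    return "{ " + " ".join(tokens) + " }"


# E8 10 20 00 00 : call rel32 (the 4 displacement bytes depend on code layout)
# 8B 45 F8       : mov eax, [ebp-8] (the last byte is a stack offset)
opcodes = bytes.fromhex("E8102000008B45F8")
keep = [True, False, False, False, False, True, True, False]

pattern = to_yara_hex(opcodes, keep)
print(pattern)  # { E8 ?? ?? ?? ?? 8B 45 ?? }
```

The resulting pattern still pins down the instruction sequence, but no longer depends on where the called function lives or on the exact frame layout.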

Figure 3: You can add a masked code pattern directly to a Yara rule

When adding code snippets to a Yara rule from Malcat, you will now have the option to automatically mask out parts of the opcode stream beforehand via the Add to Yara rule (disassembly, magic mask) context menu. If you want to have more control over the result, you can alternatively:

magic-mask the selected opcodes via the Magic mask context menu
work on the exclusion pattern with the mouse via Ctrl+LeftClick
choose Add to Yara rule (disassembly) (note that you should NOT choose magic mask there, as it would reset your crafted exclusion pattern)

The parts of the opcode stream that get masked by the magic mask action can be configured from within the Preferences dialog. There, you can choose between 4 different settings that offer a different specific/generic balance:

Figure 4: Fine-tuning the magic mask action

And that's it, a simple feature which should save you a lot of clicks. Note that it works with all CPU architectures currently supported by Malcat, with the exception of VB p-code, which is still a work in progress. This can be useful when creating detections for .NET or python malware for instance, in order to mask out references to the constant pool, which are likely to change frequently.

Quality of life features

While Malcat focuses on the analysis of binary programs, it also features a source code view in order to display decompiled scripts and/or pure text files. We have made two small improvements to that view:

it now has a proper context menu, similar to the data view context menu (see below)
when applying transforms to the selected text, you now get the option of applying them in-place or in a new file

These two changes should help you deal with the occasional scripts found during an infection chain. You can now easily decrypt/rework them and use part of their strings to create Yara rules.

Font self-check

Malcat uses many unicode glyphs in its user interface. This is particularly true in the data views: depending on the chosen charset, non-ascii glyphs are used to represent the bytes outside the ascii range ([32-126]). This all works pretty well as long as the two conditions below are met:

The current font must have these glyphs in stock
The glyphs must all be of the same size (at least the ones used in the hexadecimal, structure and DNA views, since these are fixed-width views)

If you stick to Malcat's default font, it works out of the box. But if you manually change Malcat's font using the preferences dialog, finding a suitable font can be "like playing russian roulette", as one of you nicely put it ;)

The problem is that most mono fonts are not 100% mono: characters in the ascii range usually all have the same size, but for other glyphs it depends heavily on the font. And to find out, there was no way other than testing the font in the different data views and seeing if visual bugs arise.

Figure 6: Changing Malcat's font now triggers a font check

In order to help you find a suitable font, changing the font from the preferences dialog now triggers a self-check: all glyphs used by Malcat's user interface are rendered internally, and their availability as well as their relative size are compared. If all tests pass, you're good to go! It can also be that your font is only partially compatible, e.g. it can only be used with the ascii charset. In this case the self-test will also tell you so.
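The logic of such a check can be sketched as follows. This is an illustration only, not Malcat's implementation: `charset_glyphs`, `check_font` and `fake_measure` are hypothetical names, and the `measure` callback stands in for a real text-rendering call that would return a glyph's width in pixels, or None if the font lacks the glyph. The glyph sets are approximated using Python's built-in codecs.

```python
from typing import Callable, Dict, Iterable, Optional


def charset_glyphs(codec: str) -> set:
    """Printable glyphs needed to draw bytes 0x20..0xFF in the given charset."""
    text = bytes(range(0x20, 0x100)).decode(codec, errors="replace")
    # Drop undecodable bytes (U+FFFD) and non-printable characters.
    return {ch for ch in text if ch.isprintable() and ch != "\ufffd"}


def check_font(measure: Callable[[str], Optional[int]],
               codecs: Iterable[str]) -> Dict[str, bool]:
    """A charset passes if every glyph exists and all glyphs share one width."""
    report = {}
    for codec in codecs:
        widths = {measure(glyph) for glyph in charset_glyphs(codec)}
        report[codec] = None not in widths and len(widths) == 1
    return report


def fake_measure(glyph: str) -> Optional[int]:
    # Made-up font: 8-pixel-wide ascii glyphs, nothing outside the ascii range.
    return 8 if ord(glyph) < 0x80 else None


report = check_font(fake_measure, ["ascii", "cp437", "cp1252"])
for codec, ok in report.items():
    print(f"{codec}: {'compatible' if ok else 'not compatible'}")
```

With this made-up font, only the ascii charset passes, which is the "partially compatible" situation described above.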

A font passing all the tests does not mean that you will never encounter any missing glyph. Malcat is unicode aware and will try to display all glyphs it finds (think function symbols, etc.), even those weird ones found in obfuscated .NET malware. Your font may not have all of them. It just means that you won't encounter visual bugs in the different fixed-width views and graphical controls used in Malcat.

We still recommend sticking to the default font (Consolas for Windows and DejaVu Sans Mono for Linux), as they look nice, render relatively fast (yes that's a thing to consider too, you would be surprised) and cover a large portion of the unicode range.

Full changelog

Here is the complete changelog of this release:

● Malcat can now run on ubuntu23 / python3.11-based linux distributions
● Pure text view:
    - Added a "new file" option in the transform dialog when called within the text view
    - Added a proper non-default context menu to the pure text view (search selected string, add selected string to Yara, etc.)
● Python 3.11:
    - The PYC and PYINST file parsers now support python 3.11 files
    - Added a disassembler for python 3.11 opcodes
    - Added a decomposer for Python opcodes (cf. opcode quickview window)
    - Fixed a few minor errors in the python disassembler
● Analysis:
    - Various dynamic strings extraction improvements
● Quality of Life:
    - Changing the font in the option dialog now displays information regarding the font compatibility with the different charsets
    - For CP437, CP850, CP866 and CP1252 charsets in the data views, we've made the unspecified ascii chars < 0x20 100% compatible with the old DOS graphical extensions
    - Added a "Show disassembly" button shortcut to the call graph view
    - When selecting a structure field in the structure view, the corresponding editing control in the structure editor (quick view) is now given focus 
    - Added a "tokenize" transform that only keeps bytes enclosed by the given token
    - Wiki dialogs, like the one for the EULA, are now smaller by default but can be resized as needed
● Magic mask:
    - The "add to Yara (disassembly)" context menu action now respects selection-excluded bytes
    - Added a "Magic mask" context menu action to the selection and opcodes: automatically excludes part of instructions from selection
    - Added a "Magic selection mask" option in the Preferences panel > Code view
    - Added a "Add to Yara (disassembly, magic mask)" context menu action
● Bug fixing:
    - Fixed a regression: hex & struct views were not refreshed when a value was modified using the structure editor (in the quick view)
    - Fixed a regression in custom types
    - Forbid putting more than 6 MB of data in the clipboard, since it seems to crash the clipboard on some systems
    - Fixed a regression: d/D shortcuts in diff mode would sometimes be stuck in disasm view if the difference is in the middle of an instruction
    - Fixed a bug which made some unicode string symbols be double-encoded in the disasm view
    - Fixed a focus issue when editing bytes structures from the data view under Linux

0.9.4 is out: Ubuntu 23 support, python 3.11 and magic masking