One of our goals for the 1.0 release of Malcat is to have the software run on MacOS, for you my rich computer scientist friends. But how sad would it be if Malcat on MacOS were unable to disassemble itself? Pretty sad, if you want my opinion. Maybe not as sad as me having to use MacOS, but sad nonetheless.

So before starting work on a potential MacOS port, this 0.9.11 release focuses on MacOS program analysis. And for you my less rich friends (but with a rich personality, right? ... right?), it also comes with numerous QoL improvements:

Support for Mach-O, Universal Mach-O and DMG file formats
A new GCC symbol name demangler
Three new disassemblers: Armv7, Armv8 32 bits (Aarch32) and Armv8 64 bits (Aarch64)
Improved disassembly listing for dynamically computed addresses (a must-have for ARM and MIPS)
Three new sleigh-backed decompilers: Armv7, Armv8 32 bits (Aarch32) and Armv8 64 bits (Aarch64)
Improvements to the decompiler view and scripting interface
New context menu for multi-strings and multi-functions selection
Save selection as new PE/ELF
Updated offline Kesakode database (with many arm and mips binaries)
and the usual: new transforms, anomalies, signatures, etc.

So we are talking big changes here, most likely the last big changes before the release of 1.0. If you want more details, just keep reading!

Arm analysis

In the last release, we added our first capstone-based disassemblers: MIPS32 and MIPS64. Adding a new disassembler to Malcat is a complex task that goes beyond just calling a disassembly library. Indeed, the disassembler interacts with many aspects of Malcat:

CFG reconstruction (Malcat needs to understand the semantics of at least some instructions, namely the ones that compute pointers)
The disassembly view and the auto-comments
The markov-chain-based code/data identifier, which needs to be trained
Instruction decomposition (for the opcode preview, magic masking etc.)
Some string heuristics are also CPU-dependent
Kesakode needs to be adapted to the new CPU and its database populated

Thus the baby steps with the MIPS architecture. But it's time now to go play with the big kids and add support for the ARM architecture.

While ARM claims to be a RISC architecture, it is far from simplistic. Supporting ARM programs often means supporting many completely different instruction sets. Even worse, instruction sets can be switched dynamically (e.g. to/from thumb mode). Malcat will support three (or five, depending on how you count) of them:

The Armv7 instruction set (both "normal" and thumb mode). This is a 32-bit mode which should be 99% backward-compatible with Armv4/5/6.
The Armv8 32-bit (aka Aarch32) instruction set (both "normal" and thumb mode)
The Armv8 64-bit (aka Aarch64) instruction set, the one used on Apple silicon

For Armv7 and Aarch32, a pre-analysis is first performed to identify which regions of the code execute in thumb mode and which don't. Afterwards, the CFG reconstruction algorithm makes heavy use of our new value analysis since, as on all RISC platforms, pointers are computed dynamically. But besides these two points, which may slightly affect performance, analysing an ARM program in Malcat should be transparent for you.
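
To give an idea of why the instruction set has to be tracked at all: on 32-bit ARM, an interworking branch through a register (BX/BLX) uses the lowest bit of the target address to select thumb or "normal" mode. The helper below is a minimal sketch of that single rule. The function name is made up for illustration, and this is not Malcat's pre-analysis, which has to look at far more than one branch.

```python
def split_interworking_target(target):
    """Split the register value of a BX/BLX branch into (address, mode).

    Bit 0 set   -> the CPU continues in thumb mode at (target & ~1).
    Bit 0 clear -> the CPU continues in ARM ("normal") mode at target.
    """
    if target & 1:
        return target & ~1, "thumb"
    return target, "arm"


if __name__ == "__main__":
    for value in (0x8001, 0x8000):
        address, mode = split_interworking_target(value)
        print(f"branch to {value:#x}: execute at {address:#x} in {mode} mode")
```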

Figure 3: ARM programs can also be decompiled

You will also notice that implicit references (i.e. dynamically computed pointers) are better identified and displayed in the disassembly/proximity views. This feature is a must-have for RISC architectures, since almost all addresses are constructed dynamically. In a similar fashion, dynamically computed pointers will also appear in the list of cross references:

Figure 4: Computed addresses are correctly displayed and cross-referenced
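
For the curious, here is a toy version of the idea behind such computed references, using the typical Aarch64 pair adrp + add. It is only a sketch under heavy assumptions: instructions are given as already-decoded (address, mnemonic, operands) tuples, the adrp operand is assumed to be the resolved page address, and the function name is invented. Malcat's real value analysis is not shown here and certainly handles many more cases.

```python
def find_computed_pointers(instructions):
    """Return (instruction address, pointer) pairs for adrp/add sequences.

    instructions: iterable of (address, mnemonic, operands) tuples, where
    operands is a tuple of register names (str) and immediates (int).
    """
    known = {}       # register name -> constant value currently held
    references = []  # (address of the instruction completing the pointer, pointer)
    for address, mnemonic, operands in instructions:
        if mnemonic == "adrp" and len(operands) == 2:
            destination, page = operands
            known[destination] = page
        elif (mnemonic == "add" and len(operands) == 3
              and isinstance(operands[2], int) and operands[1] in known):
            destination, source, immediate = operands
            known[destination] = known[source] + immediate
            references.append((address, known[destination]))
        elif operands and isinstance(operands[0], str):
            # Any other instruction: conservatively forget what we knew about
            # its first operand, which is usually the destination register.
            known.pop(operands[0], None)
    return references


if __name__ == "__main__":
    listing = [
        (0x1000, "adrp", ("x0", 0x20000)),
        (0x1004, "add", ("x0", "x0", 0x128)),
    ]
    for address, pointer in find_computed_pointers(listing):
        print(f"{address:#x}: computed reference to {pointer:#x}")
```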

And finally, ARM programs can also be decompiled in Malcat using the usual workflow (F4), courtesy of the Sleigh decompiler. You will also notice that we have added a symbol demangler for GCC symbols, which complements the MSVC name demangler already present in Malcat.
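
If you have never looked at GCC-mangled names: they follow the Itanium C++ ABI scheme, where a symbol starts with _Z and identifiers are stored as a length followed by the name, optionally nested between N and E. The sketch below decodes only this tiny subset and gives up (returning the symbol unchanged) on anything else, such as qualifiers, constructors or templates. It is an illustration with an invented function name, not the demangler shipped with Malcat.

```python
def demangle_simple(symbol):
    """Decode plain and nested names such as _Z6decodePKc or _ZN3foo3barEv.

    Parameter types are not rendered. Anything outside the supported subset
    is returned unchanged.
    """
    if not symbol.startswith("_Z"):
        return symbol
    position = 2
    nested = symbol[position:position + 1] == "N"
    if nested:
        position += 1
    parts = []
    while position < len(symbol) and symbol[position].isdigit():
        end = position
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[position:end])
        name = symbol[end:end + length]
        if len(name) != length:
            return symbol  # truncated identifier
        parts.append(name)
        position = end + length
        if not nested:
            break  # a non-nested name has a single identifier
    if not parts:
        return symbol
    if nested and symbol[position:position + 1] != "E":
        return symbol  # unsupported component inside the nested name
    return "::".join(parts)


if __name__ == "__main__":
    for mangled in ("_ZN3foo3barEv", "_Z6decodePKc", "printf"):
        print(mangled, "->", demangle_simple(mangled))
```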

New and improved parsers

In this release, four new file format parsers have been added to Malcat, and a couple of existing ones have been improved (see below).

MacOS program parsers

Now that we have support for ARM programs, we need to add support for the MacOS ecosystem. MacOS programs use the famous Mach-O file format, which is relatively simple compared to ELF and PE. The new Mach-O parser in Malcat should support the most commonly used load commands. A new template view has also been designed for this file type. It is currently in beta, so you can expect it to change in the following releases:

Figure 5: MACHO file format summary view
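
To show how simple the format is at its core, here is a minimal sketch that reads the header of a little-endian Mach-O file and lists its load commands as (command id, size) pairs. The function name and the returned dictionary are made up for this post, and only four CPU types are named. This is unrelated to the actual Malcat parser, which does a lot more.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O
CPU_NAMES = {
    0x00000007: "x86",
    0x01000007: "x86_64",
    0x0000000C: "arm",
    0x0100000C: "arm64",
}


def parse_macho_header(data):
    """Parse the header and the load command list of a little-endian Mach-O."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic not in (MH_MAGIC, MH_MAGIC_64):
        raise ValueError("not a little-endian Mach-O file")
    cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = struct.unpack_from(
        "<6I", data, 4)
    # The 64-bit header has one extra reserved 32-bit field.
    offset = 32 if magic == MH_MAGIC_64 else 28
    commands = []
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupted load command")
        commands.append((cmd, cmdsize))
        offset += cmdsize
    return {
        "bits": 64 if magic == MH_MAGIC_64 else 32,
        "cpu": CPU_NAMES.get(cputype, hex(cputype)),
        "filetype": filetype,
        "flags": flags,
        "load_commands": commands,
    }
```

Calling parse_macho_header(open(path, "rb").read()) on a binary should then tell you at a glance which CPU it targets and how many load commands it carries.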

Now, most MacOS programs are compiled for two architectures: ARM and x64, since older Macs shipped with Intel CPUs. In order to bundle different ports of the same program, MacOS software may also be distributed using the Universal Mach-O format, or UMACHO. This format, also supported by Malcat, is a simple archive format that lists all the available ports for a given program.

Figure 6: UMACHO file format summary view
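
A universal binary is little more than a big-endian table in front of the embedded Mach-O files. The sketch below lists the slices of the 32-bit variant of that table (magic 0xCAFEBABE). Since Java class files start with the same magic, it also rejects implausibly large slice counts. That limit, like the function name, is an arbitrary choice made for this example and not something taken from Malcat.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {
    0x00000007: "x86",
    0x01000007: "x86_64",
    0x0000000C: "arm",
    0x0100000C: "arm64",
}


def list_fat_slices(data, max_slices=16):
    """Return (cpu name, file offset, size) for every slice of a fat binary."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal Mach-O file")
    if count > max_slices:
        # Heuristic only: a Java class file would show up here with a
        # "slice count" equal to its class file version.
        raise ValueError("too many slices, probably not a universal binary")
    slices = []
    for index in range(count):
        # Each entry: cputype, cpusubtype, offset, size, align (big-endian).
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + index * 20)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices
```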

And finally, MacOS applications are often bundled in yet another format: the DMG file format. DMG containers can be seen as virtual disk images (similar to VHD files on Windows). Malcat supports DMG files, but may or may not handle the underlying file system. If the virtual disk is formatted using ISO for instance, you will be able to browse the file system in Malcat.

GGUF

Some time ago, I read an interesting article about how malware can leverage the GGUF file format to poison existing models. For those who don't know, GGUF is commonly expanded as GPT-Generated Unified Format. It is specifically designed for storing and running quantized large language models efficiently, such as the ones you will find on Hugging Face. A GGUF file will of course include the weights of the model, but also other sensitive data such as the default prompt template.

A threat actor can alter the default prompt template of a GGUF file and force a model to, for instance, include malicious links in its answers. The malicious prompt can also be stealthy and only activate under specific conditions, when HTML code is generated for instance.

In order to ease the analysis of this new type of threat, you can now analyse GGUF files from within Malcat. If you spot such malware, don't hesitate to notify us, this is really an interesting attack vector in our eyes!
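
If you want to poke at such a file yourself, the metadata block is easy to read. The sketch below assumes the GGUF version 2 or later layout (64-bit counts, little-endian) and assumes that the prompt template is stored under the metadata key tokenizer.chat_template, which is where the models I know of keep it. Function names are invented for this example, the URL check is a deliberately naive heuristic, and none of this is the code of Malcat's GGUF parser.

```python
import re
import struct

# Scalar value types -> struct format. Type 8 is a string, type 9 an array.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read_string(data, position):
    (length,) = struct.unpack_from("<Q", data, position)
    start = position + 8
    return data[start:start + length].decode("utf-8", "replace"), start + length


def _read_value(data, position, value_type):
    if value_type == _STRING:
        return _read_string(data, position)
    if value_type == _ARRAY:
        element_type, count = struct.unpack_from("<IQ", data, position)
        position += 12
        items = []
        for _ in range(count):
            item, position = _read_value(data, position, element_type)
            items.append(item)
        return items, position
    if value_type not in _SCALARS:
        raise ValueError(f"unknown GGUF value type {value_type}")
    fmt = _SCALARS[value_type]
    (value,) = struct.unpack_from(fmt, data, position)
    return value, position + struct.calcsize(fmt)


def read_gguf_metadata(data):
    """Return the metadata key/value pairs of a GGUF (v2+) file as a dict."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    _version, _tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    position = 24
    metadata = {}
    for _ in range(kv_count):
        key, position = _read_string(data, position)
        (value_type,) = struct.unpack_from("<I", data, position)
        metadata[key], position = _read_value(data, position + 4, value_type)
    return metadata


def template_urls(metadata):
    """Naive check: list the URLs embedded in the default prompt template."""
    template = metadata.get("tokenizer.chat_template", "")
    if not isinstance(template, str):
        return []
    return re.findall(r"https?://[^\s\"'<>]+", template)
```

A template that contains hard-coded links is not necessarily malicious, but it is a good first thing to look at.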

Installers

Only two small improvements on this front. First, we have added support for BZip2-compressed NSIS archives. Why the delay you may ask? Well, because NSIS installers use a custom version of bzip2, which had to be reimplemented in C++.

The second small improvement is a bug fix in the InnoSetup installer: uncompressed files could not be unpacked properly.

Decompiler improvements

Synchronisation of disassembly/decompiler listings

In this release, we have invested time to improve the decompiler view, in particular the synchronisation between the decompiler view and the other views. While all views in Malcat are synchronised, i.e. the current position in the file is reflected in all views, the decompiler view (F4) was kind of an exception. The current position was only marked with a barely visible comment // <-- you are here, and scrolling in the view would not update the global current position. This has been improved in different ways:

The current position is marked more clearly via the symbol ►► present in the margin
Scrolling within the decompiler view will update the current position, which is reflected by this symbol
You can now switch quickly to the disassembly view by clicking on the ►► symbol in the margin of every C line
Switching from the disassembly view to the decompiler view should better position the cursor within the C code

You can see it below in action:

Figure 8: Better synchronisation between disassembly and C code views

Overall, the precision of C-code localisation should be better. Tell us what you think!

Scripting interface

Another improvement in the same department: the different decompilers in Malcat now have Python bindings! Why so late, you may ask? Well, until now the usefulness of such bindings was kind of debatable: if you want to interact with Ghidra's decompiler for instance, you're better off scripting Ghidra directly.

But some of you want to use Malcat's headless mode to build an MCP server. This way, you could use large language models to automate file triaging. And decompilation to C code, or VBA/AutoIT/MSI tables extraction, can help a lot in this context. Thus, the following bindings have been added:

You can decompile the whole program/document/archive via Analysis.decompile:
- For Office macros/VBA, AutoIT scripts, MSI tables etc., you will get as text what is printed in the source view (F4)
- For real programs, you get the C code of all functions that could be successfully decompiled, as a single huge textual listing
For supported CPU architectures (x86/mips/arm), you can also decompile at the function level via the Function.decompile method, which also accepts formatting options

And... that's it! Again, if you want to develop complex algorithms manipulating HLL representations of programs, you are better off with IDA, Binja or Ghidra. These two bindings are just meant to be used in MCP servers or automated reporting. If you develop a cool MCP server, please tell us, we'll be happy to see it in action!
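
As an illustration of how these two bindings could feed an MCP tool or a report, here is a small sketch. Only the method names Analysis.decompile and Function.decompile come from this release. Everything else is an assumption made for the example: that both can be called without arguments, that they return text, that function objects have a name attribute, and of course the stub classes, which only exist so that the snippet runs outside of Malcat.

```python
def summarize_decompilation(analysis, functions=(), max_chars=4000):
    """Collect truncated decompiler output, ready to be handed to an LLM."""
    report = {
        "whole_file": str(analysis.decompile())[:max_chars],
        "functions": {},
    }
    for function in functions:
        try:
            text = str(function.decompile())[:max_chars]
        except Exception as error:  # one failing function should not stop the report
            text = f"[decompilation failed: {error}]"
        report["functions"][function.name] = text
    return report


class StubFunction:
    """Stand-in for a Malcat function object (hypothetical attributes)."""

    def __init__(self, name, code):
        self.name = name
        self.code = code

    def decompile(self):
        if self.code is None:
            raise RuntimeError("unsupported code")
        return self.code


class StubAnalysis:
    """Stand-in for a Malcat analysis object (hypothetical attributes)."""

    def __init__(self, functions):
        self.functions = functions

    def decompile(self):
        return "\n\n".join(f.code for f in self.functions if f.code)


if __name__ == "__main__":
    functions = [StubFunction("main", "int main(void) { return 0; }"),
                 StubFunction("sub_401000", None)]
    print(summarize_decompilation(StubAnalysis(functions), functions))
```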

QoL changes

Default Kesakode provider

Since version 0.9.9, you can define your own Kesakode provider in Malcat. This allows you to develop or reuse your own lookup services, while capitalizing on Malcat's user interface. If you want to test this feature, we have added Malpedia's FLOSSed string plugin for Malcat directly in the release.

Figure 9: third-party Kesakode providers

And we're happy to announce that some of you are already using this feature intensively. The only complaint until now is that switching the provider every time is tedious. And we've heard you: you can now set the default Kesakode provider in the options menu:

Figure 10: changing the default Kesakode providers

Note that this will only affect the online Kesakode lookups, performed on demand when you click on the "Online lookup" button from the summary view, or hit Ctrl+K. The automatic offline Kesakode scan performed with every analysis will still be using Kesakode and cannot be changed. Why? Because automatic analyses need to be fast, and offline Kesakode lookups are pretty fast. By contrast, having to call third-party Python scripts or, worse, waiting for online lookups would slow down the analysis too much.

Function panel

The function panel has been improved in this release, with the addition of two new features:

You can now quickfilter function names by typing letters while the control has focus (same as with the Files panel)
There is now a context menu when multiple functions are selected

We find the new context menu particularly useful, especially the Fill submenu. It lets you either zero out the selected functions OR all but the selected functions.

This is a very useful feature when you want to share part of a program with others. Or, in my case, when you want to add a sample to the Kesakode database, but you know that only a couple of functions are relevant.
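
In terms of raw bytes, the two Fill operations boil down to something like the sketch below. It works on a plain bytes buffer and (start, end) file ranges with an exclusive end. Both function names are invented here, and the actual menu actions in Malcat operate on the analysed file rather than on a standalone buffer.

```python
def _clamp(start, end, size):
    """Restrict a (start, end) range to the buffer boundaries."""
    return max(0, start), min(size, end)


def zero_ranges(data, ranges):
    """Zero out the given ranges and keep everything else."""
    output = bytearray(data)
    for start, end in ranges:
        start, end = _clamp(start, end, len(data))
        if start < end:
            output[start:end] = bytes(end - start)
    return bytes(output)


def zero_all_but(data, ranges):
    """Keep only the given ranges and zero out everything else."""
    output = bytearray(len(data))
    for start, end in ranges:
        start, end = _clamp(start, end, len(data))
        if start < end:
            output[start:end] = data[start:end]
    return bytes(output)


if __name__ == "__main__":
    sample = bytes(range(10))
    print(zero_ranges(sample, [(2, 4)]).hex())
    print(zero_all_but(sample, [(2, 4)]).hex())
```

Note that the file size and all offsets stay the same in both cases, which is what keeps the remaining functions analysable.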

Strings view

Similar to what we have done with the functions panel, the multi-strings context menu now also lets you zero out the selected strings or all but the selected strings. Again, pretty useful when sharing samples.

Another small improvement: hitting Ctrl+T when a string is selected in this view will open the transform dialog for the selected string, regardless of which bytes are currently selected in Malcat.

Save selection as new PE/ELF

If we stay in the same use case, i.e. you would like to focus on a sub-portion of a program, zeroing out the rest of the program is sometimes not enough. Sometimes, you want to embed it inside a new, smaller, independent runnable program. This could be the case if:

you have just decrypted a shellcode, and you want to run it inside a sandbox
you have located an interesting and functionally independent part of a malware that you would like to debug
you would like to share a subset of a program with someone, and be sure that it runs

We have added a new feature in Malcat for this exact purpose: save selection as new PE/ELF. This will create a new PE or ELF file with a single RWX section/segment and paste the currently selected bytes in it. Behind the scenes, this functionality will also take care of other tedious tasks:

header fields such as the size of image are populated with correct values
the entry point field of the program is set to point to the first identified function found in your selection, or to the beginning of your selection if no function could be found
the CPU architecture field of the new PE/ELF is set to the current CPU architecture

Figure 13: Save selection as new PE file

In the end, no magic is performed here: this is just a small utility feature to save you a couple of clicks, like copy/pasting your selection into an existing PE file. If you are curious, the underlying functions can be found in data/scripts/PE/save_as_new_pe.py and data/scripts/ELF/save_as_new_elf.py.
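
To demystify what such a barebone file looks like, here is a sketch that wraps raw bytes into a 64-bit little-endian ELF with a single RWX segment. It is written from the ELF specification for this post and is not the content of save_as_new_elf.py. The function name, the default load address and the choice to map the headers together with the code are all assumptions of the example.

```python
import struct

ET_EXEC = 2
EM_X86_64 = 62
EM_AARCH64 = 183
PT_LOAD = 1
PF_RWX = 7  # PF_X | PF_W | PF_R

EHDR_SIZE = 64  # size of the ELF64 file header
PHDR_SIZE = 56  # size of one ELF64 program header


def wrap_in_elf64(code, entry_offset=0, machine=EM_X86_64, base=0x400000):
    """Return an ELF64 file made of two headers followed by the given bytes."""
    if not 0 <= entry_offset <= len(code):
        raise ValueError("entry point outside of the selection")
    headers_size = EHDR_SIZE + PHDR_SIZE
    total_size = headers_size + len(code)
    # e_ident: magic, 64-bit class, little-endian, version 1, default OS ABI.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    file_header = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident, ET_EXEC, machine, 1,
        base + headers_size + entry_offset,  # e_entry
        EHDR_SIZE,                           # e_phoff: right after this header
        0, 0,                                # e_shoff, e_flags
        EHDR_SIZE, PHDR_SIZE, 1,             # e_ehsize, e_phentsize, e_phnum
        0, 0, 0)                             # no section headers
    # One segment mapping the whole file (headers included) at the base address.
    program_header = struct.pack(
        "<IIQQQQQQ",
        PT_LOAD, PF_RWX, 0, base, base, total_size, total_size, 0x1000)
    return file_header + program_header + code
```

For instance, wrap_in_elf64(shellcode, machine=EM_AARCH64) would produce a file whose entry point is the first byte of the shellcode. Whether it actually runs then depends on the code itself, since nothing is relocated or imported.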

Changelog

There have been several smaller quality of life improvements and bug fixes made to Malcat in this release. If you want the complete list, have a look at the changelog:

● Disassembler:
    - Added support for ARMv7 CPU 
    - Added support for Aarch32 CPU (aka Armv8 32bits)
    - Added support for Aarch64 CPU (aka Armv8 64bits)
    - Added opcode encoding details for MIPS32/64
    - magic masking is now working for MIPS32/64, ARMv7 and AARCH32/64
    - added gcc symbols demangler
    - Various small improvements to the CFG recovery algorithm
    - autocomment: better display of implicit references
    - autocomment: now try to display dereferenced value when non-ambiguous and from a read-only section
    - Magic masking a .NET instruction referencing a CLR token now tags the table-part of the token as a displacement instead of an offset (for finer-grained magic masking)
    - Hitting ';' while an instruction is selected opens the comment dialog
    - Hitting Ctrl+C with hex bytes selected and some bytes deselected will now respect the selection mask, like in the hex and text views
● Decompiler:
    - Added support for ARMv7 CPU 
    - Added support for Aarch32 CPU (aka Armv8 32bits)
    - Added support for Aarch64 CPU 
    - added gcc symbols demangler
    - Decompiler view: you can now view and edit user comments
    - Decompiler view: you can jump to the corresponding disassembly by clicking on the arrow symbol in the margin
    - Decompiler view: improved disassembly/decompiler synchronisation
    - Added a "Decompile at address" context menu action, alongside "Disassemble at"
    - Fixed a couple of issues regarding symbol dereferencing (API call dereferencing and read-only variable dereferencing)
● Kesakode/Intelligence:
    - Kesakode now supports MIPS32/64, ARMv7, and AARCH32/64
    - Added MalpediaFLOSSed as third-party provider
    - You can now select in the options a default provider for online Kesakode queries
    - The view control now displays an appropriate context menu when multiple functions or strings are selected 
    - Functions are now filtered by their _fully qualified_ name in the kesakode view (before was just by function name)
    - Hitting Ctrl+A will now select all the elements of the list having focus
    - Intelligence providers now accept bool and int options in addition to strings. These options can also be set in the options dialog using relevant controls. 
● Parsers:
    - Added parser for MACHO files
    - Added parser for Universal Binary archives (multi-arch MACHO files)
    - Added parser for DMG containers
    - Added parser for GGUF files (LLMs model files)
    - Added support for BZip2 compression in NSIS installers
    - Added support for ELF Pyinst files
    - Improved support for uncompressed files in InnoSetup installers
    - More symbol types supported in the ELF parser
    - Added parsing of .init_array and .fini_array to the ELF parser
    - PE parser now selects the right architecture for mips, arm and aarch64
● Strings:
    - Scan strings candidates are now rejected if referenced by a jump-like opcode
    - Hitting Ctrl+T in string view now pops up the transform dialog for the selected string
    - (Multi-)string context menu allows you to zero-out a string, or to zero out all strings but the one(s) selected
● Signatures:
    - Creating a new yara rule using the dialog will now try to save the rule in a yara file inside the user data directory, if any
    - Yara editor now has word wrapping enabled by default
    - improved the precision of MSVC rich header YARA rules, you get a version too now
● Source code view:
    - Ctrl+T now has precedence (i.e. calls the transform dialog) over scintilla's own shortcut (transpose line)
    - Added a "Transform" context menu action to the selection
● Transforms:
    - added xpress decompress (lz77) transform
    - added base85 encode/decode transforms
    - the "to clipboard" button always converted the output to a hex string beforehand. It now depends on the output content: utf16 string > utf8 string > hex string (in addition to the raw bytes)
● Functions panel:
    - Added quickfilter control
    - You can now select multiple functions
    - Added context menu when multiple functions are selected
    - (Multi-)function context menu allows you to zero-out a function, or to zero out all functions but the one(s) selected
● Anomalies:
    - CrossSectionJump: limit to PE files
    - Added WrongSizeOfOptionalHeader
● Scripting:
    - Function.num_highvalue_immediates now only counts _unique_ high-value immediates
    - Added Function.decompile() (decompile one function to C code, for machine code only)
    - Added Analysis.decompile() (decompile whole file, either all functions for machine code, or source script for autoit, vba, excel, etc)
● Other:
    - API hash constants are now also computed over API names with an ending null byte 
    - cross references also added for _computed_ addresses even in the absence of any read/write operation, e.g. mov eax, <base>; add eax, <delta>;
    - You can now save the current selection/a function into a new barebone PE file using the context menu
    - You can now save the current selection/a function into a new barebone ELF file using the context menu
    - Added rich header hash for PE files (same one as VT)
● Bug fixes:
    - Fixed alt-left/alt-right shortcuts not working under Linux
    - [Type dialog] the types list would scroll through _all_ matching fields after a user search, which could take a while for large searches
    - [Type dialog] Applying a user dynamic type to a file without a recognized type would fail
    - [Type dialog] The python error was not being properly cleared, leading to repeated errors for dynamic types that raise any exception
    - Fixed a bug in the parsing of delayed imports for PE32+, only the first one would be parsed
    - The gap analysis algorithm would ignore instructions crossing basic block boundaries
    - Comments on bit fields would not be shown on mouse hover
    - The view switcher in the tool bar will now be wider for larger font sizes 
    - Fixed a regression in the PYINST parser
    - Fixed decompiler view not being refreshed the second time a local variable is renamed
    - Fixed missing import statement when creating a new transform from the transform dialog in an existing but empty python file
    - Fixed a conflict between Analysis.open_vfile and some of the parsers disagreeing on a leading '/' for paths
    - Fixed a bug when loading a previously saved transform chain featuring a transform with an empty string / bytes parameter
    - Fixed hover code preview window being adjusted in width to the first disassembled instruction
    - Clicking on a matching pattern in the Yara quick view now jumps to the address AND registers the jump in the navigation history
    - Fixed incomplete parsing of exports by ordinals in PE files
    - Fixed a display lag in disassembly view for instructions having a VERY large (like > 10k) count of incoming/outgoing references (thank you flareon #5)