Today we are happy to announce the release of version 0.8.5! After some much-needed vacations we are back in business with a bunch of improvements to the software. This time, we have focused on user feedback and addressed most of Malcat's user experience shortcomings. So what are the improvements in this new version?

Open arbitrarily large files using "Big File Mode"
Improved disassembly view user experience
Improved Yara integration
Added support for VHD images and FAT32 file systems

Let us see what it is all about.

Big File mode

Many of you wished to open very large files (1 GB and more) within Malcat. To be clear, Malcat has been primarily designed to inspect malicious (thus usually not so large) files. And since it features many complex analyses that standard hexadecimal editors lack, it can't rely on a "read from disk on demand" paradigm. It would be possible in theory, but it would not be very fast (since most of Malcat's algorithms use random access patterns) and would complicate things a lot. That's why, by default, the whole file is read into memory.

But with version 0.8.5, we've added the possibility to open files using memory mapping (aka mmap or CreateFileMapping). To be fair, the code was in Malcat from the beginning, but we had deactivated it since memory mapping suffers from a major drawback: when a file is opened in read-write mode, the software is not in control of when and where the modified data is written to disk. And that conflicts with Malcat's save and undo/redo operations.

To circumvent this issue, we introduce "Big File Mode" in Malcat 0.8.5. In "Big File Mode", arbitrarily large files can be opened using memory mapping. The trick is that the file is opened in read-only mode and we use a MAP_PRIVATE or WRITE_COPY memory mapping. This means that modifications are not file-backed but written to newly allocated memory, so the software stays in control of saving modified bytes. But since we opened the file in read-only mode, modifications have to be saved to another file. That's a small price to pay (one does not modify 1 GB+ files every day) to support arbitrarily large files.
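To make the idea concrete, here is a minimal Python sketch of a copy-on-write mapping. It is only an illustration, not Malcat's actual code: it relies on the standard `mmap` module, whose `ACCESS_COPY` mode gives a private, copy-on-write view of a file opened read-only, and the helper names (`open_copy_on_write`, `save_as`, `demo`) are made up for this example.

```python
import mmap
import os
import tempfile


def open_copy_on_write(path):
    """Map a file so that writes land in private memory, never in the file."""
    with open(path, "rb") as f:  # the file itself is opened read-only
        # ACCESS_COPY requests a private copy-on-write mapping.
        # A length of 0 maps the whole file.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def save_as(view, new_path):
    """Persist the (possibly modified) view to a different file."""
    with open(new_path, "wb") as out:
        out.write(view)  # mmap objects expose the buffer protocol


def demo():
    """Patch two bytes in memory and show that the original file is untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "sample.bin")
        patched = os.path.join(tmp, "sample_patched.bin")
        with open(original, "wb") as f:
            f.write(b"MZ" + bytes(62))

        view = open_copy_on_write(original)
        view[0:2] = b"ZM"          # modification goes to newly allocated pages
        save_as(view, patched)     # saving has to target another file
        view.close()

        with open(original, "rb") as f:
            original_magic = f.read(2)
        with open(patched, "rb") as f:
            patched_magic = f.read(2)
        return original_magic, patched_magic
```

Calling `demo()` patches the first two bytes in the mapped view only: the original file still starts with `MZ`, while the file written by `save_as` starts with `ZM`.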

Figure 2: CPU and memory consumption on big files

In addition to memory mapping, Malcat will automatically display a "Big File Mode" dialog when you open very large files (the threshold can be configured in the options and defaults to 256 MB). The dialog invites you to deactivate the most time-consuming analyses in order to speed up the user experience. Because no, we are not wizards: doing file carving or scanning for cross-references on a 5 GB+ file is going to be somewhat slow no matter what. If you deactivate everything, Malcat turns into a rather basic hexadecimal editor, but opening the file will be instant.

We hope that you will enjoy this feature. Don't hesitate to send us feedback, since it's still in beta.

Disassembly view

Many users coming from IDA did not really like the feel of Malcat's disassembly view. To be fair, the view lacked some features. The main complaints were that opcodes could not be selected and that there was no copy/paste support. That is now a thing of the past.

Opcodes selection

In addition to selecting opcode bytes in the hexadecimal columns, you are now able to directly select opcodes from within the disassembly view. The selection is synchronized across all views as before. Adding comments to opcodes is now performed using the context menu, since it was not possible to keep it the way it was before.

Just try it out: I think it feels right now. The selection user experience is not exactly like in IDA, since only full lines can be selected. But hey, who said we have to be IDA?

Disassembly copy support

Another complaint was the lack of copy support in the disassembly view. This has been fixed: hit Ctrl-C from within the disassembly view with some opcodes selected and the disassembly listing will be copied to the clipboard (with hexadecimal bytes, colors and formatting too). We limited the number of opcodes that can be copied to a maximum of 10000, otherwise it would make the UI lag (disassembling is CPU-heavy after all). If you need to copy more code to the clipboard in your workflow, send us an email. Maybe we can add a mass-disassembly option where opcodes are copied to the clipboard in a background thread.

Figure 3: Opcode selection and copying

Also note that a new copy option has been added to the selection context menu. This means that you can now copy any selected bytes (from within any view) as Disassembly or Hexadecimal + disassembly to the clipboard. The 10000-opcode limit applies here as well.

Local labels

Some time ago, we introduced local labels to the disassembly. Labels which are only referenced by jumps from within the current function are called local labels and are named .1, .2 ... .N instead of the usual loc_XXXXX. We find that this makes the disassembly more readable (in particular if you are familiar with nasm syntax) and also gives extra information to the user, i.e. whether the label is referenced from outside the current function.

But in the end it is a matter of taste, and some of you would like to have it like in IDA (every label is named loc_XXXXX). So we've added an option in the Options dialog: Code view->Local labels. Note that it is set by default, because we still think that it's a nice feature.
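The naming rule can be sketched in a few lines of Python. This is a simplified illustration and not Malcat's implementation: functions are modeled as hypothetical (start, end) address ranges, jumps as (source, target) pairs, and the exact loc_ formatting is an assumption.

```python
def find_function(functions, address):
    """Return the (start, end) range containing address, or None."""
    for start, end in functions:
        if start <= address < end:
            return (start, end)
    return None


def name_labels(functions, jumps, use_local_labels=True):
    """Name every jump target: .N for local labels, loc_XXXXX otherwise."""
    sources_by_target = {}
    for source, target in jumps:
        sources_by_target.setdefault(target, []).append(source)

    names = {}
    counters = {}  # per-function counter for local labels
    for target in sorted(sources_by_target):
        owner = find_function(functions, target)
        # A label is local when every jump to it comes from its own function.
        is_local = (
            use_local_labels
            and owner is not None
            and all(find_function(functions, s) == owner
                    for s in sources_by_target[target])
        )
        if is_local:
            counters[owner] = counters.get(owner, 0) + 1
            names[target] = f".{counters[owner]}"
        else:
            names[target] = f"loc_{target:X}"
    return names
```

Passing `use_local_labels=False` mimics unchecking the option: every target falls back to the loc_XXXXX style.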

Yara integration

Disassembly

When carefully chosen, a small machine code snippet can go a long way in improving the quality of a Yara rule. Malcat has been able to add bytes or strings to the current Yara rule for some time now. Today, we have extended this capability by also supporting disassembly listings. Select the instructions you want in the disassembly view or any of the data views, and from the context menu choose: Selection->Add selection to yara->Add to Yara rule XXX (disassembly). The selected bytes will be added to the current Yara rule as a hexadecimal pattern. The pattern will be prefixed by a large comment showing the actual machine-code representation of these bytes, which should greatly improve the rule's maintainability.

Note that this feature is not limited to x86 and works for all architectures supported by Malcat.
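To give an idea of what such a commented pattern can look like, here is a small Python sketch. It does not reproduce Malcat's exact output: the function name, the layout of the comment and the `$code_1` identifier are assumptions, and the disassembly text is supplied by the caller rather than computed.

```python
def disassembly_to_yara_string(name, instructions, indent="        "):
    """Build a Yara hex string preceded by a comment listing the instructions.

    instructions: list of (raw_bytes, disassembly_text) tuples, in order.
    """
    lines = [f"{indent}/*"]
    for raw, text in instructions:
        hex_bytes = " ".join(f"{b:02X}" for b in raw)
        # One comment line per instruction: raw bytes, then the mnemonic.
        lines.append(f"{indent}    {hex_bytes:<24}{text}")
    lines.append(f"{indent}*/")

    # The pattern itself is simply all instruction bytes, in order.
    pattern = " ".join(f"{b:02X}" for raw, _ in instructions for b in raw)
    lines.append(f"{indent}${name} = {{ {pattern} }}")
    return "\n".join(lines)


# A classic x86 function prologue as example input.
prologue = [
    (bytes([0x55]), "push ebp"),
    (bytes([0x8B, 0xEC]), "mov ebp, esp"),
    (bytes([0x83, 0xEC, 0x10]), "sub esp, 0x10"),
]
snippet = disassembly_to_yara_string("code_1", prologue)
```

The resulting text can be pasted into the strings section of a rule. Since the comment lives next to the pattern, a reviewer can see which instructions the bytes encode without re-disassembling them.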

New rule dialog

In order to make the life of malware analysts a bit easier, we have improved the "new yara rule" dialog. You are now offered several options when creating the rule, and your input is used to populate the rule's metadata. Some metadata (like "created" or "sample") are populated automatically for you.

For some controls (the rule's tag or rule category) you are offered a dropdown list already populated with the tags and categories Malcat found in your Yara ruleset. Most of your choices will be remembered and recalled the next time this dialog is shown. This should make the process of creating a new rule a lot faster.

A quick tip: if you are not happy with the way new rules look, you can edit the new rule template located in <malcat installation directory>/data/signatures/new_yara_rule.tpl.

Corpus search

The paid version of Malcat has a powerful feature named corpus search. In the Options dialog, you can define a set of corpora, each of which is nothing more than a labeled directory. You may choose directories holding clean files (useful to check for false positives) or malware files (useful for attribution), or anything you want.

Once corpora have been defined, Malcat offers the possibility in its context menu to search for a selected object inside all the files found in your corpus directories (directories are scanned recursively). Until now, you were limited to searching for strings or byte patterns, but in version 0.8.5 you may also scan against a selected Yara rule. You may choose to display only perfect matches (i.e. the Yara rule matches the file) or also partial matches (the Yara rule condition may fail, but at least one pattern of the rule matches).

Figure 8: Whole file corpus scan against selected Yara rule

Note that the scan is performed in parallel using the number of threads you have specified in the Options dialog. We hope that detection engineers will find this feature useful when writing Yara rules.
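The difference between perfect and partial matches, and the parallel recursive scan, can be illustrated with the following Python sketch. It is a deliberately simplified stand-in and not how Malcat works internally: plain byte patterns replace a real Yara rule, "all patterns present" plays the role of the rule condition, and all function names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def iter_files(root):
    """Yield every file below root, scanning directories recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)


def scan_file(path, patterns):
    """Return (path, number of patterns found in the file)."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return path, 0  # unreadable files simply do not match
    return path, sum(1 for pattern in patterns if pattern in data)


def scan_corpus(root, patterns, threads=4, allow_partial=False):
    """List files matching all patterns, or at least one if allow_partial."""
    needed = 1 if allow_partial else len(patterns)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda p: scan_file(p, patterns), iter_files(root))
        return sorted(path for path, hits in results if hits >= needed)
```

With `allow_partial=True`, a file containing only one of the patterns is reported too. For a real rule this is the case where the condition fails even though some of its strings were found, which is often a good hint for tuning the rule.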

Disk images support

VHD images

For a few months now, after Microsoft strengthened Office's security, we have seen threat actors switch to the ISO and UDF (aka .img and .iso) file formats for their malicious email campaigns. So we added support for those file systems to Malcat. And now, on a smaller scale (hi Bumblebee), we see the emergence of Virtual Hard Disk images (.vhd), the format used by Microsoft's Hyper-V. The idea is the same: some malicious executables / scripts / docs / shortcuts are packaged inside a virtual disk image, waiting for the user to mount the drive and click on them.

Figure 9: Analysing a malicious VHD image

So in this release, we have added a parser for .vhd files. We did not add support for static disks, since they are just a raw dump of the disk content followed by a small 512-byte footer: Malcat's carving algorithm is more than enough to handle such images. But dynamic disk images are more complex, and the new VHD parser lets you identify and unpack the content of the disk in a few clicks (click on the "used_space" file and you will get a raw view of the disk's content).
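For the curious, telling static and dynamic images apart only requires a look at that 512-byte footer. The Python sketch below follows the published VHD format layout (big-endian fields, "conectix" cookie at offset 0, disk type at offset 60). It is an independent illustration, not Malcat's parser, and the function name and returned dictionary are made up for this example.

```python
import struct

VHD_COOKIE = b"conectix"
VHD_DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def parse_vhd_footer(image):
    """Read a few fields from the footer stored in the last 512 bytes."""
    if len(image) < 512:
        raise ValueError("too small to contain a VHD footer")
    footer = image[-512:]
    if footer[:8] != VHD_COOKIE:
        raise ValueError("VHD cookie not found")

    # All multi-byte fields in the footer are big-endian.
    (data_offset,) = struct.unpack_from(">Q", footer, 16)   # dynamic header offset
    (current_size,) = struct.unpack_from(">Q", footer, 48)  # virtual disk size
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    return {
        "disk_type": VHD_DISK_TYPES.get(disk_type, f"unknown ({disk_type})"),
        "current_size": current_size,
        "data_offset": data_offset,
    }
```

For a static ("fixed") disk, everything before the footer is the raw disk content. For a dynamic disk, `data_offset` points to an additional header describing which blocks are actually allocated, which is why a dedicated parser is needed.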

FAT12 / FAT16 / FAT32 support

Adding support for disk images is a good start, but it's not enough. Hard disks are divided into partitions, each partition using a file system to store its files. In malware campaigns, since the virtual disk images are usually rather small, the threat actors use FAT file systems to store their malicious documents. We have seen FAT12 file systems (mostly for small static images) and FAT32 file systems (for large and/or dynamic images). So we have decided to add support for the FAT12, FAT16 and FAT32 file systems to Malcat too. From now on, Malcat is able to automatically identify FAT file systems, and the user can navigate the directory tree and extract any file from it.

It was a relatively easy task since the FAT file system is really simple. Note that Malcat won't be able to handle huge directory trees (like 10k+ files), since currently the whole file system is loaded and displayed at analysis time. We may introduce lazy tree navigation in the future to tackle the issue. In the meantime, this should be more than enough to perform malware analysis. Identifying FAT file systems may also be useful for reverse engineers analysing firmware, a use case we would like to focus on in the future.
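As an example of that simplicity, the FAT variant is determined solely by the number of data clusters, which can be computed from a handful of boot sector fields. The Python sketch below applies the rule from Microsoft's FAT specification (fewer than 4085 clusters means FAT12, fewer than 65525 means FAT16, otherwise FAT32). It is an illustration written for this post, not Malcat's detection code, and the sanity checks are our own assumptions.

```python
import struct


def detect_fat_type(boot_sector):
    """Return 'FAT12', 'FAT16' or 'FAT32', or None if this is not a FAT volume."""
    if len(boot_sector) < 512 or boot_sector[510:512] != b"\x55\xaa":
        return None

    # BIOS Parameter Block fields, little-endian, starting at offset 11.
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, num_fats,
     root_entries, total_sectors16, _media, fat_size16) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)
    total_sectors32, fat_size32 = struct.unpack_from("<II", boot_sector, 32)

    # Basic sanity checks to reject random data.
    if bytes_per_sector not in (512, 1024, 2048, 4096):
        return None
    if sectors_per_cluster == 0 or num_fats == 0:
        return None

    fat_size = fat_size16 or fat_size32
    total_sectors = total_sectors16 or total_sectors32
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_sectors = total_sectors - (
        reserved_sectors + num_fats * fat_size + root_dir_sectors)
    if fat_size == 0 or data_sectors <= 0:
        return None

    cluster_count = data_sectors // sectors_per_cluster
    if cluster_count < 4085:
        return "FAT12"
    if cluster_count < 65525:
        return "FAT16"
    return "FAT32"
```

Note that the type string stored in the boot sector (e.g. "FAT16   ") is purely informational according to the specification, which is why the sketch ignores it and relies on the cluster count only.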

Full changelog

Here is the complete changelog of this release:

● Yara:
    - You can now add code/selection/dynamic strings to the current yara rule as "disassembly"
    - Improved new rule dialog in Signatures view
    - Added "search pattern matches in current file" action in yara rule context menu
    - Added "search in corpus" action in context menu to scan whole corpus for selected rule
    - Added "search in corpus (partial matches allowed)" action in context menu to scan whole corpus for any string match
    - Add comments when adding named annotation to the current Yara rule
    - Automatically switch to Yara editor after adding a new pattern to current Yara rule (can be overridden in Options)
● Disassembly view:
    - Disassembly listing can now be selected using the mouse
    - Selected disassembly can now be copied to the clipboard, with or without raw bytes (Ctrl+C or ContextMenu->Selection->Copy as->Disassembly)
    - Adding comments is now done using the context menu (no more clicking in the comment column)
    - Removed "Disasm - Long jump arrows" options from Options dialog 
    - Added "Disasm - Smart Labels" options to Options dialog
    - Only dynamic/stack strings are now shown in disassembly comments
● "Big File" mode:
    - Optional use of memory mapping for huge files
    - Config dialog when opening huge files 
    - Added "Big File Mode threshold" options in Options dialog
    - Optimised UDF magic regexp (improved overall performance on huge files)
● .NET:
    - Fixed parsing of very long resource names
    - Added parsing of function parameters names, visible in function name
    - Added parsing of function flags (static public, etc.)
    - Added proper support for nested classes
    - Better detection of obfuscated class/function/field names
    - Obfuscated names are now replaced by a special identifier by default (#obfuscated_id_xxx)
    - Added new option "deobfuscate symbols" in "Analysis Setup" Options panel.
● New parsers:
    - Added support for .VHD images (dynamic disks only)
    - Added support for FAT12 file system 
    - Added support for FAT16 file system 
    - Added support for FAT32 file system 
● Misc:
    - Added "reapply last transform" entry to context menu (if you want to decrypt stuff in batch)
    - Improved memory consumption for pattern searches in corpus directories
    - Increased maximal search size for dynamic strings
    - Added imphash hash in Summary view
    - Yara rules now have a context menu in Summary view and Signatures view
    - Added context menu for found constants
    - Added "edit bytes" context menu
    - Added "search in current file" context menu action for strings and selected bytes
    - Added "maximum columns" option in Options>Dataview for hexadecimal and structure view
    - Anomaly quickview now switches to structure view when the anomaly is located inside an identified structure.
● Bug fixing:
    - Fixed: error in golang function parsing for go 1.18+
    - Fixed: crash when adding/removing new corpus directories in Options dialog
    - Fixed: some false positives for cobalt strike file parser
    - Fixed: memory exhaustion for range preview on very large ranges
    - Fixed: closing a sub-file while a python user script is running would cause a crash
    - Fixed: utf16-be strings would not be properly displayed in structure view
    - Fixed: jumping to end of file would not work in data view for files bigger than 2 GB
    - Fixed: invalid parsing of strings > 64K in MSI installer tables
    - Fixed: error in .NET parser in the HasCustomAttribute index
    - Fixed: regression on data view's colored scrollbar display: annotations were not in sync
    - Fixed: updated requirements.txt to exclude pyasn1>=0.5.0