0.9.3 is out: python, python, python (and firmwares)

Sun 08 October 2023 malcat team news

Today we are happy to announce the release of version 0.9.3. This release focuses mainly on the python bindings and comes with the new headless scripting mode! Several new classes and methods have also been added in the meantime, making the python bindings as powerful as the user interface. In addition, several other improvements have been made to the software:

  • Added parsing and unpacking support for firmware images: JFFS2, SquashFS and UImage
  • Better Rust support: CFG reconstruction update and a RUST-specific string extraction algorithm
  • Improved the Yara module: OpenSSL-based attributes (e.g. pe.number_of_signatures) are now available
  • Various user interface improvements
  • A few bugs have been fixed (thank you again for forwarding them!)
  • ... and the usual doc / anomalies / yara signatures updates

Also, since the headless mode has been added for paid users, we thought it would be nice to move one of the paid features into the lite version. So, lite users can now enjoy fast multithreaded analysis too! Note that you may need to increase the Number of threads option in Edit > Preferences > Analysis setup.

Python, python, python

New headless mode

With version 0.9.3 we have bundled Malcat's binary analysis into a python module. The module is available to full & pro users. Using the module you can use Malcat's code and data analyses without having to run the GUI (aka headless scripting). That's the way to go if you want to process a large number of files. See the user manual for how to install and use it.

Figure 1: Malcat's headless python mode

We think it's a pretty cool addition to Malcat and it adds many possibilities:

  • Export/import analysis details to/from other code analysis tools
  • Enable batch file analysis to solve complex problems: malware classification, detection, etc.
  • Write command line tools

While the python module is still in beta, we're very happy about this change and chances are it will bring many great functionalities in the future.
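
As a rough illustration of batch processing in headless mode, here is a minimal sketch. It assumes that malcat.analyse() accepts a file path and that the returned analysis object exposes its detected file type as analysis.type; check the user manual for the exact calling convention. The detect_types helper is hypothetical, and the analyse function is passed in as a parameter so the loop can be tried without Malcat installed.

```python
def detect_types(paths, analyse):
    """Map each path to its detected file type, or to an error message.

    `analyse` is injected on purpose: in real use it would be malcat.analyse
    (assumed here to accept a file path), in tests it can be any stand-in.
    """
    results = {}
    for path in paths:
        try:
            analysis = analyse(str(path))
            results[str(path)] = str(analysis.type)
        except Exception as exc:
            # One broken file should not stop the whole batch.
            results[str(path)] = f"error: {exc}"
    return results

# Hypothetical usage with a full or pro install:
#   import malcat
#   print(detect_types(["sample1.bin", "sample2.bin"], malcat.analyse))
```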

Renaming

Before the headless mode, python scripting in Malcat was limited to the script editor window. Thus it was not a big deal that Malcat's bindings module was (stupidly) named bindings since it would only be used from within Malcat. But with the addition of the headless python module, this had to change. In particular, two non-backward compatible renamings took place:

  • Malcat's python module is now named malcat, which makes a lot more sense
  • as a consequence, the current analysis object in the script editor has now been renamed from malcat to analysis

All of Malcat's parsers, templates, scripts and anomalies have been rewritten accordingly. If you wrote your own scripts, you will have to update them too (well, do two string replacements, that should be ok).
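
The order of the two replacements matters: the old analysis object name (malcat) has to be renamed before the old module name (bindings) takes its place, otherwise the freshly renamed module would be rewritten a second time. A minimal sketch, where migrate_script is a hypothetical helper and some_attribute is only a placeholder:

```python
import re

def migrate_script(source: str) -> str:
    """Apply the two 0.9.3 renamings to the text of a pre-0.9.3 script."""
    # Step 1: the analysis object `malcat` becomes `analysis`.
    source = re.sub(r"\bmalcat\b", "analysis", source)
    # Step 2: the module `bindings` becomes `malcat`.
    source = re.sub(r"\bbindings\b", "malcat", source)
    return source

old = "import bindings\nx = malcat.some_attribute\n"
print(migrate_script(old))
```

Note that this is a purely textual replacement: it also touches matching words inside comments and string literals, so review the result before saving it.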

Additional bindings

A few additional python bindings have been added in version 0.9.3. They focus on user annotations and editing:

  • User comments now have bindings: you can add, edit and navigate through user comments
  • User highlighted regions (like comments but on data ranges and with colors) now have bindings: you can add, edit and navigate through them
  • You can now force and unforce function definitions in python
  • You can now force and unforce custom type definitions in python
  • You can now view and dump virtual files from python
  • You can now override the detected file type from python
  • You can now override the detected CPU architecture from python
  • Helper functions have been added to pretty print or convert addresses
  • [Headless mode] You can analyse arbitrary files or byte buffers via the malcat.analyse() function
  • You can invalidate and rerun analyses (useful if you have edited the files or added user annotations)
  • You can apply templates to an analysis object to generate a report
  • You have access to the analysis undo/redo manager from python
  • You can load and save projects to disk from python

While the python bindings are not 100% complete, they should now be almost as powerful as the user interface. If you want to see them in action, have a look at the new command-line tool in <malcat dir>/bin/malcat.report.py: a simple tool to generate a textual report for a file using the template() method.

File formats

Firmware support (JFFS2, SquashFS and UImage)

If there is one domain where one encounters unknown binary files in large quantities, it is the subtle art of firmware reversing. In order to help our fellow hardware reversers, we have added support for the three most common image formats:

  • UImage is more a container than a file system, but it is often used in firmware nonetheless. Malcat can identify the container and extract its content, provided a lzo/lzma/gzip/bzip2 compression is used.
Figure 2: Opening a UImage container
  • JFFS2 is a log-structured file system designed for use on flash devices in embedded systems. Malcat supports all LSB file systems (if you have an MSB one, please send it to us), and can unpack files in-app, provided lzo/lzma/rtime/zlib compression is used.
Figure 3: Exploring a JFFS2 file system
  • SquashFS is a compressed read-only file system for Linux. Malcat can parse it and its different streams, and can unpack files in-app, provided lzo/lzma/xz compression is used.
Figure 4: SquashFS file system

And that is only a start! We feel that Malcat could be useful in this area, so you are likely to see additional improvements in the future. For instance, ARM/MIPS disassembly support, in order to have a look at all these extracted programs.

Note that like with all other parsers in Malcat, when a format is supported, it means that you will be able to see its internal structures and also identify it when embedded inside larger files (aka file carving).
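
To give an idea of what carving means for these three formats, here is a naive magic-byte scan. It is only a sketch and not how Malcat's parsers work: a real parser validates the header fields and checksums behind each candidate, whereas this one reports every raw signature hit. The signatures used are the UImage header magic (0x27051956, stored big-endian), the SquashFS magic ("hsqs" for little-endian images, "sqsh" for big-endian ones) and the little-endian JFFS2 node magic 0x1985 followed by a common node type. The names MAGICS and carve are made up for the example.

```python
MAGICS = {
    "UImage": [b"\x27\x05\x19\x56"],
    "SquashFS": [b"hsqs", b"sqsh"],
    # JFFS2 LSB: magic 0x1985 + node type (dirent, inode, cleanmarker).
    "JFFS2": [b"\x85\x19\x01\xe0", b"\x85\x19\x02\xe0", b"\x85\x19\x03\x20"],
}

def carve(data: bytes):
    """Return sorted (offset, format) pairs for every signature hit in data."""
    hits = []
    for name, signatures in MAGICS.items():
        for sig in signatures:
            pos = data.find(sig)
            while pos != -1:
                hits.append((pos, name))
                pos = data.find(sig, pos + 1)
    return sorted(hits)

# A fake firmware blob: padding, a UImage magic, padding, a SquashFS magic.
blob = b"\x00" * 16 + b"\x27\x05\x19\x56" + b"\x00" * 60 + b"hsqs" + b"\x00" * 10
print(carve(blob))
```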

Improved ZIP unpacking

The ZIP parser has seen some minor improvements too:

  • The parser should now run about 30% faster, which is always nice to have.
  • We have fixed a small bug which prevented unpacking files compressed in stream mode (where the compressed size is given in a later DataDescriptor entry).
  • We have added (basic) support for AES-encrypted ZIP (WzAES compression method).

Note that since the python zipfile module lacks support for AES-encrypted ZIP archives, unpacking is done manually in python and we only support AES+deflate or AES+bzip2 combos. Anyway, it should be just enough to open password-protected ZIP files coming from MalwareBazaar.

Did you know: Malcat can automatically open in-app files from a password-protected ZIP archive if the password is either "infected", "virus" or "malware". No need to unpack them on disk! This is also true for 7Z archives, provided you have the py7zr python library installed.
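
The password-guessing idea can be sketched with nothing but python's standard zipfile module. This is an illustration and not Malcat's implementation: zipfile only handles the legacy ZipCrypto encryption, so AES-encrypted entries raise NotImplementedError here, which is exactly the gap mentioned above. The names KNOWN_PASSWORDS and read_member are made up for the example.

```python
import io
import zipfile
import zlib

KNOWN_PASSWORDS = (b"infected", b"virus", b"malware")

def read_member(archive: bytes, name: str, passwords=KNOWN_PASSWORDS):
    """Return (data, password) for a member, trying each known password.

    The password is None when the member is not encrypted at all.
    """
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)
        if not info.flag_bits & 0x1:  # bit 0 of the flags marks encryption
            return zf.read(name), None
        for pwd in passwords:
            try:
                return zf.read(name, pwd=pwd), pwd
            except NotImplementedError:
                raise  # e.g. AES entries, which zipfile cannot decrypt
            except (RuntimeError, zipfile.BadZipFile, zlib.error):
                continue  # wrong password: try the next candidate
    raise ValueError(f"no known password opens {name!r}")

# Demo on an unencrypted in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sample.bin", b"MZ\x90\x00")
print(read_member(buf.getvalue(), "sample.bin"))
```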

Better PE debug info parsing

We have made a small improvement to the PE parser: all debug information entries are now parsed.

Figure 5: Malcat parsing multiple debug information entries

Also a video from @struppigel made me aware of the REPRO debug info type. This type of debug info is now parsed correctly and additionally, all PE timestamp-based anomalies are now set to silent when REPRO debug entries are found.
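
For readers unfamiliar with the layout, the PE debug directory is an array of 28-byte IMAGE_DEBUG_DIRECTORY entries, and REPRO is debug type 16. In reproducible builds the timestamp fields no longer hold a build date, which is why timestamp-based anomalies are not meaningful for such files. The sketch below walks a raw debug directory buffer and checks for a REPRO entry; it assumes the directory bytes have already been located and read, and the helper names are made up for the example.

```python
import struct

IMAGE_DEBUG_TYPE_REPRO = 16

# Characteristics, TimeDateStamp, MajorVersion, MinorVersion,
# Type, SizeOfData, AddressOfRawData, PointerToRawData
ENTRY = struct.Struct("<IIHHIIII")  # 28 bytes per entry

def debug_types(directory: bytes):
    """Return the Type field of every entry in a raw debug directory."""
    count = len(directory) // ENTRY.size
    return [ENTRY.unpack_from(directory, i * ENTRY.size)[4] for i in range(count)]

def has_repro(directory: bytes) -> bool:
    """True if at least one debug entry is of type REPRO."""
    return IMAGE_DEBUG_TYPE_REPRO in debug_types(directory)

# Two synthetic entries: CodeView (type 2) followed by REPRO (type 16).
codeview = ENTRY.pack(0, 0x12345678, 0, 0, 2, 0, 0, 0)
repro = ENTRY.pack(0, 0x12345678, 0, 0, IMAGE_DEBUG_TYPE_REPRO, 0, 0, 0)
print(debug_types(codeview + repro), has_repro(codeview + repro))
```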

Rust

Better CFG reconstruction

This is not big news: Rust usage is increasing, and it means that we're starting to see it used in malicious software too. While Rust-compiled binaries do not differ a lot from say Golang binaries, there were a few minor differences which annoyed our CFG reconstruction algorithm.

Figure 6: Rust's use of UD2 after noreturn calls

For instance, noreturn calls are followed by a UD2 instruction, which is an artifact of the unreachable instruction in LLVM. Well, Malcat can now recognize this call pattern as a noreturn call. A couple of other LLVM-specific artifacts are now better recognized by the CFG reconstruction algorithm.
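
The pattern itself is easy to picture at the byte level: on x86, UD2 encodes as 0F 0B and a direct near call as E8 followed by a 32-bit displacement. The sketch below is a deliberately naive byte scan, not Malcat's algorithm: it does no disassembly, so it can report false positives whenever those bytes happen to appear inside other instructions or data. The function name is made up for the example.

```python
UD2 = b"\x0f\x0b"
CALL_REL32 = 0xE8
CALL_SIZE = 5  # opcode byte + 32-bit displacement

def calls_followed_by_ud2(code: bytes):
    """Offsets of `call rel32` candidates that are directly followed by UD2."""
    hits = []
    pos = code.find(UD2)
    while pos != -1:
        start = pos - CALL_SIZE
        if start >= 0 and code[start] == CALL_REL32:
            hits.append(start)
        pos = code.find(UD2, pos + 1)
    return hits

# nop; call +0x10; ud2; ret
snippet = b"\x90" + b"\xe8\x10\x00\x00\x00" + b"\x0f\x0b" + b"\xc3"
print(calls_followed_by_ud2(snippet))
```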

String extraction heuristics

Similar to Golang, Rust compiled programs do not use standard null-terminated strings, but instead rely on (string pointer, string size) pairs. These pairs can be located in arrays, or more annoyingly hardcoded directly in the compiled code. To make things worse, strings are often stored side by side with no delimiter between them, which makes the standard linear-sweep search algorithm useless when trying to reconstruct Rust strings.

Figure 7: Recovering Rust strings using heuristics

In order to have better string identification in Malcat for Rust programs, we've added a set of new string extraction heuristics targeted at the most common string loading patterns found in Rust. These heuristics are still in their early days, but you should benefit from much better string extraction from now on.
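
To see why (pointer, size) pairs matter, consider the simplest case where the pairs sit in an array. The toy example below is not Malcat's heuristic: it assumes a 64-bit little-endian target, a table made only of consecutive (pointer, size) pairs, and a known base address for the string blob. The function name and the addresses are made up for the example.

```python
import struct

PAIR = struct.Struct("<QQ")  # (pointer, size), 64-bit little-endian

def strings_from_pairs(blob: bytes, table: bytes, base: int = 0):
    """Decode the strings referenced by consecutive (pointer, size) pairs.

    `blob` holds the undelimited string data, loaded at address `base`.
    """
    usable = len(table) - len(table) % PAIR.size  # ignore a trailing partial pair
    out = []
    for ptr, size in PAIR.iter_unpack(table[:usable]):
        start = ptr - base
        if start >= 0 and start + size <= len(blob):
            out.append(blob[start:start + size].decode("utf-8", errors="replace"))
    return out

# "hello" and "world" are stored back to back with no delimiter.
blob = b"helloworld"
table = struct.pack("<QQQQ", 0x1000, 5, 0x1005, 5)
print(strings_from_pairs(blob, table, base=0x1000))
```

A linear sweep over the same blob would only see the single run "helloworld"; the sizes stored next to the pointers are what make it possible to split it correctly.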

Yara

Crypto functionalities

We are now linking Yara against the OpenSSL library. While it adds more than 2 MB to the library size, it enables crypto-related features in Yara rules, like for instance pe.number_of_signatures. If you had to deactivate a few of your rules because of such fields, give them another try!

Improved Yara dialog

The new Yara rule dialog has been improved a bit: you can now override in which Yara file the new rule will be created. Before, it was only possible to add a rule to the currently opened Yara file.

Figure 8: Add pattern to a new Yara rule

We have also added a new context menu action for strings, byte ranges and disassembly: add to new Yara rule. This lets you add the currently selected string/bytes/code to a new Yara rule, saving you a few clicks.

Full changelog

Here is the complete changelog of this release:

● The lite version can now use multithreaded analysis! 
● Python:
    - The current analysis object in scripts has been renamed "analysis" (was "malcat")
    - Renamed bindings module to "malcat" (was "bindings")
    - A new python headless mode was added to full & pro versions! You can now import the malcat module from any python interpreter & perform batch analyses!
    - Added "malcat.analyse()" method to the malcat module in headless mode
    - You can now view and edit user comments from python (analysis.comments)
    - You can now force/unforce function starts from python (analysis.fn.(un)force)
    - You can now set custom data types from python (analysis.struct.(un)force)
    - You can now view and edit user highlighted regions (analysis.highlights)
    - You can now view and dump virtual files from python (analysis.vfiles)
    - You can now override the detected file type from python (analysis.type = ...) 
    - You can now override the CPU architecture from python (analysis.architecture = ...) 
    - Added methods to drive the analysis (analysis.invalidate, analysis.run)
    - Made bindings for the analysis error log (analysis.log, analysis.status, analysis.last_error, analysis.failed)
    - You can now load and save Malcat projects (with all user modifications) from python (analysis.load/save)
    - Added helper functions for address translation and output (analysis.ppa, analysis.v2a, etc.)
    - You can now apply templates (.tpl files) to an analysis from python (analysis.template)
● File parsers:
    - Added support for UImage archive format (with in-app unpacking), often found in firmwares. Exotic compression algorithms are not supported.
    - Added support for JFFS2 file systems (with in-app file extraction), often found in firmwares. Exotic compression algorithms are not supported.
    - Added support for SquashFS file systems (with in-app file extraction), often found in firmwares. Exotic compression algorithms are not supported.
    - [PE] Improved debug info parser: now parses all debug info structures. Correctly interprets repro entries (thx @struppigel's video)
    - [PE] Added parser for bound imports
    - [FAT12/16/32] Ignore deleted entries in directories
    - [VHD] Proper handling of hollow dynamic drives
    - [ZIP] 30% performance optimisation
    - [ZIP] (Very) basic support for AES-password-protected archives. Should be just enough to open malware bazaar's files directly from within malcat.
    - [7Z] Automatic unpacking of password-protected archives if the password is "infected", "malware" or "virus" (note that you need the py7zr library installed)
● Yara:
    - Added OpenSSL library to the Yara scanner: crypto-related fields such as pe.number_of_signatures should now work
    - You can now override the destination file when creating a new Yara rule
    - Added "Add to new Yara rule" context menu action to selection, strings and disassembly
    - Give focus and proper cursor position in Yara editor after "Add to (new) Yara rule" context menu action
● Rust:
    - Added support for Rust's final function call pattern (should help with CFG reconstruction)
    - Added Rust string analysis 
● Transforms:
    - Added JS beautify transform (requires jsbeautifier lib)
    - moved all "obfuscation" transforms to the text category
● User annotation:
    - You can now add custom annotations (custom text) using the selection context menu. Useful for screenshots and note taking.
    - Undo/redo support
    - Saved with project
    - Preview control
    - Hit a/A to jump to next/previous user annotation
● User interface:
    - In the structure view, also show the extended context menu (including xrefs) for selected fields
    - Reduced the size of the transform dialog to fit in smaller resolution screens
    - Optimized redraw speed of structures tree
    - Display a "bell" icon in case of warnings during analysis in the status bar
    - Clicking on the icon in the statusbar brings you to the output log window (script editor view)
    - Source code viewer now has wordwrap enabled
    - Using "Select All" (Ctrl+A) command in the script view now selects all text in either the script editor or script output window (depending on who has focus)
    - Using "Select All" (Ctrl+A) command in the decompiler view now selects the C code of the current function
    - Using "Select All" (Ctrl+A) command in the corpus view now selects all matching files
    - You can now select & copy multiple items in the corpus view list
    - Files in the Virtual File System tab are now sorted by name
    - The summary view has a new "Type" column that displays the current identified file type with an icon
    - Added "open" and "dump" actions to the string context menu. They convert strings to utf-8 beforehand
    - Library functions (e.g. FLIRT-identified fns) in symbol view are highlighted using the "DEBUG" highlighting color
    - Hexadecimal number display shortcut now changed to Ctrl+Shift+D: Ctrl+D should now properly duplicate the current line in all scintilla-based editor windows under Linux
    - Changing the number of threads for the analysis in the options does not require an app restart anymore
    - Optimized augmented scroll bar redraw performance when displaying large complex files
    - Use c/C in data view to jump to the next identified constant, use r/R to jump to the next string, use y/Y to jump to the next Yara string match
● Bug fixing:
    - [.NET] Fixed an issue in .NET class parser where the last field of the FieldTable would not be parsed
    - Default syntax highlighting for text files would only consider lower-case file extensions
    - Better validation for python conversion of DosDate and DosDateTime fields
    - In some cases, long binary stack strings with no single ascii character were not detected
    - Added some extra vertical space to big file dialog (thx @Squiblydoo)
    - [IMPHASH] Malcat should now compute imphash exactly like pefile, using the same outdated ordinals list, for 100% backward-compatibility (thx @Marco)
    - [ZIP] Fixed zip extractor not being able to unpack files packed in stream mode
    - [PE] Fixed edge case where section gaps were incorrectly computed
    - [LINUX] Fixed int overflow error in the entropy analysis for files > 4 GB
    - Renaming a function in disassembly or decompiler view would not display the new name immediately in some cases