This crate contains the logic for parsing several executable and document formats, and for determining whether a Zip file is an MS Office document or a plain archive of files.
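
As a rough illustration of the Zip check, the sketch below classifies an archive from its entry names alone. It relies on the fact that Office Open XML packages carry a `[Content_Types].xml` entry at the root plus a `word/`, `xl/`, or `ppt/` directory. The `ZipKind` enum and `classify_zip_entries` function are hypothetical names used only for this example and are not this crate's API; reading the entry names out of the archive is assumed to happen elsewhere.

```rust
/// Hypothetical classification result, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZipKind {
    Word,
    Excel,
    PowerPoint,
    PlainArchive,
}

/// Classify a Zip file from the names of its entries.
/// Assumption: the caller has already listed the entry names.
pub fn classify_zip_entries<'a, I>(names: I) -> ZipKind
where
    I: IntoIterator<Item = &'a str>,
{
    let mut has_content_types = false;
    let mut kind: Option<ZipKind> = None;

    for name in names {
        if name == "[Content_Types].xml" {
            has_content_types = true;
        } else if name.starts_with("word/") {
            kind.get_or_insert(ZipKind::Word);
        } else if name.starts_with("xl/") {
            kind.get_or_insert(ZipKind::Excel);
        } else if name.starts_with("ppt/") {
            kind.get_or_insert(ZipKind::PowerPoint);
        }
    }

    // Both markers are required; otherwise treat it as a plain archive.
    match (has_content_types, kind) {
        (true, Some(k)) => k,
        _ => ZipKind::PlainArchive,
    }
}
```

Requiring both markers keeps an ordinary archive that merely contains a `word/` folder from being reported as a document.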
Executable Types:
* ELF (feature flag `elf`, default)
* Mach-O and Fat Mach-O (feature flag `macho`, default)
* PE32 (feature flag `pe32`, default)
* PEF (feature flag `pef`)
For each executable, the goal is to extract:
* Section information: names, sizes, entropy
* Import data
* Target: architecture, operating system, endianness, pointer size (32 vs 64-bit)
* Binary type (object file, executable, library, etc.)
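
Section entropy is the one computed value in the list above. A minimal sketch, assuming Shannon entropy measured in bits per byte (the source does not say which measure the crate uses), could look like this. `shannon_entropy` is a hypothetical helper name, not part of this crate's API.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
/// Hypothetical helper for illustration only.
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }

    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &byte in data {
        counts[byte as usize] += 1;
    }

    // Sum -p * log2(p) over the byte values that actually appear.
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

Values near 8.0 usually indicate compressed or encrypted data, which is why per-section entropy is a useful signal when looking at packed malware.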
Some complications:
* How to get the imports for ELFs? Go has this figured out, but I haven't been able to replicate it. See Goblin issue #363.
* Should I ditch the custom parsers for Goblin? It would allow me to get Authenticode data from PE32 files, but I worry it won't be tolerant of malformed files (as malware tends to be).
* Fat Mach-O files have a set of sections and characteristics per embedded Mach-O. How should these be related to one another?
Document Types:
* PDF via `pdf` (feature flag `pdf`, default)
There should be a simple way to represent the needed data so the component which stores the data in the database doesn't have to be aware of file formats.
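
One possible shape for such a representation is a plain summary struct that every parser fills in, with the storage side depending only on a small trait. Everything in this sketch (`ExecutableSummary`, `Summarize`, `describe_for_storage`, `ToyElf`) is a hypothetical name chosen for illustration; it is not this crate's API, and the field choices simply mirror the extraction goals listed earlier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endian {
    Little,
    Big,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinaryKind {
    Object,
    Executable,
    Library,
    Other,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SectionSummary {
    pub name: String,
    pub size: u64,
    pub entropy: f64,
}

/// Format-agnostic view of a parsed executable (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ExecutableSummary {
    pub format: &'static str,
    pub architecture: String,
    pub os: Option<String>,
    pub endian: Endian,
    pub pointer_bits: u8,
    pub kind: BinaryKind,
    pub sections: Vec<SectionSummary>,
    pub imports: Vec<String>,
}

/// Implemented by each format-specific parser result.
pub trait Summarize {
    fn summarize(&self) -> ExecutableSummary;
}

/// Stand-in for the storage component: it sees only the trait,
/// never an ELF, Mach-O, PE32, or PEF type.
pub fn describe_for_storage(item: &dyn Summarize) -> String {
    let s = item.summarize();
    format!(
        "{} {} {}-bit, {} sections, {} imports",
        s.format,
        s.architecture,
        s.pointer_bits,
        s.sections.len(),
        s.imports.len()
    )
}

/// Toy parser result with hard-coded values, for demonstration only.
pub struct ToyElf;

impl Summarize for ToyElf {
    fn summarize(&self) -> ExecutableSummary {
        ExecutableSummary {
            format: "ELF",
            architecture: "x86_64".to_string(),
            os: Some("Linux".to_string()),
            endian: Endian::Little,
            pointer_bits: 64,
            kind: BinaryKind::Executable,
            sections: vec![SectionSummary {
                name: ".text".to_string(),
                size: 4096,
                entropy: 6.1,
            }],
            imports: Vec::new(),
        }
    }
}
```

A Fat Mach-O file could fit this shape by producing one summary per embedded Mach-O, though that is only one way to answer the open question above.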