Introduction

Emulation is a technique that can be very handy and effective for malware triage, configuration extraction, and deobfuscating parts of the code without rewriting complex algorithms. It may look like magic, but unfortunately it is not: emulation cannot be applied blindly to arbitrary code. However, applied correctly, this method can really speed up malware analysis and triage. In this blog post I would like to give an overview of emulation and apply it to a real-world case.

Recently, I followed a Twitter warning about Vidar malware in the wild and, eager to revalidate an IDAPython script that deobfuscates its strings, I immediately jumped into it. However, after extracting that sample it was pretty obvious that I was dealing with another stealer, called MarsStealer. Since I did have a lot of information about that sample, I thought it would be a good candidate for experimenting with emulation. The results were quite promising, so I want to take this occasion to show a few basic emulation steps that could, hopefully, help someone else speed up their analysis.

Payload extraction

Opening the original sample in IDA, it was clear that a lot of strings were obfuscated and that the code was partially packed, because of the jumps and calls that reference registers and uninitialized DWORDs.

Figure 1: MarsStealer packed code

To extract the actual payload, which contains the string deobfuscation routine and additional code, it’s necessary to switch to dynamic analysis. As always, one of the quickest ways to get at the unpacked data is to look for calls to VirtualAlloc and VirtualProtect.

Figure 2: Unpacked payload retrieved

Deobfuscation routine analysis

Since our goal is to find code to emulate, we can look for a deobfuscation routine within the extracted payload. The search won’t last long because, almost immediately after the VirtualProtect call, the malware jumps directly to the allocated memory and starts the name-resolving routine.

Figure 3: Deobfuscation wrapper

The highlighted function is a wrapper that contains the actual routine. The code is quite easy to spot because of the three push instructions. Opening the payload in IDA, it’s possible to go a little deeper and explore the deobfuscation routine, reconstructing its signature and, from that point, understanding the function flow.

Figure 4: Deobfuscation routine

Leaving aside the memory errors highlighted by the red boxes, it’s very easy to understand the routine’s core functionality and control flow. However, even though its logic looks quite easy to reconstruct, I’m going to take this code as a use case for a completely different approach: using emulation to resolve all the strings.

Emulation requirements

Before proceeding with emulation, there are a few things to settle. The first is that we need to emulate user mode, because we are dealing with instructions that make additional calls to the Windows API. For that reason, we are going to use dumpulator, implementing some API calls if needed. The second is dumpulator’s own requirements: to use it effectively, we need a minidump of the process we are analyzing, plus the addresses where emulation should start and stop.

  • To take a minidump, it’s possible to use x32dbg/x64dbg, which provides it as a command (e.g., minidump mstealer.dump);
  • then, for the starting points, it’s possible to collect the references to the deobfuscation calls and save those addresses for later.
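
Before handing the dump to an emulator, a quick sanity check on the file can save a confusing error later. The following is only a minimal sketch: the helper name looks_like_minidump is hypothetical, and the check relies solely on the fact that Windows minidump files begin with the four-byte signature MDMP.

```python
MINIDUMP_SIGNATURE = b"MDMP"  # first four bytes of a Windows minidump header


def looks_like_minidump(path):
    """Return True if the file at `path` starts with the minidump signature.

    Hypothetical helper: it only inspects the signature and does not
    validate the rest of the header or the stream directory.
    """
    with open(path, "rb") as handle:
        return handle.read(4) == MINIDUMP_SIGNATURE
```

For example, looks_like_minidump("mstealer.dump") should return True for a dump written by the debugger command above, and False for a truncated or unrelated file.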

Figure 5: References to deobfuscation function

Now that we have two out of three requirements, we need to focus on the emulation end point. This is the most important requirement, since it affects both efficiency (emulation is tremendously slow) and the amount of code you have to write (a poorly chosen end point may require extra code to fit your needs).

For the purpose of this analysis/tutorial about emulation, I’m going straight to the point, using hints collected from IDA and some dynamic code analysis.

Emulation ending point extraction

Looking carefully at Figure 4, it’s easy to spot that the plaintext string is set after the for loop and before the VirtualProtect call. Reading the assembly with this information in mind, it becomes clear that emulation should stop at the instruction push ecx: at that point, the ecx register holds a pointer to the plaintext string.

Figure 6: Focus on plaintext resolution

With all this information, the emulation end address can be easily retrieved in the debugger: 0x031F4DE5.

Figure 7: Emulation stop address

String Resolving Automation

Now that we have collected all the requirements for the emulation, we are ready to set up our code as follows:

#Dumpulator libraries
from dumpulator import *
from dumpulator.native import *
from dumpulator.handles import *
from dumpulator.memory import *

def deobfuscate(address):
    """
    start:
            [x] push ciphertext
                push key
                push key_length
    
    end:
            [x] push ecx
            call VirtualProtect
    """
    end = 0x031F4DE5
    dp = Dumpulator("mars_stealer.dump",quiet=True)
    dp.start(address,  end=end )
    out = dp.read_str(dp.regs.ecx)
    return out
    

starting_points = [0x031F2E46,0x031F2E5F,0x031F2E78,0x031F2E91,0x031F2EAA,0x031F2EC3,0x031F2EDC,0x031F2EF5,0x031F2F0E,0x031F2F27,0x031F2F40,0x031F2F59,0x031F2F72,0x031F2F8B,0x031F2FA4,0x031F2FBD,0x031F2FD6,0x031F2FEF,0x031F3008,0x031F3021,0x031F303A,0x031F3053,0x031F306C,0x031F3085,0x031F309E,0x031F30B7,0x031F30D0,0x031F30E9,0x031F3102,0x031F311B,0x031F3134,0x031F314D,0x031F3166,0x031F317F,0x031F3198,0x031F31B1,0x031F31CA,0x031F31E3,0x031F3206,0x031F321F,0x031F3238,0x031F3251,0x031F326A,0x031F3283,0x031F329C,0x031F32B5,0x031F32CE,0x031F32E7,0x031F3300,0x031F3319,0x031F3332,0x031F334B,0x031F3364,0x031F337D,0x031F3396,0x031F33AF,0x031F33C8,0x031F33E1,0x031F33FA,0x031F3413,0x031F342C,0x031F3445,0x031F345E,0x031F3477,0x031F3490,0x031F34A9,0x031F34C2,0x031F34DB,0x031F34F4,0x031F350D,0x031F3526,0x031F353F,0x031F3558,0x031F3571,0x031F358A,0x031F35A3,0x031F35BC,0x031F35D5,0x031F35EE,0x031F3607,0x031F3620,0x031F3639,0x031F3652,0x031F366B,0x031F3684,0x031F369D,0x031F36B6,0x031F36CF,0x031F36E8,0x031F3701,0x031F371A,0x031F3733,0x031F374C,0x031F3765,0x031F377E,0x031F3797,0x031F37B0,0x031F37C9,0x031F37E2,0x031F37FB,0x031F3814,0x031F382D,0x031F3846,0x031F385F,0x031F3878,0x031F3891,0x031F38AA,0x031F38C3,0x031F38DC,0x031F38F5,0x031F390E,0x031F3927,0x031F3940,0x031F3959,0x031F3972,0x031F398B,0x031F39A4,0x031F39BD,0x031F39D6,0x031F39EF,0x031F3A08,0x031F3A21,0x031F3A3A,0x031F3A53,0x031F3A6C,0x031F3A85,0x031F3A9E,0x031F3AB7,0x031F3AD0,0x031F3AE9,0x031F3B02,0x031F3B1B,0x031F3B34,0x031F3B4D,0x031F3B66,0x031F3B7F,0x031F3B98,0x031F3BB1,0x031F3BCA,0x031F3BE3,0x031F3BFC,0x031F3C15,0x031F3C2E,0x031F3C47,0x031F3C60,0x031F3C79,0x031F3C92,0x031F3CAB,0x031F3CC4,0x031F3CDD,0x031F3CF6,0x031F3D0F,0x031F3D28,0x031F3D41,0x031F3D5A,0x031F3D73,0x031F3D8C,0x031F3DA5,0x031F3DBE,0x031F3DD7,0x031F3DF0,0x031F3E09,0x031F3E22,0x031F3E3B,0x031F3E54,0x031F3E6D,0x031F3E86,0x031F3E9F,0x031F3EB8,0x031F3ED1,0x031F3EEA,0x031F3F03,0x031F3F1C,0x031F3F35,0x031F3F4E,0x031F3F67,0x031F3F80,0x031F3F99,0x031F3FB2,0x031F3FCB,
0x031F3FE4,0x031F3FFD,0x031F4016,0x031F402F,0x031F4048,0x031F4061,0x031F407A,0x031F4093,0x031F40AC,0x031F40C5,0x031F40DE,0x031F40F7,0x031F4110,0x031F4129,0x031F4142,0x031F415B,0x031F4174,0x031F418D,0x031F41A6,0x031F41BF,0x031F41D8,0x031F41F1,0x031F420A,0x031F4223,0x031F423C,0x031F4255,0x031F426E,0x031F4287,0x031F42A0,0x031F42B9,0x031F42D2,0x031F42EB,0x031F4304,0x031F431D,0x031F4336,0x031F434F,0x031F4368,0x031F4381,0x031F439A,0x031F43B3,0x031F43CC,0x031F43E5,0x031F43FE,0x031F4417,0x031F4430,0x031F4449,0x031F4462,0x031F447B,0x031F4494,0x031F44AD,0x031F44C6,0x031F44DF,0x031F44F8,0x031F4511,0x031F452A,0x031F4543,0x031F455C,0x031F4575,0x031F458E,0x031F45A7,0x031F45C0,0x031F45D9,0x031F45F2,0x031F460B,0x031F4624,0x031F463D,0x031F4656,0x031F466F,0x031F4688,0x031F46A1,0x031F46BA,0x031F46D3,0x031F46EC,0x031F4705,0x031F471E,0x031F4737,0x031F4750,0x031F4769,0x031F4782,0x031F479B,0x031F47B4,0x031F47CD,0x031F47E6,0x031F47FF,0x031F4818,0x031F4831,0x031F484A,0x031F4863,0x031F487C,0x031F4895,0x031F48AE,0x031F48C7,0x031F48E0,0x031F48F9,0x031F4912,0x031F492B,0x031F4944,0x031F495D,0x031F4976,0x031F498F,0x031F49A8,0x031F49C1,0x031F49DA,0x031F49F3,0x031F4A0C,0x031F4A25,0x031F4A3E,0x031F4A57,0x031F4A70,0x031F4A89,0x031F4AA2,0x031F4ABB,0x031F4AD4,0x031F4AED,0x031F4B06,0x031F4B1F,0x031F4B38,0x031F4B51,0x031F4B6A,0x031F4B83,0x031F4B9C,0x031F4BB5,0x031F4BCE,0x031F4BE7,0x031F4C00,0x031F4C19,0x031F4C32,0x031F4C4B,0x031F4C64,0x031F4C7D,0x031F4C96,0x031F4CAF,0x031F4CC8,0x031F4CE1,0x031F4CFA]
for s in starting_points:
    # Each entry is the address of a call to the deobfuscation wrapper.
    # The three push instructions before the call take 0xC bytes in total,
    # so step back 0xC bytes to emulate the parameter setup as well.
    s = s - 0xC
    print(f'Address {hex(s)} : {deobfuscate(s)}')
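
The 0xC offset only works if every call site is preceded by the same three push instructions. As a rough sanity check, the sketch below computes the emulation start for a call address and flags consecutive call sites whose spacing differs from the usual one. The function names are hypothetical, and the expected stride of 0x19 is an assumption read off the address list above, where most consecutive entries appear to be 0x19 bytes apart.

```python
def emulation_start(call_address, argument_bytes=0xC):
    """Address of the first push that prepares the call at `call_address`."""
    return call_address - argument_bytes


def irregular_gaps(call_sites, expected_stride=0x19):
    """Return (previous, current, gap) for each pair not `expected_stride` apart."""
    return [
        (prev, cur, cur - prev)
        for prev, cur in zip(call_sites, call_sites[1:])
        if cur - prev != expected_stride
    ]


# A short slice of the starting points listed above.
sample = [0x031F31B1, 0x031F31CA, 0x031F31E3, 0x031F3206, 0x031F321F]
for prev, cur, gap in irregular_gaps(sample):
    print(f"{hex(prev)} -> {hex(cur)}: gap {hex(gap)}")
```

In this slice, the pair 0x031F31E3 and 0x031F3206 is 0x23 bytes apart instead of 0x19. A gap like that does not necessarily mean the 0xC offset is wrong there, but it is a cheap hint about which call sites deserve a look in the disassembly.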

Launching this script, we are able to extract a lot of information about Mars Stealer and start our triage without even reversing the whole malware. In fact, the resolved strings point to common stealer targets such as credit cards, browsers, crypto wallets, etc.
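
The loop in the script prints results as it goes, so a single failing emulation would stop the whole run. A more tolerant variant could look like the sketch below. It is an illustration only: resolve_all is a hypothetical helper, and the resolver argument stands for any callable that takes a start address and returns a string, such as the deobfuscate function defined in the script.

```python
def resolve_all(call_sites, resolver, argument_bytes=0xC):
    """Run `resolver` on every call site and return (strings, failures).

    Both results are dictionaries keyed by the emulation start address,
    that is, the call address minus `argument_bytes`.
    """
    strings, failures = {}, {}
    for call_site in call_sites:
        start = call_site - argument_bytes
        try:
            strings[start] = resolver(start)
        except Exception as error:  # keep going if one emulation fails
            failures[start] = repr(error)
    return strings, failures
```

With the script above, this would be called as strings, failures = resolve_all(starting_points, deobfuscate). Collecting the failures separately makes it easy to revisit only the call sites that need a different start or end address.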

Figure 8: Retrieved strings

We also get a few insights into anti-analysis or reversing-aware functions such as IsDebuggerPresent and CreateToolhelp32Snapshot. There are also some indications of anti-sandbox techniques, such as HAL9TH, which should be the Microsoft sandbox computer name. All deobfuscated strings can be found in the References section.
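
Once the strings are resolved, a first triage pass can be automated by tagging the ones that match known indicators. The sketch below is a hypothetical example: the tag_strings helper and the category names are assumptions for illustration, and the indicator list contains only the three strings mentioned above, so it would need to be extended for real use.

```python
# Indicators taken from the strings discussed above; the grouping is illustrative.
INDICATORS = {
    "anti-analysis": ("IsDebuggerPresent", "CreateToolhelp32Snapshot"),
    "anti-sandbox": ("HAL9TH",),
}


def tag_strings(strings):
    """Map each category to the resolved strings containing one of its indicators."""
    tagged = {category: [] for category in INDICATORS}
    for value in strings:
        lowered = value.lower()
        for category, needles in INDICATORS.items():
            if any(needle.lower() in lowered for needle in needles):
                tagged[category].append(value)
    return tagged
```

Matching is a case-insensitive substring test, so a string that merely contains an indicator is tagged as well. That keeps the sketch short at the cost of possible false positives.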

Conclusion and next chapter

Emulation represents the state of the art for analyzing malware functions or triaging samples without getting lost in complex and heavily obfuscated routines. It was pretty fun to analyze Mars Stealer through this technique. I’m thinking of creating additional and probably more structured content (maybe a whitepaper) about malware emulation.

The script above can be used as a reference for further analysis. It’s quite simple (and not perfect) but very effective; I used it as a “soft” introduction to this topic and to give an idea of what emulation can do.

Hope you enjoyed reading this post as much as I enjoyed reversing this malware and writing this article!

References

Sample analyzed:

Minidump:

String resolver:

Extracted strings:

Dumpulator: