What is Malware Analysis?

Karolina Koval

Lots of children break things not because they are little evil creatures but because they are curious about “how it’s made.” Eventually, some of those children grow up and become Cybersecurity Analysts. They do basically the same but in an adult world.

Malware analysis is the process of studying a malware sample to understand what it’s made of and how it works. Sometimes you never know unless you try, so you need to actually run the malware. And sometimes, it’s necessary to thoroughly examine the code line by line without triggering the execution.

Of course, learning how malware analysis works brings many benefits. Knowing the enemy means knowing how to defeat it: Malware Researchers help their SOC teams come up with more targeted detection algorithms and improve their incident response.

The exciting part is that modern malware is getting more and more sophisticated. Often, Security Analysts have to study something that they don’t have access to. Ten years ago, malware could often be defined by a single executable file; today, one file is only the beginning of the journey. Let’s dive a bit deeper into malware analysis and see how to do it.

How Can You Start Malware Analysis?

It’s easy – you can start malware analysis after you obtain a malware sample. Organizations that employ a Defense in Depth approach have multiple tools and processes in place to regularly find new samples. Some analysts will tell you that they haven’t seen anything conceptually new for years, while others will admit that they have new malware samples approximately every month. It all greatly depends on the research depth.

Analyzing malware might consume a lot of time; that’s why many SOC teams prefer not to go that deep. However, the time spent on research definitely pays off because it’s never too late to improve the security posture.

What are the Types of Malware Analysis?

Malware analysis can be conducted in different ways and with the use of various tools. There are three major analysis types:

Static
Dynamic
Hybrid

To figure out how to analyze malware that you have at hand, try to answer a few questions first:

How might this malware be triggering its execution?
In which ways will it try to evade detection?
Which parts of the malicious code can we see, and which parts can we not?
Which tools will help us analyze all aspects of a given malware sample?

If you’ve got a sample of a malicious file, you can perform static analysis. However, if this file is designed only to launch further stages of a kill chain, the static analysis won’t show how the main payload executes. And sometimes, the only way to find this out is to run the malware in a safe environment. So, let’s see how to perform different types of malware analysis and how they might be useful for a Cybersecurity Analyst.

Dynamic Malware Analysis

Dynamic malware analysis can be performed either in an automated sandbox or on a VM where you can test the sample manually. Keep in mind that sophisticated malware will look for signs of being in an emulated environment, and automated malware analysis is not the best option in this case.

Just like any other software, malware can be programmed to do practically anything that code can do. So you want to understand what the code (as a set of instructions) does and what logic lies behind it.

For example, some security tools inject their DLLs into local processes. The malware will scan the environment, and if it finds these files, it will terminate itself. Beyond this, the malware’s anti-analysis checks will look for:

Debuggers and other tools used for interactive analysis (API calls like IsDebuggerPresent, CheckRemoteDebuggerPresent, NtQueryInformationProcess).
Processes, windows, registry keys, files, mutex objects, etc. that belong to malware analysis tools.
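
The checks listed above can also be spotted from the defender's side. Below is a minimal sketch, assuming you already have a list of imported function names exported from a static analysis tool; the function name find_anti_debug_imports and the sample import list are hypothetical, chosen for illustration. A hit is only a hint, since legitimate software may call these APIs too.

```python
# Anti-debugging APIs named in the list above.
ANTI_DEBUG_APIS = {
    "IsDebuggerPresent",
    "CheckRemoteDebuggerPresent",
    "NtQueryInformationProcess",
}


def find_anti_debug_imports(imported_names):
    """Return the anti-debugging APIs found among the imported function names."""
    return sorted(set(imported_names) & ANTI_DEBUG_APIS)


# Hypothetical import list, as it might be exported from a PE viewer.
sample_imports = ["CreateFileW", "IsDebuggerPresent", "ReadFile"]
print(find_anti_debug_imports(sample_imports))
```

A non-empty result suggests that the sample may behave differently under a debugger, which is worth knowing before you start dynamic analysis.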

As the name suggests, dynamic malware analysis is all about observing the malware in action. You want to interact with it in as many ways as possible and create a full picture of its functionality. To do that, make sure that you use a fully isolated environment; it should be disconnected from the Internet, run on a host-only network, and the VM must not share any folders with the host.

Even though debugger-detection checks are present in many malware samples, Security Researchers do use debuggers in dynamic malware analysis. These tools are good for identifying risky API calls and memory allocations. If the malware actively uses fileless techniques, it won’t be visible in the static analysis since it works through legitimate processes – no exploit needed.

For example, code injected into PowerShell allocates memory in powershell.exe and executes obfuscated code in a new thread (which spawns a trusted process via Process Hollowing):

powershell exploit graph

Inside the debugger, you will be able to see how the code runs through all the instructions that it’s programmed to execute. Don’t forget to check strings in memory and compare them with the code that’s responsible for these memory interactions. Look for injection APIs, such as VirtualAlloc, VirtualAllocEx, WriteProcessMemory, CreateRemoteThread, etc. Also, examine NTFS transactions for Process Doppelgänging injections.
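
To illustrate why these API names matter together, here is a minimal sketch that checks whether a recorded API call trace contains one common remote-injection order: allocate, write, then start a thread. The trace format (a plain list of API names) and the function name contains_injection_sequence are assumptions for illustration; real debugger or sandbox logs will need parsing first.

```python
# One injection pattern built from the APIs mentioned above (order matters).
INJECTION_SEQUENCE = ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]


def contains_injection_sequence(api_trace, sequence=INJECTION_SEQUENCE):
    """Return True if `sequence` occurs in `api_trace` in order.

    Other calls may be interleaved between the steps.
    """
    remaining = iter(api_trace)
    # `wanted in remaining` consumes the iterator up to the first match,
    # so each later step is only searched for after the previous one.
    return all(wanted in remaining for wanted in sequence)


# Hypothetical trace of API names captured during dynamic analysis.
trace = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
print(contains_injection_sequence(trace))
```

This only flags a single well-known pattern; other techniques (for example Process Doppelgänging) use different calls and would need their own sequences.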

Use debugger-hiding tools to make sure you can analyze detection-evading malware. We’ll talk more about tools a bit later. Now, let’s see how to perform static malware analysis.

Static Malware Analysis

Static malware analysis is the type of analysis that can be performed without running the code. It might require advanced knowledge of low-level programming languages, processor instructions, and the principles of memory management.

However, beginner-level analysts have their ways of statically analyzing files, too. For example, a single look at a file extension can tell you what kind of file it claims to be, which helps you judge whether it’s suspicious to find this file where you found it. Next, you can search for its hash or fuzzy hash in threat intelligence resources like VirusTotal. Be careful when making assumptions, though, because even if the hash is not on VirusTotal, the file can be malicious anyway. Don’t forget to analyze other static properties, such as header information.
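
The first triage steps described above can be sketched in a few lines. This is a minimal example, assuming the sample's bytes have already been read safely (without executing it); the function name triage_bytes is hypothetical. The "MZ" test is only a rough header check: it matches the DOS header magic that Windows executables start with and does not validate the full PE structure.

```python
import hashlib


def triage_bytes(data: bytes) -> dict:
    """Compute basic static properties of a sample without running it."""
    return {
        # Digest you can look up in a threat intelligence resource.
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        # Rough header check: Windows executables start with b"MZ".
        "looks_like_pe": data[:2] == b"MZ",
    }


# Hypothetical four-byte stand-in for real file contents.
print(triage_bytes(b"MZ\x90\x00"))
```

Comparing "looks_like_pe" against the file extension is a quick way to notice an executable hiding behind a document or image extension.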

Static analysis can be performed through automatic tools for initial triage, as well as tools for manual analysis for a more granular look. Security Analysts study code instructions, dependencies, and what they mean.

If you open an executable file in Notepad, here’s what you’ll see:

binary file screenshot

It is code compiled into a machine-readable format. Processors know what to do, but humans are unlikely to understand anything here. So you need to run this code through a disassembler or decompiler to put it back into a format that you can read and analyze. After doing that, the code will look something like this:

decompiled binary screenshot

Understanding assembly instructions is necessary for arriving at insightful conclusions. Check out this quick guide for refreshing your knowledge, or dive deeper with this reference manual from Oracle.

You can also use Microsoft’s Sysinternals suite to analyze strings. Its tools can also help to identify which executable is associated with certain Windows API calls and even determine IOCs. Scroll down to the last section of this blog post to discover more about tools for malware analysis. And now, let’s talk about the pros and cons of static and dynamic malware analysis.
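
To show what string analysis does under the hood, here is a minimal sketch that pulls printable ASCII runs out of raw bytes. It is not how the Sysinternals tools are implemented; the function name extract_ascii_strings and the minimum length of 4 are assumptions for illustration, and this version ignores UTF-16 strings, which dedicated tools also handle.

```python
import re


def extract_ascii_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII characters at least `min_length` long."""
    # Bytes 0x20-0x7e cover space through tilde, the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical byte blob standing in for part of a binary.
blob = b"\x00\x01http://example.test\x00ab\x00cmd.exe\xff"
print(extract_ascii_strings(blob))
```

URLs, file paths, and command names recovered this way often give the first hints about what a sample talks to and what it runs.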

Static vs Dynamic Malware Analysis

Obviously, employing more than one type of malware analysis would be the best option. Static analysis will show how a particular executable file operates. Meanwhile, a standard kill chain nowadays often includes more than one executable. Instead, it can be a collection of scripts and files that trigger one another. If that’s the case, static analysis of one of the files won’t be sufficient for understanding how the whole payload works.

Depending on the goals and objectives of the SOC team, some resources can be spent on studying malware samples. And it’s a very individual choice of whether to go for static or dynamic analysis. Most often, there is a need to maintain a fine balance between the two, which is often called a hybrid malware analysis.

Hybrid Analysis

Hybrid malware analysis is a combination of static and dynamic malware analysis.

When it comes to complex samples, it’s best to analyze malware in stages. For example, first, you do static analysis and identify which API calls might be used to evade detection. Then, in an emulated environment, you perform dynamic analysis to see the sample in action and check if it downloads other binaries. And if you obtain the latter, you can perform a static analysis of their internals again.

Malware Analysis Use Cases

Every SOC team is unique, and cybersecurity processes might be organized differently, depending on the business context, the scale of the organization, and risk factors. Below we gathered some of the use cases in which malware analysis is applicable.

Detect Threats

IOCs and behavioral patterns are common inputs for various kinds of threat detection activities in specific software programs (SIEM, SOAR, EDR/XDR). Obtaining important malware data during an analysis helps to detect the latest threats.
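
As an illustration of turning analysis output into detection inputs, the sketch below extracts two kinds of IOC candidates (IPv4 addresses and SHA-256 hashes) from text such as analysis notes or extracted strings. The function name extract_iocs and the patterns are assumptions for illustration; the IPv4 pattern does not validate octet ranges, so results should be reviewed before they are loaded into a SIEM or EDR.

```python
import re

# Loose candidate patterns; they favor recall over strict validation.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_iocs(text: str) -> dict:
    """Return de-duplicated, sorted IOC candidates found in `text`."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "sha256": sorted(set(SHA256_RE.findall(text))),
    }


# Hypothetical analyst notes; 203.0.113.7 is a documentation-range address.
notes = "Sample beacons to 203.0.113.7 and retries 203.0.113.7 every minute."
print(extract_iocs(notes))
```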

Get Alerts

Getting accurate alerts is a vital part of any cybersecurity pipeline. Malware analysis helps to reduce the number of false positives and false negatives, thus ensuring a higher level of cyber protection.

Hunt for Threats

New and sophisticated strains of malware are not that easy to catch. That’s when threat hunting comes into play. Having malware samples and understanding how they work is the best way to hunt down more threats that are lurking in the wild.

Respond to the Incidents

Obviously, there’s no good cybersecurity posture without incident response. A proper response can be crafted and executed when Security Engineers possess detailed information on what malware looks like, which systems it affects, and what processes it tries to run.

What are the Stages of Malware Analysis?

An analytical job is not the easiest one. What’s easy is to get stuck in piles of data or misinterpret the code. For cyber defense to be successful, researchers divide their activities into a few malware analysis steps.

Assessment and Triage

The initial stage of a malware analysis can be performed with the help of automation tools. For static analysis, some preparation steps are required, such as decompiling the code. Anyway, in the first stage of malware analysis, it’s necessary to sort out parts of the code that require close attention. They might also be divided into levels of difficulty and priority. After the scope of the malicious code to review is defined and prioritized, it’s time to move on to the next stage.

Data Interpretation

Next, Security Analysts get to examine specific malware samples. As I mentioned before, it can be done by analyzing the static properties or by running malware in a safe, isolated environment. When the Malware Analyst obtains all the data that can be exposed during static and dynamic analysis, they try to interpret what they see. They can further test the samples by renaming variables, running the code, and making comments about execution patterns.

Reverse Engineering

This is the most challenging part, especially if there’s an encrypted sample, and it’s unclear what it does and why. What’s more, the code might have multiple dependencies which are also not that obvious. Trying to reverse-engineer malware is an advanced task. Yet, it’s key to a Defense In Depth approach.

Conclusions & Further Actions

When the analysis findings are formulated, it’s time to document them and take further action. The Analyst writes a malware report where they describe a malware sample, stages of analysis that were taken, and conclusions. They can also give some remediation recommendations.

What are the Tools for Malware Analysis?

There’s quite a wide selection of tools for malware analysis that Security Engineers use daily. Let’s start with static analysis tools:

PeStudio is widely used by CERT teams worldwide for capturing artifacts of malicious files.
PEiD easily recognizes packed and encrypted malware and gives details about what it’s made of.
BinText is a text extractor that can find ASCII, Unicode, and Resource strings in a file.
MD5deep will calculate file hashes. Used as a malware analyzer, this software package will run lots of files through various cryptographic digests.
Dependency Walker will scan 32-bit or 64-bit Windows modules and create a dependency tree.
IDA Pro will disassemble the binary code into assembly language. It also offers a cross-platform debugging capability that can handle remote applications.

When it comes to analyzing malware in sandboxes, you might have already tried Cuckoo, yet I’d recommend checking out ANY.RUN and Joe Sandbox Cloud. They both have MITRE ATT&CK® mapping, use rules, and give a very detailed view of behaviors.

Of course, Wireshark needs no introduction as it’s still one of the most widely used tools for real-time network analysis. INetSim realistically simulates internet services in a lab environment (the samples are highly unlikely to recognize the simulation). Then, there’s the Microsoft Sysinternals suite that I already mentioned above. There you’ll find a whole range of tools for dynamic malware analysis. ScyllaHide is another interesting tool that will let you hide a debugger from malware that you want to run. And if you feel like doing a serious reverse-engineering project, check out Ghidra – a suite of tools developed by NSA’s Research Directorate.

Learning how to do malware analysis might seem boring at the start, but stay patient and dig deeper to find real treasures. Once you feel comfortable with low-level data, you’ll be able to see what few other people see. So, instead of thinking that malware engineers are the smartest guys in the world, you’ll be able to prove that even the most advanced malware doesn’t do magic – it’s just another piece of software that can be decomposed and neutralized.

Finally, for quick and efficient Threat Detection and Threat Hunting, subscribe to SOC Prime Detection as Code Platform – here, you’ll find thousands of Sigma-based rules for spotting the latest cyber-attacks. And if you are ready to share your own expertise and monetize your knowledge, join our global crowdsourcing initiative, Threat Bounty, which helps to make the cyber world a safer place.
