Threat Hunting Hypothesis Examples: Prepare For a Good Hunt!

WRITTEN BY

Karolina Koval

August 15, 2022 · 5 min read

A good threat hunting hypothesis is key to identifying weak spots in an organization’s digital infrastructure. Just learn to ask the right questions, and you will get the answers that you’re looking for. In this blog post, we review a proactive threat hunting methodology: Hypothesis-Driven Threat Hunting. Let’s dive right in!

What is a Threat Hunting Hypothesis?

A threat hunting hypothesis is an informed assumption about a cyber-attack or any of its components. Just like in scientific research, in hypothesis-driven threat hunting, Threat Hunters make hypotheses the foundation of their investigations.

Once a hypothesis is made, a Threat Hunter must take steps to test it. Most of the threat hunting work goes into the strategy for testing the hypothesis (e.g., forming a useful query often takes longer than executing it). This strategy usually involves identifying related data sources (security events, system logs, etc.), choosing relevant analysis techniques (querying, stack counting, etc.), and then acting on the plan.
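To make one of the analysis techniques mentioned above more concrete, here is a minimal sketch of stack counting in Python. The event records, host names, and process names are invented for illustration; in a real hunt they would come from your EDR or system logs. The idea is simply to count how often each value occurs and look at the rarest ones first.

```python
from collections import Counter

# Hypothetical process-creation events (illustrative data only).
events = [
    {"host": "ws-01", "process": "svchost.exe"},
    {"host": "ws-02", "process": "svchost.exe"},
    {"host": "ws-03", "process": "svchost.exe"},
    {"host": "ws-01", "process": "chrome.exe"},
    {"host": "ws-02", "process": "chrome.exe"},
    {"host": "ws-03", "process": "updater32.exe"},
]


def stack_count(records, field):
    """Count occurrences of each value of `field`, rarest first."""
    counts = Counter(r[field] for r in records)
    # Sort by count, then by name so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


# Values seen only once are often the first candidates for a closer look.
rare = [name for name, n in stack_count(events, "process") if n == 1]
print(rare)
```

A rare value is not malicious by itself; it only tells the hunter where to look first.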

A threat hunting hypothesis facilitates a proactive cyber defense routine. One of the many variants of such a routine consists of:

Predicting adversary behavior
Suggesting ways to find a threat
Detecting anomalies, intrusions, baseline/threshold hits
Studying event correlation
Testing samples in sandboxes, honey pots, and emulated environments
Documenting results
Improving the protection of assets and infrastructure
Performing mitigation
Informing authorities (if applicable)
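As an illustration of the "baseline/threshold hits" step in the list above, the following sketch flags a value that is unusually high compared to recent history. The numbers are made up, and the "mean plus three standard deviations" rule is just one common convention assumed here, not a recommendation from this article.

```python
import statistics


def exceeds_baseline(history, observed, sigmas=3.0):
    """Return True if `observed` is more than `sigmas` sample standard
    deviations above the mean of `history`."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return observed > mean + sigmas * spread


# Hypothetical daily counts of failed logons for one account.
failed_logons_last_week = [12, 15, 11, 14, 13, 12, 14]

print(exceeds_baseline(failed_logons_last_week, 40))
print(exceeds_baseline(failed_logons_last_week, 16))
```

In practice the baseline window and the multiplier would have to be tuned to the environment.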

Overall, the success of threat hunting greatly depends on an insightful hypothesis, so let’s see how to make one.

How to Generate a Hypothesis for a Threat Hunt?

To make it easier at the beginning, you can think of a threat hunting hypothesis as a kind of user story, but told from the malware’s perspective.

Threat Hunting Hypothesis #1

As a [malicious script], I want to [send a request via TCP port 50050] so I can [establish connection].

As an [infected zip], I want to [use WMI] so I can [maintain persistence].

As [JavaScript code], I want to [abuse BITSAdmin] so I can [download modules].
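The user-story template above maps naturally onto a small data structure. The sketch below is one possible way to capture it and to turn the first example into a simple test against connection logs. The `Hypothesis` class, the log field names (`proto`, `dst_port`), and the sample records are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    actor: str   # who acts, e.g. "malicious script"
    action: str  # what it does
    goal: str    # why it does it

    def as_story(self) -> str:
        article = "an" if self.actor[0].lower() in "aeiou" else "a"
        return (
            f"As {article} {self.actor}, I want to {self.action} "
            f"so I can {self.goal}."
        )


h = Hypothesis(
    actor="malicious script",
    action="send a request via TCP port 50050",
    goal="establish connection",
)
print(h.as_story())

# Hypothetical connection records (documentation-range IP addresses).
connections = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "proto": "tcp", "dst_port": 443},
    {"src": "10.0.0.8", "dst": "203.0.113.44", "proto": "tcp", "dst_port": 50050},
    {"src": "10.0.0.8", "dst": "10.0.0.1", "proto": "udp", "dst_port": 53},
]

# Test the hypothesis: did anything talk to TCP port 50050?
hits = [c for c in connections if c["proto"] == "tcp" and c["dst_port"] == 50050]
print(hits)
```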

Alternatively, let’s make a hypothesis from an attacker’s perspective.

Threat Hunting Hypothesis #2

As APT37 (North Korea), I want to attack the US government for political reasons, so I’ll use Cobalt Strike and technique T1218.011 (Rundll32), abusing rundll32 for proxy execution of malicious code.
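A hypothesis like this one can be tested by looking at how rundll32 is actually launched in your environment. The sketch below is a rough heuristic, not a vetted detection rule: the event field names (`image`, `command_line`) and the list of "suspicious" directories are assumptions chosen for illustration.

```python
# User-writable locations that are unusual sources for DLLs loaded via
# rundll32. This list is an assumption for the example, not a standard.
SUSPICIOUS_DIRS = ("\\appdata\\", "\\temp\\", "\\users\\public\\")


def is_suspicious_rundll32(event):
    """Flag rundll32 executions whose command line references a DLL
    in one of the directories listed above."""
    image = event["image"].lower()
    command_line = event["command_line"].lower()
    if not image.endswith("\\rundll32.exe"):
        return False
    return any(d in command_line for d in SUSPICIOUS_DIRS)


# Hypothetical process-creation events.
events = [
    {
        "image": r"C:\Windows\System32\rundll32.exe",
        "command_line": r"rundll32.exe C:\Users\bob\AppData\Local\Temp\a.dll,Start",
    },
    {
        "image": r"C:\Windows\System32\rundll32.exe",
        "command_line": "rundll32.exe shell32.dll,Control_RunDLL",
    },
    {
        "image": r"C:\Windows\System32\notepad.exe",
        "command_line": r"notepad.exe C:\Temp\notes.txt",
    },
]

flagged = [e for e in events if is_suspicious_rundll32(e)]
print(len(flagged))
```

Expect false positives from legitimate installers, so treat the matches as leads rather than verdicts.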

However, there is no single “right” template for a threat hunting hypothesis. A hypothesis can also be more complex than one sentence. For malware that executes a multi-stage kill chain, it might include several points.

Threat Hunting Hypothesis #3

Malware X:
Executes a command via EXE file
Imports and executes PowerShell cmdlets from an external source
Runs a local .NET binary
Uses the setenv() function to add a variable to the environment
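A multi-point hypothesis like this one can be checked stage by stage. The sketch below assumes that some upstream step has already tagged each event with a behavior label; the labels, host names, and events are hypothetical. It reports which hosts show every stage of the hypothesis and which show only some.

```python
from collections import defaultdict

# One label per point of the hypothesis (names are made up).
HYPOTHESIS_STAGES = [
    "exe_command_execution",
    "remote_powershell_import",
    "local_dotnet_binary",
    "setenv_call",
]

# Hypothetical (host, behavior label) pairs from an enrichment step.
events = [
    ("srv-01", "exe_command_execution"),
    ("srv-01", "remote_powershell_import"),
    ("srv-01", "local_dotnet_binary"),
    ("srv-01", "setenv_call"),
    ("ws-07", "exe_command_execution"),
    ("ws-07", "local_dotnet_binary"),
    ("ws-09", "browser_update"),
]


def stage_coverage(events, stages):
    """Map each host to the hypothesis stages observed on it,
    listed in the order the hypothesis states them."""
    seen = defaultdict(set)
    for host, label in events:
        if label in stages:
            seen[host].add(label)
    return {host: [s for s in stages if s in labels] for host, labels in seen.items()}


coverage = stage_coverage(events, HYPOTHESIS_STAGES)
full_matches = sorted(
    host for host, found in coverage.items() if len(found) == len(HYPOTHESIS_STAGES)
)
print(full_matches)
```

Partial matches, such as ws-07 here, can still be worth a look, since logging gaps may hide some stages.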

Now let’s move on to more complex examples.

Advanced Threat Hunting Hypotheses

Threat hunting hypotheses can be operational, like the examples above, or tactical and strategic. Seasoned Threat Hunters can formulate broader hypotheses that can nevertheless result in finely targeted tests. To do that, they need to draw on:

Domain expertise – having experience, sharing knowledge
Situational awareness – knowing internal infrastructure, vulnerabilities, core assets
Intelligence – pulling threat intelligence data like IOCs and TTPs

Apply all of the above to formulate a deeply analytical hypothesis about what systems attackers will target and what they will try to achieve.

For example, a Threat Hunter named Bob has been researching some IOCs obtained through a threat intel feed. Having done a Crown Jewels Analysis (CJA), he knows that his company’s crown jewel is the place where it stores proprietary algorithms. His experience with previous hunts and a talk with a fellow researcher, Alice, allow him to suggest the most likely adversary behavior in the given situation. So he formulates a hypothesis.

Threat Hunting Hypothesis #4

Attackers who tried to gain initial access through a phishing email will perform lateral movement and privilege escalation to get to the heart of the system and exfiltrate data.
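One way to test a hypothesis like this is to check whether the expected tactics show up in the expected order. The sketch below assumes each alert has already been mapped to a tactic name and carries a timestamp (here, plain minutes); the tactic labels and timelines are invented for the example.

```python
def first_after(timeline, tactic, after):
    """Earliest timestamp of `tactic` strictly later than `after`, or None."""
    times = [ts for ts, t in timeline if t == tactic and ts > after]
    return min(times) if times else None


def matches_hypothesis(timeline):
    """True if phishing is followed by both lateral movement and
    privilege escalation (in any order), and exfiltration comes after both."""
    phish = first_after(timeline, "phishing", float("-inf"))
    if phish is None:
        return False
    lateral = first_after(timeline, "lateral_movement", phish)
    privesc = first_after(timeline, "privilege_escalation", phish)
    if lateral is None or privesc is None:
        return False
    return first_after(timeline, "exfiltration", max(lateral, privesc)) is not None


# Hypothetical timelines: (minutes since first alert, tactic).
chain_a = [
    (0, "phishing"),
    (30, "privilege_escalation"),
    (45, "lateral_movement"),
    (120, "exfiltration"),
]
chain_b = [(0, "phishing"), (10, "exfiltration")]

print(matches_hypothesis(chain_a))
print(matches_hypothesis(chain_b))
```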

Hypotheses can also be targeted not at predicting the future steps of the attackers but at understanding patterns, dependencies, and so on. In other words, seeing the whole picture.

Where do they have their C2 servers? How do they obfuscate them? How do they maintain persistence? What’s the relationship between specific servers and various attack campaigns?

In this case, cybersecurity is not just about seeing issues and quickly remediating them. It’s also necessary to ask questions. A bit like investigative journalism with its 5W rule:

Who
What
When
Where
Why

Often the situation looks like this. There are multiple events in multiple places: millions of scripts, scheduled tasks, files, and user actions. They all do something, and some of those events might be stages of a kill chain. But you don’t know that, because a piece of malware encrypted itself and hid inside a legitimate file. It also stole certificates, so it executes as part of reputable software that the company bought. You might have taken care of the IOCs, but it was not enough. The company might be spied on, but there’s no hard evidence of that. So on the surface, nothing catastrophic has happened. A hypothesis helps to identify such a sophisticated attack or to establish its absence.

Threat Hunting Hypothesis #5

A state-sponsored threat actor A uses the same C2 servers as threat actor B, so they might be part of the same botnet. They use a software pipeline infection and plant malware with a dormant period of 1-2 weeks before triggering reconnaissance, which lasts 2-6 months. If our scan shows obfuscated data inside legitimate binaries, we should look for signs of a connection being established with a C2 server, so we can conclude that those files are malware.
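Both parts of this hypothesis can be sketched with simple set operations. In the example below, all IP addresses (taken from documentation ranges), binary names, and connection records are hypothetical. The first part measures how much C2 infrastructure the two actors share; the second part checks whether any binary flagged by the scan has contacted a known C2 address.

```python
# Hypothetical C2 addresses attributed to each actor by threat intel.
actor_a_c2 = {"203.0.113.10", "203.0.113.22", "198.51.100.7"}
actor_b_c2 = {"203.0.113.22", "198.51.100.7", "192.0.2.55"}

# Part 1: infrastructure overlap between the two actors.
shared = actor_a_c2 & actor_b_c2
jaccard = len(shared) / len(actor_a_c2 | actor_b_c2)
print(sorted(shared), jaccard)

# Part 2: binaries the scan flagged as containing obfuscated data.
flagged_binaries = {"updater.exe", "sync_agent.exe"}

# Hypothetical (binary, destination IP) pairs from network telemetry.
connections = [
    ("updater.exe", "203.0.113.22"),
    ("updater.exe", "198.51.100.200"),
    ("sync_agent.exe", "198.51.100.200"),
    ("chrome.exe", "192.0.2.55"),
]

known_c2 = actor_a_c2 | actor_b_c2
confirmed = sorted(
    {binary for binary, ip in connections if binary in flagged_binaries and ip in known_c2}
)
print(confirmed)
```

Shared infrastructure alone does not prove that two actors are related, so the overlap score is best read as a prompt for further investigation.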

Conclusion

Practice makes perfect, so don’t worry if things like threat reports and raw data look like gibberish to you at the start. Learn computer science subjects like networks, low-level languages, and application architecture to feel more comfortable with specific terms and numeric values (like port numbers). In any case, there’s always a lot of information to deal with, so don’t feel discouraged if you don’t understand everything you come across. It’s hardly possible to know it all, so using Google sometimes helps a lot, too.

A good threat hunting hypothesis allows one to arrive at valuable conclusions and prevent possible attacks. Moreover, it helps Threat Hunters to examine the right data at the right time instead of having to search through millions of logs for millions of likely reasons. Join our Detection as Code platform to gain access to near real-time detection algorithms compatible with 25+ SIEM, EDR, and XDR solutions and instantly search for the latest threats in your environment.
