Threat Hunting Basics: Getting Manual

WRITTEN BY

Adam Swan

Threat Hunting Engineering Lead

September 25, 2019 · 7 min read

The purpose of this blog is to explain why manual (non-alert-based) analysis methods are necessary in threat hunting. It also provides an example of effective manual analysis using aggregations/stack counting.

Automation Is Necessary

Automation is absolutely critical, and as threat hunters we must automate as much as possible wherever we can.

However, automation is built on assumptions about the data, or about how effective the automation will be in a given environment. Many of these assumptions are made for the threat hunter by other analysts, engineers, system owners, and so on. A common example is whitelisting process creation events from System Center Configuration Manager (SCCM) or other endpoint management products in alert-based detections. Another example is SIEM engineers filtering out unused logs to save resources. Attackers increasingly seek to identify such assumptions and stay hidden within them. For example, tools have been developed to identify weaknesses in a system's Sysmon configuration [1].

By peeling back and inspecting these layers of assumptions, threat hunters may be able to identify gaps in visibility and hunt in those gaps to uncover a compromise. This blog post focuses on removing some of these assumptions by using aggregations to review interesting data manually and efficiently.

Manual Approaches Are Necessary

Perhaps the dominant principle in threat hunting is to "assume compromise." Responding to a compromise (almost) always involves manual human analysis and intervention, especially during scoping. Effective scoping is not just reviewing alerts. It involves manually analyzing known-compromised hosts for indicators and behaviors that can then be searched for across the rest of the environment. Therefore, if we as threat hunters "assume compromise," manual analysis is inherently required.

Another way to look at it: by reviewing only alert-based data, we assume that a successful attacker will trigger at least one rule/alert in our environment that is clear and actionable enough to drive a decision that leads to identifying the compromise.

This does not mean, however, that threat hunters should burden themselves with manually analyzing every log from every data source in the environment. Instead, we must find a way to review relevant data and make decisions as effectively as possible.

Peeling back the logic we use for alerting, and aggregating on the fields and contexts that the alerting relies on, is one example of manual analysis that is effective in most environments.
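As a rough illustration of peeling back alert logic, the Python sketch below takes a hypothetical alert rule with an SCCM whitelist and aggregates exactly the events the whitelist hides from alerting. The event records, the field names (`host`, `parent`, `image`), the rule itself, and the use of `CcmExec.exe` as the whitelisted parent are all assumptions made for illustration, not content from any real rule set.

```python
from collections import Counter

# Hypothetical, simplified process-creation events (field names are assumptions).
EVENTS = [
    {"host": "ws01", "parent": "CcmExec.exe", "image": "powershell.exe"},
    {"host": "ws02", "parent": "CcmExec.exe", "image": "powershell.exe"},
    {"host": "ws03", "parent": "CcmExec.exe", "image": "cmd.exe"},
    {"host": "ws04", "parent": "winword.exe", "image": "powershell.exe"},
    {"host": "ws05", "parent": "explorer.exe", "image": "powershell.exe"},
]

# Assumption baked into the alert: anything spawned by the SCCM client is benign.
WHITELISTED_PARENTS = {"ccmexec.exe"}
INTERESTING_IMAGES = {"powershell.exe", "cmd.exe"}


def matches_core_logic(event):
    """The detection idea itself, without any whitelist."""
    return event["image"].lower() in INTERESTING_IMAGES


def alert_rule(event):
    """The rule as deployed: core logic minus whitelisted parents."""
    return matches_core_logic(event) and event["parent"].lower() not in WHITELISTED_PARENTS


def suppressed_by_whitelist(events):
    """Events the core logic matches but the whitelist keeps out of alerting."""
    return [e for e in events if matches_core_logic(e) and not alert_rule(e)]


def stack(events, *fields):
    """Aggregate events on the given fields, most common stack first."""
    return Counter(tuple(e[f] for f in fields) for e in events).most_common()


hidden = suppressed_by_whitelist(EVENTS)
for (parent, image), count in stack(hidden, "parent", "image"):
    print(f"{count:>3}  {parent} -> {image}")
```

Reviewing the suppressed events as a handful of stacks, rather than one event at a time, is what keeps this kind of check affordable.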

Aggregation as an Example (Stack Counting)

One of the simplest and most effective approaches to manual hunting is aggregating interesting/actionable fields from passively collected data within a given context.

If you have ever used Microsoft Office pivot tables, Splunk's "stats" command, or ArcSight's "top" command, you are already familiar with this concept.

Note: This technique is also commonly referred to as stack counting, data stacking, stacking, or pivot tables :). I believe inexperienced hunters are more comfortable with the concept of aggregation, so I use that term here. FireEye appears to be the first company to have published this concept in the context of threat hunting [2].

Note: Passive data is a data source that tells you about an event regardless of whether the event is relevant to security. For example, a passive data source might tell you that a process was created, a network connection was established, a file was read or written, and so on. Host logs, such as Windows Event Logs, are great examples of a passive data source. Passive data sources are a key part of the backbone of most threat hunting programs.

For example, Figure 1 shows part of an aggregation of all Sysmon network connection events with destination port 22 (SSH) in an environment over 30 days. A threat hunter could use this aggregation to hunt for processes that are not normally associated with connections over port 22.

Figure 1: Simple aggregation in Kibana

Image 1:
Aggregation field: process name
Context: processes using port 22 within 30 days
Results: 120
Time to analyze: < 1 min

Context is king when hunting with aggregations, because it carries the intent of your hunting hypothesis. The context of an aggregation is typically set in the underlying query and exposed to the analyst through the fields we aggregate on and observe. In Figure 1, the context "processes using port 22" is converted into the query logic (sysmon_eid == 3 AND destination_port == 22) and presented by aggregating on and displaying the field that contains the process names.
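The same idea can be sketched outside of a SIEM. The Python below is a minimal illustration, not the Kibana query behind Figure 1: the sample events and the field names (`sysmon_eid`, `destination_port`, `process_name`) are assumptions. The context lives in the filter function, and the aggregation is a simple count of one field.

```python
from collections import Counter

# Hypothetical Sysmon-style records; field names and values are assumptions.
EVENTS = [
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "ssh.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "ssh.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "ssh.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "putty.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "putty.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "powershell.exe"},
    {"sysmon_eid": 3, "destination_port": 443, "process_name": "chrome.exe"},
    {"sysmon_eid": 1, "process_name": "notepad.exe"},  # process creation, no port
]


def in_context(event):
    """Context: network connections (Sysmon event ID 3) to destination port 22."""
    return event["sysmon_eid"] == 3 and event.get("destination_port") == 22


def stack_count(events, field):
    """Aggregate one field over the events inside the context, largest stack first."""
    return Counter(e[field] for e in events if in_context(e)).most_common()


for name, count in stack_count(EVENTS, "process_name"):
    print(f"{count:>5}  {name}")
```

With this sample data the output is three short stacks, which an analyst can read in seconds; the single powershell.exe connection is the kind of entry that would prompt a closer look.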

It is important to strike a balance between a narrow and a broad context within an aggregation. For example, in Figure 2 I broadened the context of the previous figure to return all processes with network connections. It is possible to find evil in this context, but it becomes harder to make decisions about the data unless there is an obviously unusual process name or a process that should not have any network activity at all (which is increasingly rare).

Image 2:
Aggregation field: process name
Context: processes with network connections
Results: 1000+
Time to analyze: 1 min

Figure 2: A less effective aggregation without sufficient context

Finally, aggregations become less effective when you aggregate on fields that are not meant to be used for decisions. In Figure 3 I added the "process ID" field to the previous aggregation. Knowing the process ID can be useful once we have identified an unusual process, but it creates a separate entry for every unique combination of process name and ID. In this running example the number of results more than quadrupled, and many process names were duplicated. It is important to aggregate on fields that enable you to make decisions. Information that may be needed to identify a specific host or user for triage should be retrieved with an additional, narrowly scoped query. In the Figure 1 example, if we want to identify who used PuTTY for SSH, we can use the logic (process_name == "*putty.exe" AND sysmon_eid == 3). In my opinion, this is an area where Kibana outperforms other analysis tools I have used, because pivoting between queries and dashboards via its pinnable filter system is highly efficient [4].

Image 3:
Aggregation field: process name + process ID
Context: processes with network connections
Results: 1000+
Time to analyze: 10 min

Figure 3: A less effective aggregation with non-contextual fields
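The effect of adding a non-decision field can be reproduced with a few lines of Python. This is only a sketch with made-up (process name, process ID) pairs, not the data behind Figure 3; it shows how the number of stacks grows once the process ID becomes part of the aggregation key.

```python
from collections import Counter

# Hypothetical network-connection events as (process name, process ID) pairs.
EVENTS = [
    ("chrome.exe", 4120), ("chrome.exe", 4120), ("chrome.exe", 5588), ("chrome.exe", 7312),
    ("svchost.exe", 812), ("svchost.exe", 1044), ("svchost.exe", 1300),
    ("putty.exe", 6020),
]

# Aggregating on the process name alone: one stack per name.
by_name = Counter(name for name, _pid in EVENTS)

# Aggregating on name + process ID: one stack per unique combination.
by_name_and_pid = Counter(EVENTS)

print(len(by_name), "stacks when aggregating on process name")
print(len(by_name_and_pid), "stacks when aggregating on process name + process ID")
```

The process ID does not help decide whether a process name is unusual, so the extra rows only add review time.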
Note: In certain systems, such as Elasticsearch's Kibana, it is easy to pivot from one data table to another via dashboards. Otherwise, after identifying an interesting aggregation, an analyst typically pivots to reviewing the hosts or accounts that exhibited the interesting behavior.
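A pivot of this kind amounts to a second, narrowly scoped query. The following Python sketch mirrors the logic (process_name == "*putty.exe" AND sysmon_eid == 3) and returns the hosts and users behind the stack. The events, the `host` and `user` field names, and the helper name `pivot` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical Sysmon-style records; field names and values are assumptions.
EVENTS = [
    {"sysmon_eid": 3, "process_name": r"C:\Tools\putty.exe", "host": "ws04", "user": "alice"},
    {"sysmon_eid": 3, "process_name": r"C:\Tools\putty.exe", "host": "ws04", "user": "alice"},
    {"sysmon_eid": 3, "process_name": r"C:\Users\bob\Downloads\PuTTY.exe", "host": "ws11", "user": "bob"},
    {"sysmon_eid": 1, "process_name": r"C:\Tools\putty.exe", "host": "ws20", "user": "carol"},
    {"sysmon_eid": 3, "process_name": r"C:\Windows\System32\OpenSSH\ssh.exe", "host": "ws02", "user": "dave"},
]


def pivot(events, name_pattern, eid=3):
    """Return the unique (host, user) pairs behind one stack of an aggregation."""
    pattern = name_pattern.lower()
    hits = {
        (e["host"], e["user"])
        for e in events
        # Lower-case both sides so the wildcard match ignores case.
        if e["sysmon_eid"] == eid and fnmatchcase(e["process_name"].lower(), pattern)
    }
    return sorted(hits)


for host, user in pivot(EVENTS, "*putty.exe"):
    print(host, user)
```

Keeping host and user out of the first aggregation and fetching them only for the stack of interest keeps both views small.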

Note: Be aware of the outlier-detection trap. Do not rely on the idea that "common is good" and "uncommon is bad" when using aggregations/stack counting. This is not necessarily true: compromises usually involve multiple machines, and adversaries may try to exploit this assumption by generating noise in order to appear normal. In addition, niche software and use cases exist in almost every environment. It is easy to get caught up triaging every "least common" stack and to waste time identifying false positives. Knowing the environment before a compromise and sharpening your instincts about threat actor behavior [3] will help you here.

But Does It Scale?

Manual analysis of logs does not scale nearly as well as alerting, because an analyst typically observes only a single context at a time. For example, it is common to review a single aggregation with tens of thousands or even hundreds of thousands of results. The longest you should spend reviewing a single aggregation is probably 10 minutes. If you feel overwhelmed as a threat hunter, narrowing the context may help. For example, you can split an environment of 20,000 hosts into two environments of 10,000 hosts each by using query logic that separates hosts by name. Alternatively, you can identify the critical assets/accounts that hold the "golden nuggets" or "keys to the kingdom" and analyze those manually.
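One way to picture the host split is sketched below in Python. It assumes host names are spread fairly evenly across the alphabet, which depends entirely on the naming convention in your environment; the events, field names, and group labels are hypothetical.

```python
from collections import Counter

# Hypothetical events; host names and field names are assumptions.
EVENTS = [
    {"host": "atl-ws01", "process_name": "ssh.exe"},
    {"host": "bos-ws02", "process_name": "putty.exe"},
    {"host": "nyc-ws07", "process_name": "ssh.exe"},
    {"host": "sea-ws09", "process_name": "ssh.exe"},
    {"host": "sea-ws10", "process_name": "nc.exe"},
]


def host_group(hostname):
    """Split hosts in two by the first character of the name.

    Names starting with a digit or a letter up to "m" land in the first group.
    """
    return "a-m" if hostname[:1].lower() <= "m" else "n-z"


def stack_per_group(events, field):
    """Build one aggregation per host group instead of a single large one."""
    stacks = {"a-m": Counter(), "n-z": Counter()}
    for event in events:
        stacks[host_group(event["host"])][event[field]] += 1
    return stacks


for group, stack in stack_per_group(EVENTS, "process_name").items():
    print(group, stack.most_common())
```

Each group can then be reviewed as its own aggregation within the time limit suggested above.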

It is possible to create content, review alerts, and triage hosts efficiently enough to leave time for more manual threat hunting techniques.

The SIEM content available in SOC Prime's TDM [5] is rich in content that can be fully automated as alerting, and it also includes content that enables manual approaches to threat hunting.

Resources and references to prior work:
[1] https://github.com/mkorman90/sysmon-config-bypass-finder
[2] https://www.fireeye.com/blog/threat-research/2012/11/indepth-data-stacking.html
[3] https://socprime.com/blog/warming-up-using-attck-for-self-advancement/
[4] https://www.elastic.co/guide/en/kibana/current/field-filter.html
[5] https://tdm.socprime.com/login/
