Threat Hunting Basics: Getting Manual

WRITTEN BY

Adam Swan

Threat Hunting Engineering Lead

September 25, 2019 · 8 min read

The purpose of this blog is to explain why manual (non-alert-based) analysis methods are necessary in threat hunting. An example of effective manual analysis using aggregations/stack counting is provided.

Automation is necessary

Automation is absolutely critical, and as threat hunters we must automate wherever possible.

However, automation is built on assumptions about the data, or about how effective the automation will be in a given environment. Many times these assumptions have been made for the threat hunter by other analysts, engineers, system owners, etc. A common example is whitelisting process creation events from System Center Configuration Manager (SCCM) or other endpoint management products in alert-based detections. Another example is SIEM engineers filtering out unused logs to save resources. Adversaries are increasingly savvy at identifying such assumptions and staying hidden within them. For instance, tools have been written to identify weaknesses in a system's Sysmon configuration [1].

By peeling back and inspecting the layers of assumptions, threat hunters can identify gaps in visibility and hunt in those gaps to uncover a compromise. This blog post focuses on removing some of these assumptions by using aggregations to review interesting data manually and efficiently.

Manual approaches are necessary

Perhaps the dominant premise of threat hunting is “Assume Compromise”. Responding to a compromise (almost) always involves manual analysis and intervention, especially during scoping. Effective scoping is not limited to reviewing alerts. It involves manually analyzing known-compromised hosts for indicators and behaviors that can then be searched for across the rest of the environment. Therefore, if we as threat hunters are “Assuming Compromise”, manual analysis is inherently required.
Another way to look at it: by reviewing only alert-based data, we are assuming that a successful attacker will trigger at least one rule/alert in our environment that is clear and actionable enough for us to make a decision that leads to identifying the compromise.

That said, threat hunters should not burden themselves with manually analyzing every log from every data source in the environment. Instead, we need a way to put the relevant data in front of our brains and make decisions as effectively as possible.

Removing the logic we use to alert on events, and aggregating on the fields and contexts we use in our alerts, is one example of manual analysis that is effective in most environments.

Aggregation as an example (stack counting)

One of the simplest and most effective methods for manual hunting is aggregating on the interesting/actionable fields of passively collected data, given a specific context.

If you have ever used Microsoft Office pivot tables, Splunk's stats command, or ArcSight's “top” command, you are familiar with this concept.

Note: This technique is also commonly called stack counting, data stacking, stacking, or pivot tables :). I believe novice hunters will be more familiar with the concept of aggregation, so I use that term here. FireEye appears to have been the first to publish this concept in the context of threat hunting [2].

Note: Passive data is a data source that informs you about an event, whether or not it is security-relevant. For example, a passive data source might tell you that a process was created, a network connection was established, a file was read/written, etc. Host logs, such as Windows event logs, are great examples of a passive data source. Passive data sources form a major part of the backbone of most threat hunting programs.

For example, Image 1 shows part of an aggregation of all Sysmon network connection events with destination port 22 (SSH) in an environment over a 30-day period. A threat hunter could use this aggregation to “hunt” for processes that would not normally be associated with connections on port 22.

Image 1: Simple aggregation in Kibana

Image 1:
Aggregation Field: Process Name
Context: Processes using port 22 within 30 days
Results: 120
Time to Analyze: < 1 min

Context is king when hunting with aggregations, and it carries the intent of your hunting hypothesis. The context of an aggregation is typically set in the underlying query and is exposed to the analyst through the fields we aggregate on and observe. In Image 1, the context “Processes using port 22” is converted into the query logic (sysmon_eid == 3 AND destination port == 22), combined with aggregating on and displaying the field that contains process names.
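
As a rough illustration of the idea above, here is a minimal Python sketch of stack counting outside any SIEM. The event records, the field names (`sysmon_eid`, `process_name`, `dest_port`), and the helper `stack_count` are all hypothetical and only mirror the query logic described in the text; they are not the actual Kibana query behind Image 1.

```python
from collections import Counter

# Hypothetical, simplified Sysmon-style events. Field names are assumptions
# made for this sketch, not a real Sysmon or Elasticsearch schema.
events = [
    {"sysmon_eid": 3, "process_name": "putty.exe", "dest_port": 22},
    {"sysmon_eid": 3, "process_name": "putty.exe", "dest_port": 22},
    {"sysmon_eid": 3, "process_name": "ssh.exe", "dest_port": 22},
    {"sysmon_eid": 3, "process_name": "powershell.exe", "dest_port": 22},
    {"sysmon_eid": 3, "process_name": "chrome.exe", "dest_port": 443},
    {"sysmon_eid": 1, "process_name": "putty.exe"},  # process creation, no port
]


def stack_count(events, context, field):
    """Count the values of `field` among the events that match `context`."""
    return Counter(e[field] for e in events if context(e))


def port_22_connections(event):
    """Context: network connection events (EID 3) with destination port 22."""
    return event["sysmon_eid"] == 3 and event.get("dest_port") == 22


stacks = stack_count(events, port_22_connections, "process_name")

# Print the stacks from most to least common for manual review.
for name, count in stacks.most_common():
    print(f"{count:>5}  {name}")
```

The context lives entirely in `port_22_connections`, so changing the hunting hypothesis only means swapping the predicate, while the aggregation field stays the thing the analyst actually reads.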

It is important to strike a balance between how narrow or broad the context of an aggregation is. For example, in Image 2 I have broadened the context from the previous image to return all processes with network connections. It is possible to find evil in this context; however, it will be harder to make decisions about the data unless there is a blatantly unusual process name or a process that would never really have network activity (which is increasingly rare).

Image 2:
Aggregation Field: Process Name
Context: Processes with network connections
Results: 1000+
Time to Analyze: 1 min

Image 2: A less effective aggregation without enough context

Finally, aggregations become less effective when you aggregate on fields that will not be used to make decisions. In Image 3, I added the “process ID” field to our last aggregation. Knowing the process ID can be useful once we have identified an unusual process; however, it creates a duplicate row for every unique combination of process name and ID. In the current example the results have more than quadrupled and many process names are duplicated. It is important to aggregate on fields that let you make decisions. Information needed to identify a specific host or user for triage should be retrieved with an additional, narrowly scoped query. In the Image 1 example, if we wanted to identify who was using putty for SSH, we could use the logic (process_name == "*putty.exe" AND sysmon_eid == 3). In my opinion, this is an area where Kibana outshines other analytics tools I have used, because pivoting between queries and dashboards is highly efficient thanks to its pinnable filter system [4].
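
The row inflation caused by a non-decision field can be reproduced on a toy scale. The sketch below uses made-up events and assumed field names (`process_name`, `process_id`); it only shows how the number of rows to review grows when the process ID is added to the aggregation key.

```python
from collections import Counter

# Hypothetical network connection events; values are invented for the sketch.
events = [
    {"process_name": "chrome.exe", "process_id": 4120},
    {"process_name": "chrome.exe", "process_id": 4120},
    {"process_name": "chrome.exe", "process_id": 7788},
    {"process_name": "chrome.exe", "process_id": 9012},
    {"process_name": "svchost.exe", "process_id": 812},
    {"process_name": "svchost.exe", "process_id": 1300},
    {"process_name": "putty.exe", "process_id": 5544},
]

# Aggregating on the decision field only: one row per process name.
by_name = Counter(e["process_name"] for e in events)

# Adding the process ID: one row per unique (name, ID) combination.
by_name_and_pid = Counter((e["process_name"], e["process_id"]) for e in events)

print(len(by_name), "rows when aggregating on process name")
print(len(by_name_and_pid), "rows when aggregating on process name + process ID")
```

The same events produce more rows to read without giving the analyst anything new to decide on, which is the effect the paragraph above warns about.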

Image 3:
Aggregation Field: Process Name + Process ID
Context: Processes with network connections
Results: 1000+
Time to Analyze: 10 min

Image 3: A less effective aggregation with non-contextual fields

Note: In certain systems, such as Elasticsearch's Kibana, it is easy to pivot from one data table to another using dashboards. Otherwise, once an interesting aggregation has been identified, an analyst will typically pivot to examining the hosts or accounts that were observed performing the interesting behavior.
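
A pivot from an interesting stack to the hosts and accounts behind it can be sketched as a second, narrower query. The example below is an assumption-laden illustration: the events, the `host` and `user` fields, and the helper `is_putty_network_event` are hypothetical, and the wildcard match simply imitates the (process_name == "*putty.exe" AND sysmon_eid == 3) logic mentioned earlier.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical events with host and user context; all values are invented.
events = [
    {"sysmon_eid": 3, "process_name": "C:\\Tools\\putty.exe",
     "host": "ws-014", "user": "CORP\\alice"},
    {"sysmon_eid": 3, "process_name": "C:\\Tools\\putty.exe",
     "host": "ws-014", "user": "CORP\\alice"},
    {"sysmon_eid": 3, "process_name": "C:\\Users\\bob\\Downloads\\putty.exe",
     "host": "ws-203", "user": "CORP\\bob"},
    {"sysmon_eid": 3, "process_name": "C:\\Windows\\System32\\OpenSSH\\ssh.exe",
     "host": "srv-02", "user": "CORP\\svc_backup"},
    {"sysmon_eid": 1, "process_name": "C:\\Tools\\putty.exe",
     "host": "ws-077", "user": "CORP\\carol"},
]


def is_putty_network_event(event):
    """Narrow context: EID 3 events whose image path ends in putty.exe."""
    name = event["process_name"].lower()  # compare case-insensitively
    return event["sysmon_eid"] == 3 and fnmatchcase(name, "*putty.exe")


# Aggregate on the triage fields only after the context has been narrowed.
who = Counter(
    (e["host"], e["user"]) for e in events if is_putty_network_event(e)
)

for (host, user), count in who.most_common():
    print(f"{count:>3}  {host}  {user}")
```

Keeping host and user out of the first aggregation and adding them only in this follow-up query keeps the initial stack short while still answering the triage question.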

Note: Be aware of the outlier-detection trap. Do not rely on the idea that “common is good” and “uncommon is bad” in aggregations/stack counting. This is not necessarily true: compromises generally involve multiple machines, and adversaries may exploit this assumption by generating noise in order to appear normal. In addition, niche software and use cases exist in almost every environment. It is easy to get caught up analyzing every “least common stack” and waste time identifying false positives. Knowing the environment before a compromise, and sharpening your instincts for threat actor behavior [3], will help here.

But does it scale?

Manual log analysis does not scale as well as alerting, since an analyst will typically look at a single context at a time. For example, it is common to review a single aggregation with tens or even hundreds of thousands of results. The maximum time you want to spend reviewing one aggregation is probably 10 minutes. If you find yourself overwhelmed as a threat hunter, try narrowing the context. For example, you can split an environment of 20,000 hosts into two environments of 10,000 hosts with query logic that separates hosts by name. Alternatively, you can identify the critical assets/accounts that hold the “golden nuggets” or “keys to the kingdom” and perform manual analysis on those.
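
Splitting an environment by host name can be sketched as follows. The partitioning rule (first letter of the host name, a-m versus everything else), the host names, and the field names are assumptions chosen for illustration; a real environment would need a rule that fits its own naming convention and actually yields groups of similar size.

```python
from collections import Counter

# Hypothetical events; host names and processes are invented for the sketch.
events = [
    {"host": "atl-ws-001", "process_name": "putty.exe"},
    {"host": "bos-ws-014", "process_name": "ssh.exe"},
    {"host": "nyc-ws-102", "process_name": "putty.exe"},
    {"host": "sea-srv-007", "process_name": "powershell.exe"},
    {"host": "mia-ws-033", "process_name": "ssh.exe"},
]


def partition(host):
    """Assign a host to one of two groups by the first character of its name.

    Names starting with a digit sort before letters and land in "a-m".
    """
    return "a-m" if host[0].lower() <= "m" else "n-z"


# One stack per partition, so each can be reviewed in its own sitting.
stacks = {"a-m": Counter(), "n-z": Counter()}
for e in events:
    stacks[partition(e["host"])][e["process_name"]] += 1

for group, counts in stacks.items():
    print(group, counts.most_common())
```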

It is possible to create content, review alerts, and triage hosts efficiently enough to leave time for more manual threat hunting techniques.

The SIEM content available in SOC Prime's TDM [5] includes plenty of content that can be fully automated as alerts, as well as content that enables more manual approaches to threat hunting.

Resources and acknowledgments of prior work:
[1] https://github.com/mkorman90/sysmon-config-bypass-finder
[2] https://www.fireeye.com/blog/threat-research/2012/11/indepth-data-stacking.html
[3] https://socprime.com/blog/warming-up-using-attck-for-self-advancement/
[4] https://www.elastic.co/guide/en/kibana/current/field-filter.html
[5] https://tdm.socprime.com/login/
