Threat Hunting Basics: Getting Manual

WRITTEN BY

Adam Swan

Threat Hunting Engineering Lead

September 25, 2019 · 7 min read

The purpose of this blog is to explain why manual (non-alert-based) analysis methods are necessary in threat hunting. It provides an example of effective manual analysis through aggregations/stack counting.

Automation Is Necessary

Automation is absolutely critical, and as threat hunters we should automate as much as possible wherever possible.

However, automation is built on assumptions about the data, or about how effective the automation will be in a given environment. Often these assumptions have been made for the threat hunter by other analysts, engineers, system owners, and so on. For example, a common assumption is allowlisting process creation events from System Center Configuration Manager (SCCM) or other endpoint management products in alert-based detections. Another example is SIEM engineers filtering out unused logs to save resources. Attackers are increasingly adept at identifying such assumptions and staying hidden within them. For example, tools have been written to identify weaknesses in a system's Sysmon configuration [1].

By peeling back and inspecting the layers of assumptions, threat hunters can identify gaps in visibility and hunt in those gaps to uncover a compromise. This blog post focuses on removing some of these assumptions by using aggregations to efficiently review interesting data manually.

Manual Approaches Are Necessary

Perhaps the main premise of threat hunting is "Assume Compromise." Responding to a compromise (almost) always involves manual human analysis and intervention, especially during scoping. Effective scoping is not just reviewing alerts. It involves manual analysis of known compromised hosts for indicators and behaviors that can then be searched for across the rest of the environment. Therefore, as threat hunters, if we are "Assuming Compromise," manual analysis is inherently required.

Another way to look at this: by reviewing only alert-based data, we are assuming that a successful attacker will trigger at least one rule/alert in our environment that is clear and actionable enough for us to make a decision that results in identifying the compromise.

That said, threat hunters should not burden themselves with manually analyzing every log from every data source in the environment. Instead, we need to find a way to put relevant data in front of our minds and make decisions as effectively as possible.

Breaking down the logic we use to alert on events, and aggregating on the fields and contexts we use in our alerts, is an example of effective manual analysis for most environments.

Aggregation as an Example (Stack Counting)

One of the simplest and most effective methods for manual hunting is to aggregate on interesting/actionable fields from passive data collection, given a specific context.

If you have ever used Microsoft Office pivot tables, Splunk's stats command, or ArcSight's "top" command, you are familiar with this concept.

Note: This technique is also commonly known as stack counting, data stacking, stacking, or pivot tables :). I believe novice hunters will be more familiar with the concept of aggregation, so I use that term here. FireEye appears to be the first to have published this concept in the context of threat hunting [2].

Note: Passive data is a data source that tells you about an event whether or not it is security relevant. For example, a passive data source might tell you that a process was created, a network connection was established, a file was read/written, and so on. Host logs, such as Windows event logs, are great examples of a passive data source. Passive data sources are an important part of the backbone of most threat hunting programs.

As an example, Image 1 shows part of an aggregation of all Sysmon network connection events with destination port 22 (SSH) in an environment over 30 days. A threat hunter could use this aggregation to "hunt" for processes that would not normally be associated with connections to port 22.

Image 1: Simple Aggregation in Kibana

Image 1:
Aggregation Field: Process Name
Context: Processes using port 22 over 30 days
Results: 120
Time to Analyze: < 1 min

Context is key when hunting with aggregations, and it carries the intent of your hunt hypothesis. The context of an aggregation is typically set in the underlying query and exposed to the analyst through the fields we aggregate on and observe. In Image 1, the context "Processes using port 22" translates into the query logic (sysmon_eid == 3 AND destination port == 22) and into aggregating on/displaying the field that contains process names.
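
As a rough illustration of how a context and an aggregation field fit together, here is a minimal Python sketch. The events are hand-made and the field names (sysmon_eid, destination_port, process_name) are assumptions for this example, not a real Sysmon or Kibana schema; in practice the SIEM performs this aggregation for you.

```python
from collections import Counter

# Hypothetical, hand-made events; field names are assumptions for illustration.
events = [
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "putty.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "putty.exe"},
    {"sysmon_eid": 3, "destination_port": 22, "process_name": "powershell.exe"},
    {"sysmon_eid": 3, "destination_port": 443, "process_name": "chrome.exe"},
    {"sysmon_eid": 1, "destination_port": None, "process_name": "cmd.exe"},
]


def stack_count(events, context, field):
    """Count the values of `field` across the events that satisfy `context`."""
    return Counter(e[field] for e in events if context(e))


def port_22_context(event):
    # Context from Image 1: network connection events (Sysmon event ID 3) to port 22.
    return event["sysmon_eid"] == 3 and event["destination_port"] == 22


stacks = stack_count(events, port_22_context, "process_name")

# Print the stacks from most to least common, one row per process name.
for name, count in stacks.most_common():
    print(f"{count:>5}  {name}")
```

The context lives entirely in the filter function, so widening or narrowing the hunt means changing that one function while the aggregation field stays the same.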

It is important to strike a balance between how narrow or broad the context of an aggregation is. As an example, in Image 2 I broadened the context of the previous image to return all processes with network connections. It is possible to find evil in this context; however, it will be harder to make decisions about the data unless there is an obviously unusual process name or a process that would never have network activity (which is increasingly rare).

Image 2:
Aggregation Field: Process Name
Context: Processes with network connections
Results: 1000+
Time to Analyze: 1 min

Image 2: A less effective aggregation without enough context

Finally, aggregations become less effective when you add fields that will not be used to make decisions. In Image 3, I added the "process id" field to our last aggregation. Knowing the process ID may be useful once we identify an unusual process; however, it creates a separate entry for every unique combination of process name and process ID. In the running example, the results quadrupled and many process names were duplicated. It is important to aggregate on fields that allow you to make decisions. Information that may be needed to identify a specific host or user for triage should be retrieved with an additional query that has a narrowed context. In the Image 1 example, if we want to identify who was using putty for SSH, we can use the logic (process_name=="*putty.exe" AND sysmon_eid==3). In my opinion, this is an area where Kibana excels over other analysis tools I have used, because moving between queries and dashboards is highly efficient thanks to its pinnable filters [4].

Image 3:
Aggregation Field: Process Name + Process ID
Context: Processes with network connections
Results: 1000+
Time to Analyze: 10 mins

Image 3: A less effective aggregation with fields that add no context

Note: In certain systems, such as Elasticsearch's Kibana, it is easy to move from one data table to another using dashboards. Otherwise, once an interesting aggregation is identified, an analyst will typically switch to reviewing the host(s) or account(s) that were observed performing the interesting behavior.
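
To make the effect of an extra field concrete, the following Python sketch stacks the same hand-made events twice, first on process name alone and then on process name plus process ID, and then runs a narrowed follow-up query to find who ran putty. The hosts, users, and field names are hypothetical, and the suffix check only approximates the "*putty.exe" wildcard.

```python
from collections import Counter

# Hypothetical network connection events (all Sysmon event ID 3).
events = [
    {"host": "ws-01", "user": "alice", "process_name": "putty.exe", "process_id": 4120, "sysmon_eid": 3},
    {"host": "ws-01", "user": "alice", "process_name": "putty.exe", "process_id": 5332, "sysmon_eid": 3},
    {"host": "ws-02", "user": "bob", "process_name": "putty.exe", "process_id": 812, "sysmon_eid": 3},
    {"host": "ws-02", "user": "bob", "process_name": "chrome.exe", "process_id": 1044, "sysmon_eid": 3},
    {"host": "ws-03", "user": "carol", "process_name": "chrome.exe", "process_id": 2210, "sysmon_eid": 3},
]

# Stacking on a field we can make decisions with: one row per process name.
by_name = Counter(e["process_name"] for e in events)

# Adding process_id creates one row per unique (name, pid) pair.
by_name_and_pid = Counter((e["process_name"], e["process_id"]) for e in events)

print(len(by_name), "rows when stacking on process name")
print(len(by_name_and_pid), "rows when stacking on process name + process id")

# Narrowed follow-up query for triage: who was using putty?
putty_users = {
    (e["host"], e["user"])
    for e in events
    if e["sysmon_eid"] == 3 and e["process_name"].lower().endswith("putty.exe")
}
print(sorted(putty_users))
```

Host and user details stay out of the stack itself and are pulled only after a process name has caught the analyst's attention.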

Note: Be aware of the outlier-detection trap. Do not rely on the idea that "common is good" and "uncommon is bad" in aggregations/stack counting. This is not necessarily true, since compromises usually involve multiple machines, and adversaries may try to exploit this assumption by creating noise in order to look normal. In addition, specialized software and use cases exist in nearly every environment. It is easy to get caught up triaging every "least common stack" and waste time identifying false positives. Knowing the environment before a compromise and honing your instincts about malicious actor behavior [3] will help here.

But Does It Scale?

Manual log analysis does not scale as well as alerting, since an analyst will typically look at only one context at a time. For example, it is common to review a single aggregation with tens or even hundreds of thousands of results. The longest you will want to spend reviewing one aggregation is probably 10 minutes. If you feel overwhelmed as a threat hunter, try narrowing the context. For example, you can split a 20,000-host environment into two 10,000-host environments with query logic that separates hosts by their names. Alternatively, you could identify critical assets/accounts that hold the "crown jewels" or "keys to the kingdom" and perform manual analysis on those.
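
One possible way to split an environment by host name is sketched below in Python. The rule used here (first character of the host name, "m" or earlier versus later) is purely an assumption for illustration, as are the host names; whether it yields two roughly equal halves depends entirely on your naming convention.

```python
from collections import Counter, defaultdict

# Hypothetical events; host names and fields are made up for this sketch.
events = [
    {"host": "atl-ws-01", "process_name": "putty.exe"},
    {"host": "bos-ws-07", "process_name": "ssh.exe"},
    {"host": "nyc-ws-03", "process_name": "putty.exe"},
    {"host": "sea-srv-02", "process_name": "powershell.exe"},
]


def bucket_for(hostname):
    """Assign a host to one of two buckets by the first character of its name."""
    return "first_half" if hostname[:1].lower() <= "m" else "second_half"


# Build one stack per bucket so each can be reviewed as its own, smaller context.
stacks = defaultdict(Counter)
for e in events:
    stacks[bucket_for(e["host"])][e["process_name"]] += 1

for bucket in sorted(stacks):
    print(bucket, dict(stacks[bucket]))
```

Each bucket can then be reviewed in its own sitting, which helps keep any single aggregation within the time budget described above.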

It is possible to create content, review alerts, and triage hosts efficiently enough to leave time for more manual threat hunting techniques.

The SIEM content available in SOC Prime's TDM [5] includes content that can be fully automated as alerts, as well as content that enables more manual threat hunting approaches.

Resources and Acknowledgment of Prior Work:
[1] https://github.com/mkorman90/sysmon-config-bypass-finder
[2] https://www.fireeye.com/blog/threat-research/2012/11/indepth-data-stacking.html
[3] https://socprime.com/blog/warming-up-using-attck-for-self-advancement/
[4] https://www.elastic.co/guide/en/kibana/current/field-filter.html
[5] https://tdm.socprime.com/login/
