Hello again! In the previous article, we established that many things can get out of hand when you are building a virtual or full-scale SOC, especially when it comes to operationalizing the SIEM as the core technology of any SOC. We also established that automation is the way to go if one wants to keep up with modern threats and with the overhead produced by SIEM & SOC technologies. Today we begin analyzing each SIEM deployment and operations component piece by piece, and the first that comes to mind is data availability. Many factors impact breach detection time, and as of 2015 the average is still above 200 days, but this is not because SIEM is the technology of choice for breach detection. A SIEM can only produce output based on the inputs it gets: if we miss something at the input, we should not expect the output to be complete and actionable, and sometimes there will be no output at all.
Is it really the SIEM’s fault that it misses logs (and incidents)? That depends on many factors, and to clear this mess up and keep the SIEM continuously operational, the person responsible for its administration and maintenance (your company has one, right?) needs enough information at any given point in time to answer two questions: “Is all in-scope log data being collected?” and, if the answer is no, “Is the trouble on the SIEM side or outside of it?”. If the answer to the first question is no, be assured that you are missing important things that directly impact your incident detection time and accuracy. And the truth is, no SIEM solution answers this out of the box; it is always a manual effort to respond to this question.
Have you ever seen a SIEM that says “Your network log data availability is 85%”? I haven’t. The second question obviously cannot be answered by the system either, since it cannot answer the first one. But all is not lost, and the answers are there if one does enough research or devotes enough time and effort to seeking them out manually.
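To make the “availability %” question concrete, here is a minimal sketch of what such a metric could look like. All names here (the source identifiers, the `log_availability` helper) are illustrative assumptions, not any vendor’s API: given an inventory of expected log sources and the events actually received, it reports what share of sources were heard from inside a time window.

```python
from datetime import datetime, timedelta

def log_availability(expected_sources, events, window_minutes=60, now=None):
    """Percent of expected sources that delivered at least one event
    within the last window_minutes. `events` is a list of
    (source_name, timestamp) pairs."""
    if not expected_sources:
        return 100.0
    now = now or datetime.utcnow()
    cutoff = now - timedelta(minutes=window_minutes)
    # Sources actually heard from inside the window
    seen = {source for source, ts in events if ts >= cutoff}
    return 100.0 * len(expected_sources & seen) / len(expected_sources)

# Hypothetical inventory and event feed for illustration
now = datetime(2016, 1, 4, 12, 0)
events = [("fw-edge", now - timedelta(minutes=5)),
          ("dc-01", now - timedelta(hours=3)),      # silent for 3 hours
          ("proxy", now - timedelta(minutes=59))]
expected = {"fw-edge", "dc-01", "proxy", "vpn-gw"}
print(log_availability(expected, events, window_minutes=60, now=now))  # 50.0
```

The hard part, of course, is not this arithmetic but maintaining an accurate inventory of expected sources and getting per-source timestamps out of the platform in the first place.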
Let us consider some examples based on how one collects log data, starting with the most common log collection mechanism: syslog. It is unlikely one can name a single SIEM implementation that does not use syslog at all. There are many ways data can be lost with syslog: a daemon- or pipe-based collection method loses data when the collecting component is stopped; the UDP (default) syslog transport has no guarantee of packet delivery; a high volume of syslog traffic (both UDP and TCP) can be impaired by buffer size limits, bandwidth limitations and even the packet-rate processing specs of a particular NIC. Even when reading log data from a file, one must consider file rotation and integrity. Diagnosing these issues is always a manual, routine task that involves reading the diagnostic logs of the SIEM components themselves… if there is diagnostic data to begin with! Even the ugliest diagnostic data is better than none at all, and better than the cheap excuse of “oh, our SIEM is so magical that it has no errors!”. Next we would be busy building (or re-using?) correlation content that baselines log flow volumes and deviations (is anyone really satisfied with the results of such content? What about the resources it eats up?), running tcpdump to check packet drop rates, monitoring component availability through external sources… Just wait, I think we have come up with at least 3 additional tools that need to be added to the SIEM to monitor itself, just to answer a simple question: “What is the % of my network log data availability?”. And if we talk about concrete attempts to make this automatic, say, Splunk on Splunk, are they really efficient? How much extra license cost does one have to add to be able to self-diagnose the SIEM/Log Management platform, and how much performance overhead does this most popular app produce?..
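The log-flow baselining mentioned above can be sketched in a few lines. This is an assumption-laden illustration, not any SIEM’s built-in logic: it compares each minute’s event count against the median of the preceding window and flags sudden drops, which is roughly what such correlation content tries to do (and why it eats resources when run across thousands of sources).

```python
import statistics

def flag_flow_drops(per_minute_counts, baseline_window=60, drop_ratio=0.5):
    """Return the minute indexes whose event count fell below
    drop_ratio * median of the preceding baseline_window minutes.
    per_minute_counts is oldest-first."""
    alerts = []
    for i in range(baseline_window, len(per_minute_counts)):
        # Median is more robust than mean against single noisy minutes
        baseline = statistics.median(per_minute_counts[i - baseline_window:i])
        if baseline > 0 and per_minute_counts[i] < drop_ratio * baseline:
            alerts.append(i)
    return alerts

# A steady ~1000 events/min feed that suddenly drops to 210
counts = [1000] * 60 + [980, 210]
print(flag_flow_drops(counts))  # [61]
```

Note that this only detects that *something* dropped; it still cannot tell you whether the cause is a stopped daemon, UDP loss on the wire, or an upstream device that simply went quiet, which is exactly why the manual tcpdump-and-diagnostics routine follows.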
Okay, let us put syslog and Splunk aside for a few minutes and think about the second most popular log source: Microsoft Windows Event Log. To keep the list short: log rotation, network bandwidth, authentication errors and password lockouts, WMI and JCIFS instability, high load on busy Windows Servers (file audit, Active Directory, etc.). Monitoring all this would again require a whole set of tools, and those tools will differ from the syslog monitoring tools! We could go on through a long list including databases and firewalls/NGFW/IPS/NGIPS/UTM (hello OPSEC & SDEE) and discover even more things that happen outside of the SIEM, that the SIEM has no effect on but must know about(!), and it must provide this information quickly and accurately to the administrator. Yet for those who have read this far, there is good news, since the SIEM (or most SIEMs) itself has (almost) readable diagnostic logs. Those hard-to-find diagnostic files, hidden API calls and multiline Java stack traces contain the bits and bytes of information that can answer many of the issues raised above. By combining them and applying meaningful profiling metrics we can answer those two simple questions that make the whole difference between knowing your mission-critical security detection platform is working as intended and ignoring the issue or half-fixing it by throwing more FTEs at the problem. Automation is here for everyone who wants to assure that their log data is available as planned in the project and as required by the organization/customer, and that they get a proper return on their SIEM investment. I hope this leaves you with some food for thought. Stay tuned!
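Mining those diagnostic files for known failure signatures is the kind of task that automates well. Below is a minimal sketch of the idea; the signature patterns and their interpretations are hypothetical placeholders (real SIEM diagnostic logs use their own wording), but the approach of mapping raw error lines to a ranked list of root causes is exactly what turns a multiline stack-trace dump into an actionable answer.

```python
import re
from collections import Counter

# Hypothetical error signatures -- real diagnostic logs will differ;
# treat these regexes and explanations as illustrative only.
SIGNATURES = {
    r"connection refused|connection reset": "collector cannot reach its destination",
    r"access denied|logon failure":         "authentication problem on the log source",
    r"queue (is )?full|events? dropped":    "event queue overflow - likely event loss",
    r"cannot open file|file .*rotated":     "log file rotation / access issue",
}

def summarize_diagnostics(lines, top_n=5):
    """Count known error signatures in diagnostic log lines and
    return the top_n most frequent issues."""
    hits = Counter()
    for line in lines:
        for pattern, meaning in SIGNATURES.items():
            if re.search(pattern, line, re.IGNORECASE):
                hits[meaning] += 1
    return hits.most_common(top_n)

# Fabricated sample lines, for illustration only
lines = [
    "2016-01-04 10:00:01 ERROR Connection refused by 10.0.0.5:514",
    "2016-01-04 10:00:07 ERROR Connection refused by 10.0.0.5:514",
    "2016-01-04 10:01:12 WARN  Event queue full, 312 events dropped",
]
print(summarize_diagnostics(lines))
```

A real implementation would need a much larger, vendor-specific signature library and handling for multiline stack traces, but the profiling principle is the same.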
P.S. In case you happen to be one of the HPE ArcSight experts, there is more good news! As part of a global initiative to make SIEM platforms deliver their value and to save the time of SIEM experts worldwide, SOC Prime provides a free online instant SIEM Health Check service that can consume agent.log files and give you the top 5 critical issues, and solutions to them, in under 5 minutes. Got an agent.log? Throw it here and see for yourself!