SOC Prime’s Privacy-Centric Mindset

Vlad Garaschenko CISO


Privacy is a core value when it comes to digital security. The dynamic pace of cybersecurity evolution stresses the importance of privacy protection, which involves safeguarding user identity and keeping data private, safe and secure when online. In this blog article, I would like to share my expertise gained through 20+ years in the field and as CISO at SOC Prime.

For the last 7 years at SOC Prime, we have found our own ways to communicate with customers, collect feedback, and combine products and services in the right and scalable proportion. While delivering professional SIEM and co-managed services for Fortune 500 and Global 2000 companies, we saw many skillful approaches applied by cybersecurity engineers from various organizations and industries who managed to build proper detection mechanisms despite the limited resources at their disposal. Let’s draw an analogy with manufacturing history to explain the root cause of privacy concerns and ways to handle them.

Humanity came up with the right way to make goods as a result of a long history of attempts and mistakes. This journey started from handmade goods only for personal consumption, followed by crafting goods for sale in medieval markets, and reaching full-scale production for the mass market via automated manufacturing processes.

Taking this analogy, in terms of content development in the cybersecurity field, most enterprises are still in the Bronze Age, making in-house content out of sync with current threats. There are several reasons for that. First, the difficulty revolves around ownership, as it is hard to truly own something in a large enterprise. Thus, it is quite challenging to assume responsibility for making a wrong decision. Another challenge involves hiring and keeping mature experts, both due to a talent shortage of ~3 million people and because top experts often progress to specialized businesses. For example, an expert with strong malware research and reversing skills is likely to transition to an EDR vendor, so it requires a major effort to keep them in-house. A restricted budget and limited time for content development are another stumbling block.

Finally, when it comes to content development challenges, privacy concerns remain an issue. All options except in-house content development require third-party access to the organization’s sensitive data. At the same time, all kinds of logs and telemetry coming from edge, cloud, SaaS, and “crown jewels” applications contain sensitive data that is frequently subject to regulations and may often contain the company’s IP. As this data usually ends up in SIEM, log analytics and EDR silos, the access is often privileged and unrestricted. Still, available options in today’s market addressing this challenge require giving full access to a third party. If that third party is a well-established MSSP or MDR, the privacy concerns are less pressing, but quite often, content development is outsourced to startup-like third-party vendors that have a few dozen people on the team, including non-engineering positions, striving to protect the data of a Fortune-100 company “like their own”.

For organizations like government or federal agencies, it is nearly impossible to approve such third-party access to sensitive data. Here, I will try to shed some light on the existing privacy concerns and prove that the industrial revolution in content development has genuinely occurred. Privacy is no longer a stumbling block to the development of detection algorithms, including the automated processes.

First, let’s take a step back and agree on terms to be on the same page. An alert, signal or detection — whatever you name it — by its nature consists of a detection rule or hunting query (algorithm) applied to logs, telemetry, API information (data) with a certain result (outcome).
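To make these terms concrete, here is a minimal Python sketch of the three parts: a rule (algorithm), a list of events (data), and the matching events (outcome). The rule, the field names, and the sample events are hypothetical and only illustrate the definition above; they do not reflect any particular Platform's rule format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class DetectionRule:
    name: str
    matches: Callable[[Event], bool]  # the "algorithm"


def run_rule(rule: DetectionRule, events: List[Event]) -> List[Event]:
    """Apply the rule (algorithm) to events (data); matching events are the outcome."""
    return [event for event in events if rule.matches(event)]


# Hypothetical rule: flag PowerShell launched with an encoded command.
rule = DetectionRule(
    name="PowerShell with encoded command",
    matches=lambda e: e.get("process") == "powershell.exe"
    and "-enc" in e.get("command_line", ""),
)

# Hypothetical sample data.
events = [
    {"process": "powershell.exe", "command_line": "powershell.exe -enc SQBFAFgA"},
    {"process": "notepad.exe", "command_line": "notepad.exe report.txt"},
]

alerts = run_rule(rule, events)
print(f"{rule.name}: {len(alerts)} alert(s)")
```

Note that a third party only needs the rule to develop content; the events and the alerts are where the sensitive data lives.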

Introducing Our Content Development Maturity Model

To make a high-quality rule from a privacy perspective, you should have access to SIEM / EDR / XDR / Data Analytics / Log Analytics / Threat Hunting Platform (hereinafter, Platform), as well as to a data source and to an alert outcome. No privacy concerns exist until you plan to involve a third party. By analyzing a large scope of rules across a variety of our customers, including their unique ways of processing alerts, we have built the following content development maturity model, which helps explain how SOC Prime mitigates privacy risks. This model contains four stages of content development maturity. The higher the stage, the more mature the content development process and the stronger the privacy protection:

1. Content development via consulting services. This stage involves collecting logs from live infrastructure, finding suspicious patterns, describing them as rules, queries, or Machine Learning (ML) recipes in the specific Platform format, and implementing these content items across the same infrastructure. This requires access to raw logs, rules, queries, and triggered alerts in the customer’s Platform. Normally at this stage, detection content development is delivered by consulting services.
2. Content development via co-managed services. This stage focuses on analyzing typical logs/events in the lab environments, simulating the latest threats and attacks to catch attack patterns in logs. It also involves creating real-time detection rules, threat hunting queries, or ML recipes on the customer’s side, checking the outcome and fine-tuning rules/queries. Performing these operations requires access to rules/queries and triggered alerts in the customer’s Platform. Normally, at this stage, detection content development is delivered by co-managed services.
3. Crowdsourcing development model with a feedback loop. This stage represents the crowdsourcing development model, which involves engaging hundreds of external security engineers and researchers across the world, analyzing logs in the lab environments, describing rules or queries using the universal language supported by various SIEMs, EDRs, or other types of Log Analytics Platforms, as well as automatically implementing these rules or queries across different infrastructures using a predefined set of customizations. Collecting customer feedback to ensure continuous improvement of rules/queries is crucial at this stage, which requires getting access to rules/queries (optional*), including access to anonymized and aggregated statistics on the triggered alerts in the customer’s Platform.
4. Crowdsourcing development model without a feedback loop. At this stage, detection content development reaches the highest degree of maturity. Developing cross-tool content for various SIEMs, EDRs, or other types of Log Analytics Platforms is available without receiving continuous feedback from customers. This stage highly resembles the third one, but the rule or query accuracy relies on the feedback already collected from various customers during a significant period of time for a large set of different rules/queries. Customer feedback is highly appreciated, but not required. This stage requires access to rules/queries (optional*) in the customer’s Platform.

optional* — access is not required if the customer manually deploys content items.
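The "anonymized and aggregated statistics" mentioned for the third stage can be pictured with a short sketch. The Python below assumes a hypothetical alert record with `rule_id`, `host`, and `user` fields and reduces a batch of alerts to per-rule hit counts, so that nothing identifying a host or a user remains in the result. It illustrates the idea only and is not a description of how SOC Prime's platform computes its statistics.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_alert_stats(alerts: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count triggered alerts per rule ID and drop every other field."""
    return dict(Counter(alert["rule_id"] for alert in alerts))


# Hypothetical raw alerts as they might exist inside the customer's Platform.
raw_alerts = [
    {"rule_id": "rule-001", "host": "srv-db-01", "user": "alice"},
    {"rule_id": "rule-001", "host": "wks-114", "user": "bob"},
    {"rule_id": "rule-002", "host": "srv-web-02", "user": "carol"},
]

# Only this summary would be shared as feedback; hosts and users stay local.
stats = aggregate_alert_stats(raw_alerts)
print(stats)
```

Because the aggregation runs on the customer's side, the feedback loop can work without exposing raw logs or alert details.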

The first stage of the described content development maturity model is the most susceptible to privacy risks. In addition to the customer’s privacy concerns, content development at the first stage also raises further challenges with the IP (Intellectual Property) rights, since content made in the customer’s infrastructure based on the customer’s logs belongs to the customer only. These challenges clearly explain why SOC Prime’s products and services cannot be delivered at the least mature and the most privacy vulnerable first stage. SOC Prime can work at the second stage by delivering co-managed services, but the crowdsourcing development model available at the third and fourth stages provides more maturity and greater privacy awareness. The highest content development maturity achieved at the fourth stage is what makes SOC Prime stand out from other cybersecurity vendors. We have already collected customer feedback for a significant period of time, allowing us to be less dependent on the customer feedback loop.

To dive deeper into the privacy concerns, I will explain next how we mitigate general privacy risks for our customers by illustrating the examples of SOC Prime’s core products and functionality. I’ll start from Uncoder.IO, one of SOC Prime’s pioneer projects and highly notable from a privacy perspective.

Uncoder.IO

Uncoder.IO is the online Sigma translation engine for SIEM saved searches, queries, filters, and API requests, which helps SOC Analysts, Threat Hunters, and Detection Engineers convert detections to the selected SIEM or XDR format on the fly.

We apply the following industry best practices to ensure data privacy protection for all users working with Uncoder.IO:

Fully anonymous: no registration, no authentication, no logging
All data is kept session-based, stored in memory, no storage on server disks
Full reimage every 24h
Microservice-based architecture
Based on the community verified project “sigmac”
Amazon AWS hosting
Data at rest encrypted using an industry standard, AES-256 encryption algorithm
Data in transit encrypted using TLS 1.2 encryption protocol
Overall Rating A+ according to Qualys SSL Labs

Uncoder CTI

Uncoder CTI and its public version, CTI.Uncoder.IO, are online converter tools that make IOC-based threat hunting easier and faster. With Uncoder CTI, Threat Intelligence specialists and Threat Hunters can easily convert IOCs to custom hunting queries ready to run in the selected SIEM or XDR.

Unlike a standard Threat Intelligence Platform (TIP), Uncoder CTI allows grabbing IOCs from any feed or source. Just like Uncoder.IO, Uncoder CTI was designed with privacy in mind. Only the user running in each particular Uncoder CTI session has access to their IOC data. SOC Prime doesn’t collect any IOCs or their logs, and no third parties have access to this data.

Uncoder CTI addresses the privacy concerns as follows:

Fully anonymous: no registration, no authentication, no IOC logging, no IOC collection
All data is kept session-based, stored in memory, no storage on server disks
Full reimage every 24h
Microservice-based architecture
Amazon AWS hosting
Based on the community verified project “sigmac”
Sigma translations are performed on dedicated microservices and are not saved at any stage
All conversions are held in RAM (Random Access Memory) to ensure high performance, scalability, and privacy
Platform parses IOCs locally in the user’s browser environment, no IOCs are sent to the Uncoder CTI server side
Platform returns ready-to-use queries directly to your browser via an encrypted channel
Data at rest is encrypted using an industry standard, AES-256 encryption algorithm
Data in transit is encrypted using TLS 1.2 encryption protocol
CTI report does not leave your local environment (your computer and browser)
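The local IOC parsing described above can be sketched in a few lines. The example below is written in Python for consistency with the other examples in this article, whereas the real tool runs in the browser, so treat it purely as an illustration of the idea rather than Uncoder CTI's actual code. The regular expressions, the `dst_ip` and `file_hash` field names, and the generic `field="value"` query syntax are all assumptions; the IP addresses come from documentation ranges.

```python
import re
from typing import Dict, Iterable, List

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def _unique(items: Iterable[str]) -> List[str]:
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(items))


def extract_iocs(report: str) -> Dict[str, List[str]]:
    """Pull IPv4 addresses and SHA-256 hashes out of free text."""
    return {
        "ip": _unique(IPV4_RE.findall(report)),
        "sha256": _unique(h.lower() for h in SHA256_RE.findall(report)),
    }


def build_query(iocs: Dict[str, List[str]], field_map: Dict[str, str]) -> str:
    """Join all IOCs into one OR query using a hypothetical field mapping."""
    clauses = []
    for ioc_type, values in iocs.items():
        field = field_map[ioc_type]
        clauses.extend(f'{field}="{value}"' for value in values)
    return " OR ".join(clauses)


# Hypothetical CTI report text.
report = (
    "C2 servers at 203.0.113.7 and 198.51.100.23; dropper hash "
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)

iocs = extract_iocs(report)
query = build_query(iocs, {"ip": "dst_ip", "sha256": "file_hash"})
print(query)
```

In this sketch only the finished query would need to leave the machine, and only if the user chooses to run it; the report text and the extracted IOCs are never sent anywhere.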

Log Source and MITRE ATT&CK Coverage

The world’s first platform for collaborative cyber defense, threat hunting and discovery provides security professionals with management tools for tracking threat detection effectiveness and data coverage using SIEM or XDR.

Log Source Coverage visualizes how the organization-specific log sources are covered by the SOC content from Threat Detection Marketplace, which allows tracking and orchestrating the overall threat detection program for the company. MITRE ATT&CK® Coverage displays the live progress in addressing ATT&CK tactics, techniques, and sub-techniques based on the explored or deployed detection content.
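As a rough illustration of what an ATT&CK coverage figure means, the Python sketch below compares the techniques tagged on deployed rules with a set of techniques an organization wants to track. The rule records and the tracked set are hypothetical, and the calculation is a simplified assumption, not the platform's actual coverage logic.

```python
from typing import Dict, List, Set


def technique_coverage(
    deployed_rules: List[Dict[str, object]], tracked: Set[str]
) -> Dict[str, object]:
    """Report which tracked ATT&CK technique IDs are covered by deployed rules."""
    covered: Set[str] = set()
    for rule in deployed_rules:
        covered.update(t for t in rule["techniques"] if t in tracked)
    ratio = len(covered) / len(tracked) if tracked else 0.0
    return {
        "covered": sorted(covered),
        "missing": sorted(tracked - covered),
        "ratio": ratio,
    }


# Hypothetical deployed rules tagged with ATT&CK technique IDs.
deployed_rules = [
    {"rule_id": "rule-001", "techniques": ["T1059", "T1053"]},
    {"rule_id": "rule-002", "techniques": ["T1059"]},
]

# Hypothetical set of techniques the organization wants to track.
tracked = {"T1059", "T1566", "T1003", "T1053"}

report = technique_coverage(deployed_rules, tracked)
print(report)
```

The calculation needs only rule metadata, which is why coverage tracking of this kind does not require access to logs.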

Both Log Source and MITRE ATT&CK® Coverage are designed to mitigate privacy risks with the following information protection means:

No third-party access to user data processed by the Detection as Code platform
Data at rest is encrypted using an industry standard, AES-256 encryption algorithm

Shortly, we are going to enable exporting the coverage data to the CSV and JSON formats compatible with the MITRE ATT&CK Navigator tool. This capability comes in handy for security practitioners working offline in Excel or with the web-based MITRE ATT&CK Navigator tool.

Quick Hunt

Quick Hunt helps Threat Hunters visualize and hunt for the latest threats in their SIEM & XDR with a single click. Quick Hunt was designed according to the best practices of privacy protection, particularly:

All hunting queries are launched during the existing browser session
Data in transit is encrypted using TLS 1.2 encryption protocol
User feedback is fully anonymized and provided exclusively at the user’s discretion

Threat Detection Marketplace

The world’s first bounty-driven Threat Detection Marketplace for SOC content aggregates the most up-to-date Sigma-based threat detection content from over 300 researchers and natively delivers it via subscription to 20+ SIEM and XDR platforms. The SOC Prime Threat Detection Marketplace contains more than 130,000 detections aligned with the MITRE ATT&CK framework and continuously updated. To tailor the content search to the user’s security role and needs, the SOC Prime Threat Detection Marketplace offers a variety of customization settings and filters configured within the user profile. Still, this user profile customization based on the recommendation engine is optional.

The SOC Prime Threat Detection Marketplace sticks to the following best practices to ensure data privacy protection:

One-time password (OTP)
Two-factor authentication (2FA)
Logon, view and download history are stored on dedicated analytical servers
Security logging (audit trail)
Amazon AWS hosting
Web Application Firewall (WAF) protection
Data at rest is encrypted using an industry standard, AES-256 encryption algorithm
Data in transit is encrypted using TLS 1.2 encryption protocol
Overall Rating A+ according to Qualys SSL Labs

Continuous Content Management

The Continuous Content Management (CCM) module powered by SOC Prime’s Detection as Code platform streams compatible SOC content directly into the user’s environment. The CCM module automatically delivers the most relevant detection content in real time, freeing security teams to spend more time securing and less time hunting. To enable real-time content delivery and management and stay ahead of the curve, the CCM module was built following a proactive approach to CI/CD workflow development practices.

The CCM module mitigates the privacy risks in the following way:

No log data collected from your SIEM, EDR or XDR environment
SOC Prime has only information on what rules are running and their hit rates
No user data, no IP or host information is collected via CCM
Open source API script, easy-to-read and verify
Data at rest is encrypted using an industry standard, AES-256 encryption algorithm
Data in transit is encrypted using TLS 1.2 encryption protocol

Summing up, I would like to focus on the most crucial points of SOC Prime’s approach to privacy, covering all the technical controls listed above and their effectiveness as verified by an independent audit. Achieving SOC 2 Type II compliance validates that SOC Prime has all the above-mentioned technical controls in place for its cybersecurity solutions, business operations procedures, and technical infrastructure. All projects are run by SOC Prime’s in-house team, ensuring no third-party access to SOC Prime’s Detection as Code platform. Data encryption using industry standards and best practices, such as the AES-256 encryption algorithm for data at rest and the TLS 1.2 encryption protocol for data in transit, illustrates SOC Prime’s commitment to data security and privacy.

To explore in more detail how we adopt the privacy imperative at SOC Prime, please refer to https://my.socprime.com/privacy/.
