What Are SIGMA Rules: Beginner’s Guide
This blog post argues for SIGMA as a detection language, covers the most critical SIGMA rule components (logsource & detection), the SIGMA taxonomy, and testing SIGMA rules, and generally prepares analysts who are new to SIGMA to write their first rules. It also includes a short discussion of detection engineering with SIGMA: noise, ideas, log sources, and more.
The Case for SIGMA Rules
In the past, SIEM detections existed in vendor / platform specific silos. Partners wishing to share detection content often had to translate a query from one vendor's language into another's. This is not sustainable; the defensive cyber security community must improve how we share detections to keep pace with our ever-evolving adversaries.
Much like YARA or Snort rules, SIGMA is a tool for the open sharing of detection logic, except focused on SIEM events instead of files or network traffic. SIGMA allows defenders to share detections (alerts, use cases) in a common language.
First released in 2017 by Florian Roth and Thomas Patzke, SIGMA is paving the way for platform-agnostic search. With SIGMA, defenders are freed from vendor- and platform-specific detection languages and repositories, and can harness the power of the community to respond quickly to critical threats and new adversary tradecraft.
There are many reasons to use SIGMA:
- Researchers and intelligence teams who identify new adversary behaviors and want an agnostic way of sharing detections
- MSSP / MDR responsible for multiple SIEM / EDR / Log Analytics solutions & data taxonomies/schemas (ECS, CEF, CIM, etc)
- Teams wanting to avoid vendor lock-in; by defining rules in SIGMA they can more easily move between platforms
- Researchers in the offensive security space wanting to create detections based on their research
Note: In this blog SIEM is used to describe any platform used to collect and search on logs. I accept that many of the platforms listed may not fit your definition of “SIEM”. However, using the terms “platform” or “log platform” is too ambiguous.
Creating SIGMA Rules
Writing SIGMA rules requires basic knowledge of the SIGMA schema and taxonomy, having an idea, fitting that idea to SIGMA, testing, sharing, and potentially maintaining the rule.
Recommended Background & Context
Despite the length of this blog, thanks to YAML and forward thinking by the creators, SIGMA is easy to understand and write. At SOC Prime we like to say “anyone can learn SIGMA”. The art of detection engineering is where things can get more complicated.
There are many other resources, such as the official wiki and guides written by SIGMA experts (listed below). There are certain traps, such as the proper handling of wildcards or incorrect field names, that can cause broken rules; many of these are addressed in those resources.
If you are a researcher looking to get into SIGMA, SOC Prime’s Threat Bounty Program is a great opportunity to get started and earn a little bit of cash. Submitted rules go through a thorough review process where we can guide you and help you understand mistakes and grow as an analyst.
Recommended Reads:
- How To Write SIGMA Rules, Florian Roth 2018
- A Guide to Generic Log Sources, Thomas Patzke 2019
Types of Detections SIGMA Rules Can Express
There are currently two basic types of rules:
- SIGMA Rules based on matching, widely supported, easiest to write
- SIGMA Rules based on matching and simple correlations, limited support, less easy to write
Note: There are also multi-yaml SIGMA rules; however, these have generally fallen out of favor compared to log source specific rules. The SOC Prime team generally doesn't create multi-yaml rules because they add unnecessary complexity to rule maintenance and deployment, and anyone who can create two SIGMA rules can create a multi-yaml rule.
Let’s Create a Simple SIGMA Rule!
An idea (and some thoughts on detection engineering with SIGMA)
Users and administrators often keep sensitive passwords in plaintext documents such as text files, Excel spreadsheets, Word documents, etc. I am concerned that adversaries may identify these files in an environment before I do. We want to instruct our users on how to properly store passwords before the files are discovered by a criminal hacker.
For many SIGMA rules it is to the author's benefit to abstract the idea and broaden the target 'reasonably'. For ideas such as this we can take educated guesses at what the behavior may look like, not only what we have observed. For instance, we may make educated guesses about additional terms and extensions that users may use to store passwords in plaintext.
The idea of 'broadening' a rule is counterintuitive to many analysts' instincts. Killing all 'false positives' is not necessarily the goal of the original author when a rule will be consumed in unknown and unfamiliar environments. We can let the EDR and anti-virus vendors worry about creating detections that can't have any false positives. With SIGMA, rules can be tested in an environment and tuned easily.
SIGMA is easily understood, testable, and tunable. If a term like 'details' is too noisy for an environment, the person implementing the rule should feel empowered to tune it. Deploying all rules at once without testing is a recipe for disaster. Turning rules off instead of digesting their intent and tuning them for an environment will cause a shop to miss out on solid detection content.
I like to give the example of psexec. In some environments, psexec is completely normal and the status quo for administrators to remotely administer hosts. In other environments, psexec is (probably rightfully) unapproved, blocked, and an actionable offense for administrators to use. So, is a SIGMA rule to detect any psexec usage 'noisy', or just better suited to some environments than others? If you deploy content without testing, tuning noise will always be a problem. Only those rules identified as "critical" are meant to be safe to use without testing.
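To make the psexec example concrete, here is a minimal sketch of what such a rule could look like (the rule components are explained in detail later in this post). The executable names are illustrative; a production rule would likely also account for renamed binaries, for example via PE metadata:

logsource:
    product: windows
    category: process_creation
detection:
    selection:
        Image|endswith:
            - '\psexec.exe'
            - '\psexec64.exe'
    condition: selection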
Back to creating our password exposure SIGMA rule: we can expand the idea to include additional file name terms such as:
- pw
- psw
- pass
- password
- passwords
- accounts
- account
- info
Created with software like:
- Notepad
- Notepad++
- Wordpad
- Office Applications
A data source / A log source
Once we have an idea, we will need a log source. SIGMA theoretically supports any log source; however, we should identify a log source that most folks have. For instance, we might be able to write a rule for a Data Loss Prevention log source, but DLP logs are rarely parsed and ingested into SIEMs, and the industry has no clear favorite product. So, we can create a valid rule, but it will not be as easily adopted.
For Windows endpoint rules, Sysmon is a great place to start. Sysmon is commonly deployed in environments, and many log sources provide synonymous data (EDRs, etc). With Sysmon there are two main options, process creation (process_creation in SIGMA) and file create (file_event in SIGMA).
We will build our detection off of process creation as it is more broadly adopted, thus ensuring our rule is as useful as possible. Process creation is a great log source to learn from and it is one of the most useful / popular log sources used in endpoint detections.
Note: Often ideas come directly from data sources themselves. By reviewing the types of data available to you in your SIEM / Lab one can easily identify SIGMA rules worth writing. We can also use other sources like vendor documentation.
With Sysmon process creation events (Event ID 1), the event generated when a user opens a file containing passwords may include these interesting fields:
Image: C:\Windows\System32\notepad.exe
CommandLine: "C:\Windows\System32\NOTEPAD.EXE" C:\Users\John\Desktop\password.txt
Fitting the detection idea to SIGMA
Now that we have an idea, and a data source to work with, we can begin to build our rule.
This isn't well documented, but the true minimal components required to translate a rule are just logsource & detection (for some backends, like Splunk, detection alone is enough). Everything else is 'just' metadata to help the SIGMA rule consumer. When you are starting out, it is in your interest to begin with these minimal fields, confirm your logic is working, and then add the additional SIGMA fields & data. If you want to publish your rule to the public SIGMA repo, it is worth checking previous submissions and emulating their formatting.
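For instance, a bare-bones version of the password exposure idea with only logsource and detection (an illustrative skeleton; the full example below adds title, author, tags, and more search terms) could look like this:

logsource:
    product: windows
    category: process_creation
detection:
    selection:
        CommandLine|contains: 'password'
    condition: selection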
A basic SIGMA rule with minimal components for potential password exposure:
title: Potential Password Exposure (via cmdline)
author: Adam Swan
tags:
    - attack.t1552.001
    - attack.credential_access
logsource:
    product: windows
    category: process_creation
detection:
    selection:
        Image|endswith:
            - '\notepad.exe'
            - '\winword.exe' # Microsoft Word's executable is winword.exe
            - '\excel.exe'
            - '\wordpad.exe'
            - '\notepad++.exe'
        CommandLine|contains:
            - 'pass' # 'pass' will match on password, so including 'password' is redundant
            - 'pwd'
            - 'pw.' # pw.txt, etc.
            - 'account' # also matches 'accounts'
            - 'secret'
            - 'details' # 'details' included based on experience
    condition: selection
Logsource component
The logsource component helps the SIGMA backend translator (SIGMAC) know what type of data the rule should be run against. It empowers the rule creator to create more generic rules. For instance, with a logsource of "product: windows, category: process_creation" we do not need to specify event IDs (Sysmon 1, Windows 4688, ProcessRollup, etc). The consumer of the rule can specify which event IDs, indexes, etc. they want associated with each log source in their SIGMA config. Without specifying indexes, event IDs, etc., rules will likely be unnecessarily expensive (performance-wise) for the consumer.
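As an illustration, here is a simplified backend config sketch modeled on the Sysmon config in the public repo (tools/config), mapping the generic process_creation log source to Sysmon Event ID 1. Platform-specific configs can additionally pin indexes or sourcetypes, and the exact keys available depend on the backend you target:

logsources:
    process_creation:
        category: process_creation
        product: windows
        conditions:
            EventID: 1
        rewrite:
            product: windows
            service: sysmon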
Additionally, often telemetry can contain similar fields but imply entirely different behaviors. For instance, Sysmon network connection events (Event Id 3) and process creation (Event ID 1) share the Image field. The existence of explorer.exe in the Image field of a Sysmon network connection event is completely different from the existence of explorer.exe in a process creation event. By providing the proper logsource component we provide invaluable context to the detection.
Detection component
The detection component is where the author defines their detection criteria. It includes at least one selection component and a condition component. There is also an optional timeframe component, which is required for correlation-based rules.
Selection sub component(s):
Generally, a selection takes the form Field A contains/startswith/endswith/equals Value B. Of course, as seen in the example rule above, an author can expand this to logic such as Field A contains/startswith/endswith/equals Value X, Y, or Z. This matching is always case insensitive.
There are more advanced 'modifiers' that increase the complexity of the rule or enable authors to be more precise. For instance, regular expressions are handled through the re modifier and enable authors to do things such as write case-sensitive queries. For compatibility purposes it is best to stick to only the basic regular expression operators: . ? + * | { } [ ] ( ) \
Selections can be named (almost) anything you want (e.g. selection, selection2, selection3, filter). Often a variation of 'selection' is used, but one could just as easily name a selection 'banana' and the rule would still work. Generally, the name 'filter' is used for selections that will be excluded (e.g. selection AND NOT filter).
Condition sub component:
The condition component contains boolean logic (AND, OR, NOT) defining how each selection should be included in the final query.
E.G. (selection_a OR selection_b) AND NOT filter
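As a sketch of how named selections and a filter fit together (the field values here are placeholders rather than a recommended detection), the condition line combines them exactly as in the expression above:

detection:
    selection_a:
        Image|endswith: '\powershell.exe'
    selection_b:
        OriginalFileName: 'PowerShell.EXE'
    filter:
        CommandLine|contains: '\HealthService\'
    condition: (selection_a or selection_b) and not filter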
Condition component with correlation:
There are two types of correlations supported by backends today. The SIGMA schema supports other correlations, but they are not yet implemented by the available backends.
Count() by Y:
Count matching events per unique value of field Y and compare the count (greater than, less than) to a static number.
Example: Count() by src_ip > 2
Count(X) by Y:
Count the distinct values of field X per value of field Y and compare that count (greater than, less than) to a static number.
Example: Count(EventID) by src_ip > 2
Common Correlation Use Cases:
Count() by src_ip > 10 | Count matching events per source IP. |
Count() by dst_ip > 10 | Count matching events per destination IP. |
Count(EventID) by ComputerName | Count distinct EventIDs per host. Useful, for instance, if you want to chain Sysmon Event ID 1 (process creation) AND Event ID 5 (process terminated), e.g. a process is created and terminated in less than 1 minute. |
Timeframe Sub Component:
The timeframe component is used in conjunction with conditions that include a correlation. Many backends ignore the timeframe; however, it is generally always included and is required in most repositories, including SOC Prime's.
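Putting the correlation condition and timeframe together, a sketch of a correlation-based detection might look like the following; the event ID, field name, and threshold are illustrative, and src_ip assumes your config maps it to the actual field in your logs:

logsource:
    product: windows
    service: security
detection:
    selection:
        EventID: 4625 # failed logon attempts (illustrative)
    timeframe: 5m
    condition: selection | count() by src_ip > 10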
Complete Examples Using Splunk:
Here are some examples of SIGMA detections and their translations for Splunk. If you are not familiar with Splunk, asterisks are wildcards, so a term surrounded by asterisks (e.g. *term*) is 'contains', a term with a leading asterisk (e.g. *term) is 'endswith', and a term with a trailing asterisk (e.g. term*) is 'startswith'.
SIGMA detection component | Splunk Translation (Asterisk is a wildcard) |
detection: selection: fieldX: 'suspicious' condition: selection | fieldX="suspicious" |
detection: selection: fieldY|contains: - 'suspicious' - 'malicious' - 'pernicious' condition: selection | (fieldY="*suspicious*" OR fieldY="*malicious*" OR fieldY="*pernicious*") |
detection: selection: - fieldX: 'icious' - fieldX: - 'susp' - 'mal' - 'pern' condition: selection | (FieldX="icious" AND (FieldX="susp" OR FieldX="mal" OR FieldX="pern")) |
detection: selection: - FieldX|endswith: 'icious' - FieldX|startswith: - 'susp' - 'mal' - 'pern' condition: selection | (FieldX="*icious" AND (FieldX="susp*" OR FieldX="mal*" OR FieldX="pern*")) |
detection: selection: FieldX|endswith: 'icious' filter: FieldX|startswith: - 'del' - 'ausp' condition: selection AND NOT filter | (FieldX="*icious" AND NOT ((FieldX="del*" OR FieldX="ausp*"))) |
detection: selection: FieldX: 'suspicious' timeframe: 1m condition: selection | count by src_ip > 3 | FieldX="suspicious" | eventstats count as val by src_ip| search val > 3 #notice splunk ignores the timeframe value, the value must be set at search by the user |
detection: selection: FieldX: 'suspicious' condition: selection | count(ComputerName) by src_ip > 3 | FieldX="suspicious" | eventstats dc(ComputerName) as val by src_ip | search val > 3 |
Taxonomy Questions (e.g. what field names to use)
Theoretically you can use whatever field names you wish, as long as someone is willing to put in the time to write a SIGMA config to translate from your fields to theirs.
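For example, a backend config maps SIGMA field names to the consumer's schema through a fieldmappings section. This sketch assumes an ECS-style target schema; the exact target field names depend on your platform:

fieldmappings:
    Image: process.executable
    CommandLine: process.command_line
    ParentImage: process.parent.executable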
Note: Field names are case sensitive! CommandLine and commandline are two different field names. CommandLine is part of the existing taxonomy; commandline is not.
That being said, it is best to use field names that are documented by SIGMA. There are three places the public SIGMA repository documents the taxonomy.
- The wiki taxonomy page, which is the general rule for field names per category
  - https://github.com/SigmaHQ/sigma/wiki/Taxonomy
  - The wiki will reveal to readers that SIGMA uses
    - Sysmon fields for endpoint rules
    - W3C Extended Log File Format fields for webserver & proxy rules
    - Dedicated fields for firewall and antivirus rules
- The SIGMA field names used in existing rules
  - https://github.com/SigmaHQ/sigma/tree/master/rules
- The SIGMA config files, which are really the official documentation for fields; users can create / modify these as required when they translate rules
  - https://github.com/SigmaHQ/sigma/tree/master/tools/config
Then finally, if no config or rules exist, we use the original field names from the originating log source. If field names come from nested values (e.g. accountId nested under userIdentity in AWS CloudTrail), we use a period to indicate that the field is nested, as this is relatively consistent across different SIEMs (e.g. userIdentity -> accountId becomes userIdentity.accountId).
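As a hedged sketch of the dotted notation (assuming a CloudTrail log source is defined in your config; the values are illustrative), a selection on a nested field is written like this:

logsource:
    product: aws
    service: cloudtrail
detection:
    selection:
        eventName: 'ConsoleLogin'
        userIdentity.type: 'Root'
    condition: selection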
Testing SIGMA Rules
Testing SIGMA rules is simple, and often folks are even able to submit content without directly testing it themselves. Most public researchers do not have access to diverse environments to test rules against 'the set of all SIEMs'. Instead, one can rely on public feedback, feedback from trusted parties, etc. Even Florian Roth, a co-creator of SIGMA, regularly pushes rules to the public for feedback via his Twitter. I've also seen folks publish straight to their personal blogs, LinkedIn, etc. If you think you have a good rule to share, put it out there; trust me, whether it is wrong or not, the lovely folks on the internet will let you know! Don't take yourself too seriously, and be prepared to make changes and learn something.
There are some basic steps you can take:
- Ensure the rule translates (with Uncoder or by using SIGMAC)
- Sanity checking (e.g. ensuring the rule meets your original expectation, follows the correct taxonomy, etc) – see pitfalls: https://github.com/SigmaHQ/sigma/wiki/Rule-Creation-Guide
- Checking the rule in a lab environment
- Sharing the rule broadly for testing / sharing the rule with the SOC Prime team via the Threat Bounty Program
Note: From a rule author perspective, generally you should not worry about the backend implementations of rules. It is up to the SIGMA backend authors, and folks like SOC Prime to ensure that the translations meet the original intention of a valid rule. If a bug is identified, it is always worth submitting an issue to GitHub.
Call to Action & Future Work
If you made it this far, you are more than prepared to write and share your first rule! If you enjoyed this blog, you may enjoy another one coming soon about using SIGMAC to customize content.