Fluentd: How to Use a Parser With Regular Expression (regexp)
This guide explains how to configure Fluentd to extract structured fields from unstructured log messages. It uses the parser filter (@type parser) together with the regexp parser. For example, to extract the log_source and index fields from a log message, you can use the following configuration.
Input Log:
{
"message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events"
}
Configuration:
<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /'(?<log_source>[^']+)\s-\s(?<index>[^']+)'/
  </parse>
</filter>
Explanation:
key_name message: Specifies that the message field should be parsed.
reserve_data true: Keeps the original fields of the record, including message, alongside the extracted fields. Without it, the record is replaced by the extracted fields only.
regexp expression:
(?<log_source>[^']+): Captures the quoted text before " - " as log_source.
(?<index>[^']+): Captures the quoted text after " - " as index.
Output Log:
{
"message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events",
"log_source": "WinCollect DSM",
"index": "SRV-AD-001"
}
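To check the expression outside Fluentd, the following Python sketch applies the same pattern and merges the captures into the record in the way reserve_data true is described above. It is an approximation, not Fluentd itself. Fluentd's regexp parser uses Ruby regular expressions, which write named groups as (?<name>...), while Python's re module requires (?P<name>...). The helper name parse_record is hypothetical.

```python
import re

# The Fluentd expression translated to Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"'(?P<log_source>[^']+)\s-\s(?P<index>[^']+)'")


def parse_record(record: dict, key_name: str = "message") -> dict:
    """Hypothetical helper approximating the parser filter with reserve_data true."""
    match = PATTERN.search(record[key_name])
    if match is None:
        # Simplification: leave the record untouched when the pattern does not match.
        return dict(record)
    # Keep the original fields and add the captured ones;
    # captured values win if a key already exists.
    return {**record, **match.groupdict()}


record = {"message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events"}
print(parse_record(record))
```

The search can begin anywhere in the string, as it does in Fluentd, because the expression is not anchored.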
If you need to extract fields such as timestamp, level, module, and message from logs with timestamps, you can do this as follows:
Input Log:
{
"message": "2024-12-18 10:15:30 ERROR [Auth] Login failed for user 'jdoe'"
}
Configuration:
<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?<level>[A-Z]+)\s+\[(?<module>[^\]]+)\]\s+(?<message>.*)/
  </parse>
</filter>
Explanation:
(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Extracts the timestamp.
(?<level>[A-Z]+): Captures the log level (e.g., ERROR).
(?<module>[^\]]+): Extracts the module name inside the square brackets (e.g., Auth).
(?<message>.*): Captures the remaining log message.
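Output Log:
{
"message": "Login failed for user 'jdoe'",
"timestamp": "2024-12-18 10:15:30",
"level": "ERROR",
"module": "Auth"
}
A record cannot hold two message keys. The last capture group is also named message, so its value replaces the original message field. Parsed values take precedence over existing keys of the same name, even with reserve_data true. To keep the full original line as well, give the last group a different name, for example (?<log_message>.*).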
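A similar Python sketch makes the key collision in this example visible. It assumes that captured values replace existing keys of the same name when merged, which is our understanding of how the parser filter combines parsed fields with the original record when reserve_data is enabled. As before, parse_record is a hypothetical helper, and the pattern is translated to Python's (?P<name>...) syntax.

```python
import re

# The timestamp/level/module/message expression in Python's named-group syntax.
PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<module>[^\]]+)\]\s+"
    r"(?P<message>.*)"
)


def parse_record(record: dict, key_name: str = "message") -> dict:
    """Hypothetical helper: merge named captures into the record."""
    match = PATTERN.search(record[key_name])
    if match is None:
        return dict(record)
    # The captured "message" overwrites the original "message" key here.
    return {**record, **match.groupdict()}


result = parse_record({"message": "2024-12-18 10:15:30 ERROR [Auth] Login failed for user 'jdoe'"})
print(result["message"])  # prints: Login failed for user 'jdoe'
```

Renaming the last group in both the Fluentd expression and the sketch (for example to log_message) keeps the original line intact.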
If you need to extract key-value pairs from a log message, you can do this as follows:
Input Log:
{
"message": "user=jdoe status=failed ip=192.168.12.1"
}
Configuration:
<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /user=(?<user>\w+)\s+status=(?<status>\w+)\s+ip=(?<ip>[^\s]+)/
  </parse>
</filter>
Explanation:
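(?<user>\w+): Captures the username.
(?<status>\w+): Extracts the status (e.g., failed).
(?<ip>[^\s]+): Captures the IP address (everything up to the next whitespace).
The expression expects the pairs in exactly this order: user, then status, then ip.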
Output Log:
{
"message": "user=jdoe status=failed ip=192.168.12.1",
"user": "jdoe",
"status": "failed",
"ip": "192.168.12.1"
}
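The Python sketch below runs this expression and also shows its main limitation. The same pairs in a different order produce no match. As in the earlier sketches, parse_record is hypothetical and named groups use Python's (?P<name>...) syntax. Returning the record unchanged on a failed match is a simplification. Fluentd's own handling of unmatched records is controlled by parser filter options such as emit_invalid_record_to_error.

```python
import re

# The key-value expression; [^\s]+ is kept as written (equivalent to \S+).
PATTERN = re.compile(r"user=(?P<user>\w+)\s+status=(?P<status>\w+)\s+ip=(?P<ip>[^\s]+)")


def parse_record(record: dict, key_name: str = "message") -> dict:
    """Hypothetical helper: merge named captures into the record."""
    match = PATTERN.search(record[key_name])
    if match is None:
        # Simplification: no match leaves the record as it was.
        return dict(record)
    return {**record, **match.groupdict()}


print(parse_record({"message": "user=jdoe status=failed ip=192.168.12.1"}))
# Same pairs, different order: "user=jdoe" is not followed by "status=", so nothing matches.
print(parse_record({"message": "status=failed user=jdoe ip=192.168.12.1"}))
```

If the order of the pairs is not fixed in your logs, a single fixed-order expression like this one will miss records.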