Fluentd: How to Use a Parser With Regular Expression (regexp)
This guide explains configuring Fluentd to extract structured data from unstructured log messages using the parser plugin with a regular expression (regexp). If you need to extract specific fields, such as log_source and index, from a log message, you can do this as follows.
Input Log:
{
"message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events"
}
Configuration:
<filter **>
@type parser
key_name message
reserve_data true
<parse>
@type regexp
expression /'(?<log_source>[^']+)\s-\s(?<index>[^']+)'/
</parse>
</filter>
Explanation:
key_name message
: Specifies that theÂmessage
 field should be parsed.reserve_data true
: Keeps the originalÂmessage
 field along with the extracted fields.regexp expression
:(?<log_source>[^']+)
: Captures the text beforeÂ-
 asÂlog_source
.(?<index>[^']+)
: Captures the text afterÂ-
 asÂindex
.
Output Log:
{
"message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events",
"log_source": "WinCollect DSM",
"index": "SRV-AD-001"
}
If you need to extract fields such as timestamp
, level
, module
, and message from logs with timestamps, you can do this as follows:
Input Log:
{
"message": "2024-12-18 10:15:30 ERROR [Auth] Login failed for user 'jdoe'"
}
Configuration:
<filter **>
@type parser
key_name message
reserve_data true
<parse>
@type regexp
expression /(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?<level>[A-Z]+)\s+\[(?<module>[^\]]+)\]\s+(?<message>.*)/
</parse>
</filter>
Explanation:
(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
: Extracts the timestamp.(?<level>[A-Z]+)
: Captures the log level (e.g.,ÂERROR
).(?<module>[^\]]+)
: Extracts the module name (e.g.,ÂAuth
).(?<message>.*)
: Captures the remaining log message.
Output Log:
{
"message": "2024-12-18 10:15:30 ERROR [Auth] Login failed for user 'jdoe'",
"timestamp": "2024-12-18 10:15:30",
"level": "ERROR",
"module": "Auth",
"message": "Login failed for user 'jdoe'"
}
If you need to extract key-value pairs from a log message, you can do this as follows:
Input Log:
{
"message": "user=jdoe status=failed ip=192.168.12.1"
}
Configuration:
<filter **>
@type parser
key_name message
reserve_data true
<parse>
@type regexp
expression /user=(?<user>\w+)\s+status=(?<status>\w+)\s+ip=(?<ip>[^\s]+)/
</parse>
</filter>
Explanation:
(?<user>\w+)
: Captures the username.(?<status>\w+)
: Extracts the status (e.g.,Âfailed
).(?<ip>[^\s]+)
: Captures the IP address.
Output Log:
{
"message": "user=jdoe status=failed ip=192.168.12.1",
"user": "jdoe",
"status": "failed",
"ip": "192.168.12.1"
}