Fluentd: How to Use a Parser With Regular Expression (regexp)

December 23, 2024 · 3 min read

This guide explains how to configure Fluentd to extract structured data from unstructured log messages using the parser filter plugin with a regular expression (regexp). For example, to extract specific fields such as log_source and index from a log message, you can do the following.

Input Log:


{
  "message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events"
}

Configuration:

<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /'(?<log_source>[^']+)\s-\s(?<index>[^']+)'/
  </parse>
</filter>
Explanation:
  • key_name message: Specifies the field to parse, here message.
  • reserve_data true: Keeps the original fields of the record, including message, alongside the extracted fields.
  • regexp expression:
    • (?<log_source>[^']+): Captures the text between the opening quote and the " - " separator as log_source.
    • (?<index>[^']+): Captures the text between the " - " separator and the closing quote as index.
Output Log:
{
  "message": "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events",
  "log_source": "WinCollect DSM",
  "index": "SRV-AD-001"
}
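
Before deploying the filter, the expression can be checked outside Fluentd. The following is a minimal sketch in plain Ruby, whose regular expression syntax the regexp parser uses; the variable names are illustrative and not part of Fluentd.

```ruby
# Sample message taken from the input log above.
message = "Log source 'WinCollect DSM - SRV-AD-001' has stopped emitting events"

# Same expression as in the <parse> section.
expression = /'(?<log_source>[^']+)\s-\s(?<index>[^']+)'/

# Regexp#match returns a MatchData object, or nil when the text does not match.
match = expression.match(message)

# named_captures returns a Hash of group name => captured text.
fields = match ? match.named_captures : {}

fields.each { |name, value| puts "#{name}: #{value}" }
# log_source: WinCollect DSM
# index: SRV-AD-001
```

If fields comes back empty, the expression did not match the sample and should be adjusted before it is used in the filter.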

If you need to extract fields such as timestamp, level, module, and message from logs with timestamps, you can do this as follows:

Input Log:

{
  "message": "2024-12-18 10:15:30 ERROR [Auth] Login failed for user 'jdoe'"
}

Configuration:

<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?<level>[A-Z]+)\s+\[(?<module>[^\]]+)\]\s+(?<message>.*)/
  </parse>
</filter>
Explanation:
  • (?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Extracts the timestamp.
  • (?<level>[A-Z]+): Captures the log level (e.g., ERROR).
  • (?<module>[^\]]+): Extracts the module name (e.g., Auth).
  • (?<message>.*): Captures the remaining log message. Because this group is also named message, the captured text replaces the original message value in the output, even with reserve_data true.
Output Log:
{
  "message": "Login failed for user 'jdoe'",
  "timestamp": "2024-12-18 10:15:30",
  "level": "ERROR",
  "module": "Auth"
}
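
One capture group in this expression is named message, the same key as the field being parsed. The sketch below imitates that situation in plain Ruby with Hash#merge. It assumes that parsed fields take precedence over the reserved original fields, and it is an illustration only, not Fluentd's actual implementation.

```ruby
# Original record, as in the input log above.
record = {
  "message" => "2024-12-18 10:15:30 ERROR [Auth] Login failed for user 'jdoe'"
}

# Same expression as in the <parse> section.
expression = /(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?<level>[A-Z]+)\s+\[(?<module>[^\]]+)\]\s+(?<message>.*)/

match = expression.match(record["message"])
parsed = match ? match.named_captures : {}

# Assumption: parsed fields are merged over the original record,
# so a parsed "message" replaces the original "message".
result = record.merge(parsed)

result.each { |key, value| puts "#{key}: #{value}" }
# message: Login failed for user 'jdoe'
# timestamp: 2024-12-18 10:15:30
# level: ERROR
# module: Auth
```

To keep the full original line as well, one option is to give the last group a different name (for example log_message, a hypothetical name) so the two keys no longer collide.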

If you need to extract key-value pairs from a log message, you can do this as follows:

Input Log:

{
  "message": "user=jdoe status=failed ip=192.168.12.1"
}

Configuration:

<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /user=(?<user>\w+)\s+status=(?<status>\w+)\s+ip=(?<ip>[^\s]+)/
  </parse>
</filter>
Explanation:
  • (?<user>\w+): Captures the username.
  • (?<status>\w+): Extracts the status (e.g., failed).
  • (?<ip>[^\s]+): Captures the IP address.
Output Log:
{
  "message": "user=jdoe status=failed ip=192.168.12.1",
  "user": "jdoe",
  "status": "failed",
  "ip": "192.168.12.1"
}
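
This expression expects the keys in a fixed order (user, status, ip). The following minimal Ruby sketch compares the sample line with a hypothetical reordered variant, which is not from the original logs, to show that the reordered line does not match.

```ruby
# Same expression as in the <parse> section.
expression = /user=(?<user>\w+)\s+status=(?<status>\w+)\s+ip=(?<ip>[^\s]+)/

ordered   = "user=jdoe status=failed ip=192.168.12.1"   # sample from above
reordered = "status=failed user=jdoe ip=192.168.12.1"   # hypothetical variant

ordered_match   = expression.match(ordered)
reordered_match = expression.match(reordered)

# The sample line matches and yields all three fields.
ordered_match.named_captures.each { |name, value| puts "#{name}: #{value}" }
# user: jdoe
# status: failed
# ip: 192.168.12.1

# The reordered line does not match, so match returns nil.
puts reordered_match.nil?
# true
```

If the key order varies across your logs, a fixed-order expression like this one would need to be adapted.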
