Logs

Collecting Logs

To collect and route logs, use the Fluentd agent.

Overview

Fluentd is a high-performance, open-source data collector designed for unified logging layers in distributed systems.

For more information and configuration guides, visit the official Fluentd site.


Fluentd Architecture

Fluentd operates on the principle of a unified logging layer:

  • Input Plugins – Collect raw data from files, network sources, APIs, etc.
  • Parsers – Structure unformatted log lines (JSON, Apache, syslog, or custom regex).
  • Filters – Transform, enrich, and normalize logs.
  • Buffers – Store events temporarily (in-memory or file-based) for durability and fault tolerance.
  • Output Plugins – Route logs to storage or processing systems (Elasticsearch, Kafka, S3, stdout, etc.).
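
The stages above map one-to-one onto a Fluentd configuration. A minimal sketch of the full pipeline (file paths, tags, and the hostname field are illustrative):

# Input plugin: tail a log file
<source>
  @type tail
  path /var/log/app.log
  pos_file /var/log/td-agent/app.pos
  tag app.log
  <parse>
    @type none
  </parse>
</source>

# Filter: enrich each record with the host name
<filter app.log>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

# Output plugin: print events to stdout
<match app.log>
  @type stdout
</match>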

Installation (td-agent)

td-agent is the production-ready build of Fluentd.

Example for Ubuntu 20.04 (focal):

curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh

Key Plugin: in_tail (File Tailing)

The in_tail plugin continuously reads the end of log files, similar to the tail -F command.

Basic Example:

<source>
  @type tail
  path /var/log/json.log
  pos_file /fluentd/logs/json.pos
  tag json.log
  <parse>
    @type json
  </parse>
</source>

Parameters

  • path – Path to the log file (supports glob patterns).
  • pos_file – Tracks the last read position so tailing resumes after a restart.
  • tag – Routes events through the pipeline.
  • parse – Defines the parser type (json, apache2, syslog, or a custom regex).
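
Because path accepts globs, one source block can cover a whole directory of files, with a single pos_file tracking the read position of every matched file. A sketch (paths and tag are illustrative):

<source>
  @type tail
  # Glob: tail every .log file in the directory
  path /var/log/app/*.log
  # One pos_file tracks positions for all matched files
  pos_file /fluentd/logs/app.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>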

Advanced Multi-Source Configuration

This configuration reads JSON, Apache, and Syslog logs, enriches them with metadata, and outputs to stdout.

<source>
  @type tail
  path /var/log/json.log
  pos_file /fluentd/logs/json.pos
  tag json.log
  <parse>
    @type json
  </parse>
</source>

<source>
  @type tail
  path /var/log/apache.log
  pos_file /fluentd/logs/apache.pos
  tag apache.log
  <parse>
    @type apache2
  </parse>
</source>

<source>
  @type tail
  path /var/log/syslog.log
  pos_file /fluentd/logs/syslog.pos
  tag syslog.log
  <parse>
    @type syslog
  </parse>
</source>

<filter *.log>
  @type record_transformer
  enable_ruby true
  <record>
    Timestamp ${Time.at(time.to_i).utc.strftime('%Y-%m-%dT%H:%M:%SZ')}
    Level ${record["level"] || "Info"}
    Message ${record["message"]}
    Source ${record["source"]}
    SourceIp "#{Socket.ip_address_list.detect(&:ipv4_private?)&.ip_address}"
    LogPath /var/log/${tag}
    TagName ${tag}
  </record>
</filter>

<match *.log>
  @type stdout
</match>

How It Works

File Reading - By default, in_tail starts reading from the current end of the file, so only new lines are collected. After rotation, it continues from the start of the new file.
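
Both behaviors are tunable on in_tail. For example, read_from_head makes Fluentd ingest a file's existing content on first start, and rotate_wait keeps watching a rotated file briefly to catch trailing writes (the values below are illustrative, not recommendations):

<source>
  @type tail
  path /var/log/json.log
  pos_file /fluentd/logs/json.pos
  tag json.log
  # Read the whole file on first start instead of only new lines
  read_from_head true
  # Keep watching a rotated file for 5 more seconds
  rotate_wait 5
  <parse>
    @type json
  </parse>
</source>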

Position Tracking - .pos file ensures continuity across Fluentd restarts, so no logs are missed.

Parsing - Each source uses the correct parser:

  • json
  • apache2
  • syslog

Transformation - record_transformer enriches each log with:

  • Timestamp in ISO 8601 UTC format
  • Default log level (if not set)
  • Host’s private IP address
  • Source file name
  • Original tag

Output - All .log events are sent to stdout
(can be swapped for Elasticsearch, S3, etc.)
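
For instance, routing to Elasticsearch instead of stdout is a one-block change, assuming the fluent-plugin-elasticsearch gem is installed (the host and buffer path are illustrative):

<match *.log>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  # Write to time-based logstash-YYYY.MM.DD indices
  logstash_format true
  <buffer>
    @type file
    path /fluentd/buffer/es
    flush_interval 10s
  </buffer>
</match>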
