Logs
Collecting Logs
To collect and route logs, use the Fluentd agent.
Overview
Fluentd is a high-performance, open-source data collector designed for unified logging layers in distributed systems.
For more information and configuration guides, visit the official Fluentd site.
Fluentd Architecture
Fluentd operates on the principle of a unified logging layer:
- Input Plugins – Collect raw data from files, network sources, APIs, etc.
- Parsers – Structure unformatted log lines (JSON, Apache, syslog, or custom regex).
- Filters – Transform, enrich, and normalize logs.
- Buffers – Store events temporarily (in-memory or file-based) for durability and fault tolerance.
- Output Plugins – Route logs to storage or processing systems (Elasticsearch, Kafka, S3, stdout, etc.).
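The five stages above can be seen together in a single illustrative configuration; the path, tag, and hostname field here are placeholders, not part of any specific setup:

```
<source>                  # input plugin: where events enter
  @type tail
  path /var/log/app.log
  pos_file /var/log/app.pos
  tag app.log
  <parse>                 # parser: structure each raw line
    @type json
  </parse>
</source>

<filter app.log>          # filter: transform and enrich events
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match app.log>           # output plugin: route events onward
  @type stdout
  <buffer>                # buffer: hold events for durability
    @type memory
  </buffer>
</match>
```

Events flow top to bottom: each source emits tagged events, filters whose pattern matches the tag rewrite them, and the first matching match block consumes them.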
Installation (td-agent)
td-agent is the production-ready build of Fluentd.
Example for Ubuntu 20.04 (focal):
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh
Key Plugin: in_tail (File Tailing)
The in_tail plugin continuously reads new lines appended to log files, similar to the tail -F command.
Basic Example:
Fluentd Configuration
<source>
@type tail
path /var/log/json.log
pos_file /fluentd/logs/json.pos
tag json.log
<parse>
@type json
</parse>
</source>
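With this configuration, a JSON line appended to the file becomes a structured event carrying the tag json.log. The timestamp and field values below are illustrative:

```
# line appended to /var/log/json.log
{"level": "warn", "message": "disk usage at 91%", "source": "monitor"}

# resulting event (tag = json.log)
2024-01-01 12:00:00 json.log: {"level":"warn","message":"disk usage at 91%","source":"monitor"}
```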
Parameters
| Parameter | Description |
|---|---|
| path | Path to the log file (glob patterns are supported) |
| pos_file | File that records the last read position, so tailing resumes after a restart |
| tag | Tag used to route events through the pipeline |
| parse | Defines the parser type (json, apache2, syslog, or a custom regexp) |
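Because path accepts glob patterns, one source can tail a whole directory; in_tail can also expand a * in the tag from the matched file path. The directory and tag below are illustrative:

```
<source>
  @type tail
  # glob: tail every .log file in the directory
  path /var/log/app/*.log
  pos_file /fluentd/logs/app.pos
  # * is expanded from the matched file path
  tag app.*
  <parse>
    @type json
  </parse>
</source>
```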
Advanced Multi-Source Configuration
This configuration reads JSON, Apache, and Syslog logs, enriches them with metadata, and outputs to stdout.
<source>
@type tail
path /var/log/json.log
pos_file /fluentd/logs/json.pos
tag json.log
<parse>
@type json
</parse>
</source>
<source>
@type tail
path /var/log/apache.log
pos_file /fluentd/logs/apache.pos
tag apache.log
<parse>
@type apache2
</parse>
</source>
<source>
@type tail
path /var/log/syslog.log
pos_file /fluentd/logs/syslog.pos
tag syslog.log
<parse>
@type syslog
</parse>
</source>
<filter *.log>
@type record_transformer
enable_ruby true
<record>
Timestamp ${Time.at(time.to_i).utc.strftime('%Y-%m-%dT%H:%M:%SZ')}
Level ${record["level"] || "Info"}
Message ${record["message"]}
Source ${record["source"]}
SourceIp "#{Socket.ip_address_list.detect(&:ipv4_private?)&.ip_address}"
LogPath /var/log/${tag.split('.').first}
TagName ${tag}
</record>
</filter>
<match *.log>
@type stdout
</match>
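To make the filter's effect concrete, here is a plain-Python sketch of what the record_transformer block does to each event. The field names match the config above; the private-IP lookup only approximates the Ruby Socket call and may return None, and the enrich function itself is purely illustrative:

```python
import socket
from datetime import datetime, timezone

def enrich(record, tag, event_time):
    """Sketch of the record_transformer filter: build the enriched record
    from the original record, the event tag, and the event time (epoch s)."""
    try:
        host_ips = socket.gethostbyname_ex(socket.gethostname())[2]
    except OSError:
        host_ips = []
    # first non-loopback address, loosely mirroring ipv4_private? in Ruby
    private_ip = next((ip for ip in host_ips if not ip.startswith("127.")), None)
    return {
        "Timestamp": datetime.fromtimestamp(event_time, tz=timezone.utc)
                             .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "Level": record.get("level", "Info"),   # default when the field is absent
        "Message": record.get("message"),
        "Source": record.get("source"),
        "SourceIp": private_ip,
        "LogPath": "/var/log/" + tag.split(".")[0],
        "TagName": tag,
    }
```

For an event tagged json.log whose record lacks a level field, this yields Level "Info" and LogPath "/var/log/json", just as the filter would.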
How It Works
File Reading - in_tail starts reading from the current end of the file; after rotation, it continues from the start of the new file.
Position Tracking - the .pos file ensures continuity across Fluentd restarts, so no logs are missed.
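The position-tracking idea can be sketched in a few lines of Python: store the byte offset of the last read in a side file, and on the next run seek to that offset before reading. This is a simplified model of what in_tail's pos_file does, not its actual implementation:

```python
import os

def tail_from(path, pos_file):
    """Read lines appended to `path` since the offset stored in `pos_file`,
    then persist the new offset so a restart resumes where we stopped."""
    offset = 0
    if os.path.exists(pos_file):
        with open(pos_file) as f:
            offset = int(f.read() or 0)
    with open(path, "rb") as f:
        f.seek(offset)           # skip everything already processed
        new_data = f.read()
        offset = f.tell()        # remember how far we got
    with open(pos_file, "w") as f:
        f.write(str(offset))
    return new_data.decode().splitlines()
```

Calling tail_from twice returns only the lines added between the calls, which is exactly why restarting Fluentd does not re-emit or drop events.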
Parsing - Each source uses the matching parser: json, apache2, or syslog.
Transformation - the record_transformer filter enriches logs with:
- Timestamp in ISO 8601 UTC format
- Default log level (if not set)
- Host’s private IP address
- Source file name
- Original tag
Output - All .log events are sent to stdout (which can be swapped for Elasticsearch, S3, etc.).
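Swapping the destination only changes the match block. For example, with the fluent-plugin-elasticsearch output plugin installed, the same pipeline could write to Elasticsearch; the host and port below are placeholders:

```
<match *.log>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  logstash_format true
</match>
```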