Set up a unified logging layer for your Python applications.

Introduction:

Interest in analytics keeps growing, and with it the importance of data. I’d personally consider logging a major contributor to data monitoring, analysis, debugging and, last but not least, identifying data patterns.

Let’s take an example of a VM running different applications or services.

Each application writes logs whose structure depends on its tech stack. What if you want a unified system to collect logs from all of them?

Legacy logging systems cannot parse data from such diverse sources. We need a unified logging layer that can combine and parse logs from different systems.

Why Fluentd?

Fluentd is an open source data collector for building the unified logging layer. Once installed on a server, it runs in the background to collect, parse, transform, analyze and store various types of data.

https://www.fluentd.org/faqs

Fluentd tries to structure data as JSON as much as possible: this allows Fluentd to unify all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations.
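To make the idea of "structuring data as JSON" concrete, here is a minimal Python sketch that normalizes two differently shaped log lines into one common dictionary. Both input formats, the field names and the `normalize` helper are assumptions made up for illustration; this is not how Fluentd itself parses logs.

```python
import json
import re

# Hypothetical plain-text format: "<date> <time> <level> <message>"
PLAIN_LINE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def normalize(line: str, source: str) -> dict:
    """Convert a raw log line into one common dictionary shape."""
    line = line.strip()
    if line.startswith("{"):
        # Hypothetical JSON format with "ts", "level" and "msg" keys.
        raw = json.loads(line)
        time, level, message = raw["ts"], raw["level"], raw["msg"]
    else:
        match = PLAIN_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised log line: {line!r}")
        time, level, message = match.groups()
    return {"source": source, "time": time, "level": level.upper(), "message": message}


plain = normalize("2023-05-17 07:26:52 error payment failed", source="billing")
structured = normalize(
    '{"ts": "2023-05-17 07:26:53", "level": "info", "msg": "user logged in"}',
    source="auth",
)

print(json.dumps(plain))
print(json.dumps(structured))
```

Once both lines share the same keys, filtering and routing no longer depend on which application produced them.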

Fluentd decouples data sources from backend systems by providing a unified logging layer in between.

A unified logging layer lets you and your organisation make better use of data and iterate more quickly on your software.

Installing Fluentd via terminal:

i. Download and install td-agent

There are multiple ways to install Fluentd. For brevity, we will stick to one method.

# td-agent 4 (experimental)
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-jammy-td-agent4.sh | sh

ii. Start the service and check its status

$ sudo systemctl start td-agent.service
$ sudo systemctl status td-agent.service

● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
     Loaded: loaded (/lib/systemd/system/td-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-05-16 20:28:04 IST; 45s ago
       Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent
    Process: 7154 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (cod>
   Main PID: 7160 (fluentd)
      Tasks: 9 (limit: 9325)
     Memory: 96.6M
        CPU: 1.549s
     CGroup: /system.slice/td-agent.service
             ├─7160 /opt/td-agent/bin/ruby /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent>
             └─7163 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /v>

May 16 20:28:03 dineshkumarkb systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
May 16 20:28:04 dineshkumarkb systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.

iii. Test your server

$ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
$ tail -n 1 /var/log/td-agent/td-agent.log
2018-01-01 17:51:47 -0700 debug.test: {"json":"message"}

The default configuration (/etc/td-agent/td-agent.conf) is to receive logs at an HTTP endpoint and route them to stdout.
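The same test can be issued from Python. The sketch below uses only the standard library and mirrors the curl command above: a form-encoded `json=` field posted to `http://localhost:8888/<tag>`. The helper names `build_request` and `send_event` are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def build_request(tag: str, record: dict,
                  base_url: str = "http://localhost:8888") -> urllib.request.Request:
    """Build a POST request equivalent to the curl test command."""
    # Same body shape as `curl -d 'json={...}'`: a form field named "json".
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/{tag}", data=body, method="POST")


def send_event(tag: str, record: dict) -> int:
    """Send the event and return the HTTP status code."""
    with urllib.request.urlopen(build_request(tag, record), timeout=5) as response:
        return response.status


# Usage, with td-agent running locally:
# send_event("debug.test", {"json": "message"})
```

If the request succeeds, the event should show up in /var/log/td-agent/td-agent.log just like the one sent with curl.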

For more information on installing and troubleshooting Fluentd, please refer to the official Fluentd documentation.

Configuring Fluentd:

Now that the installation is done, let’s move on to the configuration. The config file resembles an Apache server config: directives are grouped in angle-bracket blocks such as <source> and <match>.

The config file is located at /etc/td-agent/td-agent.conf.

Initial config file:

####
## Output descriptions:
##

# Treasure Data (http://www.treasure-data.com/) provides cloud based data
# analytics platform, which easily stores and processes data from td-agent.
# FREE plan is also provided.
# @see http://docs.fluentd.org/articles/http-to-td
#
# This section matches events whose tag is td.DATABASE.TABLE
<match td.*.*>
  @type tdlog
  @id output_td
  apikey YOUR_API_KEY

  auto_create_table
  <buffer>
    @type file
    path /var/log/td-agent/buffer/td
  </buffer>

  <secondary>
    @type file
    path /var/log/td-agent/failed_records
  </secondary>
</match>

## match tag=debug.** and dump to console
<match debug.**>
  @type stdout
  @id output_stdout
</match>

####
## Source descriptions:
##

## built-in TCP input
## @see http://docs.fluentd.org/articles/in_forward
<source>
  @type forward
  @id input_forward
</source>

## built-in UNIX socket input
#<source>
#  type unix
#</source>

# HTTP input
# POST http://localhost:8888/<tag>?json=<json>
# POST http://localhost:8888/td.myapp.login?json={"user"%3A"me"}
# @see http://docs.fluentd.org/articles/in_http
<source>
  @type http
  @id input_http
  port 8888
</source>

In the config file you can match and filter logs, then print them to stdout or forward them to a central log server. On Ubuntu systems, td-agent's output, including events routed to stdout, is written to /var/log/td-agent/td-agent.log by default.
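To get a feel for how tags are routed, here is a simplified Python model of the match patterns in the initial config file. It assumes the documented wildcard rules (`*` matches exactly one tag part, `**` matches zero or more parts) and first-match-wins ordering. It is an illustration only, not Fluentd's actual implementation, and `tag_matches` and `route` are hypothetical helper names.

```python
from typing import Optional


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified matcher: `*` is one tag part, `**` is zero or more parts."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts: list, tag_parts: list) -> bool:
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # Try consuming 0, 1, 2, ... tag parts with the double wildcard.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return _match(rest, tag_parts[1:])
    return False


# Match patterns from the initial config file, in the order they appear.
ROUTES = [("td.*.*", "tdlog"), ("debug.**", "stdout")]


def route(tag: str) -> Optional[str]:
    """Return the output type of the first matching pattern, if any."""
    for pattern, output in ROUTES:
        if tag_matches(pattern, tag):
            return output
    return None


for tag in ("debug.test", "td.mydb.mytable", "New App"):
    print(tag, "->", route(tag))
```

In this model a tag such as "New App" matches neither pattern, so an application using that tag would need an additional, broader match block.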

Before any events are written, the log file looks something like this.

2023-05-17 07:26:52 +0530 [info]: #0 starting fluentd worker pid=2052 ppid=2049 worker=0
2023-05-17 07:26:53 +0530 [info]: #0 [input_debug_agent] listening dRuby uri="druby://127.0.0.1:24230" object="Fluent::Engine" worker=0
2023-05-17 07:26:53 +0530 [info]: #0 [input_forward] listening port port=24224 bind="0.0.0.0"
2023-05-17 07:26:53 +0530 [info]: #0 fluentd worker is now running worker=0
2023-05-17 07:26:52.346049769 +0530 fluent.info: {"pid":2052,"ppid":2049,"worker":0,"message":"starting fluentd worker pid=2052 ppid=2049 wor>
2023-05-17 07:26:53.044804817 +0530 fluent.info: {"uri":"druby://127.0.0.1:24230","object":"Fluent::Engine","worker":0,"message":"[input_debu>
2023-05-17 07:26:53.047743097 +0530 fluent.info: {"port":24224,"bind":"0.0.0.0","message":"[input_forward] listening port port=24224 bind=\"0>
2023-05-17 07:26:53.048527633 +0530 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}

Note that the output has been trimmed for brevity. You will see these messages at the end of the file.
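Event lines in this file follow the shape "<time> <tag>: <JSON payload>", while Fluentd's own "[info]" lines are plain text. If you want to inspect the file programmatically, a small parser along these lines may help. The regular expression and the `parse_event_line` helper are assumptions based on the sample lines shown here, not an official format specification.

```python
import json
import re
from typing import Optional

# "<date> <time> <utc offset> <tag>: <payload>"
EVENT_LINE = re.compile(r"^(\S+ \S+ \S+) (.+?): (.*)$")


def parse_event_line(line: str) -> Optional[dict]:
    """Return time, tag and record of an event line, or None for other lines."""
    match = EVENT_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    time, tag, payload = match.groups()
    try:
        record = json.loads(payload)
    except json.JSONDecodeError:
        # Fluentd's own "[info]: ..." lines carry plain text, not JSON.
        return None
    return {"time": time, "tag": tag, "record": record}


event = parse_event_line(
    '2023-05-17 07:26:53.048527633 +0530 fluent.info: '
    '{"worker":0,"message":"fluentd worker is now running worker=0"}'
)
internal = parse_event_line(
    "2023-05-17 07:26:53 +0530 [info]: #0 fluentd worker is now running worker=0"
)

print(event["tag"], "|", event["record"]["message"])
print(internal)
```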

iv. Edit the config file

Now, let’s edit the config so that Fluentd listens for traffic on port 24224 and writes the incoming events to stdout. Add the lines below to your config file if they are not already there, and save it.

<source>
  @type forward
  port 24224
  bind 0.0.0.0
  @id input_forward
</source>

<match *.**>
  @type stdout
</match>

v. Restart your td-agent.

After saving, do not forget to restart your td-agent.

$ sudo systemctl restart td-agent.service
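After the restart, you may want to confirm that the forward input is actually accepting connections before pointing an application at it. The following standard-library sketch simply attempts a TCP connection; `is_port_open` is a hypothetical helper, and the host and port are the values from the config above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


print("forward input reachable:", is_port_open("localhost", 24224))
```

A result of False usually means td-agent is not running or the <source> block was not picked up; the service status and the log file are the next places to look.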

Now that we’re done with the Fluentd configuration, we will move on to the application.

Application:

Python has a rich ecosystem of libraries, including one for Fluentd logging.

fluent-logger-python is a Python library for recording events from a Python application.

https://github.com/fluent/fluent-logger-python

https://pypi.org/project/fluent-logger/

$ pip install fluent-logger

Python Code:

Let’s take a simple list-to-dictionary conversion as an example. Our objective is just to get the logs into Fluentd.

import logging

from fluent import handler

# Format used for the console output configured by basicConfig below
log_format = "%(levelname)s:%(filename)s:%(lineno)d - %(asctime)s - %(message)s"

logger = logging.getLogger(__name__)

# The tag (app name), host name and port of the Fluentd forward input are configured here
fluent_handler = handler.FluentHandler("New App", host='localhost', port=24224)

logger.addHandler(fluent_handler)

logging.basicConfig(level=logging.INFO, format=log_format, datefmt="%Y-%m-%d %H:%M:%S")


def convert_list_dict(input_lst: list) -> dict:
    """Pair up consecutive items: [k1, v1, k2, v2, ...] -> {k1: v1, k2: v2, ...}."""
    logger.info(" This is a simple program in python to convert list to dict")

    # Assumes an even number of items; an odd-length list raises IndexError
    return {input_lst[i]: input_lst[i+1] for i in range(0, len(input_lst), 2)}


if __name__ == "__main__":
    logger.info(" Initiating conversion ")
    my_lst = ["a", "1", "b", "2", "c", "3"]
    result = convert_list_dict(my_lst)
    logger.info(" Conversion complete ")
    logger.info(f" Result : {result} ")

On executing this code, the logs should appear in the log file located at

/var/log/td-agent/td-agent.log

2023-05-17 07:26:53.047743097 +0530 fluent.info: {"port":24224,"bind":"0.0.0.0","message":"[input_forward] listening port port=24224 bind=\"0>
2023-05-17 07:26:53.048527633 +0530 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}
2023-05-18 17:44:09.000000000 +0530 New App: " Initiating conversion "
2023-05-18 17:44:09.000000000 +0530 New App: " This is a simple program in python to convert list to dict"
2023-05-18 17:44:09.000000000 +0530 New App: " Conversion complete "
2023-05-18 17:44:09.000000000 +0530 New App: " Result : {'a': '1', 'b': '2', 'c': '3'} "
2023-05-18 17:44:16.000000000 +0530 New App: " Initiating conversion "
2023-05-18 17:44:16.000000000 +0530 New App: " This is a simple program in python to convert list to dict"
2023-05-18 17:44:16.000000000 +0530 New App: " Conversion complete "
2023-05-18 17:44:16.000000000 +0530 New App: " Result : {'a': '1', 'b': '2', 'c': '3'} "

There you go! We now have our application logs showing up in Fluentd.
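In the log output above, each event carries only the formatted message string. Since Fluentd works best with JSON records, a natural next step is to send a dictionary per event. fluent-logger provides a FluentRecordFormatter for that purpose. The standard-library sketch below only illustrates the kind of dictionary one might build from a log record. The field names and the `record_to_dict` helper are assumptions chosen for illustration.

```python
import logging


def record_to_dict(record: logging.LogRecord) -> dict:
    """Build a JSON-friendly dictionary from a standard LogRecord."""
    return {
        "level": record.levelname,
        "logger": record.name,
        "where": f"{record.filename}:{record.lineno}",
        "message": record.getMessage(),
    }


# A hand-built record, standing in for one produced by logger.info(...)
sample = logging.LogRecord(
    name="converter", level=logging.INFO, pathname="app.py", lineno=24,
    msg="Converted %d pairs", args=(3,), exc_info=None,
)

print(record_to_dict(sample))
```

With records shaped like this, fields such as the level or the logger name become individual keys that can be filtered and queried downstream.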

If you’re interested in forwarding syslogs to a remote server, please read here.

Happy Coding!!

Please do show your appreciation and follow me for more content on Python.
