Log analysis for application monitoring

Have you ever heard of logs? Simply keeping an eye on them is one of the biggest headaches of investigation and support work, and it is one of the most recurrent problems in software development companies. Let’s see how we can simplify and automate these tedious tasks with some specific tools.

What are logs and what are they for?

Most companies in the software sector place great value on the quality of the support they offer their customers, both in terms of speed and effectiveness. Support is complex and involves work of many different kinds, yet in most cases the problems it deals with stem from errors or unwanted behavior in the companies' applications. The information those applications generate at runtime is therefore of vital importance.

That information takes the form of logs: a sequential record of the events that occur during the execution of a particular application or process. Logs are normally collected in a file or a database and constitute evidence of the application's behavior.
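
As a minimal illustration of what such a file can look like (using Python's standard logging module rather than any particular product; the service name and messages are invented for the example), the snippet below writes timestamped, leveled events to a file:

    import logging

    # Write timestamped, leveled events to a file, as most applications do in some form
    logging.basicConfig(
        filename="app.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )

    logger = logging.getLogger("payment-service")
    logger.info("Payment accepted for order 1234")
    logger.error("Timeout contacting bank API")

    # app.log now contains lines such as:
    # 2023-05-04 10:15:30,123 ERROR payment-service Timeout contacting bank API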

Types of logs

Different types of events are of interest, so different log sources coexist within the same system. Each source specializes in providing one kind of information and may reflect, for example:

  • The connections that are established with users or servers.
  • The queries made to the database.
  • The user's flow through the application.

Given the disparity in format between these log sources, the different volumes of data they produce and the complications that can arise around time references (in a distributed system, the server's clock and a client's clock are not the same), analyzing and abstracting events is an arduous process. This is where a proper system for log centralization and analysis becomes important.
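
As a small illustration of the time-reference problem, this sketch (assuming log lines carry ISO 8601 timestamps with a UTC offset) normalizes every timestamp to UTC so that events from different machines can be ordered consistently:

    from datetime import datetime, timezone

    def to_utc(timestamp: str) -> datetime:
        """Parse an ISO 8601 timestamp with an offset and normalize it to UTC."""
        return datetime.fromisoformat(timestamp).astimezone(timezone.utc)

    # The same instant reported by two machines whose clocks live in different zones
    print(to_utc("2023-05-04T10:15:30+02:00"))  # 2023-05-04 08:15:30+00:00
    print(to_utc("2023-05-04T09:15:30+01:00"))  # 2023-05-04 08:15:30+00:00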

How to analyze logs

There are currently many diverse tools for this purpose. Most of them offer basic utilities to store, process, format, filter and sort logs, as well as more advanced ones to detect anomalies and notify the appropriate people when they occur.

As with any problem in which we have a large volume of data from which we want to extract precise information (Big Data), applying data mining techniques specialized for the problem at hand makes the process much easier. These techniques can produce results that replace the manual inspection of logs, delivering the same information in a much more concise form, or even surface information that would otherwise go completely unnoticed and could never be studied.
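
As a toy illustration of the idea (not the algorithm of any particular product), the following sketch counts error events per minute and flags the minutes whose count is far above the typical rate; the event data and the threshold are invented for the example:

    from collections import Counter
    from statistics import median

    # Hypothetical pre-parsed events: (minute bucket, severity level)
    events = [
        ("10:01", "ERROR"), ("10:02", "ERROR"), ("10:03", "ERROR"),
        ("10:04", "ERROR"), ("10:04", "ERROR"), ("10:04", "ERROR"),
        ("10:04", "ERROR"), ("10:04", "ERROR"), ("10:04", "ERROR"),
        ("10:04", "ERROR"), ("10:04", "ERROR"),
    ]

    # Count ERROR events per minute bucket
    errors_per_minute = Counter(minute for minute, level in events if level == "ERROR")

    # Flag any minute whose error count is far above the typical (median) rate
    baseline = median(errors_per_minute.values())
    anomalies = {m: c for m, c in errors_per_minute.items() if c > 3 * baseline}
    print(anomalies)  # {'10:04': 8}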

Structure of an analysis system

In most cases, companies deploying a log analysis system ask for it to be cloud-hosted in order to benefit from its advantages. Among the most noteworthy advantages of this type of service are:

  • High availability and accessibility.
  • Optimization of the resources used.
  • The possibility of scaling the deployed services as needed.
  • Reduced equipment configuration and maintenance time.

SaaS tools specialized in log analysis, also known as search platforms, are abundant and have communities of considerable size. All of them handle the storage and filtering of logs, but they delegate the reading, processing and loading of those logs to other kinds of solutions.

This is where log aggregators and forwarders come into play. These tools read logs from various sources, whether files or connections opened on specific ports, process them by extracting the desired fields, and forward them to a given destination. They can apply an individualized treatment to each log source even while working with several simultaneously, and they can also apply different treatments to the same source when different destinations are specified.
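
Rather than showing any particular tool's configuration, the sketch below illustrates the idea of such a pipeline in a few lines of Python: read a file, extract fields with a regular expression and forward each event as JSON. The log format, field names and destination URL are assumptions made for the example:

    import json
    import re
    import urllib.request

    # Assumed log format: "2023-05-04T10:15:30+00:00 ERROR payment-service Timeout contacting bank API"
    LINE = re.compile(r"(?P<timestamp>\S+) (?P<level>\S+) (?P<service>\S+) (?P<message>.*)")

    def forward(path: str, destination: str) -> None:
        """Read a log file, extract the desired fields and forward each event as JSON."""
        with open(path, encoding="utf-8") as source:
            for line in source:
                match = LINE.match(line)
                if not match:
                    continue  # skip lines that do not fit the expected format
                event = json.dumps(match.groupdict()).encode("utf-8")
                request = urllib.request.Request(
                    destination, data=event, headers={"Content-Type": "application/json"}
                )
                urllib.request.urlopen(request)

    # Hypothetical file and ingestion endpoint
    forward("app.log", "http://localhost:9880/app.log")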

If we also wanted to expand the functionality of our search platform and perform more advanced machine learning tasks, we could always look for a specialized cloud microservice that integrates with the rest of the deployed components. The offering of this kind of utility is also wide, and integrating one of them rarely poses any limiting problems.

Examples of log aggregators and forwarders

  • Logstash: a free data processing pipeline that ingests data from a multitude of sources, transforms it and sends it to its destination. It is the lowest layer of the Elastic Stack, which makes it the best-known of the existing options.
  • Fluentd: an open source data collector focused on creating a unified logging layer. It has a large number of plugins and a fairly large user community, which is why it is positioned as the main alternative to Logstash (see the short sketch after this list).
  • Fluent Bit: an ultra-fast, lightweight and highly scalable log processor and forwarder. It bills itself as the simplified, agile version of its big brother Fluentd and is currently a community favorite. Although it has fewer plugins, its integration possibilities are more than sufficient.
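
As an illustration of how an application can hand events to one of these forwarders, the snippet below uses the fluent-logger Python package to emit a structured event to a Fluentd or Fluent Bit "forward" input; the tag, host, port and fields are assumptions that must match the forwarder's configuration:

    from fluent import sender

    # Assumes a Fluentd/Fluent Bit "forward" input listening on localhost:24224
    logger = sender.FluentSender("app", host="localhost", port=24224)

    # Emit a structured event; the forwarder decides how to process and route it
    logger.emit("login", {"user": "alice", "status": "ok", "latency_ms": 42})
    logger.close()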

Examples of search platforms

  • Elasticsearch: a Lucene-based search server. It provides a distributed full-text search engine with multi-tenancy support. It is normally used together with Kibana, the final layer of the Elastic Stack, focused on visualization and exploitation of the logs (see the indexing sketch after this list). These two tools, back when they were still open source, were the starting point for numerous SaaS products focused on this type of work, such as io or OpenSearch.
  • OpenSearch: an open source search and analytics engine built by AWS and driven by the community. It grew out of the two Elastic Stack components described above, Elasticsearch and Kibana, which were open source at the time. Considering its huge list of implemented and planned functionalities, it is undoubtedly the most complete, versatile and powerful option on the market.
  • Datadog: a monitoring service for cloud applications that monitors systems of all kinds through a SaaS data analysis platform. It stands out as the most interesting of the options that are not based on Elasticsearch.
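
To close, here is a minimal sketch of how an application or forwarder might push a log event into Elasticsearch (or, with small changes, OpenSearch) and query it back. It assumes the official Python client for Elasticsearch 8.x and an unsecured local instance on port 9200; the index name and fields are invented for the example:

    from elasticsearch import Elasticsearch

    # Assumes an Elasticsearch node reachable without authentication (e.g. a local dev instance)
    es = Elasticsearch("http://localhost:9200")

    # Index a single structured log event into a hypothetical "app-logs" index
    es.index(
        index="app-logs",
        document={
            "timestamp": "2023-05-04T08:15:30Z",
            "level": "ERROR",
            "service": "payment-service",
            "message": "Timeout contacting bank API",
        },
        refresh="wait_for",  # make the document searchable before we query it
    )

    # Search for error-level events; Kibana or OpenSearch Dashboards run similar queries under the hood
    results = es.search(index="app-logs", query={"match": {"level": "ERROR"}})
    for hit in results["hits"]["hits"]:
        print(hit["_source"]["timestamp"], hit["_source"]["message"])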