
Transform your logs in-flight with Lua, AI, and Calyptia

Written by Erik Bledsoe in Calyptia Core on April 3, 2024


In the modern infrastructure and application landscape, organizations face significant challenges in collecting, processing, and managing observability data. There is always more data to ingest and more tools to route it to. The result is more noise, decreased developer efficiency, and increased costs.

Telemetry pipelines like Calyptia, from the creators of Fluent Bit, enable you to seamlessly manage your data from any source to any destination. Calyptia integrates with your existing sources in minutes and enables turnkey enrichment, processing, transformation, and routing of data to multiple destinations.

Calyptia includes dozens of out-of-the-box processing rules for transforming and enriching data in flight. These rules include converting data from one format to another, removing fields, redacting sensitive data, and more. 

There may be times, however, when no out-of-the-box rule accomplishes the exact processing you need. For example, you may need to apply complex business logic to the data or enrich it with a computed value. For such bespoke needs, Calyptia also lets you write custom processing rules in Lua.

Lua is a lightweight, high-level, multi-paradigm scripting language designed primarily for embedding in applications. Its small, readable syntax makes it easy for many developers to pick up. It is widely used as an embedded extension language, including in applications such as World of Warcraft, Adobe Photoshop Lightroom, and Redis. Lua’s built-in pattern matching makes it well suited to parsing and transforming records, and Lua scripts are typically much less resource-intensive than complex regex formulas.
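To give a feel for what that pattern matching looks like, here is a minimal sketch that splits one Apache combined-format access log line (the format used in the examples below) into fields with string.match. It is plain Lua rather than a Calyptia processing rule; the function name parse_apache is hypothetical, and the pattern assumes well-formed lines, returning nil for anything it cannot match.

```lua
-- Sketch: split one Apache combined-format log line into fields using
-- only Lua's built-in patterns. parse_apache is a hypothetical helper,
-- not part of Calyptia. Returns nil when the line does not match.
local function parse_apache(line)
  local host, user, time, method, path, code, size, referer, agent =
    string.match(line,
      '^(%S+) %S+ (%S+) %[([^%]]+)%] "(%S+) (%S+) [^"]*" (%d+) (%S+) "([^"]*)" "([^"]*)"')
  if not host then
    return nil
  end
  -- Field names mirror the ones used later in this post.
  return {
    host = host, user = user, time = time, method = method, path = path,
    code = code, size = size, referer = referer, agent = agent,
  }
end
```

Lua patterns are deliberately smaller than regular expressions (there is no alternation and no {n,m} repetition), so a fixed format like this one is a good fit, while a highly variable format may not be.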

But what if you don’t know Lua or you need some help debugging your code? Calyptia integrates Generative Pre-trained Transformers (GPT) to bring the power of artificial intelligence to help you write your bespoke Lua code. AI is also embedded into many of Calyptia’s other out-of-the-box processing rules to provide assistance with generating regex and more.

In this blog post, we’ll demonstrate how to use Lua and GPT to simplify the processing of observability data as it flows through your pipelines.

Getting started

For the purposes of this blog, we’ll skip setting up the initial telemetry pipeline, but if you are new to Calyptia, you may want to watch this walk-through of installing Calyptia and configuring a pipeline or check out the docs.


If you would like to follow along with the examples, you can sign up for a free trial of Calyptia. We’ll be using the processing rules playground. The playground lets you experiment with processing rule modifications nondestructively. You can even import samples of your actual data so that you can see exactly the impact of any modifications before applying them to your actual pipeline.

[Screenshot: arrows pointing to the playground menu item, the processing menu item, and the button labeled launch playground]

We’ll be using an Apache2 access log as sample input. You can download the data here if you would like to use it to follow along with the examples. 

Parse your log data into structured JSON using AI

When you first launch the playground it preloads some default sample data in the input screen. Simply select and delete the default data and paste in our sample log file. Then hit the “Run Actions” button. With no other processing rules enabled, Calyptia transforms the raw log data into JSON.

[Animated GIF: replacing the default sample data and then parsing it into JSON]

However, in this case, each entire log entry has been stored as the value of a single “log” key:

{
  "log": "79.172.141.87 - - [01/Feb/2022:06:40:58 +0000] \"GET /wp-content HTTP/1.0\" 200 5000 \"https://thomas.com/search.jsp\" \"Mozilla/5.0 (iPad; CPU iPad OS 9_3_5 like Mac OS X) AppleWebKit/534.0 (KHTML, like Gecko) FxiOS/11.6r9285.0 Mobile/26R322 Safari/534.0\""
}
{
  "log": "17.67.206.39 - - [01/Feb/2022:06:41:28 +0000] \"PUT /posts/posts/explore HTTP/1.0\" 200 4966 \"http://www.burns.net/\" \"Mozilla/5.0 (Windows NT 5.2; tt-RU; rv:1.9.1.20) Gecko/2015-05-23 16:51:49 Firefox/3.8\""
}

We prefer a more structured output, so our first transformation will be to parse the logs into distinct key-value pairs.

Press the “Add New Action” button, and then select Parse from the dropdown list of available actions. Type “log” as the source key. You can leave the Destination Key field at its default value, “parsed”.

The Parse rule uses regular expressions (regex) to transform your input. If you are like me, writing the right regex is difficult, but fortunately it is something GPT excels at. Click the “Generate regular expression with GPT” link to let it go to work.

[Screenshot: the three steps]

The GPT-generated regex will appear in the window in less time than it would take to type it, and in much less time than I would normally spend Googling and experimenting to see whether someone had already solved the problem for me.

Tip: If you receive an error message when GPT tries to generate the regex, try removing all but a single log entry from the input screen and try again. You can add the full dataset back after GPT successfully generates the regex.

Caution: Both your prompt and the sample data in your window will be sent to OpenAI. If you do not want your sample data shared with OpenAI, do not use actual data.

Apply the new Parse rule, and then run the actions again. The result should look something like this:

{
  "parsed": {
    "time": "01/Feb/2022:06:40:58 +0000",
    "user": "-",
    "path": "/wp-content",
    "code": "200",
    "agent": "Mozilla/5.0 (iPad; CPU iPad OS 9_3_5 like Mac OS X) AppleWebKit/534.0 (KHTML, like Gecko) FxiOS/11.6r9285.0 Mobile/26R322 Safari/534.0",
    "method": "GET",
    "referer": "https://thomas.com/search.jsp",
    "size": "5000",
    "host": "79.172.141.87"
  },
  "log": "79.172.141.87 - - [01/Feb/2022:06:40:58 +0000] \"GET /wp-content HTTP/1.0\" 200 5000 \"https://thomas.com/search.jsp\" \"Mozilla/5.0 (iPad; CPU iPad OS 9_3_5 like Mac OS X) AppleWebKit/534.0 (KHTML, like Gecko) FxiOS/11.6r9285.0 Mobile/26R322 Safari/534.0\""
}

Clean up your JSON output with out-of-the-box processing rules

Our output is closer to what we expect to see, but we probably don’t want to keep the entire untransformed log entry as a single key-value pair. Nor do we want our log data to be a child of the parsed key. 

Calyptia lets you apply multiple rules to your data. They run in the order in which they appear in the Actions section of the screen.

Let’s add two more out-of-the-box actions. 

First, add a “Delete key” rule and specify “log” as the key to delete.

Then add a “Flatten subrecord” rule and specify “parsed” as the key to flatten. You can leave the other settings at their defaults.

Once you apply the rules and run the actions, your output should look like this:

{
  "time": "01/Feb/2022:06:40:58 +0000",
  "user": "-",
  "path": "/wp-content",
  "code": "200",
  "referer": "https://thomas.com/search.jsp",
  "agent": "Mozilla/5.0 (iPad; CPU iPad OS 9_3_5 like Mac OS X) AppleWebKit/534.0 (KHTML, like Gecko) FxiOS/11.6r9285.0 Mobile/26R322 Safari/534.0",
  "method": "GET",
  "host": "79.172.141.87",
  "size": "5000"
}
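If you would rather express these two cleanup steps as code, the sketch below approximates what the Delete key and Flatten subrecord actions did here with their default settings; it does not reproduce every option of the Flatten subrecord rule. It uses the same callback shape as the Custom Lua example in the next section (in the playground it would be written as return function(tag, ts, record, code) ... end). The local name delete_and_flatten is hypothetical and exists only so the function can be called outside Calyptia.

```lua
-- Sketch: approximate the "Delete key" (log) and "Flatten subrecord"
-- (parsed) actions in one callback. Same signature as the hello-world
-- Custom Lua rule; the incoming code is passed through unchanged.
local function delete_and_flatten(tag, ts, record, code)
  -- Drop the raw, unparsed line.
  record["log"] = nil

  -- Copy each child of "parsed" to the top level, then remove "parsed".
  local parsed = record["parsed"]
  if type(parsed) == "table" then
    for key, value in pairs(parsed) do
      record[key] = value
    end
    record["parsed"] = nil
  end

  return code, ts, record
end
```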

Create a custom transformation with Lua

In the examples above, we’ve seen how Calyptia’s out-of-the-box processing rules can save an engineer hours by transforming log data without any custom code. But there may be times when you need a more bespoke solution. That’s what the “Custom Lua” action provides.

We’ll start with a very simple example: adding a new key-value pair to our output.

Add a new action, and select “Custom Lua.” Replace the default text with the following.

return function(tag, ts, record, code)
  record.greeting = "Hello world"
  return code, ts, record
end

In the Custom Lua processing rule, you can access a particular field ("field_a" for example) by using record.field_a or record['field_a']. This works both for existing fields and new fields you would like to create. So our Lua code creates a new field named greeting and adds a value of Hello world to it. 
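As a small illustration of the two notations (not a step in this walkthrough), the sketch below uses the same callback shape as the hello-world rule. The status-class field and the add_fields name are hypothetical; the field is there because a key containing a hyphen is not a valid Lua identifier, so it can only be reached with bracket notation.

```lua
-- Sketch: dot vs. bracket notation inside a Custom Lua-style callback.
-- "status-class" is a hypothetical field (e.g. "4xx" for a 404).
local function add_fields(tag, ts, record, code)
  -- Dot and bracket notation are interchangeable for identifier-like keys.
  record.greeting = "Hello world"  -- same as record["greeting"] = "Hello world"

  -- A key containing "-" is not a valid identifier, so it needs brackets.
  if record.code then
    record["status-class"] = string.sub(record.code, 1, 1) .. "xx"
  end

  return code, ts, record
end
```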

Once you apply the rule and run the actions, your output should look something like this.

{
  "time": "01/Feb/2022:06:40:58 +0000",
  "user": "-",
  "path": "/wp-content",
  "referer": "https://thomas.com/search.jsp",
  "code": "200",
  "method": "GET",
  "size": "5000",
  "greeting": "Hello world",
  "host": "79.172.141.87",
  "agent": "Mozilla/5.0 (iPad; CPU iPad OS 9_3_5 like Mac OS X) AppleWebKit/534.0 (KHTML, like Gecko) FxiOS/11.6r9285.0 Mobile/26R322 Safari/534.0"
}

Truthfully, we could have accomplished the same effect more easily by applying the out-of-the-box “Add/Set key/value” rule, but this is a Lua tutorial. Now let’s do something more interesting than saying hello to the world. 

Use AI to generate your Lua transformation

Let’s imagine that you want to be alerted when there are errors accessing your application, and that you want to escalate immediately when a particular client is having issues. Calyptia can process the data in flight and apply your business logic so that, by the time your alerting platform receives the data, the actionable items are already clearly identified.

Let’s add another Custom Lua rule. But this time, we’ll take advantage of Calyptia’s GPT integration to generate a Lua script that is more complex than our previous “hello world” effort. 

In the comment field of the Custom Lua rule enter the following prompt and click “Generate processing rule from comment with GPT”:

If the value of code is anything other than "200" set priority equal to "P3". However, if the code is anything other than "200" and the referer contains "burke.com" set priority to "P1". Otherwise, if code equals "200" set priority to ""

Calyptia’s GPT integration should generate a Lua script resembling this (complete with commenting!):

return function(tag, ts, record)
    -- check if code is not equal to "200"
    if record['code'] ~= '200' then
        -- check if referer contains "burke.com"
        if string.match(record['referer'], 'burke.com') ~= nil then
            -- set priority to "P1"
            record['priority'] = 'P1'
        else
            -- set priority to "P3"
            record['priority'] = 'P3'
        end
    else
        -- set priority to ""
        record['priority'] = ''
    end
    return 1, ts, record
end
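Note that the generated script ends with return 1, ts, record, whereas the hello-world rule passed its code argument through unchanged. This post doesn’t spell out what the number means; assuming the Custom Lua action follows Fluent Bit’s Lua filter convention, -1 drops the record, 0 keeps it unchanged, 1 keeps the modified record and timestamp, and 2 keeps the modified record with its original timestamp. The hypothetical rule below, which is not part of this walkthrough, uses that assumed convention to drop successful requests.

```lua
-- Sketch, ASSUMING Fluent Bit's Lua filter return codes apply:
--   -1 drop record, 0 unchanged, 1 modified, 2 modified (keep timestamp).
local DROP = -1       -- discard the record entirely
local UNCHANGED = 0   -- forward the record exactly as received

-- Hypothetical rule: forward only failed requests to the alerting tool.
local function drop_successes(tag, ts, record)
  if record["code"] == "200" then
    return DROP, ts, record
  end
  return UNCHANGED, ts, record
end
```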

After we apply and run the action, we can search our output to find our transformations.

Tip: You can search both the input and the output in the processing rules playground by clicking into the area and then pressing Ctrl-F / Command-F. You can also use regex in your search.

A log with a code value of 200 should look like this, with the value of priority being empty. 

{
  "size": "5000",
  "host": "79.172.141.87",
  "priority": "",
  "user": "-",
  "path": "/wp-content",
  "code": "200",
  "time": "01/Feb/2022:06:40:58 +0000",
  "referer": "https://thomas.com/search.jsp",
  "greeting": "Hello world",
  "agent": "Mozilla/5.0 (iPad; CPU iPad OS 9_3_5 like Mac OS X) AppleWebKit/534.0 (KHTML, like Gecko) FxiOS/11.6r9285.0 Mobile/26R322 Safari/534.0",
  "method": "GET"
}

When we find a log with a 404 error it should look like this, with the value of priority being P3.

{
  "size": "4943",
  "host": "180.54.101.99",
  "priority": "P3",
  "user": "-",
  "path": "/wp-content",
  "code": "404",
  "time": "01/Feb/2022:06:42:31 +0000",
  "referer": "http://www.butler.org/posts/blog/register/",
  "greeting": "Hello world",
  "agent": "Mozilla/5.0 (iPod; U; CPU iPhone OS 4_1 like Mac OS X; ckb-IQ) AppleWebKit/534.20.3 (KHTML, like Gecko) Version/4.0.5 Mobile/8B112 Safari/6534.20.3",
  "method": "PUT"
}

But the same error with a referer containing burke.com warrants a P1 priority.

{
  "size": "4917",
  "host": "154.24.70.230",
  "priority": "P1",
  "user": "-",
  "path": "/posts/posts/explore",
  "code": "404",
  "time": "01/Feb/2022:17:01:24 +0000",
  "referer": "http://www.burke.com/privacy/",
  "greeting": "Hello world",
  "agent": "Mozilla/5.0 (X11; Linux i686; rv:1.9.5.20) Gecko/2015-01-04 18:00:51 Firefox/14.0",
  "method": "GET"
}
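One caution about the generated script: in a Lua pattern, "." matches any character, so string.match(record['referer'], 'burke.com') would also match a referer containing burkeXcom, and it raises an error if a record has no referer field at all. A slightly more defensive sketch of the same rule might look like this. It keeps the return value of 1 from the generated script, and set_priority is a hypothetical local name used so the function can be tested outside Calyptia.

```lua
-- Sketch: a more defensive version of the generated priority rule.
local function set_priority(tag, ts, record)
  -- tostring() also covers the case where code arrives as a number;
  -- missing fields become empty strings instead of raising an error.
  local code = tostring(record["code"] or "")
  local referer = record["referer"] or ""

  if code == "200" then
    record["priority"] = ""
  -- plain = true (4th argument) makes "." a literal dot, not a wildcard.
  elseif string.find(referer, "burke.com", 1, true) then
    record["priority"] = "P1"
  else
    record["priority"] = "P3"
  end

  return 1, ts, record
end
```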

Next steps

In just a few minutes we have used Calyptia to transform raw Apache2 logs into structured JSON and applied additional bespoke transformations that reflect specific business requirements. Although we didn’t cover building a pipeline in this blog, it would only take a few more minutes to configure Calyptia to send our transformed data to any number of destinations. 

To discover how Calyptia could transform and simplify your observability practice, request a customized demo that addresses your specific use cases. 

Or sign up for a free trial and use the processing rules playground as we did today, but using your own sample data. 


