Kubernetes log processing: 4 challenges and how to solve them

CAST AI

Application logs are a great help in understanding what’s happening inside the application. They come in handy for debugging and monitoring cluster activity. But what are the most common challenges in Kubernetes log processing, and how can developers deal with them? 

In this article, we explore the problem of log collection and processing in Kubernetes by asking a few important questions. We also show how a managed service can help developers take care of logging without breaking a sweat (and devote their precious time to coding instead).

1. How to meet logging Service Level Agreements (SLAs)?

A log agent typically runs as a DaemonSet, so you will find only one agent pod per Kubernetes worker node. If that pod ends up being rescheduled, log collection for every other pod on that worker node is affected. And this presents developers with a new challenge.

Since every node can run up to 110 pods by default, developers need to find a way to make sure a single log agent can keep up with the logs from all of those pods.
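As a concrete illustration, a per-node log agent is usually deployed as a DaemonSet. Here's a minimal sketch, assuming Fluent Bit as the agent; the namespace, image tag, and mount paths are illustrative, not a specific vendor setup:

```yaml
# Minimal DaemonSet sketch: one log agent pod per worker node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      tolerations:
        - operator: Exists   # run on every node, including tainted ones
      containers:
        - name: agent
          image: fluent/fluent-bit:2.2
          volumeMounts:
            - name: varlog
              mountPath: /var/log   # read container logs from the host
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```

Because it's a DaemonSet, the scheduler guarantees exactly one agent per node – which is also why that one pod becomes a single point of failure for the node's log collection.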

This often creates a noisy environment that developers might find hard to navigate. Moreover, a single error might create more errors in a given worker node. 

One way to deal with that would be to disable logging for the namespace that is proving problematic. However, while it's easy to stop a workload from emitting logs, selectively skipping just part of log collection isn't always an option.
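Some log agents do let you opt individual workloads out at the collection layer. For example, Fluent Bit's Kubernetes filter can honor a per-pod annotation (assuming the agent's `K8S-Logging.Exclude` option is enabled); the pod and namespace below are hypothetical:

```yaml
# Hypothetical pod opting out of log collection via an agent-specific
# annotation. Fluent Bit skips pods annotated this way when its
# kubernetes filter runs with K8S-Logging.Exclude On.
apiVersion: v1
kind: Pod
metadata:
  name: chatty-app
  namespace: noisy-namespace
  annotations:
    fluentbit.io/exclude: "true"
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "while true; do echo noise; sleep 1; done"]
```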

A slow disk can also add significant latency to log transport. And if you fail to solve back-pressure issues, you might end up effectively DDoSing your own log agent.

That's why it's so important to have a fast disk. Developers should also take the time to solve back-pressure problems properly, before they grow into bigger logging issues that take even longer to fix.

2. What’s the deal with namespace logging?

When all of your workloads run on shared worker VMs, workloads that belong to different projects are separated by namespaces. Different projects might have unique logging preferences – for example, where the logs go or which tools are used to manage them – so developers need to find a way to configure logging per namespace without affecting overall security.

How can you achieve that? For starters, you can define the per-namespace logging configuration as a Kubernetes CRD (Custom Resource Definition) and manage it with kubectl. Then apply role-based access control (RBAC) to this particular resource. 

This is how you keep your security measures in place. The process is quite similar to the “sink resource” concept in Pivotal Container Service (PKS). 
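Here's a sketch of what such a namespaced custom resource plus RBAC could look like. The `LogSink` kind and its API group are hypothetical, invented purely for illustration; the Role is standard Kubernetes RBAC:

```yaml
# Hypothetical custom resource: a team declares where its namespace's
# logs should go. "LogSink" and its fields are illustrative, not a
# standard Kubernetes API.
apiVersion: logging.example.com/v1
kind: LogSink
metadata:
  name: team-a-sink
  namespace: team-a
spec:
  output: elasticsearch
  endpoint: https://logs.example.com:9200
---
# RBAC keeps teams from touching each other's sinks: this Role only
# grants access to LogSink objects inside the team-a namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: logsink-editor
  namespace: team-a
rules:
  - apiGroups: ["logging.example.com"]
    resources: ["logsinks"]
    verbs: ["get", "list", "create", "update", "delete"]
```

Because the Role is namespaced, binding it to a team's group gives that team control over only its own logging configuration – security measures stay intact.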

3. How to collect all the critical logs? 

If something goes wrong in your application, pods might be deleted and recreated quickly. What happens to the log file? Most likely, it follows the fate of the pod.

If you fail to collect important logs when something goes wrong, your chances of solving the problem fast will be much lower. 

The Kubernetes community may eventually address this challenge, but there’s something you can do right now to avoid the issue – use a log agent. Such agents can scan for new folders or log file patterns at regular intervals to capture data even from your short-lived pods. You can even set the scan interval as low as one second to catch logs before a pod disappears.
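With Fluent Bit's tail input, for instance, the rescan frequency is controlled by `refresh_interval`. A sketch in Fluent Bit's YAML config format; the path is the conventional container-log location on the node:

```yaml
# Fluent Bit tail input: rescan the path for new log files every second,
# so logs from short-lived pods are picked up before they vanish.
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      refresh_interval: 1
```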

4. How to handle layered logging?

When using Kubernetes, developers create clusters that consist of multiple layers that require monitoring. And each of them produces different types of logs. That’s why it’s so important that you understand exactly what logs are generated – and then how to use them. 

In Kubernetes, you will find logs of many different kinds – application logs, node-level logs, and control plane logs, among others. These come with different characteristics and priorities, and the differences show up not only from layer to layer: you might also find different SLAs within the same layer.

You can only imagine what happens when they’re all logged together. Developers have to deal with so many layers in the Kubernetes container system that it all becomes really hard to handle.
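One way to tame this is to route each layer to its own destination. A sketch, again assuming Fluent Bit's YAML config; the tags, host, and bucket name are illustrative:

```yaml
# Route application logs and node-level logs to different backends,
# so each layer can get its own retention and SLA.
pipeline:
  inputs:
    - name: tail            # application (container) logs
      path: /var/log/containers/*.log
      tag: app.*
    - name: systemd         # node-level logs, e.g. the kubelet
      systemd_filter: _SYSTEMD_UNIT=kubelet.service
      tag: node.kubelet
  outputs:
    - name: es              # app logs: searched often, short retention
      match: app.*
      host: logs.example.com
    - name: s3              # node logs: archived cheaply, kept longer
      match: node.*
      bucket: cluster-node-logs
```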

How do you get out of this logging complexity?

Sure, you’d rather spend time on writing code than trying to implement these logging improvements. But hear us out. If you don’t have end-to-end visibility, how can you troubleshoot issues efficiently?

The fastest way to do logging correctly is to use a managed Kubernetes service that sets it up for you. It’s also worth seeing if parts of this setup can be adjusted to your needs.

As an example, when you build your first cluster with CAST AI, we automatically install tools to handle logging: Prometheus for metrics collection, and Elasticsearch with Filebeat for log collection and indexing. We also provide single sign-on to observability frontends such as Kibana for logs and Grafana for metrics. Moreover, we expose Hubble, based on Cilium, for network and service visibility.

The best part is – if you prefer another tool for one of those functions, you can remove and substitute the components we installed. Our primary goal is to make your life in DevOps easier.

This sort of freedom to choose is something we cherish at CAST AI. Not only can you easily replace components in your cluster – your cluster can also span multiple clouds, so you’re never locked in with a single provider either.

Explore a sensible way to handle the complexity of logging in Kubernetes. Sign up for CAST AI here.

P.S. We’re offering free cloud credits to test CAST AI for a limited time. Just ask our support team after you’ve signed up.
