DevOps Copilot Is a Helpful Assistant
Adapting a Large Language Model to the tasks in your project's infrastructure turns it into a useful assistant: the DevOps Copilot. Let's outline its concept and implementation.

Navigating requests

An IT service provider's daily work consists of supporting customers with their digitization projects. Typically, implementation requires a cross-functional team with heterogeneous skills, including software architects, developers, automation experts, and DevOps engineers. In recent years, DevOps skills have become increasingly relevant as many projects are built according to principles such as Shift Left, Automate Everything, and "You build it, you run it". The latter places responsibility for software operation in the hands of the project team.

As a result of this new responsibility, developers are facing more and more challenges in infrastructure configuration, monitoring, and tracing. The increasing use of cloud-native resources in particular leads to a greater variety of infrastructure services and configuration options. DevOps engineers are increasingly taking on the role of an internal support center. They get consulted for all matters relating to a project's infrastructure and are flooded with “can you just do this?” tasks.

Given the specialization of project roles, this developer behavior is understandable. The consequence, however, is that DevOps engineers have less time for their actual automation work. Automation experts have tried to relieve themselves by digitizing their own support activities, but it seemed impossible to counter the enormous variance of requests with suitable tools. This changed with the advent of LLMs and multi-agent systems.

Conceptual ideas

Thanks to their creativity, Large Language Models (LLMs) offer potential for process digitalization that can hardly be estimated yet. This especially applies to processes whose path is unknown in advance, or where there are too many paths to implement explicitly. These new possibilities gave rise to the idea of providing an LLM with functions that correspond to a DevOps engineer's actual actions. If these functions are described semantically, the LLM should be able to independently sequence them and solve a specific problem.

The size of the solution space is crucial for an LLM's usability. Therefore, you shouldn't implement a fixed list of "usual" activities. Instead, you should create basic building blocks that can be combined into complex processes through creative composition and choreography. The approach is similar to the difference between a traditional purpose-built toy and building blocks that let a child create almost anything out of small parts.

The first basic building blocks should allow the LLM to read and modify infrastructure configurations. In most projects, versioning and automation tools like GitLab, GitHub, and TeamCity are used to manage Infrastructure as Code (IaC) and to provision the corresponding resources as part of an automated pipeline. Accordingly, these tools' APIs should be used to find, read, and modify specific resources in the code. Based on these initial functions, further automation systems from the ITSM/ESM environment can be connected to solve more complex tasks.
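
The article doesn't include integration code; as an illustration, here is a minimal sketch of such a building block using the python-gitlab client. The server URL, token, project path, file path, and line index are placeholder assumptions:

```python
import gitlab

# Connect to the GitLab instance (URL and token are placeholders).
gl = gitlab.Gitlab("https://gitlab.example.com", private_token="glpat-...")
project = gl.projects.get("team/infrastructure")

# Read an IaC file from the default branch.
f = project.files.get(file_path="services/foobar/deployment.yaml", ref="main")
lines = f.decode().decode("utf-8").splitlines()

# Replace a specific line (index is illustrative) and commit the change back.
lines[41] = "          memory: 512Mi"
f.content = "\n".join(lines) + "\n"
f.save(branch="main", commit_message="Increase memory allocation for foobar")
```

A pipeline triggered by the commit would then provision the change, as described above.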

Another necessary basic building block is consulting the internal documentation and guidelines for using the infrastructure. For this, a wiki such as Confluence containing the project-specific information can be connected. The implementation of the DevOps Copilot was based on this concept.

Implementation with Semantic Kernel

The implementation of the DevOps Copilot is based on Microsoft's Semantic Kernel framework (Fig. 1). Semantic Kernel enables the definition and implementation of skills that are made available to the LLM. In combination with existing knowledge sources ("memory"), Semantic Kernel can create a plan to solve a given task. The user can check, adjust, and execute the plan (see also Fig. 3). During execution, the individual steps are orchestrated automatically by topologically sorting them according to their required inputs.
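
The article doesn't name the SDK language; here is a rough sketch following the 1.x Python API of Semantic Kernel, in which a chat service and a hypothetical `IacPlugin` (sketched after Fig. 2 below) are registered before a stepwise planner composes and runs a plan. The service id, plugin name, and task text are assumptions, and planner APIs vary between SDK versions:

```python
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.planners import FunctionCallingStepwisePlanner

from iac_plugin import IacPlugin  # hypothetical module, see sketch after Fig. 2

async def main() -> None:
    kernel = Kernel()
    # The chat service reads endpoint, key, and deployment from environment variables.
    kernel.add_service(AzureChatCompletion(service_id="chat"))

    # Register the generic DevOps skills with the kernel.
    kernel.add_plugin(IacPlugin(), plugin_name="iac")

    # The planner composes the registered skills into a plan and executes it.
    planner = FunctionCallingStepwisePlanner(service_id="chat")
    result = await planner.invoke(kernel, "Set the memory of the foobar service to 512 MB")
    print(result.final_answer)

asyncio.run(main())
```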

Fig. 1: Skill orchestration (cf. Microsoft Semantic Kernel)

For example, a skill that changes an IaC configuration requires the file path, the line number, and the content to be inserted as inputs. If these inputs aren't directly apparent from the user's task description, additional skills must be placed earlier in the plan to derive them.

Fig. 2: Skill interface definition for Replace Line

Each skill consists of an imperative function with parameters, a return value, and a semantic description. Using natural language, the skill interface explains what the function can be used for, what inputs it needs, and what results can be expected. Figure 2 shows the skill interface for the Replace Line skill, which takes the path to the file, the line number, and the content to be inserted.
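
Figure 2 itself isn't reproduced here; as an approximation, a skill interface of this shape could be declared in the Python SDK with the `@kernel_function` decorator, where `Annotated` carries the natural-language descriptions the LLM sees. The class and function names are assumptions:

```python
from typing import Annotated

from semantic_kernel.functions import kernel_function

class IacPlugin:
    """Generic skills for reading and modifying IaC configurations (names illustrative)."""

    @kernel_function(
        name="replace_line",
        description="Replaces a line in an IaC file and commits the change.",
    )
    def replace_line(
        self,
        path: Annotated[str, "Path to the file inside the repository"],
        line_number: Annotated[int, "1-based number of the line to replace"],
        content: Annotated[str, "New content to insert into that line"],
    ) -> Annotated[str, "Id of the commit containing the change"]:
        # A real implementation would call the Git hosting API,
        # e.g. via python-gitlab as sketched earlier.
        raise NotImplementedError
```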

A skill is implemented as a serverless HTTP endpoint, in the same way as an Azure Function or AWS Lambda function. Inputs are passed as query parameters and used to execute the function. The skill also receives contextual information such as the current user. Additional skills can be described and developed in the same way.
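
A minimal sketch of such an endpoint with the Azure Functions Python v2 programming model; the route, parameter, and header names are assumptions:

```python
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="replace-line", methods=["POST"])
def replace_line(req: func.HttpRequest) -> func.HttpResponse:
    # Inputs are passed as query parameters, as described above.
    path = req.params.get("path")
    line_number = req.params.get("lineNumber")
    content = req.params.get("content")
    # Contextual information such as the calling user could travel in a header.
    user = req.headers.get("x-copilot-user", "unknown")

    if not all((path, line_number, content)):
        return func.HttpResponse("Missing query parameter", status_code=400)

    # ... apply the change via the Git hosting API and return the commit id ...
    return func.HttpResponse(f"Replaced line {line_number} of {path} on behalf of {user}")
```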

To implement the first use case, the DevOps Copilot was to be given the ability to make configuration changes in the IaC. Instead of a single skill for each conceivable request, generic skills were developed that correspond to a DevOps engineer's granular actions. Only three skills are needed to handle the IaC and an existing roll-out automation:

- Find File to determine the relevant file for the request
- Find Line to determine the relevant line within a file
- Replace Line to change the content of the relevant line and make the subsequent commit

With the help of these three skills, you can already make the necessary changes to the IaC configuration. For example, you can change an application server’s allocated memory.
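
Written out by hand, the plan the LLM composes for such a request simply chains the three skills, with each output feeding the next input. The helper functions below stand in for the skills and are hypothetical:

```python
def change_memory(service: str, memory: str) -> str:
    # Find File: locate the IaC file relevant to the request.
    path = find_file(query=f"deployment configuration of the {service} service")
    # Find Line: locate the line that configures the memory allocation.
    line_number = find_line(path=path, query="memory allocation")
    # Replace Line: change the line and commit the result.
    return replace_line(path=path, line_number=line_number,
                        content=f"          memory: {memory}")

commit_id = change_memory("foobar", "512Mi")
```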

During the first tests, however, we saw that the skills initially selected the wrong files, and it wasn't clear which criteria could be used to find the line responsible for the memory setting. It became clear that the LLM lacked the specific domain knowledge to take the project's conventions into account. Typically, naming and structuring conventions determine how files and configuration blocks are created. The LLM needs this context to answer a user query correctly.

In the context of Semantic Kernel, this knowledge can be stored in the memory and automatically added to each query as a meta-prompt. The memory can be realized with different technologies and interfaces; for our use case, we used the Chroma vector DB, which makes the project wiki's content easily accessible.
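
Exactly how the wiki content reaches the memory depends on the Semantic Kernel version and connector; as a rough sketch with the plain chromadb client, where the collection name and sample document are placeholders:

```python
import chromadb

client = chromadb.PersistentClient(path="./copilot-memory")
wiki = client.get_or_create_collection("project_wiki")

# Ingest wiki pages once; Chroma embeds them with its default embedding function.
wiki.add(
    ids=["naming-conventions"],
    documents=["Deployment files live under services/<name>/deployment.yaml; "
               "the memory allocation is configured in the resources block."],
)

# At query time, retrieve matching conventions and prepend them as a meta-prompt.
hits = wiki.query(query_texts=["Where is a service's memory configured?"], n_results=1)
meta_prompt = "\n".join(hits["documents"][0])
```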

Fig. 3: Example plan for changing the memory allocation of the foobar service

Figure 3 shows a plan displayed in the DevOps Copilot interface. The Copilot shows the individual steps, the skills used, and the data flow. The user can check the plan and make adjustments if data from the request has been mapped incorrectly. Once the plan is executed, the individual skills are called and change the infrastructure configuration so that the foobar service is allocated 512 MB of memory.
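
Figure 3 itself isn't reproduced here; conceptually, the displayed plan corresponds to a sequence of steps whose outputs feed later inputs. The structure and field names below are purely illustrative, not an actual Semantic Kernel data structure:

```python
# Illustrative rendering of the plan from Fig. 3: "$path" and "$line" mark
# outputs of earlier steps that flow into the inputs of later ones.
plan = [
    {"skill": "find_file", "inputs": {"query": "foobar deployment"}, "output": "$path"},
    {"skill": "find_line", "inputs": {"path": "$path", "query": "memory"}, "output": "$line"},
    {"skill": "replace_line",
     "inputs": {"path": "$path", "line_number": "$line", "content": "memory: 512Mi"}},
]
```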

Conclusion and outlook

The ability to translate a user's natural language queries into concrete, executable plans offers enormous potential for process automation. Using generic skills increases their combinability: a copilot can solve tasks that weren't even considered during its development, which makes it well prepared for future requests.

In the course of developing the DevOps Copilot, we found that combining it with domain and project knowledge in the form of wikis, coding guidelines, and best practices leads to reproducibly reliable behavior.

Integrating authorization techniques, so that it's possible to control in fine detail which skills each user may execute, is still unfinished. So far, the Semantic Kernel framework implements this control with Azure AD/Entra ID, which is only used in some of our projects. We're working on a solution that integrates different identity and access management (IAM) systems so that the DevOps Copilot remains technology-agnostic.

In the future, we plan to integrate further systems in the form of skills to achieve an even higher degree of automation. From our point of view, the biggest advantage is that all project members, regardless of experience and area of responsibility, can be involved in infrastructure activities.

