Simplifying Content Management using local AI with LLama and Optimizely 12

In today’s fast-paced digital content landscape, the efficiency of content editors matters. One of their most time-consuming tasks is filling in metadata for web pages. But what if we told you there’s a solution that can automate this process?

In this blog post, we’ll explore how LLama, a language model, can help content editors by automatically generating metadata fields in Optimizely CMS 12.

Creating Content Models:

Let’s begin by setting up the foundation. We’ll create a Content Page definition using Optimizely CMS 12, implementing the IContentPage interface. This interface will define the key metadata fields we want LLama to automatically populate.

public interface IContentPage
{
    public XhtmlString? Content { get; set; }
    public string? MetaTitle { get; set; }
    public IList<string>? MetaKeywords { get; set; }
    public string? MetaDescription { get; set; }
}

[ContentType(
    DisplayName = "Content Page",
    Description = "Displays a Content Page",
    GUID = "19671657-B684-4D95-A61F-8DD4FE60D559",
    GroupName = "Content")]
public class ContentPage : PageData, IContentPage
{
    // ... Properties and attributes
}

Hooking into the Publish Event:

Now that our modeling is complete, let’s connect to the publishing life cycle of the content. This can be achieved in a few steps. First, create an event listener, hook into the publishing event, and add the event listener to the IoC container. Ensure that you hook into the right event, as the published event will be too late in the publish life cycle. Finally, create an initialization module to add the new event listener.

public class ContentEventListener
{
    private readonly IContentEvents _contentEvents;
    private readonly ILLamaService _lLamaService;

    public ContentEventListener(
        IContentEvents contentEvents,
        ILLamaService lLamaService)
    {
        _contentEvents = contentEvents;
        _lLamaService = lLamaService;
    }

    public void AddEvents()
    {
        _contentEvents.PublishingContent += _contentEvents_PublishingContent;
    }

    private void _contentEvents_PublishingContent(object? sender, ContentEventArgs e)
    {
        if (e.Content is not IContentPage contentPage) return;

        // Get content without HTML (Be aware of the LM size limit) 
        var pageContent = contentPage.Content?.GetWithoutMarkup() ?? string.Empty;
        pageContent = pageContent.Length > 500 ? pageContent.Substring(0, 500) : pageContent;

        // Ask for a Meta title/ description and keywords based on the content
        var metaInformation = _lLamaService.GetMetaInformation(pageContent);

        // Fill in the data
        contentPage.MetaDescription = metaInformation.MetaDescription;
        contentPage.MetaTitle = metaInformation.MetaTitle;
        contentPage.MetaKeywords = metaInformation.MetaKeywords;
    }
}

[InitializableModule]
[ModuleDependency(typeof(EPiServer.Initialization.CmsCoreInitialization))]
public class ContentEventsSubscriber : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        context.Locate.Advanced.GetInstance<ContentEventListener>().AddEvents();
    }

    public void Uninitialize(InitializationEngine context)
    {

    }
}
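
As mentioned above, the listener also needs to be registered in the IoC container so the initialization module can resolve it. A minimal sketch, assuming the standard Startup class of an Optimizely CMS 12 (ASP.NET Core) site:

public void ConfigureServices(IServiceCollection services)
{
    // Register the LLama service and the event listener so
    // context.Locate.Advanced.GetInstance<ContentEventListener>() can resolve them.
    services.AddSingleton<ILLamaService, LLamaService>();
    services.AddSingleton<ContentEventListener>();

    // ... other CMS registrations (AddCms, etc.)
}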

Generating the Meta Information:

With the event handling in place, the next challenge is to instruct LLama to generate metadata based on the content provided by the content editor. In this section, we dive into the process of communicating with LLama, specifying the instructions and formatting for the metadata output.

Our starting point is the result of our previous blog: Our first steps into local AI.

public interface ILLamaService
{
    MetaInformation GetMetaInformation(string content);
}
public class LLamaService : ILLamaService
{
    private readonly IWebHostEnvironment _env;
    private LLamaWeights? _model;
    private ModelParams? _parameters;

    public LLamaService(IWebHostEnvironment env)
    {
        _env = env;

        Initialize();
    }

    public MetaInformation GetMetaInformation(string content)
    {
        if (_model == null || _parameters == null)
        {
            // Could not be initialized. Return here
            return new MetaInformation();
        }

        // Initialize a chat session
        using var context = _model.CreateContext(_parameters);

        var ex = new InstructExecutor(context);
        var session = new ChatSession(ex);

        var prompt = "Hi Alice, how are you doing?";

        var inter = new InferenceParams
        {
            Temperature = 0.1f
        };

        foreach (var text in session.Chat(prompt, inter))
        {
            Console.Write(text);
            Debug.Write(text);
        }

        return new MetaInformation();
    }

    private void Initialize()
    {
        var appDataFolder = Path.Combine(_env.ContentRootPath, "App_Data");
        var modelPath = Directory.GetFiles(appDataFolder).FirstOrDefault(file => file.EndsWith(".gguf"));

        if (string.IsNullOrEmpty(modelPath))
        {
            return;
        }

        _parameters = new ModelParams(modelPath)
        {
            ContextSize = 1024,
            Seed = 1337,
            GpuLayerCount = 5,

        };

        _model = LLamaWeights.LoadFromFile(_parameters);
    }
}
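
The MetaInformation class returned by the service isn’t shown here; a minimal sketch of what it could look like, with property names matching the metadata fields we fill in on the page:

public class MetaInformation
{
    // Simple DTO for the generated metadata; the property names line up
    // with the JSON object we will ask the model to produce.
    public string MetaTitle { get; set; } = string.Empty;
    public string MetaDescription { get; set; } = string.Empty;
    public IList<string> MetaKeywords { get; set; } = new List<string>();
}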

We want the language model to generate meta information based on the text provided by the content editor. We already receive content in the GetMetaInformation method, although it’s not used at the moment. It should be part of the prompt variable later on. Let’s explore what happens when we ask the model to summarize the text and create a Meta Title, description, and keywords.
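
Since session.Chat streams the response in chunks, we’ll eventually want to collect it into a single string instead of only writing it to the console, and interpolate the editor’s content into the prompt. A sketch of how the body of GetMetaInformation could evolve, reusing the session, inter, and content variables from the code above:

// Interpolate the page text into the prompt instead of the fixed greeting.
var prompt = $"[INST] Summarize the following text: [{content}] [/INST]";

// Accumulate the streamed chunks into one response string for later parsing.
var response = new StringBuilder();
foreach (var text in session.Chat(prompt, inter))
{
    response.Append(text);
}
var fullResponse = response.ToString();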

Note how we set InferenceParams.Temperature to 0.1f. A higher temperature makes the model more “creative”, while a lower temperature instructs it to be less so. We do not want the model to make things up when summarizing the texts.

Instructing the Language Model:

Before we proceed, let’s discuss the line var ex = new InstructExecutor(context). The choice of executor can impact the result, and LLama comes with three default executors: InstructExecutor, InteractiveExecutor, and StatelessExecutor.

The InteractiveExecutor performs well for chatting, while the InstructExecutor is suited for giving instructions. If your input is quite simple and a one-time job, the stateless mode is a good choice. We’ll use the InstructExecutor to provide specific instructions to the language model.

We’ll start by making sure the model understands its assignment. We’ll ask it if it’s clear on what needs to be done. The following prompt is used:

[INST]
<<SYS>>
You are a Website SEO specialist and can make summaries of text I send to you
<</SYS>>
What is your task? [/INST]

And resulting in the output:

As a Website SEO specialist, my task is to analyze and summarize the text you provide me with. 
I can help you optimize your website's content for search engines by identifying the most important keywords and phrases, and creating concise and informative summaries of the text. Please provide me with the text you would like me to summarize, and I will get started!

The model responds with a clear understanding of the task.

Testing LLama’s Summarization and JSON Output:

We began by instructing LLama to summarize the content of the page, using the following prompt, where {content} is the actual content received in the GetMetaInformation method:

[INST]
<<SYS>>
You are a Website SEO specialist and can make summaries of text I send to you.
<</SYS>>
Summarize the following text and generate a Meta Title, Description, and Keywords:
Text: [{content}]
[/INST]

The results were impressive:

Meta Title: "Domesticated Cats: A Comprehensive Guide"
Description: Learn everything you need to know about domesticated cats, including their history, behavior, and role as pets. Discover the unique characteristics of this species and how they differ from their wild counterparts.
Keywords: domesticated cats, Felis catus, house cats, pet cats, companionship, hunting ability.

Next, we tested LLama’s ability to output the summarized information in JSON format, which is important for integrating it into your code. We provided a sample JSON object and included it in the prompt to make the model use its format.

var metaDummyObject = new
{
   MetaTitle = "Value",
   MetaKeywords = new[] { "Value" },
   MetaDescription = "Value",
};
var metaDummyObjectJson = JsonConvert.SerializeObject(metaDummyObject);

The serialized string ‘metaDummyObjectJson’ is inserted into the prompt text to show the model the desired JSON format.
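
In code, the prompt could be assembled with string interpolation. This is a sketch, assuming pageContent holds the stripped page text, metaDummyObjectJson the serialized dummy object from above, and the Llama 2 chat template with its [INST] and <<SYS>> markers:

// Build the prompt, embedding both the page content and the serialized
// dummy object that demonstrates the expected JSON shape.
var prompt = $@"[INST]
<<SYS>>
You are a Website SEO specialist and can make summaries of text I send to you.
<</SYS>>
Summarize the following text and generate a JSON object with Meta Title, Description, and Keywords:
Text: [{pageContent}]
JSON format:
{metaDummyObjectJson}
[/INST]";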

[INST]
<<SYS>>
You are a Website SEO specialist and can make summaries of text I send to you.
<</SYS>>
Summarize the following text and generate a JSON object with Meta Title, Description, and Keywords:
Text: [{content}]
JSON format:
{
  "MetaTitle":"Value",
  "MetaKeywords":["Value"],
  "MetaDescription":"Value"
}
[/INST]

Resulting in the output:

Sure, I can help you with that! Here's a summary of the text you provided:
Meta Title: The Domestic Cat - A Small Carnivorous Mammal
Meta Keywords: domestic cat, house cat, Felidae, companionship, hunting ability
Meta Description: Learn about the domestic cat, a small carnivorous mammal commonly kept as a pet and valued for its companionship and hunting abilities.

Here's the JSON format you requested:
{
"MetaTitle": "The Domestic Cat - A Small Carnivorous Mammal",
"MetaKeywords": ["domestic cat", "house cat", "Felidae", "companionship", "hunting ability"],
"MetaDescription": "Learn about the domestic cat, a small carnivorous mammal commonly kept as a pet and valued for its companionship and hunting abilities."
}
I hope this helps! Let me know

Great! It’s clear that the model understands the concept of JSON and the structure of our object. However, we encountered a challenge: the model includes additional text in the response, making it difficult to directly deserialize the output into a JSON object.

We introduced the “JSON only: true” parameter in the prompt and included the instruction “without the introductory sentence.” This adjustment explicitly tells the model that we require a response containing only the JSON object, without any introductory text.

We tried the following prompt:

[INST]
<<SYS>>
You are a Website SEO specialist and can make summaries of text I send to you.
<</SYS>>
Generate a JSON object of the summarized text with Meta Title, Description, and Keywords without the introductory sentence:

Text: [{content}]
JSON only: true
JSON format:
{"MetaTitle":"Value","MetaKeywords":["Value"],"MetaDescription":"Value"}
[/INST]

Resulting in the output:

Sure, here is the summarized text in JSON format:
{
"MetaTitle": "Domesticated Cat",
"MetaKeywords": ["cat", "Felis catus", "domesticated species", "house cat", "pet", "farm cat", "feral cat"],
"MetaDescription": "The domesticated cat (Felis catus) is a small carnivorous mammal that is the only domesticated species in the family Felidae. Cats are commonly kept as pets, but can also be farm or feral cats, with the feral cat ranging freely and avoiding human contact."
}

The model’s difficulty in generating clean JSON output might be related to its relatively small size. We tried several approaches for extracting the JSON data.

One approach involves manually removing all text that appears before the opening curly brace ‘{’ character. This way, you can separate and extract the JSON data from the response.
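
A sketch of that approach: take everything from the first ‘{’ to the last ‘}’ and deserialize it, assuming the MetaInformation class and Newtonsoft.Json:

private static MetaInformation? ParseMetaInformation(string response)
{
    // Strip any chatter around the JSON by cutting from the first '{'
    // to the last '}' before deserializing.
    var start = response.IndexOf('{');
    var end = response.LastIndexOf('}');
    if (start < 0 || end <= start)
    {
        return null;
    }

    var json = response.Substring(start, end - start + 1);
    return JsonConvert.DeserializeObject<MetaInformation>(json);
}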

Alternatively, we could implement a two-prompt strategy. In the first prompt, we instructed the model to create a JSON object containing the Meta Title, Description, and Keywords, explicitly specifying that no introductory text should be included. In the second prompt, we provided the actual text to be summarized. This method allowed us to guide the model more effectively, resulting in cleaner JSON output.
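
With the ChatSession kept alive between calls, the two-prompt strategy could look roughly like this sketch, where instructionPrompt holds the first prompt and content the text to summarize:

// First prompt: set up the task and the expected JSON format.
// We drain the stream but don't need the acknowledgement text.
foreach (var _ in session.Chat(instructionPrompt, inter)) { }

// Second prompt: only the text to summarize; the model now answers
// with (mostly) bare JSON.
var response = new StringBuilder();
foreach (var text in session.Chat(content, inter))
{
    response.Append(text);
}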

First prompt:

[INST]
<>You are a Website SEO specialist and can make summaries of text I send to you.
<>
Generate a JSON object of the summarized text with Meta Title, Description, and Keywords without the introductory sentence of the text I sent to you

JSON format:
{"MetaTitle":"Value","MetaKeywords":["Value"],"MetaDescription":"Value"} 
[/INST]

Output first prompt:

Of course! Please provide the text you'd like me to summarize, and I will create a JSON object with the relevant metadata.

Output second prompt:

{
"MetaTitle": "Domesticated Cat",
"MetaKeywords": ["Felis catus", "domestic cat", "house cat", "companionship", "killing"],
"MetaDescription": "The domesticated cat (Felis catus) is a small carnivorous mammal that is commonly kept as a pet and valued for its ability to kill."
}

The process went smoothly. However, we encountered another challenge: the language model uses the character ‘>’ to signal the completion of its response and indicate that it’s the user’s turn to continue. To work around this, we simply removed the trailing ‘>’ character from the response string before parsing it into a JSON object.
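
Removing that trailing marker is a one-liner before parsing, assuming fullResponse holds the accumulated model output:

// Drop the trailing '>' (and surrounding whitespace) that the model
// appends to signal the end of its turn.
var cleaned = fullResponse.TrimEnd().TrimEnd('>').TrimEnd();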

Please bear in mind that response times from the model can vary, so some patience may be required while waiting for the output to be generated.

That’s pretty much it. Happy coding and chatting!