Our first steps into local AI

AI is buzzing around us. No, it’s shouting! On every street corner there is someone ready to guide you into this future that is already reality. We use it ourselves all day: ChatGPT, Copilot, DALL-E and the like.

After being consumers of the tools mentioned above for some time now, we as developers wanted to dive deeper. How does it work? Where do you start? Can I build my own? In this blog we describe our first steps in running LLMs on our local machine. Surprisingly, this was far easier than we expected.

While we have some use cases in mind for our own Chaplin stack, we also see nice use cases for the CMS systems we work with. Therefore, in our next post we will put something together in an Optimizely 12 installation.

Where to start?

In just a couple of steps you can run language models on your local machine with C#.

Step 1: Get a license

Licenses differ per model, but one of the nicest models to start with is Meta’s Llama 2. It’s open, free for both research and commercial use, and available in all kinds of formats. You only need a license, which you can request on Meta’s website. Once you have received the license, you can skip the rest of this step. Also make sure to sign up at Hugging Face, where you can download tons of (other) models and datasets.

Step 2: Download a model

Models come in different sizes and file formats. Sizes are written as, for example, 7B, where B stands for billion (parameters). The bigger that number, the larger the model file and the more memory it eats when running on your computer. If you have an average developer laptop, go for models of 7B or smaller. Depending on which program or library you use, you also need a specific file format (see step 3). We use, for example, Llama 2 7B Chat. Download one of the recommended files.
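As a rough rule of thumb (our own back-of-the-envelope estimate, not an official figure), a quantized model's file size is approximately the parameter count times the bytes per parameter of the quantization:

```csharp
// Back-of-the-envelope memory estimate for a quantized model:
// parameters * bytes per parameter. A 4-bit quantization uses
// roughly 0.5 bytes per parameter (ignoring runtime overhead
// such as the KV cache).
long parameters = 7_000_000_000;  // a "7B" model
double bytesPerParameter = 0.5;   // ~4-bit quantization
double gigabytes = parameters * bytesPerParameter / (1024.0 * 1024 * 1024);
Console.WriteLine($"~{gigabytes:F1} GB"); // prints "~3.3 GB"
```

So a 4-bit quantized 7B model weighs in at roughly 3.5 GB on disk, which is why it still fits comfortably on an average developer laptop, while a 70B model at the same quantization would need about ten times that.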

Step 3: Running a model

There are several ways to run a model. Two worth noticing are llama.cpp and, for .NET developers, LLamaSharp, a C# binding for llama.cpp. Be aware that different versions of these programs and libraries expect different model file formats: LLamaSharp v0.5.1, for example, uses the GGUF format (see step 2).
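To use LLamaSharp from a console app you need the main package plus a backend package; assuming the CPU backend, the setup could look like this (project name is just an example):

```shell
dotnet new console -n LocalLlamaDemo
cd LocalLlamaDemo
dotnet add package LLamaSharp --version 0.5.1
dotnet add package LLamaSharp.Backend.Cpu --version 0.5.1
```

If you have a supported GPU, there are alternative backend packages (e.g. for CUDA) that can take over some or all of the layers.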

We use LLamaSharp. To do so, create a console app in your preferred IDE and put the following code in it (this example is based on the example in the LLamaSharp repository).

using LLama;
using LLama.Common;

string modelPath = @"c:\models\your-model.gguf"; // path to the GGUF file you downloaded in step 2

// Load a model
var parameters = new ModelParams(modelPath)
{
    ContextSize = 1024,
    Seed = 1337,
    GpuLayerCount = 5
};

using var model = LLamaWeights.LoadFromFile(parameters);

// Initialize a chat session
using var context = model.CreateContext(parameters);
var ex = new InteractiveExecutor(context);
var session = new ChatSession(ex);

// show the prompt
Console.WriteLine();
Console.Write("What's your question? ");
var prompt = Console.ReadLine();

// chat with llm until 'stop'
while (prompt != "stop")
{
    foreach (var text in session.Chat(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
    {
        Console.Write(text);
    }
    prompt = Console.ReadLine();
}

// save the session
session.SaveSession("SavedSessionPath");

That’s pretty much it. Happy coding and chatting!

Blogs and URLs

If you’d like to dive deeper, here are some nice URLs to start with: