Unleash the Power of Local Open Source LLM Hosting

Yattish Ramhorry
9 min read · Oct 20, 2023
Photo by Jeremy Bishop on Unsplash

Recently, I ventured out and experimented with hosting an open-source language model on my Windows laptop. Why would I do this, you might ask, when options like GPT-4, ChatGPT, PaLM, and others already exist?

Well, there are a few reasons why I feel it is a good idea to host a small LLM locally.

Firstly, as a hard-core techie, it's a small project that is fun to explore and a good way to understand how an LLM works. By hosting an open source LLM locally, you can “look under the hood” of a large language model and tinker with the settings, the source code, and the different infrastructure options available to you.

If you wanted to, you could also spin up a cloud server to host your language model in the cloud. But, if you’re only starting out and experimenting, you can get away with running a smaller model on your Windows / Mac.

Secondly, with a locally running language model, you have the option of fine-tuning the model on your own data.

If you’re part of a small enterprise, fine-tuning a proprietary model can become a costly affair.

If you have sensitive data that you do not want exposed to third parties, running an open-source, privately hosted language model is a significant benefit: your private data stays on your own infrastructure, which greatly reduces the risk of data leaks.

Furthermore, fluctuating vendor pricing makes it difficult to know for certain what your usage costs will be at the end of each billing cycle. However, you could set up hard and soft usage limits to avoid running into excessive costs on a proprietary model.

With proprietary language models, you are locked into a specific vendor, and there are limited options for flexibility. Thankfully, there are open source libraries available to enable a smooth transition from a proprietary model to an open source one.

Open source language models are steadily producing better responses, as the open source community is making huge strides in improving the quality of their outputs. In this respect, however, proprietary models still have the upper hand: they are generally more accurate and are improving daily. But there may come a time when open source models catch up with (and perhaps even surpass) proprietary models.

With that being said, let's dive into the nitty-gritty of setting up and running your very own language model hosted on your personal laptop or PC. The steps for setting up an LLM locally are straightforward, but they do require some technical know-how to set up and configure.

Setting up WebUI for Text Generation

Currently, there are several open source language models that you could experiment with. A few open source models worth mentioning include Orca, Mistral, and Falcon.

You can look up the technical specs for each of these open source models on Huggingface to learn more.

The first step is to set up the WebUI interface. This interface allows you to interact with, configure, and fine-tune the language model according to your needs. You can find the Github repo for the WebUI at https://github.com/oobabooga/text-generation-webui.

WebUI interface

The Github repository offers comprehensive installation instructions to get you up and running with the WebUI interface in no time. There are ‘one-click’ options available to download and extract ZIP files to your local drive.

There is also an option to run a Docker version of the WebUI interface if you prefer installing it that way. I personally haven’t tried the Docker route, but I might give it a go at a later stage.

I opted to clone the Github repo instead and set up the UI from there. If you’re going to clone the Github repo, I recommend that you do it in a Conda environment.

I am doing the setup on my Windows laptop and have Anaconda installed, so the Conda terminal comes with it. I would not recommend using a regular Windows terminal. Open up a Conda terminal, as shown in the image below.

Anaconda terminal

From your Conda terminal, enter the following command:

conda create -n tui python=3.10.9

Here, we are specifying the name of the Python environment, “tui,” and the Python version. In this case, we are using Python version 3.10.9.

When you are prompted to Proceed, type in “y” or hit Enter, since “y” is the default option.

When the setup completes successfully, you will see the following screen:

Activating the Conda environment.

Type in ‘conda activate tui’ to activate the Conda environment. Your command prompt will now show “(tui)” to indicate that you are in your new Python environment. The next step is to install the Python packages that WebUI requires. At your terminal, enter the following line:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
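
My laptop does not have an NVIDIA GPU, so the cu117 index above is not strictly necessary in my case. As a minimal sketch, assuming you only plan to run models on the CPU, the default wheels should be enough (check pytorch.org for the exact command for your platform):

pip3 install torch torchvision torchaudio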

The next step is to clone the WebUI Github repository.

At your Conda terminal, type in:

git clone https://github.com/oobabooga/text-generation-webui.git

After cloning the Git repo, you’ll need to change into the project folder and install all the Python modules it requires. At your terminal, enter:

cd text-generation-webui
pip install -r requirements.txt

Since you are using a dedicated Conda Python environment, you are far less likely to run into dependency conflicts while doing a `pip install`.

If you are using an NVIDIA GPU, there is an additional step required, and the command differs between Linux and Windows; the repository’s installation instructions list the exact command to run for each platform.

If you are only working on a CPU and don’t want to run these models on a GPU, you can skip this step.

My laptop does not have a GPU, so I skipped it. Now that you’ve installed all the dependencies, the only thing left is to start up your server, which you can do by typing the following command at your terminal:

python server.py

Copy the URL shown in the terminal (typically http://127.0.0.1:7860) and paste it into your browser to open the WebUI interface, which will look like the image below:

Once you have the WebUI interface running, you will need to hook it up to an open source language model. There are three different ways in which you can do this.

The first method is to download the Huggingface model directly from the WebUI interface.

As I mentioned above, there are several open source models for you to choose from. For the purposes of my experiment, I initially chose to use the Mistral-7b model, which wasn't the best option since I later discovered that WebUI does not support the Mistral models yet.

You are free to experiment with any open source model; with so many available, you are not limited to any specific one. However, as a first step, I recommend starting with orca_mini_13b.

Orca Mini 13b LLM model.

Copy the link to the Huggingface model. Then go to the Models tab in WebUI, paste in the link, and click the Download button when you are ready to begin downloading the model.

The second method for downloading the Huggingface model is to download it from the Conda terminal using the following command:

python download-model.py pankajmathur/orca_mini_13b

In the command above, I have passed the Huggingface path of the Orca mini 13b model to the download script.

The final method of downloading a Huggingface model is to do it all manually, which is not ideal, as it can lead to unnecessary issues and frustration. If you want to follow this method, open the link to the Huggingface model in your web browser and navigate to the “Files and versions” tab.

Then, download each of the files individually and copy them into the models folder in your text-generation-webui project.
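
If you do go the manual route, a sketch of an alternative I have not tried myself is to clone the model repository with Git LFS rather than downloading each file by hand. This assumes you have git-lfs installed and that you run the commands from the root of your text-generation-webui folder (the underscore-separated folder name mirrors what the download script produces, so treat it as an assumption):

git lfs install
git clone https://huggingface.co/pankajmathur/orca_mini_13b models/pankajmathur_orca_mini_13b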

Once you’ve successfully installed the model, go back into the WebUI interface and load the model you have downloaded from the Models tab.

And that is all that is needed for setting up the WebUI interface and configuring an open source model!

There are additional settings that you could configure to optimize your environment. For example, if you are using a CPU, you can toggle the “cpu” option.
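
You can also set these options when starting the server rather than toggling them in the interface. A minimal sketch, assuming the model was saved under models/pankajmathur_orca_mini_13b (run python server.py --help to confirm the flag names for your version):

python server.py --cpu --model pankajmathur_orca_mini_13b

Here, --cpu forces CPU-only inference and --model loads the named folder from the models directory at startup.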

What's nice about using WebUI is that you can fine-tune the model on your own data. I have not explored this option yet, as I am still familiarising myself with the interface and exploring different models. You can access the fine-tuning option by navigating to the “Training” tab of the WebUI interface.
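
When you do get around to fine-tuning, a minimal sketch of preparing a raw text dataset might look like the commands below. I have not verified this myself, and the training/datasets location is an assumption based on the repository layout, so check the Training tab and the project's documentation first. Here, my_notes.txt is a hypothetical plain-text file of your own data:

mkdir -p training/datasets
cp my_notes.txt training/datasets/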

A benefit of hosting an open source language model on your local PC is that you have access to an LLM even when there is no internet connection available. So if you are traveling by train, bus, or airplane without connectivity, you can still perform text generation tasks with your local LLM.

I hope this post has helped you start experimenting with open source models and has got you interested in developing applications with large language models!

Note: All the open source language models that I have mentioned in this article are only meant for text generation.

Call to Action

Know someone on the hunt for a top-notch development partner to bring their MVP dream to life? Connect them with me on LinkedIn — your referrals are the fuel that drives innovation! 🚀

You can also find me here on Github!
