I bought a new desktop PC this year with the goal of getting back into GPU-related programming and learning about AI. I
wasn't comfortable with all the sign-up requirements for products like ChatGPT and AI image generation sites, and thought running open-source
equivalents locally would be better. Until recently I'd been playing with Stable Diffusion locally via various UIs, but
I still needed to try running a text-generation LLM. I'd previously tried Alpaca-electron, but experienced a lot of instability.
After a few minutes of searching, [this blog](https://sych.io/blog/how-to-run-llama-2-locally-a-guide-to-running-your-own-chatgpt-like-large-language-model/)
led me to [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which proclaims a goal of being the Automatic1111
of text generation. That sounded like just what I wanted, and it was.
# Requirements / Environment
* I'm running Ubuntu, but the same process probably works on Windoze. To follow along exactly you'll need a terminal and `git`.
* An Nvidia GPU. It apparently works with CPU-only too, but that's probably very slow/limited. (There's a quick check below.)
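If you want to sanity-check those prerequisites before starting, the standard tools should suffice (`nvidia-smi` ships with Nvidia's drivers):
```
# Confirm git is installed
git --version

# Confirm the Nvidia driver is working and can see your GPU
nvidia-smi
```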
# Setting up `text-generation-webui`
[text-generation-webui](https://github.com/oobabooga/text-generation-webui) is a web app you can run locally that supports a whole
bunch of LLM stuff I don't understand. It seems to take a convention-over-configuration approach, meaning most of the defaults just work,
and you can configure things once you know what you're doing.
The steps to get up and running are simple:
1. `cd` to the location you want to install it
```
cd /path/to/install
```
2. Clone the Git repo and enter the created directory
```
git clone https://github.com/oobabooga/text-generation-webui.git && cd text-generation-webui
```
3. Run the script `./start_linux.sh`, which automatically installs dependencies and then runs the app. \
When prompted, I chose `NVIDIA` for GPU, and `N` to go with the latest CUDA version. Once everything is downloaded, you will be up and running.
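That last step is literally just:
```
./start_linux.sh
```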
## Installing a model
The app is capable of a lot, but it can't do anything without an LLM model to work with.
Once the app has started, navigate to http://127.0.0.1:7860/ (where the app is listening) and switch to the Model tab at the top.
You now need to pick a model. I really don't know much about these, but [HuggingFace](https://huggingface.co) is the "GitHub of models". You'll find models of all different types, differing by [quantization](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right) method and other stuff I don't really understand (YET!).
I read about a recently released model that had been getting a lot of praise, OpenHermes2.5. So I looked on HuggingFace and found [TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ) which is a quantized version of [Yağız Çalık's merged model](https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp).
The notes explain that different quantization parameters are available on different branches. I chose the 4bit-32g version solely because it had the "highest inference quality".
Which variant is right for you will depend on your hardware, particularly the VRAM of your graphics card.
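If you're not sure how much VRAM you have, `nvidia-smi` can report it:
```
# Report each GPU's name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
```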
Whichever model you choose, enter its identifier in the Download text field in the format `username/model:branch`. So in our case it's
```
TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ:gptq-4bit-32g-actorder_True
```

Clicking `Get file list` will confirm that the model can be found. Then click `Download` to start pulling it to your machine. You can
watch the progress in the terminal.
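As an aside: if you'd rather do this from the terminal, the repo also ships a `download-model.py` script that (at least in the version I used) accepts the model name and a `--branch` flag, so the equivalent download would be something like:
```
# Run from inside the text-generation-webui directory
python download-model.py TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ --branch gptq-4bit-32g-actorder_True
```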
Once the model has downloaded, you're ready to start chatting away. Select the `Chat` tab in the top navigation bar.

Remember that each time you restart the app, you'll need to "Load" (not Download) the Model using the list at the top left under the Model tab.
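If reloading through the UI gets tedious, the start script appears to pass extra flags through to the server, so you can ask for a model at launch with `--model` (check `./start_linux.sh --help` in your version to confirm; the directory name below is just what the download created under `models/` for me, so adjust it to match yours):
```
# Start the app with the model pre-loaded; the directory name is an
# example -- use whatever appeared under models/ after your download
./start_linux.sh --model TheBloke_OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ_gptq-4bit-32g-actorder_True
```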

I actually tried a Llama-2 model before OpenHermes2.5, but the difference in quality and speed when I switched to OpenHermes was so insane that I won't bother covering it.
# Launcher
Now that you're able to chat, you might find it convenient to create an Ubuntu launcher so that you don't have to
run the script from the terminal each time you want to start it up. \
See my blog post on how to do that: nostr:naddr1qqxnzdesxgurydp3xqursv3nqgsqfrktznwkq72z058z2rkresazp6etkheuhzueu0jd8wpy0s52c7qrqsqqqa28dlzvaf
# Discussion
nostr:note1qslsr5m8ptk7p5t55yar6fg2c7pj5jewf0cfczt2kmy85arp6g3qa8dl4d
Don't you need to train the model? Or does it come pre-trained (with some large dataset)?