
I bought a new desktop PC this year with the goal of getting back into GPU-related programming and learning about AI. I wasn't comfortable with all the sign-up requirements for products like ChatGPT and AI image-generation sites, and thought running open-source equivalents locally would be better. Until recently I'd been playing with Stable Diffusion locally via various UIs, but I still needed to try running a text-generation LLM. I'd previously tried Alpaca-electron, but ran into a lot of instability.

After a few minutes of searching, [this blog post](https://sych.io/blog/how-to-run-llama-2-locally-a-guide-to-running-your-own-chatgpt-like-large-language-model/) led me to [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which proclaims the goal of being the Automatic1111 of text generation. That sounded like just what I wanted, and it was.

## Requirements / Environment

* I'm running Ubuntu, but the same process probably works on Windoze. To follow along exactly, you'll need a terminal and `git`.

* An Nvidia GPU, although this apparently works CPU-only too (probably much slower and more limited). If you're not sure your card and driver are set up, try the check below.
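`nvidia-smi` ships with the Nvidia driver, so if it prints a table showing your card, driver version, and CUDA version, you should be in good shape:

```
nvidia-smi
```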

## Setting up `text-generation-webui`

[text-generation-webui](https://github.com/oobabooga/text-generation-webui) is a web app you can run locally that supports a whole bunch of LLM features I don't understand yet. It seems to take a convention-over-configuration approach, meaning most of the defaults just work, but you can configure things once you know what you're doing.

The steps to get up and running are simple:

1. `cd` to the location you want to install it

   ```
   cd /path/to/install
   ```

2. Clone the Git repo and enter the created directory

   ```
   git clone https://github.com/oobabooga/text-generation-webui.git && cd text-generation-webui
   ```

3. Run the script `./start_linux.sh`, which automatically installs dependencies and then launches the app. \
When prompted, I chose `NVIDIA` for the GPU and `N` to go with the latest CUDA version. Once everything is downloaded, you will be up and running.
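   For reference, that last step is just:

   ```
   ./start_linux.sh
   ```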

## Installing a model

The app is capable of a lot, but it can't do anything without an LLM to work with.

Once the app has started, navigate to http://127.0.0.1:7860/ (where the app is listening) and switch to the `Model` tab at the top.
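If the page doesn't load, you can quickly confirm that something is listening on that port:

```
# Should return an HTTP 200 response header if the web UI is up.
curl -I http://127.0.0.1:7860/
```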

You now need to pick a model. I really don't know much about these, but [Hugging Face](https://huggingface.co) is the "GitHub of models". You'll find models of all different types, differing by [quantization](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right) method and other stuff I don't really understand (YET!).

I read about a recently released model that had been getting a lot of praise, OpenHermes2.5. So I looked on HuggingFace and found [TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ) which is a quantized version of [Yağız Çalık's merged model](https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp).

The notes explain that different quantization parameters are available on different branches. I chose the 4bit-32g version solely because it was described as having the "highest inference quality".

The model variant you choose will depend on your computing environment (e.g. the VRAM of your graphics card).
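As a very rough rule of thumb (my own back-of-envelope, not anything from the model card): the weights alone need about parameters × bits-per-weight ÷ 8 bytes, and the context cache and quantization metadata add overhead on top of that:

```
# Back-of-envelope only: weight memory ≈ parameters * bits-per-weight / 8 bytes.
# e.g. a 7B-parameter model at 4-bit is roughly 3.5 GB of weights, before overhead.
python3 -c 'print(7e9 * 4 / 8 / 1e9, "GB")'
```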

Whichever model you choose, enter its identifier in the Download text field in the format `username/model:branch`. So in our case it's

```
TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ:gptq-4bit-32g-actorder_True
```

![Download example, with hugging face name/model in text field]()

Clicking `Get file list` will confirm that it's working. Then click `Download` to start pulling it to your machine. You can

watch the progress in the terminal.
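If you'd rather download from the terminal, the repo also ships a `download-model.py` helper. A sketch, assuming I'm reading its usage correctly (check `python download-model.py --help` to be sure):

```
# Run from the text-generation-webui directory; files land under models/.
python download-model.py TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ --branch gptq-4bit-32g-actorder_True
```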

Once the model has downloaded, you're ready to start chatting away. Select the `Chat` tab in the top navigation bar.

![Chat example]()

Remember that each time you restart the app, you'll need to "Load" (not Download) the model using the list at the top left of the `Model` tab.

![Model selector, for loading already installed models]()
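Alternatively, I believe the one-click script forwards extra flags to the server (there's also a `CMD_FLAGS.txt` file for persistent flags), so you can ask it to load a model at startup. The folder name below is an example; use whatever directory appeared under `models/` after your download:

```
# Assumption: start_linux.sh passes flags through to the server.
./start_linux.sh --model TheBloke_OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ
```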

I actually tried a Llama-2 model before OpenHermes2.5, but the difference in quality and speed when I switched to OpenHermes was so insane that I skipped mentioning it.

## Launcher

Now that you're able to chat, you might find it convenient to create an Ubuntu launcher so that you don't have to

run the script from the terminal each time you want to start it up. \

See my blog post on how to do that: nostr:naddr1qqxnzdesxgurydp3xqursv3nqgsqfrktznwkq72z058z2rkresazp6etkheuhzueu0jd8wpy0s52c7qrqsqqqa28dlzvaf
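If you just want the gist, here's a minimal sketch that writes a desktop entry pointing at the start script (the `/path/to/install` paths are placeholders for wherever you cloned the repo):

```
# Sketch only: register a launcher entry for the current user.
cat > ~/.local/share/applications/text-generation-webui.desktop <<'EOF'
[Desktop Entry]
Type=Application
Name=text-generation-webui
Exec=/path/to/install/text-generation-webui/start_linux.sh
Path=/path/to/install/text-generation-webui
Terminal=true
EOF
```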

