#LoRA makes me bullish on #ai hacking on consumer hardware while still leveraging large pretrained models as a base. It works by injecting small low-rank matrices at each layer of the transformer stack in a larger base model like llama and training only those.

The base model’s weights are frozen, but you can train these low-rank “adapters” which are much smaller and require less memory/compute.
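For intuition, here's a minimal PyTorch sketch of the idea; the rank/alpha values and the wrapped linear layer are illustrative choices, not any particular implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + scale * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # base model weights stay frozen
        # Low-rank adapter: A projects down to `rank` dims, B projects back up.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```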

The nice thing about fine-tuning is that you're basically teaching the AI new things it actually retains, instead of re-feeding them in the prompt every time. So we can give it lots of domain knowledge about nostr, NIPs, etc. The hardest part is setting up a good training dataset.
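If you're on the Hugging Face stack, the adapter setup is only a few lines with the peft library; the model name and hyperparameters here are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; swap in whatever llama checkpoint you're using.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)     # freezes the base, adds trainable adapters
model.print_trainable_parameters()        # typically well under 1% of total params
```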


Discussion

LoRa is better tech.

All your notes are belong to us

LoRA works well for diffusion image generation, but I haven't had much luck with LLMs. I also don't have the resources to train a larger LLM.

Yeah, building a good training dataset seems to be the challenge.

Yes, in my experience, this is what takes the most time. There is a lot that you can automate, but there is still a lot that takes a human touch if you want good results.

Imparting "Knowledge" with LoRa / QLoRa has been challenging IME, unless you have *highly* structured data like Q&A with all of the right prompt template tokens for the given model (e.g. https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/)

The effect that LoRA training has is kind of noisy, which is why people say they like it for image models, where the results they're looking for are 'thematic' rather than structural 'domain knowledge'. But LoRAs are just as effective at imparting thematic/stylistic 'color' on LLMs, in my (limited) experience.

My suggestion is to go for low-epoch training and iterate on your dataset/parameters frequently instead of spending compute on big training sessions. You will learn much more and get much better results doing 10x small training sessions than 3x medium ones, for example.
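Concretely, something like this per iteration rather than one long run; all values are illustrative:

```python
from transformers import TrainingArguments

# Illustrative settings for one short iteration; retrain after each dataset tweak.
args = TrainingArguments(
    output_dir="lora-run-01",
    num_train_epochs=1,               # low epoch count per run
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    logging_steps=10,
)
```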

Yeah, here's the dataset I'm building:

[Q]show me the latest posts from thomas.[/Q][A]{"kinds":[1],"authors":["thomas"],"limit": 100}[/A]

[Q]show me the latest zaps from Vanessa, bob, and steve[/Q][A]{"kinds":[9735],"#P":["Vanessa","bob","steve"], "limit": 100}[/A]

[Q]top zapped profiles[/Q][A]{"nscript":"top-profile-zaps"}[/A]

[Q]latest posts from thomas.[/Q][A]{"kinds":[1],"authors":["thomas"]}[/A]

[Q]latest articles from alice[/Q][A]{"kinds":[30023],"authors":["alice"], "limit": 100}[/A]

[Q]top zapped articles from bob and alice[/Q][A]{"kinds":[30023],"authors":["bob", "alice"],"nscript":"top-zaps"}[/A]
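Since the answers are mostly Nostr filter JSON, pairs in this format can be parsed and sanity-checked while building the set. A rough sketch (the file name is hypothetical):

```python
import json
import re

PAIR = re.compile(r"\[Q\](.*?)\[/Q\]\[A\](.*?)\[/A\]", re.S)

def parse_pairs(text: str):
    """Yield (question, answer) tuples from the [Q]...[/Q][A]...[/A] markup."""
    for q, a in PAIR.findall(text):
        yield q.strip(), a.strip()

# Hypothetical file holding lines like the ones above.
for q, a in parse_pairs(open("dataset.txt").read()):
    try:
        json.loads(a)   # most answers are filter JSON; this catches malformed ones
    except json.JSONDecodeError:
        print("non-JSON or malformed answer for:", q)
```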