Nostr Web Client

TKay 5mo ago

The Ollama app was just updated to allow users to run a local 20b LLM with search capabilities and some tooling.

The models are somewhat big, but still great to see.

nostr:note1r8qzvx68myc8zvx3dahe923kwlamh4py2d52e4grju7fcnaeferqxatedk

Discussion

أنس القطونجي 5mo ago

These models are going to melt my i5-1235U + Iris Xe laptop

TKay 5mo ago

It’s melting mine.

This is too much.

Rich Nost 5mo ago

I have no context as to what kind of hardware can run these open models. Do you need a GPU?

أنس القطونجي 5mo ago

Yes

The bigger the model, the stronger the GPU you need
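For a rough feel of why model size drives the hardware requirement, here is a back-of-envelope sketch. It only counts the weights (parameters times bits per weight) and ignores the KV cache and runtime overhead, so treat the numbers as a lower bound. The function name and the bit widths are assumptions for illustration, not anything a specific tool reports.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width, and nothing
    else (KV cache, activations, runtime buffers) is counted.
    """
    # (params_billion * 1e9 weights) * (bits / 8 bytes per weight) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Model sizes mentioned in this thread, at two illustrative bit widths.
    for name, params in [("1.5b", 1.5), ("8b", 8), ("20b", 20), ("22b", 22)]:
        fp16 = weight_memory_gb(params, 16)
        q4 = weight_memory_gb(params, 4)
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

If the weights don't fit in GPU memory, the rest has to sit in system RAM, which is where the slowdown comes from.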

Alex Gleason 5mo ago

Size doesn't matter. It's how you use it.

TKay 5mo ago

lol

أنس القطونجي 5mo ago

It matters

I tried a 1.5b llama model, if my memory serves, and it gave me a made-up answer for updating Solus Linux

A bigger model (7b or 8b) gave me the correct result

Things don't come without a cost

ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ 5mo ago

i'm already running codestral 22b using LM Studio

ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ 5mo ago

it's only 9 GB, version 0.1. runs sweet on my 16 GB Radeon RX 7800 XT (256-bit memory bus, GDDR6 i think). here's Brave Leo's summary of the device:

----

Radeon RX 7800 XT Specs

The AMD Radeon RX 7800 XT is a high-end graphics card built on the RDNA 3.0 architecture using a 5nm process technology (TSMC N5) and based on the Navi 32 XT GPU chip, which features 28.1 billion transistors and a die area of 346 mm².

It is connected to the system via a PCIe 4.0 x16 interface.

The card is equipped with 3840 stream processors (also referred to as shading units), 240 texture mapping units, and 96 render output units (ROPs).

It includes 60 second-generation ray tracing cores and 120 AI acceleration cores.

The GPU has a base clock of 1295 MHz and can be boosted up to 2430 MHz, with some factory-overclocked models reaching boost clocks of up to 2565 MHz.

The RX 7800 XT features 16 GB of GDDR6 memory connected via a 256-bit memory bus, resulting in a memory bandwidth of 624.1 GB/s and a memory data rate of 19.5 Gbps.

It also incorporates 64MB of AMD Infinity Cache technology, which helps improve memory efficiency.

The card supports DirectX 12 Ultimate, including hardware-accelerated ray tracing, variable-rate shading, and other modern graphics features.

It includes 3x DisplayPort 2.1 and 1x HDMI 2.1 output ports, enabling support for up to four displays simultaneously.

The card is a dual-slot design with dimensions of 267 mm x 111 mm x 50 mm and uses two 8-pin power connectors, with a maximum power draw of 263 W.

The card was officially launched on August 25, 2023, with a launch price of $499.

----
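The bandwidth figure in that summary can be sanity-checked from the bus width and data rate it quotes. The sketch below also derives a very rough ceiling on generation speed, assuming decoding is memory-bound and every generated token reads all the weights once. That assumption is a common rule of thumb, not a measurement, and the function names are made up for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes moved per transfer times the per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def decode_ceiling_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/s, assuming each token streams the whole model from VRAM once."""
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    bw = memory_bandwidth_gbs(256, 19.5)      # 256-bit bus, 19.5 Gbps, as quoted above
    cap = decode_ceiling_tokens_per_s(bw, 9)  # the 9 GB model file mentioned above
    print(f"bandwidth ~{bw:.1f} GB/s, decode ceiling ~{cap:.0f} tokens/s")
```

This gives 624.0 GB/s, close to the 624.1 GB/s in the summary. The small gap presumably comes from the data rate being rounded. Real throughput will land below the ceiling.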

with an actual PCIe 4.0 interface and system memory running DDR5, it could probably do partial offload and handle even larger models, just a bit slower. but 22b is really sufficient for the job. i'm just waiting for JetBrains to allow Junie to use local models. there is an issue open for it, but no strong statement from JetBrains about opening this up. cloud compute is a no-go for a lot of dev shops.
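To make the partial offload idea concrete, here is a minimal sketch of how one might guess the number of layers that fit in VRAM, with the rest left in system RAM. It assumes all layers are the same size and reserves a fixed chunk of VRAM for the KV cache and the desktop. The function, the 2 GB reserve, and the 56-layer / 20 GB example model are all hypothetical. Runtimes such as llama.cpp expose a layer count setting for this (`--n-gpu-layers`), but what actually fits has to be found by trying.

```python
import math


def layers_on_gpu(model_gb: float, n_layers: int, vram_gb: float,
                  reserve_gb: float = 2.0) -> int:
    """Estimate how many equally sized layers fit in VRAM after a fixed reserve.

    Assumptions: uniform layer size; reserve_gb covers KV cache and other VRAM use.
    """
    if model_gb <= 0 or n_layers <= 0:
        raise ValueError("model_gb and n_layers must be positive")
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    per_layer_gb = model_gb / n_layers
    # Never report more layers than the model has.
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # Hypothetical 20 GB model with 56 layers on a 16 GB card: partial offload.
    print(layers_on_gpu(20.0, 56, 16.0))
    # A 9 GB model with the same layer count on the same card: everything fits.
    print(layers_on_gpu(9.0, 56, 16.0))
```

Layers that stay in system RAM are limited by DDR5 bandwidth, which is why the larger model would run slower.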