Unfortunately it is not as impressive as the paper presents it; see the ARC Prize analysis: https://arcprize.org/blog/hrm-analysis
Discussion
One can recreate the results with a simpler transformer architecture, without the multiple levels. The trick is the training setup, with its outer iterative refinement and Q-learning halting loss, not the hierarchy or the recursion through latent space.
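To make that claim concrete, here is a minimal PyTorch sketch of such a setup: a single flat transformer refined over several outer steps, with deep supervision at each step and a Q-learning-style halting loss. This is not the HRM authors' or the ARC Prize analysis code; the module names, shapes, Q-target bookkeeping, and loss weighting are assumptions for illustration only.

```
# Minimal sketch (assumed names, shapes, and loss weights; not the HRM or ARC Prize
# analysis code): one flat transformer encoder refined over several outer steps,
# with deep supervision at every step and a simplified Q-learning halting loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IterativeRefiner(nn.Module):
    def __init__(self, vocab=512, d_model=256, n_layers=4, n_heads=8, max_steps=8):
        super().__init__()
        self.max_steps = max_steps
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # one stack, no H/L hierarchy
        self.inject = nn.Linear(2 * d_model, d_model)          # mix input tokens with carried state
        self.head = nn.Linear(d_model, vocab)                  # per-token answer logits
        self.q_head = nn.Linear(d_model, 2)                    # Q(halt), Q(continue)

    def step(self, tokens, state):
        # One refinement segment: re-encode the input conditioned on the carried state.
        x = self.inject(torch.cat([self.embed(tokens), state], dim=-1))
        state = self.encoder(x)
        return state, self.head(state), self.q_head(state.mean(dim=1))

def train_step(model, tokens, targets, optimizer, q_weight=0.5):
    """Outer refinement loop with deep supervision and a simplified Q halting loss."""
    state = torch.zeros(*tokens.shape, model.embed.embedding_dim, device=tokens.device)
    logits_all, q_all = [], []
    for _ in range(model.max_steps):
        state, logits, q = model.step(tokens, state)
        state = state.detach()  # 1-step gradient: no backprop through earlier segments
        logits_all.append(logits)
        q_all.append(q)

    loss = torch.zeros((), device=tokens.device)
    for t, (logits, q) in enumerate(zip(logits_all, q_all)):
        # Deep supervision: every refinement step is pushed toward the final answer.
        loss = loss + F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        solved = (logits.argmax(-1) == targets).all(dim=-1).float()
        halt_target = solved  # reward for halting on an already-solved example
        if t + 1 < len(q_all):  # bootstrap the continue value from the next step's Q
            cont_target = torch.sigmoid(q_all[t + 1].max(dim=-1).values).detach()
        else:
            cont_target = solved  # last step: no further refinement is possible
        loss = loss + q_weight * F.binary_cross_entropy_with_logits(
            q, torch.stack([halt_target, cont_target], dim=-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference one would run the same loop and stop once the Q head prefers halting over continuing (or the step budget runs out). The point is that nothing in this loop requires two coupled recurrent modules; the iterative refinement and the halting objective are what carry the weight.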