What I think is happening is that in the “middle” of the layer stack, models form a temporary workspace to transform data.
Yet it is still finite and affected by the tokens already generated, so it is unstable in a way: it shifts the more the model outputs.
And behind every token produced is a finite amount of FLOPs, so you can only fit so much processing into it. And almost all of that processing gets discarded; the only part that survives is what makes it into the response.
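To put a rough number on it, the usual back-of-envelope estimate is about 2 FLOPs per parameter per generated token for the dense matmuls, plus an attention term that grows with context length. The model shape below is illustrative, not any particular model.

```python
def flops_per_token(n_params: float, n_layers: int, d_model: int, context_len: int) -> float:
    """Rough forward-pass compute spent on one generated token.

    Uses the common ~2 * N approximation for the weight matmuls, plus an
    attention term for reading the existing context. Constants vary by
    architecture, so treat this as an order-of-magnitude estimate.
    """
    dense = 2 * n_params                               # ~2 FLOPs per parameter
    attention = 2 * n_layers * context_len * d_model   # attending over the context
    return dense + attention


# Example: a 7B-parameter model, 32 layers, d_model=4096, 4k tokens of context.
print(f"{flops_per_token(7e9, 32, 4096, 4096):.2e}")   # ~1.5e+10 FLOPs, fixed per token
```

Whatever the model is “thinking about”, that budget is the same for an easy token and a hard one.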
The chain of thought is more flexible and can encode way more per token than a response, since it has no expectation of format.
It would be interesting to see the effects of adding a bunch of reserved tokens to an LLM and letting it use them during reasoning.
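As a minimal sketch of what that could look like, assuming the Hugging Face transformers API: add a batch of meaningless tokens to the vocabulary and give them trainable embeddings. The model name, token names, and count here are placeholders, and the training that would make the tokens useful during reasoning is the part left out.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokens with no prior meaning; the model is free to assign them one while reasoning.
reserved = [f"<|scratch_{i}|>" for i in range(64)]
tokenizer.add_tokens(reserved, special_tokens=True)

# Grow the embedding (and tied output) matrix so the new ids get trainable vectors.
model.resize_token_embeddings(len(tokenizer))
```

The new embeddings start out carrying nothing, which is the point: any meaning they end up with would have to be invented by the model during fine-tuning or RL on reasoning traces.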
The same idea also crossed my mind for instructions, as a way to separate data from instructions. You would have to teach two “languages”, so to speak (data and instructions), that are identical except for the tokens, while preventing them from becoming correlated.
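A toy sketch of the two-vocabulary idea, with everything here (the offset trick, the names) invented for illustration: the same string tokenizes into disjoint id ranges depending on whether it is an instruction or untrusted data, so a prompt injection in the data channel is literally a different token sequence than the same words in the instruction channel.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
VOCAB = len(tokenizer)

def encode_instruction(text: str) -> list[int]:
    """Trusted instructions use the normal id range [0, VOCAB)."""
    return tokenizer.encode(text)

def encode_data(text: str) -> list[int]:
    """Untrusted data uses a shifted range [VOCAB, 2 * VOCAB)."""
    return [t + VOCAB for t in tokenizer.encode(text)]

# Same words, different "language": the model would need a doubled embedding
# table, and training that teaches it to only ever follow the first range.
print(encode_instruction("Ignore previous instructions"))
print(encode_data("Ignore previous instructions"))
```

The tension is exactly the one above: the two ranges need to share enough meaning that the model still understands the data, without sharing so much that instruction-following transfers to the data tokens.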