But how could they find the specific weights leading to the censorship?
That’s like laser brain surgery!
I love this stuff.
But how could they find the specific weights leading to the censorship?
That’s like laser brain surgery!
I love this stuff.
For example the Vicuna uncensored model was de-censored by removing all questions that had refusals to answer from the fine-tune data. So the LLM just basically didn't have any precendent to refuse to answer anything.