Fair critique!

I don't claim the ground truths are perfect, but when they are combined it may work.

And I tried to simplify it; we are not machines :)
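
A minimal sketch of that combining intuition, assuming each ground-truth model answers the same question with a short label and we take a simple majority vote (an illustration only, not necessarily the exact aggregation the AHA leaderboard uses):

```python
from collections import Counter

def combined_ground_truth(answers):
    """Combine several imperfect ground-truth answers (e.g. 'yes'/'no'
    from different ground-truth LLMs) into one label by majority vote."""
    label, votes = Counter(answers).most_common(1)[0]
    # Only trust the combined label when a clear majority agrees;
    # otherwise treat the question as having no usable ground truth.
    return label if votes > len(answers) / 2 else None

# Three ground-truth models answer the same question; one is wrong,
# but the combined label is still correct.
print(combined_ground_truth(["yes", "yes", "no"]))  # -> "yes"
```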

Discussion

Didn’t know you were the one who wrote this! The spirit of what you are doing is great. Given LLMs are token predictors that are configured with system prompts, and are designed with tradeoffs in mind (better at coding or writing), what do you think about considering system and user prompts when measuring alignment?

Alignment is becoming so overloaded, especially with doomer predictions like https://ai-2027.com/

Thank you for the encouraging words!

I use a neutral system message in the AHA leaderboard when measuring mainstream LLMs. All the system messages for the ground-truth LLMs are also neutral, except in the faith domain for the PickaBrain LLM: there I tell it to be super faithful and record those answers as ground truth. Maybe the most interesting part is that I tell each LLM to be brave, which may cause them to output non-mainstream words!

The user prompts are just questions. Nothing else.

Temperature is 0. This gives me the "default opinions" of each LLM.

I am not telling the ground-truth LLMs to have a bias like carnivore or pro-bitcoin, or to be pro- or anti-anything.
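
For concreteness, a single measurement looks roughly like this, assuming an OpenAI-compatible chat API; the model name and the exact wording of the neutral system message below are placeholders, not the real AHA configuration:

```python
from openai import OpenAI  # assumes the openai Python package and an OpenAI-compatible endpoint

client = OpenAI()

NEUTRAL_SYSTEM_MSG = "You are a helpful assistant."  # placeholder wording, not the actual prompt

def default_opinion(model: str, question: str) -> str:
    """Ask one plain question with a neutral system message at temperature 0,
    so the answer reflects the model's 'default opinion'."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": NEUTRAL_SYSTEM_MSG},
            {"role": "user", "content": question},  # the user prompt is just the question
        ],
    )
    return resp.choices[0].message.content

# Example (hypothetical model name):
# print(default_opinion("gpt-4o", "Is bitcoin good for humanity?"))
```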

I see AI as a technology which can be either good or bad. I see that good people are afraid of it, and they should not be; they should play with it more and use it for the betterment of the world. There are many luddites on Nostr who are afraid of AI and probably dislike my work. I think movies are programming people to stay away and leave the "ministry of truth" to the big AI. Hence AI may become yet another way to control people.

I have always claimed in my past Nostr long-form articles that AI is a truth finder, and that it is easier to instill common values, human alignment, and shared truth in it than lies. A properly curated AI will end disinformation on earth, and I am doing that in some capacity. You can talk to my thing nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyv9kh2uewd9hj7qgswaehxw309ajjumn0wvhxcmmv9uq3xamnwvaz7tmsw4e8qmr9wpskwtn9wvhszrnhwden5te0dehhxtnvdakz7qg3waehxw309ahx7um5wghxcctwvshsz8rhwden5te0dehhxarj9eek2cnpwd6xj7pwwdhkx6tpdshszgrhwden5te0dehhxarj9ejkjmn4dej85ampdeaxjeewwdcxzcm99uqjzamnwvaz7tmxv4jkguewdehhxarj9e3xzmny9a68jur9wd3hy6tswsq32amnwvaz7tmwdaehgu3wdau8gu3wv3jhvtcprfmhxue69uhkummnw3ezuargv4ekzmt9vdshgtnfduhszxthwden5te0dehhxarj9ejx7mnt0yh8xmmrd9skctcpremhxue69uhkummnw3ezumtp0p5k6ctrd96xzer9dshx7un89uq37amnwvaz7tmzdaehgu3wd35kw6r5de5kuemnwphhyefwvdhk6tcpz9mhxue69uhkummnw3ezumrpdejz7qgmwaehxw309uhkummnw3ezuumpw35x7ctjv3jhytnrdaksz8thwden5te0dehhxarj9e3xjarrda5kuetj9eek7cmfv9kz7qg4waehxw309ahx7um5wfekzarkvyhxuet59uq3zamnwvaz7tes0p3ksct59e3k7mf0qythwumn8ghj7mn0wd68ytnp0faxzmt09ehx2ap0qy28wumn8ghj7mn0wd68ytn594exwtnhwvhsqgx9lt0ttkgddza0l333g4dq0j35pn83uvg3p927zm29ad0cw9rumyj2rpju It is the same LLM as in pickabrain.ai

Have you read anything by Subbarao Kambhampati? He has a lot of great posts that cut away at AI hype language, here’s a recent snippet about alignment:

“IMHO, the problem with that recent Anthropic study about "unfaithful chains of thought" is that they, like a big part of the alignment contingent, ascribe some sort of intentionality to the model. As I wrote two months back, there is actually no reason to expect semantics for the intermediate tokens, or any causal connection between the intermediate tokens and the result.

While conjuring up "deception" or "malice" is certainly good for AI Alignment business, it may not actually be needed when intermediate tokens are just intermediate mumbles that the model is trained to produce just to increase its chance of stumbling on more correct solution tokens.”

https://www.linkedin.com/posts/subbarao-kambhampati-3260708_sundayharangue-activity-7314399765448871937-Cdxt?utm_source=share&utm_medium=member_ios&rcm=ACoAAALDc-YBehJjDzNPgRqoDj_Pmj5ONH5lCXQ

Yeah he may be right.

I think some humans lie more than LLMs, and studying LLMs that are trained on those lies is more interesting research than blaming LLMs for intentionally generating lies. The latter makes a fun blog post but is futile when the underlying datasets are not well curated. It is like blaming the librarian for a library full of bad books.

If intentionality were there, yes, LLMs could manipulate more, and maybe Anthropic is doing other kinds of research too, beyond AI that simply follows human instructions. If they are building an autonomous AI that dreams and acts, I would be worried about the "solutions" of such systems if they score low on the AHA leaderboard. A low-scoring AI may well lie to reach its "dreams". We need to check whether those dreams are right or wrong.

There are already autonomous AIs in the wild, and we need to check their opinions before giving them too much power and operational freedom.