I recall this being discussed a bit back when OpenAI was considering using "synthetic data" to train ChatGPT. at that point, I started to believe that the best LLMs would be tiny, use-case-specific models for circumstances where accuracy is imperative – military, medical, legal, etc. I think the problem was that nobody really wants to edit training data for accuracy at the web-page level, so "training" LLMs has come down to general consensus as a form of "truthmaking".
it feels like a circle-back to when everyone was debating disinformation on social media, and I wonder how AI is ever going to be fully reliable for fact-checking if humans can't even agree on the truth.
that said, LLMs are pretty good so far, though I'd say their real benefit is that we don't have to click through a hundred different web pages to get basic answers.
imo, any LLM trained on "synthetic data" is really just being trained to obfuscate. atm, most people are being sold on the notion of quick access to comprehensive internet search with a personality of sorts, which in itself is incredibly useful.
I think the next wave might be LLMs with a "human in the loop" at the endpoint for things like healthcare, which would be cool since AI agents are already in the works and could help bridge the gap between LLMs and protocols like Nostr.
there's a ton of space to fill there in terms of practical application to industries that require a high degree of niche, specialist-level expertise.
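to make the "human in the loop at the endpoint" idea concrete, here's a rough sketch (TypeScript; every name in it is made up for illustration, not any real API or library): the LLM only ever produces drafts, and a licensed human is the mandatory last hop before anything reaches a patient.

```typescript
// purely illustrative; queryLlm, notifyClinician, and the npub below
// are hypothetical stand-ins, not real APIs or identities

interface Draft {
  question: string;
  answer: string;
}

interface ReviewedAnswer extends Draft {
  signedOffBy: string; // identifier of the human who approved the draft
}

// hypothetical model call: the LLM drafts an answer, nothing more
async function queryLlm(question: string): Promise<Draft> {
  return { question, answer: "draft answer text" };
}

// hypothetical hand-off: push the draft to a licensed reviewer's queue
// and resolve once they've edited and approved it
async function notifyClinician(draft: Draft): Promise<ReviewedAnswer> {
  return { ...draft, signedOffBy: "npub1examplereviewer" };
}

// the endpoint gate: the model never talks to the user directly;
// a human sign-off is the last step, always
async function answerWithHumanEndpoint(question: string): Promise<ReviewedAnswer> {
  const draft = await queryLlm(question);
  return notifyClinician(draft);
}

answerWithHumanEndpoint("is this dosage safe?").then(console.log);
```

from there, an agent could wrap the reviewed answer in a signed Nostr event and publish it to relays, but that part is just the usual event flow.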
nostr:nevent1qqsqqqqjqf6w78llh4mekga0fg46rlsewfexnxeuc0gzwqfpfh4dzwspremhxue69uhkummnw3ez6ur4vgh8wetvd3hhyer9wghxuet59upzqwlsccluhy6xxsr6l9a9uhhxf75g85g8a709tprjcn4e42h053vaqvzqqqqqqy6eczfd