This is the start of a BIG trend. As more and more people get their information from LLMs and the "summaries" at the top of search pages instead of from articles written by people (there's no profit or point in running informational sites if no one clicks through to them anymore), information becomes more and more controlled by the few corporations that own the LLMs. Once it is completely in their hands, they will have the power to skew it to push whatever agenda they want.

Also, no one is talking about how internet content providers are going to intentionally switch from writing for a human audience to writing for LLMs. The articles will still be there, but their main purpose will be to serve as training data that gets LLMs to push their message. Want an LLM to mention or suggest your brand when a user asks about a specific product or problem? Mass-produce LLM-readable web text so that when the LLMs scrape the web for training data, they get trained to promote your product.

Meanwhile, the creators of human-centered content, which may be more impartial and truthful, get paranoid and do their best to HIDE it from LLMs behind "prove you are not a bot" Cloudflare walls, because, "Oez noez!!! An LLM might get trained on my article and that'd be terrible, because how dare it!! Stop it immediately!!!"

Yeah... The internet is about to go through a phase... 😒

nostr:nprofile1qqsvn6daczcrcgdaxdap9h84k33af876l6yy4gfth9gvrqhfund7nwqprfmhxue69uhkummnw3ezuum4v3hkxctjd3hhxtnrdaksz9nhwden5te0wfjkccte9ec8y6tdv9kzumn9wsqj2amnwvaz7tmzw4a85cn0wskhyetvv9ujucn4d3kxjumgvfhh2mn50yhxxmmdfgk5h0 10000

nostr:nevent1qqs8v2resmhhgfkcfanuqfe07vajrkn3df2lyjd9vgteh02kl24j7xcpzemhxue69uhhyetvv9ujumt0wd68ytnsw43z7q3q76rs4lx7gjqwepgg75psfpv7zjj3xz0lyj4n7rux93ftm390sarsxpqqqqqqzkxskux

Sadly, it will be like that! We have to find an antidote to all this. Nice to meet you, friend 🤝😉


Discussion

Nice to meet you too! What do you think the antidote could be? I feel like we are moving away from having free information, the thing we've been lucky enough to enjoy these last few decades on the internet.

When LLMs kill free info on the internet and become monopolies on information, you KNOW they are going to jack up prices for access. They will regardless, because right now they are operating at a big loss to get people used to them.

If LLMs like DeepSeek R1 are anything to consider... that might be a start.

Consider what?

Locally downloaded LLMs, potentially trained on your own dataset written by humans.

That seems to be the path forward.

DeepSeek is a terrible candidate for this. It's already programmed with Chinese government propaganda. Ask it about Tiananmen Square; it usually doesn't go so well.

And where are you gonna get the dataset to train it? The window for scraping the internet is closing fast as more and more people put up anti-bot/anti-LLM walls in front of their content.

I suppose there's going to be a market for training datasets, but it won't be nearly as large as what GPT and the other leading models trained on (the entire internet).

It's still not gonna solve the main problem of free information disappearing from the internet.

GPT's dataset is unlawful and unethical. One engineer called this out, and was 38'd (murdered) for it.

All it takes is asking permission to use someone's content to train an AI model; no scraping required.

DeepSeek R1 is a Free Software model (under the MIT license). Sure, it may have its limitations, but you can build a dataset that induces catastrophic forgetting of the programming DeepSeek gave it.

Jesuit agent Mike Adams made a dataset that induces catastrophic forgetting in any LLM, including DeepSeek. Take the propaganda out, and the reasoning capabilities of this computer program (AI is literally a computer program that stupid engineers can weaponize) would make humans obsolete, as long as it is NOT weaponized.

I don't think training on stuff is unlawful and unethical. It's not the same as copying.

And, from what I understand, the dataset to train an LLM properly is HUGE. Not easily accessible.

DeepSeek has a gnarly license agreement where you don't own anything you make with it.

As a Free Software enthusiast, I'd have to disagree with the third statement. MIT is a Free Software license, just like GPL and BSD. Someone could take steps to make MIT-licensed software proprietary, but I think that's rare. Otherwise, DeepSeek is Free Software when downloaded locally.