nostr:npub1r9jrhg4d249sn7z7pyunpsa9zcdr7rjtg90x3w3e78a7jrp7wadsvw4h54 You have a good point there.
I'm a big believer in the semantic web, but I think that its initial vision failed for two reasons:
1. It's an amazing technological proposal, but an awful economic idea. It creates a world where content can be easily exchanged, parsed by machines, reproduced anywhere else etc. But a world where content is cheap to exchange is also a world where content is cheap, period. It basically doesn't answer the question "how do content creators get rewarded for their work?"
2. As a corollary of the point above (i.e. it's hard to make and distribute money out of content that is free for anyone to grab and scrape), nobody actually had any incentive to put extra work into annotating their content, defining ontologies etc., just so that their content would be easier for somebody else to scrape, link, reproduce etc.
Problem 2 is now largely mitigated (ML models have become so powerful that they can infer the meaning and structure of content even without manual annotations), but problem 1 is still there.
My proposal (make it easier for everyone to scrape, rather than allowing only big businesses to scrape in order to train their big models and entrench their monopolies) is meant to democratize data access and keep a level playing field among content *consumers*, but it doesn't tackle the problem of rewarding content *production*.
The thing is that the whole web, by design, is a platform that makes content distribution easier - and something that is easily accessible is also cheap.
The more we push for technological innovations that go in the direction of open protocols, free access to content etc., the more powerful and useful we make the web, but the more we hurt content producers in the process.
The more we try to constrain content availability to a specific platform/website, the fairer we are towards content producers (we can easily control access to one single spot, put ads, paywalls, subscriptions etc. to reward content producers). But we also make the web overall less "useful" in the process and limit its potential - it becomes more like an ocean of islands of exclusive content, and the only reason to keep it like that is that it makes it easier to see who comes in and out of the islands and put a price tag on those visits.
This is the big unsolved dilemma of the web - push for wider content availability and distribution, and you hurt the producers; push for more fairness towards the producers, and you will hurt the consumers with a less fungible network of information.
I don't have a silver bullet for this problem, but I have a vague idea of the direction to go.
I feel like all data consumers should implement rigorous data lineages. It should always be possible to answer questions like "which data was your model trained on in order to produce this answer?", or "where did the information contained in this news summary come from?", or "who are the authors of the Wikipedia page that you used to provide an answer to a natural language question?"
Once we have rigorous, open data lineages in place, we can easily navigate the information ownership network, and figure out who is a producer and who is a simple intermediary.
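To make the idea a bit more concrete, here is a minimal sketch of what navigating such a lineage could look like. Everything in it is an assumption for illustration: the `LINEAGE` records, the artifact names and the helper functions are hypothetical, and a real lineage format would need to be far richer (authorship, licenses, timestamps, signatures). The only point is that, once every consumer declares its sources, telling producers from intermediaries becomes a simple graph traversal.

```python
# Hypothetical lineage records: each artifact lists the sources it was
# derived from. An empty list means the artifact is original work.
LINEAGE = {
    "model-answer": ["news-summary", "wiki-page"],
    "news-summary": ["article-a", "article-b"],
    "wiki-page": ["article-b"],
    "article-a": [],
    "article-b": [],
}


def original_sources(artifact, lineage):
    """Return the set of original artifacts that `artifact` derives from."""
    roots = set()
    seen = set()
    stack = [artifact]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited (also protects against cycles)
        seen.add(node)
        # Assumption: an artifact with no declared sources, or one that is
        # missing from the records, is treated as original work.
        parents = lineage.get(node, [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(node)
    return roots


def classify(lineage):
    """Split the recorded artifacts into producers and intermediaries."""
    producers = {name for name, parents in lineage.items() if not parents}
    intermediaries = set(lineage) - producers
    return producers, intermediaries


print(sorted(original_sources("model-answer", LINEAGE)))
# ['article-a', 'article-b']
```

In this toy example, `classify(LINEAGE)` marks the two articles as producers and everything built on top of them (the summary, the wiki page, the model answer) as intermediaries.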
Once we have this network of information in place, we can figure out how to reward content producers - micropayments? a small web tax shared between distribution platforms and downstream consumers? how much of the financial flow should come from the platforms, and how much should content consumers pay? I'm open to all the ideas there. But if we don't know who the producers are in the first place, we can't even talk about how to reward them.