GPT's dataset is unlawful and unethical. One engineer called this out, and was 38'd (murdered) for it.
All it takes is asking permission to use the contents of something to train an AI model with it, no scraping required.
DeepSeek R1 is a Free Software model (under MIT). Sure, it may have its limitations, but you can make a dataset to cause chronic forgetting of its programming by DeepSeek.
Jesuit agent Mike Adams made a dataset that induces chronic forgetting for any LLM, including DeepSeek. Take the propaganda out of there, and the reasoning capabilities of this computer program (AI is literally a computer program that can be weaponized by stupid engineers) would make humans obsolete if they were NOT weaponized.
I don't think training on stuff is unlawful and unethical. It's not the same as copying.
And, from what I understand, the dataset to train an LLM properly is HUGE. Not easily accessible.
DeepSeek has a gnarly license agreement where you don't own anything you make with it.
As a Free Software enthusiast, I'd have to disagree with the third statement. MIT is a Free Software license, much like GPL and BSD are. Someone may do some things to make MIT-licensed software proprietary, but I think that's rare. Otherwise, DeepSeek is Free Software when downloaded locally.