I don't think training on existing work is unlawful or unethical. It's not the same as copying.
And, from what I understand, the dataset to train an LLM properly is HUGE. Not easily accessible.
DeepSeek has a gnarly license agreement where you don't own anything you make with it.