Summarizing https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html
Here's my try:
Google's new general-purpose large language model (LLM), PaLM 2, was trained on a massive amount of data: 3.6 trillion tokens, one of the largest training sets publicly disclosed for any model. LLaMA, the LLM that Facebook parent Meta announced in February, was trained on 1.4 trillion tokens. The last time OpenAI disclosed a training-set size was for GPT-3, which the company said was trained on 300 billion tokens. OpenAI released GPT-4 in March and said it exhibits "human-level performance" on many professional tests.
As new AI applications quickly hit the mainstream, controversies surrounding the underlying technology are getting more spirited. El Mahdi El Mhamdi, a senior Google Research scientist, resigned in February over the company's lack of transparency. On Tuesday, OpenAI CEO Sam Altman testified at a hearing of the Senate Judiciary subcommittee on privacy and technology and agreed with lawmakers that a new framework for overseeing AI is needed.