This may be a silly question, but can someone explain how a large language model "learns" from sets of data?

I'm imagining a scenario where, let's say, scientists who study a rare animal like the Sumatran rhino could use an AI model and feed it daily reports for years; then the AI could be a major help in summarizing all the data together in various ways to distill insights.

Is this possible with a large language model?

Can large databases be searched and summarized in this example?

Could existing databases be fed to AI models?

How do I go about getting and testing my own AI model to use for specialized training?

#asknostr #nostr #nostriches #plebs #plebchain #grownostr #ai

Discussion

Could I use this to have my own trained AI?

https://bigscience.huggingface.co/blog/bloom

I'm not well versed in LLMs, but I hope this helps. Also, I think to use BLOOM you're going to need Paperspace in case you don't have the PC to run that model locally.

https://www.cloudflare.com/learning/ai/what-is-large-language-model/

https://www.infoworld.com/article/3705035/5-easy-ways-to-run-an-llm-locally.html

https://learn.microsoft.com/en-us/semantic-kernel/prompt-engineering/llm-models

I heard BLOOM is absolutely terrible.

I see that nobody has given you a reply.

To be honest with you, I haven't a clue how it works, and a lot of people who claim to know don't seem to know how it works either. But what do I know. I'll boost your question; maybe it will help to get a reply.

#LLM

#AI

#MachineLearning

#Rhino

#DataScience

I'm far from knowing how it works, but my simple understanding is this:

The model is 'trained' by feeding it sentences and paragraphs. All it is doing when providing you an answer is determining the probability of the next word in the sentence.

So if the model has been fed "what is a rhino? the rhino is green" 20 times, and "what is a rhino? the rhino is hungry" 10 times, then when you ask it "what is a rhino?" it'll say "the rhino is green".

It has been fed so many sentences that it almost seems like a human, but it is just guessing words.
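The 20-vs-10 idea above can be sketched with simple word counts. This is only a toy illustration (a real LLM uses a neural network over huge corpora, not raw counts), with the rhino sentences made up to match the example:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows each pair of words
# in the "training data", then predict the most frequent continuation.
sentences = (["what is a rhino ? the rhino is green"] * 20 +
             ["what is a rhino ? the rhino is hungry"] * 10)

follows = defaultdict(Counter)
for s in sentences:
    words = s.split()
    for i in range(len(words) - 2):
        follows[(words[i], words[i + 1])][words[i + 2]] += 1

def next_word(w1, w2):
    # Pick the most probable next word given the two preceding words.
    return follows[(w1, w2)].most_common(1)[0][0]

print(next_word("rhino", "is"))  # "green" — seen 20 times vs hungry's 10
```

A real model does essentially this but with far longer contexts and learned probabilities instead of literal counts.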

One popular way to go about it is called RAG (retrieval-augmented generation): basically, you maintain your own knowledge base of any type of data and chat with an LLM about it. You can try nuclia.com.
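A minimal sketch of the RAG idea, using made-up field reports and plain keyword overlap in place of the embedding search and LLM call a real system would use: retrieve the most relevant document, then paste it into the prompt sent to the model.

```python
# Hypothetical daily reports standing in for the scientists' database.
reports = [
    "2021-03-04: Sumatran rhino sighted near the river, feeding on figs.",
    "2021-03-05: No sightings today; camera trap battery replaced.",
    "2021-03-06: Fresh rhino tracks found on the east trail.",
]

def retrieve(question, docs):
    # Score each document by how many words it shares with the question.
    # (A real RAG system would use vector embeddings instead.)
    q = set(question.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question):
    # Stuff the retrieved report into the prompt; the LLM then answers
    # from that context rather than from its training data alone.
    context = retrieve(question, reports)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

print(build_prompt("what was the rhino feeding on?"))
```

This is why RAG fits the rhino scenario: the model doesn't need to be retrained on years of reports, it just gets the relevant ones handed to it at question time.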