The same limitations apply… it’s still an LLM limited to a knowledge domain, acting as an approximation machine. We still need pathways for generating high-quality data for it to be trained on… if we use its “intelligence” as an excuse to stop producing that data, it will stop learning.
Yeah, I agree it should help keep duplicate questions off the platform. And it’s for sure in ChatGPT’s best interest that humans continue to produce high-quality data, ranked by quality even, for more effective training. I think the ultimate and best solution is one where both systems work together. A massive part of why ChatGPT is so good is that the curation of data on Stack Overflow is so good. Garbage in, garbage out, as the old adage goes.
Yeah, the more it has access to, the more capable it gets. Like giving it access to modify your Terraform to deploy infrastructure. I think it’s really compelling, but I also think there’s still a long way to go. Another ML engineer in a community I’m in put together a great piece on the current state of AI:
https://medium.com/@khalil.alami/the-boring-state-of-artificial-intelligence-152244075d7f
Data sharing negotiations for and by AI systems 🤣
For sure!! I’m not saying these things aren’t possible. But I don’t think they’re solved by an LLM alone.
Right! But it’s still relying on a human in this context. Maybe it breaks our current ecosystem by not storing that information in a publicly accessible place like Stack Overflow, making us dependent on the knowledge it hoards. But what you’re describing still requires missing data to be provided to it by a human.
The other thing to note… models that generate data and then train themselves on the data they generated tend to drift further and further away from meaningful content. If you had a model that wrote novels and eventually it was training only on the novels it wrote, they would all start to sound the same and converge. Part of why it is so compelling is that the data it is trained on was created by a large number of humans with different perspectives and goals. You’d have to create a lot of individual models with niche goals and perspectives to sustainably let it create indefinitely without new human content.
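To make that concrete, here’s a toy sketch (my own illustration, every name invented): a “model” that just resamples its own output each generation, with the output becoming the next training set. Diversity drops every round:

```python
# Toy sketch of model collapse: a "model" that trains on (i.e. resamples)
# its own generated output. All names here are invented for illustration.
import random

random.seed(0)
corpus = [f"idea-{i}" for i in range(1000)]  # 1000 distinct human "ideas"

for generation in range(10):
    # The model "generates" new data by drawing from what it learned,
    # and that output becomes the next generation's training set.
    corpus = [random.choice(corpus) for _ in range(len(corpus))]
    print(f"gen {generation}: {len(set(corpus))} distinct ideas left")
# The count of distinct ideas falls every generation; without fresh human
# input the corpus converges toward repeating the same few things.
```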
Depends on the language and/or support. Could be that someone figures it out through a conversation, a guess, checking the site-packages source code, or decompiling code. Or if it’s truly a closed-source project, via a client support rep. Ultimately someone posts about it somewhere, and then ChatGPT knows the new argument.
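For the “checking the source” route, here’s a quick sketch: Python will literally tell you the signature the installed version accepts (json is just a stand-in for whatever package changed):

```python
# How a human might "figure it out": ask the installed package directly.
# json is only a stand-in here for whatever library changed its API.
import inspect
import json

print(inspect.signature(json.dumps))
# Prints the arguments the installed version actually accepts,
# regardless of what older docs or Stack Overflow answers claim.
```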
Also - think about security bugs that are reported. It knows about them because they get reported on centralized platforms. Security pen testers find them. Right now, it’s simply a language model… it has no ability to execute pen testing. It relies on humans to produce content (release notes, open-source code, quality reports, or GitHub threads) to learn. It’s still very much piggybacking off of human-generated data. And not to say models won’t generate that data in the future… but it will be a series of models with different tasks that communicate with each other.
Ok, so let’s say there’s an update to a package that isn’t open source. I’m using the new version and can’t figure out why it’s breaking. It has no idea that the new version changed an argument name and the one I’m using is deprecated. Maybe it could, if the release notes are in a place it has access to and it can probabilistically relate them together. But if the information isn’t online (not conclusions from existing data points, but actually missing data points), it can’t solve that. That’s what a lot of Stack Overflow questions are. That’s why so many are so old… they get asked when the data points don’t exist in an accessible place. If you’re asking a question on Stack Overflow about data that already exists, you’re just going to be pointed to an older ticket.
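Something like this, say (package and argument names entirely made up; the stub plays the role of the closed-source client):

```python
# Hypothetical sketch: "v2" of a closed-source client renamed a keyword
# argument. This stub stands in for the real package; every name here
# is invented for illustration.

def connect(host, timeout=30):  # v2 signature: the argument is now 'timeout'
    print(f"connecting to {host} with a {timeout}s timeout")

# Code written against v1, where the argument was called 'timeout_secs':
try:
    connect(host="db.example.com", timeout_secs=30)
except TypeError as err:
    # This error is all the caller sees; if the rename was never posted
    # anywhere the model trained on, it can't connect error to cause.
    print(err)
```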
I don’t think it’s fair to say no one understands how they work. Not being able to interpret weights is not the same thing.
To give it access to do this it would need to 1. read all code for all projects and understand every implementation and version, and/or 2. have access to execute code to observe and troubleshoot. It’s an LLM… it’s just synthesizing data that exists and drawing conclusions and derivations from it. Impressive? Yes. Better than search? Yes. Has access to derived thought based on things that have happened but haven’t yet been put on the internet… no, not yet.
It can take two concepts and combine them… which is what I do when I’m searching Stack Overflow. Rarely does one post solve my problem. I read multiple and synthesize that data into a solution. To me, that’s not the same as answering net-new questions for new problems. And it is for sure a “not yet” problem, but we’re definitely not there yet.
We’ve had a lot of discussions about this. ChatGPT can’t really answer any engineering questions that haven’t previously been answered on Stack Overflow (or other sites, but I mean, it’s pretty much it). So it may pull some traffic from the lurkers who are just trying to get their questions answered. But if a question doesn’t already have an answer, someone will have to ask a human on Stack Overflow. Essentially, ChatGPT just offers better search over it, but doesn’t actually replace the recommendations for net-new problems.
Whoa, that sounds nifty 🐈 I’ll have to look into that
I’m sorry 😢 I hope it goes smoothly and recovery is swift!
Not exactly swing, but I used to do shag dancing in college. Purely for fun, socially, and I’m not very good 🥴 but it was always so much fun!
I’m super excited to see what comes out of the work you’re doing with health. Nostr as a solution for specific industries really has a lot of value. Thanks for all that you do for the community!