lmfao what needs an update? But yeah pretty much. Running local models, I've been very impressed with Gemma. Not sure what you're trying to do with it. I have a theory that we're using AI wrong. It can't do things for us, but man can it help us understand how to do things better and faster.
Claude and Cursor.
It's good with syntax and with suggesting libraries to include, etc., but I have to spoon-feed it so much "no shit, Sherlock" algorithmic and pattern-recognition stuff that I get totally irritated.
But I might just be underestimating how hard that is to do, since I'm bizarrely good at it.
Funny story,
I get really lazy, sometimes, and let it type for me, and I was like, make sure to also include the Latin. So, it starts translating the entire English liturgy into Latin and I'm like, yo, just include the "Latin" folder. 🙄
XD
I think this is a very natural phenomenon between AI and its users. It is easy for us to get so concerned with a goal, or a result, that we forget our role in orchestrating the AI to accomplish said goal.
Your key word was lazy. I think it is a very natural progression for users of AI. It is not that we are lazy people, but that we become so comfortable with the process that we stop trying to control it with an iron hand.
LLMs are perfect servants, but the worst masters.
nostr:nprofile1qqsdulkdrc5hdf4dktl6taxmsxnasykghdnf32sqmnc7w6km2hhav3gpzpmhxue69uhkummnw3ezumt0d5hsz9thwden5te0wfjkccte9ehx7um5wghxyee0qy2hwumn8ghj7un9d3shjtnyv9kh2uewd9hj7x7tcfu posted a decent cheat sheet which I think is a helpful reminder of how one should form prompts, though everyone has very different goals.
This is actually an incredibly expensive way to develop, tho. It's faster because you front-load all the computation, but that means all the "thinking" has to be paid for in advance, and you can't lower that price tag by doing some of the thinking yourself.
If I hadn't intervened and hit the stop, it would have just gone on to translate the entire Latin Bible, all the Saints histories, etc. and then hit me with the bill.
Also, it's almost always the case that the solutions offered are less efficient than what I would have built, and bug-fixes just pile more code on top of the bugs, rather than correcting anything.
Have to watch it like a hawk and reject every second or third suggestion because they're bullshit.
dirty secret of AI business: 37% of outputs are hallucinations
that lines up with 2 or 3 wrong
it's way too lossy haha, anyone who believes the marketing doesn't read any of the critical press
i saw that 37% figure in technical reports on OpenAI a week or two ago
the same goes for image generation, almost always overlapping stuff, absurd context, misshapen limbs and fingers
if this is the AI that is going to bring us skynet it's 37% vulnerable to making bad decisions
You both raise good arguments that I don't disagree with. I have always favored LLMs for less intensive tasks than development, though agentic VS Code is the beginning of agentic architectural development. Manus seems to be spinning off into the "less architectural, more functional analytics" use case, to revitalize what I believe is best about LLMs.
I would argue that nostr:nprofile1qqsyeqqz27jc32pgf8gynqtu90d2mxztykj94k0kmttxu37nk3lrktcprdmhxue69uhhg6r9vehhyetnwshxummnw3erztnrdakj7qg3waehxw309ahx7um5wghxcctwvshsz9thwden5te0w3jhxapwd4kx26m49ejx2a30j09tjd's hallucination statistic applies to code and development more than to simple data analytics or other tasks.
For image generation there are a lot of simple and complex processes that can improve outputs significantly, but the cost (time) is the limitation. How complex a workflow is necessary to achieve the finished result? How many similar images or themes will be recycled, and how many times? In simple projects you can use public models and LoRAs to achieve plenty of good results.
With LLMs we are just breaking into MCP workflows, which will continue to enable other apps with less demand on the models themselves. GenAI is growing in efficiency while simultaneously blowing its load to fundraise.
All of this is to say: yes, developing with AI is slop and garbage; yes, it's getting better; and no, it's not going to take over the world. Ultimately it comes down to your goals and the cost vs. complexity of workflows. There is a time and place for all forms of AI, but it only shines when the conditions are satisfactory.
Mleku only writes hardcore stuff, and the AI just sort of throws up on his monitor. I get it.
Get more mileage out of programming a web widget, but who needs even more web widgets?
Well, it'll keep getting better at the pictures, but always double-check anything it does, because it guesses and makes stuff up. It calls functions that the library doesn't have, or finds bugs that don't even exist and then adds 20 lines of code and 10 debugging logs to "correct" them.
Mine tried to add a monospace font that looks exactly like the standard ones and bloated up the size of the app.
That's the sort of stuff that vibe-coders can't see and just "approve". That's why they're actually quite slow developing with AI, as it's faster to hit the brakes and redirect to something more promising, but then you have to be able to code and think algorithmically.
Another example: it tried to make up links to the pages to scrape wholesale, by taking a selection of pages like https://divineoffice.org/easter-w02-thu-mp/?date=20250501 and just randomly making up shit for the middle section.
I had to explain that it should do a first pass on the main date page (as dates are a standard calendar object and not a fluctuating liturgical object), https://divineoffice.org/?date=20250502 or whatever, record the sub-page links (they change from year to year and season to season), and then do a second pass and scrape those links. Then you only scrape pages that exist, and only with accurate links. That also gets around the fact that they change the content on the fly to introduce special mourning days for the Pope and stuff.
Then I can just rescrape the links and update whichever ones break, in my thrice-daily Jenkins routine, so that the 30040 holds the content for the time frame selected. That way my stuff is always accurate and the links don't break. I don't have to look at the content for that, just the links, as major changes to the content result in different links.
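To make that concrete, here's a rough sketch of the two-pass idea in Go, just for illustration: the URLs are the ones above, but the href pattern and the parsing are made up, since the actual page markup isn't in this thread.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// Pulls sub-page links like /easter-w02-thu-mp/?date=20250501 out of the
// day's index page. Hypothetical pattern; the real markup may differ.
var hrefPattern = regexp.MustCompile(`href="(https://divineoffice\.org/[a-z0-9-]+/\?date=\d{8})"`)

func fetch(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// First pass: the date is a plain calendar object, so the index URL
	// can be constructed directly, no guessing about liturgical names.
	index, err := fetch("https://divineoffice.org/?date=20250502")
	if err != nil {
		panic(err)
	}

	// Record the sub-page links; they change year to year and season to season.
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(index, -1) {
		links = append(links, m[1])
	}

	// Second pass: scrape only links that were actually on the index page,
	// so every fetched page exists and every stored link is accurate.
	for _, link := range links {
		page, err := fetch(link)
		if err != nil {
			fmt.Println("broken link:", link, err) // candidate for re-scraping
			continue
		}
		_ = page // parse/store the content here
	}
}
```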
But, like, the AI couldn't figure that out. Seems completely obvious to me and it was so tedious writing it out that it's like, I coulda coded that faster.
And then the regex would have been right. How the heck do you use "psalm" to match both "Psalm 20" and "psalm-prayer"? Those are obviously two completely different things. It couldn't even find all of the headers, when I told it precisely what the HTML tags look like for the divisions.
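For what it's worth, the split is trivial to encode once you anchor on case and word boundaries. A toy sketch in Go; the surrounding HTML is invented, only the "Psalm 20" vs. "psalm-prayer" distinction is the point.

```go
package main

import (
	"fmt"
	"regexp"
)

// psalmHeading matches titles like "Psalm 20": capital P followed by a number.
// psalmPrayer matches the hyphenated "psalm-prayer" division marker.
var (
	psalmHeading = regexp.MustCompile(`\bPsalm \d+\b`)
	psalmPrayer  = regexp.MustCompile(`\bpsalm-prayer\b`)
)

func main() {
	// Hypothetical headers, just to show the two patterns never collide.
	headers := []string{
		`<h3 class="heading">Psalm 20</h3>`,
		`<h3 class="psalm-prayer">Psalm Prayer</h3>`,
	}
	for _, h := range headers {
		fmt.Printf("%-45s heading=%v prayer=%v\n",
			h, psalmHeading.MatchString(h), psalmPrayer.MatchString(h))
	}
}
```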
Just fighting stuff like that, all over, and so irritated.
yeah, the ML and AI features in intellij very often, like 2 out of 3 times, don't even get the types right, the autocomplete very often picks something completely irrelevant and only starts to get what i'm doing after i've done about 5 examples
intelligent, it is not
and like you say, planes are going to be falling out of the sky if this shit infiltrates mission critical software development
Definitely puts in question using standard libraries in critical systems, as you know that shit be vibe-coded.
i know for a fact that's impossible to do with the standard libraries of #golang. AI can't generate functioning code even one time out of 5; it would be faster to write it by hand
the simple, yet deceptively sophisticated language grammar is a huge roadblock for AI generating go code at any degree of complexity above scripting level
the way things are going, there's gonna be such a thin field of competitors for me in my skillset that in 2 years time i will walk into a $100/hour server dev job with knowing managers who have got completely sick and tired of guys who have no skills, and who reject 99 out of 100 applications
it's gonna be quite the turn-around for me, because when i first started my career as a tech support freelancer, i did that after applying for thousands of jobs and getting like one "sorry" reply per hundred. i expect within two years they are gonna be flooding my inbox with sugar-coated begging and huge pay packages for me, because i did the work the hard way and proved that you can't shortcut serious engineering
It's already like that, in testing.
Used to be that there were 3 to 5 competent people applying for every testing job and it was a real competition, back in the noughties. But, now, it's like... I applied to 5 positions, got interviewed in person for 3 within 2 weeks, and got offered this job the day after the interview.
The position is still open, for more hires, but he's received a pile of applications and hasn't contacted even one. I've seen some of the applications. The resumes are almost all AI-generated and full of absolutely bizarre garbage that doesn't even make any sense. And the rest of them can't speak German, to save their lives. Just garbled nonsense.
All of my coworkers are pretty good, but he has to weed through knee-high piles of crap to find them. Geez. Unfreakingbelievable.
And this is all critical systems stuff. This is not some little cellphone game app, or something.
It's the software used to design Autobahn bridges, conceive powertrain functions, route logistics parks, etc.
Scary as heck. They can't even math.
it's hard not to get the feeling we are in a slow motion trainwreck right now as a species
even if Trump manages to get a few things in order, the magnitude of the epic bullshit out there in the world could not be fixed by a human. It be God fixing time.
Agreed. Copilot isn't great with that on VS; on VS Code it seems to be more proactive about making suggestions. It's been really good with my spelling errors, which we all know I have. XD
Claude is good at analysis and surgical edits of existing code, in my opinion.
Grok is good for research/as a supplement to traditional web search.
The latest Gemini versions are good at generating large blocks of net-new code, especially when it follows well-defined patterns/is boilerplate.
I would agree, sort of. I'm probably just not using the tools everyone else is. I'm not really a fan of the walled-garden thing of Cursor. I just like my IDEs once I get settled in, and I don't think I need to change that. Most of the work I've been wanting to do is around chat/edits anyway. The inline stuff is only so good, so I don't need it.
Since I have pretty large mono-repos, I've been having a hard time with it being context-aware of such a large base. I want to be able to do large-breadth refactors. Not a lot of code to produce, but lots of little things. Like I changed the style of my function calls, or the way I use pointers, or using a qualifier, or updating all of my CMake files to the latest version, etc. The tools I have really struggle with working on that many files at once, and if I did it one file at a time, I'm still faster doing it by hand.
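For the purely mechanical passes, like the CMake one, a throwaway script still beats both the AI and doing it by hand. A rough sketch in Go, assuming the files are all named CMakeLists.txt; the target version is a made-up example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// Matches the version clause of cmake_minimum_required(VERSION x.y) so the
// whole tree can be bumped in one pass.
var minRequired = regexp.MustCompile(`(?i)cmake_minimum_required\s*\(\s*VERSION\s+[\d.]+`)

func main() {
	root := "." // repo root (assumption)
	replacement := []byte("cmake_minimum_required(VERSION 3.28") // hypothetical target
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil || d.IsDir() || d.Name() != "CMakeLists.txt" {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		updated := minRequired.ReplaceAll(data, replacement)
		if string(updated) != string(data) {
			fmt.Println("updated", path)
			return os.WriteFile(path, updated, 0o644)
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```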
I did just get Cline linked up to my own server, which is pretty neat. I've been tuning models and playing with new ones. I still need a 24 GB GPU to play with some big-boy models, but I've still been impressed with what I have now.
I just used Cursor to change my codebase to using a framework, which would have taken weeks, by hand. It worked, but now I need about a week to fix everything. Still faster.
Yeah, for things Claude knows (like popular frameworks with good docs) it seems pretty good, although I've been fighting it converting to Tailwind 4, and on another branch converting from Vue.js to Nuxt. It's pretty bad. If I feed it the half-assed docs and chat with it, it seems to be faster than me reading the docs and manually converting, but it still leaves a lot of work to do.
It sucks with low-level stuff, though. A finite state machine in C without using raw char pointers, for example: it can't do it. It can't follow any conventions, makes 500-line spaghetti code with poor naming, not readable, etc. All despite spending way too much time tuning it and trying different models.
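The convention I'd want in that case is a table-driven machine: one transition table, no nesting. A toy sketch, in Go rather than C just to keep it self-contained here; the states and events are invented.

```go
package main

import "fmt"

// A table-driven finite state machine: the whole control flow lives in one
// transition table instead of hundreds of lines of nested branches.
type State int
type Event int

const (
	Idle State = iota
	Running
	Done
)

const (
	Start Event = iota
	Finish
	Reset
)

// transitions maps (state, event) to the next state; anything missing
// from the table is an invalid transition.
var transitions = map[State]map[Event]State{
	Idle:    {Start: Running},
	Running: {Finish: Done, Reset: Idle},
	Done:    {Reset: Idle},
}

func step(s State, e Event) (State, error) {
	if next, ok := transitions[s][e]; ok {
		return next, nil
	}
	return s, fmt.Errorf("invalid event %d in state %d", e, s)
}

func main() {
	s := Idle
	for _, e := range []Event{Start, Finish, Reset} {
		var err error
		if s, err = step(s, e); err != nil {
			panic(err)
		}
		fmt.Println("state:", s)
	}
}
```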
You invented rewrites
It looks like the solution to this in Copilot is agents. I just had it pull all the SCSS out of my component files, rewrite it to CSS, remove the style tags, create a set of stylesheets, clean them up, create directories, make sure Tailwind was referenced correctly, then import them. It took like 10 minutes, but wow, that was impressive.