
## Why I am sidelining DVMs for now

When I started to grow tired of random posts on nostr, I had a plan to create a DVM to tackle the problem of noise. I wanted a feed that caters to people like me, who are in it for the professional discussions and meaningful FOSS work.

I went with the LLM approach because simple hashtag parsing can only do so much. At first I thought I would fine-tune a BERT-like model, which should be easier to self-host. That turned out to be a harder goal than I expected.
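For the curious, the plan looked roughly like the sketch below; the base model, label set, and toy data are placeholders for illustration, not what I actually ran.

```python
# Rough sketch: fine-tune a small BERT-like classifier to label notes as
# "professional/FOSS" vs "noise". Model name, labels and data are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["Refactored the relay's event store, PRs welcome", "gm gm gm"]
labels = [1, 0]  # 1 = professional/FOSS, 0 = noise

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Tokenize the toy dataset; in practice this would be thousands of notes.
ds = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda x: tokenizer(x["text"], truncation=True,
                        padding="max_length", max_length=128))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds,
)
trainer.train()
```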

I wanted to use GPT-4o to help me generate data for the fine-tuning. Sadly, that was a bummer: it screwed me over so many times with garbage that I got exhausted.

Then I downloaded a bunch of high-rated StackExchange posts to be clustered and used for the fine-tuning.
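The clustering step itself is nothing fancy; a minimal sketch, assuming sentence embeddings plus k-means (the embedding model and cluster count here are illustrative):

```python
# Rough sketch: embed high-rated StackExchange posts and cluster them, so each
# cluster can be hand-labelled and reused as fine-tuning data. Placeholders only.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

posts = [
    "How do I statically link musl with Rust?",
    "Best practices for signing Nostr events in the browser?",
    "Why does my systemd unit restart in a loop?",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model
embeddings = embedder.encode(posts)

kmeans = KMeans(n_clusters=2, random_state=0)
cluster_ids = kmeans.fit_predict(embeddings)

for post, cid in zip(posts, cluster_ids):
    print(cid, post)
```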

This had already taken more time than I wanted, and I realized that for a PoC I might as well use the OpenAI API. So I did that and started experimenting with different prompts for GPT-4o mini.
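The PoC boiled down to one classification call per note, roughly like this; the prompt wording below is illustrative, not the exact one I iterated on:

```python
# Rough sketch of the PoC: ask gpt-4o-mini to label a single note.
# The prompt text is illustrative; the real one went through many revisions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_note(content: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Label the nostr note as 'professional' (FOSS, "
                        "engineering, meaningful technical discussion) or "
                        "'noise'. Answer with exactly one word."},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_note("Just released v0.3 of my relay implementation, feedback welcome"))
```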

In the meantime I got acquainted with the Python-based nostr-dvm framework and set up the basics I needed for a DVM service.

After some grind I got everything working, but I was still not very pleased with the classification results.

Now, I know I could have put in even more time to really nail that prompt, but I kinda lost my faith and appetite. I successfully use AI to learn, to generate rudimentary code, and to chat about ideas. But what I needed here was to generate a feed without manual intervention. AI people tend to recommend techniques for improving results that are borderline witchcraft. And every single time, GPT finds a way to hallucinate plenty of stuff I did not expect, no matter how hard I seem to try. This is my experience from months of daily interaction with GPT-4o too, which is much better than the mini version.

So no, I won't get trapped in the AI hype again. It is what it is: without human oversight these things are still worthless. I'm not aiming for a 90% usable thing, and I don't have a straightforward way to get to 100%. No one does, because these stochastic models are not AIs at all. They are mimicking parrots, nothing more. And this direction is a dead end if you ask me.

All in all, to customize a high-quality feed, I now tend to agree that something like #nostrscript from nostr:nprofile1qqsr9cvzwc652r4m83d86ykplrnm9dg5gwdvzzn8ameanlvut35wy3gpz3mhxue69uhhyetvv9ujuerpd46hxtnfduq3qamnwvaz7tmwdaehgu3wwa5kuegpzemhxue69uhhyetvv9ujuurjd9kkzmpwdejhglzevy3 seems to be the better way to go.

It has the human element but is enhanced with the right tech to be much more than just picking hashtags to follow.

We might see a bunch of better use cases for DVMs, but this is my overall sentiment right now.

I wonder if there are any examples of how it failed.

Zuckerberg is boasting about AI replacing devs in 2025. I see it this way: Zuckerberg spooks devs so it'll be easier for his HR department to insist on lower compensation, and spooked devs usually eat it because they're generally blinkered by too much work.

Hmm, there's a guy asking whether reaching out to former employees of a company on LinkedIn could help him decide on a job offer. I wonder if the information would be skewed: those employees left the company for a reason, and their opinions are certainly biased. Additionally, what is their incentive to respond seriously? You'll likely only get opinions from the work-unfocused, highly extroverted cohort, while the others will ignore it to save their precious time as highly paid professionals.

Every time you speed up clicking controls on a website and get slowed down by a CAPTCHA, realize that a properly written bot has already parsed the entire site today, along with thousands of others.

Every time you click the 'I'm not a robot' checkbox, realize that 100 bots have already passed through the page today and you're just performing an action in vain.

A guy made a thoughtful point about centralized systems: 'Don't ask why they were banned from the platform - ask yourself why banning was possible in the first place.'

Very suspicious of the way things are going with NIP-32; it looks like censorship practice is trying to spread to the Nostr network.
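For context, NIP-32 is the labeling NIP: kind 1985 events attach labels to other events or profiles, and clients or relays can then filter on them. A rough, illustrative sketch of such a label event, with placeholder values:

```python
# Illustrative only: the rough shape of a NIP-32 label event (kind 1985)
# that a moderation service could publish against someone's note.
label_event = {
    "kind": 1985,
    "tags": [
        ["L", "moderation"],                 # label namespace (assumed example)
        ["l", "spam", "moderation"],         # the label itself
        ["e", "<id-of-the-labelled-note>"],  # target event, placeholder id
    ],
    "content": "optional human-readable reason",
}
print(label_event)
```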

Just did a spot-check comparison of DALL-E 3 and YandexArt; the first one is far more flexible. #AI

"On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company’s e-commerce site was down. It looked to be some kind of distributed denial-of-service attack.

He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site.

“We have over 65,000 products, each product has a page,” Tomchuk told TechCrunch. “Each page has at least three photos.”

OpenAI was sending “tens of thousands” of server requests trying to download all of it, hundreds of thousands of photos, along with their detailed descriptions.

“OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it’s way more,” he said of the IP addresses the bot used to attempt to consume his site.

“Their crawlers were crushing our site,” he said. “It was basically a DDoS attack.”

Triplegangers’ website is its business. The seven-employee company has spent over a decade assembling what it calls the largest database of “human digital doubles” on the web, meaning 3D image files scanned from actual human models.

It sells the 3D object files, as well as photos — everything from hands to hair, skin, and full bodies — to 3D artists, video game makers, anyone who needs to digitally recreate authentic human characteristics."

https://techcrunch.com/2025/01/10/how-openais-bot-crushed-this-seven-person-companys-web-site-like-a-ddos-attack/

#CyberSecurity #AI #GenerativeAI #OpenAI #WebScraping #DDoS #AITraining

Looks like no CAPTCHA on the site