Does anyone know what kind of attacks or malicious activity would cause high data-out? I’m getting slammed at the end of every month with crazy data-out charges; I can’t scale this way..

I’ve made the ā€˜View All’ files available to paid accounts only, and I compress most public files… The apps cache these images so the same user isn’t downloading them repeatedly…

Ideas? Added blank index.html files in open directories..
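(Side note: a minimal sketch of that blank-index.html trick, assuming the uploads sit on a normal filesystem you can walk; the root path is a placeholder.)

```python
import os

UPLOAD_ROOT = "/var/www/uploads"  # placeholder path

# Drop an empty index.html into any directory that lacks one, so a bare
# directory URL can't be used to list and crawl every file in it.
for dirpath, dirnames, filenames in os.walk(UPLOAD_ROOT):
    if "index.html" not in filenames:
        open(os.path.join(dirpath, "index.html"), "w").close()
```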

Not asking for donations, just any thoughts on how to prevent what appears to be data-out leakage.


Discussion

For now I’m blocking access to most of the old files from before 3/3/23, until next month….

#[1]​ no!

šŸ˜†

I šŸ’œ your service. āš”ļø will happen.

I’m ok without zaps for now, too kind..

More for the devs to think about what might be happening.

DM me and maybe let’s set up a call so I can help you troubleshoot this madness! 🐶🐾🫔

šŸ’œ

hmm. are you able to id the source of the spikes at all, either by ip or domain? i’d guess rogue crawlers but hard to say

Not yet, but I will start looking at IP addresses.
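(If the provider lets you export access logs in combined log format, a rough sketch of ranking IPs by bytes served might look like this; the log path is a placeholder.)

```python
import re
from collections import Counter

# Combined log format: ip - - [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')

bytes_by_ip = Counter()
with open("access.log") as f:          # placeholder log export
    for line in f:
        m = LOG_LINE.match(line)
        if m and m.group(2) != "-":
            bytes_by_ip[m.group(1)] += int(m.group(2))

# The top handful of IPs usually makes rogue crawlers obvious.
for ip, total in bytes_by_ip.most_common(20):
    print(f"{ip}\t{total / 1e9:.2f} GB")
```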

I use Nostr.build every day. šŸ™

So you are causing the spikes?!? 🤣

Gulp! 😜

it just appears things are growing.

part of the problem with object storage and "cloud compute" is you can't really access real server logs like you can with apache and a traditional environment.

such clues mean you don't have to guess what is actually drawing the bandwidth, or from where.....

ps this is #[2] via an alt.

It is possible I just need to be in a lot higher tier for data out..

Is 9TB out a month crazy, or about right for something like nostr.build you think?

well I pay for 10tb out...

but no, if you do the rough figuring it kind of works like this:


- user uploads image.

- 10-30 relays pull that image and store it.

so each jpg or png stored is roughly going to inflate 1,000-3,000%.
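(Back-of-the-envelope version of that figuring, with made-up numbers; only the multiplier matters, not the exact figures.)

```python
# All numbers below are hypothetical, just to show the shape of the problem.
avg_image_mb  = 0.5       # average size after compression
uploads_month = 300_000   # images uploaded per month
fetches_each  = 25        # clients/relays/viewers that end up pulling each image

data_in_tb  = avg_image_mb * uploads_month / 1e6
data_out_tb = data_in_tb * fetches_each

print(f"data in:  {data_in_tb:.2f} TB/month")
print(f"data out: {data_out_tb:.2f} TB/month  ({fetches_each * 100}% of uploads)")
```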

if you start running ImageMagick on your front-end, if you aren't doing that already, that would be a great help.
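(Something like this, assuming ImageMagick is installed and you shell out from the upload handler; the dimensions and quality are assumptions, and on ImageMagick 7 the binary is `magick` rather than `convert`.)

```python
import subprocess

def shrink(src: str, dst: str, max_px: int = 1600, quality: int = 82) -> None:
    """Downscale only if larger than max_px (the '>' flag) and recompress."""
    subprocess.run(
        ["convert", src,
         "-resize", f"{max_px}x{max_px}>",
         "-strip",                      # drop EXIF and other metadata
         "-quality", str(quality),
         dst],
        check=True,
    )

shrink("upload.jpg", "public/upload.jpg")   # placeholder file names
```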

but overall, the protocol is broken in the way it's designed,

and what will happen eventually is the only people that will be able to actually run relays successfully when nostr scales are literally big data operations with millions of dollars...

i am almost 95% certain this is inherently by design.

there's actually nothing great about the "decentralized architecture," as it is currently posited.

the goal of "publish to the web" in a social sense isn't really attainable; there is no logical reason the entire internet needs a permanent record of what someone publishes. ever.

the http protocol did this already fairly well, but then the last 15 years of "smart" encroachment and lack of technical folks making decent tools for people to create basic websites, made this all fall flat.

we need another thing like geocities, imho.

it's a much better way to share things than real-time posting and fragmented attention spans.

--- fragmented attention sharing focuses on risk/reward/ego stroking, versus supplication and sharing of personal identity, goals, morality and ethics, and is short lived, and highly temporal.

anyway just some thoughts..

okay so i have a blog right? sometimes when i post a link on nostr to a blog article, i get over 1,300 hits to that article. i don't know if it's pulling all the media (could look at apache logs but haven't gotten to that drill-down just yet)

oh .. the thing i forgot was:

---with fragmented sharing the throughput from a user is much less personal and much less meaningful and memorable. and of much lower quality, VS a person designing a webpage and talking about their actual passions and interests and taking the time to think about it.

versus the dynamic of --nostr-- which is just like going to a bar, and vying for attention, people tend to lose focus on themselves and focus on impressing others.

which imho is stupid. (it also results in much, much higher web traffic.) and this last point is something that people really don't grasp and cannot be expected to grasp unless they are educated about network topology, albeit in a simple way.

I really appreciate your critical thinking and open expression; what you are talking about is a matter we definitely have to face.

I agree with some of your concerns. Currently I'm more interested in long-form content; it could be the hidden gem that boosts Nostr as a protocol (not as a platform) to support a new flourishing of small personal websites.

However, assets/media hosting still stands out as a critical point.

thank you Daniele. I'm going to reply to and follow you from my main account: ringo@nostr-check.com (where this will come from, although I began composing this in iris.)

as far as the long form content, I think that writefreely *almost* nails it.

I run a blog server that could support thousands of users, but nobody ever asks me to start a blog.

I *love* writing, though, and have been in the practice of keeping online blogs for the last 22 years, and on paper more or less forever.

---

I think what's happening is multifold, but people have largely lost the ability to communicate outside of an image... which is rather sad because if you think about it, the image isn't actually communicating what that person is thinking, and MAAAYBE if we're lucky only a fraction of what they are feeling.

That said, I'm not really certain how much thinking is actually going on in most people's heads, but out of my nature and politeness both, I hesitate with trepidation to call them stupid, or ignorant. What I really do wonder, though, is how effective THEY feel a gif or an image is, insofar as communicating their feelings, emotions and thoughts. I've thus far never gotten to the bottom of this one.

---

When you say "long form content," what kind of things are you seeing or would like to see here?

How would a website "run nostr?" I could hash out my conceptualization of that, but I'd rather hear what someone else thinks, so asking in earnest curiosity. Pros, Cons, overall architecture, or whatever you feel coming to mind that's both salient and universally relevant.

Every communication medium is necessarily an abstraction of a thought; images sit at a higher level of abstraction, and this can increase the gap and lower the signal. Then I guess AI-generated visuals, beyond often promoting user laziness, inject elements that are not contained in the original idea, and this makes the transmission more complex.

Nostr puts together a protocol and a simple open format; this makes it possible to create and update a personal self-hosted website, on a custom domain, using Nostr as a storage backend, promoting a "content first" philosophy while letting the user also express themselves through their preferred visual presentation (layout, colors, typography, etc.) under their own brand name without locking into a specific framework. This anti-lock-in attitude goes to the point that someone else can in the meantime also access, save and broadcast the same content using a different client; so the content - if valuable - can actually survive not only censorship but also backup errors, the madness of the NICs, and in the end the wishes of the user.

Back to images: I think images should be bundled with the content, and since *we're going to pay for the hosted data* (otherwise we'll be the product, right?) a natural selection will emerge as to what a user will decide to publish and keep and what to mark as ephemeral (e.g. using an external URL or a new kind). I'm not at all concerned about a 200KB *whole signed* article incorporating 2-3 useful photos or an interesting 10KB SVG graphic.

Would you like to store a 1MB meme GIF forever? Pay for it, and I will decide whether to pay my bandwidth to download it.

I love words, but I think it's wrong to divide textual and visual content with the aim of making Nostr "more efficient"; this implies that we consider visuals expendable garbage, and therefore users will produce expendable garbage. Ask for value to host a (visual) content and it will express value, as PoW teaches.

Just random thoughts, lots to think about and polish. Maybe I'll write something about this when I have a clear vision.

Ok, good to know.

So you think it could come down to how the nostr protocol is designed, having to serve all the clients and their multiple uses.. A few people offered to check things out, let’s see how we can optimize..

Using ImageMagick for a lot of things, not everything.

Absolute worst case, I delete or take offline the old stuff.

I think it's a function of how many users there are (and how many relays each of those users publishes and reads events to), because each read and publish is equivalent to an HTTP fetch, in your case of a nostr.build URL.

if even one person on one of those relays downloads the image, that's a successful HTTP 200 OK. this isn't taking into account people who aren't following the user and just have the image display in global as they browse, which is also a fetch.

there's a lot of overhead that's inherently unnecessary, because of the way the protocol is designed.

it's really not that efficient.

actually come to think of it, it'd have been a lot easier to just build something like a lighttpd server, where each person "builds a webpage" on their device and then posts go out, and you can control whether or not people who are your contacts can see the thing, OR if you want to broadcast to the whole web.

In fact I'm surprised it wasn't done like this in the first place.

for some reason half of your reply didn't show up earlier in iris.

but i'm seeing it here in more-speech.

--

Yes. I think it's due to the topology of nostr itself, having to, as you said, serve all the clients, with all the various requests they make.

This should be interesting.

Meanwhile, I wonder how many people realize that basically we're all doing work for free..

If the topology and design is the problem, I will likely have to hide older files from free users, create a model like ā€˜only paying customers can see images older than 1 week’…
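(The rule itself would be tiny; a sketch, with the one-week cutoff and the paid/free split as assumptions.)

```python
from datetime import datetime, timedelta, timezone

FREE_WINDOW = timedelta(weeks=1)   # assumed cutoff for free accounts

def can_view(uploaded_at: datetime, uploader_is_paid: bool) -> bool:
    """Paid uploads stay public forever; free uploads go dark after a week."""
    if uploader_is_paid:
        return True
    return datetime.now(timezone.utc) - uploaded_at <= FREE_WINDOW
```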

This would be fine imo. The past is dust and the dust is going to get pretty thick at yours if nostr scales. I really appreciate being able to post images but I don’t treasure them or expect you to hold them forever. If I did, I would certainly expect to pay for the service.

I don’t want to do that, I’m sure we can find a solution..

there may be a way around that, but that may likely be useful to leave on the table, for now.

imho there are ways to incentivize community interaction towards keeping your project afloat, and i'm curious to explore this because i'm wondering what similar kind of model i can employ with my blog.. which gets thousands of hits and no donations, and which i pay for out of pocket each month.

@iefan had some sort of paywall thing he was working on, and i'd be curious to see what he thinks about all this, too.

it's unfortunate that we live in a world of money, when most of us frankly hate money and only see it as a tool we never asked for, which is all that some people understand in terms of human interaction, having lost the magic of life, giving, gifting, creating, and loving every moment of it.

i'll leave this here for now..

Can you explain why each image has to be downloaded by 10-30 relays per user?

As far as I understand, the image URL is referenced within an event. That does not necessarily mean a relay would have to fetch it, right?

Since the user first fetches the events from relays, and the client then makes a separate request to the image server.

If for any reason all relays would need to fetch images (let's say to verify the image content with a hash sum or whatever), this may also be solved in another way.

Also, another thought that came to my mind: to serve images that would cause high traffic, let's assume we know the user posting it has a lot of followers. The image is uploaded to the image server. The relay recognizes this image as a very important image (VII) and publishes a specific event that lists the clients that have downloaded this image within the last 20 minutes. Using this event, clients could request the image from other clients instead of getting it from the image server. It is like a client-only peer-to-peer image CDN. I am not sure though how this could be implemented so that clients could be found. Maybe with ephemeral events and web workers.
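(To make that concrete, a purely hypothetical shape for such an event; the kind number, tag names and the holder entries are invented for illustration, not any real NIP.)

```python
import json, time

now = int(time.time())

# Hypothetical ephemeral "very important image" event listing recent holders.
vii_event = {
    "kind": 24242,                                        # invented kind number
    "created_at": now,
    "tags": [
        ["url", "https://nostr.build/i/example.jpg"],     # placeholder URL
        ["x", "<sha256-of-the-image>"],                   # lets peers verify the bytes
        ["holder", "<client-peer-address-1>", str(now - 300)],
        ["holder", "<client-peer-address-2>", str(now - 900)],
    ],
    "content": "",
}
print(json.dumps(vii_event, indent=2))
```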

>From: cd9ea7c...<-8eb09... at 03/26/23 23:15:02 on wss://relay.damus.io

>---------------

>Can you explain why each image has to be downloaded by 10-30 relays per user?

>

((without getting too complicated, it doesn't mean that, necessarily.)) not sure how many relays most people use.

>As far as I understand, the image URL is referenced within an event. That does not necessarily mean a relay would have to fetch it, right?

>

(( referenced within an event, yes. and correct, the relay doesn't necessarily have to fetch the jpg. )) but the moment someone on global, or on that relay, OR on that person's followers list fetches the event, that jpg is suddenly being pulled from the media storage server, wherever that is, by the users accessing it via that event on that relay, if this makes sense.

>Since the user first fetches the events from relays, and the client then makes a separate request to the image server.

>

>If for any reason all relays would need to fetch images (let's say to verify the image content with a hash sum or whatever), this may also be solved in another way.

>

((this is interesting and sounds convoluted, but i'm curious what you're proposing here, and how it would be beneficial..))

>Also, another thought that came to my mind: to serve images that would cause high traffic, let's assume we know the user posting it has a lot of followers. The image is uploaded to the image server. The relay recognizes this image as a very important image (VII) and publishes a specific event that lists the clients that have downloaded this image within the last 20 minutes. Using this event, clients could request the image from other clients instead of getting it from the image server. It is like a client-only peer-to-peer image CDN. I am not sure though how this could be implemented so that clients could be found. Maybe with ephemeral events and web workers.

(( this is actually very interesting. )) I like this a lot.

what I've yet to see is some sort of procedural regularity with the relays that i'm using (about 40 of them), wherein I can predict a pattern such that the above would actually be applicable in a relevant way, without turning into the dreaded "this torrent started but has no seeds" scenario.

furthermore, on your last point: that would mean that other clients also have knowledge of an uploaded image and where it is, making clients more of a search-engine back end (but then how would they serve and validate metadata requests for an image or media resource, or an mp4 for example?). the discovery portion of this is a bit vexing...

I wonder what thor@tigerville.no thinks of all this.

all of you are welcome to come discuss this on my irc server, if you'd like. there are already a few tech folks that idle there if you need some extra brains to pick.

2x2chat.com #2x2

check your IRC )

>From: cd9ea7c...<-8eb09... at 03/27/23 12:11:08 on wss://relay.damus.io

>---------------

>check your IRC )

i see that you came by! :) yeah feel free to relaunch and use something you like inside a VPS.. :) irc is like long form pingpong or chess by mail, works best when people are idling at the post office. =D

the client that i'm using to quote is "more-speech."

fyi: someone implemented a cdn

https://github.com/lovvtide/nostr-torrent

> 200 MB / minute on average.

How does that compare to your dataset?

Off hand it seems high, but maybe things aren't caching.

>From: 21b4191...<-nostr... at 03/26/23 22:19:01 on wss://relay.damus.io

>---------------

>> 200 MB / minute on average.

>How does that compare to your dataset?

>Off hand it seems high, but maybe things aren't caching.

that is a RIDICULOUS amount of data.

but then again, i've never ran a professional CDN.

I’m not even using a CDN technically; when I turned that on I racked up $100 in 4 days of CDN overages…

last month, for example I only pulled 428 gigs.

Pulled from what, which site?

from dailymessenger.is and writehere.is

It is something like 7-9TB out a month; I think that is too high, and it seems to get a lot higher each month….

You’re doing great work for the community, I appreciate you šŸ’œ

Already have a few brilliant people helping.. not worried at all.

Your post is generating a lot of interest.

Added to the https://member.cash/hot feed

The hosting service has a revenue growth target to hit at the end of each month 🤣

Reminds me of a story from Google's early days. They got a fiber connection across the US but used it only a few hours/days a month due to its fee structure. It was bursts like those 😁

Might it be valid traffic? Caching helps with performance but not with data out?

It would help to see the access pattern. And maybe cross-reference that back to the notes and pubkeys that reference them.
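(A sketch of that cross-referencing, assuming a relay that supports NIP-50 search, which many don't, and the `websockets` package; the relay and image URLs are placeholders.)

```python
import asyncio, json
import websockets   # pip install websockets

async def who_references(image_url: str, relay: str = "wss://relay.nostr.band"):
    """Ask one search-capable relay which notes (and pubkeys) contain the URL."""
    async with websockets.connect(relay) as ws:
        await ws.send(json.dumps(
            ["REQ", "img-search", {"kinds": [1], "search": image_url, "limit": 50}]))
        pubkeys = set()
        while True:
            msg = json.loads(await ws.recv())
            if msg[0] == "EVENT":
                pubkeys.add(msg[2]["pubkey"])
            elif msg[0] == "EOSE":
                break
        return pubkeys

print(asyncio.run(who_references("https://nostr.build/i/example.jpg")))
```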

Might actually be good to investigate this on a grander scale; that way you can actually bill people properly if they cause a lot of traffic.

There should be a price difference between someone hosting an image with 100 followers, and someone hosting videos with 50k followers.

Starting to investigate and test now..

It’s hard to believe it is related to regular user activity; even for influencers, 9TB out is a lot!

All of the green skull memes

Log analysis, of course, can help.

In the meantime, I would try applying throttling based on IP/URL to see whether a client is DoSing the service without knowing it, or whether there is a bad actor around.
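(A minimal in-process sketch of that kind of throttle, as a sliding-window limit keyed by (IP, URL); the limits are assumptions, and in practice this would more likely live in nginx or at the provider's edge.)

```python
import time
from collections import defaultdict

RATE   = 30     # assumed: requests allowed per key per window
WINDOW = 60.0   # seconds

_hits: dict[tuple[str, str], list[float]] = defaultdict(list)

def allow(ip: str, url: str) -> bool:
    """Return False once an (IP, URL) pair exceeds RATE requests per WINDOW."""
    now = time.monotonic()
    key = (ip, url)
    _hits[key] = [t for t in _hits[key] if now - t < WINDOW]
    if len(_hits[key]) >= RATE:
        return False
    _hits[key].append(now)
    return True
```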