I imagine 98% of the data you serve is stuff that was uploaded in the past 48 hours?

Also I don’t think social media persistence has huge value to the user? It’s mostly downside.

The data-out spikes might be headless clients or some kind of automated scraper that someone is running? Exposure to this is also a vulnerability you have.

Discussion

Yeah, wondering the same about scrapers.

Are your media URLs going straight to S3, or do you have custom code proxying your S3 bucket? With the latter you could look at User-Agent headers and drop obvious bots. Not sure if CloudFront supports that kind of logic.
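A rough sketch of the header check in Python, in case it helps. The pattern list and the names `is_obvious_bot` and `status_for_request` are made up for illustration and are not from any particular framework; you would tune the patterns against your own access logs. User-Agent is trivially spoofed, so this only catches the lazy scrapers.

```python
import re
from typing import Mapping, Optional

# Hypothetical marker list; adjust after looking at real traffic.
BOT_UA_PATTERN = re.compile(
    r"bot|crawler|spider|scrapy|curl|wget|python-requests|headlesschrome",
    re.IGNORECASE,
)


def is_obvious_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent is missing, blank, or matches a bot marker."""
    if not user_agent or not user_agent.strip():
        return True
    return BOT_UA_PATTERN.search(user_agent) is not None


def status_for_request(headers: Mapping[str, str]) -> int:
    """Pick an HTTP status for a media request: 403 for obvious bots, else 200."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    user_agent = next(
        (value for name, value in headers.items() if name.lower() == "user-agent"),
        None,
    )
    return 403 if is_obvious_bot(user_agent) else 200
```

You would call `status_for_request` in the proxy before fetching the object from storage. Note that the `bot` pattern also matches legitimate crawlers such as search engines, so you may want an allowlist in front of it.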

Not on S3 yet, but we are doing just that: looking at headers and bots. Thank you.