> tcp connections are cheap

I would disagree, but we could be on different scales. That's at least two extra OS handlers, at least 16k (usually 64k) of kernel-space memory (x2 for rx and tx), and often userspace (usually shared) buffers, per connection, per LB. L7 LBs generally multiplex TCP connections and dramatically cut down on memory consumption, though it can be a bit aggressive. For every websocket that's opened in my HTTP server, tuned to 16k, that's about 256k after the upgrade is established (because websockets are usually fully buffered); the HTTP buffers are freed only in userspace. For 5000 connections that's over 1.2G of committed system memory, minimum, just hanging about. I would expect software like nginx to be more optimized in comparison, but that still hogs LBs, and I've heard other sysadmins share their stories of websocket overhead.
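The back-of-envelope math above can be checked with a short sketch (the per-websocket figure and connection count are the assumptions stated in the comment, not measurements):

```python
# Assumed values from the comment above: a server tuned to 16k buffers
# ends up committing roughly 256 KiB per websocket after the upgrade.
BUF = 16 * 1024          # per-buffer size the server is tuned to
PER_WS = 256 * 1024      # approx. committed bytes per upgraded websocket
CONNS = 5000

total = CONNS * PER_WS
print(f"{total / 2**30:.2f} GiB committed for {CONNS} websockets")
# 5000 * 256 KiB ~= 1.22 GiB, matching the "over 1.2G" figure
```

Swapping in 64k kernel buffers instead of 16k makes the per-LB cost correspondingly worse, which is the scale argument being made here.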

L7 LBs multiplex these HTTP/1.1 connections, generally cutting down to double-digit upstream connections to service 5-6 digit ingress traffic.

I agree with the basic architecture, that is, a highly functional cache with a "popularity decay" model. I can't remember the scientific name.


Discussion

os handles*

popularity decay => least recently used (LRU)
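The suggestion above is the textbook LRU policy. A minimal sketch (my own illustration, using an `OrderedDict` for recency order):

```python
from collections import OrderedDict

# Minimal LRU cache: recently touched entries survive, and the least
# recently used entry is evicted once capacity is exceeded.
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)      # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Note that a single `get()` on a stale key promotes it to the front, evicting something fresher, which is exactly the failure mode raised in the reply below.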

Yes, but that's not a great model; LRU breaks down in pure form here. A single user pulling an old file forces fresher content out. TTLs have to be attached and respected by each tier: as files age their TTL decreases, which avoids polluting caches. There is an official name for this model, I just don't remember it.

frecency-based models are what you are looking for

Yeah, but there is still a more specific model I'm trying to remember the acronym for. It literally describes exactly how to label content with TTLs based on the frequency of the exact content being hosted. Something one of the big tech companies published. Basically, given that the site is hosting images to be shared on social media, it's known that the images will have the highest frequency immediately after being shared, then decay at a known rate. It can even get as accurate as saying, given an image of a puppy, you can apply TTLs that model how the image of the puppy should be cached.
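A hedged sketch of that idea: model the expected request rate as an exponential decay from the moment of sharing, then size the TTL to the predicted popularity. The per-class decay rates and peak rate below are made-up constants, not taken from any published system:

```python
import math

# Per-content-class decay rates (per hour) and the peak request rate
# right after sharing -- all illustrative assumptions.
DECAY = {"puppy": 0.05, "meme": 0.2}
PEAK_RATE = 100.0  # requests/sec immediately after sharing

def predicted_rate(kind: str, hours_since_share: float) -> float:
    return PEAK_RATE * math.exp(-DECAY[kind] * hours_since_share)

def ttl_seconds(kind: str, hours_since_share: float,
                max_ttl: int = 3600, min_ttl: int = 30) -> int:
    # TTL scales with predicted popularity relative to the peak.
    frac = predicted_rate(kind, hours_since_share) / PEAK_RATE
    return max(min_ttl, int(max_ttl * frac))
```

Content classes with slower decay ("puppy") keep long TTLs much further out than fast-burning classes ("meme"), which decay to the floor quickly.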

Substitute "puppy" here for some known constant, like: my users usually upload X content. In the case of devine, users will only be publishing short looping videos designed for user attention, which might frequently have images of cute animals...