Profile: 21823843...
Thank you! LMK if you have any feature requests!
I'm going to be rolling out support for custom feeds soon and a few other neat things, like support for long-form notes.
I'm in, please bridge me. You are doing great work Alex, thank you!
Rudolph's Sleigh on a North Pole PCB
https://hackaday.com/2024/12/20/rudolphs-sleigh-on-a-north-pole-pcb/
Here's the video of it in action, it's awesome!
It depends on what you want to print. For me, I did FDM printing (the regular "melt plastic filament using a motorised hot end") for years and was never totally happy with the results or the process. Calibrating and maintaining the machines is a ton of work (some more than others of course). There's always a compromise between speed and quality, and even with the slowest/best settings you're always going to have visible layer lines and other small defects.
Last year I bought a resin printer (Elegoo Saturn 3 Ultra) and this has completely changed the hobby for me:
* Almost no maintenance. Once you've got it set up, it pretty much just works. There are some exceptions naturally, but in general they need a lot less babysitting than FDM machines.
* Quality is incredible. The detail can be absolutely stunning -- way beyond what even the best FDMs are capable of. I've printed stuff for people and they literally did not believe me that I printed them ('you just ordered this on Amazon bro, this is clearly injection molded').
* Much better options for materials. There's somewhat more variety in resins than in filaments, plus you can mix them: Want a red tinted plastic with a bit of flex to it that glows in the dark? No problem, it just needs some experimentation with the proportions.
* Speed. If you're just printing one thing then FDM is probably faster (though the Saturn 3 Ultra can actually go quite fast). BUT resin printers really shine when you're printing lots of things at the same time. Once you have your model perfected, suppose you want to print 10 copies. With an FDM printer that's roughly 10x the time. For resin, assuming you can fit all of them on the bed, it takes the same time as printing 1, since each layer has a constant exposure time.
There are some downsides to resin though:
* You need to buy and store extra devices, like washing and curing stations.
* My garage is now filled with dangerous chemicals and I had to install a ventilation system to expel toxic fumes (how toxic they are is actually debatable, but I'm playing it safe).
* I have to periodically drive to a sketchy Chinese warehouse to buy 99% pure isopropyl alcohol in bulk. And getting rid of the waste is complicated/annoying, so look into your options for chemical disposal. Don't even bother trying the water-washable resins; you'll just end up switching to alcohol eventually.
Anyway, I had fun with FDM for years but at this point I'm much much happier with resin printing. Everyone's situation and goals are going to be different, I just wanted to give you my perspective. You'll have a blast with whatever you choose. Good luck!
This is one thing I really wish you could do in C++. noexcept is... not this.
Should I block AI web crawlers on Oddbean?
On oddbean.com I see a *lot* of web crawling traffic from AI bots like GPTBot hoovering up nostr notes presumably for training purposes. I guess it's probably one of the easiest nostr sites to crawl since everything is rendered as plain HTML and they don't need to execute JS code to query relays.
To avoid wasting bandwidth I decided to use the following method to soft-block them (honour-system robots.txt): https://coryd.dev/posts/2024/go-ahead-and-block-ai-web-crawlers
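The honour-system part is just robots.txt entries along these lines (GPTBot being OpenAI's crawler user agent -- the full list has more bots like it):

User-agent: GPTBot
Disallow: /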
You could argue they're just wasting my resources and won't bring any visitors or benefit the nostr community in any way. On the other hand, I guess they can/will access this data in some other way, and maybe the world-at-large gets some modicum of benefit from better AI models (?).
Thoughts? #asknostr
Hey sorry just saw this. Yep like fiatjaf said, these are the index definitions:
https://github.com/hoytech/strfry/blob/master/golpe.yaml#L24-L47
Just below is the indexPrelude, which is the code that actually populates those indices.
The querying is mostly in the DBQuery file here: https://github.com/hoytech/strfry/blob/master/src/DBQuery.h
But it's not really well documented or obvious. In general, there are some rough heuristics for choosing which index to use, and then it does a scan of that index, only loading the actual event data if needed (sometimes it can satisfy the query from the index alone). Here are some basic docs in the README:
https://github.com/hoytech/strfry?tab=readme-ov-file#database
Happy to try to answer any specific questions if you have them.
So it looks like some Scuttlebutt folks have made a new protocol called PZP (previously PPPPP, which was a better name) and they even have a section in their docs about Nostr: https://pzp.wiki/guide/intro/#how-pzp-is-different-from-nostr
Aside from that, the protocol seems to be pretty much like SSB, but with per-device keys and no strict requirement for a key to follow a single chain of notes. I can only imagine this makes the implementation much harder.
Do you have opinions nostr:npub1wmr34t36fy03m8hvgl96zl3znndyzyaqhwmwdtshwmtkg03fetaqhjg240 nostr:npub16zsllwrkrwt5emz2805vhjewj6nsjrw0ge0latyrn2jv5gxf5k0q5l92l7?
> sig-chains - you know if you're missing content in PZP
You could do the same on nostr if anybody actually cared about this. Just include a "prev" tag with the ID of the previous message in the chain you're building. You could also have a "last" tag or something to indicate the chain is complete. Clients could render it as "message 3/9" or whatever, like people do manually on Twitter.
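Concretely, note 3 in such a chain would just carry a tag along the lines of the following (placeholder ID):

["prev", "<event id of note 2 in the chain>"]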
In fact, PZP's replication appears to *rely* on these hash chains, since it uses hash graph replication: https://codeberg.org/pzp/pzp-sync
I did some analysis of this sync method, and if you try to use it for non-linked data you can easily cause it to degrade into worst-case behaviour: https://github.com/hoytech/automerge-poison
IMO RBSR/negentropy is better, one of the reasons being you can sync arbitrary collections of (unlinked) data, for example all kind 0 notes, or all notes with a certain hash tag.
Yep looks like the same approach to me.
> It might help you find a better alternative by searching for his recommendation for C/C++ projects needing to do this
Thanks, I will look into this. Go has much better Unicode support than C++, which basically doesn't have any at all -- or rather, you need to pull in a library to do even basic things. This is why I'm doing the hack I mentioned above: I don't want to add a dependency on a library like ICU (and the hack is also very efficient).
OTOH, Perl has outstanding Unicode support. If you don't care about byte length, then you can simply pick out the first 100 grapheme clusters with a regexp like this: /^\X{0,100}/. This will handle the segmentation according to the TR-29 rules.
Although you're probably right for this use-case, in my experience byte-length limits are quite common and you have to deal with them somehow, ideally without causing weird artifacts. For example in nostr you have byte-size limits on note length, tag values, etc. Another tricky aspect is that theoretically grapheme clusters are unbounded in length. So a single "character" could take up gigabytes of encoded space -- worth keeping in mind due to the DoS risk.
Is there an easy way to truncate to a max byte-length using the Go standard library? This is the best answer I could find (after only 1 minute of searching though):
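Roughly, it amounts to something like this (a minimal sketch using only the standard unicode/utf8 package, decoding runes from the start):

package main

import (
    "fmt"
    "unicode/utf8"
)

// truncateBytes cuts s down to at most maxBytes bytes without splitting
// a UTF-8 code point. It decodes runes from the start of the string, so
// it costs an extra pass over the data.
func truncateBytes(s string, maxBytes int) string {
    if len(s) <= maxBytes {
        return s
    }
    i := 0
    for i < len(s) {
        _, size := utf8.DecodeRuneInString(s[i:])
        if i+size > maxBytes {
            break
        }
        i += size
    }
    return s[:i]
}

func main() {
    fmt.Println(truncateBytes("Nostrリレー", 10)) // prints "Nostrリ" (8 bytes)
}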
Note that as the answer says, this doesn't understand grapheme clusters, meaning that café can become cafe (depending on the normalisation form). Also it involves another pass over the data, which is redundant if the string is already known to be valid UTF-8.
This 3rd party package looks to be the rough equivalent of my Perl module: https://github.com/rivo/uniseg
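With it, grapheme-aware truncation under a byte budget might look something like this (just a sketch built on the package's grapheme iterator):

package main

import (
    "fmt"

    "github.com/rivo/uniseg"
)

// truncateGraphemes keeps only whole grapheme clusters while staying
// within maxBytes bytes of UTF-8.
func truncateGraphemes(s string, maxBytes int) string {
    g := uniseg.NewGraphemes(s)
    end := 0
    for g.Next() {
        _, to := g.Positions() // byte range of the current cluster
        if to > maxBytes {
            break
        }
        end = to
    }
    return s[:end]
}

func main() {
    // The decomposed "é" (e + combining acute) is kept or dropped as a unit,
    // so we never end up with a bare "e".
    fmt.Println(truncateGraphemes("cafe\u0301 society", 6))
}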
Truncating text is complicated.
Today I spent some time fixing some bugs on oddbean.com that I've been putting off for a while. Most just involved some uninteresting grunt work, but there's one that is a huge rabbit hole and, if you've never thought about it before, you may be surprised at how deep it goes.
On Oddbean, we only show the first ~100 characters of a nostr note and then cut it off ("truncate" it). This is all well and good, except some titles got an unexpected weird character at the end:
Nostr Advent Calendar 2024 の 11 日目の記事を書きました。 2024年のNostrリレ�…
Now, I'm no expert on Japanese script but I'm pretty sure that diamond question mark character is not supposed to be there. What gives?
The answer is that almost all text on the web is encoded in UTF-8, which is a multi-byte Unicode encoding. That means that these Japanese characters actually take up 3 bytes each, unlike ASCII characters which take up 1. Oddbean was taking the first 100 bytes and cutting the note off there. Unfortunately, that left an incomplete UTF-8 encoded code point, which the browser replaces with a special replacement character (U+FFFD, the diamond question mark).
OK, easy fix right? Just do substr() on the code-points (not the raw UTF-8 bytes). Sure, but that is quite inefficient, requiring a pass over the data. Fortunately there is a more efficient fix that relies on the fact that UTF-8 is a self-synchronising code, meaning you can always find the nearest code point boundary no matter where in the string you jump to. So that is what I did:
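The idea looks roughly like this (sketched here in Go for illustration -- not the actual Oddbean code):

package main

import (
    "fmt"
    "unicode/utf8"
)

// truncateBytesFast cuts s to at most maxBytes bytes, then backs up over
// any UTF-8 continuation bytes (0b10xxxxxx) so the cut lands on a code
// point boundary. Because UTF-8 is self-synchronising this takes at most
// 3 steps, no matter how long the string is.
func truncateBytesFast(s string, maxBytes int) string {
    if len(s) <= maxBytes {
        return s
    }
    cut := maxBytes
    for cut > 0 && !utf8.RuneStart(s[cut]) {
        cut--
    }
    return s[:cut]
}

func main() {
    fmt.Println(truncateBytesFast("2024年のNostrリレー", 10)) // no U+FFFD garbage at the end
}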
Problem solved right? Well, that depends on your definition of "solved". Notice above I've been referring to "code points" instead of characters? In many languages such as English we can pretty much get away with considering these the same. However in other scripts this is not the case.
Sometimes what we think of as a character can actually require multiple code-points. For example, the character 'â' can be represented as 'a' followed by a combining circumflex character. Most common characters such as â *also* have dedicated code-points, and which representation is used depends on the Unicode Normal Form. You may also have seen country flags built from two regional-indicator characters, or emoji variations such as skin tone -- it's the same principle. Cutting in the middle of such a sequence will cause truncation artifacts.
So rather than "character" (which is an imprecise notion), Unicode refers to Extended Grapheme Clusters, which correspond as closely as possible with what we think of as individual atoms of text. You can read more than you ever wanted to know about this here: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
Note that many languages need special consideration when cutting on graphemes (or indeed words, lines, etc). Korean Hangul script is especially interesting, having been designed rather than evolved like most writing systems -- in fact it's quite elegant!
So my hack for Oddbean doesn't do all this fancy grapheme truncation, and that's because I know if I tried I would end up in a seriously deep rabbit hole. I know because I have and I did! 10 years ago I published the following Perl module: https://metacpan.org/pod/Unicode::Truncate
I'm pretty proud of this yak shave, because of the implementation. I was able to adapt the regular expressions from Unicode TR29, compose them with a UTF-8 regular expression, and compile it all with the Ragel state machine compiler ( https://www.colm.net/open-source/ragel/ ). As a result, it can both validate UTF-8 and (correctly!) truncate in a single pass.
If you want (a lot) more Unicode trivia, I also made a presentation on this topic: https://hoytech.github.io/truncate-presentation/
Yeah this was a pretty interesting article. Reminds me how deep the hash-table rabbit hole can go!
If I had more time to play around with radios I'd snap this up.
25W output, built-in FT-8, open source, reasonable price.
Interesting perspective. We could debate what decentralised means, but I doubt we'd ever be able to find a universal definition -- it is too much of a spectrum. If your definition of decentralised is that there are no servers at all, then I guess you'd think only purely P2P protocols are, and not even nostr would qualify. From a total purity perspective, probably anything using DNS would be disqualified too.
My view on these protocols is as follows:
Usenet has essentially the same model as nostr. Yes, there are servers (relays), but people are free to choose which ones they use. They can post their messages to any of them, and those messages may get propagated to other servers. Each server can have its own message acceptance/forwarding policies, and choose which other servers to connect with.
IRC is also a decentralised network (the R stands for relay). An IRC network consists of many different servers relaying messages. Each server agrees roughly with the rules of the wider network, but is generally free to administer its server as it sees fit (including user bans, preventing relaying certain channels, etc). Sometimes server operators disagree, and this results in them leaving the network and establishing their own. That's why there are many different IRC networks, EFnet, IRCnet, DALnet, etc.
HTTP and email are both decentralised in the sense that you don't need to get anybody's permission to connect to the network, and there are no single points of failure.
> is there a historical business model, or models, that we could mimic?
Good question! Usenet and IRC were often hosted by ISPs, which users paid for indirectly with their internet subscriptions. People also paid for access to Usenet. DejaNews was a popular archive/provider service before it was bought by Google. On IRC people pay for service providers to keep them constantly connected (bouncers) and provide vanity DNS names. Also, people pay to run bots like Eggdrop to manage/moderate their channels.
For email and HTTP there are the obvious hosting and other service providers, but I suppose the biggest value they provide is the things built on top of them, which is maybe fitting for a general-purpose protocol.
nostr has no global source of truth, and that is a good thing
Out of interest, I follow the progress of a lot of other projects similar to nostr, and a couple links surfaced today:
BlueSky has a big "firehose" connection that streams all updates (new posts, reactions, etc) to subscribers. Unsurprisingly, this is difficult to process except on beefy servers with lots of bandwidth. So, one proposed solution is to strip out all that pesky cryptography (signatures, merkle tree data, etc): https://jazco.dev/2024/09/24/jetstream/
And over on Farcaster, keeping their hubs in sync is too difficult, so they want to make all posts globally sequenced, like a blockchain. The details are still being worked out, but I think it's safe to assume there will be a privileged global sequencer who decides on this ordering (and possibly which posts are included at all): https://github.com/farcasterxyz/protocol/discussions/193
In my opinion, both of these issues are symptoms of an underlying errant philosophy. These projects both want there to be a global source of truth: A single place you can go to guarantee you're seeing all the posts on a thread, from a particular user, etc. On BlueSky that is https://bluesky.app and on Farcaster that is https://warpcast.com .
Advocates of each of these projects of course would dispute this, pointing out that you could always self-host, or somehow avoid depending on their semi-official infrastructure, but the truth is that if you're not on bluesky.app or warpcast.com, you don't exist, and nobody cares that you don't exist.
nostr has eschewed the concept of global source of truth. You can't necessarily be sure you are seeing everything. Conversations may sometimes get fragmented, posts may disappear, and there may be the occasional bout of confusion and chaos. There is no official or semi-official nostr website, app, or relay, and this is a good thing. It means we are actually building a decentralised protocol, not just acting out decentralisation theatre, or pretending we'll get there eventually and that the ends justify the means.
Back when computers were primitive and professional data-centres didn't exist, it was impossible to build mega-apps like Twitter. Protocols had to be decentralised by default -- there was simply no other way. We can learn a lot by looking back to protocols of yesteryear, like Usenet and IRC, and still-popular protocols like email and HTTP. None of these assume global sources of truth, and they are stronger and better for it, as is nostr.
LMK if there's any bug on the strfry side and I will look into it!
I haven't heard about any, please drop by the telegram channel and let us know if you notice any!
The answer to this is surprisingly complicated.
TLS can optionally support compression which would most likely have universally worked for all wss:// connections. However, this was disabled in OpenSSL and other TLS libraries because of a critical information leakage that arises when secret and non-secret information are combined in the same compression context: https://blog.qualys.com/product-tech/2012/09/14/crime-information-leakage-attack-against-ssltls
HTTP-level compression does not apply to websockets (since its framing replaces/upgrades the HTTP framing) so instead compression is specified by the websocket RFCs. It is optional, so not all clients support this.
Websocket compression happens per message, and can use an empty window for each message, or can have a "sliding compression" window where messages are effectively compressed with previous messages. Some implementations will support both of those modes, some only one, and some neither. Even if an implementation supports compression, it may choose not to use it, and/or may use it only for particular messages (and not others). Furthermore, in the websocket compression handshake, bi-directional window sizes need to be negotiated and sometimes windows cannot be negotiated in one or both directions.
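For example, the negotiation in the HTTP upgrade headers looks roughly like this (parameter names are from RFC 7692; here the server declines to keep a sliding window on its side):

Client offer:  Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Server reply:  Sec-WebSocket-Extensions: permessage-deflate; server_no_context_takeover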
Almost all browser websocket clients support full compression with sliding windows in both directions, and so does strfry. The sliding window has a relatively large memory overhead per connection, so it can optionally be disabled. The compression ratios can be seen in the strfry logs.
Although strfry <> browser connections are almost always compressed both ways, different clients and relays have different levels of support and often can't negotiate optimal compression.
Hehe in fact my original attempt at sync in strfry was a protocol called "yesstr" using my Quadrable library: https://github.com/hoytech/quadrable
It used binary flatbuffers over websockets. This caused a lot of problems, since binary websocket messages require a distinct message type from text ones. Since nothing else in nostr uses these, lots of client libraries had trouble integrating with it. Using a structured binary format like CBOR or flatbuffers also means clients would have to pull in a (probably heavy) serialisation library.
The nice thing about the current approach is that any existing nostr client already must support websocket text messages containing JSON containing hex data.
> If you’re sending binary, base64 only inflates by 33% compared to hex strings’ 100%.
On purely random data, after compression hex is usually only ~10% bigger than base64. For example:
$ head -c 1000000 /dev/urandom > rand
$ alias hex='od -A n -t x1 | sed "s/ *//g"'
$ cat rand |hex|zstd -c|wc -c
1086970
$ cat rand |base64|zstd -c|wc -c
1018226
$ wcalc 1086970/1018226
= 1.06751
So only ~7% bigger in this case. When data is not purely random, hex often compresses *better* than base64. This is because hex preserves patterns on byte boundaries but base64 does not. For example, look at these two strings post-base64:
$ echo 'hello world' | base64
aGVsbG8gd29ybGQK
$ echo ' hello world' | base64
IGhlbGxvIHdvcmxkCg==
They have nothing in common. Compare to the hex encoded versions:
$ echo 'hello world' | hex
68656c6c6f20776f726c640a
$ echo ' hello world' | hex
2068656c6c6f20776f726c640a
The pattern is preserved, it is just shifted by 2 characters. This means that if "hello world" appears multiple times in the input, there may be two different patterns for it in Base64, but only one in hex (meaning hex effectively has a 2x larger compression dictionary).
Since negentropy is mostly (but not entirely) random data like hashes and fingerprints, it's probably a wash. However, hex is typically faster to encode/decode and furthermore is used for almost all other fields in the nostr protocol, so on the whole seems like the best choice.
> Personally, I’d prefer to see the message format specified explicitly as debuggable JSON, if feasible.
This is theoretically possible, but it would be very difficult to interpret/debug it anyway, and it would add a lot of bandwidth/CPU overhead.
Probably our telegram channel is best currently: https://t.me/strfry_users
Hopefully some day we'll migrate fully to nostr!