Interesting to hear these reactions still happen. My dad would spend a couple of months a year in rural Japan in the 1960s and 70s, setting up factories. He thought in many cases he was the first white man they had seen, and sometimes children would run away in fear at the unexpected sight of a strange giant walking along a lane. Back then, after losing so much weight on the first trip, he would take blocks of cheese and some tinned western food to sustain him. Now, even living outside London in the UK, there are stores with a choice of natto varieties.
Nostrasia was my first trip to Japan, and I only visited cities, so a very different experience - I absolutely loved the place, the people and the culture, and plan to come back to see more of the country. I wish I had visited while my dad was still alive. We still have a number of handmade gifts from people my dad worked with all those years ago.
One aspect of this compression approach that I've failed to describe properly so far is the potential role of the local LLM.
In an abstract sense, I'm thinking of LLMs as being stores of vast amounts of 'work' undertaken during their training, which can be hooked into as a shared reference point by both sender and receiver.
When the sender uses the local LLM to find a semantic equivalent of their message (which I see as reducing its entropy), parameters such as temperature could be set to make generation as deterministic as possible. Then, when the same LLM is run under the same parameters during decoding, I'm hoping that a small set of hints, such as the first word and the most frequently occurring words, can dramatically cut down the decoding search space.
I intend to experiment with combinations of these approaches for efficiently narrowing down the decode search space. It may be that different message sizes and content types benefit from different encoding types.
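As a purely hypothetical illustration of the hint idea, here's a minimal Python sketch. The candidate list stands in for alternative phrasings the shared LLM would generate deterministically (e.g. temperature 0) on both sides; only candidates that survive the cheap hint filter are actually hashed and compared. All names are mine, not part of any real system.

```python
# Hypothetical sketch of hint-based pruning of the decode search space.
import hashlib
from collections import Counter

def hints_for(message: str) -> dict:
    """Sender side: derive the small hint set sent alongside the hash."""
    words = message.lower().split()
    return {
        "first_word": words[0],
        "frequent": {w for w, _ in Counter(words).most_common(3)},
    }

def matches_hints(candidate: str, hints: dict) -> bool:
    """Receiver side: cheap pre-filter applied before any hashing."""
    words = candidate.lower().split()
    return (bool(words)
            and words[0] == hints["first_word"]
            and hints["frequent"] <= set(words))

def decode(target_hash: str, candidates: list[str], hints: dict):
    """Hash only the candidates that survive the hint filter."""
    for candidate in candidates:
        if matches_hints(candidate, hints):
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
                return candidate
    return None
```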
I'm no expert in any of this, just an enthusiast. It's likely that others could do this better, or have already done so but not applied it yet. Feedback welcome.
I finally found the clip from "Contact" (the film of Carl Sagan's book) that I have been looking for - https://youtu.be/yCkD5GOjvx8
# Semantic Ultra-compression for Low Bandwidth Communication
## Concept Overview
In any communication system, there is work required to transmit information. This work is distributed among three main components:
1. Sender
2. Transmission System
3. Receiver
Traditional high-bandwidth systems place most of the workload on maintaining the transmission infrastructure, with minimal effort required from senders and receivers. However, in very low-bandwidth scenarios (e.g., mesh radio networks), this approach becomes impracticable.
The Semantic Ultra-compression project aims to shift the workload from the transmission system to the senders and receivers, enabling effective communication over extremely limited bandwidth channels.
## Key Components
1. Shared Reference Dictionary
2. Encoding Process
3. Transmission
4. Decoding Process
## Process Flow
1. Pre-setup:
- All participants download a shared large language reference dictionary and the encoding/decoding application.
- This represents a significant upfront investment of time and resources by all users.
2. Encoding (Sender's work):
- Sender types a message
- App finds closest semantic matches in the reference dictionary
- Sender manually reviews and selects the best combinations to convey the intended meaning
- This process may involve multiple iterations and careful consideration by the sender
- References are hashed into a Merkle Tree (a minimal sketch of the full encode/decode loop appears after this process flow)
3. Transmission:
- Only the top Merkle Tree hash is transmitted
- This minimises the work required by the low-bandwidth transmission system
4. Decoding (Receiver's work):
- Receiver's app receives the Merkle Tree hash
- App performs intensive computational work to reconstruct the message:
* It systematically tries combinations from the shared reference dictionary
* Each combination is hashed and compared to the received Merkle Tree hash
* This process continues until an exact match is found
- The receiver may need to wait a significant amount of time for this process to complete
- Once a match is found, the app reconstructs the exact message approved by the sender
- The receiver reviews the decoded message to understand the sender's intended meaning
This flow emphasises that both sender and receiver invest significant time and computational resources in the communication process, offsetting the limitations of the low-bandwidth transmission system. The decoding process, while computationally intensive, results in an exact reconstruction of the sender's approved message, ensuring fidelity of communication.
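To make the conceptual flow concrete, here is a minimal Python sketch under simplifying assumptions I've made up for illustration: a shared dictionary indexed identically on both sides, a message expressed as a fixed-length list of reference IDs (the length is assumed known to the receiver), and a naive pairwise SHA-256 Merkle construction.

```python
# Minimal sketch of the conceptual encode/decode loop.
import hashlib
from itertools import product

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise up to a single root hash."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def encode(reference_ids: list[int]) -> bytes:
    """Sender: hash the chosen dictionary references into a single root."""
    return merkle_root([i.to_bytes(4, "big") for i in reference_ids])

def decode(root: bytes, dictionary_size: int, message_len: int):
    """Receiver: brute-force reference combinations until the root matches.
    Deliberately naive -- this exponential search is the cost the
    optimisation strategies below exist to reduce."""
    for combo in product(range(dictionary_size), repeat=message_len):
        if encode(list(combo)) == root:
            return combo
    return None
```

Even with a toy dictionary, the search space here grows exponentially with message length, which is exactly the inefficiency the optimisation strategies in the next section aim to reduce.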
## Benefits
1. Enables communication over extremely low-bandwidth networks
2. Preserves exact semantic meaning of messages as approved by the sender
3. Utilises participant resources instead of transmission infrastructure
4. Ensures perfect reconstruction of the original message, eliminating ambiguity
## Challenges
1. Requires significant pre-setup (dictionary download)
2. Encoding process requires manual effort from the sender
3. Decoding process is computationally intensive and potentially time-consuming
4. Balancing compression ratio with semantic accuracy during the encoding phase
5. Ensuring the shared dictionary is comprehensive enough for various communication needs
## Optimization Strategies
While the basic conceptual model demonstrates the core idea of shifting work from the transmission system to senders and receivers, it presents extreme inefficiencies, particularly in the decoding process. Here are some strategies to address these inefficiencies:
### Partner Merkle Tree Hash
The primary optimization involves the sender's app generating a partner Merkle Tree hash that accompanies the main message hash. This partner hash is designed to be low-work to resolve and contains information to drastically reduce the permutations needed to decode the main message hash.
#### Concept:
1. Divide the shared dictionary into multiple sections (e.g., 1024 sections).
2. These sections could be organised:
- Randomly (to distribute common phrases)
- By semantic association (grouping related concepts)
- By frequency of use (to optimise for common communications)
3. The encoding process tracks which dictionary sections were used.
4. A partner Merkle Tree is created from this section usage information.
5. Both the main message hash and the partner hash are transmitted.
#### Decoding Process:
1. The receiver's app first decodes the partner hash (low-work process).
2. This reveals which dictionary sections were used in encoding.
3. The app then only needs to check permutations within these identified sections to decode the main message hash.
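As a rough sketch of what the partner information might commit to, assuming the 1024-section layout above and a fixed, invented section size: the sender packs the set of used sections into a small bitmap (which the partner Merkle hash would represent), and the receiver recovers the section list to restrict its search.

```python
# Illustrative sketch of section tracking for the partner hash.
NUM_SECTIONS = 1024
SECTION_SIZE = 256          # assumed entries per section, for illustration

def section_of(reference_id: int) -> int:
    return reference_id // SECTION_SIZE

def section_bitmap(reference_ids: list[int]) -> bytes:
    """Sender: pack the set of used sections into a 1024-bit bitmap."""
    bitmap = bytearray(NUM_SECTIONS // 8)
    for ref in reference_ids:
        s = section_of(ref)
        bitmap[s // 8] |= 1 << (s % 8)
    return bytes(bitmap)

def used_sections(bitmap: bytes) -> list[int]:
    """Receiver: recover the sections to search, pruning everything else."""
    return [s for s in range(NUM_SECTIONS)
            if bitmap[s // 8] & (1 << (s % 8))]

def candidate_ids(bitmap: bytes) -> list[int]:
    """Expand the surviving sections into the reduced candidate ID set."""
    return [s * SECTION_SIZE + i
            for s in used_sections(bitmap)
            for i in range(SECTION_SIZE)]
```

If only k of the 1024 sections were used, the candidate pool per reference shrinks by a factor of roughly 1024/k, and the combination space for an n-reference message by (1024/k)^n.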
#### Benefits:
- Drastically reduces the number of permutations to check during decoding.
- Maintains the low-bandwidth transmission requirement.
- Preserves the security and privacy aspects of the hashing system.
### Additional Optimization Ideas:
1. **Adaptive Dictionary Sections**: Dynamically adjust section sizes based on usage patterns to further optimise common communications.
2. **Layered Hashing**: Use multiple layers of partner hashes for extremely large dictionaries, each layer providing more specific guidance for the decoding process.
3. **Semantic Tagging**: Include broad semantic categories in the partner hash to help guide the decoding process towards relevant dictionary sections more quickly.
4. **Frequency Hints**: Incorporate usage frequency data in the partner hash to prioritise checking of more commonly used phrases or words (see the sketch after this list).
5. **Error Correction**: Implement a simple error correction system in the partner hash to allow for some transmission errors without completely failing the decoding process.
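As one example, the frequency-hints idea (item 4) could be as simple as reordering the candidate enumeration. This hypothetical sketch assumes shared usage statistics exist on both sides; the worst-case cost is unchanged, but common messages would resolve quickly.

```python
# Hypothetical sketch of frequency-prioritised decode ordering.
from itertools import product

def ordered_candidates(frequencies: dict[int, int], message_len: int):
    """Yield reference-ID combinations, favouring frequent IDs in every
    position (a simple heuristic, not a strict joint-likelihood order)."""
    by_freq = sorted(frequencies, key=frequencies.get, reverse=True)
    yield from product(by_freq, repeat=message_len)
```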
## Potential Applications
1. NOSTR protocol over LoRa radio
2. Disaster communication scenarios
3. Censorship-resistant messaging systems
This concept leverages the motivation of specific user groups (e.g., NOSTR and mesh radio network users) who are willing to invest more effort in the communication process to overcome bandwidth limitations. The significant work required by both senders and receivers makes this approach suitable only for scenarios where traditional high-bandwidth communication is unavailable or undesirable. However, it guarantees exact reconstruction of the sender's intended message, making it reliable for critical communications in challenging environments.
I've started to work on a variation of this idea, on a hobby basis.
Seeing what's happening in the world, I think I need to step up my efforts.
If you tell them the truth, they may object to war.
https://www.rt.com/news/599387-french-general-ukraine-crisis-lessons-learned/
By default, ISPs where I am are required to block that link as it's Russia-based RT, ostensibly to keep us 'safe' from mis/disinformation. Actually, I don't know many people who take news stories from any source on trust anymore.
Once I got to the article, I saw it references western sources for its information anyway, and there's no attempt to block those. The attempted block just makes me more aware of the efforts to manage my perception.
Just noting an idea that dropped into my head as I experiment with Meshtastic over LoRa. I'm (almost certainly naively) thinking it could allow "compressed" message sizes small enough that consumption of relay storage is dramatically reduced, and that transmission could be supported even on low-bandwidth decentralised mesh networks over LoRa radio...
- Train a custom, single-purpose LLM for a type of Dictionary-based Redundancy Compression (DbRC), such that messages can be losslessly described by the minimal-size format of reference(s) to the model (base best-match reference plus modifier references to refine it to an exact match)
- Clients and Relays can download a copy of the DbRC LLM (the existing practicality of local voice-recognition data files on mobile makes me think this could be practical too)
- Each message is optionally passed as the reference(s) to the shared DbRC LLM
- Clients/relays without a local copy of the LLM can decompress (and compress, if this would ever be worthwhile?) via a service (assuming they have sufficient bandwidth)
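To make the reference-plus-modifiers format more concrete, here's a very rough sketch that uses plain string similarity (difflib) as a stand-in for the LLM's semantic matching; the dictionary contents and helper names are invented for illustration.

```python
# Very rough sketch of the DbRC idea: pick the closest dictionary entry
# as the base reference, then send a small patch of "modifier" edits that
# turn it into the exact message, losslessly.
import difflib

DICTIONARY = [
    "meet at the usual place at noon",
    "supplies arriving tomorrow morning",
]

def compress(message: str) -> tuple[int, list]:
    """Return (base reference ID, modifier patch) for a message."""
    base_id = max(range(len(DICTIONARY)),
                  key=lambda i: difflib.SequenceMatcher(
                      None, DICTIONARY[i], message).ratio())
    matcher = difflib.SequenceMatcher(None, DICTIONARY[base_id], message)
    patch = [(op, i1, i2, message[j1:j2])
             for op, i1, i2, j1, j2 in matcher.get_opcodes()
             if op != "equal"]
    return base_id, patch

def decompress(base_id: int, patch: list) -> str:
    """Replay the modifier patch against the base dictionary entry."""
    base, out, pos = DICTIONARY[base_id], [], 0
    for op, i1, i2, replacement in patch:
        out.append(base[pos:i1])
        out.append(replacement)
        pos = i2
    out.append(base[pos:])
    return "".join(out)
```

Round-tripping `decompress(*compress(msg))` should reproduce the message exactly; whether the references end up smaller than the message itself is the open question.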
I know basically nothing about compression approaches or custom LLMs, so is this even practicable?
Has the challenge of Nostr over low-bandwidth mesh networks been solved already?
Was your sump pump overwhelmed too quickly for you to move precious items?
Or no sump pump at all?
https://jenkinsrestorations.com/the-basics-and-the-benefits-of-a-sump-pump
Really sorry to hear about this. Thanks for everything you do.
nostr:npub1sn0wdenkukak0d9dfczzeacvhkrgz92ak56egt7vdgzn8pv2wfqqhrjdv9 🫂
Signs are up saying no Halloween events on Shibuya streets. Last night there were some access restrictions.
I'm back at my hotel (phone died). It looks like the building and sign in Leo's photo.
I think there's more than one Belle Salle in Tokyo. I'm at the Sumitomo Fudosan Shibuya First Tower Belle Salle now, and can't see anything matching your photo - have I got the right Belle Salle?
Some Kato track in there? https://www.katomodels.com/
I'm flying over the Black Sea on my way to Nostrasia right now - hoping to check out some of the local model railway culture while I'm there.
"Deletion-compliance relay crawlers" sending then deleting notes from random nsecs could do a reasonable job of flagging relays that didn't comply (whether they claim to or not).
Couldn't detect "logical deletion" this way, though. So likelihood of a copy somewhere is high.
I signed up a while back but have had no acknowledgement or confirmation, so I can't book flights or accommodation as I've no way of knowing whether I have a spot reserved.
Same for me. Can't book flights and accommodation until I get confirmation.
I haven't thought this through yet, but maybe encrypted border wallet entropy grid sets fit in here somewhere?

