You have a point, figuring out where to draw that dividing line might be the hardest part. I think you see it in the Ross Ulbricht case, where he exposed an email address very early on, when he was asking a coding question while trying to build the website.
Did you ever stop and *really* think about what it means to "do a Satoshi Nakamoto"?
Context for my weird question: I have met many, many bitcoiners over the years. Many of them take a stab at preserving privacy by doing some combo of: not revealing name, not revealing location, not revealing face, etc. So often, if I happen to meet them in person, they end up revealing the things that they were hiding online. Quite literally, a mask came off (pre-covid!) once we started drinking - a simple, funny anecdotal example of what I mean. Many complain about photos being taken, many focus on always using a pseudonym. I'm sure most people reading recognize these patterns of behaviour.
I can see the purpose, up to a point, so this is not criticism. It's a little like me doing coinjoin "here and there" - you don't expect to defend yourself against a hyper powerful aggressor, only against a casual criminal looking for an easy score.
But if you do want *real* defence against *strong* attackers, you have a huge problem. These half-measures will be useless, perhaps worse than useless if they make you overconfident, because a determined investigator only needs *one* strand to pull on, and the measures I describe above - which are almost always rules only half-stuck-to anyway - don't cut it, at all.
Which brings me to my point: is it even possible to "go all the way"? Clearly it is; Satoshi Nakamoto is not the only person who's ever done it, but it's pretty damn rare at the very least.
Imagine what it would mean. If you are engaged in a serious project, one that takes let's say at least a year's worth of full-time work, then you are going to do that for no reward. Not just no money - people do that quite often for things they genuinely enjoy - but no recognition, no social context, not even "oh, I won't bother you because I know you're busy with that project". Nobody will say that, because nobody will know. Imagine doing a full, intense 8-hour day of work (more likely, split over many days) and knowing that there will *never* be a direct reward of any form for it. And then doing it again, and again.
What's more, you don't just "not get a reward". You have to do almost double the work, to ensure that at every step, every pushed commit or technical discussion exposes nothing at the network-trace level, or through language, vocabulary etc. Managing tricky pseudonymous accounts, handling the headaches of Tor and so on. I'm not saying it needs super-genius-level tech skills; I'm saying it's a massive amount of effort.
Could you do that? I daren't even ask the question of myself, because I'm almost sure it's a no. But to *imagine* where that kind of motivation would come from, that's what fascinates me.
I did look at bitcoin-utxo-dump but utxo_dump_tools seemed like a better option, at least for my needs.
The conversion to a sqlite database from the output of dumptxoutset took, I think, 14 or 15 minutes. Querying that each time with a filter is pretty fast, maybe 30s? Prob depends on whether the filter generates a large or small result.
The original dumptxoutset call itself, I didn't time it, I'm gonna guess 15+ mins also?
Hardware: it's a VPS but similar specs to a good laptop.
Utxo set analysis time!
I have found a "toolchain" to extract taproot utxo pubkeys that's at least *reasonably* efficient - more on that, at the end, for the engineers.
But here's an analysis of a snapshot of the whole 167M utxo set as of 16th March 2024.
Of the 167M utxos, a full 39M are taproot (in other words, about 1 in 4 of all the individual "bits of bitcoin" that exist in our global consensus, are taproot - but not 1 in 4 *bitcoins*, i.e. not by value)!
Of that 39M, 33M are *sub 1000 sats*, i.e. basically dust or near dust. Pretty obviously, these will be "data carrying" type (probably ordinals stuff? sorry I don't know the details). Here's a rough breakdown of the taproot outputs in the utxo set by value:
Amount in sats       Number of utxos (taproot only)
> 5 million          51674
> 2.5 million        81512
> 1 million          154130
> 500k               238060
> 250k               352235
> 100k               800843
> 50k                1043038
> 25k                1333547
> 10k                2853756
> 1000               6084116
> 100 (i.e. ~all)    39034007
This will not be news to most. IMO taproot *economic* usage will only pick up when Lightning implementations start using it; there is only fairly limited other incentive, for now.
For my taproot based 'anonymous usage project' (see recent posts), a filter of about 500k sats makes sense to me - anon sets of 250k are pretty decent, though as we've seen, we can definitely support much larger sets.
About the "toolchain".
Step 1 is to run the dumptxoutset RPC call against Core. As noted, this currently returns an 11GB data set of 167 million coins, so be aware if your setup is size constrained.
Step 2 is to parse the custom format of this data set (the chainstate itself lives in LevelDB, but the dump is Core's own compact serialization). I found the easiest way was to run this useful tool: https://github.com/theStack/utxo_dump_tools/ against the file created by Step 1. This creates a sqlite database with intelligible columns in the 'utxos' table (like 'value', 'scriptpubkey').
Step 3: I wrote a primitive Python script to do a SELECT ... FROM utxos WHERE value >= ? AND scriptpubkey LIKE '5120%' .. something along those lines.
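To be concrete, here's a minimal sketch of what that Step 3 script can look like (table/column names as in the query above; the filenames are just placeholders). Here I strip the '5120' prefix to leave the x-only pubkey in hex - adjust the output format to whatever the downstream tool expects.

import sqlite3

con = sqlite3.connect("utxos.sqlite")  # placeholder filename for the Step 2 output
rows = con.execute(
    "SELECT scriptpubkey FROM utxos WHERE value >= ? AND scriptpubkey LIKE '5120%'",
    (500_000,),  # value filter in sats
)
with open("taproot-pubkeys.txt", "w") as f:  # placeholder output file
    for (spk,) in rows:
        # drop the leading OP_1 0x20 ('5120'), keep the 32-byte x-only key as hex
        f.write(spk[4:] + "\n")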
I can directly take the output of Step 3 as input to the aut-ct tool I've been talking about recently to create tokens.
Right, would be interesting :)
Don't know if you saw my post from yesterday, but I found exactly one *p2tr* output above 0.02 BTC, in the past year, with an invalid EC point. Kinda interesting result.
Of course it doesn't imply there aren't more than that, in that category, that are unspendable. Do you think there are a ton more invalid EC points with dusty amounts?
It's just so high nowadays. I remember 3GB being an approximate figure you could mention to people to indicate how much extra space was needed for utxos, as well as being an indication of RAM requirements - the idea being that this much smaller DB needs fast access, since every new tx must be checked against it. Nowadays I don't know how that works, with 11GB. Maybe the DB structure is part of it.
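(Rough arithmetic: 11GB / 167M is about 66 bytes per dumped entry, which I think is mostly the 36-byte outpoint plus the compressed amount/height and script - so the total is basically just tracking the raw utxo count.)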
~/bitcoin-26.0/bin/bitcoin-cli dumptxoutset "./utxos16Mar"
{
"coins_written": 167129136,
.....
}
167 MILLION utxos, wtf. This is not your grandfather's chainstate.
In a fairly remarkable coincidence, Jeremy Rubin just yesterday tweeted about this exact point, illustrating that 'nobody understands bitcoin transaction semantics' (h/t nostr:npub1e0z776cpe0gllgktjk54fuzv8pdfxmq6smsmh8xd7t8s7n474n9smk0txy for linking it to me):
https://x.com/jeremyrubin/status/1768453193107718369?s=46&t=WMmqJ4MdyeBHjVDNEbJ-rg
Wallet devs take note.
I'm not 100% certain, need to double-check the math, but I think this is the only unspendable taproot output created in the last year. The pubkey is not valid, i.e. the x coordinate encoded is not on the curve secp256k1:
https://mempool.space/address/bc1peyz0h83xfh9vqqymhhpj8jh9ze2quu4kys3u26x7c6s4djjcht9qrnd3ng
I checked with Sage; indeed that x coordinate is not valid. That's a fairly expensive error, assuming it was an error; a few thousand USD worth.
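For anyone who wants to reproduce that check without Sage, here's a quick Python sketch using Euler's criterion: an x-only pubkey is on secp256k1 iff x^3 + 7 is a quadratic residue mod the field prime. Feed it the 32-byte witness program of the p2tr output, read as a big-endian integer.

p = 2**256 - 2**32 - 977  # secp256k1 field prime

def is_valid_xonly(x: int) -> bool:
    # valid iff 0 <= x < p and x^3 + 7 is a quadratic residue mod p (Euler's criterion)
    if not (0 <= x < p):
        return False
    rhs = (pow(x, 3, p) + 7) % p
    return pow(rhs, (p - 1) // 2, p) == 1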
I don't think it's data encoding, but really I have no idea.
What's striking is that there is exactly one such output over an entire year, though to be clear, I filtered above 0.02 BTC.
Yeah, though there are different options, that is for sure one, and I should prob focus more on it. But I'm less worried about scanning in the keys from a filter, more about making sure that constructing proofs is not impractically slow, which is mostly about constructing the curve tree itself, intelligently.
2.4 million distinct mainnet taproot outputs with amounts greater than 500K sats (at least once), in the last year.
Proof that I own one of them **, and a key image to prevent reuse, size 2.9kB and verified in 64ms!
The keyset takes up 155MB, but half that if binary.
Scanning raw blocks using RCasatta's tool:
https://github.com/RCasatta/blocks_iterator
... takes minutes but not too bad. Creating the proof is slow, I estimate 5-8 minutes on my laptop. Can def be improved.
Work continues at https://github.com/AdamISZ/aut-ct
** (Ok, I cheated and inserted a key of my own into the list, instead of creating a new utxo. Sue me!)
Implications of the Sterlingov case:
Consider: if the following descriptions of this case are correct:
1. He has been prosecuted on the basis of Chainalysis "evidence" that he bought the website (something along those lines). **
2. This evidence was only provided in summary form and outlined basically a normal tx flow (NOT coinjoins or any non-payment transactions) showing that his coins were connected to the coins that bought the website (I think?).
3. The detailed analysis, including the algorithms of the proprietary Chainalysis software that led to the deductions (which are statistical afaik, i.e. they say "x% chance these two addresses are the same actor"), was NOT provided to the defence or the public.
=> All this taken together means that YOU, dear reader, even if you NEVER use coinjoin software, or coinswaps, or Lightning, or literally ANYTHING that might enhance your privacy, could quite randomly be accused and then prosecuted for a crime that had nothing to do with you. And you'd have no way to defend yourself in court.
** I don't actually know what other, even circumstantial evidence there might be. I remember the defence lawyers said that the govt. had no other evidence.
Firing up a mainnet node wasn't as bad as I feared.
12 hours to chaintip, 32GB RAM, fairly normal modern hardware. Set a 100GB prune target, but I guess that's not important.
Bandwidth being decent is doubtless the dominant factor.
Now to scan some raw block files!
Testing results coming in:
./rpcserver bigger-80000-180000
Pubkey count: 96649
Elapsed time for verify_curve_tree_proof: 46.41ms
Verifying curve tree passed and it matched the key image. Here is the key image: "a496230673e00ed72abe37b9acd01763620f918e5618df4d0db1377d0d8ba72d80"
Figures of 40-50ms for verification of a ZKP that you own (and have "consumed") a single utxo pubkey out of 100K (the figure is basically the same as for 50K pubkeys; for 1M there will be some slowdown, but it's *very* sublinear, so not much). My next job is to figure out how to extract all the existing taproot utxos without it taking a week! Any ideas?
If you don't know what the bitcoin LARP is: it simulates mining and transactions with pieces of card, each group also having a calculator that does a rudimentary hashcash search, to let them mine for blocks. Everything in the real P2P network of bitcoin is simulated, just with "easier" versions (like word identifiers instead of 32-byte hashes). The first thing to consider is: building activities like this is a *LOT* of work. That's why teachers at schools don't do this kind of thing five lessons a day. But such activities are popular.
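Just to make the "rudimentary hashcash search" concrete, here's a toy sketch of the idea (my own illustration, not the LARP's actual materials): keep trying nonces until the hash of the block data starts with enough zeros.

import hashlib

def toy_mine(block_data: str, difficulty: int = 4) -> int:
    # brute-force a nonce until the hex digest has `difficulty` leading zeros
    nonce = 0
    while True:
        h = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if h.startswith("0" * difficulty):
            return nonce
        nonce += 1

print(toy_mine("coinbase->alice:50; alice->bob:5"))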
Physical objects are a very good way to help people with abstract concepts, even if the objects are literally colored pieces of paper. (I remember doing this with post-it notes to explain segwit in a talk in London. Oh, also Lightning. I like post-it notes, apparently.) The LARP uses this heavily (example: a small sticker of a key is used to represent signing, with the (very clever!) idea of using color coding to represent *which* key it is).
In observing the less technical people in my group (we were in 6 separate groups, each with our own miner), I noticed that the first part, in which we constructed and checked transactions by considering the validity of inputs and outputs, was *especially* helpful for them - it represents the "skeleton" of bitcoin, and it's something they could pick up quickly, especially in a group setting where confusions could be cleared up fast.
The activity slowly expands into a more chaotic structure. We start sharing transactions and validating other people's (which also helps clarify what the "mempool" is really all about), then mining using the "calculators" (set up to take a few minutes, I guess, more or less), then passing around candidate blocks. I think a lot of the less technical participants struggle more at this point, because a lot of different things are happening at once. But as I said before, it's a lot of fun, so by a certain osmosis they are learning the basics. Meanwhile, others find their competitive side takes over :)
It could make sense to spread this over multiple sessions. Chatting with a non-technical member of my group, that was also her thought: the second and third thirds (roughly) might have been easier to absorb if they were done in another session. I guess you can finesse it in quite a few ways.
Just got back from Buenos Aires for https://btcpp.dev/conf/ba24 . Just as high quality an event as the one in Mexico; @niftynei did a great job organizing. Presentations were solid, mostly about engineering with Lightning for real-world scenarios. It was great to get a concrete sense of what the South American community of Bitcoin engineers is like; I was impressed, a lot of very competent people. Bitcoin conferences used to consist mainly of dreamers .. but nowadays I see people that are both normal and also seem like they could get shit done.
I also finally got to experience the famous "bitcoin LARP" in person (I have some things to say about that, will probably write a separate note).
As promised:
The Bitcoin LARP exercise at btc++ Buenos Aires: as a former teacher of many years, this was very interesting to me. I almost felt like I was back in my first year or two of teaching (or learning to teach), because this had the key elements that were considered so crucial in teaching mixed-ability groups: learning in small groups, learning by discovery, open-endedness of tasks, and the, let's say, "emotional timbre" (fun vs dry/boring). Another aspect is having goals/targets .. a little bit of lighthearted competition. (1/2)
Yeah my bad, sorry. I should actually fix that.
I remember at the time of the Bitfinex hack arrests I said, no way those coins are going back to Bitfinex. At the minimum, *definitely* not for several years and prob never. I got some pushback on that.
Here we are a couple of years later and I see breathless news reports about the 'US govt moving the Bitfinex hack seizure'. I think a lot of people are wondering about an auction... that's the usual playbook for 'profits from crime'.
Those coins belong to Bitfinex, very obviously, there is zero ethical grey area there.
I could still be wrong and some decency will prevail, but let's see (even then, withholding such a large amount of money from a victim for years is disgraceful). I'm actually very curious.
I would have said Dulce Villareal, but I think she's not on nostr(?). nostr:npub1e0z776cpe0gllgktjk54fuzv8pdfxmq6smsmh8xd7t8s7n474n9smk0txy maybe you have a couple of names?
Reached an initial waypoint with the aut-ct (anonymous usage tokens from curve trees) project. It should now be easy to install and test; see the README:
https://github.com/AdamISZ/aut-ct
70-100ms verification time for 48K signet keys on my laptop. It generates a key image, a DLEQ proof and a Curve Tree proof. It's organized as a small RPC server (the API currently consists of only one call - "verify") and a client. Make the proof with the autct binary first, then make the RPC verify call - for me that takes between 70 and 100ms, whether for that larger keyset or for smaller test ones; there'll be very little variation in timing in practice. I will try up to maybe 200K later.
Also contacted the paper authors to check a couple of minor points, which were fine. I'm quite optimistic about this, in that I am now fairly convinced this is a practical way to do what I previously called "RIDDLE" (see e.g. https://reyify.com/blog/little-riddle ) with much larger keysets - even the whole taproot utxo set, in the extreme. In that earlier construction, while the proofs were compact (1kB), the verification scaled linearly with the keyset size, so there was a realistic limit. With Curve Trees the proofs are still 2-3kB, which is absolutely fine, but verification time is in the tens of milliseconds, and barely changes when moving to the 100K-1M range (which is completely impractical with the previous GK14/Triptych-based construction in that blog post).
If anyone could test the install-then-worked-example from the README and report back, I'd appreciate it.
Added Rust code, which you can test, plus a lot of explanation to https://github.com/AdamISZ/aut-ct
If you do want to try it with the given test file of 48K pubkeys but hit a stumbling block, do let me know.
Still very experimental to say the least!
Measuring only the verification call (and not any cruft), I'm getting it down to 75-100ms for that same 48K key set. Starting to look more and more like it might be viable.
Why? Customer preferences weren't the reason for the limit, so customers' requests aren't likely to alter it (sure, possible, but unlikely).
OK, that makes sense. Doesn't the libsecp API let you specify the additional randomness yourself, though? I vaguely remember it does, but I wouldn't be surprised if that's a bit fiddly.
A recollection from years ago: I remember gmax telling me he talked to Pornin quite a bit about the RFC6979 spec, and that he thought it was unnecessarily complicated (difficult to disagree if you read it!) - the main concept is of course f(privkey, msg) where f acts as a PRNG. Vitalik implemented it wrong in pybitcointools (less of a 'burn' than it might sound, since the error didn't break anything except with negligible probability ... so it's more just an example of how complicated it was).
I don't understand the nostr context, but why can't you use a deterministic nonce generation algorithm? (Unless you're doing MuSig, in which case, yes, you basically can't.) Either the BIP340 suggestion, without injecting fresh randomness, or RFC6979 even?
Not using RFC6979 shouldn't be a problem; the algorithm is never a matter of consensus with other parties.
Doubtless I missed at least one thing here ...
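To spell out the "deterministic, no fresh randomness" option: below is a rough Python sketch of the BIP340 nonce derivation with the auxiliary randomness fixed to 32 zero bytes (allowed by the spec, though it recommends fresh randomness). Names are mine; d is the secret key already adjusted for even y, pubkey32 the x-only pubkey, msg32 the 32-byte message; the k = 0 failure case is ignored.

import hashlib

n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 group order

def tagged_hash(tag: str, data: bytes) -> bytes:
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + data).digest()

def deterministic_nonce(d: int, pubkey32: bytes, msg32: bytes) -> int:
    aux = b"\x00" * 32  # no fresh randomness injected
    t = (d ^ int.from_bytes(tagged_hash("BIP0340/aux", aux), "big")).to_bytes(32, "big")
    rand = tagged_hash("BIP0340/nonce", t + pubkey32 + msg32)
    return int.from_bytes(rand, "big") % n

At heart it's just the f(privkey, msg) PRNG idea, with domain separation via tagged hashes.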
I'm still in the first of those stages, mostly :)
What IDE do you use? I'm finding VSCode's hand holding very helpful indeed, only a couple minor quibbles with it.
On 'anonymous usage tokens from curve trees': it's taken a good long while (being at a beginner level in Rust doesn't help!), but I've been able to construct a rudimentary tool that creates a key image for a single utxo pubkey and proves that it corresponds to a rerandomised entry in a curve tree.
My tests are now showing verification time of about 700ms for an anon set of 48K keys.
This is fast, but a lot slower than the 24ms listed in the table in section 6.1 of the paper! Though I am using an optimised/release build, I'm not using parallelization or any actual benchmarking setup, so it's hard to say. But I guess it's safe to say you could reach somewhere in the 1-300ms range in practice(?). That should be fine in the cases where this primitive is useful.
Increasing to anon set 1-4M shouldn't increase verification time by more than 2x, afaict.
There are a ton more details that need to be checked out, but with the 'pedersen-dleq' bolt-on that I came up with, I think this curve trees approach should be better than the spartan-ecdsa approach; the latter is more powerful machinery but more general.
I asked the authors about my idea here (you can find the original paper, and my suggestion), though no response as of yet:
I think this is the original article?
https://tkp.at/2024/02/15/frankreich-mrna-kritik-kuenftig-strafbar/
Following the trial today, it's quite ironic that Craig isn't able to define 'unsigned', considering that is a status he specialises in ;)
Unfair, he never fails to provide sources :)
(I'm actually complaining... we should meme about him being a proven serial liar, not someone who doesn't back up his statements. Very different. )
With you 100% on this. There are a couple of people in particular I have in mind, though there are tons of others who I had no connection with that were influential.
By far the worst is Gavin Andresen. His idiotic participation in a 'private signing session' is forgivable. His point blank refusal to recant or explain, years later, is probably the single most destructive and immoral thing anyone in the bitcoin dev community has ever done.


