David Chisnall (*Now with 50% more sarcasm!*)
I am Director of System Architecture at SCI Semiconductor and a Visiting Researcher at the University of Cambridge Computer Laboratory. I remain actively involved in the #CHERI project, where I led the early language / compiler strand of the research, and am the maintainer of the #CHERIoT Platform. I was on the FreeBSD Core Team for two terms, have been an LLVM developer since 2008, am the author of the GNUstep Objective-C runtime (libobjc2 and associated clang support), and am responsible for libcxxrt and the BSD-licensed device tree compiler. Opinions expressed by me are not necessarily opinions. In all probability they are random ramblings and should be ignored. Failure to ignore may result in severe boredom and / or confusion. Shake well before opening. Keep refrigerated. Warning: May contain greater than the recommended daily allowance of sarcasm. No license, implied or explicit, is granted to use any of my posts for training AI models.

nostr:npub1psdfxfpxz2cwmmnsk60y3nqpn2tqh9n24h4hstvfkwvr6eaek9js499sr7 nostr:npub1g0tuf634rz4suczwj7kgnecr6cyt0eu9xmp3sp0fku68mqehq4msp3tvm4

Again, how is that different from today? Services like WriteToThem let you write to an MP, as long as you provide a name and address, but they don't validate that you really are that person in any way. You can scrape the public electoral roll to find a set of people to impersonate. MPs would likely discover this only if they went to those people's houses and asked them about their letters.

Laws are published in draft forms and the Hansard records are public. Anyone who wants to mount that kind of attack can do so trivially. Having a public changelog and list of questions wouldn't expose any information that isn't already public; it would just make it easier to find. Attackers are already motivated to correlate this information from disparate sources; normal people are not, but would now have the same visibility into the process as attackers.

nostr:npub1hykucplphuhelaxutcw4jw3vuu7gcg42czhqmk7jhchs8vdga4fsj73p33

Note that some airlines now explicitly prohibit both using and charging battery packs due to fire risks. Both China Airways and Air Korea had such bans when I flew with them a couple of weeks ago.

nostr:npub1ht9mkkuzgc9hu0amdptyjgkfat680mtsu4pmlfvy44ddzfaryztsavuqja I've done some Java. I found it encouraged tight coupling far more than other Smalltalk-family languages. It was quite rare for me to accidentally write reusable code in Java.

There's a lot of Java library code, but most of it seemed to start life as libraries, rather than be implemented for programs and then trivially moved to libraries.

Java programs are often structured as a load of packages, but in the Java codebases I've seen, it's rarely possible to just pull one of those packages out and put it somewhere else. They tend to contain a lot of program-specific logic and to be tightly coupled to other packages that are tailored to the application.

I didn't want to say 'this reads like AI nonsense' because I didn't want someone who wrote something that bad and tried really hard to feel bad. But I did. And it was. So he should feel bad.

It turns out that prose is like code: it takes me far less time to write it from scratch than it takes to fix the nonsense that LLMs extrude.

I do enjoy the juxtaposition of crisp (chip, for Americans) flavours with names like Extra Meaty Barbecue Steak Manly Rah and then the ingredients saying 'suitable for vegetarians' and the main flavouring ingredient being yeast.

Ugh, I thought this document was full of nonsense. It turns out, it was Copilot slop.

I said this a while ago on a deeply nested thread, but I think people miss a critical part of why Excel is popular. It's not that it's a good programming language (the calc language in Excel is pretty bad, though the reactive programming model is nice); it's that it's an amazing debugger. Excel makes it trivial to visually inspect every variable (which it calls a cell). You can debug expressions (which are automatically updated) in exactly the same programming model as the real expressions. You can create debugging views that mirror some important variables, without learning a different tool.

Build a debugger as nice as Excel and anyone can learn your language.

nostr:npub147c37tlw4ne0guqyxxzarkzfswd89eqc0hyn3pz6ghf7ecwy69lqzqu3x9 I keep meaning to play with Grist. It seems like one of the few projects I've seen that are at least trying to solve the right problem, rather than to badly copy a bad solution.

nostr:npub1n6rtkzq53wxrunn29e6a03jucx6j3zajpwx6lnzyyzzn0vlevkfsg5lueg

despite the very popular Excel bashing in the tech world, I think it is actually one of the better software products in the past few decades.

I don't disagree. It's just that Excel sets a very low bar. The fact that most alternatives are worse says far more about the state of the industry than it does about Excel.

The fact that Excel's calc language is the only programming language with a billion users says a lot more.

The problem is not that LibreOffice Calc is not an adequate replacement for MS Excel, it's that MS Excel is not an adequate solution for most of the things that people try to use it for. If you try to tell people to replace a bad tool with a similar tool that is subtly different, they will have a bad experience. 90% of the bad experience comes from using a VisiCalc clone; the remaining 10% comes from using one that has a subtly different UI.

Once you get over the learning curve, Jupyter notebooks are often a much better tool (though they have plenty of UI annoyances of their own). A tool that provided the same flow with a simpler onboarding experience would make it much easier to move people away from MS Office, not by copying the bad things that Excel does, but by showing them that they can get away from all of the suffering Excel causes.

I still think about a New Fellows’ induction at my college a few years ago where they had all of the new fellows give a short overview of their research. And there was one whose talk was ‘I’m an aerospace engineer! I work with volcanologists! I build drones that fly into volcanoes! Here is a video of one of my drones flying into a volcano! We send them in to collect gas samples and return, but they also have lots of sensors and telemetry. So we fly them lower and lower until they catch fire! Here is a video of one of my drones catching fire and falling into the lava!’

The guy who went after her was the most upstaged I have ever seen anyone be. He had to follow that with ‘I study French poetry, here is a French poem. It has themes.’ And I felt bad for him because there was no possible talk that would not have been a letdown in his slot. Unless, perhaps, it was ‘I train resurrected dinosaurs to use rocket launchers’.

nostr:npub1dzjekzdn2qawmln6gnfuvn00x6l2jdxxw6z2gld0nfy4y4jc43qs4j7n2l

Not packing enough food, going on a long trip, discovering a shop in the middle of nowhere just before you starve, breaking in and stealing stuff, and then going home and telling everyone you discovered a place where you can get free food, and coming back with a load of armed friends to loot the store properly.

nostr:npub1lwdd26c8zprhjvtqgq690kh0xdtw3etg504wshpgy0g994t4malqfzck7j Publicly talking up anti-vax nonsense while getting vaccinated is also a weird choice. I understand that if you really believe the nonsense, you wouldn’t want anyone vaccinated, but then you wouldn’t get the vaccines yourself. If you believe the science, you want everyone to be vaccinated, because no vaccine is 100% effective and so reducing the chance you’ll come into contact with a carrier is a good strategy. If you believe vaccine side effects are much more common than the official statistics suggest, then your best strategy is to make sure everyone else is vaccinated and skip them yourself (so they take the risks and you are unlikely to be exposed to a carrier).

There is no starting set of beliefs followed by rational thought that leads to their actions.

I feel like a lot of the hate Liquid Glass gets should be directed to the previous versions of iOS that have set my expectations so low for Apple’s mobile UIs that my reaction to Liquid Glass was ‘meh, it’s probably not in the top ten worst things they did on iOS’.

When you talk about AI Regulation, you are accepting a framing based on a marketing claim that is not backed up by anything technical, namely that 'AI' is special and different and existing regulations do not apply. And this is what the industry selling these things really wants you to accept, because if you didn't then you'd most likely conclude that a lot of what they do is already illegal under current laws.

nostr:npub1negj548wy4chhfmanee43k0eunug08rx8rrexpkzt56zqme8wyjq9cxget I found renting a dedicated server with a bunch of disks in it was much more cost effective than cloud storage. I'm paying £20/month for a machine with 4x2TB disks. It's mostly encrypted remote backups, so RAID-Z1 is fine (one disk can fail without it being annoying, two means I am down to not having remote backups anymore), which gives me 6 TB of usable storage.

nostr:npub1hjp5acdek7h3yj5tav6g37ck6n7mjns22dedl0en49aphse7zjps5fel3e

Is storage really that cheap? I feel like my console and laptop would be very bad off with ⅔ of the storage space dedicated to other people's data.

To answer these questions in the opposite order: both of those examples are places that tend to use quite expensive NVMe storage, because they're optimised for interactive use.

I can buy a 4TB spinning rust disk for under £100 now. Let's say I devote half of that to other people's data and have 2 TB myself, so £100 for 2 TB for the lifetime of a disk (plus power, but connect it to a RPi and the whole thing is going to use under £10/year of electricity). The disks come with a 3-year warranty, so let's assume that they die the day after the warranty expires. So that gives me a cost of £130 for 2 TB for three years. Let's say £120 to make the maths slightly easier, so £20/TB/year. I probably missed something (the cost of the RPi, for example, though they're so cheap they're basically a rounding error), so let's add 50% to that and claim £30/TB/year.

Azure and AWS have about the same costs for blob storage. 2 TB on Azure in cool storage (assuming no access, just the cost of storage) is £180/year.

Even if you assume that I want three remote replicas for every local block, so I split my 4 TB disk into 1TB for local use and 3TB for other people, that's still a third the cost of Azure or AWS.
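For anyone who wants to poke at the numbers, here is the same back-of-the-envelope sum as a tiny script, using the rough figures from above rather than real price quotes:

```python
# Back-of-the-envelope comparison of a shared local disk vs cloud blob
# storage, using the rough figures from the post (not real price quotes).

disk_price = 100            # GBP for a 4 TB spinning-rust disk
electricity_per_year = 10   # GBP/year to run it on something like an RPi
lifetime_years = 3          # assume it dies the day the warranty expires
my_share_tb = 2             # half the disk kept for my own data

local_total = disk_price + electricity_per_year * lifetime_years   # 130
local_per_tb_year = local_total / my_share_tb / lifetime_years     # ~21.7
padded = local_per_tb_year * 1.5   # ~50% slack for the RPi and anything missed

cloud_per_tb_year = 180 / 2        # GBP/TB/year for 'cool' blob storage

print(f"local disk: ~£{padded:.0f}/TB/year (unpadded £{local_per_tb_year:.2f})")
print(f"cloud blob: ~£{cloud_per_tb_year:.0f}/TB/year")
```

(The post rounds £130 down to £120 before dividing, which is why it quotes £20 and £30 rather than the slightly messier unrounded numbers this prints.)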

You wouldn't replace cloud storage with a single local disk because:

The local disk has no redundancy and so is a single point of failure.

Off-site replicas are important for protecting against things like theft or house fires.

Perhaps most importantly, the cost of a 1TB disk is only £50. So if you assume you want 1 TB for your backups, the incremental cost of turning that into something with four-way redundancy, with three of the replicas off site, is only £50 over the lifetime of the disk (a 4 TB disk instead of a 1 TB one). And that's much cheaper than any cloud backup thing.

You just need the hash to assure yourself that your data still exists somewhere in this amorphous cloud system, right?

It's not just assuring yourself, it's assuring other people so that if I am hosting 10 blocks for you, someone else will be willing to host 10 blocks for me (or possibly 9 blocks, to account for failure). And possibly you'd also want to mediate bandwidth somehow, so uploading other people's blocks gave you some credit that you could use for storage or uploads.
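To make the 'assuring other people' part concrete, here is a minimal sketch of one common way to do this kind of check; the post doesn't specify a protocol, so the details are assumptions for illustration only. The owner sends a random nonce and the host must return hash(nonce || block), which it can only compute if it still holds the block:

```python
# Minimal challenge-response sketch for checking that a peer still holds a
# block. This is an illustrative assumption, not a protocol from the post.
import hashlib
import os

def challenge(block: bytes):
    """Owner side: pick a random nonce and compute the expected answer.
    In practice you would precompute a batch of (nonce, answer) pairs
    before handing the block over, so you can keep verifying later
    without storing the block yourself."""
    nonce = os.urandom(32)
    expected = hashlib.sha256(nonce + block).digest()
    return nonce, expected

def respond(nonce: bytes, stored_block: bytes) -> bytes:
    """Host side: prove possession by hashing the nonce with the block."""
    return hashlib.sha256(nonce + stored_block).digest()

# The owner checks that a host still has the block it agreed to store.
block = b"some encrypted backup block"
nonce, expected = challenge(block)
assert respond(nonce, block) == expected               # an honest host passes
assert respond(nonce, b"something else") != expected   # a host that lost it fails
```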

Serious question for any etymologists:

When did Americans stop being able to tell their backsides from their donkeys?

A 1900 version of Merriam-Webster had two definitions of ass: donkey or stupid person (both uses are present in Shakespeare, and probably earlier). And arse was also defined, correctly.

But at some point between then and when I was born, Americans started using ass to mean arse.

Was it the result of some censorship thing where arse was deemed offensive but ass wasn't?

nostr:npub1td4rr5fdx6zqq8ytp3dyfn83knhtwqgp4jqal744jgkvq3037n4st2q4cm

Huh, SWT had a similar bug in its Cocoa back end in the early OS X days. It registered a repeating timer instead of a one-shot one, so the number of live timers would gradually increase. If you left an SWT app doing nothing, its CPU usage would gradually climb to 100%, all of it spent waking up for timers, seeing there was nothing to do, and reentering the event loop.

AI, n:

Machine learning applied to problems where machine learning is the wrong solution.

As a student, I was fortunate to have Roger Hindley as one of my lecturers, but one moment in his lectures has stuck with me for many years:

He wrote an equation on the board and waved his hands and said 'and x is obviously infinity'. I was actually paying attention and didn't agree. I put up my hand and said 'Isn't x actually 12?'. He looked again, spotted where he'd made a mistake in the equation, fixed it, and then followed up by saying:

'Yes, but any number greater than three is approximately infinite'.

This has turned out to be a surprisingly good rule of thumb.

For years, academic hiring and promotions (in computer science, at least) have focused on precisely one thing: number of first-author papers in top-tier venues.

Focusing on the number of papers encourages people to publish the so-called minimum publishable unit: The smallest thing that stands a chance of being accepted. This discourages large research projects where you may take multiple years to reach something worthy of a top-tier venue. It also discourages high-risk projects (or, as they are more commonly called: research projects) because there’s a chance of not reaching a publishable unit at all.

Focusing on first-author publications discourages collaborations. If two people work on a project that leads to a paper, only one will get the first-author credit. If a large project needs ten people on it, it needs to produce ten publications per year to have the same return on investment as everyone doing small incremental projects.

The net result of this is that even top-tier venues are saturated with small incremental projects. Academics bemoan this fact, but continue to provide the same incentives.

In the #CHERIoT project, we have tried, in a small way, to push back on this. We list authors in alphabetical order and we add an asterisk for people who deserve the conventional idea of ‘first author’ credit. Alphabetical order makes it clear that the author list is not sorted by contribution size (and such a total ordering does not exist when multiple authors made indispensable contributions in wildly different areas).

I was incredibly disappointed that the PC Chairs of the #ACM conference that recently accepted one of our submissions decided to push back on this (in particular, on including the exact same words that we used in our MICRO paper). ACM should be actively trying to address these problems, not enabling editorial policies that perpetuate them. If I had understood that the venue had such a strong objection to crediting key contributors, I would not have submitted a paper there, nor agreed to give a keynote at one of the associated workshops.

I am in the fortunate position that paper credit no longer matters in my career. But that doesn’t mean that I am happy to perpetuate structural problems, and it is very sad to see that so little thought is given to them by the organisations with the power to effect change.

The one exception that I have seen, which deserves some recognition, is the Research Excellence Framework (REF), which is used to rank UK universities and departments. This requires a small number of ‘outputs’ (not necessarily papers) by each department, scaled by the number of research staff but not directly coupled to the individuals. These outputs are ranked on a number of criteria, of which publication venue is only one. It is not perfect (you will hear UK academics complaining about the REF with increasing frequency over the next couple of years as we get closer to the deadline for submission in the next round), but at least it’s trying.

Simply not actively trying to make the problem worse is a low bar that I would previously have expected any conference to clear.

nostr:npub1evlasggt6pl44vxp49fns8cjy2xxqgfk6akqqrwkqa4rj588xmhqt6yenf The sincere people are the worst sometimes. There are three kinds of replies that encourage engagement:

People just trolling (easy block, though I did that a bit late in one thread today).

People who are too lazy to look things up (fairly easy to ignore).

People who genuinely want to learn something and are asking questions that suggest that they've thought about the topic.

The last set are the ones I really want to talk to. But sometimes I have to remind myself not to. Or, at least, not to today.

Huh, so apparently something I thought was obvious about parsing and lexing was not. Perhaps because most of the languages I've worked on have had at least some context-specific keywords, whereas most toy languages and languages that were designed without later aggregating features do not have this property.

I always build front ends the opposite way around to how lex / yacc work. In this model (which I think of as 'push'), the lexer drives the parser. It identifies a token, then tells the parser 'I have a token of kind X, please handle it'. This works really badly for languages with context-dependent keywords. For example, in Objective-C, the token atomic may be a keyword if it's in a declared-property declaration or an identifier if it's anywhere else (including in some places in a declared-property declaration). The lexer doesn't know which it is, so you need to either:

Have the lexer always treat atomic as an identifier and then do some re-lexing in the parser to say 'ah, you have an identifier, but it's this specific identifier, so it's actually a keyword'.

Replace everything else that uses an identifier with 'identifier or one of these things that are keywords elsewhere'.

The thing you want is to have (at least) two notions of an identifier (any identifier, or identifier-but-not-that-kind-of-identifier) in the lexer, but the lexer can't do this because lexing must be unambiguous in the push model.

In the pull model, the parser is in charge. It asks the lexer for the next token, and may ask it for a token of a specific kind, or a specific set of kinds. The parser knows the set of things that may happen next. If you're somewhere that has context-specific keywords, ask the lexer for them first, and if it doesn't have one ask it for an identifier. Now you have explicit precedence in the parser that disambiguates things in the lexer and avoids introducing complexity in the token definitions. You may also have simpler regexes in the lexer, because now you can specialise for the set of valid tokens at a specific point. If you know you need a comma or a close-parenthesis after you've parsed a function argument, you can ask for precisely that set of valid tokens, which compiles down to under five instructions on most architectures, rather than the full state machine that can parse any token.
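As a concrete (if toy) illustration of the pull style, here is a sketch of a parser for a made-up declaration of the form property (attributes) name; in which atomic is a keyword only inside the attribute list. The grammar, token names, and helpers are invented for this example; it is not taken from any real front end:

```python
# A minimal sketch of a pull-model front end for a toy grammar:
#     property ( attribute, attribute, ... ) name ;
# where 'atomic' is a keyword only inside the attribute list and an
# ordinary identifier everywhere else. Grammar and names are invented
# for illustration; this is not a real Objective-C front end.
import re

PATTERNS = {
    "kw_property": re.compile(r"property\b"),
    "kw_atomic":   re.compile(r"atomic\b"),
    "identifier":  re.compile(r"[A-Za-z_]\w*"),
    "lparen":      re.compile(r"\("),
    "rparen":      re.compile(r"\)"),
    "comma":       re.compile(r","),
    "semicolon":   re.compile(r";"),
}

class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def pull(self, *kinds):
        """Try each requested kind in order and return (kind, lexeme) for
        the first match, or None. The parser's request order is what
        disambiguates context-specific keywords from identifiers."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        for kind in kinds:
            m = PATTERNS[kind].match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return kind, m.group()
        return None

def parse_property(lexer):
    """Parse one 'property (attrs) name;' declaration."""
    assert lexer.pull("kw_property")
    assert lexer.pull("lparen")
    attributes = []
    while True:
        # Ask for the keyword first, then fall back to a plain identifier:
        # this is the explicit precedence described above.
        attributes.append(lexer.pull("kw_atomic", "identifier"))
        if not lexer.pull("comma"):
            break
    assert lexer.pull("rparen")
    _, name = lexer.pull("identifier")   # 'atomic' here is just a name
    assert lexer.pull("semicolon")
    return attributes, name

print(parse_property(Lexer("property (atomic, readonly) atomic;")))
# -> ([('kw_atomic', 'atomic'), ('identifier', 'readonly')], 'atomic')
```

The interesting call is pull("kw_atomic", "identifier"): the parser states the precedence explicitly at the one point where the keyword can appear, and everywhere else atomic is just another identifier, so the ambiguity never reaches the lexer.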

Even without any performance benefits, it's just a much nicer way of writing a parser. Yet the other way around seems to still be taught and explained as if it's a sensible thing to do.