Null pointer broke it.

#BiggestSoftwareBugEvah


Discussion

are you coding Java today? 😄

#crowdstrike

Ah, ok!

C is only for adults, kids. Stick to Rust. 🤙🏼

Also:

"CrowdStrike CEO George Kurtz was previously CTO at McAfee. There, too, a faulty update occurred in 2010 that paralyzed Windows XP computers worldwide. Although the effects were not as dramatic back then, one might have expected Kurtz to have learned from the experience."

🤣🤣🤣🤣🤣

Quite interesting.

I'm an aggressive null checker. For internal functions it's mostly debug asserts, since a null there is essentially a static control-flow error.
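Roughly this pattern, as a toy sketch (the function names are made up, not from any real code):

```c
#include <assert.h>
#include <stddef.h>

/* Internal helper: callers inside the module guarantee non-NULL,
 * so a NULL here is a programming error caught in debug builds. */
static int render_frame(const char *frame)
{
    assert(frame != NULL);          /* fires only in debug builds */
    return frame[0] == '\0' ? 0 : 1;
}

/* Public entry point: validate untrusted input explicitly. */
int process_frame(const char *frame)
{
    if (frame == NULL)
        return -1;                  /* defensive check at the API boundary */
    return render_frame(frame);
}
```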

Address 0x9c should be in a protected region, so the worst that could happen was a crash? If I'm not mistaken, the process dump showed a protected read fault.
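For illustration only, and not the actual CrowdStrike code (the struct and field here are invented): a small fault address like 0x9c usually means a field at that offset was read through a NULL struct pointer, and the first page is normally left unmapped, so the read faults instead of silently succeeding.

```c
#include <stddef.h>

/* Hypothetical layout: a field that happens to sit at offset 0x9c. */
struct channel_entry {
    char header[0x9c];
    unsigned int flags;   /* offsetof(struct channel_entry, flags) == 0x9c */
};

unsigned int read_flags(const struct channel_entry *entry)
{
    /* If entry is NULL, this reads address 0x0 + 0x9c = 0x9c; the first
     * page is normally unmapped/protected, so it becomes an access fault. */
    return entry->flags;
}
```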

Top-voted comment on HN suggests (just as I would): "This should not have passed a competent CI pipeline for a system in the critical path."

But besides banging on my keyboard, software bugs are going to happen. Adding more complexity is unlikely to fix the issue for a long time. Maybe we shouldn't depend on things where a single mistake in a single line of code (the one that was found) causes the world to nearly melt down. We've had memory-safe languages for a long time, and guess what: programmer errors still cause application crashes. Also, many (if not all) "safe" languages just defer the unsafe stuff to the compiler/interpreter. The majority of the C# standard library uses pointers and P/Invoke.

Let's just keep making toolchains larger and supply chains deeper, I guess, until the problem is resolved... because that works. I'm done.

This shouldn't even have passed a manual smoke test or a system integration test. It literally meant you couldn't get the machine to start without a Blue Screen of Death.

There are about five different tests that would all have failed as soon as you started. They apparently don't test.

Also, it amazes me that the sysadmins just rolled this out everywhere without even checking locally on one machine.

I wonder if anyone, anywhere, caught this before rollout.

It's just another Boeing incident lol

I know. Good grief.

Yeah that's crazy!

I'd guess sysadmins had nothing to do with it, and the update was pushed live over the air to all CrowdStrike customers because "it's a security update."

When everything is "X as a Service" these problems will occur more often, because end users don't have the choice to wait for updates to be validated in the wild before making those changes on their machines.

Oh, that's true.

Gosh, they're doing automatic roll-outs without a proper pipeline. That is scary.

On the other hand, customers have been rewarding this sort of reckless behavior. We can see that on here.

Yeah, fair share of people here advocating for aggressively fast updates "even if it breaks shit". There's always a balance.

That's one reason I trust Nostrudel. They offer a cutting-edge version, "next.nostrudel.ninja", where they also don't activate new versions until you press the button. A couple of times it broke something, so I just went back to using the normal one.

Beta test environment, basically.

We used to do this at work. A public staging environment.

I think this sort of parallel release alleviates a lot of the curiosity the users feel (they want to "see" the new thing and try it out), while mitigating risk.

Even if you have a desktop or mobile app, you can create a simulated/test environment to show off a new feature and get feedback from users. There are Android emulators and stuff.

IMO: frequent updates for security, with a heavy testing regime and an approval/sign-off pipeline.

For everything else, make updates entirely optional.

The problem is not that there are fast increments, it's that the increments are often barely tested, unreleased, and bug fixes are rolled in with new features. So you're basically forced to install every update immediately, because what they delivered last time is broken and this new one contains the fix. But the new one is also broken.

Software Version Mafia 😂

And everything goes straight to full production rollout. Boom. And then everyone installs it and it immediately crashes. That's not even an alpha version, it's just a prototype.

And then it's like,

Ummm... my bad, reinstall the previous one.

yeah, move fast and break things should not be the motto of the QA lol, QA should be pig headed salty bastards who are sticklers for an extensive test routine to be run and passed, and i mean, like, a giant long checklist of features that should work that have to be run through a standard regime

and not even touch it without the unit tests all 100%, fix the damn tests, and no damn disabling the damn tests damn you

yes, the salty bastard rubber stamp bureaucrat style is what we want in QA

And no unit tests that are just "assert True; return;" 😅
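Something like this toy contrast, just to make the joke concrete (names invented):

```c
#include <assert.h>

int add(int a, int b) { return a + b; }

/* The kind of "test" being mocked: always passes, checks nothing. */
static void test_add_useless(void)
{
    assert(1);   /* effectively "assert True; return;" */
}

/* A test that actually exercises behavior. */
static void test_add_real(void)
{
    assert(add(2, 3) == 5);
    assert(add(-1, 1) == 0);
}

int main(void)
{
    test_add_useless();
    test_add_real();
    return 0;
}
```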

We call those people "beta testers," and their enthusiasm should be channeled to help produce a stable product for the average user.

EXACTLY!!

I'm a natural for the beta tester role, as are many of the power users, and we tend to be mentally prepared for the software to mess our stuff up or act wonky.

Roll out to us _first_, then roll out to everyone else a couple of days or a week later.

And make the updates opt-in. Sure, it's better for security to stay fully up-to-date, but if someone wants to never update their desktop, it's on him if he gets hacked.

In commercial systems, the sysadmins should be responsible for validating and determining the necessity of updates.

Wait, I'll throw another one out there. I agree with the C++ project owner (whose name escapes me): this isn't a C++ issue, it's a "native C" issue. His statement was something like: modern C++, if used with recommended practices, is considered a memory-safe language.

I'm tired of this whole (my straw man): "Developers are too dumb and lazy and continue to have high rates of a particular error; they just need to learn this new language that people keep saying will prevent these issues, instead of learning to use the current language with best practices" debate, where we need to use Rust everywhere, and then Rust devs think they're safe from memory issues and that it's a massive security talking point to write a bad program in Rust and call it secure.

Just stop. Get better. Learn from mistakes. Software mistakes will occur; don't be so reliant on mistake-free programming.

my disagreement is not about memory safety

moving that to a GC ends that altogether, though it can at times introduce performance issues with excess garbage

my disagreement is readability, and clarity of concepts

objects are a bad way to model systems, the most salient and important feature of an element of a system is its structure and interface, not its arbitrary fake family tree bullshit

objects lead to very ugly long complicated names, and expensive compilation because of the simple fact that objects are an abstract concept and structures and interfaces are concrete, and this comfortable shortcut leads to errors in logic that lie deeper than just the stupid fucking stack pointer

hard to read, hard to understand, hard to compile = easy to fuck up
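For what it's worth, a tiny sketch of what I mean by modeling with structure plus interface instead of a class hierarchy (names invented):

```c
#include <stdio.h>

/* Concrete structure: just the data. */
struct sensor {
    int id;
    double last_reading;
};

/* Concrete interface: the operations available on that structure. */
struct sensor_ops {
    double (*read)(struct sensor *s);
    void   (*report)(const struct sensor *s);
};

static double dummy_read(struct sensor *s) { return s->last_reading; }

static void dummy_report(const struct sensor *s)
{
    printf("sensor %d: %.2f\n", s->id, s->last_reading);
}

int main(void)
{
    struct sensor s = { .id = 1, .last_reading = 42.0 };
    struct sensor_ops ops = { .read = dummy_read, .report = dummy_report };

    printf("reading: %.2f\n", ops.read(&s));
    ops.report(&s);
    return 0;
}
```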

nostr:note19qwl2emf4ywvmf2ldpfjxxezw2gmet4ha3l3pc4luuwhxzrj8pcq2xz07a

Can I run Rust on an obscure platform with resource limitations and no publicly available toolchain implementation?

No idea. We're doing everything in C++. 😁

the instruction set is custom because FUCK YOU

thanks whoever thought this was a good idea

maybe with tinygo, it was made for that

Yeah I’m not implementing an entire compiler backend :/

technically you only need to write the part that takes the AST and generates code

i think clang makes this more possible

I'm still learning so much about this. It was 100% kernel code, so I was wrong here, and that's why this was a much bigger issue. The flaw was related to named pipe execution. And, well, it makes sense to be using "real" addresses in this context.

It is a logic error NOT a null deref!

Oh, interesting. How did it get past the tests?

The entire update file was zeroes

They pushed obvious crap straight to a full production rollout without a proper build server setup. A simple smoke test would have found that and failed the build in 5 minutes.
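A minimal sketch of the kind of smoke test I mean, assuming a hypothetical validator run in the build pipeline (the file name is a placeholder, not the real artifact):

```c
#include <stdio.h>
#include <stdlib.h>

/* Reject an update file that is empty or consists entirely of zero bytes. */
int update_file_looks_sane(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;

    int ok = 0;          /* stays 0 until we see a non-zero byte */
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c != 0) {
            ok = 1;
            break;
        }
    }
    fclose(f);
    return ok;
}

int main(void)
{
    /* "channel-update.bin" is a placeholder name for the update artifact. */
    if (!update_file_looks_sane("channel-update.bin")) {
        fprintf(stderr, "smoke test failed: update file is empty or all zeroes\n");
        return EXIT_FAILURE;   /* fail the build */
    }
    return EXIT_SUCCESS;
}
```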

#CodingLikeAFoss

They were like

PUSH-N-PRAY

LET THE CUSTOMERS TEST

ONLY BETA

When you release to your customers with a BECAUSE FUCK YOU version. 🤣🤣🤣

As my TL at work wisely pointed out, even if it's one person's fault that this breaking push happened, the whole organization failed by allowing such a mistake to reach production in the first place.

And it's a cybersecurity company, no less.

Every Comp Sci 101 course after today will have a lesson on CrowdStrike.