Microsoft IT

To Fix CrowdStrike Blue Screen of Death Simply Reboot 15 Straight Times, Microsoft Says (404media.co) 173

Microsoft has a suggested solution for individual customers affected by what may turn out to be the largest IT outage that has ever happened: Just reboot it a lot. From a report: Customers can delete a specific file called C00000291*.sys, which is seemingly tied to the bug, Microsoft said in a status update published Friday. But in some cases, people can't even get to a spot where they can delete that file. In an update posted Friday morning, Microsoft told users that they should simply reboot Virtual Machines (VMs) experiencing a BSoD over and over again until they can fix the issue.

[...] "We have received reports of successful recovery from some customers attempting multiple Virtual Machine restart operations on affected Virtual Machines," Microsoft told users. "We have received feedback from customers that several reboots (as many as 15 have been reported) may be required, but overall feedback is that reboots are an effective troubleshooting step at this stage."

This discussion has been archived. No new comments can be posted.

  • Beautiful (Score:3, Funny)

    by boulat ( 216724 ) on Friday July 19, 2024 @10:23AM (#64637821)

    I mean.. this joke writes itself

    • by cayenne8 ( 626475 ) on Friday July 19, 2024 @10:52AM (#64637991) Homepage Journal

      I mean.. this joke writes itself

      The next Microsoft Certification:

      MCRE == Microsoft Certified Reboot Engineer

      The test is a bitch on your fingers physically....constantly hitting that button.

      • The test is a bitch on your fingers physically....constantly hitting that button.

        That's covered in MCRAE. Microsoft Certified Reboot Automation Engineer.

        Rumor has it that it involves one of those drinking bird toys.

      • and of course, the 'test' to earn that certification is only $2495.
    • The public already has a low opinion of us geeks in IT and now they get confirmation that a) all our infrastructure is dangerously complex and dependent on a single point of failure and b) the problem can be fixed with the usual technique!
      • If it makes you feel any better I had to bring my machine in to the office to have it fixed. Rebooting many more than 15 times didn't do it.

        Took about 3 minutes.

    • So funny! Windows users who think rebooting is the answer to all problems was the punchline 20 years... no, 30 years ago... if you're clueless? reboot.
      Thinking is hard.
  • by leonbev ( 111395 ) on Friday July 19, 2024 @10:29AM (#64637849) Journal

    I'm curious how this "solution" works. Does Windows determine on it's own what system file is causing the system to repeatedly BSOD on startup and removes it on it's own? If so, that's kinda clever.

    • by 50000BTU_barbecue ( 588132 ) on Friday July 19, 2024 @10:35AM (#64637879) Journal

      Now if only we could get rid of extra apostrophes on their own!

      • Because that ruins everything to the point that the post is indecipherable to you.

    • by Guyle ( 79593 ) on Friday July 19, 2024 @10:37AM (#64637889)
      I think it's just a matter of the computer downloading the updated files from CrowdStrike before it BSODs again. Then on the next reboot it works. Two of my VMs were affected; one of them recovered after a single reboot. The other one I went in and deleted the files.
    • by znrt ( 2424692 )

      no idea but sounds like a timing issue in the boot sequence, maybe some variable hardware latency that's just right to prevent the bug from manifesting.

      those are some very nasty bugs to track and reproduce, but it will be always faulty design/coding that introduces dependence on such variables.

    • It's probably some kind of automatic snapshot feature. It detects a number of failed boots and decides to roll back. More like a last resort.

      • Addressing the home edition, I've applied updates that gave me BSOD, and the computer self-boots about three times and then says, "There's a problem. Wait until I fix it."

    • by kamitchell ( 1090511 ) on Friday July 19, 2024 @10:52AM (#64637993)

      Reminds me of high school days. A friend got a job maintaining a PDP-8 that controlled CNC machining equipment at a small factory.

      Sometimes the PDP-8 would get wedged. The problem is that turning the power off and on didn't help because the wedged state was preserved in the magnetic core memory.

      So he would rapidly cycle the power until a power glitch invalidated some of the memory, then the computer would crash and could be rebooted.

      • by whoever57 ( 658626 ) on Friday July 19, 2024 @11:13AM (#64638107) Journal

        Years ago, I worked on a CAD system that would occasionally hang. The solution was to get up from the seat, run on the spot until you developed a static charge, then touch the chassis of the computer. Usually, this worked.

        • wow! that sounds like that computer was not sufficiently grounded/protected from static charges. Well at least there was a positive use for that error by the design engineer.
      • The PDP-8 was slightly before my time; I started on a PDP-11/23, but we had a PDP-8 as a control computer for a large testing station. You're right, state was preserved on power outage, which was one of the advantages of core memory, though it worked against you in cases like that. As I recall (I never did this, just saw it done) one solution was to power off, pull a bank of core memory, and power it back on.

        Fun fact, when we switched to the PDP-11, (shortly before I started) we had trouble convincing the mana

      • A novice was trying to fix a broken Lisp machine by turning the power off and on.

        Knight, seeing what the student was doing, spoke sternly: "You cannot fix a machine by just power-cycling it with no understanding of what is going wrong."

        Knight turned the machine off and on.

        The machine worked.

      • With a PDP-8, you could always toggle in the boot loader, read in the BIN boot loader, and load your program regardless of the state of core.
    • by godrik ( 1287354 ) on Friday July 19, 2024 @10:57AM (#64638027)

      A long time ago, I was working as tech support for an ISP. This was the era of USB DSL modems. Everything in the driver installer and in the configuration of the TCP/IP stack was subject to race conditions.
      There were problems where the solution was to reinstall the drivers in a loop until the race conditions resolved in the correct way. I remember doing that with a customer on the phone 8 times. The guy was losing it after the 4th time. But on the 8th it worked.

      Crappy software is crappy. And it is all closed source, so you can't fix it yourself.

      • > Crappy software is crappy. And it is all closed source, so you can't fix it yourself.

        This belongs on a plaque.

  • by OrangeTide ( 124937 ) on Friday July 19, 2024 @10:31AM (#64637859) Homepage Journal

    and again. and again. and again ...

  • This is huge (Score:5, Insightful)

    by ArchieBunker ( 132337 ) on Friday July 19, 2024 @10:36AM (#64637881)

    Thousands of cancelled flights and massive financial disruptions.

    Will anyone learn anything?

    Unlikely.

    • Re:This is huge (Score:4, Interesting)

      by 14erCleaner ( 745600 ) <FourteenerCleaner@yahoo.com> on Friday July 19, 2024 @10:41AM (#64637923) Homepage Journal
      The obvious lesson is to NOT allow security updates.
      • by Megane ( 129182 )
        ...or at least not allow automatic updates. It also helps not to install them on every computer you have, at the same moment, without seeing that it works on a few "canary" computers before going wild and pushing it to every point-of-sale register / terminal in your company at the same time.
      • Re:This is huge (Score:5, Interesting)

        by MachineShedFred ( 621896 ) on Friday July 19, 2024 @11:35AM (#64638199) Journal

        Or set up a proper canary test environment that you deploy updates to first, which has been best-practice for at least 10 years now.
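The canary idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `staged_rollout`, `apply_update`, and `canary_count` are hypothetical names, not part of any CrowdStrike or Microsoft tooling, and a real deployment would also watch the canaries for a while before continuing.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    hosts: Iterable[str],
    apply_update: Callable[[str], bool],
    canary_count: int = 2,
) -> List[str]:
    """Apply an update to a few canary hosts first, then to the rest.

    `apply_update` is a hypothetical callback that returns True when the
    host came back healthy after the update. Returns the hosts that were
    updated successfully. If any canary fails, the rollout stops and no
    non-canary host is touched.
    """
    all_hosts = list(hosts)
    canaries = all_hosts[:canary_count]
    rest = all_hosts[canary_count:]

    updated: List[str] = []
    for host in canaries:
        if not apply_update(host):
            # A canary failed: halt before the update reaches the wider fleet.
            return updated
        updated.append(host)

    # All canaries survived, so continue with the remaining hosts.
    for host in rest:
        if apply_update(host):
            updated.append(host)
    return updated
```

The point of the design is that a bad update costs at most `canary_count` machines instead of every register and terminal at once.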

      • back when I was an IT guy at a private uni up in Massachusetts, we would first install patches on the test network and see what happened before then allowing the patches on the 'real' network. Yes, we actually had two sets of servers to do this with. ( we would rotate which net was 'real' and which was 'test', whenever the patches/installs were deemed 'good' we would switch over to the patched network and call that one 'real' and the previous 'real' would become 'test')

        I shit you not.

    • Thousands of cancelled flights and massive financial disruptions.

      Will anyone learn anything?

      Unlikely.

      Alternate possibility: We learn something. We decide it is too risky to run cyber security audit software, too risky to run antivirus, too risky to run management software with escalated privileges.

      We simply set up every computer with a local account and a local administrator, relying on the user to keep the system up to date and secure.

      How often do you suppose they will have to cancel flights then?

      I'm not being funny here, I'm legit wondering what you hope we learn here. The problem here stems from the very

  • by organgtool ( 966989 ) on Friday July 19, 2024 @10:36AM (#64637883)
    Tech Support: Did you try restarting it?
    Me: Of course!
    Tech Support: Did you try restarting it 15 times?
    Me: WTF?
    • by chill ( 34294 )

      In all fairness, 50% of the responses will probably be "Of course!" to that question as well.

  • Windows is too stupid to say "well this is clearly the faulting module. Let's disable it." That is not a feature of Windows for various reasons. Please tell me there is no secret trigger in Windows where it hits a certain integer on failed reboot count and goes "fine, I'll delete it," because I've certainly never heard of that.
    • by Junta ( 36770 )

      Could be a race between getting updated and the crash.

      I know that Windows for its self-updates has a rollback if boot can't succeed. Guessing third party kernel modules don't get that benefit though.

    • Actually you reminded me of recent problems where the performance of Windows goes to shite and the only interesting clue that I've found so far is an "Efficiency" annotation in "System" and some other program. Maybe someone around here has figured it out already? It might involve Firefox or other "enemy" software running in the dragon's den of Microsoft...

      My theory is that Windows has detected some kind of performance problem and the System and the affected program have been triggered to run some kind of re

    • In kernel mode things are not as easy as you may think.
  • For every single machine? Is this real life?

    • Re: (Score:2, Funny)

      by Anonymous Coward

      For every single machine? Is this real life?

      Is this just fantasy?
      Caught in a landslide, no escape from reality...

    • Re:Seriously? (Score:5, Informative)

      by MachineShedFred ( 621896 ) on Friday July 19, 2024 @11:36AM (#64638211) Journal

      What is missing from the summary in a rush to get posted so people can shit on Microsoft: this is recommended for Azure VMs. Not physical hardware.

      • by HiThere ( 15173 )

        Is it only the VMs that were affected? Because that's not what other reports sound like. And if it's not, are they just being hung out to dry?

        • No, anything running Crowdstrike and Windows with an active internet connection was affected. But for non-VMs you can get into safe mode and delete / move a single file and reboot.

          C:\Windows\System32\Drivers\CrowdStrike\C-00000291*.sys - get that out of there and you're fine.
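For illustration only, the manual step above amounts to a glob-and-delete. The sketch below assumes the directory and file pattern exactly as quoted in the comment; the function name `remove_matching_files` and the `dry_run` flag are hypothetical, and a Python interpreter will usually not be available in safe mode, so treat this as a description of the logic rather than a recovery tool.

```python
from pathlib import Path
from typing import List

# Directory and pattern exactly as quoted in the comment above.
DEFAULT_DIR = Path(r"C:\Windows\System32\Drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_matching_files(directory: Path = DEFAULT_DIR, dry_run: bool = True) -> List[Path]:
    """Return the files in `directory` matching PATTERN.

    With dry_run=True (the default) nothing is deleted, so the matches can
    be reviewed first. With dry_run=False each matching file is removed.
    """
    matches = sorted(directory.glob(PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling `remove_matching_files()` first and inspecting the returned list, then repeating with `dry_run=False`, keeps the destructive step explicit.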

          • And once you've done that, does the program work properly without it or does it download a new copy when you run it? If it runs OK without it, why not simply delete it preemptively and be done with it?
        • No, only VMs can be fixed like that due to some Azure specific failsafes.

          If you're running a normal version of Windows you have alternate options, such as logging in as administrator and deleting a single file. If you can't do that, complain to the person who owns your computer - since it's clearly not under your control.

  • Do we need blood from a virgin and hair from a toad for this to work? Should we wait for full moon before attempting it?

  • I just left it off and ordered an adult computer system.
  • Comment removed based on user account deletion
    • How does it work is a good question.

      It doesn't get you past a security lockout, though, because it's not a feature which lets you into recovery when you're not allowed or whatever.

      The theory that Windows eventually automatically goes to a snapshot is a rational one, I don't know if it's true but it sounds reasonable to me based on what I know and have experienced about/with Windows in the past.

      On the other hand, I've been rebooting for almost an hour now, I don't know how many attempts it's been but it has

      • Comment removed based on user account deletion
        • by HiThere ( 15173 )

          You're right, but notice that a large part of the problem is monoculture. Even BSD will occasionally have a problem. (I think the Morris worm started on a Unix system.) And this one isn't affecting Apple or Linux devices...but there will be one analogous. (Though Linux itself is varied enough to limit the effects.)

          One should always be wary of monocultures. Sometimes they're REALLY necessary (communication is the only one that comes to mind), but ALWAYS be wary of them.

  • Two blue screens per power on.

    The nostalgia is real! I haven't seen a blue screen in a while... I stopped using Windows on my own machines :D

  • Comment removed (Score:4, Interesting)

    by account_deleted ( 4530225 ) on Friday July 19, 2024 @10:49AM (#64637975)
    Comment removed based on user account deletion
    • by HiThere ( 15173 )

      Sorry, but it's easy to design systems with built in race conditions. People have to work to avoid it. If you want to ensure a race condition at boot time, have a variable depend on, e.g., the clock that each process sets separately.

    • Absolutely false. If a system detects a fault in an automation it should not repeat that automation ad-infinitum. In an ideal world you end up with a recovery system, e.g. Azure rolling back the VM to a prior image.

      Having 100% the same behaviour in both good *and* detectable bad scenarios is a recipe for FUBARing your PC.
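The "don't repeat a failing automation forever" argument can be sketched as a bounded retry with a fallback. This is a minimal sketch, not how Windows or Azure actually implement recovery: `try_boot`, `recover`, and `max_attempts` are hypothetical stand-ins.

```python
from typing import Callable


def boot_with_fallback(
    try_boot: Callable[[], bool],
    recover: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Attempt a normal boot a bounded number of times, then fall back.

    `try_boot` is a hypothetical callback returning True on a successful
    boot. `recover` stands in for whatever recovery path exists, such as
    rolling back to a prior image.
    """
    for _ in range(max_attempts):
        if try_boot():
            return "booted"
    # Every attempt failed the same way, so stop repeating and change behaviour.
    recover()
    return "recovered"
```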

  • but do not do 20 as then the main power goes down taking the raptor fences with it.

    • but do not do 20 as then the main power goes down taking the raptor fences with it.

      Not to mention it will summon the bane of schoolboys everywhere - "Bloody Mary"

  • by Gavino ( 560149 ) on Friday July 19, 2024 @10:53AM (#64638007)
    Microsoft. Society is doomed. (OK maybe Apple or nVIDIA on any given day, but you get the point).

    Windows problems - reboot.
    Linux problems - be root. :)
  • ...this is what we used to do daily. Where is the difference between running Windows and being affected by CrowdStrike BSOD ?
  • People dying because of this would be a good reason to sue Microsoft into oblivion.

    Also, I wonder how the financial institutions affected by this are going to feel.

    • It's not really Microsoft's fault though. This was caused by Crowdstrike. Also, don't these massive companies have a test lab for patches BEFORE they roll out to production? Seems like failures on numerous fronts.

      • one would presume, logically, that they would test out patches and such prior to putting them on 'live' systems... but we all know how much finance bros don't like to spend money on IT.
        • 100% agree! This seems like a failure to test patches before pushing to production. If these multi-national corporations can't be bothered to protect themselves and want a 3rd party vendor to cause these problems, that's on them as much as the shoddy vendor.

          As you said, it's companies being cheap on IT. Instead of seeing IT as an investment into productivity, they see IT as an expense that needs to be cut.

          This is a form of karma for the world.

          • It's the way they do accounting these days, if you produce something that pays the company, you are loved, if you are a cost to the company, you are hated and management is constantly looking for a way to cut your budget.

            Example, you are a company that makes widgets. Production is both hated (as it costs money and has workers making stuff) and loved a little as the widgets are sold (Yeah! Sales is loved! Marketing is loved!) But the workers making shit? We gotta find a way to fire them all! I have it, Outs

          • by DarkOx ( 621550 )

            basic CYA - If your responsibility is IT security and the company gets pwnd/ransomware'd etc because your policies delayed pushing the latest CS definitions - YOU get blamed; if a bunch of boxes crash because CS pushes a bad update, CS gets blamed.

            Take this mentality in a bunch of mid level managers and roll up to CIOs across a bunch of large Enterprises and here we are.

            Security boils down to people and process - which boils down to culture. What we see here today is our culture.

      • It's not really Microsoft's fault though. This was caused by Crowdstrike. Also, don't these massive companies have a test lab for patches BEFORE they roll out to production? Seems like failures on numerous fronts.

        We are lucky to have Microsoft, the one and only OS that is never at fault for anything. It does take a brave person to stand staunchly behind Microsoft, apparently not mentioning the ridiculous fix. That makes Microsoft look like the most amateurish outfit that ever blessed humanity with their perfection.

        A pity that it is impossible to bake in the protection needed instead of relying on an outside agency, to do something Microsoft isn't up to doing. Or maybe not capable.

        • Hey, don't look at me as an MS defender. I run Linux to avoid them. I'm just saying, this isn't all on Microsoft. They are part of the problem but they are not the entire problem.

          Others have pointed out that Crowdstrike does have a Linux option and not testing a patch before rolling it out to Linux could also cause this issue. That's why test labs should exist before things are pushed to production.

  • Switched on the news this morning and heard about the ClownStrike FUBAR. Switched around between a couple of newscasts. The only station that reported "does not affect Linux" was our (much hated on Slashdot) local right wing station.

  • Someone upthread noted that the error was preserved through reboots - capacitor. There's a simple answer to that, which I learned from an on-site tech call: unplug your computer, *then* hit the start button, until there's no flash of light. You've now discharged the cap, and can plug it back in and boot.

  • I've been using Windows since the 3.0 days in 1990. Back then you had to reboot any time you installed any piece of hardware, or any significant piece of software. Microsoft promised us they would work on reducing the number of reboots needed. But really they were just saving them all up so you can spend them today, on Reboot Friday. Happy rebooting everybody!

  • Verified (Score:5, Informative)

    by chill ( 34294 ) on Friday July 19, 2024 @11:42AM (#64638231) Journal

    My organization has several virtual desktop instances (VDI) running in AWS Workspaces. They got caught in a boot loop and just kept rebooting during the middle of the night, according to logs.

    Around 5:30 a.m., after CrowdStrike started pushing the fix (revert), they started getting lucky and getting the update before crashing. One last reboot and they were stable.

    This also happened with several AWS EC2 instances. A couple of physical machines needed manual intervention, as did any laptop that was impacted. Laptops that were offline or sleeping when this went down weren't impacted, as long as they weren't powered on until after 5:30 a.m., when the fix was being pushed.

    The CrowdStrike agent process starts very early in the boot sequence, and this "fix" is just it phoning home and getting an update before it crashed again.
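If each boot really is a race between fetching the update and crashing, the "reboot up to 15 times" advice has simple arithmetic behind it. The sketch below assumes every boot wins the race independently with the same probability `p_per_boot`; that probability is a made-up parameter, since nothing in this thread or the Microsoft statement gives a real figure.

```python
def p_recovered_within(n_reboots: int, p_per_boot: float) -> float:
    """Probability that at least one of n independent boots wins the race."""
    return 1.0 - (1.0 - p_per_boot) ** n_reboots


def reboots_needed(p_per_boot: float, confidence: float) -> int:
    """Smallest n for which p_recovered_within(n, p_per_boot) >= confidence."""
    if not 0.0 < p_per_boot <= 1.0:
        raise ValueError("p_per_boot must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    n = 0
    p_all_failed = 1.0
    while 1.0 - p_all_failed < confidence:
        # One more boot: all boots so far failed AND this one fails too.
        p_all_failed *= 1.0 - p_per_boot
        n += 1
    return n
```

Under this model the expected number of boots until success is 1 / p_per_boot, so a machine that needed many reboots suggests a fairly small per-boot chance of getting the update in time.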

  • Completely unacceptable for a driver in use by so many 'mission critical' installs. This is why competition is needed, speaking of which is there another software company that these victims could switch to?

    If there was a corporate death penalty this should be it. But in a week or two the Crowdstrike company will be back to normal shareholder operations.
  • Not just that - you have to sacrifice at least one family member, while Chanting slowly:

    Miiiii-crooooo-sofffft! 3000 times, which will immediately summon the bane of our childhoods Bloody Mary.

    Then you will strip naked, and twirl around until you lose your balance.

    All of this recorded on webcam and sent to Microsoft so that the fix will be manifested by Beelzebub, their security officer.

    Might as well - Microsoft is a joke, and rebooting 15 times just shows how much of a joke they are. I can hardl

  • by roc97007 ( 608802 ) on Friday July 19, 2024 @12:06PM (#64638315) Journal

    I have a hazy memory of one of the first BOFH segments, where the BO tells a lUser to turn his computer on and off fifteen times or so. Over the phone he hears the power supply explode and goes back to his Lemmings game happy in the knowledge of a job well done.

  • If security software is crashing itself when the PC is booted up how secure can the product be? If the design and QA is such that obvious egregious flaws are able to make the front page of the NYT why should anyone expect the kernel driver and associated software not to be riddled with large numbers of exploitable bugs?

  • simply torch the tumor
