Intel IT Hardware

No Fix For Intel's Crashing 13th and 14th Gen CPUs - Any Damage is Permanent

An anonymous reader shares a report: On Monday, it initially seemed like the beginning of the end for Intel's desktop CPU instability woes -- the company confirmed a patch is coming in mid-August that should address the "root cause" of exposure to elevated voltage. But if your 13th or 14th Gen Intel Core processor is already crashing, that patch apparently won't fix it.

Citing unnamed sources, Tom's Hardware reports that any degradation of the processor is irreversible, and an Intel spokesperson did not deny that when we asked. Intel is "confident" the patch will keep it from happening in the first place. But if your defective CPU has been damaged, your best option is to replace it instead of tweaking BIOS settings to try and alleviate the problems.

And, Intel confirms, too-high voltages aren't the only reason some of these chips are failing. Intel spokesperson Thomas Hannaford confirms it's a primary cause, but the company is still investigating. Intel community manager Lex Hoyos also revealed some instability reports can be traced back to an oxidization manufacturing issue that was fixed at an unspecified date last year.

  • by Vlad_the_Inhaler ( 32958 ) on Friday July 26, 2024 @12:42PM (#64657984)

    Hardware damage is permanent - once your processor is cooked, it's cooked. Did anyone really expect anything else?
    The good news is that this is Intel's problem and those affected should presumably be able to expect Intel to replace their processors.

  • They need to replace the CPU for free, don't they? What's the warranty term?

    • and they need to pay for the new part + install cost.

    • Intel says standard warranty is 3 years on most processors [intel.com]. So nearly all consumers are covered; however, enterprise customers often have enhanced support contracts, meaning extended warranties and better service. From Level1Techs' discussions with some enterprise customers, they have been dealing with this problem for a while now and they are not happy. OEMs like Dell may be frustrated with Intel behind the scenes, too.
      • by HiThere ( 15173 )

        Covered in what way? Does this mean a replacement of the exact same model, with the exact same defect?

        • Unfortunately this is the problem enterprise customers have been dealing with. They can get replacement CPUs but the replacements will need to be replaced after a while.
      • It seems that OEMs like Dell are less/not affected by this because they operate their motherboards at sensible rates and often have their own sensors that don't just trust the CPU.

        This is an interplay between the 'default' overclock in many UEFI nowadays, but from what I can see especially a problem on "extreme gaming" motherboards from the likes of Asus which have some awful settings that edge or exceed recommended specs to boost performance at the cost of power consump

        • It seems that OEMs like Dell are less/not affected by this because they operate their motherboards at sensible rates and often have their own sensors that don't just trust the CPU.

          1) OEMs have sold these CPUs to consumers who may or may not overclock them in their Alienware line. 2) According to Level1Techs when talking to a data center service provider, [level1techs.com] they were getting 50% failure on server machines that were not overclocked. For Dell enterprise and consumer customers, that may be a problem.

          This is an interplay between the 'default' overclock in many UEFI nowadays, but from what I can see especially a problem on "extreme gaming" motherboards from the likes of Asus which have some awful settings that edge or exceed recommended specs to boost performance at the cost of power consumption.

          Again, Level1Techs when talking to a data center service provider found they have zero capability to overclock server motherboards.

          Yes, it is something that Intel will have to fix, but the fact the motherboard/UEFI doesn't have safety settings of its own in many cases to notice the CPU is going out of spec or having the system out of spec as a "performance boosting default" is a problem in and of itself, something I didn't think was possible since the 486 era when those settings were literally configured by jumpers on the motherboard.

          Again your premise is that all of these problems are just the

  • by xack ( 5304745 ) on Friday July 26, 2024 @12:58PM (#64658024)
    All the gamers, AI trainers, and crypto miners wanting more, higher benchmarks and fewer nanometers in their chips mean that chips are going to be run at the most clocked settings by default, causing the inevitable decline. Intel is already rolling out lots of process nodes quite fast to catch up from several years of 14nm, so there are going to be more problems. Hopefully the fallout means more people will look at ARM and AMD CPUs.
    • The subset of gamers that are interested in having the fastest possible gear, regardless of how much comical overkill is involved, have largely moved on to graphics cards. You won't wow anybody by overclocking the CPU. What you desperately need is a machine that can run 640x480 resolution at 300 fps.

    • Hopefully the fallout means more people will look at ARM and AMD CPUs.

      or OpenPOWER.

    • Hawk Point and Strix Point are more efficient than Apple M ARM. X86, whatever else, ARM is such a small portion of silicon these days. I do want to see RISC-V take over though. Closed ecosystems like x86 and ARM are why the industry is so fucked.
    • by Guspaz ( 556486 )

      Is Intel really rolling out the process nodes quickly, though? Their desktop chips are still on the same 10nm (rebranded to "Intel 7") process node they launched six years ago in 2018, albeit a newer generation of it. Only their latest mobile processors are on their 7nm process (rebranded "Intel 4"), and 71% of the silicon in those processors is actually made by TSMC.

      Intel rebranding 10nm to "Intel 7" made sense because it really did have similar density to TSMC's 7nm nodes. But "Intel 4" has lower density

      • Those nanometer specs are all bullshit anyway, nobody is operating gate or metal at 2 or 3 or 7 or 10 nm pitch, the industry is an order of magnitude away from that.

        • by Guspaz ( 556486 )

          Sure, but we can use them to group equivalent process nodes based on transistor density, and by that metric, Intel is still way behind and hasn't been rolling out improvements quickly. They're *scheduled* to roll out improvements quickly, but it hasn't happened yet, and there seem to already be signs that their schedule is slipping.

          • by guruevi ( 827432 )

            No, you can't, the TSMC 7 node is not the same as the Intel 7 node, Intel's 7nm node is 10nm smaller in most pitch measurements than Samsung/TSMC 7nm node. Intel 10nm node is similar to TSMC's 7nm node, TSMC's 3nm node is similar to Intel's 7nm node and since then the whole nm metric has been thrown overboard anyway.

            In all available metrics, Intel is slightly ahead of TSMC when it comes to pitch and density, they all use the same EUV products from the same vendor(s) they do keep track with each other in mo

    • AMD doesn't seem to be having problems keeping up. Nor do Apple nor Qualcomm. Stop making excuses for Intel's blunder.

    • All the gamers, AI trainers, and crypto miners wanting more, higher benchmarks and fewer nanometers in their chips mean that chips are going to be run at the most clocked settings by default, causing the inevitable decline. Intel is already rolling out lots of process nodes quite fast to catch up from several years of 14nm, so there are going to be more problems. Hopefully the fallout means more people will look at ARM and AMD CPUs.

      Higher benchmarks (and maybe even heat generated / power usage) maybe, but I doubt most of them are familiar with whatever nodes are being used to make the CPUs.

    • Hopefully the fallout means more people will look at ARM and AMD CPUs.

      Intel is essentially a managed monopoly at this point. Expect nothing good or reliable from them. All products will be effected/affected by the corruption. Management is entirely insulated from the real world. There is no fixing it. Good bye Intel.

  • and I expect the lawyers will start sending opt-out postcards in 9 months' time...
  • by Talon0ne ( 10115958 ) on Friday July 26, 2024 @01:00PM (#64658032)

    Totally safe to let motherboard manufacturers overclock their GamerZ MOBOs... I lost an AMD 7950X3D at stock voltage, no overclocking. "3D" technology I don't think is all it's cracked up to be. It has no good way to cool the bottom layers.

    • Re: (Score:3, Informative)

      Except in this case, people have demonstrated these CPUs are failing when they were not overclocked. Level1Techs looked at data from server CPUs whose MBs have no options to overclock. In some cases the failure rate was 50% for environments where reliability was the priority over performance. One data center provider no longer sells systems with Intel and recommends their current customers switch over to AMD-based systems as soon as possible.
      • by JackAxe ( 689361 ) on Friday July 26, 2024 @02:26PM (#64658294)
        The server motherboards were not configured for reliability and are just as borked as some of the consumer boards. One example was a W680 on default settings pushing 253 watts to a 35-watt CPU.

        JayzTwoCents mentions this towards the end of the video:
        https://www.youtube.com/watch?... [youtube.com]
        • Buildzoid went over the server mobo configs and remarked at how utterly stock they were, but especially using the default 1.1 load line which will overvolt a CPU by default.
        • by tlhIngan ( 30335 ) <slashdot@worf.ERDOSnet minus math_god> on Friday July 26, 2024 @03:58PM (#64658540)

          The problem was the chip was demanding too much power. It's not about overclocking. The CPU says it needs X watts of power - the motherboard and chipset comply with the request by sending it more power (usually by activating more VRMs).

          Modern CPUs, when they're lightly loaded, request little power, and motherboards often turn off VRMs that aren't needed to supply the load to keep power and cooling demands lower and efficiency up.

          As the chip is required to do more things, the power demand increases, so it requests more power and the motherboard activates more VRMs so they're able to supply the power the chip needs.

          If the chip requests too much power - too much voltage, for example - that will damage it. But this will apply to both servers and desktops as both will blindly give the chip the voltage it says it needs.

          Chances are if your chip wasn't used very heavily it probably was OK as it didn't make heavy demands. Of course, that's kind of the point of buying the fastest chips around, so they were probably subjected to high loads that caused higher voltages to be requested by the chip that damaged them. The ones that survived probably hadn't been loaded down enough to incur damage.

          • Sorry, that doesn't make much sense to me - electrical engineer, analogue mostly. Perhaps you're explaining in dumbed down wording for non specialists... Do modern CPUs request an amount of power (W) or, as things used to be, a voltage (V) on a set of power lines, from which the current flows into the processor as much as the circuitry will consume? The former seems to me to be a bit futuristic, but it's technically doable, I just don't see the point - it implies an integrated management engine with calcula
            • There is some functionality from some CPUs to request more power if it predicts it will need more power. However the power draw is limited to what is available on the hardware and what the chip can handle. A CPU cannot request 1.1 gigawatts of power for example but it can ask for more. Currently the max power draw is 336W on EPS 12V cable and 150W on the ATX cable. However not all CPUs (like budget CPUs) will/can request 486W of total power and some budget MBs do not even have the EPS 12V plug. AMD chips h

              • Thank you for your answer. At the risk of sounding blunt, I'll take that as a "maybe" with regards to the CPU having an engine for current and upcoming power usage. The CPU doesn't actually run on 12V, the motherboard has to provide the right voltage in a stable manner, i.e. without that voltage dropping much if the load becomes very low (due to many transistors switching at the same time and or more switching occurring in a certain timeframe). Typical voltage range is around 1V, since that is what highly i
    • > no good way to cool the bottom layers.

      At least AMD is moving to finally separate power and logic which everybody has been avoiding solving the engineering on for way too long. It's good news on the thermal-management front.

      Sometimes you bite the bullet, sometimes you deal with the consequences of not biting the bullet.

    • The X3D chip failures were mostly due to an AGESA bug. AMD fixed the bug and compensated anyone for chips that burned out due to abnormally-high vSoC voltages.

  • It's always been hardware failure because they messed up the engineering of the hardware.

  • Don't forget that detail.

    Yes, their compliance/certification testing needs additional work. Seems not that hard to look for voltages that are too high, but maybe the dynamic stuff is the issue? Running at the higher valid voltages for too long? Which would be harder to easily check for, but not impossible.

    • by Z00L00K ( 682162 )

      But not necessarily all of them; they have been looking into limiting that offer to only the oxidization issue.

    • Watch the last few months of buildzoid rants on this topic. Intel has loose documentation, loose enforcement of default settings, and mobo makers are cowboys when it comes to voltages, current, load lines, etc.
      AMD on the other hand is an iron fist with its mobo partners, sets strict limits for stock behavior, sets upper bounds for PBO presets, etc. Look at how fast the 7800X3D SOC voltage problem was caught, fixed, and deployed.
  • ...to push closer and closer to the limits of physics and engineers' ability to manage complexity, these kinds of problems will continue.
    Anyone who demands the highest reliability should avoid the newest and fastest

    • Nope. As someone who works in the area (I do digital and mixed-signal designs, from 16 - 450nm), EDA tools have never been better at catching EM/IR and other issues. This seems to be a case of poor engineering.
  • by couchslug ( 175151 ) on Friday July 26, 2024 @03:15PM (#64658424)

    Customers should have the option of a replacement or refund for the defective product.

    They're still stuck with motherboards designed for a defective product. This is like buying a vehicle with only one engine option and that engine is defective by design.

    • What I've been reading about this problem, however, is that you can manually undervolt these processors, assuming they're undamaged, and they'll be fine. Is that true? In which case someone gets a replacement, undervolts it, and it'd be fine.

      Also isn't Intel working on a microcode update and/or BIOS update that'll permanently prevent the problem from happening?

      • by HiThere ( 15173 )

        Has Intel made an official announcement of what their policy is? If not, I don't think you can trust the rumors and random leaks.

      • by evanh ( 627108 )

        The patch is supposed to be a fix, yes. But that only applies to undamaged CPUs. Which, since the bug is doing permanent damage, is only a certainty for a brand new CPU.

        • And that only applies to CPUs that don't have the via oxidation problem. Intel won't say which units are affected, nor will they say how many. Those CPUs are bad out of the box!

          According to some of Wendel's sources at a top three OEM, 10-25% (conservatively) of all 13th gen Raptor Lake CPUs they received were so bad that they couldn't be sold to customers. And that's just the 13th gen ones.

      • If you have to undervolt, you may not get the performance stated on the box.

        You may as well have gotten a presumably cheaper chip / system if you are going to get a lesser spec CPU / system.

  • No way I'm buying a used Intel 13th or 14th Gen CPU system. Every seller will say it's a good CPU...

  • I mean, thanks, Gamers Nexus
  • The best Ryzen chips you can buy right now beat the best currently available Intel chips in just about anything you can throw at them and on performance per dollar, Ryzen seems to be beating Intel at most price points (especially when you factor in power consumption and running costs) so why would you buy Intel instead of AMD?

    When I eventually replace my Core i5-9400F there is every chance I will switch to team red.
