Bug Hardware

Nvidia 55nm Parts Are Bad Too (372 comments)

JagsLive sends in a story (in somewhat inflammatory prose) from The Inquirer, which links to many others; they have been following developments in the alleged NVidia quality "fiasco" for some time. "Hot on the heels of its denials that anything is wrong with the G92 and G94s comes another PCN [Product Change Notification] that shows the G92s and G92b are being changed for no reason. Yup, the problems that are plaguing G84 and G86 are the same that affect seemingly all 65nm and now 55nm NVidia parts ... It is hard to overstate how bad this is. Basically every 65nm and 55nm NVidia part appears to be defective ... We are hearing of early failure rates in the teens percent for 8800GTs and far higher for 9600GTs ... To make matters worse, NVidia has a mound of unsold defective parts that they are going to bleed out into the channel along side of the (hopefully) fixed parts. As a buyer, you have no way of knowing which one you are getting ... Until NVidia comes fully clean on this fiasco, lists all the defective parts, and orders boxes clearly marked, you can't say anything other than just avoid them. Then again, since doing the right thing would likely bankrupt them, we wouldn't hold your breath for it to happen."
This discussion has been archived. No new comments can be posted.

  • Charlie Demerjian (Score:5, Informative)

    by Qhartb ( 1311541 ) on Friday August 29, 2008 @12:40PM (#24796709)
    I stopped reading when I got to "By Charlie Demerjian."

    Seriously, this guy is to NVIDIA as Jack Thompson is to video games. It's just not as common knowledge that you shouldn't take him seriously.
  • by Curien ( 267780 ) on Friday August 29, 2008 @12:50PM (#24796915)

    They say failure rates are "in the teens percent". Figure 20%, just for kicks. That means the chance of at least one of two cards failing is 1 - (1 - .2)(1 - .2) = 36%.

    For some reason that I don't understand, the vast majority of people have innate misconceptions of the rules of probability.
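The arithmetic in this comment can be checked with a short sketch. It assumes card failures are independent and uses the commenter's illustrative 20% per-card rate, which is not a measured figure. The function name `p_at_least_one_failure` is hypothetical.

```python
def p_at_least_one_failure(p_fail: float, n_cards: int) -> float:
    """Probability that at least one of n_cards independent cards fails.

    Uses the complement rule: 1 - P(no card fails).
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if n_cards < 0:
        raise ValueError("n_cards must be non-negative")
    return 1.0 - (1.0 - p_fail) ** n_cards


if __name__ == "__main__":
    # Two cards at the assumed 20% failure rate each.
    print(f"{p_at_least_one_failure(0.20, 2):.0%}")
```

With two cards the result is below 50%, so "at least one fails" remains less likely than not, but it is well above the single-card rate.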

  • Re:8600GT? (Score:4, Informative)

    by Hatta ( 162192 ) on Friday August 29, 2008 @12:51PM (#24796933) Journal

    I've got one too and was wondering exactly the same thing. From what I can tell the 8600GT has an 80nm [techpowerup.com] process size, so it should be safe. Which is good, I really like this card.

  • by Cheeko ( 165493 ) on Friday August 29, 2008 @12:53PM (#24796961) Homepage Journal

    Well, for starters, they said it's a failure rate in the teens. Even with 2 cards, it is still less likely than not that one would fail.

    Also the 8800 cards have been out for a while. The impression I get is that this is a newer issue with the cards, so initial 8800 cards might not be an issue.

  • by XanC ( 644172 ) on Friday August 29, 2008 @12:58PM (#24797035)

    I would say it's because lead-based solder actually works properly, but according to this story that doesn't seem likely to be their motivation.

  • by AcidPenguin9873 ( 911493 ) on Friday August 29, 2008 @12:58PM (#24797047)
    The person who submitted this story to Slashdot left out an important link [theinquirer.net] on that text from the original Inquirer article (linked again here [theinquirer.net] for your convenience). In the original story, that sentence reads:

    Then again, since doing the right thing would likely bankrupt them [theinquirer.net], we wouldn't hold your breath for it to happen.

    At that link, you'll find The Inquirer's (however flimsy and speculative) financial analysis of a full-scale Nvidia recall of the bad parts.

    The Inquirer doesn't and has never claimed to be a fair and balanced news source, so they are free to put these sorts of quips on their stories. People there are pretty knowledgeable, and appear to have connections and sources in the industry, which is why people keep reading The Inquirer and don't really complain about stuff like that.

  • RoHS has exceptions for very fine pitch stuff, IIRC.

  • by mlwmohawk ( 801821 ) on Friday August 29, 2008 @01:27PM (#24797509)

    The Intel 486SX was a defective 486DX whose numeric processor was dead or not working.

    Most very very large scale integrated chips have defects. Depending on the nature of the defect, they simply categorize the part differently.

    A chip is not fast enough for a high speed gaming system? Use it in an embedded device.

    Buy it, if it fails, return it. Just because nVidia has issues you know about, don't think for an instant that ATI doesn't.

  • by bugfreezer ( 1088369 ) on Friday August 29, 2008 @01:40PM (#24797733)
    I'm not at all sure your criticism is based on the correct quotation source; cf: http://en.wikiquote.org/wiki/Friedrich_Nietzsche#The_Gay_Science_.281882.29 [wikiquote.org] Now back to nvidia....
  • by SomeJoel ( 1061138 ) on Friday August 29, 2008 @01:47PM (#24797837)

    Yourself included, since card failure is an independent event. The chance of any card failing is - tadaa, 20%. Just like if I have 3 dice, the chance of rolling a number is 1/6. If I roll it again, the chance is still 1/6. It will always be 1/6.

    I'm not sure what you are talking about with this unrelated dice example, but the GP is correct. The chance of neither failing is .8 * .8 = .64. The chance of at least one failing is therefore 1 - .64 = .36. 36%, as the guy said. Where did you go to school again? For your dice example, here is a more analogous one: If I roll a six-sided die 3 times, what are the chances it will come up "6" at least once? Chance of it not being six at all = 5/6 * 5/6 * 5/6 = 125/216 (~.58). The chance of it coming up 6 at least once is ~(1 - .58) or roughly 42%.
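The dice figure in the reply above can be reproduced exactly with a small sketch. It assumes a fair die and independent rolls; the function name `p_at_least_one_hit` is hypothetical.

```python
from fractions import Fraction


def p_at_least_one_hit(rolls: int, sides: int = 6) -> Fraction:
    """Exact probability that a given face shows at least once in `rolls` rolls."""
    if rolls < 0 or sides < 1:
        raise ValueError("rolls must be >= 0 and sides must be >= 1")
    p_miss = Fraction(sides - 1, sides)  # chance one roll is NOT the target face
    return 1 - p_miss ** rolls           # complement of missing every time


if __name__ == "__main__":
    p = p_at_least_one_hit(3)
    print(p, f"{float(p):.2%}")
```

Using `Fraction` keeps the result exact (the complement of 125/216), so the rounding to roughly 42% happens only at display time.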

  • Re:Lead free (Score:2, Informative)

    by sexconker ( 1179573 ) on Friday August 29, 2008 @01:49PM (#24797863)

    They have less lead, but they still have lead.

  • by djrogers ( 153854 ) on Friday August 29, 2008 @01:49PM (#24797865)
    Raw eggs in the US run about 1:20,000-1:40,000 chance of salmonella, and a healthy adult is capable of fighting off the amount of salmonella in the average tainted egg.

    WRT beef though, salmonella poisoning by beef is almost completely unheard of - chicken yes, beef no. Where this whack job got his numbers from is anyone's guess but they are wrong.

  • by kesuki ( 321456 ) on Friday August 29, 2008 @02:14PM (#24798231) Journal

    actually, wall street hasn't yet factored in the possibility of 20% of nvidia's high to mid end chips being total rejects.

    so a betting man would watch the stock closely for the next few weeks, then when it bottoms buy massive quantities of stock.

    this is the kind of a massive chip recall scenario which makes nvidia a likely buyout target by say Intel (everyone likes buying a company at a fraction of the value of the company, which is why M$ worked so hard to try and take over yahoo)

    for those saying it's only the chip that is the problem, it's very expensive to remove and replace a chip, because normally the chips are all factory produced on a robotic assembly line, and they're only designed to put the chips on, not take them off. you need people to remove chips, making a massive recall a very expensive option. then there are those who will want their chips recalled, even if the chip was working fine for months, and might be a random lucky working chip.

  • A wolf! A wolf! (Score:4, Informative)

    by ozbird ( 127571 ) on Friday August 29, 2008 @02:35PM (#24798507)
    Charlie at The Inquirer has no credibility when it comes to nVidia.

    From TFA, nVidia is changing from high lead to eutectic (tin) solder - for RoHS compliance - and has issued a PCN to that effect. Charlie has latched onto this as "proof" of his claim that all nVidia chips are faulty and overheat.

    What Charlie doesn't explain is how switching from high-lead solder (5/95 Sn/Pb) to eutectic solder (63/37 Sn/Pb) - which has the lowest melting point of all tin-lead solders - is supposed to help if the chips are overheating. Nor does he explain how changing the solder material has any relationship to changing the underfill material on some mobile chips (other than they were both PCNs.) But hey, why let facts get in the way of a conspiracy theory/page hits?
  • Re:Pizza (Score:5, Informative)

    by Dogtanian ( 588974 ) on Friday August 29, 2008 @03:00PM (#24798847) Homepage
    I remember reading something not entirely dissimilar in Robert X Cringely's "Triumph of the Nerds". Might or might not be apocryphal; I don't have the book to hand. Apparently Intel (IIRC) were having problems. The amount of defective parts they were getting was going through the roof, and they were pulling their hair out trying to get to the root of the problem.

    Finally they traced it down to the guy responsible for receiving the deliveries of the silicon wafers. Apparently he was taking out the wafers and putting them down on his desk- quite dusty and very definitely *not* up to clean room standards!- to make sure Intel was getting what they'd paid for.
  • by Giant Electronic Bra ( 1229876 ) on Friday August 29, 2008 @03:07PM (#24798953)

    Contrary to your belief that 'these kinds of problems are subtle and might be missed during a decent period of testing' it can be EXTREMELY difficult to find these kinds of problems. Beyond your wildest imaginings difficult.

    Having worked on high performance hardware/software systems as an engineer I can tell you from first hand experience that the situation is more like there are 999,999,998 ways for things to go wrong and about 2 ways you can get it right, and those 2 ways are not AT ALL obvious. Usually the types of problems you encounter HAVE no obvious cause and no obvious solution and mostly can't be reliably replicated. They can stem from the very most subtle differences between two boards or systems. A cap that happens to be a bit out of spec and a slightly less than perfect solder joint can combine to create an error that happens 1 out of every 100 billion times an operation is performed.

    Now, combine that with the fact that you have a dozen vendors slightly varying implementations of a given board design, PCs of all different types and quality levels running at different speeds with different CPUs in them, running a plethora of different versions and subversions of OS and drivers and applications, and the real miracle is you can make a board that works reliably at all.

    Any attempt to make a really seriously bullet proof product that would virtually never have problems is simply infeasible. There is a law of diminishing returns involved. At a certain point you have to say "Well, we've tested it in 10 dozen different systems under 6 different OS versions with 128 different apps, and we get N number of crashes/malfunctions per hour of runtime." and then you call it a day. You could spend 10x more time and money on QA and reduce the failures to N/2, but you also won't sell much product when you multiply your NRE by a factor of 10...

    Plus such perfection will be for naught because MS will release BrokenOS patch "friday the 13th" 2 days later and you'll STILL be encountering the higher error rates. Same goes for new motherboards, games, etc. It is just a losing proposition.

    All you can realistically do is what they do now, test the heck out of it as best you can afford to, ship it out the door, and try to address any issues that come up later as quickly and painlessly as you can.

    This is the kind of reason why military and aerospace grade hardware costs 2000x more than electronics with similar functionality with civilian retail/commercial specs. They REALLY do have to be certain things work exactly right or people die, and it is WAY expensive.

  • by Tycho ( 11893 ) on Friday August 29, 2008 @04:15PM (#24800157)

    Yes there is an exception in RoHS for lead solder that has a high melting point. However, the official RoHS rule is that while lead solders in general are prohibited, there is an exception allowing for the use of lead solder that contains at least 90% lead. The idea being that solder with at least 90% lead melted at a higher temperature and was at least somewhat safer if disposed of improperly. Otherwise, potentially there may also have been no replacements for high lead content solders that performed as well when the first RoHS directives were drawn up in 2003. Currently (2008), however, there are lead-free solders that would work, but the lead free solders are more expensive than lead based solders (by roughly three times). Using a lead-free solder with a significantly different composition may also require a new packaging design and another extensive round of qualification, too. I am not totally sure how this would be done.

    It gets worse: the Inq article states that the new solder to be used by nVidia will be a 63% tin, 37% lead alloy, far below the 90% lead needed for the exception, making nVidia based cards with this solder not saleable to consumers in the EU according to RoHS directives. The replacement 63Sn/37Pb solder has a somewhat better tensile strength and a lower coefficient of thermal expansion than the older 95Pb/5Sn solder, which may be why nVidia chose this route to fix the problem. Whether nV will be selling very many products in the EU with this fix, and whether this will correct the problems, is another issue.

  • by sexconker ( 1179573 ) on Friday August 29, 2008 @04:36PM (#24800617)

    Uh, perhaps you missed the entire debacle about the bad laptop parts, the hush hush from OEMs (deleting comments on forums about the issue, DELL being the exception), and the fact that nvidia is paying half of the cost to replace said parts (cost of parts AND cost of customer care - this is UNHEARD of).

    Nvidia's official line is that a small batch of parts can expect slightly higher than normal failure rates, but that it's because of the OEM designs.

    The small batch part was proven bullshit when the news first broke. The slightly higher than normal failure rate was also debunked. Blaming the OEM designs is also bullshit, because all the OEM designs fit within nvidia's own guidelines.

    We're also seeing the same problem with desktop parts.

    Nvidia knows there is a problem, and there has been NOTHING in the way of a recall, or notifications to customers. ALL that customers get is a bios update that makes the fans run all the time, and a line (filtered through OEMs) saying "We know there is a problem, this update fixes it, if it breaks, we'll replace it as per warranty." All this does is delay the failure (hopefully past warranty) and pass the cost onto the customers.

    The fact is all the suspect parts (and there is an ever-increasing list) WILL fail when running anywhere near the higher end of nvidia's guidelines for thermal and electrical constraints. It's just a matter of WHEN these parts will fail.

  • Re:A wolf! A wolf! (Score:3, Informative)

    by Jay L ( 74152 ) * <jay+slash&jay,fm> on Friday August 29, 2008 @04:50PM (#24800943) Homepage

    Good.. I thought it was just me. And I'm definitely NOT a hardware guy. But I can't see, from his description of the PCN, how switching from high-lead to tin solder could be seen as a response to, well, anything except "let's use less lead".

    I know that 63/37 has a lower melting point than 60/40, and a "sharper" one (no pasty phase), which is why I use it for audio repairs and cabling; I'm a klutz, and anything that makes my solder joints more stable is good. But I can't imagine that this matters as much on SMT, where your components are fixed in place.

    That said... a quick Google shows that there are all sorts of considerations in what solder to use for PCB solder bumps: not just temperature, but the metals involved in the leads, and the PCB traces, and a bunch of other stuff that involves knowing more about electronics and metallurgy than my "the batteries go this way" brain can handle. So there may well be some stability advantages to eutectic solder for NVidia's solder bumps.

    Anyone here actually know this stuff? I've got an 8800GT in my Mac Pro, which definitely runs hotter than your average PC...

  • by rahvin112 ( 446269 ) on Friday August 29, 2008 @05:49PM (#24802143)

    I own two notebooks with Nvidia chipsets in them. Both are HP notebooks; one contains an 8400M, the other an 8800m GTS. The 8400M notebook's cover broke at the hinge connection (a problem that was in no way related to circuit boards) last week and was sent back. I just got it back today, and checked on the repair slip was a note that they replaced the outside cover but had also replaced the video circuit board. Surprise!

    Just last week the Laptop with the 8800GTS started blue screening windows with a video subsystem problem before the login prompt. Ubuntu booted without error but would freeze every 30 seconds for 15 seconds or so if you moved the cursor on the screen. HP concluded the graphics system was malfunctioning and off to repair it went. I'll know in a couple weeks what was replaced but I bet the 8800GTS gets replaced.

    This is a BIG deal people. Charlie is being a sensationalist but it's a BIG deal if HP extends the warranty on every laptop with the chips in them for an additional year. HP wouldn't do that unless they feared loss of customers or a class action lawsuit because the warranty extension costs them serious dollars. And I would also bet HP isn't going to eat every dollar. Nvidia will share the cost at a minimum. Even 10% bad parts could cost Nvidia hundreds of millions.

    Charlie might go overboard in his complaints about Nvidia but he's right about this issue, it's really really big and Nvidia will eventually talk about it because of stories like this. Without Charlie's stories Nvidia would probably try to bury the issue and pretend it wasn't happening and if I was invested in NVDA I would want to know this information because it's a harbinger of a profit warning by NVDA.

  • by capnkr ( 1153623 ) on Saturday August 30, 2008 @12:19AM (#24806309)
    TFA and /. summary are possibly grossly unfair. There *are* two sides to every story, and apparently the article author has a chip on his shoulder for Nvidia, no pun intended. (Personally, I don't have a dog in this fight, but in the interest of fairness...) Check out the comments, like this one which would seem to be from someone at Nvidia:

    Answer this... As you know Charlie has a history of severe bias against NVIDIA. Our July announcement of the problem with notebook GPU failures (link [nvidia.com]) has given him lots to rant about. This new story is the latest in a series of articles in which he continues to stretch the truth in order to spread FUD. In it he paints the notebook chip failures as if they were a widespread epidemic affecting every single NVIDIA GPU in existence including desktop. Here is a list of BS and the truth.

    Myth 1 - NVIDIA has denied responsibility for the failures and is blaming suppliers and partners.
    In our announcements we accept responsibility for the failures. We DO call out the material failure but we also acknowledge that our suppliers and notebook designs because this is true and we need to disclose this in our official statements to the SEC. We would not go on record with the SEC making such bold claims if they weren't true. See our Form 8-K statement below.

    Myth 2 - There is an "official story" that the problems were limited to a batch of a few bad parts for HP.
    We have never stated this. See our public statements below.

    Where is source for that?

    Myth 3 - NVIDIA is forcing a fix on notebook makers

    The idea that a supplier like NVIDIA can dictate a fix to the world's largest PC makers is preposterous.

    The truth is the notebook makers are determining their own course of action and we are supporting them.

    Where is source for that?

    Myth 4 - NVIDIA is trying to cut our financial liability.
    We put aside $200M to help partners solve this problem for consumers. As far as we know NVIDIA is the first and only chip maker to help fund the cost for repairs.

    Myth 5 - This affects desktop chips, G92, G94, etc.
    We have only seen this problem on notebooks. We just reiterated this during an official financial call. Once again we would not say this if it wasn't true. Note we have not disclosed the specific GPUs but we have stated this impacts previous generation GPUs and that current gen GPUs are not in production.

    Fact: Charlie has an obvious bias against NVIDIA and he has no sources to back up his claims. Out of all of the hundreds upon hundreds of notebook models designed with NVIDIA chips in the last few years, only a small number of these have experienced the problem. Within this small number of models, only a small percentage actually experiences the chip failure. It is highly unlikely that a notebook user will experience the problem. And we have never seen this problem on desktop.

    Other Useful Information

    "Separately, NVIDIA plans to take a one-time charge from $150 million to $200 million against cost of revenue for the second quarter to cover anticipated warranty, repair, return, replacement and other costs and expenses, arising from a weak die/packaging material set in certain versions of its previous generation GPU and MCP products used in notebook systems. Certain notebook configurations with GPUs and MCPs manufactured with a certain die/packaging material set are failing in the field at higher than normal rates. To date, abnormal failure rates with systems other than certain notebook systems have not been seen. NVIDIA has initiated discussions with its supply chain regarding this material set issue and the Company will also seek to access insurance coverage for this matter."
    posted by : Derek, 29 August 2008


    So, whichever way it breaks, I do hope that what *is* the truth WRT this issue gets out...
