Upgrades / Hardware

The Economics of Chips With Many Cores (343 comments)

meanonymous writes "HPCWire reports that a unique marketing model for 'manycore' processors is being proposed by University of Illinois at Urbana-Champaign researchers. The current economic model has customers purchasing systems containing processors that meet the average or worst-case computation needs of their applications. The researchers contend that the increasing number of cores complicates the matching of performance needs and applications and makes the cost of buying idle computing power increasingly prohibitive. They speculate that the customer will typically require fewer cores than are physically on the chip, but may want to use more of them in certain instances. They suggest that chips be developed in a manner that allows users to pay only for the computing power they need rather than the peak computing power that is physically present. By incorporating small pieces of logic into the processor, the vendor can enable and disable individual cores, and they offer five models that allow dynamic adjustment of the chip's available processing power."
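
To make the proposal concrete, here is a minimal sketch of the kind of core-activation scheme the summary describes. It is only an illustration: the class, the HMAC-style activation token, and the core counts and chip ID are hypothetical assumptions, not details from the researchers' paper.

    # Hypothetical sketch of a pay-per-use "manycore" chip: the vendor ships all
    # cores on the die but enables only the ones the customer has paid for.
    # Names, the token check, and the numbers are illustrative assumptions.

    import hashlib
    import hmac


    class ManycoreChip:
        def __init__(self, physical_cores, vendor_key, chip_id):
            self.physical_cores = physical_cores
            self.vendor_key = vendor_key      # secret shared with the vendor
            self.chip_id = chip_id
            self.enabled = 1                  # at least one core is always on

        def _expected_token(self, core_count):
            # The vendor would sign "chip_id:core_count"; faked here with HMAC.
            msg = f"{self.chip_id}:{core_count}".encode()
            return hmac.new(self.vendor_key, msg, hashlib.sha256).hexdigest()

        def activate(self, core_count, token):
            """Enable core_count cores if the vendor-issued token checks out."""
            if not 1 <= core_count <= self.physical_cores:
                raise ValueError("requested more cores than physically present")
            if not hmac.compare_digest(token, self._expected_token(core_count)):
                raise PermissionError("invalid activation token")
            self.enabled = core_count


    # Example: a 16-core die sold with 4 cores enabled, temporarily bumped to 12.
    key = b"vendor-secret"
    chip = ManycoreChip(physical_cores=16, vendor_key=key, chip_id="XC-0001")
    chip.enabled = 4
    token = hmac.new(key, b"XC-0001:12", hashlib.sha256).hexdigest()  # "purchased"
    chip.activate(12, token)
    print(chip.enabled)  # -> 12
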
  • How is this new? (Score:4, Informative)

    by lintux ( 125434 ) <slashdot AT wilmer DOT gaast DOT net> on Tuesday January 15, 2008 @05:03AM (#22047860) Homepage
    IIRC this has been done on mainframes for *ages* already...
  • Re:Hardware DRM.... (Score:1, Informative)

    by Anonymous Coward on Tuesday January 15, 2008 @05:30AM (#22048028)
    Actually, disabling cylinders has been around and in limited practice for a while now.

    That's one of the driving factors (hahaha) behind electrically controlled valves. (It's much more complicated to do when you have to manipulate the camshaft to disable the valves).
  • by ozmanjusri ( 601766 ) <aussie_bob@hotmail . c om> on Tuesday January 15, 2008 @05:53AM (#22048144) Journal
    Why not their economic model?

    Because it's dumb.

    In 1999 I paid about AU$600 for a midrange Pentium Pro CPU. In 2008, I bought a midrange dual-core Xeon for the massively increased price of... AU$600.

    In 2000, I bought a shiny new Intergraph TDZ2000 with two PII 350s for the bargain cost of just $5,000. Now, Apple is prepared to sell me a Mac Pro with two 2.8GHz quad-core Xeons for the stupefying price of $2,799.00.

    Now, explain to me again why it would be in my best economic interest to buy a computer with cores that could be disabled if I don't pay my rent?

  • by RzUpAnmsCwrds ( 262647 ) on Tuesday January 15, 2008 @05:53AM (#22048148)
    Your metaphor on multi-issue CPUs is interesting, but not necessarily valid.

    Instruction scheduling is the biggest fundamental problem facing CPUs today. Even the best pipelined design issues only one instruction per clock, per pipeline (excluding things like macro-op fusion, which combines multiple logical instructions into a single internal instruction). So we add more pipelines. But more pipelines can only get us so far - it becomes increasingly difficult to figure out (schedule) which instructions can be executed on which pipeline at what time.

    There are several potential solutions. One is to use a VLIW architecture where the compiler schedules instructions and packs them into bundles which can be executed in parallel. The problem with VLIW is that many scheduling decisions can only occur at runtime. VLIW is also highly dependent on having excellent compilers. All of these problems (among others) plagued Intel's advanced VLIW (they called it "EPIC") architecture, Itanium.

    Another solution is virtual cores, or HyperThreading. HTT uses instructions from another thread (assuming that one is available) to fill pipeline slots that would otherwise be unused. The problem with HTT is that you still need a substantial amount of decoding logic for the other thread, not to mention a more advanced register system (although modern CPUs already have a very advanced register system, particularly on register-starved architectures like x86) and other associated logic. In addition, if you want to extract benefit from pipeline stalls (e.g., as on the P4), you need even more logic. This means that HTT isn't particularly beneficial unless you have code that results in a large number of data dependencies or branch mispredicts, or if pipeline stalls are particularly expensive.

    Multicore CPUs have come about for one simple reason: we can't figure out what to do with all of the transistors we have. CPUs have become increasingly complex, yet the fabrication technology keeps marching forward, outpacing the design resources that are available. This has manifested itself in two main ways.

    First, designers started adding larger and larger caches to CPUs (caches are easy to design but take up lots of transistors). But after a point, adding more cache doesn't help. The more cache you have, the slower it operates. So designers added a multi-level cache hierarchy. But this too only goes so far - as you add more cache levels, the incremental benefit of each level decreases, because there's only a finite amount of locality of reference in code (data structures like linked lists don't help this). You may be able to get a single function in cache, but it's unlikely that you're going to get the whole data set used by a complex program. The net result is that beyond a certain point, adding more cache doesn't do much.

    What do you do when you can't add more cache? You could add more functional units, but then you're constrained by your front-end logic again, which is a far more difficult problem to solve. You could add more front-end logic, which is what HyperThreading does. But that only helps if your functional units are sitting idle a substantial percentage of the time (as they did on the P4).

    So you look at adding both functional units and more front-end logic. You'll decode many instruction streams and try to schedule them on many pipelines. This is what modern GPUs do, and for them, it works quite well. But most general-purpose code is loaded with data dependencies and branches, which makes it very difficult to schedule more than a very few (say, 4) instructions at a time, regardless of how many pipelines you have. So, now, effectively, you have one thread that is predominantly using 4 pipelines, and one that is predominantly using the other 4.

    Wait, though. If one thread is mostly using one set of pipelines, and one is mostly using the other, we can split the pipelines into two groups. Each will take one thread. This way, our register and cache systems are simpler (because
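
    The scheduling problem described above can be illustrated with a toy greedy list scheduler: given a handful of instructions and their data dependencies, it packs as many ready instructions as possible into each cycle, up to the machine's issue width. This is only a conceptual sketch (real schedulers deal with latencies, register renaming, and speculation), and the instruction stream and its dependencies are made up.

        # Toy illustration of issue-width-limited scheduling: pack instructions
        # into cycles, issuing an instruction only once everything it depends on
        # has already issued. Real out-of-order or VLIW schedulers are far more
        # involved; this just shows why dependency chains, not pipeline count,
        # often limit throughput.

        def schedule(instrs, deps, issue_width):
            """instrs: list of names; deps: name -> set of names it depends on."""
            done, cycles = set(), []
            remaining = list(instrs)
            while remaining:
                slot = []
                for instr in remaining:
                    if len(slot) == issue_width:
                        break
                    if deps.get(instr, set()) <= done:  # all inputs computed
                        slot.append(instr)
                if not slot:
                    raise ValueError("cyclic dependency")
                cycles.append(slot)
                done.update(slot)
                remaining = [i for i in remaining if i not in done]
            return cycles


        # A made-up instruction stream: i3 needs i1 and i2, i4 needs i3, etc.
        deps = {"i3": {"i1", "i2"}, "i4": {"i3"}, "i6": {"i5"}, "i7": {"i4", "i6"}}
        instrs = ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8"]

        for width in (2, 4, 8):
            print(width, schedule(instrs, deps, width))
        # Even 8-wide issue can't beat 4 cycles here: the i1 -> i3 -> i4 -> i7
        # dependency chain limits throughput no matter how many pipelines exist.
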
  • by peas_n_carrots ( 1025360 ) on Tuesday January 15, 2008 @06:02AM (#22048206)
    "..because AMD at one point couldn't get hyperthreading right and had its marketers convince..."

    Quick history lesson. Intel tried pawning hyperthreading off on the market. If you mean that AMD should have done hyperthreading, perhaps you should look at the reviews/benchmarks to see that it reduced performance in many cases. In the future, more software might be able to take advantage of increased thread parallelism, but that future is not now, at least in the x86 world.
  • by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Tuesday January 15, 2008 @06:21AM (#22048302) Journal
    I know that on Linux, I cannot immediately tell the difference between an SMP-enabled kernel on a single-core Hyperthreading system, and an SMP-enabled kernel on a dual-core system with no hyperthreading.

    In either case, I'm fairly sure I see at least two entries in /proc/cpuinfo, I need an SMP kernel, and so on. So if someone (Intel) suddenly decided to make a dual-core hyperthreaded design in which the "teams" actually shared a common pool, would I notice, short of Intel making an announcement? (A sketch of reading this topology out of /proc/cpuinfo appears at the end of this comment.)

    As for your assertion, a quick scan of Wikipedia suggests that you're a bit naively wrong here. (But then, I'm the one pretending to know what I'm talking about from a quick scan of Wikipedia; I suppose I'm being naive.) Wikipedia makes a distinction between Instruction-level parallelism [wikipedia.org] and Thread-level parallelism [wikipedia.org], with advantages and disadvantages for each.

    One of the advantages of thread-level parallelism is that it's software deciding what can be parallelized and how. This is all the threading, locking, message-passing, and general insanity that you have to deal with when writing code to take advantage of more than one CPU. As I understand it, a pipelining processor essentially has to do this work for you, by watching instructions as they come in, and somehow making sure that if instruction A depends on instruction B, they are not executed together. One way of doing this is to delay the entire chain until instruction B finishes. Another is to reorder the instructions.

    But even if you consider this a solved problem, it requires a bit of hardware to solve. I'm guessing at some point, it's easier to just throw more cores at the problem than to try to make each core a more efficient pipeline, just as it's easier to throw more cores at the problem than it is to try to make each core run faster.

    There's also that user-level interface I talked about above. With multicore and no hyperthreading, the OS knows which core is which, and can distribute tasks appropriately -- idle tasks can take up half of one core, the gzip process (or whatever) can take up ALL of another core. With multicore and hyperthreading, the OS might not know -- it might simply see four cores. And with multicore, hyperthreading, and shared pipelines, it gets worse -- as I understand it, there's no longer any way, at that point, that an OS can specify which CPU a particular thread should be sent to. Threading itself may become irrelevant.

    Well, anyway... What confuses me is that we still haven't adopted languages and practices that naturally scale to multiple cores. I'm not talking about complex threading models that make it easy to deadlock -- I'm talking about message-passing systems like Erlang, or wholly-functional systems like Haskell.

    Hint: Erlang programs can easily be ported from single-core to multi-core to a multi-machine cluster. Haskell programs require extra work at the source code level to be made single-threaded, and can (like Make) use an arbitrary number of threads, specifiable at the commandline. They're not perfect, by far; Haskell's garbage collector is single-threaded, I think. But that's an implementation detail; most programs in C and friends, even Perl/Python/Ruby, will not be written with multiple cores in mind, and, in fact, have single-threaded implementations (or stupid things like the GIL).
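
    On the "would I notice?" question above: /proc/cpuinfo does expose enough topology on x86 Linux to tell SMT siblings from real cores. A rough sketch of counting them follows; the field names ("physical id", "core id") are x86-specific, and other architectures or very old kernels may not report them.

        # Rough sketch: count logical CPUs and physical cores from /proc/cpuinfo
        # on x86 Linux. If there are more logical CPUs than physical cores, some
        # of the "CPUs" the scheduler sees are hyperthread siblings.

        def cpu_topology(path="/proc/cpuinfo"):
            logical = 0
            cores = set()           # (physical id, core id) pairs = real cores
            phys = None
            with open(path) as f:
                for line in f:
                    if ":" not in line:
                        continue
                    key, _, value = line.partition(":")
                    key, value = key.strip(), value.strip()
                    if key == "processor":
                        logical += 1
                    elif key == "physical id":
                        phys = value
                    elif key == "core id":
                        cores.add((phys, value))
            return logical, len(cores)


        logical, physical = cpu_topology()
        print(f"{logical} logical CPUs, {physical} physical cores")
        if physical and logical > physical:
            print("SMT (hyperthreading) appears to be active")
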
  • by TheThiefMaster ( 992038 ) on Tuesday January 15, 2008 @07:36AM (#22048546)
    I've done this kind of thing. An nVidia 6800LE with half its shader processors disabled (it had 4 blocks of 4, with 2 blocks disabled), which could have half of those (1 block of 4) re-enabled without issue. An Athlon XP 2500+ that could have the FSB changed to 200MHz instead of 166 and it would BECOME an Athlon XP 3200+ (name and all).
    And the best one: two Athlon XP 2400+ CPUs that I unlocked with a conductive pen to be Athlon MP 2400+s, and that I still use in a dual-CPU board now.

    Generally, unlocked or overclocked PC parts burn out faster than if they'd been left alone (e.g. the 6800LE I mentioned died a horrible death, and now doesn't work at all). However, if the chip was DESIGNED to be able to be unlocked, it would be perfectly safe.
  • by Anonymous Coward on Tuesday January 15, 2008 @07:38AM (#22048552)
    IBM's been doing this for years with some of their smaller servers http://www-03.ibm.com/systems/i/hardware/cod/index.html/ [ibm.com]

    The IT labor and lost productivity during the downtime that old methods need to add processing capacity can cost a *lot* for servers hosting your important applications, but it's awfully expensive to pay upfront for enough power to keep up with ordering spikes during the Christmas buying season (for example) if those spike way beyond your normal needs. Much cheaper to pay for only enough to handle your normal needs, and then pay for the extra needed to handle oddball spikes only during the time you need it.

    There's no way Ticketmaster's IT budget would agree to pre-pay for enough computing capacity to not bog down when the Hannah Montana tickets went on sale, but if they could pay for just an hour of it ... much easier sale. Or remember the "performance" of Amazon's servers last Christmas when they put up their special sale items? If they could have just paid for a 24-hour keycode to enter that night, you can bet the IT guys would have had a much easier time getting that in the budget.

    Or as IBM puts it:
    "Imagine you launch a dynamite new Web application for the holiday season, and it's getting more traffic than you expected. What do you do to avoid disruptions in service? You turn on available inactive processors and memory to handle every hit, then turn the extra capacity off when the application requires less capacity in the new year. You pay only for what you have activated.
    Or say you tell your business analysts they now have access to all the company's business intelligence data. The danger is that, with your current processor configuration, increased demand could slow response times to a crawl. The solution? You activate reserve processing power to meet the new user demands without disrupting current operations."

    The other beauty is that once the computer manufacturer has built in the ability to activate or deactivate processors and memory on the fly, those same mechanisms make it natural to shuffle processors and memory between virtualized servers on the machine without restarting them.

    And yes, it runs Linux.
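
    A back-of-the-envelope version of that argument is below; every number in it (prices, baseline, spike size and duration) is invented purely for illustration.

        # Back-of-the-envelope comparison: provision for peak all year vs.
        # provision for the baseline and pay to activate extra cores only during
        # spikes. All prices and workload numbers below are made up.

        BASELINE_CORES = 8                  # enough for normal load
        PEAK_CORES = 32                     # needed only during the holiday spike
        SPIKE_DAYS = 14                     # how long the spike lasts per year

        PRICE_PER_CORE_YEAR = 400.0         # hypothetical cost of an owned core
        PRICE_PER_CORE_DAY_ON_DEMAND = 3.0  # hypothetical temporary activation

        provision_for_peak = PEAK_CORES * PRICE_PER_CORE_YEAR
        on_demand = (BASELINE_CORES * PRICE_PER_CORE_YEAR
                     + (PEAK_CORES - BASELINE_CORES) * SPIKE_DAYS
                     * PRICE_PER_CORE_DAY_ON_DEMAND)

        print(f"Provision for peak all year: ${provision_for_peak:,.0f}")
        print(f"Baseline + on-demand spike:  ${on_demand:,.0f}")
        # With these made-up numbers: $12,800 vs $4,208. The gap is what makes
        # capacity-on-demand attractive, and it grows as the spike gets spikier.
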

  • Re:erm... (Score:3, Informative)

    by gnasher719 ( 869701 ) on Tuesday January 15, 2008 @07:56AM (#22048634)

    "So, Intel is going to charge us less for a processor with 4 cores because we can turn three off most of the time? Or is the power saving supposed to make the cost of the chip less prohibitive?"
    First, it seems you are under the impression that this might be Intel's idea. It is not. Second, turning off cores is stupid. If you want to reduce performance of a multi-core chip, you reduce the clock speed as far as possible. Four cores at a quarter of the maximum clock speed use lots less electricity than one core running at full speed.
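
    That last claim only works out because lowering the clock also lets you lower the supply voltage (DVFS): dynamic power scales roughly with C*V^2*f. A rough illustration, with hypothetical voltage/frequency operating points and leakage ignored:

        # Why "four slow cores" can beat "one fast core" on power: dynamic power
        # scales roughly with C * V^2 * f, and a lower frequency permits a lower
        # supply voltage. The V/f points below are hypothetical.

        def relative_dynamic_power(volts, freq_ghz, ref_volts=1.2, ref_freq=3.0):
            """Dynamic power relative to one core at the reference point."""
            return (volts / ref_volts) ** 2 * (freq_ghz / ref_freq)

        one_fast_core = 1 * relative_dynamic_power(1.2, 3.0)     # = 1.0
        four_slow_cores = 4 * relative_dynamic_power(0.9, 0.75)  # quarter clock

        print(f"1 core  @ 3.00 GHz, 1.2 V: {one_fast_core:.2f}")
        print(f"4 cores @ 0.75 GHz, 0.9 V: {four_slow_cores:.2f}")
        # -> roughly 1.00 vs 0.56 with these made-up numbers; without the
        #    voltage drop the two configurations would burn about the same
        #    dynamic power.
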
  • by Anonymous Coward on Tuesday January 15, 2008 @08:56AM (#22048956)
    Having worked at nvidia, I can tell you there is a reason those extra TPCs were disabled, and it's not because of a crippleware model but because of yield. We cannot produce chips that are perfect all the time. So we settle for chips that are perfect a small percentage of the time, mostly perfect an OK percentage of the time, and half working a good percentage of the time. We then make 3 or 4 different series (GS/GT/GTX/GTS/Ultra) with different TPCs in each series, disable the TPCs in each chip that don't work or fail to pass QA, and then ship them. If you unlock them, you are frying your working card, because some of the faults could be things like "Oops, there was a short in the TPC because the transistors cooked too close to each other" or "Oops, the clock passes too close to the +12V in this module -- if it hits 50 Celsius, it could turn into a short". This model helps keep products from being prohibitively expensive for a fabless company, because we are billed on "silicon wafers used" and not on "number of fault-free chips produced".
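
    A toy version of that wafer arithmetic is below; the wafer cost, die counts, and prices are all invented for illustration.

        # Toy illustration of why binning partially defective dies matters for a
        # fabless vendor billed per wafer. All numbers are invented.

        WAFER_COST = 5000.0
        DIES_PER_WAFER = 100
        FULLY_WORKING = 40        # sell as the full-spec part
        PARTLY_WORKING = 45       # one block dead: sell with that block fused off
        DEAD = DIES_PER_WAFER - FULLY_WORKING - PARTLY_WORKING

        PRICE_FULL = 300.0
        PRICE_CUT_DOWN = 180.0

        revenue_scrap = FULLY_WORKING * PRICE_FULL            # scrap partial dies
        revenue_binned = revenue_scrap + PARTLY_WORKING * PRICE_CUT_DOWN

        print(f"Scrapping partial dies: {revenue_scrap - WAFER_COST:+,.0f}/wafer")
        print(f"Binning partial dies:   {revenue_binned - WAFER_COST:+,.0f}/wafer")
        # -> +7,000 vs +15,100 with these made-up numbers: selling "half working"
        #    dies as cut-down SKUs more than doubles the margin on the same
        #    wafer bill.
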
  • by BeanThere ( 28381 ) on Tuesday January 15, 2008 @09:08AM (#22049032)
    ... for CPUs, there are effectively ZERO variable costs to the producer once you've purchased the chip and it's in your hands.

    Dedicated circuitry to create artificial scarcity and control actually adds unnecessary costs.

    This might be useful in very specific scenarios where somebody, say, owns a supercomputer and rents it out, but even there, I'm sure there are far better solutions that don't involve the CPU hardware.

    This is, like you suggest, just a BS wet dream of the manufacturers ... make something once, get money forever.

    Right now we probably have few enough major chip vendors that with a little bit of collusion, if they decided not to compete, they could probably pull something like this on us. This doesn't look likely right now, but it seems possible. Hopefully some other (possibly foreign) company would enter the market if that happened. Competition is healthy for a market.
  • Re:How is this new? (Score:3, Informative)

    by jorenko ( 238937 ) on Tuesday January 15, 2008 @09:37AM (#22049258)
    Except that they do this because manufacturing chips is not an exact science -- some turn out better than others, and these are able to handle higher clock speeds with less chance of failure and less power usage. Thus, the quality of each individual chip determines its clock speed and its price. While the enthusiast can usually increase that with no problem, that's quite a different thing from selling a chip that's supposed to be turned up. These would need to be good enough to handle the heavier usage from any user at all, and fully supported all the while.
  • by afidel ( 530433 ) on Tuesday January 15, 2008 @09:51AM (#22049364)
    Bullshit. The biggest-cost vendor who licenses per CPU actually licenses per core: Oracle! On Windows it's one license per 2 cores; everywhere else it's 0.75 licenses per core, except Sun T1/T2, where it's 0.25 per core.
  • Re:Hardware DRM.... (Score:4, Informative)

    by LWATCDR ( 28044 ) on Tuesday January 15, 2008 @10:55AM (#22050038) Homepage Journal
    Okay, even turning off cores that you don't need isn't dumb. A lot of the time a PC really is just doing NOPs, waiting for you to do something. Powering down cores when they are not needed will save power, just like turning off cylinders in an engine...
    What is dumb is this pay-me-to-turn-on-more-cores idea.
    It really goes counter to the idea of OWNING or BUYING a PC. If I BUY the computer, I OWN the computer. I shouldn't have to pay you to unlock some part of that computer.
    Yes, it is going back to the days of the mainframe, and frankly I don't think that is a good idea.
  • by MBGMorden ( 803437 ) on Tuesday January 15, 2008 @11:28AM (#22050442)
    Your statement is *USUALLY* right, but not universally right.

    IBM, for example, when licensing some things (namely Lotus Domino), goes by performance units.

    A single-core x86 CPU would be 100 units per core. Dual-core CPUs would be 50 units per core (notice that they work out to the same). Quad-cores, however, are also 50 units per core, so while single- and dual-core chips cost the same, quad-cores end up costing twice as much in license fees. (A quick worked version of these figures follows at the end of this comment.)

    They even have some architectures where it changes to different values (I think one of them, maybe SPARC chips, was 100 units per core for single-core and 70-80 units per core for dual-core - making a dual-core more expensive, but not twice as expensive).

    Aside from them, most other vendors I've dealt with who do license by the CPU do it by actual processor and not cores. As a result, we are never worried about buying a quad-core box for a server even if it's a little more than the particular situation calls for, whereas generally I won't buy multi-CPU boxes unless I think I will actually need them. I'm sure there are exceptions, though, as there are to everything.

    One area where this gets interesting: virtual machines. Our trend lately has been to buy a huge hulk of a server and then have CentOS + VMware Server split it up into numerous virtual machines to do smaller jobs. The hiccup there is that for most software licenses, you have to go by the underlying hardware. So if I have a quad box that is hosting, say, 6 installations of MS SQL Server, then I'd have to pay the quad license for each one, even though the power of the underlying quad-chip system is being split up into probably less than what a single-CPU system would offer if it weren't virtualized.
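
    The quick worked version of the performance-unit figures quoted above (these per-core values are the commenter's, not an official IBM price list):

        # Per-chip license units using the figures quoted in the comment above:
        # 100 units/core for single-core x86, 50 units/core for dual- and
        # quad-core. Actual IBM PVU tables may differ.

        UNITS_PER_CORE = {1: 100, 2: 50, 4: 50}   # x86, per the comment

        def license_units(cores_per_chip, chips=1):
            return UNITS_PER_CORE[cores_per_chip] * cores_per_chip * chips

        for cores in (1, 2, 4):
            print(f"{cores}-core chip: {license_units(cores)} units")
        # -> 100, 100, 200: single- and dual-core chips license identically,
        #    while a quad-core chip costs twice as much.
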
  • by Dare nMc ( 468959 ) on Tuesday January 15, 2008 @12:08PM (#22050992)
    All old rumors. This has come up at many conferences; I have never heard from nvidia directly, but Intel and AMD have stated that this is just false.
      A) The volume on each line is too high to be shifting silicon between lines.
      B) It just takes too much test logic during processing ("works if A but not B and C...") for the few chips that would have a flaw that allowed them to work, but not fully.
      C) Flaws in silicon almost never affect just one chip, let alone just one section of one chip. (Multi-core is still a single chip.)
      D) QC finds most flaws at the wafer level, before the parts ever enter packaging, and since it is assumed that more than the visible area is affected, they are never touched.
    Now, this is old - it was true 10 years ago, when silicon quality wasn't as good as it is now (better silicon yields make the economics even worse today).

    There is usually more than just the CPU difference between lines; for example, you need more cooling and a better power supply for more/faster CPUs.
    I don't know about chips, but Cat does this on their diesel engines. The warranty cost will be higher for a higher-powered engine, no matter what it was designed for, so that is part of the cost equation. Also, a stepped-up power rating will compete with the next higher line, so pricing is set to ease the gap from one platform to the next...
  • by SharpFang ( 651121 ) on Tuesday January 15, 2008 @01:01PM (#22051782) Homepage Journal

    Using JUST your logic, every slower processor or chip would be one that failed to be a higher processor or chip in the same line of products. We all know that is not the case. There are also market demands that must be met. I have no idea about failure rates, but I highly doubt only failed chips make lower-tier products. What percentage of which does each company or product line use? No one here has any idea.

    Availability dictates price; price regulates demand.

    Take the 'LE' and 'GT' releases of NVidia cards. Their difference? GT cards have all the shader units fully operational; LE cards have half of them disabled.

    Formerly, 'LE' versions of NVidia cards were a major part of the market, and the luxury 'GT' versions with twice as many shader units were at least twice as expensive. Nowadays 'LE' cards are just slightly cheaper than 'GT', and you'd need to be a sucker or desperate to buy the 'LE' version, because the price gain is very low compared to the performance penalty. Reason? NVidia improved its manufacturing efficiency, making far fewer faulty units. Supply of GT increased, supply of LE decreased. So we push some 'LE'-sector customers into the 'GT' sector by increasing the price of 'LE' and decreasing that of 'GT'. The manufacturing costs are the same (it costs exactly the same to produce a 100% working chip as a faulty one...), and people are encouraged to buy the higher-end device due to its lower price; if they are strictly the 'old LE' market, meaning they definitely want the cheaper product and are not willing to pay for either the 'new cheaper' GT or the 'new more expensive' LE, they will just buy a card from another line - a GT of an older model, for example.

    CPU prices don't increase linearly with speed. The curve of $/MIPS may seem puzzling, but in fact it traces the manufacturing yield in a given class.

    Of course the market has a very heavy momentum, and the price changes don't happen day to day. So the temporary differences between supply and demand get filled by units from a higher class that have part of their functionality artificially disabled. That's the overclocker's heaven - you just need to 'unlock' the chip and you have a genuine 'higher version'. But that's a matter of pure luck (or insider info, or following the news closely), because the chance that the part was 'crippled to lower the price' is worse than the chance that it was faulty in the first place.
  • by Anonymous Coward on Tuesday January 15, 2008 @02:50PM (#22054278)
    Perhaps you should consider:

    a) the production advantage Intel ($31.5 billion revenue) has over a *fabless* Nvidia ($3.77 billion).

    b+c) the difference between a 128/256/384/etc. shader-unit GPU (a massive number of very simple processors) and a 1/2/4-core CPU (several extremely complex processors)

    d) the distribution model for Intel (single supplier) vs Nvidia/ATI (dozens of suppliers)

    Intel and AMD ship a similar, if not the same, cooler across most of their lines, and most of each processor line will slot nicely into a certain TDP envelope (35W, 65W, etc.).

    Nvidia and ATI/AMD follow a different power consumption model due to the differences (a-d) I outlined above.

    In short, the guy who works for nvidia has it right after all...
