The Economics of Chips With Many Cores 343
meanonymous writes "HPCWire reports that a unique marketing model for 'manycore' processors is being proposed by University of Illinois at Urbana-Champaign researchers. The current economic model has customers purchasing systems containing processors that meet the average or worst-case computation needs of their applications. The researchers contend that the increasing number of cores complicates the matching of performance needs and applications and makes the cost of buying idle computing power increasingly prohibitive. They speculate that the customer will typically require fewer cores than are physically on the chip, but may want to use more of them in certain instances. They suggest that chips be developed in a manner that allows users to pay only for the computing power they need rather than the peak computing power that is physically present. By incorporating small pieces of logic into the processor, the vendor can enable and disable individual cores, and they offer five models that allow dynamic adjustment of the chip's available processing power."
How is this new? (Score:4, Informative)
Re:Hardware DRM.... (Score:1, Informative)
That's one of the driving factors (hahaha) behind electrically controlled valves. (It's much more complicated to do when you have to manipulate the camshaft to disable the valves.)
Re:How is this [business model] new? (Score:5, Informative)
Because it's dumb.
In 1999 I paid about AU$600 for a midrange Pentium Pro CPU. In 2008, I bought a midrange Xeon Dual-core for the massively increased price of... AU$600.
In 2000, I bought a shiny new Intergraph TDZ2000 with two PII 350s for the bargain cost of just $5,000. Now, Apple is prepared to sell me a Mac Pro with two 2.8GHz, quad core Xeons for the stupefying price of $2,799.00.
Now, explain to me again why it would be in my best economic interest to buy a computer with cores that could be disabled if I don't pay my rent?
Re:You know what I don't get? (Score:5, Informative)
Instruction scheduling is the biggest fundamental problem facing CPUs today. Even the best pipelined design issues only one instruction per clock, per pipeline (excluding things like macro-op fusion, which combines multiple logical instructions into a single internal instruction). So we add more pipelines. But more pipelines only get us so far: it becomes increasingly difficult to figure out (schedule) which instructions can be executed on which pipeline at what time.
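The shape of the problem is easy to show with a toy greedy in-order issue model (nothing like a real CPU scheduler, all latencies assumed to be one cycle): with W pipelines, independent instructions pack W per cycle, but a dependency chain forces one per cycle no matter how wide the machine is.

```python
def schedule(instrs, deps, width):
    """Toy in-order issue: instrs is a list of names in program order,
    deps maps a name to the set of names it depends on, width is the
    number of pipelines. Returns the total cycle count."""
    done = {}          # name -> cycle in which it completed
    cycle, i = 0, 0
    while i < len(instrs):
        issued = 0
        # issue in program order; stop at the first instruction that must wait
        while i < len(instrs) and issued < width:
            name = instrs[i]
            if all(d in done and done[d] < cycle for d in deps.get(name, ())):
                done[name] = cycle     # assume single-cycle latency
                issued += 1
                i += 1
            else:
                break
        cycle += 1
    return cycle

chain = schedule(['a', 'b', 'c', 'd'],
                 {'b': {'a'}, 'c': {'b'}, 'd': {'c'}}, width=4)  # 4 cycles: pure chain
parallel = schedule(['a', 'b', 'c', 'd'], {}, width=2)           # 2 cycles: independent work
```

Four pipelines buy nothing on the dependency chain; two pipelines halve the independent case.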
There are several potential solutions. One is to use a VLIW architecture where the compiler schedules instructions and packs them into bundles which can be executed in parallel. The problem with VLIW is that many scheduling decisions can only occur at runtime. VLIW is also highly dependent on having excellent compilers. All of these problems (among others) plagued Intel's advanced VLIW (they called it "EPIC") architecture, Itanium.
Another solution is virtual cores, or HyperThreading. HTT uses instructions from another thread (assuming one is available) to fill pipeline slots that would otherwise go unused. The problem with HTT is that you still need a substantial amount of decoding logic for the other thread, not to mention a more advanced register system (although modern CPUs already have a very advanced register system, particularly on register-starved architectures like x86) and other associated logic. In addition, if you want to get benefits during pipeline stalls (e.g. on the P4), you need even more logic. This means that HTT isn't particularly beneficial unless you have code that produces a large number of data dependencies or branch mispredicts, or pipeline stalls are particularly expensive.
Multicore CPUs have come about for one simple reason: we can't figure out what to do with all of the transistors we have. CPUs have become increasingly complex, yet the fabrication technology keeps marching forward, outpacing the design resources that are available. This has manifested itself in two main ways.
First, designers started adding larger and larger caches to CPUs (caches are easy to design but take up lots of transistors). But after a point, adding more cache doesn't help. The more cache you have, the slower it operates. So designers added a multi-level cache hierarchy. But this too only goes so far - as you add more cache levels, the performance delta between memory and cache decreases, because there's only a finite level of reference locality in code (data structures like linked lists don't help this). You may be able to get a single function in cache, but it's unlikely that you're going to get the whole data set used by a complex program. The net result is that beyond a certain point, adding more cache doesn't do much.
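The diminishing return shows up in a back-of-the-envelope average-memory-access-time calculation (latencies and hit rates below are invented round numbers, not real hardware figures):

```python
def amat(levels, mem_latency):
    """Average memory access time in cycles.
    levels: list of (latency_cycles, hit_rate), fastest cache first."""
    miss = 1.0    # fraction of accesses that reach this level
    total = 0.0
    for latency, hit_rate in levels:
        total += miss * latency      # everything reaching this level pays its latency
        miss *= (1.0 - hit_rate)     # the misses continue deeper
    return total + miss * mem_latency

one_level   = amat([(4, 0.90)], 200)                           # L1 only
two_level   = amat([(4, 0.90), (12, 0.80)], 200)               # add an L2
three_level = amat([(4, 0.90), (12, 0.80), (40, 0.60)], 200)   # add an L3
```

With these made-up numbers, adding L2 cuts AMAT from 24 to 9.2 cycles, but adding L3 only gets it from 9.2 to 7.6: each extra level helps less.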
What do you do when you can't add more cache? You could add more functional units, but then you're constrained by your front-end logic again, which is a far more difficult problem to solve. You could add more front-end logic, which is what HyperThreading does. But that only helps if your functional units are sitting idle a substantial percentage of the time (as they did on the P4).
So you look at adding both functional units and more front-end logic. You'll decode many instruction streams and try to schedule them on many pipelines. This is what modern GPUs do, and for them, it works quite well. But most general-purpose code is loaded with data dependencies and branches, which makes it very difficult to schedule more than a very few (say, 4) instructions at a time, regardless of how many pipelines you have. So, now, effectively, you have one thread that is predominantly using 4 pipelines, and one that is predominantly using the other 4.
Wait, though. If one thread is mostly using one set of pipelines, and one is mostly using the other, we can split the pipelines into two groups. Each will take one thread. This way, our register and cache systems are simpler (because each group only has to serve a single thread), and at that point what you've built is, for all practical purposes, a dual-core CPU.
Re:You know what I don't get? (Score:2, Informative)
Quick history lesson. Intel tried pawning off hyperthreading on the market. If you mean that AMD should have done hyperthreading, perhaps you should look at the reviews/benchmarks to see that it reduced performance in many cases. In the future, more software might be able to take advantage of increased thread parallelism, but that future is not now, at least in the x86 world.
Would we know the difference? (Score:5, Informative)
In either case, I'm fairly sure I see at least two items in
As for your assertion, a quick scan of Wikipedia suggests that you're a bit naively wrong here. (But then, I'm the one pretending to know what I'm talking about from a quick scan of wikipedia; I suppose I'm being naive.) Wikipedia makes a distinction between Instruction level parallelism [wikipedia.org] and Thread level parallelism [wikipedia.org], with advantages and disadvantages for each.
One of the advantages of thread-level parallelism is that it's software deciding what can be parallelized and how. This is all the threading, locking, message-passing, and general insanity that you have to deal with when writing code to take advantage of more than one CPU. As I understand it, a pipelining processor essentially has to do this work for you, by watching instructions as they come in and somehow making sure that if instruction A depends on instruction B, they are not executed together. One way of doing this is to delay instruction A (and everything behind it) until instruction B finishes. Another is to reorder the instructions.
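The check the hardware has to make can be sketched in a deliberately simplified form, where each instruction is just the set of registers it writes and the set it reads:

```python
def can_pair(a, b):
    """Can two instructions issue together? a and b are
    (writes, reads) pairs of register-name sets. Blocks the classic
    hazards: read-after-write, write-after-read, write-after-write."""
    aw, ar = a
    bw, br = b
    return not (aw & br    # b reads what a writes (RAW)
                or bw & ar # b writes what a reads (WAR)
                or aw & bw)  # both write the same register (WAW)

i1 = ({'r1'}, {'r2', 'r3'})   # r1 = r2 + r3
i2 = ({'r4'}, {'r1'})         # r4 = r1      (reads i1's result)
i3 = ({'r5'}, {'r2'})         # r5 = r2      (independent of i1)

pair_12 = can_pair(i1, i2)    # False: i2 must wait for i1
pair_13 = can_pair(i1, i3)    # True: these can go down two pipelines at once
```

A real CPU does this comparison across every pair in its issue window, every cycle, which is exactly why the hardware cost grows so fast with issue width.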
But even if you consider this a solved problem, it requires a bit of hardware to solve. I'm guessing at some point, it's easier to just throw more cores at the problem than to try to make each core a more efficient pipeline, just as it's easier to throw more cores at the problem than it is to try to make each core run faster.
There's also that user-level interface I talked about above. With multicore and no hyperthreading, the OS knows which core is which, and can distribute tasks appropriately -- idle tasks can take up half of one core, the gzip process (or whatever) can take up ALL of another core. With multicore and hyperthreading, the OS might not know -- it might simply see four cores. And with multicore, hyperthreading, and shared pipelines, it gets worse -- as I understand it, there's no longer any way, at that point, that an OS can specify which CPU a particular thread should be sent to. Threading itself may become irrelevant.
Well, anyway... What confuses me is that we still haven't adopted languages and practices that naturally scale to multiple cores. I'm not talking about complex threading models that make it easy to deadlock -- I'm talking about message-passing systems like Erlang, or wholly-functional systems like Haskell.
Hint: Erlang programs can easily be ported from single-core to multi-core to a multi-machine cluster. Haskell programs require extra work at the source code level to be made single-threaded, and can (like Make) use an arbitrary number of threads, specifiable at the command line. They're far from perfect; Haskell's garbage collector is single-threaded, I think. But that's an implementation detail; most programs in C and friends, even Perl/Python/Ruby, will not be written with multiple cores in mind, and in fact have single-threaded implementations (or stupid things like the GIL).
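For what it's worth, the message-passing shape ports to almost any language. Here's a minimal Python sketch: share-nothing workers that communicate only through queues. (Threads stand in for Erlang processes, and thanks to the GIL this shows the structure, not the speedup.)

```python
import threading
import queue

def worker(inbox, outbox):
    # Each worker owns no shared state; it only receives and sends messages.
    for msg in iter(inbox.get, None):   # None is the shutdown message
        outbox.put(msg * msg)

def run(jobs, nworkers=4):
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(nworkers)]
    for t in workers:
        t.start()
    for j in jobs:
        inbox.put(j)
    for _ in workers:
        inbox.put(None)                 # one shutdown message per worker
    results = sorted(outbox.get() for _ in jobs)
    for t in workers:
        t.join()
    return results
```

Scaling from one worker to many is a one-argument change, which is the whole point: the program's structure doesn't care how many cores exist.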
Re:How is this [business model] new? (Score:4, Informative)
And the best one: Two Athlon XP 2400+ cpus that I unlocked with a conductive pen to be Athlon MP 2400+s, and I still use in a dual-cpu board now.
Generally, unlocked or overclocked pc parts burn out faster than if they'd been left alone (e.g. the 6800LE I mentioned died a horrible death, and now doesn't work at all). However if the chip was DESIGNED to be able to be unlocked, it would be perfectly safe.
they buy this because it saves money (Score:2, Informative)
The cost in IT labor and lost productivity during the downtime that old methods need to add processing capacity can be a *lot* for servers hosting your important applications. But it's awfully expensive to pay up front for enough power to keep up with ordering spikes during the Christmas buying season (for example) if that spikes way beyond your normal needs. Much cheaper to pay for only enough to handle your normal load, then pay for the extra needed to handle oddball spikes only during the time you need it.
There's no way Ticketmaster's IT budget would agree to pre-pay for enough computing capacity to not bog down when the Hannah Montana tickets went on sale, but if they could pay for just an hour of it, that's a different story.
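With invented prices (nobody's real rate card), the trade-off looks like this:

```python
def own_peak(peak_cores, cost_per_core):
    """Buy enough cores to cover the yearly peak, all year."""
    return peak_cores * cost_per_core

def own_base_rent_burst(base_cores, peak_cores, cost_per_core,
                        burst_rate_per_core_hour, burst_hours):
    """Buy only the baseline; rent the extra cores just for the spike."""
    burst = (peak_cores - base_cores) * burst_rate_per_core_hour * burst_hours
    return base_cores * cost_per_core + burst

# e.g. a 64-core peak vs a 16-core baseline, bursting ~100 hours a year
always_on = own_peak(64, 1000)                           # 64,000
on_demand = own_base_rent_burst(16, 64, 1000, 2.0, 100)  # 16,000 + 9,600 = 25,600
```

Even at a steep per-hour premium, paying for the burst only while it lasts comes out well ahead when the spike is rare.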
Or as IBM puts it:
"Imagine you launch a dynamite new Web application for the holiday season, and it's getting more traffic than you expected. What do you do to avoid disruptions in service? You turn on available inactive processors and memory to handle every hit, then turn the extra capacity off when the application requires less capacity in the new year. You pay only for what you have activated.
Or say you tell your business analysts they now have access to all the company's business intelligence data. The danger is that, with your current processor configuration, increased demand could slow response times to a crawl. The solution? You activate reserve processing power to meet the new user demands without disrupting current operations."
The other beauty is that once the computer manufacturer has built in the ability to activate or inactivate processors and memory on the fly those same mechanisms make it natural to shuffle processors and memory between virtualized servers on the machine without restarting them.
And yes, it runs Linux.
Re:erm... (Score:3, Informative)
Re:How is this [business model] new? (Score:5, Informative)
The rent model is flawed because ... (Score:4, Informative)
Dedicated circuitry to create artificial scarcity and control actually adds unnecessary costs.
This might be useful in very specific scenarios where somebody, say, owns a supercomputer and rents it out, but even there, I'm sure there are far better solutions that don't involve the CPU hardware.
This is, like you suggest, just a BS wet dream of the manufacturers.
Right now we probably have few enough major chip vendors that with a little bit of collusion, if they decided not to compete, they could probably pull something like this on us. This doesn't look likely right now, but it seems possible. Hopefully some other (possibly foreign) company would enter the market if that happened. Competition is healthy for a market.
Re:How is this new? (Score:3, Informative)
Re:S/W licensed per processor (Score:3, Informative)
Re:Hardware DRM.... (Score:4, Informative)
What is dumb is this pay me to turn on more cores idea.
It really goes counter to the idea of OWNING or BUYING a PC. If I BUY the computer, I OWN the computer. I shouldn't have to pay you to unlock some part of that computer.
Yes it is going back to the days of the Mainframe and frankly I don't think that is a good idea.
Re:S/W licensed per processor (Score:3, Informative)
IBM for example when licensing some stuff (namely Lotus Domino): they go by performance units.
A single-core x86 CPU would be 100 units per core. Dual-core CPUs would be 50 units per core (notice that they work out to the same total). Quad-cores, however, are also 50 units per core, so while a single-core and a dual-core chip cost the same, quad-cores end up costing twice as much in license fees.
They even have some architectures where it changes to different values (I think one of them, maybe SPARC chips, was 100 units per core for single-cores and 70-80 units per core for dual-cores, making a dual-core more expensive, but not twice as expensive).
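Using the numbers from this post (real IBM PVU tables vary by chip and change over time), the math works out like:

```python
# Per-core "performance unit" values as described above; the real
# vendor tables assign different values per architecture and core count.
UNITS_PER_CORE = {
    ('x86', 1): 100,   # single-core
    ('x86', 2): 50,    # dual-core: same total as a single-core
    ('x86', 4): 50,    # quad-core: twice the dual-core total
}

def license_units(arch, cores_per_chip, chips=1):
    return UNITS_PER_CORE[(arch, cores_per_chip)] * cores_per_chip * chips

single = license_units('x86', 1)   # 100 units
dual   = license_units('x86', 2)   # 100 units
quad   = license_units('x86', 4)   # 200 units
```

So under this scheme the dual-core upgrade is free from a licensing standpoint, but the quad-core doubles the bill.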
Aside from them, most other vendors I've dealt with who do license by the CPU do it by actual processor, not cores. As a result, we are never worried about buying a quad-core box for a server even if it's a little more than the particular situation calls for, whereas generally I won't buy multi-CPU boxes unless I think I will actually need them. I'm sure there are exceptions, though, as there are to everything.
One area where this gets interesting: virtual machines. Our trend lately has been to buy a huge hulk of a server and then have CentOS + VMWare Server split it up into numerous virtual machines to do smaller jobs. The hiccup there is that for most software licenses, you have to go by the underlying hardware. So if I have a quad-core box that is hosting, say, 6 installations of MS SQL Server, then I'd have to pay the quad license for each one, even though the power of the underlying quad-core system is being split up to probably less than what a single-CPU system would offer if it weren't virtualized.
Re:How is this [business model] new? (Score:2, Informative)
A) the volume on each line is too high to be shifting silicon between lines.
B) it takes too much test logic to sort out "works if A but not B and C..." cases, for the very few chips that have a flaw yet still partially work.
C) flaws in silicon almost never affect just one chip, let alone just one section of one chip. (A multicore CPU is still a single chip.)
D) QC finds most flaws at the wafer level, before the dies are ever packaged; when a flaw is found, it's assumed more of the wafer is affected, and those parts are never touched.
Now, this is old information; it was true 10 years ago, when silicon quality wasn't as good as it is now (better yields make the economics even worse today).
There is usually more than just the CPU difference between lines; for example, you need more cooling and a better power supply for more/faster CPUs.
I don't know about chips, but Cat does this on their Diesel engines. The warranty cost will be higher for a higher-powered engine, no matter what it was designed for, so that is part of the cost equation. A stepped-up engine also competes with the next line up, so the pricing has to ease the gap from one platform to the next...
Re:How is this [business model] new? (Score:3, Informative)
Using JUST your logic, every slower processor or chip would be one that failed to qualify as a higher processor or chip in the same product line. We all know that is not the case. There are also market demands that must be met. I have no idea about failure rates, but I highly doubt only failed chips make lower-tier products. What percentage does each company or product line use? No one here has any idea.
Availability dictates price; price regulates demand.
Take the 'LE' and 'GT' releases of NVidia cards. The difference? 'GT' has all the shader units fully operational; 'LE' has half of them disabled.
Formerly, 'LE' versions of NVidia cards were a major part of the market, and the luxury 'GT' versions with twice as many shader units were at least twice as expensive. Nowadays 'LE' is only slightly cheaper than 'GT', and you'd have to be a sucker or desperate to buy the 'LE' version, because the price savings are very small compared to the performance penalty. The reason? NVidia improved its manufacturing efficiency, producing far fewer faulty units. Supply of 'GT' increased; supply of 'LE' decreased. So some 'LE' customers get pushed into the 'GT' sector, by increasing the price of 'LE' and decreasing that of 'GT'. The manufacturing costs are the same (it costs exactly the same to produce a 100% working chip as a faulty one...), and people are encouraged to buy the higher-end device by its lower price. If they are strictly the 'old LE' market (wanting a definitely cheaper product, unwilling to pay for either the 'new cheaper' GT or the 'new more expensive' LE), they will just buy a card from another line, a GT of an older model for example.
CPU prices don't increase linearly with speed. The $/MIPS curve may seem puzzling, but in fact it tracks the manufacturing yield in a given class.
Of course, the market has a very heavy momentum, and price changes don't happen day-to-day. So the temporary differences between supply and demand get filled by units from a higher class that have parts of their functionality artificially disabled. That's the overclocker's heaven: you just need to 'unlock' the chip and you have a genuine 'higher version'. But that's a matter of pure luck (or insider info, or following the news closely), because the chance that the part was 'crippled to lower the price' is smaller than the chance that it was faulty in the first place.
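The binning economics can be sketched with a toy model (die counts and defect rates below are made up for illustration):

```python
def bin_wafer(dies, defect_rate):
    """Toy binning: fully working dies sell as 'GT'; dies with a
    defective shader block are salvaged as 'LE' with half disabled."""
    gt = round(dies * (1 - defect_rate))
    le = dies - gt
    return gt, le

old_process = bin_wafer(1000, 0.40)   # (600, 400): plenty of natural LE supply
new_process = bin_wafer(1000, 0.05)   # (950, 50): LE is scarce unless good dies get crippled
```

As the defect rate drops, the cheap bin dries up, and the vendor either reprices the lineup or deliberately disables working units to refill it, which is exactly the dynamic described above.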
Re:How is this [business model] new? (Score:1, Informative)
a) the production advantage Intel ($31.5 billion revenue) has over a *fabless* Nvidia ($3.77 Billion).
b+c) the difference between a 128/256/384/etc. shader unit gpu (massive amount of very simple processors), and a 1/2/4 core cpu (several extremely complex processors)
d) distribution model for intel (single supplier) vs nvidia/ati (dozens of suppliers)
Intel and amd ship similar if not the same cooler across most of their line, and most of each processor line will nicely slot into a certain tdp envelope (35W, 65W, etc.)
Nvidia and ati/amd follow a different power consumption model due to differences a-d I outlined above.
In short, the guy that works for nvidia has it right after all...