Intel Flagship Core i7-6950X Broadwell-E To Offer 10-Cores, 20-Threads, 25MB L3

MojoKid writes: Intel has made a habit of launching enthusiast versions of previous generation processors after it releases a new architecture. As was the case with Intel's Haswell architecture, high-end Broadwell-E variants are expected, and it looks like Intel is readying a doozy. Recently revealed details show four new processors under the new HEDT (High-End Desktop) banner for Broadwell, which is one more SKU than Haswell-E brought to the table. The most intriguing of the new chips is the Core i7-6950X, a monster 10-core CPU with Hyper-Threading support. That gives the Core i7-6950X 20 threads to play with, along with a whopping 25MB of L3 cache. The caveat is the CPU's clock speed: it will run at just 3.0GHz (base), so for applications that aren't properly tuned to take full advantage of large core counts and threads, it could potentially trail behind the Core i7-6700K, a quad-core Skylake processor clocked at 3.4GHz (base) to 4GHz (Turbo).

  • Mainstream programming languages are still sequential by default and the likes of OpenCL are too hard to learn for simple tasks. UI code is still single threaded in most systems, and that drags most computation into that thread as well through programmer laziness. It's time for languages which are parallel by default and where ability to parallelize a loop is verifiable at compile time. Yes I know FORTRAN is much closer to that than C/Java, but that's due to being primitive to a degree that will not fly in

    • Julia and Rust have some intriguing parallelisation mechanisms.
      What I would like to write is code that has a dependency graph (or the compiler figures out the dependencies and parallelises by itself).

      In the meantime I simply write code for 1 processor, and run that on many data sets in parallel (using make or doit).

      • Julia and Rust have some intriguing parallelisation mechanisms. [...] I simply write code for 1 processor, and run that on many data sets in parallel

        I haven't found Julia's parallelism very efficient, but maybe it's just my lack of coding skills. Then again my work is rather parallel by nature (independent pixels), so I simply run several processes using shell scripts.

        For example, it would be nice if something like map() were always parallelized as it kind of assumes independent data points, but there are still other considerations like memory management. Julia's pmap() seems to have too much overhead to be of any help, especially when the separate p

    • It's pretty easy to write multi-threaded code in Java.

      • by iamacat ( 583406 )

        Multithreading requires every instance of concurrent execution to be micromanaged by the programmer, leading to a lot of code which is not parallelized in practice. Potential concurrency should be the default case for, say, all for loops, and serialization an explicit paradigm that the programmer is aware of, coupled with strong compile-time checking that can tell safe code from unsafe code.

        • Here's an example from Oracle:
          double average = roster
              .parallelStream()
              .filter(p -> p.getGender() == Person.Sex.MALE)
              .mapToInt(Person::getAge)
              .average()
              .getAsDouble();
          Lambdas and extensions to the Collections framework have made parallel loops simple.

          You can't have the compiler parallelize loops if the methods you call can be overridden. Forcing every method called to be final, just so you can optimize some loop, is a little daft. It's also pointless (read: slower) parallelizing a loop that only runs

          • by iamacat ( 583406 )

            Nothing has to be made final, subclasses just need to obey the contract declared by the superclass. This can be accomplished, in the worst case, by making everything synchronized.

            Make the default for loop potentially parallel and have the compiler complain if it cannot prove the loop is safe to parallelize, either by code inspection or, as a last resort, by explicit annotation on the loop or the methods that it calls. Then introduce an sfor keyword for when you really have to make things sequential.

            • by Megol ( 3135005 )

              IMHO there should be 3 basic types of loops: FOR, PAR, SEQ. FOR loops can be parallelized if possible by the compiler, PAR loops have parallel semantics and SEQ loops have sequential semantics.

              Then most loops can use the FOR variant but when advantageous the programmer can use PAR or SEQ depending on the situation and/or to help the compiler and improve readability.
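
Java has no FOR/PAR/SEQ keywords, but the difference between sequential and parallel loop semantics can be roughly sketched with a plain loop and a parallel stream. The class and method names here (`LoopKinds`, `prefixSums`, `sumOfSquares`) are invented for illustration, and the mapping onto the proposed loop types is only an approximation.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration: Java has no FOR/PAR/SEQ keywords.
class LoopKinds {
    // "SEQ"-style loop: each iteration needs the previous running total,
    // so the iterations must execute in order.
    static long[] prefixSums(int[] values) {
        long[] sums = new long[values.length];
        long running = 0;
        for (int i = 0; i < values.length; i++) {
            running += values[i];
            sums[i] = running;
        }
        return sums;
    }

    // "PAR"-style loop: iterations are independent of each other, so the
    // runtime is free to spread them across cores in any order.
    static long sumOfSquares(int[] values) {
        return IntStream.range(0, values.length)
                .parallel()
                .mapToLong(i -> (long) values[i] * values[i])
                .sum();
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(Arrays.toString(prefixSums(data)));
        System.out.println(sumOfSquares(data));
    }
}
```

A "FOR" loop in the sense above would be one where the compiler, rather than the programmer, decides which of the two forms is safe. Nothing in this sketch does that.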

            • Slapping synchronized on a method doesn't solve all your multi-threaded problems.

              The simplest example is probably a Hashtable: all of its methods are synchronized.
              If you want to replace a value in the map, calling get, then put, is effectively a read-modify-write. That operation needs to be protected by enclosing it all in a synchronized block.

              You'll also risk deadlocks.
              If Object A's synchronized method calls Object B's synchronized method and vice versa, having two threads calling both those methods risks a dead
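
The read-modify-write problem described above can be sketched as follows. `HitCounter` and its methods are hypothetical names made up for this example; the only library behaviour relied on is that `Hashtable`'s synchronized methods lock the table object itself.

```java
import java.util.Hashtable;

// Hypothetical counter class for illustration; not from the thread.
class HitCounter {
    private final Hashtable<String, Integer> counts = new Hashtable<>();

    // Racy: get() and put() are each synchronized, but another thread can
    // update the entry between the two calls, so increments can be lost.
    void incrementRacy(String key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }

    // Hashtable's synchronized methods lock the table object itself, so
    // holding that same lock makes the read-modify-write one atomic step.
    void incrementAtomic(String key) {
        synchronized (counts) {
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
    }

    int get(String key) {
        Integer value = counts.get(key);
        return value == null ? 0 : value;
    }

    // Starts `threads` workers that each call incrementAtomic `perThread` times.
    static int countWithThreads(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAtomic("hits");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```

Swapping `incrementAtomic` for `incrementRacy` in the worker loop would usually produce a total below the expected count, though how often depends on timing. In newer code, `ConcurrentHashMap.merge` is one way to get the same atomic update without an explicit lock.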

    • It's not just the programming languages. Most *tasks* of any complexity tend to be highly sequential in nature. There are some rare exceptions, but the notion that a language can just automatically parallelize loops and get some massive speedup is not very feasible, I hate to say. It tends to work best in highly contrived or specialized situations. You have to be running some serious computation in a LOT of loops for that to pay off in any way, or the overhead is simply a non-starter. Moreover, those c

      • by iamacat ( 583406 )

        Really? Every browser window or tab should hang if Javascript in one of them is slow? Loading and decoding an image for one of the icons on screen should prevent the UI from processing touch events? Many of those problems have been solved by important applications ad-hoc, but sane behaviour by default would be great.

        • A web browser is somewhat of a unique case, as each tab is more or less equivalent to a separate application - or at least, it should be. Chrome certainly proved it can effectively be done that way, I think. I agree - a single page should not be able to slow down the entire browser - that's terrible design.

          When I was talking about UI, I was talking more like .NET's WPF or Qt, perhaps, neither of which are thread-safe because of performance concerns. It's critical for the programmer to do the work of hand

    • Basically every Java program I have ever seen is multithreaded. (that only excludes hello world programs etc.)
      No idea why you explicitly mentioned Java and C# ... the first one is a particularly bad example.

      • by Bengie ( 1121981 )
        My first non-homework program ever was threaded, written in C# 2.0, and was for my first job. Threading is pretty easy. With .NET 4 and even more so with 4.5, they took a lot of the boring parts and let me simply connect my parts together like pre-made Legos.

        I can understand how threading can be hard for systems that are very latency sensitive like 3D games, but for anything that just needs scaling and throughput, threading is brain dead easy.
  • by SeeManRun ( 1040704 ) on Sunday November 15, 2015 @12:49PM (#50934981)
    I have been seeing this mistake a lot lately for some reason. The i7 6700K runs at a 4.0 GHz base clock and turbos up to 4.2. So it will be quite a performance beatdown by Skylake if clock speed instead of threads is important.
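
How much clock speed matters versus thread count can be estimated with Amdahl's law. This is a back-of-envelope sketch only: it assumes clock speed is a fair stand-in for per-core performance and ignores IPC differences between Broadwell and Skylake, turbo behaviour, cache size and memory bandwidth. The class and method names are invented for the example.

```java
// Back-of-envelope model only; see the assumptions in the text above.
class ClockVsCores {
    // Amdahl's law: speedup on `cores` cores when a fraction `parallel`
    // (0.0 to 1.0) of the work can be spread across them.
    static double amdahlSpeedup(double parallel, int cores) {
        return 1.0 / ((1.0 - parallel) + parallel / cores);
    }

    // Relative throughput, using clock speed as a proxy for per-core speed.
    static double throughput(double ghz, int cores, double parallel) {
        return ghz * amdahlSpeedup(parallel, cores);
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.0, 0.5, 0.75, 0.95}) {
            double tenCore = throughput(3.0, 10, p);  // i7-6950X base clock from the summary
            double quadCore = throughput(4.0, 4, p);  // i7-6700K base clock as corrected above
            System.out.printf("parallel=%.2f  10-core=%.2f  4-core=%.2f%n", p, tenCore, quadCore);
        }
    }
}
```

Under these assumptions the two chips break even when roughly 74% of the work parallelizes. Below that the quad-core's clock advantage wins, above it the extra cores do.
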
  • by Anonymous Coward

    C'mon Intel, everyone knows that 400 thread count is the minimum needed for a good night's sleep.

    • Nope, 400 still isn't enough. A lot of us are still waiting for a CPU with over 9000 threads.

    • by KGIII ( 973947 )

      Pfft... 1500 count Egyptian, or I'm going home!

      (No, not really. I don't actually know what the thread count is. One, I'm in a hotel STILL and, two, I don't actually buy my own bedding at home.)

  • AMD's response? (Score:4, Interesting)

    by Khyber ( 864651 ) on Sunday November 15, 2015 @01:08PM (#50935043)

    Assuming Intel doesn't go Xeon-scale in pricing for this CPU (who am I kidding, of course they will) I wonder how AMD plans to respond to this.

    For now, they've got the consoles holding them afloat. And while I am an AMD fan, I see they are rapidly losing out on the desktop space when it comes to performance (despite both companies having rather meager performance gains for the past several years.)

    They'd better figure out what the fuck they're doing, and come up with some competing responses, quickly. Hell, I've got ideas for them, all involving that HBM tech.

    1. Use a modified version of that HBM tech to stack their CPU cores and load it up with tons of cache memory (for their non-APU line.) And don't forget to drop a process node, for fuck's sake.
    2. Use modified HBM tech to create stacked CPU/GPU/RAM/CACHE on the same die (for their APU line.)
    3. Use modified HBM to create stacked single-die CrossFire GPUs that don't consume gobs of power (GPU line.)
    4. Use modified HBM tech to create a true monolithic SOC package that integrates EVERYTHING, thus eliminating the need for motherboards - at that point in time, it just becomes a breakout board with a socket. They could probably do away with the interposer as well if they were clever enough in the design.

    • Re:AMD's response? (Score:5, Informative)

      by Nemyst ( 1383049 ) on Sunday November 15, 2015 @01:37PM (#50935173)
      HBM only works for stacking memory (hence why it's called High Bandwidth Memory). You can't stack CPU cores because they output waaaaaay too much heat. You can dissipate heat from memory passively, so stacking them and slapping an active cooler can work. Good luck stacking CPU cores in the same way.
      • by Khyber ( 864651 )

        "You can't stack CPU cores because they output waaaaaay too much heat."

        Microfluidic cooling to an IHS. Easy-peasy.

    • by Kjella ( 173770 )

      Assuming Intel doesn't go Xeon-scale in pricing for this CPU (who am I kidding, of course they will) I wonder how AMD plans to respond to this.

      Fighting the battles they can win, or are at least less likely to lose. This is a halo product of a server line of chips, and getting Opterons back in the data center takes more time for validation and convincing conservative enterprises than AMD has. Zen will launch to compete with Intel's mainstream dual/quad-core chips. Even if it pulls off a miracle, I'm guessing it'd take at least a year or two until AMD is back to a full top-to-bottom stack.

    • Re:AMD's response? (Score:5, Informative)

      by gman003 ( 1693318 ) on Sunday November 15, 2015 @02:06PM (#50935295)

      AMD has been developing a new microarchitecture, Zen, which will replace the horribly-designed Bulldozer. It's rumored to be made on a 14nm node, and they re-hired the guy who designed the K10 architecture (aka the last good CPUs AMD made), so I expect it to be reasonably competitive with Intel. I really hope it is, at least.

      Your terminology is completely out of whack ("stacked single-die CrossFire GPU" is a phrase with more contradictions than whitespace characters), but I'll analyze what you were trying to say instead of what you actually said:
      #1. Current chip-stacking tech doesn't allow for all that much bandwidth between chips, especially when going above two layers. CPU cores need a pretty hefty amount of bandwidth to their cache, so that's already problematic. Stacking dies also limits thermal performance - if you stack two dies, you have 2x the heat in 1x the heat-conducting surface area. For low-power stuff, that's fine, but CPU cores get pretty hot. Many high-performance dies are already performance-constrained by how much heat they can conduct to their cooler.
      #2. This is a good idea. Or rather, the good idea is "APU on an interposer using HBM for main memory". You'd need bigger CPU caches - HBM is ridiculously high-latency even by VRAM standards, it will really hurt CPU performance otherwise. And it will limit upgradability - no way to just pop another DIMM of DDR3 in there. But the GPU gains should be worth it.
      #3. Again, thermals will absolutely prevent you from stacking GPU dies. HBM and stacking don't do ANYTHING for the power efficiency of the chips you're stacking, so that's two 100W+ dies on top of each other. Not gonna happen. You could place them side-by-side on an interposer, but at that point why not just fabricate them as one die?
      #4. The cost of an interposer is significantly greater than that of a printed circuit board, and a lot of stuff won't benefit from the greater bandwidth to the CPU - stuff like a USB controller or audio chipset. Stacking the dies is also more expensive than just using a PCB - it's done in phones where space is REALLY constrained, but even the smallest desktops aren't that tight for space yet. So all that's left is putting everything onto one die - which runs into yield problems, because with bigger individual dies, a single defect will wipe out a lot more silicon. AMD actually *is* already doing this with their lowest-end laptop/desktop parts - look at Socket AM1, there's not much on the motherboard besides external connectors and power-delivery circuits. But they're also pretty low-end in performance.
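
The yield point in #4 can be illustrated with the simple Poisson yield model, where the chance of a die having zero defects is exp(-D * A) for defect density D and die area A. The defect density below is a made-up figure, not a foundry number, and the die areas are arbitrary examples. Real fabs use more refined models.

```java
// Simple Poisson yield model; all numbers here are hypothetical.
class YieldSketch {
    // Probability that a die of the given area contains no defects.
    static double poissonYield(double defectsPerCm2, double dieAreaMm2) {
        double areaCm2 = dieAreaMm2 / 100.0;  // 100 mm^2 = 1 cm^2
        return Math.exp(-defectsPerCm2 * areaCm2);
    }

    public static void main(String[] args) {
        double defectDensity = 0.2;  // hypothetical defects per cm^2
        for (double area : new double[] {100.0, 300.0, 600.0}) {
            System.out.printf("%.0f mm^2 die: %.1f%% defect-free%n",
                    area, 100.0 * poissonYield(defectDensity, area));
        }
    }
}
```

In this model the defect-free fraction falls off exponentially with die area, which is the effect the comment describes when it says a single defect wipes out a lot more silicon on a bigger die.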

      • by Khyber ( 864651 )

        We have microfluidics for stacking dies and removing heat. We do it on p-n junctions on some of the latest LEDs (which are fucking MASSIVE at nearly 7mm x 7mm on just the die alone, not including any mount, circuitry, etc.) to keep them very cool.

        I don't speak of ideas unless I already know we've got the technology to handle it.

        • 49mm^2 is "massive"? A high-end processor is 500-600mm^2. And even if microfluidics works to remove heat (how do you have a layer with both enough fluid channels to cool, and enough TSVs for communication?), that will increase your cost substantially. I would expect $1K+ for a quad-core CPU under this kind of design.

    • by dbIII ( 701233 )
      Low end desktops and high end cluster computing are keeping AMD going. It's not as if you can put four of these 3GHz 20 thread beasts on a board, and the Xeons that can do that are both slower and cost a fortune compared with four-way AMD CPUs.
      Not very long ago I got a quote for an 80 thread Intel machine (4 Xeons) which turned out to be slightly less than ten times the price for a 64 core AMD machine with the same clock speed, memory capacity, disks etc. The Xeon machine would perform better than a singl
  • Imagine a Beowulf Cluster of these!

    • Imagine a Beowulf Cluster of these!

      No need ... give it a while and people will be saying "64 Cores ought to be enough for anyone." Then again, GPUs passed that count long ago.

  • by SuricouRaven ( 1897204 ) on Sunday November 15, 2015 @01:20PM (#50935093)

    "Hi. We're from Intel, and we'd like to take a look at your multithreading, such as it is."

  • This is all well and good but I have to wonder: is this thing still optimized for single-threaded performance??
  • How much does it really matter anymore? For 99% of the population, any top end computer built in the last 5+ years is so damn fast, it will be fine for the next 10 years. Unless we see the 100x (or whatever) increase with quantum computing, these small incremental improvements are fairly pointless.
  • Why is Intel introducing a new Broadwell processor? Why not Skylake?

    Broadwell was a "Tick". Skylake is the improvement called "Tock".
    • by Fwipp ( 1473271 ) on Sunday November 15, 2015 @02:35PM (#50935399)

      "Intel has made a habit of launching enthusiast versions of previous generation processors after it releases a new architecture."

    • by Nemyst ( 1383049 )
      Yields. When Intel releases a new processor line, yields are still pretty low, especially towards the high end. That's why you have binning and so many different processors - so they can recycle a top-end processor as a mid or high-end processor should parts of it end up subpar (though this is more popular in GPUs these days).

      As the line ages, yields improve and they generally iterate over the design in smaller ways to obtain even better efficiency or iron out issues. It's at that point that it becomes ve
      • Thanks for the explanation.

        "(so Broadwell-E is 6000 like Skylake processors)"

        That, to me, seems like Intel being typically Intel. That creates confusion, instead of communicating clearly.

        A long time ago, I wanted to order some Intel motherboards. I needed the part numbers. It required 2 hours to get the numbers.

        Several years ago, I mentioned an error in the Intel web site to an Intel customer service employee. He said, "Oh, we are re-doing our web site." A year later, I happened to get the same
  • by barc0001 ( 173002 ) on Sunday November 15, 2015 @06:21PM (#50936441)

    This will be nice to pop into a whitebox VMware ESXi machine. Definitely cheaper than a 2 x 6 core build.

    • by swb ( 14022 )

      If only they would pair it with a desktop board that could take 256 GB RAM.

      I find that I eat all my disk i/o and RAM way before my cpu.

      • Yeah that's part of the problem, but for some of our dev workloads we only use 2GB of RAM per VM but hammer the processor so this is a good niche fit. And Gigabyte's got some workstation boards that go to 64GB but also cost more so it's a trade off - and it's not a sure thing they'll support these chips. Obviously it's not for everyone.

        • There are desktop boards that support a theoretical 512GB or 768GB of memory, if you go with registered DDR4.
          Look for the "pro" chipset, C612.
          Needs a Xeon E5-1xxx - the leading 1 says it works only in single CPU mode - which is about the same as an i7 anyway.

  • "it could potentially trail behind the Core i7-6700K, a quad-core Skylake processor clocked at 3.4GHz (base) to 4GHz (Turbo)."

    Not by much.

    If you want to see the true speed of any CPU, look at the memory speed. Internal multipliers make some steps run faster but the overall effect isn't high enough to justify the cost deltas on the higher-clockrate CPUs. In general the sweet spot is 2-4 steps below the top step.

    If you have a proper multitasking operating system it will take as much advantage of extra processo

  • I hear it is used to power the new Gillette razor with 6 blades...
