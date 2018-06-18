HPE Announces World's Largest ARM-based Supercomputer (zdnet.com) 30
The race to exascale speed is getting a little more interesting with the introduction of HPE's Astra -- what will be the world's largest ARM-based supercomputer. From a report: HPE is building Astra for Sandia National Laboratories and the US Department of Energy's National Nuclear Security Administration (NNSA). The NNSA will use the supercomputer to run advanced modeling and simulation workloads for things like national security, energy, science and health care.
HPE is involved in building other ARM-based supercomputing installations, but when Astra is delivered later this year, "it will hands down be the world's largest ARM-based supercomputer ever built," Mike Vildibill, VP of Advanced Technologies Group at HPE, told ZDNet. The HPC system is composed of 5,184 ARM-based processors -- the ThunderX2 processor, built by Cavium. Each processor has 28 cores and runs at 2 GHz. Astra will deliver over 2.3 theoretical peak petaflops of performance, which should put it well within the top 100 supercomputers ever built -- a milestone for an ARM-based machine, Vildibill said.
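As a rough sanity check on the figures quoted above, the sketch below rebuilds the peak number from the socket count, core count, and clock speed. The 8 double-precision flops per cycle per core is an assumption (two 128-bit SIMD units doing fused multiply-adds), not something the article states, and the function names are made up for illustration.

```python
# Back-of-the-envelope theoretical peak for the figures in the summary.
SOCKETS = 5184           # from the article
CORES_PER_SOCKET = 28    # from the article
CLOCK_HZ = 2.0e9         # from the article (2 GHz)
FLOPS_PER_CYCLE = 8      # ASSUMPTION, not from the article: DP flops/cycle/core


def total_cores(sockets, cores_per_socket):
    """Total core count across all sockets."""
    return sockets * cores_per_socket


def peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak: cores x clock x flops issued per cycle."""
    return total_cores(sockets, cores_per_socket) * clock_hz * flops_per_cycle


if __name__ == "__main__":
    cores = total_cores(SOCKETS, CORES_PER_SOCKET)
    peak = peak_flops(SOCKETS, CORES_PER_SOCKET, CLOCK_HZ, FLOPS_PER_CYCLE)
    print(f"{cores:,} cores")                 # 145,152 cores
    print(f"{peak / 1e15:.2f} petaflops")     # 2.32 petaflops
```

Under that assumption the arithmetic lands on the quoted "over 2.3" petaflops. This is a theoretical ceiling only; it says nothing about what the machine sustains on a real benchmark.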
Quantity game? (Score:2)
Maybe I'm naive, but a typical "supercomputer" these days mostly just connects up bunches of servers (or "servlets") via a central cluster manager or cluster tree. The "size" of the supercomputer is then roughly the total number of CPUs (or maybe total instructions per second for the entire shebang).
Thus, if you want to make a "numeric" world's record, you just get ship-loads of servers and hook them up to the cluster manager tree. It's mostly a quantity pissing match roughly comparable to having the tall
Re: (Score:2)
If they were all working concurrently on the same problem, then yes.
Quantity X (Score:2)
The supercomputer has far higher interconnect bandwidth and better latency than typically networked commercial servers.
There needs to be high-performance support (meaning assembly-level drivers in some cases) for the APIs used by the heavily multiprocessed workloads. Think about massive partial differential equation solvers with one gridpoint talking to others and updating at every timestep.
Conventional networked servers and their bad latency: http://www
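To make the PDE point above concrete, here is a minimal sketch of a 1D diffusion solve split across "nodes". It is plain Python standing in for what would normally be message passing between machines; the function names, grid size, and `alpha` value are all made up for illustration. Each chunk needs one boundary ("halo") value from each neighbor before every single timestep, which is why interconnect latency matters so much.

```python
def step(u, alpha):
    """One explicit diffusion step on a 1D grid; both endpoints are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def split(u, parts):
    """Cut the grid into contiguous chunks, one per hypothetical node."""
    size = len(u) // parts
    return [u[k * size:(k + 1) * size] if k < parts - 1 else u[k * size:]
            for k in range(parts)]


def step_partitioned(chunks, alpha):
    """Same step as step(), but each chunk only sees its own points plus
    one halo value fetched from each neighboring chunk."""
    new_chunks = []
    for k, chunk in enumerate(chunks):
        has_left = k > 0
        has_right = k < len(chunks) - 1
        # "Halo exchange": on a real machine this is a network message.
        padded = list(chunk)
        if has_left:
            padded.insert(0, chunks[k - 1][-1])
        if has_right:
            padded.append(chunks[k + 1][0])
        updated = step(padded, alpha)
        # Drop the halo cells again; they belong to the neighbors.
        lo = 1 if has_left else 0
        hi = len(updated) - (1 if has_right else 0)
        new_chunks.append(updated[lo:hi])
    return new_chunks


if __name__ == "__main__":
    grid = [0.0] * 16
    grid[8] = 100.0                      # a single hot spot
    whole, chunks = grid, split(grid, 4)
    for _ in range(10):
        whole = step(whole, 0.25)
        chunks = step_partitioned(chunks, 0.25)
    flat = [x for c in chunks for x in c]
    print(all(abs(a - b) < 1e-12 for a, b in zip(whole, flat)))
```

The partitioned run matches the single-grid run, but only because every chunk waits on its neighbors at every timestep. With 4 chunks and 10 steps that is already 60 halo values moved; at the scale of a real machine, a slow network turns those waits into the dominant cost.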
Re: (Score:2)
You are naive. That's how you make a really crappy supercomputer.
This machine will have more than 100,000 cores. At that scale there are many things that must be carefully thought out. Even just _launching_ a job at 100k procs presents challenges (enough so that people who do it well put out press releases about it: http://mvapich.cse.ohio-state.... [ohio-state.edu] ). Beyond choosing the processor (obvious) here are some of the things that must be thought about / balanced:
1. Power - for machines this large you often h
Re: (Score:2)
The biggest difference between a simple compute cluster and a supercomputer is the speed of the interconnect. A compute cluster might have individually fast nodes, potentially decked out with RAM, but it's not going to be able to access the contents of any other node's memory effectively. So a big problem needs to be partitioned into slices that fit on a node.
Supercomputers have fast enough interconnects that multiple nodes can act as a single machine image. Nodes can read and write to the shared memory so
But maybe not for long... (Score:1)
I wonder how long until we see a million ARM SpiNNaker? http://apt.cs.manchester.ac.uk... [manchester.ac.uk]
Cost and power? (Score:2)
This is an interesting development in the use of ARM processors in large computing systems. However, how much progress this represents depends on the dollars and watts needed to produce results. News articles frequently mention the 2.3 petaflops number, but the procurement cost of the system and the power needed to achieve the peak petaflops number are hard to find. If this ARM system doesn't present a compelling dollars or watts story, what is the advantage of this system over competing technologies?
Re: (Score:2)
Well - ease of programming for one thing.
With the death of Intel Phi... the HPC community really only has GPUs to offer good flops/watt. The problem with that? Not all workloads map well to GPUs and you often can't rewrite millions of dollars of software that doesn't use GPUs.
ARM offers another alternative: it can run anything an x86 processor can at better flops/watt.
The rise of ARM in HPC is _definitely_ an interesting development!
Beowulf cluster (Score:1)
Only an RPeak of 2.3 Petaflops? (Score:1)
For a 5,184 socket system a "Peak" performance of 2.3 Petaflops isn't that revolutionary.
I'm assuming that when they say "peak" they mean a LINPACK "Rpeak" value, which is usually (with a few exceptions) *higher* than the "Rmax" value that's actually used to order the systems by performance. Nothing in the story indicates that these values are Rmax, and in fact the story literally says "theoretical peak petaflops", which definitely makes me think Rpeak.
You can see the soon to be outdated list from last
Re: (Score:2)
The flops/socket is still better than what BlueGene procs deliver - and I suspect the flops/watt will be a LOT better than the Xeon system you pointed out.
An exascale computer can't simply use 10M Xeons... you would need to build a small nuclear reactor next to it to power it. And while GPUs are useful for generating flops... not all workloads map well to them. These cores are general purpose: they can run anything a Xeon can run... but should use a lot less power.
Holy shit, CPUs! (Score:2)
Besides the interesting point of this system using ARM over x86-64, it looks like it's all CPU powered. The past few TOP500 rankings have been giant GPU clusters with fast interconnects. The CPUs have provided little of their actual number crunching.
It's not like GPU-heavy supercomputers are slacking or anything, I just think it's cool seeing a machine get high performance without them. I'm no expert, but it seems an all-CPU design would be easier to write code for since the problem set doesn't need to be