HPE Announces World's Largest ARM-based Supercomputer (zdnet.com)

The race to exascale speed is getting a little more interesting with the introduction of HPE's Astra -- what will be the world's largest ARM-based supercomputer. From a report: HPE is building Astra for Sandia National Laboratories and the US Department of Energy's National Nuclear Security Administration (NNSA). The NNSA will use the supercomputer to run advanced modeling and simulation workloads for things like national security, energy, science and health care.

HPE is involved in building other ARM-based supercomputing installations, but when Astra is delivered later this year, "it will hands down be the world's largest ARM-based supercomputer ever built," Mike Vildibill, VP of Advanced Technologies Group at HPE, told ZDNet. The HPC system comprises 5,184 ARM-based processors -- the ThunderX2, built by Cavium -- each with 28 cores running at 2 GHz. Astra will deliver over 2.3 theoretical peak petaflops of performance, which should put it well within the top 100 supercomputers ever built -- a milestone for an ARM-based machine, Vildibill said.


Comments Filter:
  • Quantity game? (Score:2, Flamebait)

    by Tablizer ( 95088 )

    Maybe I'm naive, but a typical "supercomputer" these days mostly just connects up bunches of servers (or "servlets") via a central cluster manager or cluster tree. The "size" of the supercomputer is then roughly the total number of CPUs (or maybe total instructions per second for the entire shebang).

    Thus, if you want to make a "numeric" world's record, you just get ship-loads of servers and hook them up to the cluster manager tree. It's mostly a quantity pissing match roughly comparable to having the tall

    • There's lots of naivete in the "connect up bunches" part.

      The supercomputer has far higher interconnect bandwidth and better latency than typically networked commercial servers.

      There needs to be high-performance support (meaning assembly-level drivers in some cases) for the APIs used by the heavily multiprocessed workloads. Think about massive partial differential equation solvers with one gridpoint talking to others and updating at every timestep.
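      To make that gridpoint traffic concrete, here is a minimal sketch of the per-timestep halo (ghost-cell) exchange such a solver does, assuming mpi4py and NumPy are available; the domain size, update rule, and rank layout are illustrative, not anything from the article.

```python
# Minimal 1-D halo exchange: each MPI rank owns a slab of the grid and swaps
# boundary ("ghost") cells with its neighbors every timestep.  This per-step
# neighbor traffic is why interconnect latency matters so much.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                        # interior points owned by this rank (illustrative)
u = np.zeros(n_local + 2)             # +2 ghost cells, one on each side
u[1:-1] = rank                        # dummy initial condition

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # Swap ghost cells with both neighbors; PROC_NULL turns the edges into no-ops.
    comm.Sendrecv(u[1:2], dest=left, sendtag=0,
                  recvbuf=u[-1:], source=right, recvtag=0)
    comm.Sendrecv(u[-2:-1], dest=right, sendtag=1,
                  recvbuf=u[0:1], source=left, recvtag=1)
    # Explicit diffusion update on the interior (stand-in for a real PDE solver).
    u[1:-1] += 0.1 * (u[:-2] - 2.0 * u[1:-1] + u[2:])
```

      Run with something like `mpirun -n 4 python halo.py`; at 4 ranks this is trivial, but the same pattern at 100,000+ ranks is where the fabric and its drivers earn their keep.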

      Conventional networked servers and their bad latency: http://www
    • Re:Quantity game? (Score:5, Informative)

      by friedmud ( 512466 ) on Monday June 18, 2018 @08:27PM (#56806510)

      You are naive. That's how you make a really crappy supercomputer.

      This machine will have more than 100,000 cores. At that scale, there are many things that must be carefully thought out. Even just _launching_ a job at 100k procs presents challenges (enough so that people who do it well put out press releases about it: http://mvapich.cse.ohio-state.... [ohio-state.edu]). Beyond choosing the processor (obvious), here are some of the things that must be thought about and balanced:

      1. Power - for machines this large you often have to make special deals with local utility companies to power it efficiently.
      2. Cooling - the heat load will be immense; deciding how to cool it is incredibly important.
      3. Interconnect - there are many options here (although fewer than in the past). Choosing e.g. InfiniBand vs. Ethernet comes with different tradeoffs and can depend on what your average application will be doing (many short messages vs. large messages, etc.).
      4. Switching - how many switches are needed? What topology will you use (fat-tree, hypercube, etc.)? It depends somewhat on how much you want to spend on switches and somewhat on what your typical application workload looks like. (A back-of-envelope sizing sketch appears below this list.)
      5. RAM - RAM is currently incredibly expensive (thanks to cell phones using so much of it!). How much RAM, what type, and how fast it is can greatly tip the scales on price/performance.
      6. OS - most of these machines run Linux these days, but there are many different flavors. Things get optimized all the way down to exactly which kernel version to use - and everything is hand-tuned.
      7. Job scheduler - several options here, from PBS to Slurm to proprietary vendor-specific options. How good your job scheduler is can have a HUGE impact on the usability of the machine.
      8. Filesystem - most of these machines have at least two types of filesystems: "home" and "scratch"... where "home" can be something reliable - maybe even using NFS - and "scratch" is typically some high-speed parallel filesystem (Lustre, Panasas, etc.). Choosing the balance between the two is critical. Note that 100,000 processes reading/writing simultaneously can bring even carefully crafted filesystems to their knees.
      9. Local disk - for a long time it was in vogue to run a "diskless" system, but now "disks" are making a comeback (in the form of NV-RAM). Depending on what your applications look like, this can provide huge speedups.

      (I'm sure I missed something - but these are the big ones)
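      On item 4, here is a hedged back-of-envelope sketch of how switch counts fall out of a topology choice, assuming a non-blocking two-level (leaf/spine) fat-tree built from identical fixed-radix switches; the node count (treating the 5,184 sockets as dual-socket nodes) and the 80-port radix are illustrative assumptions, not Astra's actual fabric.

```python
# Back-of-envelope switch count for a non-blocking two-level (leaf/spine)
# fat-tree built from identical switches with "radix" ports each.
# Purely illustrative; real procurements also weigh oversubscription,
# rail count, cable reach, and cost per port.

def two_level_fat_tree(nodes, radix):
    """Return (leaf_switches, spine_switches) for a non-blocking two-level fat-tree."""
    down_per_leaf = radix // 2              # leaf ports facing compute nodes
    max_nodes = radix * down_per_leaf       # at most `radix` leaves, fully meshed to spines
    if nodes > max_nodes:
        raise ValueError(f"{nodes} nodes needs more than two levels at radix {radix}")
    leaves = -(-nodes // down_per_leaf)     # ceiling division
    spines = radix - down_per_leaf          # every leaf has one uplink to every spine
    return leaves, spines

# Example: 2,592 dual-socket nodes (5,184 sockets / 2) on hypothetical 80-port switches.
leaves, spines = two_level_fat_tree(nodes=2592, radix=80)
print(f"{leaves} leaf switches + {spines} spine switches")   # 65 + 40
```

      Relaxing the non-blocking requirement (oversubscribing the uplinks) trades spine switches for congestion, which is exactly the cost-versus-workload balance described above.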

      Anyway: It's not simple. Purchasing for these machines typically takes at least a year just in the phase where you're defining the requirements and then another 6 months or so to put out bids and go through the selection process.

      In case you're wondering - I do work in the national lab system, I use these machines daily and am part of procurement decisions for them...

      • by Tablizer ( 95088 )

        That's how you make a really crappy supercomputer.

        That's my point: setting a record, and being useful/good are not necessarily the same thing.

        Maybe the cheap-and-easy version can perform a very narrow set of calculation types faster than anything and set a record doing those narrow things.

        It's roughly comparable to a USA "muscle car" compared to European sports cars. On a straight road, the relatively cheap muscle car may out-perform the expensive European sports cars in certain categories. But throw in so

        • by Tablizer ( 95088 )

          On review, I worded my original poorly, implying that such shenanigans were the rule instead of the exception. My apologies.

      • by pnutjam ( 523990 )
        Managed some clusters that did engine modelling for Rolls-Royce. You're spot on. When you run dozens (or hundreds) of identical machines, there is a lot that goes into it. It's not simple by any means. You could have warmed soup in the back of our clusters, especially when under load.
        The filesystem can make a huge difference, as can the job scheduler.
    • The biggest difference between a simple compute cluster and a supercomputer is the speed of the interconnect. A compute cluster might have individually fast nodes, potentially decked out with RAM, but it's not going to be able to access the contents of any other node's memory effectively. So a big problem needs to be partitioned into slices that fit on a node.

      Supercomputers have fast enough interconnects that multiple nodes can act as a single machine image. Nodes can read and write to the shared memory so

      • These machines are still "distributed memory" supercomputers. It's rare to see a true "shared memory" cluster in HPC these days.

        InfiniBand works off of RDMA (Remote Direct Memory Access) - but you wouldn't consider it to be "shared" memory (and programmers don't typically interact with the RDMA calls directly - most often they're still using MPI... but MPI then uses RDMA to achieve the transfer).
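        As a sketch of that distinction, here is what the two styles look like from the programmer's side, assuming mpi4py and NumPy; the buffer sizes and tags are arbitrary, and whether the library actually uses RDMA underneath depends on the fabric and the MPI build.

```python
# Two-sided vs. one-sided MPI: memory stays distributed, and even "remote
# access" is an explicit MPI call that the library may implement with RDMA.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# --- Two-sided: matched send/recv pair -------------------------------------
data = np.full(4, rank, dtype='d')
if rank == 0 and size > 1:
    comm.Send(data, dest=1, tag=7)
elif rank == 1:
    inbox = np.empty(4, dtype='d')
    comm.Recv(inbox, source=0, tag=7)

# --- One-sided: expose a window and let a peer Get() from it ---------------
# This is as close as MPI gets to "reading another node's memory", and it is
# still an explicit operation against a registered window, not shared memory.
window_buf = np.full(4, rank, dtype='d')
win = MPI.Win.Create(window_buf, comm=comm)
win.Fence()
if rank == 1:
    pulled = np.empty(4, dtype='d')
    win.Get(pulled, target_rank=0)      # fetch rank 0's buffer, possibly via RDMA
win.Fence()
win.Free()
```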

        That said: you are correct that interconnect is one of the things that makes a supercomputer "super". The pri

    • by mikael ( 484 )

      For a modern supercomputer, the connections between the nodes are just as important as, or even more important than, the GPU cores and CPU cores on each node, and they get a name: the interconnect fabric, or fabric computing. These systems are rack-mounted with each node on a single motherboard. Racks can be added and removed according to funding. Each node needs to transfer data to any other node within a few microseconds, as well as load startup data and save checkpoints at fixed intervals. Some computing problems depe

  • I wonder how long until we see a million ARM SpiNNaker? http://apt.cs.manchester.ac.uk... [manchester.ac.uk]
     

  • This is an interesting development in the use of ARM processors in large computing systems. However, how much progress this represents depends on the dollars and watts needed to produce results. News articles frequently mention the 2.3 petaflops number, but the procurement cost of the system and the power needed to achieve that peak are hard to find. If this ARM system doesn't present a compelling dollars or watts story, what is the advantage of this system over competing technologies?

    • Well - ease of programming for one thing.

      With the death of Intel Phi... the HPC community really only has GPUs to offer good flops/watt. The problem with that? Not all workloads map well to GPUs and you often can't rewrite millions of dollars of software that doesn't use GPUs.

      ARM offers another alternative: it can run anything an x86 processor can at better flops/watt.

      The rise of ARM in HPC is _definitely_ an interesting development!

  • I would run this super computer in a beowulf cluster
  • For a 5,184-socket system, a "peak" performance of 2.3 petaflops isn't that revolutionary.

    I'm assuming that when they say "peak" they mean a LINPACK "Rpeak" value, which is usually (with a few exceptions) *higher* than the "Rmax" value that's actually used to order the systems by performance. Nothing in the story indicates these values are Rmax; in fact, the story literally saying "theoretical peak petaflops" definitely makes me think Rpeak.
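    For what it's worth, the quoted figure is consistent with a straight Rpeak calculation, assuming each ThunderX2 core retires 8 double-precision FLOPs per cycle (two 128-bit FMA pipes); that per-core number is my assumption, not something stated in the article.

```python
# Back-of-envelope Rpeak for the system in the summary:
# sockets x cores/socket x clock x FLOPs/cycle/core.
sockets = 5184
cores_per_socket = 28
clock_hz = 2.0e9
flops_per_cycle = 8      # assumption: two 128-bit NEON FMA units, double precision

rpeak_pflops = sockets * cores_per_socket * clock_hz * flops_per_cycle / 1e15
print(f"Rpeak ~ {rpeak_pflops:.2f} PFLOPS")   # ~2.32, matching the quoted 2.3 figure
```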

    You can see the soon to be outdated list from last

    • The flops/socket is still better than what BlueGene procs deliver - and I suspect the flops/watt will be a LOT better than the Xeon system you pointed out.

      An exascale computer can't simply use 10M Xeons... you would need to build a small nuclear reactor next to it to power it. And while GPUs are useful for generating flops... not all workloads map well to them. These cores are general purpose: they can run anything a Xeon can run... but should use a lot less power.
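      To put a rough number on the "nuclear reactor" point, here is a hedged back-of-envelope, where the watts-per-core and facility overhead are illustrative guesses rather than vendor figures.

```python
# Rough power arithmetic for "10M Xeon cores" -- all inputs are illustrative
# assumptions, not vendor specs.
xeon_cores = 10_000_000
watts_per_core = 7.0      # assume ~150-200 W sockets with a couple dozen cores
pue = 1.3                 # facility overhead for cooling and power delivery

it_load_mw = xeon_cores * watts_per_core / 1e6
facility_mw = it_load_mw * pue
print(f"IT load ~{it_load_mw:.0f} MW, facility ~{facility_mw:.0f} MW")
# ~70 MW of IT load (~90 MW at the wall) -- several times the 20-40 MW range
# usually discussed as a practical budget for an exascale machine.
```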

      • " I suspect the flops/watt will be a LOT better than the Xeon system you pointed out."

        Not according to real-world tests of these chips: https://www.servethehome.com/c... [servethehome.com]

        There's a long-held assumption that anything with the word "ARM" on it must be energy efficient because reasons. Well this isn't a smartphone SoC.

  • Besides the interesting point of this system using ARM over x86-64, it looks like it's all CPU-powered. The past few TOP500 rankings have been topped by giant GPU clusters with fast interconnects, where the CPUs have provided little of the actual number crunching.

    It's not like GPU heavy super computers are slacking or anything, I just think it's cool seeing a machine get high performance without them. I'm no expert but it seems an all CPU design would be easier to write code for since the problem set doesn't need to be

    • Yep! CPUs are definitely easier to program and sometimes GPUs are exactly wrong for certain workloads.

      BTW: The current #1 (which will surely be supplanted in the soon to be refreshed Top500) is an all "CPU" machine (somewhat close to what Intel Phi was): https://www.top500.org/system/... [top500.org]

      10M actual "CPU" cores. But - they are clocked lower and of quite a bit different architecture from your normal Xeon...

      ARM's rise is definitely interesting because it should give us another option for good flops/watt while

    • The cool part for me is learning there are 28-core ARM processors out there, which gives me hope that Apple's delays with the MacBook Air and Mac mini mean they're close to releasing their first-ever ARM-powered Macs running macOS.

      • Sounds good - except AMD Epyc is 32 cores / socket today... and 64 cores / socket next year.

        Personally: I would love a 128 core, dual-socket AMD Epyc based Mac Pro next year...

        That said: I do still think that an ARM based Mac Pro would be fun... but hopefully we'll see chips with more than 28 cores in them...

        • Sounds good, but have you seen the prices on those AMD Epyc CPUs? And the power requirements?

          I'm thinking about an Apple ARM processor, based on the A12 or whatever is found in the iPad Pro or iPhone X, but with eight or sixteen times as many cores, running macOS and emulating x86 for legacy applications.

  • by dicobalt ( 1536225 ) on Monday June 18, 2018 @09:03PM (#56806670)
    Game over for ARM designs from China. Supercomputers don't need that kind of risk. Adjust your investment portfolio as needed and lol at the crash that's due.
  • HPE has announced they're laying off their entire US-based ARM engineering team
  • There's no way High Pitch Eric is working on any type of CPU.
