AMD Catalyst Driver To Enable Mantle, Fix Frame Pacing, Support HSA For Kaveri

Posted by timothy
from the next-step-is-the-optic-nerve dept.
MojoKid writes "AMD has a new set of drivers coming in a couple of days that are poised to resolve a number of longstanding issues and enable a handful of new features as well, most notably support for Mantle. AMD's new Catalyst 14.1 beta driver is going to be the first publicly available driver from AMD to support Mantle, AMD's "close to the metal" API that will let developers wring additional performance from GCN-based GPUs. The new drivers will also add support for the HSA-related features introduced with the recently released Kaveri APU, and will reportedly fix the frame pacing issues associated with Radeon HD 7000 series CrossFire configurations. A patch for Battlefield 4 is due to arrive soon as well, and AMD is claiming performance gains in excess of 40 percent in CPU-limited scenarios but smaller gains in GPU-limited conditions, with average gains of 11 to 13 percent overall." First-time accepted submitter Spottywot adds some details about the Battlefield 4 improvements, writing that Johan Andersson, one of the Technical Directors on the Frostbite team, says that the best performance gains are observed when a game is bottlenecked by the CPU, "which can be quite common even on high-end machines." "With an AMD A10-7850K 'Kaveri' APU Mantle provides a 14 per cent improvement, on a system with an AMD FX-8350 and Radeon 7970 Mantle provides a 25 per cent boost, while on an Intel Core i7-3970x Extreme system with 2x AMD Radeon R9 290x cards a huge 58 per cent performance increase was observed."
  • by Billly Gates (198444) on Thursday January 30, 2014 @01:29PM (#46111797) Journal

MaximumPC paints this a little differently. [maximumpc.com] There, only lower-end CPUs get a big boost in conjunction with higher-end AMD cards.

    I guess we will wait and see with benchmarks later today when 14.1 is released.

This is great news for those like me on older Phenom II 2.6 GHz systems who can afford to upgrade the RAM, video card, and move to an SSD, but not the CPU without a whole damn new system. I use VMware and this obsolete system has a 6-core CPU and hardware virtualization support. Otherwise I would upgrade, but only a Core i7 or a higher-end AMD FX-8350 has the same features for non-gaming tasks. Being able to play Battlefield 4 on this soon with high settings at 1080p would be great!

    • by 0123456 (636235)

MaximumPC paints this a little differently. [maximumpc.com] There, only lower-end CPUs get a big boost in conjunction with higher-end AMD cards.

      I was wondering how that made any sense, because I've never seen my i7 more than 20% used in any game where I've monitored CPU usage. However, I haven't played the Battlefield games in years.

      • Some games which were really slow on my system like Star Wars the old republic have improved in later patches as they now spread the tasks across all 6 cpu cores.

However there is lag sometimes when the cpu usage is at only 40%. This is because of synchronization between all the cores waiting on the other to finish something etc. That is one of the drawbacks of parallelization and why Intel's Itanium failed. Great for servers but anything where data needs to be exchanged between the different parts of the program via threads hits bottlenecks.

        • by Anonymous Coward on Thursday January 30, 2014 @02:03PM (#46112143)

          Some games which were really slow on my system like Star Wars the old republic have improved in later patches as they now spread the tasks across all 6 cpu cores.

          However there is lag sometimes when the cpu usage is at only 40%. This is because of synchronization between all the cores waiting on the other to finish something etc. That is one of the drawbacks of parallelization and why Intel's Itanium failed. Great for servers but anything where data needs to be exchanged between the different parts of the program via threads hits bottlenecks.

          So the icore7 uses crazy mathematical algorithms to execute data before it even arrives to save bandwidth to get insane IPC which is why AMD can't compete. But if you have a heavily threaded app that is latency intensive like a game it can be choppy even with low cpu utilization.

          There is so much wrong with this post.

First, Itanium didn't fail due to difficulties in parallelizing things. Software never ramped up due to low market penetration and the fact that instruction scheduling was shoved back onto the compiler writers; it had poor performance for x86 code, and it was never targeted at anything but big server iron. It was never intended to be a consumer-level chip. Another big reason Itanium failed was the introduction of AMD64.

Secondly, the anecdotal latency that you experience in SWToR with CPU utilization at only 40% is unlikely to be due to one "core waiting on the other to finish something," and I challenge you to present a thread dump illustrating such a block correlated with your metric for latency. If you have not done an in-depth analysis you'd have no way to know, but if you have, I'd be curious as to your findings.

Finally, I have no idea why you would think that the i7 (I assume that's what you meant by icore7) "execute[s] data before it arrives." That doesn't even make sense. What you are most likely referring to is out-of-order execution, or possibly branch prediction, both features that are also present in AMD chips and in earlier Intel chips going back to the Pentium Pro. The better IPC of the i7 certainly has nothing to do with magical future-seeing math and more to do with better execution units, more OoO execution resources, and superior FPU hardware.

It is true that, in general, games have not been able to scale to use 100% of your CPU 100% of the time, but it's not for the reason you have stated, and I'm quite doubtful that threading has introduced the kind of latency a human would notice as you describe. There is a latency/throughput trade-off, but it is quite possible to achieve better frame latencies with multiple cores than with a single core.

          • by 0123456 (636235) on Thursday January 30, 2014 @02:05PM (#46112171)

It was never intended to be a consumer-level chip.

            I take it you weren't around at the time? I remember many magazine articles about how Itanium was going to replace x86 everywhere, before it turned out to suck so bad at running x86 code.

            • Re: (Score:2, Informative)

              by Anonymous Coward

I was around at the time and before. Itanium was touted as eventually replacing x86 years off in the hype phase, but it was only ever released as a server chip (with an order of magnitude more transistors and cost than any consumer chip), and any notion that it would ever be for end users was dropped. I'm not sure if the early hype had anything to do with Intel's intentions or simply everyone else's day dreams, but Itanium was designed specifically to compete with RISC on big iron. Intel was not even

      • by dave562 (969951)

        I have an i7-960 (3.2GHz) and the ESO beta was pushing it pretty hard, averaging 30-40%.

      • We are talking about a real time application, so even without 100% load over a relatively large sampling interval, performance can be degraded.

Let's assume that you have 2 sequential things that cannot be overlapped: CPU setup and GPU processing. You cannot begin CPU setup of the next frame until the GPU is done with the current frame (a gross oversimplification, but there are sequences that bear some resemblance to this).

        So a hypothetical CPU takes 1 ms to setup a frame, and then the hypothetical GPU then takes 4 ms
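The scenario above can be worked through numerically. This is only a sketch of the commenter's simplification, using their hypothetical 1 ms CPU / 4 ms GPU figures:

```python
# Strictly serialized frame: CPU setup must finish before the GPU starts,
# and the next frame's setup waits for the GPU to finish the current one.
cpu_ms = 1.0   # hypothetical CPU setup time per frame
gpu_ms = 4.0   # hypothetical GPU render time per frame

frame_ms = cpu_ms + gpu_ms      # 5 ms per frame
fps = 1000.0 / frame_ms         # 200 frames per second
cpu_util = cpu_ms / frame_ms    # CPU busy only 20% of the time

print(fps)       # 200.0
print(cpu_util)  # 0.2 -- reads as 20% load, yet the CPU still gates every frame
```

The point: even a CPU that a monitor reports as 20% utilized sits on the critical path of every single frame, so making that 1 ms cheaper still raises the frame rate.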

      • You also have to consider how the game was programmed and compiled. Many games are not able to support multiple cores, or may not support as many cores as you have. If you've got a 6 core CPU and your game is only designed and optimized for 2 cores, your CPU can bottleneck at 33% utilization, and some threads may bottleneck at 17% utilization. Even if the game does support all 6 cores, you still can get threads that hit capacity on a single core and won't be split to multiple cores.
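That ceiling is simple arithmetic. A quick sketch, assuming a 6-core CPU and an idealized scheduler (no OS overhead, purely for illustration):

```python
CORES = 6  # assumed core count from the comment above

def max_utilization(busy_threads: int, cores: int) -> float:
    """Highest overall CPU utilization a game can show when it can only
    keep `busy_threads` cores busy at once (idealized model)."""
    return min(busy_threads, cores) / cores

print(round(max_utilization(2, CORES) * 100))  # 33 -- a 2-thread game caps near 33%
print(round(max_utilization(1, CORES) * 100))  # 17 -- one saturated thread shows ~17%
```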

      • MaximumPC paints this a little bit different. [maximumpc.com] Where only lower end cpu's get a big boost in conjecture with higher end AMD cards.

        I was wondering how that made any sense, because I've never seen my i7 more than 20% used in any game where I've monitored CPU usage. However, I haven't played the Battlefield games in years.

        CPUs can bottleneck even at 20% utilisation. The task manager will show 20% average utilisation, but that could mean that it sat at 100% utilisation for 20% of the time, rather than 20% utilisation for 100% of the time (or some mix in between).
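A toy simulation makes that averaging effect concrete (the 1 ms sampling grid and the burst pattern are made up for illustration):

```python
# 100 one-millisecond samples: the CPU is pegged at 100% for the first
# 20 ms of the window, then idle -- a bursty, frame-paced workload.
samples = [100] * 20 + [0] * 80

average = sum(samples) / len(samples)
peak = max(samples)

print(average)  # 20.0 -- what a coarse task-manager graph reports
print(peak)     # 100  -- what the game actually hits while it stalls
```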

      • by Luckyo (1726890)

        Load up WoW and try raiding with reasonable amount of add-ons enabled. Watch your CPU choke like a baby who swallowed a lego brick.

        BF also tended to use a lot of CPU, but WoW is just unrivalled in eating your CPU alive.

        • by Bengie (1121981)
          WoW chokes, but it only uses about 30% of my CPU. My CPU is mostly idle.
          • by Luckyo (1726890)

            To be specific - it chokes on its main thread. Badly. Highly overclocked i5 wipes the floor with i7s because of it.

    • by Baloroth (2370816)

      MaximumPC paints this a little bit different. [maximumpc.com] Where only lower end cpu's get a big boost in conjecture with higher end AMD cards.

      I'm not sure exactly what you're referring to in the link. The only comparisons are low(er) end CPUs with high end cards or high end CPUs with low(er) range cards. You don't get a boost if you've got high-end CPUs and a lower end card, but that should be completely obvious: the point of Mantle is to shift the load to the GPU. If the GPU is already maxed out, you won't see much (or any) gains at all.

    • by edxwelch (600979)

So, MaximumPC must not consider the i7-3970x Extreme mentioned a high-end CPU, because that gets a 58% boost.

  • by ADRA (37398)

    "which can be quite common even on high-end machine"

Sure, when games are coded to use 100% of 1 thread while ignoring (most likely) 3-7 other threads just screaming to be utilized, then the CPU is surely a bottleneck.

    • by kllrnohj (2626947)

      Games are still largely gated by a single thread because graphics APIs still don't allow multiple threads to share a context[1]. When you're only allowed to talk to the GPU from a single thread, it's no surprise that thread is going to be CPU bound.

      1. Yes I know you can do limited resource sharing between contexts and such for background uploading of resources like textures, but the actual act of issuing all the drawing commands to produce the frame has to happen from a single thread.
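The usual workaround for that constraint can be sketched as below. This is an illustration of the pattern only; `submit_to_gpu` is a hypothetical stand-in for a driver call, not a real graphics API. Worker threads record commands in parallel, but one render thread owns the context and is the only one that submits:

```python
import queue
import threading

issued = []  # commands actually sent to the "GPU", in submission order

def submit_to_gpu(cmd: str) -> None:
    # Stand-in for a driver call; only the render thread may invoke this.
    issued.append(cmd)

def worker(worker_id: int, out: "queue.Queue") -> None:
    # Workers build command lists concurrently; they never touch the context.
    for i in range(3):
        out.put(f"draw w{worker_id}-{i}")

cmd_queue: "queue.Queue" = queue.Queue()
workers = [threading.Thread(target=worker, args=(w, cmd_queue)) for w in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
cmd_queue.put(None)  # sentinel: no more commands this frame

# The single render thread drains the queue and issues everything serially.
while (cmd := cmd_queue.get()) is not None:
    submit_to_gpu(cmd)

print(len(issued))  # 12 -- all commands issued from exactly one thread
```

The command-building work parallelizes, but the issue thread remains the serial chokepoint, which is exactly why it ends up CPU bound.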

      • It's not just that, it's the locking behaviour needed for correct concurrency. There's no point in multi-threading a lot of cores that are going to spend most of their time waiting on another thread to release a lock. Even Mantle is single threaded. i.e. There's a queue for the GPU, one for Compute and one for DMA, on different threads. There aren't going to be 8 threads all using the GPU at once. It'll still be serialised by the driver, just more efficiently than you can do it with existing D3D or GL
        • by Bengie (1121981)

          Even Mantle is single threaded

          Single threaded per queue and you can create as many as you want per application. During setup, you create all of the queues you want to use, then register them. Once registered, the GPU can directly read from the queue instead of making a system call. This pretty much removes those pesky 30,000 clock cycle system calls. Kaveri goes a step further and registers queues directly with the hardware, because each queue item is exactly one cache line in size and the APU shares the cache-coherency and protected me
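A minimal sketch of that queuing shape, assuming one producer per queue and a single consumer draining them. This illustrates the model described above, not Mantle's actual ABI; the fixed 64-byte entry mirrors the one-cache-line claim:

```python
from collections import deque

ENTRY_BYTES = 64  # one cache line per queue entry, per the comment above

class UserModeQueue:
    """Single-producer command queue a consumer can poll directly,
    with no system call on the submission path (illustrative only)."""

    def __init__(self):
        self._entries = deque()

    def push(self, payload: bytes):
        # Pad/truncate every entry to a fixed cache-line-sized record.
        self._entries.append(payload[:ENTRY_BYTES].ljust(ENTRY_BYTES, b"\0"))

    def pop(self):
        return self._entries.popleft() if self._entries else None

# An application registers several queues up front...
queues = [UserModeQueue() for _ in range(3)]
for q_id, q in enumerate(queues):
    q.push(f"cmd from queue {q_id}".encode())

# ...and the consumer (conceptually, the GPU) drains them without a syscall.
drained = [entry for q in queues if (entry := q.pop()) is not None]
print(len(drained))  # 3
print(all(len(e) == ENTRY_BYTES for e in drained))  # True
```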

          • Yes, that was my point - the GPU has to schedule. It multi-threads at the warp level on the actual metal, but it'll execute one "command" at a time. The benefit here is as you say - lock free queuing of commands and highly optimised state management.
            • by Bengie (1121981)
              AMD GPUs can execute several kernels at a time and each Mantle task is a kernel. GPUs are meant to be high throughput and will keep itself busy as long as you can feed it work fast enough.
    • Two words, "false sharing". Just throwing cores at a problem doesn't necessarily improve the performance. You have to caress your cache lines pretty gently to get real improvements.
  • by Anonymous Coward

    The grand predictions they made for the performance increase now have a _huge_ asterisk (* when CPU limited).

    What serious gamer is CPU bound these days? Most people have giant CPUs and GPUs that struggle with 1080p and especially higher resolutions.

    Now, at first blush this doesn't matter - a 10% improvement is a 10% improvement, great. The problem is that the _cost_ is lock-in to this weirdo AMD Mantle stuff instead of DirectX or OpenGL.

    Posted anonymously because with one or two bad moderations, due to th

    • AMD strategic... (Score:4, Interesting)

      by Junta (36770) on Thursday January 30, 2014 @01:56PM (#46112069)

      So gamers get a small boost to their gaming rigs, but that's not *really* the goal for AMD.

The real goal is that AMD demonstrably lags Intel in *CPU* performance, but not GPU. OpenGL/Direct3D implementations make that matter, meaning AMD's CPU business gets dinged as a valid component in a configuration that will do some gaming. Mantle diminishes the importance of the CPU to most gaming, so their weak CPU offering becomes workable and helps sell their APU-based systems. They can do so cheaper than Intel + discrete GPU while still reaping a tidy profit.

      • The real goal is that AMD demonstrably lags Intel in *CPU* performance, but not GPU.

        Well it's a stupid idea to use all that spiffy CPU performance for syscalls anyway, regardless of whether the CPU is an Intel chip or an AMD chip...or not?

        • by Junta (36770)

It's not to say it's a bad thing to reduce needless CPU activity; I was just explaining how this in particular was a strategic move for AMD. The GP was saying 'it's pointless because gamers buy powerful CPUs'; I was simply pointing out that's not what AMD currently wants, since a beefy CPU means Intel at this particular point (and really at most points since Nehalem came out).

          I will say it *could* be a bad thing if it leads to Mantle-exclusive games that are unable to support other GPUs at all (Intel, nVidia,

    • by Anonymous Coward on Thursday January 30, 2014 @02:40PM (#46112551)

I think that misses the point. CPUs aren't the limiting factor in part because game devs limit the number of draw calls they issue to avoid them being a limiting factor (because not everybody has a high-end CPU). Mantle may not offer vastly more performance in the short term, but it will enable more in game engines in the long term if the claims DICE and AMD make are accurate. That doesn't get away from the cost of lock-in, but like any new release of this sort, Mantle may never catch on, yet it may push DX and GL to change in a Mantle-like direction, which would then benefit all developers.

    • by mikael (484)

      Weirdo AMD stuff? The descriptors that DirectX and Mantle use are basically the registers that control your GPU. Traditional desktop GL has the large GL state (textures + framebuffer objects + blending + vertex buffers + transform feedback) and which is what the driver maintains and attempts to sanity check with all those GL calls (500+ of them) to make sure nothing inconsistent or fatal is allowed to run on any of the GPU cores. Setting a single parameter can involve cross-referencing dozens of other varia

    • by Bengie (1121981)

      What serious gamer is CPU bound these days?

      Most of my games are thread bound. 80% idle CPU with a 90% idle GPU and getting 20fps.

On a related note, Microsoft is working on an update to Direct3D to provide a "light weight" runtime similar to the XBone's. Presumably, this will solve the same draw call issues that Mantle deals with.
    Unfortunately, it doesn't sound like the update will happen anytime soon - maybe for Windows 9?
    Also, it's unclear whether they will back port the update to Windows 7.

    https://blogs.windows.com/wind... [windows.com]

    • by FreonTrip (694097)
      Willing to bet the answer to that is "no." Meaningful features are a cudgel used to bludgeon customers into buying new Windows licenses, after all!
  • by Cruciform (42896)

    I was really disappointed by the comparative performance of the AMD 290x 4GB vs my nVidia 650 Ti Boost 2GB.

The nVidia lets me run games like Borderlands 2 and Skyrim at max settings on my old Core 2 Duo smoothly, yet the 290X hitches and drags, almost as if it were streaming the gameplay from the hard drive. I expected a card with 2000+ shaders to be faster than that.

If my processor isn't bottlenecking the 650's performance too badly, at least the 290X should be able to cap out at something reasonable.

    • by rwise2112 (648849)

      I was really disappointed by the comparative performance of the AMD 290x 4GB vs my nVidia 650 Ti Boost 2GB.

The nVidia lets me run games like Borderlands 2 and Skyrim at max settings on my old Core 2 Duo smoothly, yet the 290X hitches and drags, almost as if it were streaming the gameplay from the hard drive. I expected a card with 2000+ shaders to be faster than that.

If my processor isn't bottlenecking the 650's performance too badly, at least the 290X should be able to cap out at something reasonable.

      That doesn't make any sense, as I've got a 7850 that runs Skyrim at max settings with no problem, but I do have an i7 processor.

  • Does it increase my mining hash speed? If you are buying AMD cards for gaming at today's prices, you are an idiot.

  • by Sycraft-fu (314770) on Thursday January 30, 2014 @04:38PM (#46113749)

Well, assuming it takes off, which I don't think it will. If this stuff is truly "close to the metal" as the Mantle name and marketing hype claim, then it'll only work so long as they stick with the GCN architecture. It won't survive any large architecture change. So they either have to stick with GCN forever, which would probably cripple their ability to make competitive cards as things change, or they'd have to abandon support for Mantle in newer cards, which wouldn't be popular with the developers and users who had bought in. I suppose they could also provide some kind of abstraction/emulation layer, but that rather defeats the purpose of a "bare metal" API.

I just can't see this as being a good thing for AMD in the long run, presuming Mantle truly is what they claim. The whole reason for things like DirectX and OpenGL is to abstract the hardware so that you don't have to write a renderer for each and every kind of card architecture, which does get changed a lot. If Mantle is tightly tied to GCN then that screws all that over.

    So either this is a rather bad desperation move from AMD to try and make up for the fact that their CPUs have been sucking lately, or this is a bunch of marketing BS and really Mantle is a high level API, but just a proprietary one to try and screw over nVidia.

    • by Anonymous Coward

      It is a high level API, and not particularly tied to GCN.

      The major difference to DirectX and OpenGL: application, library, mapped GPU memory and GPU message buffer all live in the same user address space. Meaning zero context switching overhead for a lot of GPU calls.
      On top, there's not all that much error checking.
      They can do that because the GPU has pre-emptive task switching and a MMU with protected memory.
So at worst a misbehaving application causes the GPU equivalent of a GPF, terminating that application.

If it is just support for preemption and an MMU, MS already created an API for that: it's called "DirectX 11.2," or more properly WDDM 1.3. 11.1 (WDDM 1.2) supported full preemption; 11.2 supports page-based virtual memory.

        I dunno, I guess we'll see how performance in real games actually shakes out but if this is nothing more than an API with a couple newish features, features that DX already supports, I'm not really sure that the "giveashit" factor for devs will be very high.

        I also wonder as to how they

    • by higuita (129722)

Don't forget that all the new consoles today are using AMD hardware, so game developers using Mantle for those could also bring it to the PC with little change.

  • AMD, with their decreasing market share, should use open, or at least widespread, standards such as OpenGL or DirectX.
    Remember AMD's alternative to NVidia's CUDA? I think it was called Fire- something or something-stream... that really went well.
    If AMD opens up a new front here, and Mantle has any success at all, NVidia will retaliate by creating their own API, and guess how many people will use Mantle then.

    • by higuita (129722)

Again, nvidia doesn't have the console market today; AMD does. So if game developers use Mantle on the consoles, the PC market should follow.
If the performance is as claimed, game developers will certainly try to use it, especially if it's supported the same way on all systems (all consoles, Windows, Mac, Linux).

Decreasing market share? AMD has fully sold out its entire retail inventory, to the point that they are talking about increasing production to meet demand. AMD's numbers are going to be huge.
