The Truth About AMD Ryzen’s Performance Issues

After AMD released Ryzen, reviewers and users alike were quick to throw around theories about its performance, and the debate has gone on without a clear answer. Many people blamed the scheduler, others blamed SMT. Thanks to two unnamed theorycrafters and the help of nwgat, we can now get closer to the actual cause. Let’s take a look, shall we?

Update 2017-03-16: A user on Reddit apparently got a response from AMD confirming that there is only one memory controller on Ryzen (Infinity Fabric). If accurate, this confirms that there is a bottleneck on the CPU itself.

The Ryzen Problem

All reviews have shown that AMD Ryzen underperforms when all cores and SMT are active, but nobody is sure why. There have been a lot of theories, some of which AMD has addressed as wrong. Others have been proven true, such as Ryzen performing better with only the first CCX active and SMT disabled.

But recently two theories have sounded plausible: that MOV instructions are slower, and that the second CCX has to go through the first CCX for memory access.

Testing The Theories

To test these theories I used a self-written tool that measures memory bandwidth. It tests single-threaded and multi-threaded performance and (on Windows) sets the proper thread affinity mask. The tool was run with identical settings on each system and was compiled with maximum compiler optimisations enabled.
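For readers who want to try something similar, here is a minimal sketch of a single-threaded copy bandwidth measurement. It is not the tool used for the numbers below. The function name `copy_bandwidth_mbs`, the use of `memcpy` and `clock()`, and the choice of 1 MB = 10^6 bytes are all my assumptions.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not the original tool): copy `bytes` from src to dst
 * `rounds` times and return the achieved bandwidth in MB/s, assuming
 * 1 MB = 10^6 bytes. Returns 0.0 if the run was too short to time. */
double copy_bandwidth_mbs(void *dst, const void *src, size_t bytes, int rounds)
{
    /* Calling memcpy through a volatile pointer keeps the compiler from
     * eliding or merging the repeated copies at high optimisation levels. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

    clock_t start = clock();
    for (int i = 0; i < rounds; i++)
        copy_fn(dst, src, bytes);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0; /* below the resolution of clock() */

    return (double)bytes * rounds / secs / 1e6;
}
```

The buffers should be much larger than the last-level cache, otherwise the result reflects cache speed rather than memory bandwidth. Note that `clock()` reports CPU time on some platforms, so a multi-threaded version would need a wall-clock timer instead.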

Task          Intel i5-4690    Ryzen R7 1700X   Ryzen R7 1800X
              DDR3 1333 MHz    DDR4 2100 MHz    DDR4 2400 MHz
MOV Copy      6148.85 MB/s     5373.25 MB/s     5595.71 MB/s
Normal Copy   6154.97 MB/s     8158.41 MB/s     7962.22 MB/s
2 Threads     6654.01 MB/s     12399.49 MB/s    12207.71 MB/s
3 Threads     6716.55 MB/s     13092.77 MB/s    14448.28 MB/s
4 Threads     7004.97 MB/s     13433.51 MB/s    14430.77 MB/s
5 Threads     6828.04 MB/s     13271.88 MB/s    13769.70 MB/s
6 Threads     6962.61 MB/s     13160.45 MB/s    14092.03 MB/s
7 Threads     7018.98 MB/s     13044.91 MB/s    14123.68 MB/s
8 Threads     7026.20 MB/s     12993.32 MB/s    14200.40 MB/s
9 Threads     6990.10 MB/s     12969.62 MB/s    14096.46 MB/s
10 Threads    7049.04 MB/s     12956.59 MB/s    14005.07 MB/s
11 Threads    6973.88 MB/s     12765.16 MB/s    13917.83 MB/s
12 Threads    7012.68 MB/s     12745.24 MB/s    13770.88 MB/s
13 Threads    6983.30 MB/s     12495.60 MB/s    13609.24 MB/s
14 Threads    7038.75 MB/s     12386.70 MB/s    13612.83 MB/s
15 Threads    7086.29 MB/s     12276.65 MB/s    13307.97 MB/s
16 Threads    7084.65 MB/s     12563.24 MB/s    13577.06 MB/s

Leaving aside the differences in the memory used, we can see a few issues with AMD Ryzen’s memory bandwidth:

1. MOV Copy is significantly slower than Normal Copy on AMD Ryzen (by roughly 30 to 34%)

MOV (and the other instructions in that family) is often used to move or copy memory. In this case it is REP MOVSB that is being used, which is usually the fastest way to copy memory, provided the CPU is actually optimised for it. Intel CPUs at one point performed similarly, so seeing this is not a surprise. It is, however, a huge performance hit for any games that aren’t aware of which CPU they are running on.
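To make the REP MOVSB case concrete, here is a minimal sketch of a copy routine built on that instruction. It assumes GCC or Clang inline assembly on x86 and falls back to `memcpy` elsewhere. The name `movsb_copy` is hypothetical, and this is not necessarily how the MOV Copy test in the tool is written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with REP MOVSB when built for x86 with
 * GCC or Clang; otherwise fall back to memcpy. Returns dst, like memcpy. */
void *movsb_copy(void *dst, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    void *d = dst;
    /* REP MOVSB copies (R|E)CX bytes from [(R|E)SI] to [(R|E)DI].
     * The instruction updates all three registers, hence the "+"
     * constraints. The direction flag is assumed clear, as the common
     * x86 calling conventions require. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

Timing this routine against a plain `memcpy` on the same buffers is one way to reproduce the kind of MOV Copy versus Normal Copy comparison shown in the table.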

2. Ryzen performance peaks at around 4 threads

Even though the CPU has 8 physical cores, the maximum bandwidth was reached at 4 threads on the 1700X and at 3 threads on the 1800X (with the 4-thread result nearly identical). This suggests that the memory controller on the CPU itself can only handle 4 cores at the same time; after that it has to balance the necessary work over all cores. This is a step back from the behaviour observed in Piledriver and Bulldozer, which (after reaching the physical core count) kept about the same memory bandwidth instead of degrading.
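The peak and the drop-off can be read straight from the table. The sketch below does that arithmetic on the 2 to 16 thread rows for both Ryzen chips. The helper names `peak_index` and `drop_percent` are mine, and the arrays are copied from the table above.

```c
#include <stddef.h>

/* Index of the largest value in v[0..n-1]; n must be at least 1. */
size_t peak_index(const double *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best])
            best = i;
    return best;
}

/* Percentage by which `value` falls short of `peak`. */
double drop_percent(double peak, double value)
{
    return (peak - value) / peak * 100.0;
}

/* Table rows "2 Threads" to "16 Threads" in MB/s; index i holds the
 * result for i + 2 threads. */
static const double r7_1700x[15] = {
    12399.49, 13092.77, 13433.51, 13271.88, 13160.45,
    13044.91, 12993.32, 12969.62, 12956.59, 12765.16,
    12745.24, 12495.60, 12386.70, 12276.65, 12563.24
};
static const double r7_1800x[15] = {
    12207.71, 14448.28, 14430.77, 13769.70, 14092.03,
    14123.68, 14200.40, 14096.46, 14005.07, 13917.83,
    13770.88, 13609.24, 13612.83, 13307.97, 13577.06
};
```

Calling `peak_index(r7_1700x, 15) + 2` gives the thread count at which the 1700X peaks, and `drop_percent` applied to that peak and the 16-thread value gives the size of the degradation. The gaps involved are a few percent, so repeated runs would be needed to separate them from measurement noise.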

3. Windows Scheduler Issues

You didn’t think I would include this here, did you? It turns out that people are right: the Windows 10 scheduler does something wrong. I have not included the data for that in the table above, but basically, when no thread affinity is set, performance drops back to single-thread levels.
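For context, pinning a worker thread looks roughly like the sketch below. The mapping of worker i to logical processor i is my assumption, since the exact mask the tool uses is not described here. Both helper names are hypothetical; `SetThreadAffinityMask` and `GetCurrentThread` are the real Win32 calls.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: mask with a single bit set for the logical
 * processor that worker `thread_index` should run on. Workers wrap around
 * once there are more workers than logical processors. Returns 0 for an
 * unsupported processor count (the mask holds at most 64 bits). */
uint64_t worker_affinity_mask(unsigned thread_index, unsigned logical_cpus)
{
    if (logical_cpus == 0 || logical_cpus > 64)
        return 0;
    return (uint64_t)1 << (thread_index % logical_cpus);
}

/* Pin the calling thread to `mask`. Returns 1 on success and 0 on failure
 * or on platforms this sketch does not cover. */
int pin_current_thread(uint64_t mask)
{
#ifdef _WIN32
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) != 0;
#else
    (void)mask;
    return 0; /* non-Windows pinning is left out of this sketch */
#endif
}
```

Each worker would call `pin_current_thread(worker_affinity_mask(i, n))` before starting its copy loop. Leaving that call out is the "no thread affinity" case described above.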

The Possible Future

The question now is: can any of these be fixed? For the first and last ones, the answer is that it depends on the operating system vendor. For the second one, we will likely have to wait for Zen 2 to improve this performance problem.

All we can do now is wait and keep looking for more findings.
