After AMD released Ryzen, reviewers and users alike were quick to throw around theories about its performance, and the discussion has gone on without a clear answer. Many people blamed the scheduler, others blamed SMT. Thanks to two unnamed theory crafters and the help of nwgat, we can now get a closer look at the actual cause. Let’s take a look, shall we?
Update 2017-03-16: A user on Reddit apparently got a response from AMD confirming that there is only one memory controller on Ryzen (Infinity Fabric). This supports the idea that the bottleneck is on the CPU itself.
The Ryzen Problem
All reviews have shown that AMD Ryzen under-performs when all cores and SMT are active, but nobody is sure why. There have been a lot of theories, some of which AMD has addressed as wrong. Others have been proven true, such as Ryzen performing better with only the first CCX active and SMT disabled.
But recently two theories have sounded particularly plausible: that MOV instructions are slower, and that the second CCX has to go through the first CCX for memory access.
Testing The Theories
To test these theories I used a self-written tool that measures memory bandwidth. It tests single-threaded and multi-threaded performance and (on Windows) sets the proper thread affinity mask. Every run used identical settings, and the tool was compiled with maximum compiler optimisations enabled.
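To illustrate the kind of measurement involved, here is a minimal sketch of a single-threaded copy benchmark. It is not the tool used for the numbers in this article (that one is a compiled program); the function name `copy_bandwidth_mb_s`, the buffer size and the repeat count are assumptions picked for illustration.

```python
import time


def copy_bandwidth_mb_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy rate in MB/s (1 MB = 10**6 bytes)."""
    # Fill the source with non-zero bytes so its pages are really backed by memory.
    src = memoryview(bytearray(b"\xa5") * size_bytes)
    dst = memoryview(bytearray(size_bytes))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    if best <= 0.0:
        # Timer too coarse for this buffer size; no meaningful rate.
        return float("inf")
    return size_bytes / best / 1e6
```

Calling `copy_bandwidth_mb_s()` returns one number comparable in spirit to a single cell of a bandwidth table. Taking the best of several runs hides one-off costs such as the first touch of the destination pages. A multi-threaded version would run one such loop per thread on separate buffers and sum the rates.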
|Test||(unlabeled)||Ryzen R7 1700X||Ryzen R7 1800X|
|MOV Copy||6148.85 MB/s||5373.25 MB/s||5595.71 MB/s|
|Normal Copy||6154.97 MB/s||8158.41 MB/s||7962.22 MB/s|
|2 Threads||6654.01 MB/s||12399.49 MB/s||12207.71 MB/s|
|3 Threads||6716.55 MB/s||13092.77 MB/s||14448.28 MB/s|
|4 Threads||7004.97 MB/s||13433.51 MB/s||14430.77 MB/s|
|5 Threads||6828.04 MB/s||13271.88 MB/s||13769.70 MB/s|
|6 Threads||6962.61 MB/s||13160.45 MB/s||14092.03 MB/s|
|7 Threads||7018.98 MB/s||13044.91 MB/s||14123.68 MB/s|
|8 Threads||7026.20 MB/s||12993.32 MB/s||14200.40 MB/s|
|9 Threads||6990.10 MB/s||12969.62 MB/s||14096.46 MB/s|
|10 Threads||7049.04 MB/s||12956.59 MB/s||14005.07 MB/s|
|11 Threads||6973.88 MB/s||12765.16 MB/s||13917.83 MB/s|
|12 Threads||7012.68 MB/s||12745.24 MB/s||13770.88 MB/s|
|13 Threads||6983.30 MB/s||12495.60 MB/s||13609.24 MB/s|
|14 Threads||7038.75 MB/s||12386.70 MB/s||13612.83 MB/s|
|15 Threads||7086.29 MB/s||12276.65 MB/s||13307.97 MB/s|
|16 Threads||7084.65 MB/s||12563.24 MB/s||13577.06 MB/s|
Leaving aside the differences in the memory used, we can see a few issues with AMD Ryzen memory bandwidth:
1. MOV Copy is significantly slower than Normal Copy on AMD Ryzen (by roughly 30 to 34%)
MOV and related instructions are often used to move or copy memory. In this case a REP MOVSB is being used, which is usually the fastest way to copy memory, provided the CPU is actually optimised for it. Intel CPUs at one point performed similarly, so seeing this is not a surprise. It is, however, a huge performance hit for any games that aren’t aware of what CPU they are running on.
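The size of the gap can be recomputed from the first two rows of the table. The sketch below does only that arithmetic; the helper name `slowdown_percent` is made up for illustration, and assigning the two Ryzen labels to the last two columns is an assumption based on the order of the table header.

```python
def slowdown_percent(mov_copy, normal_copy):
    """How much slower MOV Copy is than Normal Copy, in percent."""
    return (1.0 - mov_copy / normal_copy) * 100.0


# (MOV Copy, Normal Copy) in MB/s, copied from the table above.
# Column labels are an assumption based on the order of the table header.
results = {
    "first column": (6148.85, 6154.97),
    "Ryzen R7 1700X": (5373.25, 8158.41),
    "Ryzen R7 1800X": (5595.71, 7962.22),
}

for name, (mov_copy, normal_copy) in results.items():
    print(f"{name}: {slowdown_percent(mov_copy, normal_copy):.1f}% slower")
```

The two Ryzen columns come out at about 34% and 30%, while the first column shows practically no difference between the two copy methods.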
2. Ryzen performance peaks at around 4 threads
Even though the CPU has 8 physical cores, bandwidth peaked at 3 to 4 threads and then slowly declined, which suggests that the memory controller on the CPU itself can only serve about 4 cores at the same time; beyond that, the available bandwidth has to be shared across all cores. This is a step back from the behaviour observed in Piledriver and Bulldozer, which (after reaching the physical core count) kept about the same memory bandwidth instead of degrading.
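The peak and the decline after it can be read straight from the table. A small sketch, assuming that the "Normal Copy" row is the 1-thread result and that the last two columns are the 1700X and 1800X in header order; the helper names are hypothetical.

```python
# Bandwidth in MB/s for 1 to 16 threads, copied from the table above.
# Assumptions: "Normal Copy" is the 1-thread entry, and the column labels
# follow the order of the table header.
ryzen_1700x = [
    8158.41, 12399.49, 13092.77, 13433.51, 13271.88, 13160.45, 13044.91, 12993.32,
    12969.62, 12956.59, 12765.16, 12745.24, 12495.60, 12386.70, 12276.65, 12563.24,
]
ryzen_1800x = [
    7962.22, 12207.71, 14448.28, 14430.77, 13769.70, 14092.03, 14123.68, 14200.40,
    14096.46, 14005.07, 13917.83, 13770.88, 13609.24, 13612.83, 13307.97, 13577.06,
]


def peak_threads(series):
    """Return (thread count, bandwidth) of the highest entry."""
    best = max(series)
    return series.index(best) + 1, best


def drop_from_peak_percent(series):
    """Bandwidth lost at the highest thread count, relative to the peak."""
    _, best = peak_threads(series)
    return (1.0 - series[-1] / best) * 100.0


for name, series in (("1700X", ryzen_1700x), ("1800X", ryzen_1800x)):
    threads, best = peak_threads(series)
    print(f"{name}: peak {best:.2f} MB/s at {threads} threads, "
          f"{drop_from_peak_percent(series):.1f}% lower at 16 threads")
```

On these numbers the 1700X peaks at 4 threads and the 1800X at 3 (with 4 threads almost identical), and both lose a bit over 6% by 16 threads.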
3. Windows Scheduler Issues
You didn’t think I would include this here, did you? It turns out that people are right: the Windows 10 scheduler does something wrong. I have not included the data for that in the table above, but basically, when no thread affinity is set, multi-threaded performance drops back to single-thread levels.
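For readers who have not set thread affinity before, this is roughly what pinning looks like. The sketch uses Python's `os.sched_setaffinity`, which only exists on platforms such as Linux; on Windows the equivalent Win32 call is `SetThreadAffinityMask`, though which call the test tool actually uses is not stated here. The helper name `pin_current_thread` is hypothetical.

```python
import os


def pin_current_thread(cpu):
    """Restrict the caller to one logical CPU. Returns False if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. Windows or macOS builds of Python
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the caller"
    return True
```

A benchmark thread would call this once before its copy loop, with a different logical CPU per thread, so that the scheduler cannot move or stack the threads during the measurement.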
The Possible Future
The question now is: can any of these be fixed? For the first and the last one, the answer is that it depends on the operating system vendor. For the second one, we will likely have to wait for Zen 2 to improve this performance problem.
All we can do now is wait and see what else turns up.