cageymaru
Fully [H]
- Joined
- Apr 10, 2003
- Messages
- 21,712
Wendell at Level1Techs has done extensive testing of AMD Ryzen Threadripper 2990WX systems in an attempt to find out what is causing performance regressions under Windows. When AMD Ryzen Threadripper 2990WX processors first arrived on the scene, the observed performance regressions were blamed on the choice to use NUMA and on having only 4 memory channels. But further testing showed that with apps that run natively on both Windows and Linux, the regressions would show up under Windows while the same system was extremely fast under Linux. If memory bandwidth were the issue, the performance regressions should have appeared under Linux as well.
Even an AMD EPYC 7551, a 32 core/64 thread monster with 8 memory channels, showed the same performance regressions. After conferring with other hardware testers, Wendell believes that he has finally found the cause of the performance regressions: the Windows kernel! Wendell and another brilliant tech enthusiast named Jeremy at Bitsum collaborated to create a utility called CorePrio to work around a Windows kernel issue that could cause it to use only one NUMA node. They say it doubled performance in their testing with Indigo. The article is a deep dive into the technical aspects of the problem and the solution, and is a highly recommended read!
This is most likely related to a bugfix from Microsoft for 1 or 2 socket Extreme Core Count (XCC) Xeons, wherein a physical Xeon CPU has two NUMA nodes. In the past (with Xeon V4 and maybe V3), one of these NUMA nodes had no access to I/O devices (but did have access to memory through the ring bus). If that's true, then the work-around that keeps this type of process on the "ideal CPU" in the same socket has no idea what to do when there is more than one other NUMA node in the same package to "fail over" to.