defaultluser
[H]F Junkie
- Joined
- Jan 14, 2006
- Messages
- 14,398
I do like seeing Intel begin to get their ass kicked. After EPYC was just marginal, this is the first sign of real competition from anyone 
we have an Arm chip that can go toe-to-toe with Intel and AMD and come out ahead in some cases. Best of all, the list price of the 32 core top-bin CN9980 part is $1795, about half of the competitive Intel and AMD chips.
I'm now a little more interested in how well they fix things in ThunderX 2. Too bad we'll have to wait two years for benchmarks.
Well, two years ago, some of us were at RWT predicting that Intel and AMD would have lots of trouble with 16/14nm ARM servers. Back then only the 28nm AMD Opteron Seattle and the 40nm XGene-1 were available, and lots of geniuses at RWT told us that it wasn't happening: "ARM cannot scale up", "maybe for microservers, but no one will beat Xeon",...
They turned out to be wrong, very wrong.
So Intel can tune up STREAM Triad a bit better than Cavium can on the Xeons
Cavium is quoting internal benchmarks it has run but not yet submitted to SPEC against Intel results that have been submitted, which is not exactly kosher but we have to get the data we can get
As for floating point math, the custom Armv8 cores in the Vulcan chips have a pair of 128-bit NEON math units, and the Xeon SP Gold chips have a 512-bit AVX-512 unit
On the SPEC floating point test, the ThunderX2 can beat the Intel chips using GCC compilers, but Intel pulls ahead on its own iron using its own compilers by about 26.5 percent over the ThunderX2 using GCC compilers. The important thing is that Cavium is working with Arm Holdings, which now owns software tools maker Allinea, to create optimized compilers that goose the performance of integer and floating point jobs by around 15 percent, which will put ThunderX2 ahead on integer performance (for these parts anyway) and close the gap considerably on floating point (with about a 10 percent gap still to the advantage of Intel).
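A quick sanity check of that arithmetic, as a rough sketch. It assumes the 26.5 percent and 15 percent figures from the quote compound multiplicatively, and it normalizes the ThunderX2-with-GCC score to 1.0 purely for illustration; neither assumption comes from Cavium.

```python
# Normalize the ThunderX2 + GCC SPECfp score to 1.0 (illustrative assumption).
tx2_gcc = 1.0

# Figures taken from the quoted article text.
xeon_intel_compiler = tx2_gcc * 1.265  # Intel ~26.5% ahead using its own compilers
tx2_optimized = tx2_gcc * 1.15         # ~15% uplift from optimized Arm compilers

# Remaining Intel advantage once the compiler uplift is applied to ThunderX2.
remaining_gap = xeon_intel_compiler / tx2_optimized - 1.0
print(f"Remaining Intel advantage: {remaining_gap:.1%}")
```

Under those assumptions the leftover gap works out to about 10 percent, which lines up with the figure in the quote.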
More benches, this time at Anandtech. More praise for the platform, but they're withholding the power consumption figures until they have a shipping system to test on (power management broke).
https://www.anandtech.com/show/12694/assessing-cavium-thunderx2-arm-server-reality
I like the threaded SPEC performance analysis. Those extra threads can really help, depending on the load.
Just wish the fuckers had put the results on the same page, instead of splitting them up for ad views.
Qualcomm NOT leaving ARM servers
https://www.reuters.com/article/us-...center-business-chip-chief-says-idUSKBN1J902Z
Well yes, I thought they might be jumping to conclusions there at El Reg. But it doesn't counter the fact that the chief engineer left. That will probably slow down progress on Centriq v2.0, while they find a new dream team.
Mind you, the ThunderX2 team is in the same boat, since they bought this new design from Broadcom. Nobody seems to be willing to stick it out in the ARM server chip design market.
4x4 A72 cores with no custom L3? 2014 called, and wants its phone back (there is only a 10% IPC difference between A57 and A72).
You can tell it's no more than a science project when they didn't even target modern cores. The A75 has been in shipping products for a year now, and the lack of it shows under real benchmarks:
https://www.servethehome.com/putting-aws-graviton-its-arm-cpu-performance-in-context/
The funny thing is, Amazon is a big enough cloud provider that they could actually benefit from a real effort building their own cutting-edge ASIC. But at this rate of disinterest by Amazon, that's five to ten years away.
Considering Skylake is, IIRC, twice as wide as TX2 (4 issue versus 2), that's very impressive. Even more impressive if I'm wrong and it's more.
Skylake is 8-wide issue. TX2 is 6-wide issue.
Arm's new Neoverse platforms
https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform
Seems interesting and a good path forward.
32bit ARM server -- opening move
40nm 64bit ARM server -- move
28nm 64bit ARM server -- move
16nm 64bit ARM server -- check
10nm 64bit ARM server -- check
7nm 64bit ARM server -- checkmate
This 64C Neoverse offers scalar performance similar to 64C Rome, but on about half the power: ~100W vs ~200W. And Intel Cascade Lake and Copper Lake will require ~400W to get that level of performance.
And a 128C Neoverse is in the pipeline
https://www.hpcwire.com/2019/02/20/arm-unveils-neoverse-n1-platform-with-up-to-128-cores/
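Taking the ~100W / ~200W / ~400W figures quoted above at face value, here is a small sketch of what they imply for efficiency. It assumes all three parts deliver the same scalar throughput, as the post claims, so performance per watt is simply inversely proportional to power; the dictionary labels are my own shorthand.

```python
# Approximate power at roughly equal scalar performance (figures from the post;
# the equal-performance premise is the poster's claim, not a measured result).
power_watts = {
    "Neoverse 64C": 100,
    "Rome 64C": 200,
    "Cascade Lake / Copper Lake": 400,
}

baseline = power_watts["Neoverse 64C"]

# At equal performance, perf/W relative to the baseline is baseline_power / power.
relative_efficiency = {name: baseline / watts for name, watts in power_watts.items()}

for name, eff in relative_efficiency.items():
    print(f"{name}: {eff:.2f}x perf/W relative to Neoverse 64C")
```

If the equal-performance premise does not hold for a given workload, the ratios shift accordingly.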
What about Gromacs AVX512?
But a simulated SPECrate2006 (cache fit) number means checkmate.
By the end of the year, CUDA on ARM will have the same status as CUDA on x86 or POWER
https://nvidianews.nvidia.com/news/...-exascale-supercomputing?ncid=pa-twi-h2-92902
That is excellent news, I've been wanting CUDA-based applications for ARM for a long time now.