
#1




A Favor to Ask: Skylake X and AVX512
Right now, there are conflicting reports about whether this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will have full-throughput AVX512.
I want to definitively answer this question, both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512. Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out whether Skylake X has full-throughput or half-throughput AVX512. So my request is for someone who has a Skylake X sample* to:
**The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler if you want to build the AVX512 binaries for Windows.
When you run the benchmark, I expect one of 3 things to happen:
Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:
Code:
Running Skylake Purley tuned binary with 1 thread...

Single-Precision - 128-bit AVX - Add/Sub                  GFlops = 15.904    Result = 2.02376e+06
Double-Precision - 128-bit AVX - Add/Sub                  GFlops = 7.952     Result = 1.00995e+06
Single-Precision - 128-bit AVX - Multiply                 GFlops = 15.936    Result = 2.03498e+06
Double-Precision - 128-bit AVX - Multiply                 GFlops = 7.968     Result = 1.00712e+06
Single-Precision - 128-bit AVX - Multiply + Add           GFlops = 15.936    Result = 1.69085e+06
Double-Precision - 128-bit AVX - Multiply + Add           GFlops = 7.968     Result = 841756
Single-Precision - 128-bit FMA3 - Fused Multiply Add      GFlops = 31.872    Result = 2.02868e+06
Double-Precision - 128-bit FMA3 - Fused Multiply Add      GFlops = 15.936    Result = 1.01782e+06
Single-Precision - 256-bit AVX - Add/Sub                  GFlops = 31.808    Result = 4.06688e+06
Double-Precision - 256-bit AVX - Add/Sub                  GFlops = 15.936    Result = 2.02901e+06
Single-Precision - 256-bit AVX - Multiply                 GFlops = 31.872    Result = 4.06158e+06
Double-Precision - 256-bit AVX - Multiply                 GFlops = 15.936    Result = 2.02013e+06
Single-Precision - 256-bit AVX - Multiply + Add           GFlops = 31.872    Result = 3.34696e+06
Double-Precision - 256-bit AVX - Multiply + Add           GFlops = 15.936    Result = 1.70441e+06
Single-Precision - 256-bit FMA3 - Fused Multiply Add      GFlops = 63.744    Result = 4.0399e+06
Double-Precision - 256-bit FMA3 - Fused Multiply Add      GFlops = 31.872    Result = 2.00801e+06
Single-Precision - 512-bit AVX512 - Add/Sub               GFlops = 63.744    Result = 8.11456e+06
Double-Precision - 512-bit AVX512 - Add/Sub               GFlops = 31.872    Result = 4.03949e+06
Single-Precision - 512-bit AVX512 - Multiply              GFlops = 63.36     Result = 8.0743e+06
Double-Precision - 512-bit AVX512 - Multiply              GFlops = 31.872    Result = 4.05014e+06
Single-Precision - 512-bit AVX512 - Multiply + Add        GFlops = 63.744    Result = 6.68723e+06
Double-Precision - 512-bit AVX512 - Multiply + Add        GFlops = 31.872    Result = 3.3739e+06
Single-Precision - 512-bit AVX512 - Fused Multiply Add    GFlops = 127.488   Result = 8.22848e+06
Double-Precision - 512-bit AVX512 - Fused Multiply Add    GFlops = 63.744    Result = 4.03805e+06

Running Skylake Purley tuned binary with 64 thread(s)...

Single-Precision - 128-bit AVX - Add/Sub                  GFlops = 683.36    Result = 8.68179e+07
Double-Precision - 128-bit AVX - Add/Sub                  GFlops = 263.568   Result = 3.35065e+07
Single-Precision - 128-bit AVX - Multiply                 GFlops = 527.616   Result = 6.69453e+07
Double-Precision - 128-bit AVX - Multiply                 GFlops = 263.88    Result = 3.34619e+07
Single-Precision - 128-bit AVX - Multiply + Add           GFlops = 527.136   Result = 5.58561e+07
Double-Precision - 128-bit AVX - Multiply + Add           GFlops = 263.64    Result = 2.79832e+07
Single-Precision - 128-bit FMA3 - Fused Multiply Add      GFlops = 1056.77   Result = 6.71142e+07
Double-Precision - 128-bit FMA3 - Fused Multiply Add      GFlops = 528.336   Result = 3.36188e+07
Single-Precision - 256-bit AVX - Add/Sub                  GFlops = 1054.14   Result = 1.34076e+08
Double-Precision - 256-bit AVX - Add/Sub                  GFlops = 527.52    Result = 6.68866e+07
Single-Precision - 256-bit AVX - Multiply                 GFlops = 1056.77   Result = 1.34416e+08
Double-Precision - 256-bit AVX - Multiply                 GFlops = 527.664   Result = 6.70251e+07
Single-Precision - 256-bit AVX - Multiply + Add           GFlops = 1055.33   Result = 1.12018e+08
Double-Precision - 256-bit AVX - Multiply + Add           GFlops = 527.52    Result = 5.59086e+07
Single-Precision - 256-bit FMA3 - Fused Multiply Add      GFlops = 2110.08   Result = 1.34046e+08
Double-Precision - 256-bit FMA3 - Fused Multiply Add      GFlops = 1055.33   Result = 6.69451e+07
Single-Precision - 512-bit AVX512 - Add/Sub               GFlops = 2112.26   Result = 2.68216e+08
Double-Precision - 512-bit AVX512 - Add/Sub               GFlops = 1056      Result = 1.34131e+08
Single-Precision - 512-bit AVX512 - Multiply              GFlops = 2117.38   Result = 2.69031e+08
Double-Precision - 512-bit AVX512 - Multiply              GFlops = 1059.26   Result = 1.34601e+08
Single-Precision - 512-bit AVX512 - Multiply + Add        GFlops = 2118.14   Result = 2.24393e+08
Double-Precision - 512-bit AVX512 - Multiply + Add        GFlops = 1058.5    Result = 1.12102e+08
Single-Precision - 512-bit AVX512 - Fused Multiply Add    GFlops = 4242.43   Result = 2.69409e+08
Double-Precision - 512-bit AVX512 - Fused Multiply Add    GFlops = 2115.07   Result = 1.34365e+08
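
As a rough sanity check on the single-thread numbers above, the Fused Multiply Add rows can be converted into an implied number of FMA instructions retired per cycle. The sketch below is illustrative only: the function name is hypothetical, and the 2.0 GHz figure assumes the single-thread run held the base clock rather than the 2.5 GHz turbo. A value near 2 is what full-throughput looks like; a half-throughput part would land near 1.

```python
def implied_fma_per_cycle(gflops, clock_ghz, vector_bits, element_bits):
    """Estimate how many FMA instructions per cycle a GFlops figure implies.

    Each FMA instruction performs one multiply and one add per lane,
    so it counts as 2 floating-point operations per lane.
    """
    lanes = vector_bits // element_bits
    flops_per_instruction = lanes * 2
    return gflops / (clock_ghz * flops_per_instruction)


# ASSUMPTION: the single-thread run stayed at the 2.0 GHz base clock.
ASSUMED_CLOCK_GHZ = 2.0

# Single-precision FMA rows from the 1-thread run above.
sp512 = implied_fma_per_cycle(127.488, ASSUMED_CLOCK_GHZ, 512, 32)
sp256 = implied_fma_per_cycle(63.744, ASSUMED_CLOCK_GHZ, 256, 32)

print(f"512-bit FMA per cycle: {sp512:.3f}")
print(f"256-bit FMA per cycle: {sp256:.3f}")
```

Under that clock assumption both values come out at about 1.99, which is consistent with two 512-bit FMAs per cycle on this Purley system. If the 512-bit figure on a Skylake X sample comes out near half of its 256-bit figure, that would point to half-throughput.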
#2




Re: A Favor to Ask: Skylake X and AVX512
Fired off some emails
__________________
Where courage, motivation and ignorance meet, a persistent idiot awakens. For all HWBOT community related questions, contact Christian Ney or Websmile. For any other questions, contact me at pieter@hwbot.org. 
#3




Re: A Favor to Ask: Skylake X and AVX512
Bump. NDAs lifting today.
I'm most curious about the 7820X and the 7900X.
EDIT: The reviews seem to indicate that the 6- and 8-core models will have half-throughput, and the 10-core model will have full-throughput.
Microarchitecture Analysis: Adding in AVX512 and Tweaks to Skylake-S - The Intel Skylake-X Review: Core i9 7900X, i7 7820X and i7 7800X Tested
Last edited by Mysticial; 06-19-2017 at 16:28.
#4




Re: A Favor to Ask: Skylake X and AVX512
Windows 10 1703 with Intel C++ redists installed.
__________________
Elmor's lab 
#5




Re: A Favor to Ask: Skylake X and AVX512
Thank you!
This is interesting though. The compiler seems to be trying to enforce that the computer has RDSEED instructions. But RDSEED was already available starting from Broadwell. I don't see why it would be missing from Skylake X unless it was explicitly disabled in the BIOS or something. This might be a problem moving forward since the compiler forces these checks even though most programs won't use them anyway.
EDIT: Is virtualization disabled in the BIOS? I'm reading around and it seems that some machines have all the crypto instructions disabled (AESNI, RDRAND, and RDSEED) and it may be related to virtualization.
Last edited by Mysticial; 06-21-2017 at 17:45.
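
For reference, the instructions mentioned here are advertised through CPUID feature bits, so a disabled feature shows up as a cleared bit. Below is a minimal sketch of decoding them, assuming the raw register values have already been read by some other tool. The function name and the example register values are hypothetical, and this does not claim to reproduce the compiler's actual check.

```python
# Feature bit positions as documented for the CPUID instruction.
LEAF1_ECX_BITS = {"AESNI": 25, "RDRAND": 30}    # CPUID.(EAX=1):ECX
LEAF7_EBX_BITS = {"AVX512F": 16, "RDSEED": 18}  # CPUID.(EAX=7,ECX=0):EBX


def decode_features(leaf1_ecx, leaf7_ebx):
    """Map raw CPUID register values to a {feature name: present?} dict."""
    features = {}
    for name, bit in LEAF1_ECX_BITS.items():
        features[name] = bool((leaf1_ecx >> bit) & 1)
    for name, bit in LEAF7_EBX_BITS.items():
        features[name] = bool((leaf7_ebx >> bit) & 1)
    return features


# HYPOTHETICAL register values: AVX512F reported, crypto-related bits cleared.
example = decode_features(leaf1_ecx=0, leaf7_ebx=1 << 16)
for name, present in sorted(example.items()):
    print(f"{name}: {'yes' if present else 'no'}")
```

A machine in the state described above would report AVX512F as present while RDSEED (and possibly AESNI and RDRAND) read as absent.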
#6




Re: A Favor to Ask: Skylake X and AVX512
I found a way to disable that check by the compiler and I've updated the binaries.
So if anyone is willing to try now, it should (hopefully) work regardless of whether RDSEED is enabled or not. Thanks. 
#7




Re: A Favor to Ask: Skylake X and AVX512
Quote:
__________________
Elmor's lab 
#8




Re: A Favor to Ask: Skylake X and AVX512
Would you be able to try with the latest binaries? I updated them last night.
As far as I can tell, I've removed the check. So it should get past that message and either run successfully or crash. Thanks for your time.
#9




Re: A Favor to Ask: Skylake X and AVX512
Works fine here with the prior binaries; will test later with the latest.

#10




Re: A Favor to Ask: Skylake X and AVX512
I don't believe you, send me your X299 gear so I can see first hand.


