Overclocking 2P/4P Opterons

rvborgh

Weaksauce
Joined
Jul 1, 2014
Messages
108
hi Folks,

i've been running a quad socket SuperMicro H8QGi-F motherboard with Opteron 61xx ES chips since 2015 as my home PC: 48 K10 cores overclocked to 3.0 GHz. The thing has been a tank, and solid as a rock (as it should be, since the internals plus the SC748 case weigh about 80 lbs). The K10 based Opterons worked well, but lately i decided to upgrade to the Piledriver based Opteron 63xx chips to tide me over until i upgrade yet again to EPYC Rome chips. Yes, i know Piledriver is a bit passé these days, but still....

Specifically, the ones i used were ZS258045TGG54_34/25/20_2/16. These are unlocked 63xx engineering samples, essentially an unlocked version of the Opteron 6380 (16 cores). i pulled all 4 61xx ES chips (and the Noctua NH-U9DO A3 coolers), and installed 2 of the 63xx chips.

i ran into a bunch of peculiarities in getting these things stable.

To keep heat down and performance up, i downcored them in the BIOS using the "Compute Unit" setting. So instead of a total of 32 cores, i get 16 (8 per 63xx). The "Compute Unit" setting essentially disables CMT, and clock gates the unused Piledriver core in the compute unit. The remaining core gets the L2 cache as well as the L1 instruction cache and shared decoder to itself for a bit of an IPC increase.

My H8QGi-F also had Tear's OCNG BIOS flashed, and of course i used the TurionPowerControl utility to set pstates (voltages plus frequencies).

The strange thing was that the box was not stable. i could run Prime95 small FFTs (which run in the L1/L2) just fine, but the moment i ran Blend (where it hits memory) the box would black screen and reboot. Running the RightMark Multi-Thread Memory Test would also cause a reset. At first i thought it was related to HyperTransport, and dropping that down to HT 1.0 helped things run longer. Unfortunately, that did not fix things.

It turns out that setting node interleaving to disabled in the BIOS seems to break the processor. With node interleaving disabled, threads are exposed to the true memory latencies caused by the distance from the core running the thread to the memory the data was originally allocated on. So threads running on socket 2 that access data allocated in socket 1's memory see much higher latencies than they would running on a core in socket 1. Setting node interleaving back to Auto (which enables it), and so striping the memory across all nodes to even out the memory latency seen by each socket, brought complete stability back.

i have no idea why disabling node interleaving breaks things under heavy load. It's almost as if, when all the cores are hitting memory hard, the shared HyperTransport probe filter that maintains memory coherency across cores is simply not able to keep up.

As far as overclocking goes, the Piledriver cores on these things run 4.5 GHz just fine in pb0 (single core turbo). In pb1 (all core turbo) they run at 3.9 GHz just fine. i haven't really pushed things much farther, as i want to keep voltages not too much higher than 1.2 V (this is where i ran my 61xx for years, so i know the VRMs can take it).

Frequency and voltages for stability (with ref clock at 200 MHz):
2800 vcore 1.1250v (pb1 all core turbo)
3500 vcore 1.1875v (pb1 all core turbo)
3600 vcore 1.1875v (pb1 all core turbo)
3700 vcore 1.2000v (pb1 all core turbo)
3800 vcore 1.2250v (pb1 all core turbo)
3900 vcore 1.2375v (pb1 all core turbo)
4000 vcore 1.2375v (pb1 all core turbo)

4500 vcore 1.3125v (pb0 single core turbo)

The above settings were with APM on. i will be disabling APM to see what frequencies/voltages these processors run when turbo core isn't managing things.
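
To see how the voltage cost changes from one step to the next, the all-core (pb1) points listed above can be tabulated. This is only a rough Python sketch: the values are copied from the list, the helper name is made up for illustration, and the single-core 4500 MHz point is left out because it is a different boost state.

```python
# All-core (pb1) stable points from the list above: (MHz, vcore in millivolts).
# Millivolts are used so the 12.5 mV steps stay exact in floating point.
PB1_POINTS = [
    (2800, 1125.0),
    (3500, 1187.5),
    (3600, 1187.5),
    (3700, 1200.0),
    (3800, 1225.0),
    (3900, 1237.5),
    (4000, 1237.5),
]


def mv_per_100mhz(points):
    """Return (from_mhz, to_mhz, extra mV per 100 MHz) for each adjacent pair."""
    steps = []
    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        steps.append((f0, f1, (v1 - v0) / (f1 - f0) * 100))
    return steps


for f0, f1, slope in mv_per_100mhz(PB1_POINTS):
    print(f"{f0} -> {f1} MHz: {slope:.1f} mV per 100 MHz")
```

The same helper could be fed the results measured with APM off, to compare the two sets of points.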

Compared to the K10s, single thread performance is only marginally better (and it takes a lot more frequency to get there); the main benefit is the much improved memory performance.

Next up now that i have found stability - i will be upping the ref clock to see how fast i can get the memory controllers/L3 going.

I'd be curious how other people are doing overclocking these things. Heat is not so bad with just 16 cores. i hit about 33C or so when running Prime95. One advantage, i think, is that the Opterons have a larger heat spreader than, say, the FX-8350, so you can run things cooler when running only 8 cores in each 63xx.
 

rvborgh

What is the wattage draw at the wall of that? :eek:

It is not so bad with only 16 cores going full chat. About 480 watts for the entire system, measured with the Kill-A-Watt. Idle is around 260 watts. This is with pb1 (all core turbo) set to 3.9 GHz and pb0 (single core turbo) set to 4.4 GHz, which works out to about 4.0 GHz and 4.5 GHz once scaled by the 205 MHz ref clock.

The strange thing i encountered through all this was not being able to disable node interleaving in the BIOS. This means that about 3/4 of the memory accesses have to go through the HyperTransport links, because memory is seen as striped across all the processor nodes rather than processor die 0 seeing its own RAM.
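
The "about 3/4" figure follows from simple counting if memory is striped evenly across the nodes. Here is a minimal Python sketch, assuming each 63xx package shows up as two NUMA nodes (one per die) and that accesses are spread uniformly over the stripe; the function name is only illustrative.

```python
def remote_fraction(num_nodes):
    """Fraction of accesses served by another node when memory is
    striped evenly across num_nodes and accesses are uniform."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return (num_nodes - 1) / num_nodes


SOCKETS = 2          # two 63xx chips installed (from the post)
DIES_PER_SOCKET = 2  # assumption: each package exposes two NUMA nodes

print(remote_fraction(SOCKETS * DIES_PER_SOCKET))  # 0.75
```

With interleaving disabled and well-placed threads, the remote share could be much lower, which is why the setting matters for latency.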

At the highest frequencies, i could not go beyond a ref clock of about 205 MHz. If i backed the frequency off to 3.8 GHz, i could hit a 207 MHz ref clock with stability, but only with more voltage than the same frequencies needed at the stock 200 MHz ref clock. i ended up just backing things off to 205 because the small increase in ref clock wasn't worth the larger decrease in CPU frequency.
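
The ref clock trade-off can be checked with a little arithmetic. A minimal Python sketch, assuming the p-state frequencies are defined against the stock 200 MHz ref clock and scale linearly with it (the helper name is made up for illustration):

```python
STOCK_REF_MHZ = 200  # ref clock the p-state frequencies are defined against


def effective_mhz(pstate_mhz, ref_mhz, stock_ref_mhz=STOCK_REF_MHZ):
    """Scale a p-state frequency set at the stock ref clock to a new ref clock."""
    return pstate_mhz * ref_mhz / stock_ref_mhz


print(effective_mhz(3900, 205))  # all core turbo at 205 MHz   -> 3997.5
print(effective_mhz(4400, 205))  # single core turbo at 205 MHz -> 4510.0
print(effective_mhz(3800, 207))  # backed-off p-state at 207 MHz -> 3933.0
```

Under that assumption, the 3.8 GHz p-state at 207 MHz ends up slower on the cores than the 3.9 GHz p-state at 205 MHz, which is consistent with settling on 205.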

i think i am concluding my overclocking effort with this now. i think a lot of the cores can actually hit 4.6, but some do not, and i am not going to put in the effort to figure out which ones do and which ones do not. Some even work fine at 4.7. But the voltage required is high, and i want to stay in the sane voltage range and for the thing to be reliable. This should tide me over until i swap out this H8QGi-F motherboard for an updated Dual EPYC SuperMicro motherboard in the future.

Here are the settings for anyone running these processors.

There was one other setting in the BIOS that is special to this - CPU Downcore mode was set to "Compute Unit"

PS: i want to give kudos to "Tear" over on the DC forum here, as he was the person who put together the OCNG BIOS which let me run the memory at the highest speed this motherboard would support. Also to the folks that put together TurionPowerControl.

https://valid.x86.fr/fqdcrm

Final results 2.jpg
 
Joined
Jan 11, 2023
Messages
3

To lower the multi on the NB, you have to choose "AUTO". It will automatically go down to 9x as the ref clock gets higher!

Here is some of my system. At the moment i have only 3 Nodes turned on. Trying to find the maximum.
 

Attachment: 112233.jpg

rvborgh

How is the optimization going?
i haven't been able to run the NB at more than about 205 with stability. i still have the issue where the machine will just shut down under stress testing, but at this point i am thinking that the motherboard simply cannot supply voltage reliably to each socket once things go above 220 watts or so.
 

cdabc123

Supreme [H]ardness
Joined
Jun 21, 2016
Messages
4,097
Nice troubleshooting with an interesting configuration. Although I don't think I could bear running an Opteron setup. The last time I did was a mining rig, and every time after that I would always just grab a 1366 system I had instead.
 
Joined
Jan 11, 2023
Messages
3
i achieved a 247 MHz ref clock: 3952 MHz for all cores with 3 Nodes. With 4 Nodes the maximum is a 242 MHz ref clock. It's working with stock voltages. I have no options for setting the voltages higher. No ES samples here :( The maximum for the NB is 240 x 10, but here too, the voltage limits everything.
On turbo it boosts to a 17.5x multi, which gives 4322 MHz on one module. I tried to manipulate the voltages over the IC on an H8DGi, but after modding the board it's not working anymore. I also asked Elmor Labs for help, but they did not help either.
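
These figures line up with plain ref clock times multiplier arithmetic. A small Python sketch; note that the 16x all-core multiplier is an assumption inferred from 3952 / 247, not something stated in the post, and the function name is only illustrative.

```python
def clock_mhz(ref_mhz, multiplier):
    """Resulting clock for a given ref clock and multiplier."""
    return ref_mhz * multiplier


ALL_CORE_MULTI = 16    # assumption: inferred from 3952 MHz / 247 MHz
TURBO_MULTI = 17.5     # from the post
NB_MULTI = 10          # from the post

print(clock_mhz(247, ALL_CORE_MULTI))  # 3952   (all cores, 3 Nodes)
print(clock_mhz(242, ALL_CORE_MULTI))  # 3872   (all cores, 4 Nodes)
print(clock_mhz(247, TURBO_MULTI))     # 4322.5 (turbo on one module)
print(clock_mhz(240, NB_MULTI))        # 2400   (NB maximum)
```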
 

rvborgh

i am not sure that voltage is limiting you too much. i found that the voltage required for each additional step in GHz really starts climbing after about 3.8 GHz.

i run with this TPC script:

cd C:\tpc
TurionPowerControl -fo 2
TurionPowerControl -set node all core all pstate 0 freq 4400 vcore 1.3250
TurionPowerControl -set node all core all pstate 1 freq 3900 vcore 1.2625
timeout /t 2
TurionPowerControl -fo 0

With a ref clock of 205, this gives 4.0 GHz (all core turbo) and 4.5 GHz (single core turbo). So being able to manipulate the voltage doesn't give me too much extra.
Since you have an abundance of cores with the 6386SE, i wonder if you have tried running these with the "COMPUTE UNIT" setting (it's in the downcore mode BIOS setting)? It saves power, and it makes each core about 15-20% more efficient, as it isn't sharing its L2 and various other computing resources with its neighboring core in each module.
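
If you want to try other frequency/voltage pairs, the `-set` lines can be generated rather than edited by hand. This is only a sketch in Python: it reuses just the `-set node all core all pstate N freq F vcore V` form shown in the script above, leaves the `-fo` lines to be added manually, and the function and variable names are made up for illustration.

```python
def tpc_set_lines(pstates, exe="TurionPowerControl"):
    """Build one '-set' command line per p-state.

    pstates maps a p-state index to a (freq_mhz, vcore_volts) pair.
    """
    lines = []
    for index, (freq_mhz, vcore) in sorted(pstates.items()):
        lines.append(
            f"{exe} -set node all core all pstate {index} "
            f"freq {freq_mhz} vcore {vcore:.4f}"
        )
    return lines


# Values taken from the script above.
PSTATES = {
    0: (4400, 1.3250),
    1: (3900, 1.2625),
}

print("\n".join(tpc_set_lines(PSTATES)))
```

The printed lines can be pasted between the two `-fo` lines of the batch file.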
You might also be able to up your ref clock by running this way.
 
Joined
Jan 11, 2023
Messages
3
That's what i mean by "Nodes": i lower the Compute Unit setting in the BIOS, and the CPUs run much more stably at higher clocks.
8 Compute Units per CPU (16 cores) = max. 242 MHz ref clock
6 Compute Units per CPU (12 cores) = max. 247 MHz ref clock
4 Compute Units per CPU (8 cores) = max. 247 MHz ref clock
2 Compute Units per CPU (4 cores) = max. 247 MHz ref clock
That tells me that a voltage limit has been reached.
And yes, i agree. The CPUs work much more efficiently when you disable some units. In comparison to the FX-8350, for example, you get something like 3 - 6% more speed on a single core.

My TurionPowerControl won't work. It shows me the error "Unable to initialize WinRing0 library".
 