5800X constant WHEA errors/bluescreen only in games

LigTasm

Supreme [H]ardness
Joined
Jul 29, 2011
Messages
5,397
So I've been having this problem for a few weeks now and I'm not sure what exactly is happening here. This ONLY happens in games; this machine will run any stress test or benchmark for 20 hours straight with zero hiccups. If I try to play a game (in this instance Fallout 76), it will blue screen and crash with these two errors within 15-20 minutes.

The setup right now is as follows but everything has been changed out at least once besides the CPU:

- 5800X
- Asus X570 Tuf + wifi
- 32GB G.Skill C16 3200MHz kit (2x16)
- Reference 6900XT
- Corsair 1000W
- EK 360mm AIO
- Inland 2TB m.2 SSD


I have tried messing with voltages any which way you can think of, shutting off CPB and PBO, fixed clock speeds, etc. I also tried a few different memory kits I have, including a nice Samsung B-die kit, and nothing seems to help. Before I sold my 11700K rig with the 3070 in it, I tried swapping the GPU as well and that didn't help. Anyone have any ideas? I did try to get a B550 board for testing, but the one I got from Amazon had dead USB ports, so I barely got past reinstalling Windows before I removed it. I have done 3 or 4 fresh installs since I started troubleshooting this.

The best result I've gotten was from shutting off PBO and undervolting the CPU with a -0.200 offset. Like that it runs around 4750MHz at 1.225V and lasts 2-3 hours before the BSOD.


Attachments: event 8.png, event 18.png
 

Mchart

Supreme [H]ardness
Joined
Aug 7, 2004
Messages
5,586
Sounds like a board issue, TBH, assuming you're running everything at default settings and it's still happening.
 

MrC4

[H]ard|Gawd
Joined
Feb 10, 2010
Messages
1,633
If this only happens with Fallout 76, it may be a game issue and not a PC issue. Sounds like you have eliminated every other variable.

As a last-ditch effort, make sure your BIOS is up to date. This has fixed a few issues that cropped up over the years.
 

Nobu

[H]F Junkie
Joined
Jun 7, 2007
Messages
8,548
If it's always the same two cores, I'd be very suspicious of the processor. If it's a different core each time, it could point to a power delivery problem (in the processor, motherboard, or PSU). The processor could be fine, but power spikes from other components could cause the processor to crash, too, if it can't handle them. Unfortunately, that'd be the hardest thing to troubleshoot, as you'd need an equivalent or better PSU, and/or a similar GPU that you could swap in.
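
A rough way to check the "same cores or different cores" question is to tally which APIC IDs the WHEA-Logger events name. This is only a minimal sketch: it assumes the event details have been copied out of Event Viewer as plain text and that each CPU error carries a "Processor APIC ID: N" line. The sample text, the function names, and the `max_distinct` threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumption: each WHEA-Logger CPU error includes a "Processor APIC ID: N" line.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")

def tally_apic_ids(log_text: str) -> Counter:
    """Count how many logged errors name each APIC ID."""
    return Counter(int(m) for m in APIC_RE.findall(log_text))

def looks_like_same_cores(counts: Counter, max_distinct: int = 2) -> bool:
    """True if all errors land on at most `max_distinct` APIC IDs (hypothetical threshold)."""
    return 0 < len(counts) <= max_distinct

# Made-up sample standing in for text pasted from Event Viewer.
sample = """\
Error Type: Cache Hierarchy Error
Processor APIC ID: 4
Error Type: Bus/Interconnect Error
Processor APIC ID: 5
Error Type: Cache Hierarchy Error
Processor APIC ID: 4
"""

counts = tally_apic_ids(sample)
print(counts.most_common())           # APIC IDs ordered by error count
print(looks_like_same_cores(counts))  # whether the errors cluster on a few IDs
```

If the counts pile up on one or two IDs, that leans toward the processor; a wide spread leans toward power delivery. With SMT on, two adjacent APIC IDs usually belong to the same physical core, so it may be worth grouping them before drawing conclusions.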
 

Mchart

Pretty rare for a CPU to be bad unless you've damaged some pins or the pins came damaged.
 

Bankie

[H]ard|Gawd
Joined
Jul 27, 2004
Messages
2,015
One thing to try is to find the voltage of your memory modules and manually set it to that in the BIOS. When my X570 board is set to DIMM Voltage - Auto it undervolts the memory and causes all kinds of issues.
 

LigTasm
It's not just FO76; it happens in every game I play for more than a few minutes. A little more background: this build has been together since January with zero problems. I do not overclock or do anything to it. It's been set on DOCP, and that's the only BIOS tweak I made besides fan speeds.

I was on the latest BIOS, but when I posted this thread I was thinking about anything else that had changed and noticed that a few weeks ago I upgraded to the latest 4021 BIOS with the 1.2.0.3 Patch C AGESA (before that I was on 2607, which was from sometime in 2020; this board was used with 2000 and 3000 series Ryzen chips). What I did was finagle a BIOS a few versions old, from around March, and so far everything has been rock solid. I started FO76, went into a custom server and left it there all day, and it went 8-9 hours without crashing.

Maybe my board or chip doesn't like the newest BIOS version? Hoping this solves it.
 

Skull_Angel

[H]ard|Gawd
Joined
May 31, 2010
Messages
1,664
I'm not so sure about setting them to defaults. While vDIMM may be undervolted on auto in most cases, the default rated setting may be borderline too. Beyond that, [CLDO] VDDP/IOD/CCD are often auto-set too high for stock settings. You could try 0.9v VDDP, 0.94/0.98v IOD, 0.9v CCD, and maybe 1.1v vSOC as well.

X570 boards have been pretty bleh with RAM training (selecting stable auto-settings), but 1.2.0.3c has been pretty decent from what I've read. It couldn't hurt to check your settings with ZenTimings and compare them with people using similar kits (same dies +/- raw card revision) to see if anything looks off the mark.

edit: LigTasm If that's the case it could very well be a difference in RAM training across the different BIOS revisions and the "enhanced compatibility" of the recent patch is just off-mark compared to older versions. If you want to update the BIOS again, make sure to save a screenshot of current settings in ZenTimings and compare them with the new BIOS, then test with TM5 1usmus v3 profile and/or Karhu for RAM stability.
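
For the before/after comparison, eyeballing two screenshots works, but a small diff makes changed values harder to miss. A minimal sketch, assuming the values are typed in by hand from the two ZenTimings screenshots; the setting names and numbers below are placeholders, not readings from this system.

```python
def diff_settings(old: dict, new: dict) -> dict:
    """Return {name: (old_value, new_value)} for every setting that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Hypothetical values copied by hand from two ZenTimings screenshots.
old_bios = {"tCL": 16, "tRFC": 560, "ProcODT": "36.9", "VSOC": 1.10}
new_bios = {"tCL": 16, "tRFC": 630, "ProcODT": "43.6", "VSOC": 1.10}

# Print only the settings that training chose differently.
for name, (before, after) in diff_settings(old_bios, new_bios).items():
    print(f"{name}: {before} -> {after}")
```

Anything that shows up in the diff is a candidate to set manually before running TM5 or Karhu again.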
 

LigTasm

Good ideas. This was one of the first things I thought of, and I went through the timings one by one; I did manually set VSOC, VDDP, etc. My oldest G.Skill kit has been around since the days of the 6700K being top dog, and I have notes on its settings on multiple platforms. I have several other kits and I tried them all; they pass every memory test for hours and hours, no problem. I think the problem is actually somewhere in the vcore response curve, because the chip is running 10-12C cooler with this old BIOS than with the newest one. Maybe they changed something with PBO or CPB? I'm not sure, and I don't really have the tools to figure it out. One thing I never did was fully disable C-states.

Regardless, Warframe was another big offender, and I let it run overnight with no problems yet. I'll keep an eye on it for a few days, because it has gone a week without a crash before and then suddenly started up again.
 

Skull_Angel

[H]ard|Gawd
Joined
May 31, 2010
Messages
1,664
Good ideas. This was one of the first things I thought and went through the timings one by one, I did manually set VSOC, VDDP, etc etc. My oldest Gskill kit has been around since the days of the 6700K being the top dog and I have notes on its settings on multiple platforms. I have several other kits and I tried them all, they pass every memory test for hours and hours no problem. I think the problem is actually somewhere in the vcore response curve, the reason I think so is the chip is running 10-12C cooler with this old BIOS than the newest one. Maybe they changed something with PBO or CPB? I'm not sure and I don't really have the tools to figure it out. One thing I never did was fully disable c-states.

Regardless, Warframe was another big offender and I let it run over night and no problems yet. I'll keep an eye on it for a few days because it has gone a week without a crash recently and then suddenly started up again later.
Something is definitely off if there's a temperature difference to that degree with no other changes. Did crashes with the BIOS update happen shortly after boot, after being idle for some time, or after having run programs for at least an hour?

When you upgraded previously:
- Did you have to manually set all values again, or did it allow you to reuse old profiles? (Some old settings may not work, as has been documented with some of the more recent AGESA updates, so it's always best to manually input settings after updates.)
- Have you run y-cruncher to see if the infinity fabric was stable? (1600 FCLK should be rock-solid, and almost all 5000 series should have no issue at 1800 with manual RAM tweaking.)
- Which RAM tests did you run? (Memtest variants don't seem as reliable as TestMem5 with newer profiles like 1usmus v3 and anta777 extreme, OCCT tests, Karhu, etc.)
- Have you been running with Curve Optimizer [and testing on vs off]? (CO takes weeks of low-power/idle task testing to refine, since stress testing will only show loaded instabilities, which aren't usually the problem.)
- Have you run through Windows Power Plan settings? (Don't use AMD optimized power plans for 5000 series, as they don't work as well as the standard Balanced power plan and may even cause issues.)

As far as RAM goes, you should be able to reach similar settings across different configurations (for timings at least; voltages, termination, and cad_bus should be unique), but that may not be the case when relying on auto-settings, since training will have varying results. I can't see this really being the issue if there is a temperature difference like you describe, though. A ~5C+ CPU temp difference could be possible with a moderately high RAM OC vs stock, but that shouldn't be the case here.
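
On the FCLK numbers mentioned above: they follow from the memory speed, since DDR transfers twice per clock and the fabric is normally run 1:1 with the memory clock. A small sketch of that arithmetic (the function names are just for illustration):

```python
def mclk_mhz(ddr_rate: int) -> float:
    """DDR transfers twice per clock, so the memory clock is half the rated speed."""
    return ddr_rate / 2

def fclk_for_1to1(ddr_rate: int) -> float:
    """Fabric clock needed to run 1:1 with the memory clock."""
    return mclk_mhz(ddr_rate)

# DDR4-3200 is the kit in this build; DDR4-3600 is the 1800 FCLK case.
for rate in (3200, 3600):
    print(f"DDR4-{rate}: MCLK {mclk_mhz(rate):.0f} MHz, FCLK {fclk_for_1to1(rate):.0f} MHz at 1:1")
```

So a 3200 kit on DOCP lands on 1600 FCLK, which is the "should be rock-solid" case.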
 

LigTasm

Well, all I can tell you is I went from 80C during games to around 66-68C. It was immediately obvious, and I checked because my fans didn't ramp up anymore (I have them set at 50% until 80C). Previously even common desktop tasks like installing a program would ramp the fans way up. Right now I've run all sorts of games all day and haven't had a single hiccup. If it stays as-is I will be very happy; I really don't feel like tweaking and testing for more hours than I play my games, since I rarely have time anyway. I used to care a lot about overclocking and all that, but now I just want it to be quiet and work.
 

Skull_Angel

More or less the same here; only overclocked RAM to have a bit of fun, turned on PBO and found good limits for S&G, and now it's happy where it's at. More than anything, tweaking settings is more about attaining assured stability now since overclocking doesn't have much headroom; we're out of the early adoption phase with current Zen3, but it's still maturing so these issues are still popping up with defaults, though less and less frequently.
 