System Instability and lots of Uncorrectable ECC errors

luckylinux

Limp Gawd
Joined
Mar 19, 2012
Messages
225
I just built a new server out of second-hand components (except the RAM, which is new).

- Motherboard: Supermicro X11SSL-F (updated to BIOS 2.6)
- CPU: Intel Xeon E3 1240 V5
- RAM: 4 x Crucial MTA9ASF2G72AZ-3G2B1 (DDR4-3200 ECC UDIMM 1Rx8 CL22)
- PSU: Seasonic G-360
- Cooler: Noctua NH-L12S

All 64GB of RAM are properly detected in BIOS. The RAM runs only at 2133 MHz though (probably CPU limitation).

This has never happened to me with any other system I have built. This one boots, but as soon as I get into Debian Linux and run a stress test (e.g. compiling a Linux kernel to put a thermal load on the CPU), it reboots by itself.

Other things I tried:
- Removed some of the DIMMs -> Server refuses to boot (5 short beeps + 1 long beep = no system memory detected)
- Ran Passmark Memtest86 free -> No errors detected in Memtest86 (I only ran it for ~1 hour), but LOTS of Uncorrectable ECC Errors (in all channels/DIMMs) A1/A2/B1/B2

I discovered that, while the memory is listed as compatible with the Supermicro X11SSL-F on the Crucial website, the motherboard manual lists only the following:


Max Memory Possible | 4Gb DRAM Technology | 8Gb DRAM Technology
Single Rank UDIMM | 16GB (4x 4GB DIMMs) | 32GB (4x 8GB DIMMs)
Dual Rank UDIMMs | 32GB (4x 8GB DIMMs) | 64GB (4x 16GB DIMMs)

I assume these were the only memory configurations available when the motherboard came out. Maybe other configurations are also supported ?

Side question: why is my 16GB DIMM listed as 1R8? Shouldn't it be 1R16 since it's a 16GB DIMM?

I will be able to build another similar system in the near future, which will (hopefully) yield a different result, as I cannot determine which component is at fault here:
- Motherboard ? No bent pins though ...
- CPU ? Might try to remove cooler, wiggle CPU a bit, and reinstall cooler, maybe that will help a bit
- RAM ? Is the single rank the issue here ?
- PSU ?

Thank you for your help :).
 

GiGaBiTe

2[H]4U
Joined
Apr 26, 2013
Messages
2,266
All 64GB of RAM are properly detected in BIOS. The RAM runs only at 2133 MHz though (probably CPU limitation).

Yes, it's a CPU limitation. Intel is pretty good about listing the supported memory speeds on their CPU spec pages.
https://www.intel.com/content/www/u...1240-v5-8m-cache-3-50-ghz/specifications.html

This has never happened to me with any other system I have built. This one boots, but as soon as I get into Debian Linux and run a stress test (e.g. compiling a Linux kernel to put a thermal load on the CPU), it reboots by itself.

journalctl, dmesg and /var/log/messages might show you what happened immediately before a triple fault happened that caused the spontaneous reboot.

Side question: why is my 16GB DIMM listed as 1R8? Shouldn't it be 1R16 since it's a 16GB DIMM?

That's not what those numbers mean. The "1R" means the module has a single rank. The number after the "x" is how wide each memory chip on the module is, in bits. x8 chips are 8 bits wide, so it takes 8 of them to make up 64 bits. x16 chips are 16 bits wide, so it only takes 4 of them to make up 64 bits. Your module has 8 chips, plus one additional chip for ECC, making it a 1Rx8 module.
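
To make the naming concrete, here is a minimal Python sketch of the arithmetic behind a label like 1Rx8. The `ModuleLayout` class and its field names are made up for illustration. It assumes a 64-bit data bus per rank plus 8 extra bits for ECC, and it takes the density of a single DRAM chip in gigabits as an input.

```python
from dataclasses import dataclass

DATA_BUS_BITS = 64  # data width of one rank (assumption: standard 64-bit DIMM)
ECC_BITS = 8        # extra bits carried by an ECC module


@dataclass(frozen=True)
class ModuleLayout:
    """Hypothetical helper describing a DIMM label such as 1Rx8."""
    ranks: int               # the "1" in 1Rx8
    chip_width_bits: int     # the "8" in 1Rx8
    chip_density_gbit: int   # capacity of a single DRAM chip, in gigabits
    ecc: bool = True

    def data_chips_per_rank(self) -> int:
        return DATA_BUS_BITS // self.chip_width_bits

    def chips_per_rank(self) -> int:
        # Round up so that a partially used chip still counts as one chip.
        ecc_chips = -(-ECC_BITS // self.chip_width_bits) if self.ecc else 0
        return self.data_chips_per_rank() + ecc_chips

    def capacity_gbyte(self) -> float:
        # Only the data chips contribute to usable capacity; 8 bits per byte.
        data_chips = self.ranks * self.data_chips_per_rank()
        return data_chips * self.chip_density_gbit / 8


single_rank = ModuleLayout(ranks=1, chip_width_bits=8, chip_density_gbit=16)
print(single_rank.chips_per_rank(), single_rank.capacity_gbyte())
```

Under these assumptions, a 16GB 1Rx8 ECC module works out to nine chips of 16 Gbit each, while a 16GB 2Rx8 module would reach the same capacity with 8 Gbit chips.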

Ran Passmark Memtest86 free -> No errors detected in Memtest86 (I only ran it for ~1 hour), but LOTS of Uncorrectable ECC Errors (in all channels/DIMMs) A1/A2/B1/B2

Uncorrectable ECC errors ARE memory errors. It means that a single address had too many bit errors to be corrected by ECC.

As for what might be wrong, there is a laundry list of them, starting with the fact you're using modules that aren't officially supported by the motherboard. Super Micro motherboards in my experience are extremely picky about RAM.

All four memory modules could be bad also, which isn't out of the realm of possibilities. I've received multiple kits of new memory where all of the modules were bad, RMA'd them, only to receive all different modules that were also bad.

Another possibility is a bad CPU IMC, or one that's flaky when loaded with a large amount of memory. I would consult the Super Micro manual for your board to find a configuration with less memory installed that boots and test that. You may also want to test some completely different known good modules that are listed as compatible.

You may consider also trying a different higher wattage PSU. While those parts aren't really that power hungry, under peak load, the system could be pulling enough for a brief period of time to cause PSU issues. 360W doesn't give you a lot of headroom, I'd go with a 450-600W unit to be safe.
 

luckylinux

First of all, thank you for your exhaustive and quick reply, GiGaBiTe ;).

journalctl, dmesg and /var/log/messages might show you what happened immediately before a triple fault happened that caused the spontaneous reboot.

I'll have to dig this up ... I assume I can get the previous boot's kernel log with "journalctl -k -b -1" (the -1 argument selects the previous boot).
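
A rough Python sketch for sifting through that output, assuming the kernel log of the previous boot has been saved as text (for example from `journalctl -k -b -1`, which only has something to show if the journal is kept across reboots). The function name and the keyword list are illustrative assumptions, not an exhaustive set of what the kernel can print.

```python
import re
from typing import List

# Keywords that often accompany hardware or memory trouble in Linux kernel
# logs. The list is an assumption chosen for illustration.
HW_PATTERN = re.compile(
    r"\bmce\b|machine check|\bedac\b|hardware error", re.IGNORECASE
)


def hardware_error_lines(log_text: str) -> List[str]:
    """Return the log lines that mention one of the keywords above."""
    return [line for line in log_text.splitlines() if HW_PATTERN.search(line)]


# Made-up sample lines, only to show the filter at work.
sample = "\n".join([
    "kernel: usb 1-1: new high-speed USB device number 2",
    "kernel: mce: [Hardware Error]: Machine check events logged",
    "kernel: EDAC MC0: 1 UE memory read error",
])
for line in hardware_error_lines(sample):
    print(line)
```

If the machine resets hard, the last messages may never reach the disk, so an empty result does not rule out a hardware fault.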


That's not what those numbers mean. The "1R" means the module has a single rank. The number after the "x" is how wide each memory chip on the module is, in bits. x8 chips are 8 bits wide, so it takes 8 of them to make up 64 bits. x16 chips are 16 bits wide, so it only takes 4 of them to make up 64 bits. Your module has 8 chips, plus one additional chip for ECC, making it a 1Rx8 module.
Ah, alright, thanks for the explanation :).

Uncorrectable ECC errors ARE memory errors. It means that a single address had too many bit errors to be corrected by ECC.
Sorry, my bad, probably I didn't express myself clearly enough.
I meant, when I said I ran Passmark Memtest86 free:
- No errors detected and/or reported within Memtest86 screen
- LOTS of Uncorrectable ECC Errors (in all channels/DIMMs) A1/A2/B1/B2 reported in Supermicro IPMI/BMC Event Log

Hope it's clearer now.
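
For tallying such entries, here is a small Python sketch. It assumes the event log has been exported as pipe-delimited text with the sensor in the fourth field and the event description in the fifth (roughly the shape that `ipmitool sel list` prints). The exact column layout and wording vary between BMC firmware versions, so the field positions and the sample lines are assumptions.

```python
from collections import Counter


def count_uncorrectable(sel_text: str) -> Counter:
    """Count 'Uncorrectable ECC' events per sensor field."""
    counts: Counter = Counter()
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue  # not an event line in the assumed layout
        sensor, event = fields[3], fields[4]
        if "uncorrectable ecc" in event.lower():
            counts[sensor] += 1
    return counts


# Made-up sample in the assumed layout.
sample = """\
1 | 04/01/2022 | 10:00:01 | Memory #0x01 | Uncorrectable ECC | Asserted
2 | 04/01/2022 | 10:00:02 | Memory #0x01 | Uncorrectable ECC | Asserted
3 | 04/01/2022 | 10:00:05 | Memory #0x02 | Correctable ECC | Asserted
"""
print(count_uncorrectable(sample))
```

Grouping by sensor makes it easier to see whether the errors cluster on one DIMM or are spread across all of them.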

As for what might be wrong, there is a laundry list of them, starting with the fact you're using modules that aren't officially supported by the motherboard. Super Micro motherboards in my experience are extremely picky about RAM.
I'm surprised by your statement. I had Supermicro X9 dual servers with 32 sticks of DDR3 RAM with mix-match of brands, frequencies, etc. Always ran stable and no Memtest86 or other memory errors reported. This is not to say it's a good practice, but in my view, Supermicro are the least picky Server boards around.

Furthermore, Crucial's memory compatibility tool listed this memory as compatible.

All four memory modules could be bad also, which isn't out of the realm of possibilities. I've received multiple kits of new memory where all of the modules were bad, RMA'd them, only to receive all different modules that were also bad.
The first hypothesis I had was actually a mix/match of different RAM modules/chips. It's the same Crucial P/N, but 3 DIMMs were DPASAYN001 / 2146 while 1 DIMM was HM00004A33 / 2141. That's why I tried to temporarily remove the odd one, but whenever I had only 1, 2 or 3 sticks of RAM installed, the system wouldn't boot.

Do you think it's an issue related to the single/dual rank type of memory I am using ?

Another possibility is a bad CPU IMC, or one that's flaky when loaded with a large amount of memory. I would consult the Super Micro manual for your board to find a configuration with less memory installed that boots and test that. You may also want to test some completely different known good modules that are listed as compatible.
The list of compatible modules is very short and contains nothing I can readily buy in Europe anyway :(.

You may consider also trying a different higher wattage PSU. While those parts aren't really that power hungry, under peak load, the system could be pulling enough for a brief period of time to cause PSU issues. 360W doesn't give you a lot of headroom, I'd go with a 450-600W unit to be safe.
I always ran this Seasonic G-360 360W PSU with all the X10SLL-F/X10SLM-F systems I had (previous generation), never had a problem. While it's not the same MB/CPU, do you suspect this new X11SSL-F and E3 1240 V5 to be so much more power hungry? I would have actually expected slightly lower power consumption.

But as I said, in a few days I'll have some other RAM DIMMs (same type MTA9ASF2G72AZ-3G2B1, and another set with Kingston KSM26ED8/16HD - SK Hynix based the latter) that I can test another full system with (same model of MB/CPU/PSU/RAM, but different "pieces" - for another system).
 

GiGaBiTe

Sorry, my bad, probably I didn't express myself clearly enough.
I meant, when I said I ran Passmark Memtest86 free:
- No errors detected and/or reported within Memtest86 screen
- LOTS of Uncorrectable ECC Errors (in all channels/DIMMs) A1/A2/B1/B2 reported in Supermicro IPMI/BMC Event Log

Hope it's clearer now.

Doesn't matter, memory errors are memory errors. Memtest86 is known to make false positives and false negatives. I've never known it to be great at detecting ECC errors. If the IPMI/BMC says there are memory errors, I'd trust it more since it has lower level access to the hardware than Memtest86 does.


I'm surprised by your statement. I had Supermicro X9 dual servers with 32 sticks of DDR3 RAM with mix-match of brands, frequencies, etc. Always ran stable and no Memtest86 or other memory errors reported. This is not to say it's a good practice, but in my view, Supermicro are the least picky Server boards around.

My area of knowledge of Super Micro gear is from the mid 90s to the late 2000s. They were extremely picky about memory back in those days on boards that used Intel chipsets. I guess they've changed since then since the IMC is now on the CPU.

The first hypothesis I had was actually a mix/match of different RAM modules/chips. It's the same Crucial P/N, but 3 DIMMs were DPASAYN001 / 2146 while 1 DIMM was HM00004A33 / 2141. That's why I tried to temporarily remove the odd one, but whenever I had only 1, 2 or 3 sticks of RAM installed, the system wouldn't boot.

Unless the motherboard requires multiple sticks of memory to boot, it not booting with one stick installed tells me there is either an incompatibility issue, or that such a large portion of the memory stick is bad that the system can't run properly. You can have bad memory in a system without it affecting booting if the bad region sits somewhere in the upper address map and the lower addresses are fine. This would explain the system running with all four sticks but not with fewer: with fewer sticks, the bad areas get shuffled into parts of the memory map that are used for booting, and those get corrupted.

I always ran this Seasonic G-360 360W PSU with all the X10SLL-F/X10SLM-F systems I had (previous generation), never had a problem. While it's not the same MB/CPU, do you suspect this new X11SSL-F and E3 1240 V5 to be so much more power hungry? I would have actually expected slightly lower power consumption.

Intel's TDP numbers mean nothing, they have a long history of pulling random numbers out of their ass and calling it "good enough", when it really isn't. Exacerbating that, motherboard vendors often ignore the guidance Intel provides on turbo frequencies and durations, and which cores boost. This pushes up that power number further. While that CPU is rated at 80W, that is an extremely conservative number. I would expect double or even triple that number in heavy load situations that utilize all cores, and possibly more that use specialized instructions like AVX.

I have not had an Intel processor in the last 22 years that ran anywhere near its "nominal" TDP number, and I've had a lot of them. My current i9-10850k has a TDP of 125W, but it regularly blows far past that, I've seen it close to 300W when all 10 cores are cranking along in boost mode. And since the MSI motherboard ignores the boosting guidelines, it'll keep that power level up until the liquid cooler can't keep up and hits tjmax.

Anyway, a 360W PSU doesn't give you a lot of headroom. Expect at least half of it to be used by the CPU, and another not insignificant chunk by the motherboard and RAM. I don't know how many drives you have, or what type they are. But a CPU rapidly pulling and unloading 50% of the PSUs rating can cause problems.
 

luckylinux

Unless the motherboard requires multiple sticks of memory to boot, it not booting with one stick installed tells me there is either an incompatibility issue, or that such a large portion of the memory stick is bad that the system can't run properly. You can have bad memory in a system without it affecting booting if the bad region sits somewhere in the upper address map and the lower addresses are fine. This would explain the system running with all four sticks but not with fewer: with fewer sticks, the bad areas get shuffled into parts of the memory map that are used for booting, and those get corrupted.
Could be a case where 4x half bad sticks allow the MB to boot, but as soon as something "real" gets done, it just crashes :confused:.


Intel's TDP numbers mean nothing, they have a long history of pulling random numbers out of their ass and calling it "good enough", when it really isn't. Exacerbating that, motherboard vendors often ignore the guidance Intel provides on turbo frequencies and durations, and which cores boost. This pushes up that power number further. While that CPU is rated at 80W, that is an extremely conservative number. I would expect double or even triple that number in heavy load situations that utilize all cores, and possibly more that use specialized instructions like AVX.

I have not had an Intel processor in the last 22 years that ran anywhere near its "nominal" TDP number, and I've had a lot of them. My current i9-10850k has a TDP of 125W, but it regularly blows far past that, I've seen it close to 300W when all 10 cores are cranking along in boost mode. And since the MSI motherboard ignores the boosting guidelines, it'll keep that power level up until the liquid cooler can't keep up and hits tjmax.

Anyway, a 360W PSU doesn't give you a lot of headroom. Expect at least half of it to be used by the CPU, and another not insignificant chunk by the motherboard and RAM. I don't know how many drives you have, or what type they are. But a CPU rapidly pulling and unloading 50% of the PSUs rating can cause problems.

I agree that Intel's TDP ratings are very off; AMD usually is/was more realistic on those. As for the PSU rating, I'd tend to agree that it's not hugely oversized. However, this is for a home server running virtualization (Proxmox VE) with 6 x Crucial MX500 SSDs, 1 x Mellanox 10 Gbps NIC, 1 x IBM M1015 SAS HBA and possibly a few USB-Ethernet adapters (for the pfSense WAN connection). Storage-wise it's not very stressed at all. Say 15W for the HBA + 10W for the NIC + maybe 10W for all the USB devices = 35W. Even if you derate the PSU due to age etc. (I seem to recall the rule of thumb is to load a PSU to 60%-70% max, especially since this one is quite old), 70% x 360W = 252W, which leaves 252W - 35W = 217W for CPU + RAM + MB.

So it might be that it's a bit tight indeed (although I still think that this is not the primary cause of instability, since memory errors are already reported at relatively low CPU load).
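
The headroom estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the wattages are the rough guesses from the post, the SSDs are left out just as they are in the estimate, and the function name is made up for illustration.

```python
PSU_RATED_W = 360
DERATE_PERCENT = 70  # upper end of the 60-70% rule of thumb

# Rough per-device guesses from the post, in watts.
PERIPHERALS_W = {"SAS HBA": 15, "10 Gbps NIC": 10, "USB adapters": 10}


def remaining_budget_w(rated_w: int, derate_percent: int, peripherals: dict) -> float:
    """Watts left for CPU + RAM + motherboard after derating and peripherals."""
    usable_w = rated_w * derate_percent / 100
    return usable_w - sum(peripherals.values())


print(remaining_budget_w(PSU_RATED_W, DERATE_PERCENT, PERIPHERALS_W))
```

With the more cautious 60% figure, the same calculation leaves 181W instead of 217W.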

Any PSU you would recommend? I have always been buying Seasonic lately, but in recent years I'd say the "good value" market has disappeared. There are entry-level PSUs from Corsair etc., but you sometimes read that they destroyed the whole PC they were in. Plus, around the 400W-500W mark, there aren't many high-end PSUs from what I recall. Most of the high-end stuff is, say, 800W-1000W, for people running dual/triple GPUs, overclocking etc.
 

GiGaBiTe

I tend to stick with Antec, Cooler Master and ThermalTake. I know the last one doesn't have a great reputation, but I've not had that many issues with them over 15 years or so I've been using them. My current TT 800W TR2 has been in service for over a decade across several builds with no issues, and it definitely gets a workout with my i9-10850k, GTX 1070 Ti and eight drives.

I would recommend staying away from EVGA though, especially their lower end stuff. I've had several of their units blow up or silently die in normal office PC use. Also had one that was DOA.
 

Nobu

[H]F Junkie
Joined
Jun 7, 2007
Messages
8,538
Have you tried the single odd stick of ram by itself? The three matched sticks may be bad (or mostly bad).
 

toast0

[H]ard|Gawd
Joined
Jan 26, 2010
Messages
1,889
I don't know if the supermicro bios will let you tweak memory timings and voltage? You might just need a bit more voltage or looser timings.

Also, sometimes the IPMI error logs show up late, so maybe the ECC errors lead to the reboot under Linux, but only showed up in IPMI while you were running memtest. I'd try clearing the logs and then unplugging the server so the BMC has to restart. Maybe reseat the processor, too.
 

luckylinux

I tend to stick with Antec, Cooler Master and ThermalTake. I know the last one doesn't have a great reputation, but I've not had that many issues with them over 15 years or so I've been using them. My current TT 800W TR2 has been in service for over a decade across several builds with no issues, and it definitely gets a workout with my i9-10850k, GTX 1070 Ti and eight drives.

I would recommend staying away from EVGA though, especially their lower end stuff. I've had several of their units blow up or silently die in normal office PC use. Also had one that was DOA.
I am tempted to also upgrade my Desktop, that way I can test these RAM DIMMs to see if they are the faulty ones. Some new NVIDIA GPUs (RTX3000) apparently had issues with Seasonic, although I am stuck on GTX 1000 for now ;). Would you say the Antec Signature Platinum 1000 compare well with Seasonic Prime PX 1000 ? Or should I rather go with Corsair HX1000? I heard some bad stuff with Corsair though, particularly about the *i* types (with this digital DSP control I believe), very picky about voltage fluctuations.
Both are cheaper than the Seasonic :).

For the small server I'd say I could get the following for approx. the same price (all of them 80+ gold):
- Seasonic CORE GC 500W
- Seasonic G12 GC Series 550W
- Cooler Master MWE Gold V2 550W
- Antec Neo Eco NE500G Zen 500W
 

luckylinux

Have you tried the single odd stick of ram by itself? The three matched sticks may be bad (or mostly bad).
I don't think I did. Might as well try. I'm just a bit unmotivated to remove the cooler every single time :(.
 

luckylinux

I don't know if the supermicro bios will let you tweak memory timings and voltage? You might just need a bit more voltage or looser timings.

Also, sometimes the IPMI error logs show up late, so maybe the ECC errors lead to the reboot under Linux, but only showed up in IPMI while you were running memtest. I'd try clearing the logs and then unplugging the server so the BMC has to restart. Maybe reseat the processor, too.
I don't believe I can tweak the voltage. Most I think I can do is tweak the power limit for Intel Turbo Boost. It's not a board intended for overclockers ;).

I cleared the IPMI logs before every new "attempt", and yet the log was full again 5 min after starting Memtest86 (although nothing was reported in Memtest86 itself, just in IPMI/BMC).

Processor re-seating is another one I could try, that's true :).
 

GiGaBiTe

I am tempted to also upgrade my Desktop, that way I can test these RAM DIMMs to see if they are the faulty ones. Some new NVIDIA GPUs (RTX3000) apparently had issues with Seasonic, although I am stuck on GTX 1000 for now ;). Would you say the Antec Signature Platinum 1000 compare well with Seasonic Prime PX 1000 ? Or should I rather go with Corsair HX1000? I heard some bad stuff with Corsair though, particularly about the *i* types (with this digital DSP control I believe), very picky about voltage fluctuations.
Both are cheaper than the Seasonic :).

For the small server I'd say I could get the following for approx. the same price (all of them 80+ gold):
- Seasonic CORE GC 500W
- Seasonic G12 GC Series 550W
- Cooler Master MWE Gold V2 550W
- Antec Neo Eco NE500G Zen 500W

Any of those are probably fine, bar the Corsair. I've had issues with their CX series mysteriously failing without even being loaded much.
 

luckylinux

Any of those are probably fine, bar the Corsair. I've had issues with their CX series mysteriously failing without even being loaded much.
Just got an Antec Signature Platinum 1300, since it was about the same price as the Antec Signature Platinum 1000 (for the Desktop) :).
 

luckylinux

Short update. Just tried a different set of MB+CPU+PSU and 4x new and different sticks of RAM (2 sticks DPASAYN001 / 2146 and 2 sticks HM00004A33 / 2141). Same instability and lots of Uncorrectable ECC errors reported by BMC/IPMI. Debian Linux crashes/panics and reboots approx. 30 seconds after the login window appears. The BIOS once again showed all memory detected.

Are we sure once again that it's not a RANK DIMM issue ? Although I cannot understand why single rank would be an issue, I thought that was always a quad rank vs dual rank thing, where quad rank DIMM would run into limitations.
 

GiGaBiTe

Are you installing all four sticks at once, or one at a time? You should try one at a time, and make sure the stick is in the correct slot.

According to pages 33/34 of the manual, you need to start with slot B2 first, which is the farthest slot from the CPU. The next would be A2, then B1 and A1 last.

Installing less than four sticks in the wrong slots can cause a no-POST scenario.
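
As a quick reference, the population order can be written down as a tiny Python helper. This is a minimal sketch that only encodes the order quoted from the manual in this thread; the function name is made up, and it says nothing about whether every DIMM count is an officially supported configuration.

```python
# Population order as quoted from the manual in this thread.
POPULATION_ORDER = ["B2", "A2", "B1", "A1"]


def slots_for(dimm_count: int) -> list:
    """Return the slots to fill, in order, for the given number of DIMMs."""
    if not 1 <= dimm_count <= len(POPULATION_ORDER):
        raise ValueError("this board has between 1 and 4 DIMM slots to fill")
    return POPULATION_ORDER[:dimm_count]


for count in (1, 2, 4):
    print(count, slots_for(count))
```

So a single-stick test would use B2 only, and a two-stick test would use B2 and A2.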

Your 16 GB sticks could be unsupported because it's a single rank module. From the manual, the largest supported single rank stick is 8 GB. I don't know what your new different memory modules are because those P/Ns don't show up on a search.

But if you're still getting errors with a different CPU, the motherboard could have issues, like being warped and having broken socket-to-board BGA joints.
 

luckylinux

Your 16 GB sticks could be unsupported because it's a single rank module. From the manual, the largest supported single rank stick is 8 GB. I don't know what your new different memory modules are because those P/Ns don't show up on a search.
It's the same P/N as before; those were the batch numbers (just to say they are again from different batches, even though I ordered them at the same time from the same retailer).

I installed one batch in A1 and A2 (channel A) and the other batch in B1 and B2 (channel B). The channels should be independent, therefore that should work.

I am really thinking it's a RANK issue for the DIMM, but then why does it get detected at all in the BIOS ??? It's detectable but not usable.

Hopefully I will receive Kingston KSM26ED8/16HD before this weekend, then I could try a dual RANK DIMM instead ...

Then I have to figure out how to send back the incompatible ones; I only have 7 days left for that.
 

luckylinux

But if you're still getting errors with a different CPU, the motherboard could have issues, like being warped and having broken socket-to-board BGA joints.
It's the same CPU model, but a different one. Same with the MB: same MB model, but a different one.
 

GiGaBiTe

I installed one batch in A1 and A2 (channel A) and the other batch in B1 and B2 (channel B). The channels should be independent, therefore that should work.

Not correct. It should be B2 and A2 for one pair and B1 and A1 for the other pair. Try only having one stick installed in B2, or the farthest slot from the CPU socket.

I am really thinking it's a RANK issue for the DIMM, but then why does it get detected at all in the BIOS ??? It's detectable but not usable.

Memory controllers still aren't smart enough to detect the fine nuances of memory stick specs. I don't think the JEDEC tables for timings include the memory module layout, so weird things happen when the controller expects one thing and the module is actually something else.

Then I have to figure out how to send back the incompatible ones; I only have 7 days left for that.

Just be honest and say they aren't working on your setup, and you don't have another machine to test them on.
 

luckylinux

Not correct. It should be B2 and A2 for one pair and B1 and A1 for the other pair. Try only having one stick installed in B2, or the farthest slot from the CPU socket.
Weird, I had the feeling it was channel A/B/C/D (at least on dual socket motherboards), and "1" and "2" were the number of the DIMM within that channel. Are you sure the "logic" is reversed on single socket boards ?

Memory controllers still aren't smart enough to detect the fine nuances of memory stick specs. I don't think the JEDEC tables for timings include the memory module layout, so weird things happen when the controller expects one thing and the module is actually something else.
No standard regulating interoperability? Damn it ...

Just be honest and say they aren't working on your setup, and you don't have another machine to test them on.
Here in Europe you get a 14-day no-questions-asked right of return. I'll have to return them in 3 separate packages, since that's how they arrived. Whatever, 20 EUR total for the returns, not the end of the world. But I am in the middle of some building renovations too right now :(.
 

GiGaBiTe

Weird, I had the feeling it was channel A/B/C/D (at least on dual socket motherboards), and "1" and "2" were the number of the DIMM within that channel. Are you sure the "logic" is reversed on single socket boards ?

There is no reversed logic, it's how it has always been on motherboards with multiple memory channels.

For dual channel it has almost always been either:

A | B | A | B
B | A | B | A

It is rarely ever:

A | A | B | B
B | B | A | A

Even back in the dark ages when we had ganged and unganged memory, the ganged slots were every other slot, not grouped together.

Even though the slots are labeled as A1, A2, B1, B2, all A or all B doesn't make dual channel.
 

luckylinux

There is no reversed logic, it's how it has always been on motherboards with multiple memory channels.

For dual channel it has almost always been either:

A | B | A | B
B | A | B | A

It is rarely ever:

A | A | B | B
B | B | A | A

Even back in the dark ages when we had ganged and unganged memory, the ganged slots were every other slot, not grouped together.

Even though the slots are labeled as A1, A2, B1, B2, all A or all B doesn't make dual channel.
I think we are not talking about the same thing.

For me "A" is one channel and "B" is another channel. They are controlled by independent memory controllers. In order to have a "dual channel" RAM you need to populate both B2 and A2, we agree on that.

What I meant is that when mixing memory, all DIMMs belonging to one specific channel (say B2 and B1) must be of the same type/frequency/voltage/timings/etc. You can (not that you should, but the memory controller should be able to manage) install a pair of DIMMs in Channel A (A2 and A1) that is different from what is installed in Channel B (B2 and B1). One should not however have different DIMMs installed within the same channel, so if DIMM B2 is different than DIMM B1, problems are expected to arise.
 

Nobu

From the quick ref guide:
Screenshot_20220405-074003.jpg


B2, A2, then B1, A1
 

GiGaBiTe

What I meant is that when mixing memory, all DIMMs belonging to one specific channel (say B2 and B1) must be of the same type/frequency/voltage/timings/etc. You can (not that you should, but the memory controller should be able to manage) install a pair of DIMMs in Channel A (A2 and A1) that is different from what is installed in Channel B (B2 and B1). One should not however have different DIMMs installed within the same channel, so if DIMM B2 is different than DIMM B1, problems are expected to arise.

That's not correct, you're confusing yourself.

The DIMMs MUST match across channels. Matched sticks must be installed in B2 and A2, as well as B1 and A1. You don't put two of the same module in B1 and B2 and the other two of the same in A1 and A2.
 

luckylinux

Just tried with Kingston KSM26ED8/16HD. Very short test so far, but all memory detected and not getting random reboots yet. IPMI/BMC event log is also empty (not getting Uncorrectable ECC Errors in System Event Log). Seems like the DIMM RANK matters after all :(. Too bad because the Crucial was quite a bit cheaper. Oh well, cannot really argue about price, if it doesn't work at all ;).
 

GiGaBiTe

Good ole weird memory compatibility issues. Least you found some sticks that seem to work now.
 

luckylinux

Good ole weird memory compatibility issues. Least you found some sticks that seem to work now.
Yep. Unfortunately I think I missed the 14-day return deadline on 1 DIMM stick out of 8. Not sure if they could still honor it with a reduced refund or if I have to try to sell it on eBay :(.
 

GiGaBiTe

Unless it was a hideously expensive stick of RAM, probably good to keep in the spare parts pile. You'll just have to note what it is to remember about the rank issue.
 

luckylinux

Well, those are the only DDR4-based systems I have, and it doesn't work there. It was approx. 100 EUR.
 