Nvidia 4090 meltdown maybe an answer

Zarathustra[H]

Extremely [H]
Joined
Oct 29, 2000
Messages
35,455
https://videocardz.com/newz/nvidia-...manufacturers-of-12vhpwr-power-adapter-cables

TLDR;
There are two manufacturers of the 4090 power adapter cables, and one of them didn't do a great job on quality control and design.

I guessed poor crimps on the connector from the very start. That's the most common way for this stuff to happen.

When you have poor crimps, you have less contact area, which raises contact resistance and results in resistive heating at high amps. The same thing happened with my 8 pin motherboard 12V extension cable a while back.
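To put rough numbers on the resistive heating point, here is a minimal Python sketch using P = I^2 * R. Everything in it is an illustrative assumption rather than a measurement: the 600W load, the six 12V pins sharing it evenly, and both contact resistance values are made up for the comparison, and `contact_heat_watts` is just a hypothetical helper name.

```python
def contact_heat_watts(current_amps: float, contact_resistance_ohms: float) -> float:
    """Heat dissipated in a single contact, from P = I^2 * R."""
    return current_amps ** 2 * contact_resistance_ohms


# Illustrative assumptions, not measurements:
total_watts = 600.0   # assumed load carried by the connector
rail_volts = 12.0
power_pins = 6        # assumed number of 12V pins sharing the current evenly
per_pin_amps = total_watts / rail_volts / power_pins

good_contact_ohms = 0.005  # assumed resistance of a healthy crimp/contact
poor_contact_ohms = 0.050  # assumed resistance of a bad crimp (10x worse)

good_heat = contact_heat_watts(per_pin_amps, good_contact_ohms)
poor_heat = contact_heat_watts(per_pin_amps, poor_contact_ohms)

print(f"per-pin current: {per_pin_amps:.2f} A")
print(f"heat in a good contact: {good_heat:.2f} W")
print(f"heat in a poor contact: {poor_heat:.2f} W")
```

The point is only the scaling: at a fixed current, heat in the contact grows linearly with its resistance, and all of it is dumped into a tiny piece of metal inside a plastic housing.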

1668551392513.png


1668551419445.png



Bad extensions and adapters are nothing new, which is why I generally don't use power cables of any kind that don't come with my PSU for high amp applications (motherboard 12v and PCIe power) anymore. I learned the hard way.


Through some miracle of fortune, after swapping the power supply and cable, this system continued to work just fine as my main desktop until I upgraded a few years ago.

The motherboard and CPU (an X79 with a Core i7-3930K in it) are still in use as my backup / testbench machine.
 

MavericK

Zero Cool
Joined
Sep 2, 2004
Messages
31,592

Ebernanut

[H]ard|Gawd
Joined
Dec 15, 2010
Messages
1,695
It looks like the one side isn't plugged in all the way.
 

Slade

2[H]4U
Joined
Jun 9, 2004
Messages
2,924
It seems like a plausible cause. It would explain a lot of the issues as all the evidence points to a resistance overheat condition. I'd be curious to get all those burnt up samples and tear them apart to confirm that theory.
 

Slade

2[H]4U
Joined
Jun 9, 2004
Messages
2,924
Oh look someone got a bunch of samples and examined them closely... what a coincidence...
 

ChadD

Supreme [H]ardness
Joined
Feb 8, 2016
Messages
5,903
"Now regardless of what actually formed that connection, we know that something did. Because the x-ray shows it did and so does the ah... part where there was fire." - Tech Jesus.
 

DPI

[H]F Junkie
Joined
Apr 20, 2013
Messages
12,592
TLDW "Plug the connector in all the way".

Great video. IgorsLab also slapped-then-backhanded, and then slapped again.
 

Decko87

2[H]4U
Joined
Sep 23, 2007
Messages
2,169
TLDW "Plug the connector in all the way".

Great video. IgorsLab also slapped-then-backhanded, and then slapped again.
Yeah, that's not really true though. Steve makes it very clear that the fact that there is such a high potential for user error means it's a flawed design as well. He thinks it's safe to use, but it shouldn't be so easy for user error to occur.
 

Damar

Supreme [H]ardness
Joined
Jun 20, 2004
Messages
4,610
Just makes me want to stick to my plan of keeping my EVGA 3080 FTW for the time being, and maybe grabbing a spare one while there's still NIB stock around at decent prices.
 

MavericK

Zero Cool
Joined
Sep 2, 2004
Messages
31,592
Yeah, that's not really true though. Steve makes it very clear that the fact that there is such a high potential for user error means it's a flawed design as well. He thinks it's safe to use, but it shouldn't be so easy for user error to occur.
He also points out repeatedly that manufacturing errors causing debris or otherwise imperfect internal connectivity could also cause issues; they just weren't able to validate that themselves.

It's hard for me to believe that 100% of these issues are caused by people having the connector halfway unplugged *in addition to* bending at a weird angle.
 

DPI

[H]F Junkie
Joined
Apr 20, 2013
Messages
12,592
Yeah, that's not really true though. Steve makes it very clear that the fact that there is such a high potential for user error means it's a flawed design as well. He thinks it's safe to use, but it shouldn't be so easy for user error to occur.

What's untrue? I'd agree the connector seems like it could be revised to be more foolproof, but "Plug the connector in all the way" wasn't attempting to dispute or minimize that.

And that was just my takeaway from the video, not the entire issue. I'm also not assuming the entire issue is put to bed by this one video, or that the lack of a tight connection is the sole reason for every legit case of a melted connector.
 

ChadD

Supreme [H]ardness
Joined
Feb 8, 2016
Messages
5,903
Yeah, that's not really true though. Steve makes it very clear that the fact that there is such a high potential for user error means it's a flawed design as well. He thinks it's safe to use, but it shouldn't be so easy for user error to occur.
Yes, it's a combo of user error... bad design... and some shit manufacturing. The ones that melted have not all melted for the exact same reasons. If the user error is very likely to happen even to people with years of experience, it is also a design error. That these don't have a good secure click is a big fail. That some of them have debris in the connectors is a big fail. That some of the connectors on the GPU side are having zinc plating come off is a pretty big manufacturing fail.

Having worked in industrial supplies for many years, I have a lot of experience getting things plated in small runs... I know how easy it is to screw a batch up. With tight-tolerance fasteners, it is important to use a good batch numbering system so you can test reasonable amounts of product. With plating, ideally you should be running some material testing on batches every X number of parts.

To be fair, I would assume the zinc plating on everyone's GPUs and power connectors is about the same... I mean, oxidation should be an issue that doesn't pop up for years in electronic parts, which isn't a problem for stuff that is obsoleted faster than that. In this case, though, we are dealing with a ton of power on a very small connector... exposed copper with that much power going through it. I can see oxidation actually being an issue... beyond the issues with having loose bits of zinc floating in the connector.
 

DPI

[H]F Junkie
Joined
Apr 20, 2013
Messages
12,592
No, I guessed at it, did I get close?

Heh, it wasn't quite as cut-and-dry as "Nvidia fucked up". But even if it was, the idea of a public statement of anything being wrong, of anything negative, or calling more attention to it by acknowledging it at all, isn't really the era we live in. Everything in corpo communication is run through cost-benefit analysis, legal dept., etc. The only time a "public statement" happens is when they've carefully calculated that it would be more expensive not to issue one. "Silence is the hardest argument to refute".

Likeliest is Nvidia will work with PCI-SIG on a possible revised, more foolproof connector/cable design, and may even put up a page where you can request a revised cable for FE cards. Their AIB partners will each probably do their own least-effort thing - either put up a page where customers can request a revised cable, or - if I know Taiwanese AIBs - transfer the burden onto customers by making them request a revised cable through RMA'ing the original.

So Youtubers, tech bloggers and Reddit kids will continue to flail that the house is on fire "WHY iznt Nvidia recalling EVRY 4090", while Nvidia & partners will - at best - revise quietly.
 

Zarathustra[H]

Extremely [H]
Joined
Oct 29, 2000
Messages
35,455

The max you can draw on the good old 4 pin peripheral connector (erroneously referred to by many, myself included, as "Molex") is 5 amps, so that means 60W at 12V or 25W at 5V, but that's only if nothing else is connected to the rest of the strand and it goes straight from the GPU to the PSU.

So, for a modern 450W GPU, assuming 75W of power comes from the PCIe slot, you'd need 7 of these to get enough power, with each of them going straight back to the PSU with nothing else connected to the strand :p
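The arithmetic behind that count can be sketched in a few lines of Python. The 5A, 12V, 450W and 75W figures are the ones from this post; the function name `connectors_needed` is just a hypothetical helper for illustration.

```python
import math


def connectors_needed(gpu_watts: float, slot_watts: float,
                      amps_per_connector: float, volts: float) -> int:
    """Whole connectors required to cover what the PCIe slot does not supply."""
    watts_per_connector = amps_per_connector * volts
    remaining_watts = max(gpu_watts - slot_watts, 0.0)
    return math.ceil(remaining_watts / watts_per_connector)


# Figures taken from the post above.
watts_at_12v = 5 * 12   # 60W per connector on the 12V pin
watts_at_5v = 5 * 5     # 25W per connector on the 5V pin

count = connectors_needed(gpu_watts=450, slot_watts=75,
                          amps_per_connector=5, volts=12)
print(f"{watts_at_12v}W at 12V, {watts_at_5v}W at 5V per connector")
print(f"connectors needed for a 450W GPU: {count}")  # 375 / 60 = 6.25, rounded up to 7
```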

I think the existing 6pin and 8pin PCIe power connectors worked just fine. No need to replace them.

Actually, if I had my druthers, we'd just use a universal power connector for all things high powered. (Mainly video cards and motherboard additional power)

Just make everything the same, like the 8pin EPS 12V motherboard connectors, able to carry 336W each. Two of those suckers ought to be enough for pretty much anything, and they are sturdy as all hell (as long as you don't let some shitty Chinese company crimp the connectors) and there would be no need to mess around with different cable types.
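For what it's worth, the 336W figure is consistent with four 12V pins carrying 7A each. That 7A per-pin value is an assumption about where the number comes from, not something stated in this post. A quick sketch of the headroom two such connectors would give a 450W card:

```python
EPS_12V_PINS = 4           # an 8 pin EPS connector has four 12V pins and four grounds
ASSUMED_AMPS_PER_PIN = 7   # assumption: per-pin rating that reproduces the 336W figure
RAIL_VOLTS = 12

eps_watts = EPS_12V_PINS * ASSUMED_AMPS_PER_PIN * RAIL_VOLTS
two_connector_watts = 2 * eps_watts

gpu_watts = 450            # the "modern 450W GPU" from the post
headroom_watts = two_connector_watts - gpu_watts

print(f"one EPS connector:  {eps_watts}W")
print(f"two EPS connectors: {two_connector_watts}W")
print(f"headroom over a {gpu_watts}W GPU, ignoring slot power: {headroom_watts}W")
```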

Discontinue the PCIe power standard, and this new 12pin standard and just use 8pin EPS for everything!
 

Zarathustra[H]

Extremely [H]
Joined
Oct 29, 2000
Messages
35,455
Heh, it wasn't quite as cut-and-dry as "Nvidia fucked up".

Well, if you are a manufacturer, you are responsible for your entire supply chain.

If your supplier fucks up, and you don't catch it, that means you fucked up.

In the end, you deliver a product. If it's not good, that's on you.

That said, I imagine Nvidia hasn't done a lot of buying and / or selling of these cables, but that it has been up to the AIBs.

So I'd guess that the AIBs fucked up by:

1.) Not better vetting their suppliers; and
2.) Having inadequate incoming inspection sampling

This should never have happened in the first place, had they properly vetted their supplier, and even if it did, their incoming inspection should have caught it, and everything should have gone on hold until resolved.
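To illustrate why the size of the incoming inspection sample matters, here is a rough Python sketch. It assumes defects are spread randomly and independently through a lot, so the chance that a sample of n units contains at least one bad one is 1 - (1 - p)^n. The 1% defect rate and the sample sizes are hypothetical, not numbers from any AIB or supplier, and `detection_probability` is a made-up helper name.

```python
def detection_probability(defect_rate: float, sample_size: int) -> float:
    """Chance that a random sample contains at least one defective unit,
    assuming independent defects at the given rate."""
    return 1.0 - (1.0 - defect_rate) ** sample_size


hypothetical_defect_rate = 0.01  # assumption: 1 in 100 cables has a bad crimp

for sample_size in (5, 20, 80, 300):
    p = detection_probability(hypothetical_defect_rate, sample_size)
    print(f"sample of {sample_size:>3}: {p:.1%} chance of catching at least one defect")
```

Under these assumptions, a handful of samples per lot will usually miss a low-rate defect entirely, which is one way a problem like this slips past incoming inspection.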

There are no excuses for shit like this going out to the field.

Granted, I work with medical products, and we have a teensy bit stricter quality standards than consumer electronics, but still. The buck stops with the brand/integrator/final manufacturer.
 

Accursed

Gawd
Joined
Mar 28, 2008
Messages
548
Heh, it wasn't quite as cut-and-dry as "Nvidia fucked up". But even if it was, the idea of a public statement of anything being wrong, of anything negative, or calling more attention to it by acknowledging it at all, isn't really the era we live in. Everything in corpo communication is run through cost-benefit analysis, legal dept., etc. The only time a "public statement" happens is when they've carefully calculated that it would be more expensive not to issue one. "Silence is the hardest argument to refute".
I would still expect them to have SOME statement eventually. "Hey guys, be extra careful when connecting everything. Many of the problems we're seeing are caused by a few avoidable mistakes". But I get what you're saying, saying nothing is probably the safer position for them.
 