Folks! I've got another card on the bench today, and that's this MSI 980 Ti Golden Edition. This card came to me as part of a lot of dead graphics cards, and it's the least promising among the bunch, so we're looking at it first.
I made a video for those who prefer such things. As an aside, if the mods don't approve of spamming my own videos here, I hope they'll reach out and warn me before I get banned. I couldn't find a list of who the actual video card forum mods are. There used to be one, back in the before time...
Anyway, on to the card. This card is mainly interesting because it comes with this really neat all copper heatsink, which is sort of unusual. From the outside, the card looks pretty clean, and there's no obvious damage visible to explain why it doesn't work. I always liked this generation of MSI's cards - I think they're nice looking. In case it doesn't come across in the photo, the shroud protecting the fans is also copper colored, unlike the red shroud you get on the regular versions.
What the seller told me is that the system doesn't even power up with this card plugged in. That is a sure sign that we have a short through our 12V power plane. Before we attempt to test the card, we should check on this, as the potential exists to damage the card or some other part of the test system if it's the case.
We can do this without disassembling the card by testing the pins at the connector. I really prefer to use the solder joints, but this card has this handsome backplate that's in the way.

Resistance on the far connector... The 290X from our last thread is on the bench here to remind me which pins are ground and which are not.

It's interesting that it's pretty dead on 750 ohms. That's unusually low for a 12V rail, but I wonder if there's some sort of current monitoring shenanigans going on that requires it. Nvidia loves to prescribe that kind of thing, whereas AMD seems to like to just over-build the card and let it run really hot if it needs to.
Resistance on the near connector. As I suspected, basically a dead short.
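For a sense of scale, here's a quick Ohm's law sketch in Python of what those two readings would mean with 12V applied. The function name is made up for illustration, and it assumes the multimeter's low-voltage reading still holds at 12V, which is not guaranteed on a real board. The "dead short" is modeled as an ideal 0 ohms, which is also an assumption.

```python
def current_at(voltage_v, resistance_ohms):
    """Ohm's law, I = V / R. Returns amps; infinite for an ideal short."""
    if resistance_ohms <= 0:
        return float("inf")
    return voltage_v / resistance_ohms


RAIL_V = 12.0

# Far connector: the 750 ohm reading from above.
far = current_at(RAIL_V, 750.0)
print(f"far connector: {far * 1000:.0f} mA")

# Near connector: "basically a dead short", idealized here as 0 ohms.
near = current_at(RAIL_V, 0.0)
print(f"near connector: {near} A")
```

In practice the current into a short is capped by whatever the power supply can deliver before its protection trips, which fits with the system refusing to power up at all.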

We can't reach any of the solder joints to diagnose what failed with the heatsink on, so we'll need to remove it.

With the card disassembled, it's pretty obvious that we've got some serious damage going on.

Here's a couple of close-ups of it. The PCB got so hot in this area that it burned the coating off, and also melted the solder holding that dual MOSFET package on. See how the pins are bridged?


What we're seeing here is a textbook example of the high-side MOSFET failing and shunting 12V into somewhere it doesn't belong. I bet this smelled amazing.
The component that looks the most suspicious is a SinoPower SM7320 dual N-type MOSFET package. This isn't really a power stage, but rather just a high side and low side MOSFET together in one little package. We're clearly going to need to remove it, but let's take some resistance measurements first.
The Vcore rail. This isn't out of the question as a sane resistance for a GPU this size, but it is a little bit low.

The memory rail. Looks... Sane, at least.
Edit: Shoot, I lost the image from this one. It's about 65 ohms.
Probably 3.3V from the slot connector. Important thing is we don't have a short.
Edit: You'll have to take my word for it that this was 365 ohms on the nose. Voltage monitoring of some sort?
This card doesn't have the great big memory controller rail like AMD cards of similar vintage do, so that's pretty much all the measuring we need to do for now. Our memory rail looks like it's probably undamaged, and given that the failed MOSFET appears to be part of the Vcore VRM (that is, the biggest one), it's probably a safe assumption that our damage is limited to that. As bad as the damage to the PCB looks, it is possible that the only actual short is inside that MOSFET package. Let's get the burned components off the board now, and see what happens with our short. Hot air station, go!!


After letting the board cool down a bit, we'll check our 12V resistance again. Presumably, we want to see something like 750 ohms, if they're doing the same current monitoring on this rail as they did on the other.

Unfortunately, our short is even worse now. With the MOSFET removed from the board, the next most likely suspect is the damaged PCB itself. I did some picking to see if it's just the outer layers touching, and ended up with this...

I also checked resistance on our removed SM7320. What we're looking at here is the resistance between the pad that gets 12V and the one that supplies the output to the GPU. With no voltage to the gate, we should see several million ohms here, but what we see instead is zero. This means we had 12V sent directly to the GPU circuit for some length of time, and if we assume the user tried to start the system back up, probably more than once. It also implies that the damage to the PCB probably happened later, as a result of this, and not the other way around.

At length, I was able to get the resistance through the board up to about 30 ohms. I suspect that if I got more aggressive, and Dremeled the top couple of layers in the damaged area, I could clear the short, but I'm doubtful that would be productive, given the likelihood of the core having taken 12V on the chin several times.
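To put the 30 ohm figure in perspective, here's a rough Python sketch of what that leftover path would do with 12V across it. The function name is hypothetical, and the big assumption is that the path stays at 30 ohms under load, which charred PCB material probably won't.

```python
def power_dissipated(voltage_v, resistance_ohms):
    """P = V^2 / R, in watts."""
    return voltage_v ** 2 / resistance_ohms


RAIL_V = 12.0
LEAK_OHMS = 30.0  # the best reading I got through the damaged board

current_a = RAIL_V / LEAK_OHMS
power_w = power_dissipated(RAIL_V, LEAK_OHMS)
print(f"{current_a:.1f} A, {power_w:.1f} W")
```

That works out to a bit under half an amp and several watts, all dumped into the damaged spot on the board, so 30 ohms is better than a dead short but still not something I'd want to power up.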

At this point, I think it's safe to conclude this card is beyond saving. That said, I've got a pretty awesome heatsink for it now, so I may revisit this card later if I can find a less totally destroyed card to use that heatsink, or any of its other parts on. I'm open to offers if anyone happens to have a dead MSI 980 Ti lying around.
I talked in the video about how I suspect this card died as a result of hamfisted overclocking coupled with, potentially, a weak power supply. This is basically the same mode of failure (at a grander scale) that we saw on the 690s last year, and I think a quirk of this card's design, where the gates appear to be driven with 12V, instead of 7V from another regulator, exacerbates the potential for the same thing to happen here. So, IMHO, if we've learned anything, it's that one should keep an eye on the health of one's power supply, as in some cases, relying on it to be healthy is the only thing keeping this from happening.