EVGA GTX 980 Ti dropped dead after changing thermal paste

Dioxid3

n00b
Joined
May 17, 2022
Messages
5
Hey all!

I have had quite a joyride these past five days. Last Saturday I ran into an issue where my PC crashed when I was booting up more intensive games. No biggie, maybe a temp issue, right? Well, I fired up HWiNFO64 monitoring and logged data while running FurMark, to check whether it was a temperature issue. "Performance Limit - Thermal [Yes/No]" didn't hit Yes even once. 'Huh, that's odd.' Then I started wondering, maybe it could be the PSU giving up? % of TDP showed no more than 40.8%, so that wasn't a dead giveaway either.

I went and got another PSU, and ran FurMark to see whether we'd crash again. We did. I then limited power to the card using EVGA's software. 60% worked fine, 70% worked fine, but 80% was a show stopper. Crashed again. This time I decided that 'OK, let's change the paste as it hasn't been changed ever, should be simple and this does look a lot like crashing from high temp, maybe the sensors just didn't read well or something.'

Well, it was, and it wasn't. Taking apart the card was hilariously easy - 4 screws held the heatsink to the GPU, along with the fan controller and the power for the LED strip on the side of the card. I took off the faceplate that held the heatpads on the VRAM and some other components as well, and cleaned off any dust stringing along. The paste that was applied at the factory was very messy, which surprised me quite a bit. This was a card that I bought off a friend, who got it under warranty in exchange for his old 980 Ti that went full bonkers. This card was in plastic wraps when I got it, so I know nobody should've changed the paste on it before, unless it was a refurbished one.

Anyways, I cleaned off the paste and reapplied it, and slapped it all together. Except now the PC wouldn't start at all with the card attached. When the 980 ti is connected, you can hear the PSU's protections firing off, cutting the power. This marks the beginning of a saga barely longer than the LOTR: Director's Edition.

I have so far:
  1. Meticulously cleaned every nook and cranny, avoiding too much mechanical disturbance and using only chemicals designed for cleaning electronics.
  2. Checked that I did not accidentally bump off or dislodge any components.
  3. Measured the living hell out of the card. Now, I'm not too experienced in electronics and I am sure that my measuring is at best questionable, but after watching countless videos about multimetering a GPU, I couldn't find a single issue. Not a blown fuse, nothing. Measured resistance through the GPU core, and it also seems to be alive and well.
  4. Plugged the card back in and out only to have the same result.
  5. PC is currently running on another GPU just fine, so we can be 100% sure it's something between the PCI-e connectors and ATX/PEG power cables.
The simplest explanation would be that I've mismeasured something on the board and it is indeed dead. I've pestered about 3 different groups of PC builders and one group of electronics hobbyists, but I have now arrived at a dead end. I've tried to look up any kind of diagram or reference that would point me to something I could have missed in measuring, but couldn't find anything. The only outlier I could find was between ground and the short comb of the PCIe x16 connector, which showed something like 05.0 ohms of resistance, but getting the reading was very hard, as it either jumped over the limit, showed very high resistance, or showed 05.0 ohms.

I don't know what to do anymore. I've spent so much time on this, it's not really even "I don't want to buy a new one" as much as "I really want to figure out what the hell has happened", and to learn from it. Any advice is more than welcome!
 

RazorWind

Supreme [H]ardness
Joined
Feb 11, 2001
Messages
4,346
First, what exact model of 980 Ti is this? The "SC" reference board version?

Edit: Can you post a photo of it, and mark where you measured the 5.0 ohms that seems to be abnormal?
 

chameleoneel

Supreme [H]ardness
Joined
Aug 15, 2005
Messages
5,999
what thermal paste did you use?
 

mnewxcv

[H]F Junkie
Joined
Mar 4, 2007
Messages
8,815
Multimeter to see if 12v pins are shorted to ground. If so, remove heatsink and retest.
 

Dioxid3

n00b
Joined
May 17, 2022
Messages
5
Hi all, sorry for taking so long to reply, was a busy weekend and not home, so couldn't answer to anything really.

RazorWind :
First, what exact model of 980 Ti is this? The "SC" reference board version? Can you post a photo of it, and mark where you measured the 5.0 ohms that seems to be abnormal?

It is an EVGA GeForce GTX 980 Ti SC GAMING ACX 2.0+, Part Number: 06G-P4-4993-KR. I've included a video of all the power-supplying pins on the GPU as well as the part I mentioned.

chameleoneel :
what thermal paste did you use?
"Phobya HeGrease Extreme". No idea where it is from, it's some older paste I had lying around. Definitely not top-brand. If you are getting at the idea that it might conduct electricity, I highly doubt this was the case. Could be, but if it did indeed conduct electricity, we would probably have some kind of proof of it on the board. Not ruling it out, but this option has been entertained time and time again.

mnewxcv :
Multimeter to see if 12v pins are shorted to ground. If so, remove heatsink and retest.

This is something I actually didn't try out. I just plugged it in and filmed a video of it.

Sorry a longish video, feel free to speed it up and skip. Also holy crap I need a new webcam, the resolution is atrocious ahaha

EDIT: I just re-confirmed that screwing down the heatsink on the card brings the resistance down on the PCI-e bus/whatever. Unscrewed, with the fan plugged in, the resistance goes over the limit on the 200 setting. Unplugging the fan makes no difference. Earlier, without the heatsink, I was using the 2000 ohm setting when I got 05.0, so it wasn't in fact "5 ohms".
 

Dioxid3

n00b
Joined
May 17, 2022
Messages
5
When you take resistance measurements on the slot connector, which pin are you actually measuring?
https://pinoutguide.com/Slots/pci_express_pinout.shtml

The side shown in the video is side A, so the pins you should be measuring are A2 and A3.
I didn't actually even think about this. That most likely also explains why my readings differed so wildly. Both A2 & A3 measure about 12.0 on the 20k ohm setting, with the heatsink on.
 

RazorWind

Supreme [H]ardness
Joined
Feb 11, 2001
Messages
4,346
Yeah, the pin matters. A lot. As for the reading, do you have a "K" on the display? So that's 12K ohms? If so, that's a normal reading.

Try this: take a reading from each of the pads that I circled here. When the reading stabilizes, you should have 15K-20K on each one. Let us know what you actually have. If you find that you have 0K on one of them, set the meter to the lowest setting, take another measurement on that pin, and report that as well. It'll probably be something like 2.5 ohms. Note no "K" - that's single digit ohms.
DSCF5796.jpg
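
As a rough aid for reading the numbers, the rule of thumb above can be written down as a small Python sketch. The function name `classify_pad_reading` and the 10 ohm cutoff are assumptions made for illustration; only the 15K-20K expected range and the "single digit ohms" short come from the post. It is meant for the 12V input pads discussed here, not for rails that normally read low.

```python
from typing import Optional

# Expected range on the circled 12V pads, per the post above (15K-20K ohms).
EXPECTED_MIN_OHMS = 15_000.0
EXPECTED_MAX_OHMS = 20_000.0
# Assumed cutoff for "single digit ohms"; this exact value is not from the thread.
SHORT_CUTOFF_OHMS = 10.0


def classify_pad_reading(ohms: Optional[float]) -> str:
    """Classify a pad-to-ground resistance; None means the meter showed OL."""
    if ohms is None:
        return "over range: switch to a higher range and measure again"
    if ohms < SHORT_CUTOFF_OHMS:
        return "likely short to ground"
    if EXPECTED_MIN_OHMS <= ohms <= EXPECTED_MAX_OHMS:
        return "in the expected range"
    return "outside the expected range: re-measure before concluding anything"


# Example readings: a 2.5 ohm pad, a 17K pad, and an OL reading.
for reading in (2.5, 17_000.0, None):
    print(reading, "->", classify_pad_reading(reading))
```

On a manual-range meter, the displayed number has to be converted to ohms by hand for the selected range before it is passed in.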
 

Dioxid3

n00b
Joined
May 17, 2022
Messages
5
I tried to consult the manual of the multimeter, but it doesn't mention anything about marking the thousands with K. It is a manual-range one, and nothing on it indicates such markings either. On the lowest setting of 200 ohms it goes OL, so I assume we are good with the readings on A2 and A3.

Regarding the pads you mention
  1. Resistance between the red and ground = OL, can’t seem to get a stable reading
  2. Blue & GND = 3.5-10 ohms. Mixed readings, tried repositioning the second probe. Significantly less than the red one either way.
I'm wondering whether it's me or if the meter is just shit, the reading differs depending on the moon and stars lmao

I’m also up for chat if this back and forth seems too hopeless 😂 Discord @ Dioxide#2133
 

Attachments: image.jpg (390.6 KB), image.jpg (535.6 KB)

RazorWind

Supreme [H]ardness
Joined
Feb 11, 2001
Messages
4,346
That meter clearly isn't ideal for this task - I use a Voltcraft VC850 which features auto ranging among other things, but a bench type meter is really what you'd want for this. The bench type ones are an order of magnitude more expensive, though.

Anyway, your readings confirm what I suspected, which is that you have a short to ground, most likely through the GPU, on one of the core or memory power phases. What this means is that the 12V rail of the power supply is presently wired up directly to the GPU core or memory. The good news is that the GPU usually survives this, and will work if you can fix the short. The bad news is that fixing the short properly is theoretically trivial, but pretty difficult in practical terms.

The next step is to figure out which components are shorted. You could try current injection with 1.0V, but I'd guess that the short is complete enough that you wouldn't find anything conclusive. Another option is to guess, and start removing components until the short is cleared. I would start with the power stage closest to the PCI-E slot connector, and work my way up to the center. This board is notorious for having the closest one to the connector fail, but I've heard of other ones in that bank occasionally failing as well. Because the short is on the 8 pin connector, we know that it's one of the bottom four phases, and not the top four, which are connected to the six pin connector.

Note that in order to remove a power stage, you need a PCB preheater, a hot air rework station, and some flux. If you try to use a heat gun, you will almost certainly damage the board.
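
To get a feel for the 1.0V injection mentioned above, here is a back-of-the-envelope Ohm's law sketch in Python. It assumes the short behaves like a plain resistor somewhere in the 3.5-10 ohm range reported for the blue pad, which is a shaky assumption given how unstable those readings were. The function name `injection_estimate` is made up for this example.

```python
from typing import Tuple


def injection_estimate(volts: float, short_ohms: float) -> Tuple[float, float]:
    """Return (current in amps, power in watts) for an ideal resistive short."""
    if short_ohms <= 0:
        raise ValueError("short_ohms must be positive")
    amps = volts / short_ohms            # Ohm's law: I = V / R
    watts = volts * volts / short_ohms   # dissipated power: P = V^2 / R
    return amps, watts


# 1.0 V is the injection voltage suggested in the thread; the resistances are
# the two ends of the 3.5-10 ohm range reported for the blue pad.
for ohms in (3.5, 10.0):
    amps, watts = injection_estimate(1.0, ohms)
    print(f"{ohms} ohm short: {amps:.2f} A, {watts:.2f} W")
```

If the real short is much lower than what this meter showed, the current at 1.0V rises in proportion, which is why a current-limited supply matters for this kind of test.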
 

Dioxid3

n00b
Joined
May 17, 2022
Messages
5
Agh I accidentally closed the window without saving a draft and kinda rage quit.

mnewxcv, your suggestion didn't make any difference, sadly :(

I will find a local electronics hobby club and ask around whether someone has experience on this level of stuff, and if we could inject the 1.0V and as little current as possible, as RazorWind suggested. Should I inject on the same pin I measured? I found the drMOS in question, I think from one of your videos. If I got it right, there are 8 power stages (or phases, dunno if they are the same thing?), and that they are comprised of multiple components, but I suspect it's the drMOS as I've tested the rest and get anything between 3.4 to 12 ohms depending on the component, and there are no outliers. But, as you've said, this meter isn't really for this so I'm not ruling out anything, but will start troubleshooting the power stages from bottom to top. Will probably have to wait until my summer holiday, so will update then :)
 