The infamous "Display Driver stopped responding and has recovered"


0xygenthief
01-26-2012, 03:09 PM
Considering the number of techs here I figured this would be a good place to both rant and seek assistance.

I am currently working with EVGA support to isolate the cause of my most recent issue with my new rig. I'll list my specs to set a baseline:

i7 940 (C0/C1 stepping)
ASUS Rampage III Gene mobo
Crucial M4 SSD for OS
Seagate 4 TB for storage
G.Skill 24 GB RAM, 6 DIMMs
2x EVGA GTX 480 SC GPUs
Thermaltake Toughpower 1200 W PSU

So on my initial build, after installing the OS, I was unable to boot into Windows once I installed my graphics driver. I reinstalled the OS and installed all the drivers, saving the display drivers for last, and was able to boot into Windows successfully. That is, until I enabled SLI. After enabling SLI I couldn't boot into Windows, so I went into Safe Mode, uninstalled the drivers and was again able to boot into Windows.

At this point I started troubleshooting my hardware and have been working with EVGA on the GPU front with little success. Long story short, all hardware components work fine in other systems by themselves.

An exhaustive search online shows that this is not an isolated issue. In fact, the only consistent factors are that systems running XP pre-SP3 are not affected and that it appears to happen more on systems running 64-bit OSes. Another correlation is that some GPUs seem to be mentioned more than others, though I am not sure whether that is simply because those cards or series are more popular or because they are more prone to this issue. In any case, the NVIDIA 400 series gets mentioned a whole lot more than others, including the 400M series. I can personally attest to this: my wife's laptop, which runs a 460M, gets this error, and one of my colleagues has an ASUS laptop, also with a 460M, that gets the same errors.

I say all that up front to help you all get out of the "it's your hardware" mindset. The fact that this is happening seemingly at random leads most to believe that this is a hardware issue. But the numbers don't lie. So many people having the same issue with similar hardware and OS configurations points to a driver, software or compatibility issue.

Digging deeper, I found references to Windows Timeout Detection and Recovery (TDR). Windows Vista and 7 have this new "feature" that keys off a timeout value to detect when the graphics hardware stops responding: when the GPU doesn't finish its work within a short amount of time (2 seconds by default), Windows resets the display driver, forcing a driver recovery. The intent was to improve the experience for gamers who might have GPU issues, so that they can troubleshoot their hardware instead of simply getting a BSOD.
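For anyone wondering how TDR decides when to reset the driver, here is a rough sketch in Python. It is a simplified model, not Microsoft's code. It assumes the defaults of the TDR registry values Microsoft documents under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers, as I understand them: TdrDelay (2 s), TdrLimitCount (5) and TdrLimitTime (60 s). The class and method names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed defaults of the documented TDR registry values (simplified model).
TDR_DELAY_S = 2.0        # TdrDelay: how long the GPU may stay busy before a timeout
TDR_LIMIT_COUNT = 5      # TdrLimitCount: recoveries tolerated inside the window
TDR_LIMIT_TIME_S = 60.0  # TdrLimitTime: length of that window, in seconds


@dataclass
class TdrWatchdog:
    """Hypothetical, simplified model of Windows Timeout Detection and Recovery."""
    delay_s: float = TDR_DELAY_S
    limit_count: int = TDR_LIMIT_COUNT
    limit_time_s: float = TDR_LIMIT_TIME_S
    recoveries: List[float] = field(default_factory=list)

    def check(self, now: float, gpu_busy_s: float) -> str:
        """Return 'ok', 'recovered' (driver reset) or 'bugcheck' (BSOD)."""
        if gpu_busy_s <= self.delay_s:
            return "ok"  # GPU answered in time, nothing happens
        # Timeout: the display driver is reset ("stopped responding and has
        # recovered"). Forget recoveries that fall outside the window.
        self.recoveries = [t for t in self.recoveries
                           if now - t < self.limit_time_s]
        self.recoveries.append(now)
        if len(self.recoveries) > self.limit_count:
            return "bugcheck"  # too many resets in a short time
        return "recovered"
```

The model also shows why a handful of these errors in quick succession can end in a BSOD instead of yet another recovery.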

The problem is that my issues, like most everyone else's, occur outside of gaming. In fact, my system runs great in Skyrim and in benchmarks like Kombustor or OC Scanner. I get the "driver recovered" error when doing things like surfing the net in Internet Explorer or opening iTunes.

A few people have linked this to the different voltage/clock speeds our graphics cards run at in idle or 2D mode. Doing my own research, I found that the 400 series cards have 4 power and clock states, while the newer 500 series has only 3. The common modes are idle, 2D and 3D; some have suggested the extra mode on the 400 series is a "3D light" mode. In any case, the problem I am experiencing occurs in either the idle or 2D mode, or possibly in the transition between the two.

Again, the EVGA folks think it is a hardware issue, but if that were the case I wouldn't be stable while running games, right? Then again, I've read that some folks RMA their gear and the replacement fixes the problem, while others say it had no effect. Some say their issues come and go with the rise and fall of temps, and others say they can avoid the errors by underclocking their gear (not undervolting), which again points to hardware as the likely culprit.

So, to troubleshoot this, I am considering flashing my vBIOS with a modified ROM that slightly bumps up the voltage for the idle and 2D states. Since underclocking seems to increase stability, wouldn't that suggest that raising the voltage might get the same result?
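The reasoning that a lower clock and a higher voltage should both help can be made a bit more concrete with two textbook CMOS approximations (general rules, nothing specific to the GTX 480): dynamic power grows with the square of the voltage, and the highest clock a chip runs stably rises with voltage (Sakurai's alpha-power law, with threshold voltage V_th, switching activity κ, and an exponent α between roughly 1 and 2).

```latex
P_{\text{dyn}} \approx \kappa\, C\, V^{2} f,
\qquad
f_{\max}(V) \propto \frac{(V - V_{th})^{\alpha}}{V},
\qquad
\text{stable} \iff f < f_{\max}(V)
```

Under these assumptions, raising V at a fixed clock and lowering f at a fixed voltage both widen the gap between f and f_max. The cost of the first option is extra power (and heat) that scales with V², which is presumably why the idle and 2D states run at low voltage in the first place.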

Has anyone attempted to flash their vBIOS before? I know it will void my warranty, but I figure it's worth a try...

Tiny
01-26-2012, 03:25 PM
If you flash the card's BIOS, you will void the warranty if it turns out to be hardware. I would get an EVGA tech to tell you to flash it first. Then, after they have it noted, you can flash it.

0xygenthief
01-26-2012, 03:28 PM
I forgot to add that my errors have dropped significantly in non-SLI mode since I changed the power management setting in my display driver from adaptive (power saving) to performance mode. In fact, if I am not in performance mode I cannot enable SLI at all: without it, my computer BSODs when I attempt to enable SLI. Definitely a good bit of info, as it points to power management as a potential cause, or at least part of the problem.

0xygenthief
01-26-2012, 03:29 PM
Quote (Tiny): "...I would get an EVGA tech to tell you to flash it first. Then, after they have it noted, you can flash it."

Yeah, I have been trying to convince the techs that a vBIOS flash is the answer, but it's been all level 1 support thus far and I haven't gotten responses from the same tech twice. I honestly think they will end up having me RMA the cards, since they will simply not want to go down the rabbit hole with me...

xmanrigger
01-26-2012, 04:23 PM
Oxygen, I think it may be one of your cards. I know you didn't want to hear that, but that is what usually happens when a high overclock fails.
I have run dual GTX 480s in SLI with Win7 on my P6X58D-E and never had one issue whatsoever.
Here is something to try. Download MSI Afterburner and remove one of the video cards. Apply a slight overclock to the card, say 750 MHz on the core, and bump the memory slightly. Then bump the voltage up about 3-4 steps. Run a 3D app or game and see if the same thing happens. Do this with both cards; they should easily be able to run at those speeds. If one fails, it is likely the culprit: it may be weak, or it may need to be taken apart and have its thermal paste (and maybe pads) reapplied.
I know you don't want to hear hardware, but that sounds to me like a weak card.
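The test xman describes (one card at a time, identical settings, same 3D load) is a one-variable-at-a-time isolation. A rough Python sketch of that logic follows. run_stress_test is a hypothetical stand-in for physically installing the card and running the game or benchmark, and the "everything failed, so suspect something shared" branch is a rule-of-thumb assumption, not something from the post.

```python
from typing import Callable, List


def isolate_faulty(components: List[str],
                   run_stress_test: Callable[[str], bool]) -> List[str]:
    """Test each component on its own and return the ones that failed.

    run_stress_test is hypothetical: it stands for "install only this
    card, apply the same mild overclock/voltage, run a 3D app" and
    returns True if the card stayed stable.
    """
    return [c for c in components if not run_stress_test(c)]


def interpret(components: List[str], failed: List[str]) -> str:
    """Turn the per-component results into a rough verdict."""
    if not failed:
        return "all passed: the fault is probably not in these parts alone"
    if len(failed) == len(components):
        # Rule of thumb (assumption): if every part fails the same way,
        # suspect something they share (board, PSU, driver, settings).
        return "all failed: suspect something they share"
    return "suspect: " + ", ".join(failed)


if __name__ == "__main__":
    cards = ["card 1", "card 2"]
    # Made-up outcome for illustration: card 2 is the unstable one.
    failed = isolate_faulty(cards, lambda card: card != "card 2")
    print(interpret(cards, failed))  # suspect: card 2
```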

When I am benching and an overclock on the VGA(s) fails, that is what usually happens. And when I go to reboot, the system freezes until I do a hard reset with the reset button or shut the machine off. Once I fire it up again, all is well until I hit an unstable clock.

Tiny
01-26-2012, 04:35 PM
Oh, and if they do RMA the card or cards, make sure to revert any BIOS flash you did. I would go with what xman said for testing; it tends to work a lot of the time.

Grnfinger
01-26-2012, 05:07 PM
Quote (0xygenthief): "...Has anyone attempted to flash their vBIOS before? I know it will void my warranty, but I figure it's worth a try..."

Sapphire released a BIOS for the 5970; it suffered from the 2D clock voltage being set too low and would crash.
I would back up the OEM BIOS and flash the cards. If you ever need warranty service, flash the fuckers back to stock.

0xygenthief
01-26-2012, 05:38 PM
Quote (xmanrigger): "...Remove one of the video cards. Apply a slight overclock to the card... Run a 3D app or game and see if the same thing happens. Do this with both cards... If one fails, it is likely the culprit..."

Interestingly enough, EVGA had suggested something similar, and I had already done this before making the post. I used Afterburner to set up an 800/1950 OC and ran it first on each card, then on both in SLI. Ran through the Kombustor benchmark with no issues. Fired up Skyrim with each card separately and with SLI, and again no issues. All this with ZERO voltage tweaks. Both cards are star performers.

So, again, this isn't a hardware issue in the traditional sense. I can game with it just fine. I just can't surf the motherfucking web!

0xygenthief
01-26-2012, 05:40 PM
Quote (Grnfinger): "Sapphire released a BIOS for the 5970; it suffered from the 2D clock voltage being set too low and would crash..."

Makes sense, I remember reading that the 5970 suffered from this issue as well. I guess I am not too far off in my assumption then.

Gonzo
01-26-2012, 07:23 PM
I've personally always felt it had something to do with Aero in Win 7, 'cause if I turn Aero off I never get that error message. It happens all the time for me with my GTX 460s and even my new GTX 570s. Hope you find a fix, and let us know if you do; I'm sure there are a few of us with this problem.

xmanrigger
01-26-2012, 10:48 PM
"I CAN'T SURF THE WEB"

A safe bet is the system memory. I missed that point when reading initially, but as soon as I saw it, a bell went off in my mind.

Go get some other memory, stuff it in your board and see if you still have issues. And I mean all the memory: change all the sticks out. Before putting the new stuff in, you should also do a hard reset of your BIOS by removing the battery and manually shorting the clear-CMOS jumper on your board. This will ensure there are no remnants of the settings for the old RAM. Either that, or run Memtest, torque the livin shit out of the system memory you have and see if it shits the bed.

I worked on a system last year for a buddy with similar problems. He could either run SLI and game like a muthafuka with his web browser all fucked up, or disable SLI and have a happy medium. After fucking with it for a day or so, we pulled the system memory and put some new shit in. All was well, and still is.

Gonzo
01-26-2012, 11:49 PM
I think you nailed it, xman, 'cause I'm using 4 DIMMs of memory that's only supported for 2. It normally runs all right, but occasionally it gives me problems.

xmanrigger
01-27-2012, 06:50 AM
Quote (Gonzo): "I think you nailed it, xman, 'cause I'm using 4 DIMMs of memory that's only supported for 2..."

Check the QVL (Qualified Vendors List) in your mobo manual. Only use what they list or what you know will work.

0xygenthief
01-29-2012, 02:38 PM
Running Memtest now. I have a 24 GB kit installed. On the first run, the test got to a whopping 14% before it hung, so I decided to test each stick individually. The first stick passed without issues. The second wouldn't even let me boot, period. The third failed about 8% of the way through; I ran it again and it made it to 41%. Testing the 4th now. I think I will try the second stick again, just to be sure I seated it correctly...

Just out of curiosity: if the memory fails, does it absolutely mean the memory is bad, or could it point to a motherboard issue? God, I hope not...

0xygenthief
01-29-2012, 04:42 PM
OK, so WTF? Earlier today I was unable to boot into Windows; my rig would lock up within a second of signing in. So, thinking it MUST be a memory problem, I ran Memtest on all my RAM. The first stick ran through fine, the second not at all, the third caused a lockup and the rest ran fine. So I went back and ran the second and third again... Amazingly, no issues! Again, WTF?
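One way to read mixed Memtest results like these is to label each stick by its whole run history rather than by a single run. The Python sketch below does that with the runs reported in this thread (stick 3's two failed runs from the earlier post plus the clean rerun; stick 2's "wouldn't boot" counted as a fail). The "two or more intermittent sticks hint at a shared cause" check is a rule-of-thumb assumption, and the function names are made up for illustration.

```python
from typing import Dict, List


def classify(history: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label each stick from its Memtest runs (True = the run passed)."""
    verdicts = {}
    for stick, runs in history.items():
        if all(runs):
            verdicts[stick] = "pass"
        elif not any(runs):
            verdicts[stick] = "fail"
        else:
            verdicts[stick] = "intermittent"  # failed some runs, passed others
    return verdicts


def points_to_shared_cause(verdicts: Dict[str, str]) -> bool:
    """Rule of thumb (assumption): two or more sticks that fail only some of
    the time hint at a common cause rather than one bad stick."""
    flaky = [s for s, v in verdicts.items() if v == "intermittent"]
    return len(flaky) >= 2


# Runs as reported in this thread.
history = {
    "stick 1": [True],
    "stick 2": [False, True],
    "stick 3": [False, False, True],
    "stick 4": [True],
    "stick 5": [True],
    "stick 6": [True],
}
verdicts = classify(history)
print(verdicts["stick 2"], verdicts["stick 3"])  # intermittent intermittent
print(points_to_shared_cause(verdicts))          # True
```

A single consistently failing stick would show up as "fail", while several intermittent sticks fit the suspicion later in this post that something else is going on.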

Thinking my problems were fixed, I attempted to log into Windows (this time with only 1 DIMM). Again, my system locked up! WTF?

So I asked myself what had changed. I had removed one of my graphics cards for the RMA that EVGA set up. So I said fuck it, let me swap them out... I'll be damned if my system didn't let me log in without any issues.

So, at this point it "appears" that the problem is with my OTHER card, the one that was running fine yesterday.

I am wondering if there isn't something else going on here... I will hold off on sending anything in for an RMA, at least until tomorrow since I want to test the new configuration out after a cold boot.

This has got to be the most frustrating build I have ever fucking had!