0xygenthief
01-26-2012, 03:09 PM
Considering the number of techs here I figured this would be a good place to both rant and seek assistance.
I am currently working with EVGA support to isolate the cause of my most recent issue with my new rig. I'll list my specs to set a baseline:
i7 940 c0/c1 stepping
ASUS Rampage III Gene mobo
Crucial M4 SSD for OS
Seagate 4TB for storage
G.Skill 24GB RAM, 6 DIMMs
2 EVGA GTX 480 SC GPUs
Thermaltake Toughpower 1200w PSU
So on my initial build, after installing my OS, I was unable to boot into Windows once I installed my graphics driver. I reinstalled the OS and installed all the drivers, saving the display drivers for last, and was able to boot into Windows successfully. That is, until I enabled SLI. After enabling SLI I couldn't boot into Windows. So I went into safe mode, uninstalled the drivers and was again able to boot into Windows.
At this point I started troubleshooting my hardware and have been working with EVGA on the GPU front with little success. Long story short, all hardware components work fine in other systems by themselves.
I have performed an exhaustive search online and found that this is not an isolated issue. In fact, the only consistent factors in this error are that systems running XP pre-SP3 are not affected and that it appears to happen more on systems running 64-bit OSes. Another correlation is that some GPUs seem to be mentioned more than others, though I am not sure if this is simply because the cards or series are more popular or if they are just more prone to this issue. In any case the Nvidia 400 series seems to be mentioned a whole lot more than others, and this includes the 400M series. I can personally attest to this since my wife has a laptop running a 460M that gets this error, and one of my colleagues is sporting an ASUS laptop, also with a 460M, getting the same errors.
I say all that up front to help you all get out of the "it's your hardware" mindset. The fact that this is happening seemingly at random leads most to believe that this is a hardware issue. But the numbers don't lie. So many people having the same issue with similar hardware and OS configurations points to a driver, software or compatibility issue.
Digging deeper I find references to Windows Timeout Detection and Recovery (TDR). Essentially Windows Vista and 7 have this new "feature" that detects when the graphics driver hangs, or is about to, by keying off of a timeout value. When the GPU doesn't respond within a short amount of time, Windows cycles the driver, forcing a driver recovery. The intent was to improve the experience for gamers who might have GPU issues so that they can troubleshoot their hardware instead of simply getting a BSOD.
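For anyone who wants to see what timeout their own box is using, here is a rough Python sketch. It assumes the documented TDR registry location (HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers) and the documented defaults of 2 seconds for TdrDelay and 5 seconds for TdrDdiDelay. The function names are my own, and `would_trigger_tdr` is only a simplified model of the check, not how Windows actually implements it.

```python
import sys

# Documented TDR registry location and default timeouts (seconds).
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_DEFAULTS = {"TdrDelay": 2, "TdrDdiDelay": 5}


def read_tdr_settings():
    """Return TDR timeouts, falling back to defaults when a value is unset."""
    settings = dict(TDR_DEFAULTS)
    if sys.platform != "win32":
        return settings  # no registry here, so report the defaults only
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            for name in TDR_DEFAULTS:
                try:
                    value, _value_type = winreg.QueryValueEx(key, name)
                    settings[name] = int(value)
                except FileNotFoundError:
                    pass  # value not present, keep the default
    except OSError:
        pass  # key missing or unreadable, keep the defaults
    return settings


def would_trigger_tdr(gpu_busy_seconds, settings=None):
    """Simplified model: a GPU stall longer than TdrDelay triggers a recovery."""
    settings = settings or TDR_DEFAULTS
    return gpu_busy_seconds > settings["TdrDelay"]


if __name__ == "__main__":
    current = read_tdr_settings()
    print("TDR settings:", current)
    print("3 second stall triggers recovery:", would_trigger_tdr(3, current))
```

This only reads the values. Raising TdrDelay would only hide a hang for longer rather than fix whatever is causing it.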
The problem is that my issues, along with most everyone else's, are occurring outside of gaming. In fact, my system runs great in Skyrim and when going through benchmarks like Kombustor or OC Scanner. I get the driver recovered error when doing things like surfing the net via Internet Explorer or opening iTunes.
A few have linked this occurrence to the different voltage/clock speeds that our graphics cards run at while in either idle mode or 2D mode. Again doing my own research, I found that the 400 series of cards has 4 power and clock states while the newer 500 series only has 3. The common modes are idle/2D/3D. Some have suggested the extra mode in the 400 series is a 3D light mode. In any case, the problem that I am experiencing is occurring in either idle mode, 2D mode or possibly in the transition between the two.
Again, the EVGA folks think it is a hardware issue, but if that were the case I wouldn't be stable while running games, right? Then again, I read where some folks RMA their stuff and the replacement gear fixes the problem, while others say it had no effect. Some say their issues come and go with the rise and fall of temps, and others say they can avoid the errors by underclocking their gear (not undervolting). Again, that points to hardware as the likely culprit.
So to troubleshoot this I am considering flashing my vbios with a modified ROM that will slightly bump up the juice for idle and 2D performance. Since underclocking gear seems to increase stability, wouldn't that suggest that upping the voltage might obtain the same results?
Has anyone attempted to flash their vbios before? I know it will void my warranty but I figure it's worth a try...