Breeze

How Smart Is Smart?

11 posts in this topic

My WHS console says all my disks are fine. WHS Disk Management says all my disks are fine. But while the Home Server SMART console says all my drives are OK and "Failure Predicted" says "False" in all cases, when I view the SMART Status of one of my drives it has some dire warnings about it:

Large Pending Bad Sector Count

There are 289 sectors that are waiting to be reallocated.

Disk Health: Critical

The health of this disk is Critical. It is recommended that you replace it as soon as possible.

Uncorrectable Sectors Detected

There are 13 uncorrectable sector errors on the disk.

All the other disks say: Disk Status: Healthy. So what's going on? Is Home Server SMART blowing things out of proportion? How can it say that failure is not predicted and still give these warnings? And if it's right, how can WHS and Disk Management not detect these problems? I'm confused...

Related question: how do you run a manufacturer's diagnostic on a WHS drive? Boot from CD?

Thanks for any help unraveling this.



Hey Breeze,

"Failure Predicted" relies on the disk manufacturer. They set the values of the other SMART counters that determine if "Failure Predicted" is true. In your case, the manufacturer of the disk doesn't think that 13 unrecoverable sectors is a fatal issue.

Home Server SMART examines a lot more SMART counters than the Server Storage tab or Disk Management, and the author has provided his own thresholds for what he considers a fatal issue.

So the short answer is that SMART is unreliable at best, and is implemented differently depending on the manufacturer and model of the disk.
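To make the difference concrete, here is a rough Python sketch of the two kinds of check. It is only an illustration: the attribute names follow smartmontools conventions, the normalized values and thresholds are made-up example numbers (only the raw counts 289 and 13 come from the first post), and the raw-count limits are hypothetical, not Home Server SMART's actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    name: str
    prefail: bool   # True if the manufacturer marks this as a pre-fail attribute
    value: int      # normalized value reported by the drive (higher is healthier)
    threshold: int  # manufacturer's failure threshold for the normalized value
    raw: int        # raw counter, e.g. a sector count


def manufacturer_predicts_failure(attrs):
    """Manufacturer-style rule: a pre-fail attribute at or below its threshold."""
    return any(a.prefail and a.value <= a.threshold for a in attrs)


# Hypothetical limits a third-party tool might apply to the raw counters.
RAW_LIMITS = {"Current_Pending_Sector": 50, "Offline_Uncorrectable": 1}


def tool_warnings(attrs):
    """Third-party-style rule: warn when a raw counter reaches the tool's own limit."""
    return [
        f"{a.name}: raw count {a.raw} is at or above {RAW_LIMITS[a.name]}"
        for a in attrs
        if a.name in RAW_LIMITS and a.raw >= RAW_LIMITS[a.name]
    ]


# Illustrative readings: raw counts from the thread, everything else assumed.
attrs = [
    Attr("Reallocated_Sector_Ct", True, 200, 140, 0),
    Attr("Current_Pending_Sector", False, 200, 0, 289),
    Attr("Offline_Uncorrectable", False, 200, 0, 13),
]

print("Failure predicted:", manufacturer_predicts_failure(attrs))
for warning in tool_warnings(attrs):
    print("Warning:", warning)
```

With numbers like these, the manufacturer's rule stays quiet because no normalized value has dropped to its threshold, while the tool's rule fires on the raw counts. That is roughly how both reports can be "right" at the same time.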


Related question: how do you run a manufacturer's diagnostic on a WHS drive? Boot from CD?

Or floppy. Manufacturer's diagnostics are meant to be run from a bootable disc.


And doing so wouldn't be a bad idea either. If for nothing else, peace of mind.


Thanks for the tips, guys. I'm going to run offline diagnostics to see what's what.


One of the many Linux live ISOs with smartmontools would probably be worth a look. It can be used to initiate the drive's various internal test procedures too.
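As a rough sketch of what that looks like in practice, the Python below builds the usual smartctl invocations and parses the attribute table that `smartctl -A` prints. The device path `/dev/sda` and the sample table are assumptions for illustration (only the raw counts 289 and 13 come from this thread); real output varies by drive and smartmontools version.

```python
import shlex


def smartctl_commands(device):
    """Common smartctl invocations for a device (normally run as root)."""
    return {
        "health": ["smartctl", "-H", device],          # overall health verdict
        "attributes": ["smartctl", "-A", device],      # vendor attribute table
        "short_test": ["smartctl", "-t", "short", device],
        "long_test": ["smartctl", "-t", "long", device],
        "test_log": ["smartctl", "-l", "selftest", device],
    }


def parse_attributes(text):
    """Parse rows of the `smartctl -A` attribute table into a dict keyed by name."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with the numeric attribute ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            rows[parts[1]] = {
                "id": int(parts[0]),
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9],  # kept as text; some drives append extra detail
            }
    return rows


# Illustrative sample only, not captured from the drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       289
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       13
"""

for name, cmd in smartctl_commands("/dev/sda").items():
    print(f"{name}: {shlex.join(cmd)}")

rows = parse_attributes(SAMPLE)
print("Pending sectors:", rows["Current_Pending_Sector"]["raw"])
```

The sketch only prints the commands rather than running them, since starting a self-test needs root and a real drive. After a long test finishes, the `test_log` command shows whether it completed without error.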

However, I would be inclined to ditch the drive anyway. All IDE/SATA drives have reserved capacity to relocate bad sectors (which are normal), so there should never be any bad sectors visible from the OS. As an aside, because of this it's not possible to truly erase all data on a disk with the OS, since remapped sectors are no longer addressable from an OS, which can be handy for the forensic services :)


I'm posting the following as an FYI. I had a hard time running the WD diagnostics on my WHS system:

1. I originally installed WHS with RAID enabled in the BIOS. WD Diag only works under legacy IDE. I'm not actively using RAID, so I just set the BIOS to IDE for the testing. Apparently this is what you do even if you do use RAID, as long as you don't boot into WHS, but I haven't tested this, so check around to be sure.

2. It still wouldn't run after that because of a stupid defect in the latest WD diagnostic (see here), so I downloaded Ultimate Boot CD via http://www.ultimatebootcd.com/ and ran the WD diagnostic from that.

3. It STILL wouldn't run. It turns out that when it runs from UBCD using default settings, it may stop on "IDLE: Going resident FDAPM ADV:MAX". You have to choose the second "optimal" option in the menu that launches after you select the WD diagnostic, then intercept the message that asks if you want to install "fdapm" (power management) and uncheck it.

After all that, finally WD diagnostics ran. Quicktest says drive is ok (!?). Now running the extended test (4.5 hours to go).

TBC...


9 hours and 42 minutes later... WD diagnostics reports that the "DRIVE HAS BEEN REPAIRED" along with error code 0223, which according to the relevant WD page means:

"Errors found, but have been repaired successfully. There were media errors that were within the repair capabilities of diagnostic utility. The drive should now be defect free."

OK. So why is SMART still reporting that 289 bad sectors need to be re-allocated? Because the max has been achieved? Because the errors have been corrected but SMART data wasn't updated?

And why doesn't WHS even recognize this and deal with it directly? It would seem kind of important that a backup system monitors the status of its own backup hardware and takes pre-emptive steps towards preventing disasters. I get the feeling that the prevalent attitude is "Oh it's ok, it's just a consumer server anyway..."

This has really shaken my confidence in WHS. For about $500 you can get a Drobo RAID box that does its own disk analysis and monitoring and keeps a spare drive on hand for immediate substitution and subsequent replacement, on top of self-monitoring of good and bad sectors. My opinion of WHS is changing from tool to toy; another half-baked OS thrown down the customer's throat. No shame.

So it would seem SMART is as smart as the system using it, which in this case is not very smart at all. <_<



No. As Sam stated above: "So the short answer is that SMART is unreliable at best, and is implemented differently depending on the manufacturer and model of the disk."

So long as you used the manufacturer's diagnostics that cover your particular drive model, I would believe them over a 3rd party tool. As to why the counters aren't updated, I wouldn't have a clue. A call to WD might provide an answer.

Bottom line is even the Drobo would be using assumptive predictions.


This is really just as it was in the 1980s: the OS still doesn't (and can't) mark as bad those sectors that require extended retries to read. Perhaps there is no way for the drive to tell the OS that it needed some kind of retry mechanism to retrieve the data, which I suppose was the whole point of IDE.

It used to be normal for drives to have a bunch of 'marginal' sectors on them. Back in the day I wrote a little utility that timed reads to every cluster on the disk and flagged as bad in the FAT any that were out of tolerance by a certain amount (hence detecting internal retries), because DOS and chkdsk, just as today, always tried to assume a sector was good and avoided flagging it bad where possible. As said, proper hardware RAID will use many techniques to keep control of this kind of thing, which is why time-limited error recovery is really needed to avoid the controller marking the whole drive as bad/offline when a bad sector is found.
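For anyone curious, the timing idea can be sketched in a few lines of Python. This is not the original utility, just a minimal illustration under some assumptions: the function names, the 4096-byte block size and the "10x the median" tolerance are all made up, and on a modern OS the page cache will hide drive retries unless you read the raw device with caching bypassed.

```python
import statistics
import time


def time_block_reads(path, block_size=4096):
    """Read a file or device sequentially and record how long each block takes."""
    timings = []
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.perf_counter()
            chunk = f.read(block_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break
            timings.append(elapsed)
    return timings


def flag_slow_blocks(timings, factor=10.0):
    """Return indexes of blocks whose read time exceeds factor x the median."""
    if not timings:
        return []
    baseline = statistics.median(timings)
    return [i for i, t in enumerate(timings) if t > baseline * factor]


# Synthetic timings in seconds: block 2 took far longer, as a retried read might.
example = [0.001, 0.001, 0.5, 0.001, 0.001]
print("Suspect blocks:", flag_slow_blocks(example))
```

A block that consistently takes many times longer than its neighbours is a candidate for having needed internal retries, which is the signal the old utility used before marking a cluster bad.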

On the SMART query, it may be that the manufacturer interprets the data in that field differently. For example, it might be relocated sectors, free relocation buffer or something similar. It probably isn't an absolute number anyway, so take the narrative with a pinch of salt.

But back to the question: I would strongly suggest ditching this disk, or at least using it for something else (an external backup drive perhaps). For your server you want 100% good disks, IMO. Also bear in mind that this IS a consumer software product running on consumer-grade desktop hardware in most cases. Look at the price of a Dell server with however much usable storage you have (even on SATA drives) and hardware RAID, and the reason will be clear :o


Thanks for the heads up guys. The smaller Dell servers start at $500 so that may be something to consider, though the hard core enterprise stuff is definitely out of my league. Apart from the 1 to 2 drive redundancy, this is what Drobo claims:

"Self-Healing Technology. With the self-healing technology now incorporated into Drobo S, your data is safer than ever. Even when sitting idle, the Drobo will continually examine the blocks and sectors on every disk, flagging questionable areas. This preemptive “scrubbing” helps ensure your data is being written only to the healthy areas of your drives, and that your data is always safe. And if a drive fails, Drobo S will strive to keep your data in the safest state possible, utilizing the available space on the remaining healthy drives."

Which sounds exactly like what J1mbo is describing. How functional it is is another matter, but along with the automatic RAID swapping, it sure is more comforting than the way WHS does things.

As for ditching the drive: it's less than 2 months old. I'll contact WD and see what they say.

I used to see Winchester hard disks as a great way of storing large amounts of data. Now I'm starting to see them more like ticking time bombs, especially since manufacturers shortened warranties and reduced build quality. I have a 13-year-old 4 gig SCSI drive that still works... it's all about planned obsolescence, isn't it?
