How Smart Is Smart?

10 replies to this topic

#1
Breeze

Breeze

    Member

  • Members
  • PipPip
  • 73 posts
My WHS console says all my disks are fine. WHS Disk Management says all my disks are fine. But while the Home Server SMART console says all my drives are OK and "Failure Predicted" says "False" in all cases, when I view the SMART Status of one of my drives it has some dire warnings about it:

Large Pending Bad Sector Count
There are 289 sectors that are waiting to be reallocated.

Disk Health: Critical
The health of this disk is Critical. It is recommended that you replace it as soon as possible.

Uncorrectable Sectors Detected
There are 13 uncorrectable sector errors on the disk.


All the other disks say: Disk Status: Healthy. So what's going on? Is Home Server SMART blowing things out of proportion? How can it say that Failure is not predicted and still give these warnings? And if it's right, how can WHS and Disk Management not detect these problems? I'm confused...

Related question: how do you run a manufacturer's diagnostic on a WHS drive? Boot from CD?

Thanks for any help unraveling this.



#2
Sam Wood

Sam Wood

    Advanced Member

  • Add-In Developer
  • PipPipPip
  • 1,793 posts
  • Gender:Male
  • Location:New Zealand

My WHS console says all my disks are fine. WHS Disk Management says all my disks are fine. But while the Home Server SMART console says all my drives are OK and "Failure Predicted" says "False" in all cases, when I view the SMART Status of one of my drives it has some dire warnings about it:



All the other disks say: Disk Status: Healthy. So what's going on? Is Home Server SMART blowing things out of proportion? How can it say that Failure is not predicted and still give these warnings? And if it's right, how can WHS and Disk Management not detect these problems? I'm confused...

Related question: how do you run a manufacturer's diagnostic on a WHS drive? Boot from CD?

Thanks for any help unraveling this.


Hey Breeze,

"Failure Predicted" relies on the disk manufacturer. They set the values of the other SMART counters that determine if "Failure Predicted" is true. In your case, the manufacturer of the disk doesn't think that 13 unrecoverable sectors is a fatal issue.

Home Server SMART examines a lot more SMART counters than the Server Storage tab or Disk Management, and the author has provided his own thresholds for what he considers a fatal issue.

So the short answer is that SMART is unreliable at best, and is implemented differently depending on the manufacturer and model of the disk.
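To make the difference concrete, here is a minimal Python sketch of the two views. It assumes the usual SMART convention that an attribute counts as failed when its normalized value drops to or below the manufacturer's threshold. The normalized values, the thresholds and the `TOOL_RAW_LIMITS` table are illustrative assumptions, not readings from Breeze's drive or Home Server SMART's real limits; only the raw counts 289 and 13 come from the first post.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # normalized value reported by the drive (higher is better)
    threshold: int  # failure threshold chosen by the manufacturer
    raw: int        # raw counter, e.g. a sector count


def drive_predicts_failure(attrs):
    """Manufacturer's view: fail only if a normalized value has sunk to its threshold."""
    return any(a.value <= a.threshold for a in attrs)


# Hypothetical limits a third-party monitoring tool might apply to raw counters.
TOOL_RAW_LIMITS = {197: 50, 198: 0}


def tool_warnings(attrs, limits=TOOL_RAW_LIMITS):
    """Third-party view: warn whenever a raw counter exceeds the tool's own limit."""
    return [
        f"{a.name}: raw count {a.raw} exceeds limit {limits[a.attr_id]}"
        for a in attrs
        if a.attr_id in limits and a.raw > limits[a.attr_id]
    ]


# Illustrative readings; only the raw counts 289 and 13 come from this thread.
attrs = [
    SmartAttribute(5, "Reallocated_Sector_Ct", 200, 140, 0),
    SmartAttribute(197, "Current_Pending_Sector", 200, 0, 289),
    SmartAttribute(198, "Offline_Uncorrectable", 200, 0, 13),
]

print(drive_predicts_failure(attrs))
for warning in tool_warnings(attrs):
    print(warning)
```

With readings like these the drive itself reports no predicted failure, while the tool raises two warnings, which is the same split Breeze is seeing.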
  • Breeze likes this

#3
wardog

wardog

    Advanced Member

  • Members
  • PipPipPip
  • 1,766 posts
  • Gender:Male
  • Location:Michigan, USA

Related question: how do you run a manufacturer's diagnostic on a WHS drive? Boot from CD?


Or floppy. Manufacturer's diagnostics are meant to be run from a bootable disc.

#4
wardog

wardog

    Advanced Member

  • Members
  • PipPipPip
  • 1,766 posts
  • Gender:Male
  • Location:Michigan, USA

Or floppy. Manufacturer's diagnostics are meant to be run from a bootable disc.


And doing so wouldn't be a bad idea either. If for nothing else, peace of mind.

#5
Breeze

Breeze

    Member

  • Members
  • PipPip
  • 73 posts
Thanks for the tips, guys. I'm going to run offline diagnostics to see what's what.

#6
J1mbo

J1mbo

    Advanced Member

  • Members
  • PipPipPip
  • 1,263 posts
  • Gender:Male
  • Location:UK
Contributor
One of the many Linux live ISOs with smartmontools would probably be worth a look. It can be used to initiate the various internal test procedures too.

However, I would be inclined to ditch the drive anyway. All IDE/SATA drives have reserved capacity to relocate bad sectors (which are normal), so there should never be any bad sectors visible from the OS. As an aside, because of this it's not possible to truly erase all data on a disk with the OS, since remapped sectors are no longer addressable from an OS, which can be handy for the forensic services :)
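For anyone going the smartmontools route: `smartctl -A /dev/sdX` prints the attribute table and `smartctl -t long /dev/sdX` starts the drive's extended self-test. Below is a rough Python sketch that pulls the raw counters out of that table. The `SAMPLE` text is a made-up illustration in the usual column layout (only 289 and 13 come from this thread), and the layout can differ between drives and smartmontools versions, so treat the regular expression as an assumption.

```python
import re

# Illustrative attribute table in the usual `smartctl -A` layout (not real output).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       289
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       13
"""

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute id: details} for every row that matches the layout."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attr_id, name, value, worst, thresh, raw = match.groups()
            attrs[int(attr_id)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


attrs = parse_attributes(SAMPLE)
for attr_id in (5, 197, 198):
    print(attr_id, attrs[attr_id]["name"], attrs[attr_id]["raw"])
```

To use it on a real drive you would feed it the text captured from `smartctl -A` instead of `SAMPLE`, then re-run it after a self-test to see whether the pending count has moved.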

#7
Breeze

Breeze

    Member

  • Members
  • PipPip
  • 73 posts
I'm posting the following as an FYI. I had a hard time running the WD diagnostics on my WHS system:

1. I originally installed WHS with RAID enabled in the BIOS. WD Diag only works under legacy IDE. I'm not actively using RAID, so I just set the BIOS to IDE for the testing. Apparently this is what you do even if you do use RAID, as long as you don't boot into WHS, but I haven't tested this, so check around to be sure.

2. It still wouldn't run after that because of a stupid defect in the latest WD diagnostic (see here), so I downloaded Ultimate Boot CD via http://www.ultimatebootcd.com/ and ran the WD diagnostic from that.

3. It STILL wouldn't run: turns out when it runs from UBCD using default settings, it may stop on "IDLE: Going resident FDAPM ADV:MAX", so you have to choose the second "optimal" option in the menu that launches after you select the WD diagnostic and intercept the message that asks if you want to install "fdapm" (power management) and uncheck it.

After all that, finally WD diagnostics ran. Quicktest says drive is ok (!?). Now running the extended test (4.5 hours to go).

TBC...

#8
Breeze

Breeze

    Member

  • Members
  • PipPip
  • 73 posts
9 hours and 42 minutes later... WD diagnostics reports that the "DRIVE HAS BEEN REPAIRED" along with error code 0223, which according to the relevant WD page means:

"Errors found, but have been repaired successfully. There were media errors that were within the repair capabilities of diagnostic utility. The drive should now be defect free."

OK. So why is SMART still reporting that 289 bad sectors need to be re-allocated? Because the max has been achieved? Because the errors have been corrected but SMART data wasn't updated?

And why doesn't WHS even recognize this and deal with it directly? It would seem kind of important that a backup system monitors the status of its own backup hardware and takes pre-emptive steps towards preventing disasters. I get the feeling that the prevalent attitude is "Oh it's ok, it's just a consumer server anyway..."

This has really shaken my confidence in WHS. For about $500 you can get a Drobo RAID box that does its own disk analysis and monitoring and keeps a spare drive on hand for immediate substitution and subsequent replacement, on top of self-monitoring of good and bad sectors. My opinion of WHS is changing from tool to toy; another half-baked OS thrown down the customer's throat. No shame.

So it would seem SMART is as smart as the system using it, which in this case is not very smart at all. <_<

#9
wardog

wardog

    Advanced Member

  • Members
  • PipPipPip
  • 1,766 posts
  • Gender:Male
  • Location:Michigan, USA

So it would seem SMART is as smart as the system using it, which in this case is not very smart at all. <_<

No. As Sam stated above "So the short answer is that SMART is unreliable at best, and is implemented differently depending on the manufacturer and model of the disk."

So long as you used the manufacturer's diagnostics that cover your particular drive model, I would believe them over a 3rd party tool. As to why the counters aren't updated, I wouldn't have a clue. A call to WD might provide an answer.

Bottom line is even the Drobo would be using assumptive predictions.

#10
J1mbo

J1mbo

    Advanced Member

  • Members
  • PipPipPip
  • 1,263 posts
  • Gender:Male
  • Location:UK
Contributor
This is really just as it was in the 1980s: the OS still doesn't (and can't) mark as bad the sectors that require extended retries to read. Perhaps there is no way for the drive to tell the OS that it needed some kind of retry mechanism to retrieve the data, which I suppose was the whole point of IDE.

It used to be normal for drives to have a bunch of 'marginal' sectors on them. Back in the day I wrote a little utility to time reads to every cluster on the disk and flag in the FAT as bad any that were out of tolerance by a certain amount (hence detecting internal retries), because DOS and chkdsk, just as today, always tried to assume a sector was good and avoided flagging it bad where possible. As said, proper hardware RAID will use many techniques to keep control of this kind of thing, which is why time-limited error recovery is really needed to avoid the controller marking the whole drive as bad/offline when a bad sector is found.
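The timing idea can be sketched in a few lines of Python. This is not the original utility, just an illustration of the technique: time each block read and flag any that take much longer than the median. The `read_block` callable, the `factor` tolerance and the injectable `clock` are assumptions made for the sketch, and on a modern OS the cache would have to be bypassed for the timings to mean anything. Unlike the old tool, it only reports suspects and marks nothing as bad.

```python
import time
from statistics import median


def find_slow_blocks(read_block, block_count, factor=5.0, clock=time.perf_counter):
    """Time a read of every block and return the indexes of the slow ones.

    read_block  -- callable taking a block index and performing the read
    block_count -- number of blocks to test
    factor      -- a block is suspect if it takes more than factor * median time
    clock       -- time source; replaceable so the logic can be tested
    """
    timings = []
    for index in range(block_count):
        start = clock()
        read_block(index)
        timings.append(clock() - start)

    if not timings:
        return []

    # The median is used as the baseline so a few slow blocks do not skew it.
    baseline = median(timings)
    return [i for i, elapsed in enumerate(timings) if elapsed > baseline * factor]
```

A read that needed internal retries shows up as an outlier against the median, which is how the retries can be inferred even though the drive never reports them.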

On the SMART query, it may be that the manufacturer's data in the field has a different interpretation. For example, it might be relocated sectors, free relocation buffer or something similar. It probably isn't an absolute number anyway, so take the narrative with a pinch of salt.

But back to the question: I would strongly suggest ditching this disk or at least using it for something else (external backup drive perhaps). For your server you want 100% good disks IMO. Also bear in mind that this IS a consumer software product running on consumer grade desktop hardware in most cases. Look at the price of a Dell server with however much usable storage you have (even on SATA drives) with hardware RAID, and the reason will be clear :o

#11
Breeze

Breeze

    Member

  • Members
  • PipPip
  • 73 posts
Thanks for the heads up guys. The smaller Dell servers start at $500 so that may be something to consider, though the hard core enterprise stuff is definitely out of my league. Apart from the 1 to 2 drive redundancy, this is what Drobo claims:

"Self-Healing Technology: With the self-healing technology now incorporated into Drobo S, your data is safer than ever. Even when sitting idle, the Drobo will continually examine the blocks and sectors on every disk, flagging questionable areas. This preemptive “scrubbing” helps ensure your data is being written only to the healthy areas of your drives, and that your data is always safe. And if a drive fails, Drobo S will strive to keep your data in the safest state possible, utilizing the available space on the remaining healthy drives."

Which sounds exactly like what J1mbo is describing. How functional it is is another matter, but along with the automatic RAID swapping, it sure is more comforting than the way WHS does things.

As for ditching the drive: it's less than 2 months old. I'll contact WD and see what they say.

I used to see Winchester hard disks as a great way of storing large amounts of data. Now I'm starting to see them more like ticking time bombs, especially since they shortened warranties and reduced build quality. I still have a 13-year old 4 gig SCSI drive that still works... it's all about planned obsolescence isn't it?



