Part of a conversation between me and my neighbor this week:
Me: “Sounds like the drive is failing. Have you checked the S.M.A.R.T. status?”
Him: “What’s that?”
Having been a geek, neck deep in various tech projects every day for the last two decades, I sometimes forget there are people out there who may never have heard of something I take for granted. Sure, readers of this blog are probably pretty in touch with their inner geek, but you never know who may stumble upon this article in the future. So here’s a very short explanation of what S.M.A.R.T. is.
Borrowed from Wikipedia:
“S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs), solid-state drives (SSDs), and eMMC drives. Its primary function is to detect and report various indicators of drive reliability with the intent of anticipating imminent hardware failures.”
(for the remainder of this article I will just type ‘SMART’ as I can’t be bothered with all those periods)
In a perfect world, SMART warns you when issues arise and gives you enough time to make sure your backups are in order before the drive dies completely or becomes so unreliable that it causes data corruption and/or system instability. Alas, we are Mac users, and one thing macOS is absolutely terrible at is interpreting SMART parameters and notifying users of issues! Interpreting? Yes. SMART spits out raw data, and it’s up to the operating system or application to interpret that data. This is why, if you connect the same failing drive to different computers running different operating systems and different applications, you’ll likely get a different result every single time. Confusing, isn’t it?
While this can be annoying and confusing, it’s also a good thing for most power users. Some like to have incredibly sensitive software that alerts at the first hint of trouble (I’m one of them) and there are some that don’t want to be bothered until SMART errors start spewing out with every rotation of the platters.
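To make the “interpretation” point concrete, here is a minimal Python sketch: the same raw counters are run through two hypothetical policies, one sensitive and one relaxed. The attribute names follow the naming smartmontools uses for ATA attributes 5 and 197; the thresholds are made up for illustration and are not the values DriveDx, SMART Utility or macOS actually use.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Policy:
    """A hypothetical interpretation policy: warn when a raw counter reaches its limit."""
    name: str
    warn_at: Dict[str, int]  # attribute name -> raw count that triggers a warning


def interpret(raw: Dict[str, int], policy: Policy) -> str:
    """Return "Warning" if any watched raw counter meets or exceeds its limit, else "OK"."""
    for attribute, limit in policy.warn_at.items():
        if raw.get(attribute, 0) >= limit:
            return "Warning"
    return "OK"


# Illustrative thresholds only; not taken from any real application.
SENSITIVE = Policy("sensitive", {"Reallocated_Sector_Ct": 1, "Current_Pending_Sector": 1})
RELAXED = Policy("relaxed", {"Reallocated_Sector_Ct": 500, "Current_Pending_Sector": 100})

# The same raw data read from one drive...
raw_data = {"Reallocated_Sector_Ct": 3, "Current_Pending_Sector": 0}

# ...gets two different verdicts depending on who interprets it.
print(interpret(raw_data, SENSITIVE))  # Warning
print(interpret(raw_data, RELAXED))    # OK
```

Neither verdict is “wrong”; they simply reflect different tolerances, which is exactly why the same drive can look fine in one app and alarming in another.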
In the case of macOS, SMART data interpretation is so conservative that the drive has to be on fire before it thinks “hmm, there may be an issue here!” But it gets worse: macOS has no mechanism to alert you to a SMART issue! You won’t know something is wrong until you open Disk Utility, an application very few regular users will ever open, let alone open frequently to check their drive’s SMART status. As far as I’m concerned, macOS has no SMART monitoring or warning at all. We might as well be back in the PPC days, when SMART was not even a thing and the only way we’d know a drive had problems was when it simply stopped working or made horrible shrieking noises.
One example: here is a 500GB hard drive that suddenly started reallocating sectors (more on this later). I put it in an external USB enclosure and checked it out in Disk Utility.
That’s right: Disk Utility can’t check the SMART status of external drives, another serious bummer. However, when I put this drive in my Mac Pro, I see this:
So the drive must be OK, right? Here’s what DriveDx shows:
The drive is obviously experiencing problems, yet Disk Utility doesn’t consider them important enough to let us know. Oh, and third-party software has no problem reading the SMART status of external drives.
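If you’d rather read Disk Utility’s verdict from the Terminal, `diskutil info disk0` prints the same pass/fail result on a “SMART Status:” line (typically “Verified”, “Failing” or “Not Supported”). The sketch below assumes that output format and simply pulls out that one word; `disk0` is only an example device, and the sample text is abridged.

```python
import subprocess
from typing import Optional


def parse_smart_status(diskutil_output: str) -> Optional[str]:
    """Return the value of the "SMART Status:" line in `diskutil info` output, if any."""
    for line in diskutil_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "SMART Status":
            return value.strip()
    return None


def disk_utility_verdict(device: str = "disk0") -> Optional[str]:
    """Run `diskutil info <device>` (macOS only) and return its one-word SMART verdict."""
    result = subprocess.run(
        ["diskutil", "info", device], capture_output=True, text=True, check=True
    )
    return parse_smart_status(result.stdout)


# Abridged, illustrative sample of `diskutil info disk0` output.
sample = """   Device Identifier:         disk0
   SMART Status:              Verified
"""
print(parse_smart_status(sample))  # Verified
```

Note that this inherits all of Disk Utility’s limitations: a single word, no raw counters and no alerts, which is the whole problem described above.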
I know I have a folder full of screenshots like these *somewhere*, but after an hour of searching I can’t find it. In that folder are examples of hard drives with hundreds or thousands of errors, clicking, rattling and failing to the point where they are barely usable or not usable at all, yet Disk Utility shows their SMART status as “Verified”! I’m very annoyed I can’t find this folder, but I hope I’ve built up enough credibility that you’ll take my word for it.
Software to use
So if Disk Utility and macOS won’t warn us, what are the alternatives?
My go-to is DriveDx. Its interpretation of SMART data is very much to my liking, and it warns immediately when an issue is found. Another very good app is SMART Utility, which also does a good job interpreting the data. Many applications offer SMART monitoring, but these two are my favorites.
SMART Utility has an option to install a SAT SMART Driver. This driver allows the software to monitor the SMART status of external drives over USB or FireWire, a very, very nice feature. macOS itself doesn’t make use of this driver even once it’s installed, and the fact that such a driver isn’t part of the OS by default is preposterous if you ask me. DriveDx does not offer this feature, but the SAT SMART driver is free and open source, so you can grab it here and use it with any software you’d like. Version 0.10 is the most current at the time of writing, and even though it’s 6 years old, it still works great in macOS Mojave (I have not tested it in Catalina).
Interpreting the interpreters
Whichever software you go with to monitor SMART parameters, you still need to understand what it is showing you. Take the “reallocated sectors” mentioned earlier: what’s a sector, and why is it being reallocated? DriveDx is very good at explaining what these errors mean, but other software may not be. An online search can help you understand these parameters. It’s also important to know that just because an error is shown doesn’t mean the drive is failing.
Example: I’m using a 4TB hard drive that is currently three years old. It has 3 reallocated sectors, and because of this it is rated at 72% health in DriveDx (no problem in Disk Utility, of course). Every hard drive or SSD has spare sectors so that data can be moved there if a problem is found with the original sector. The number of spare sectors on an HDD or SSD is anyone’s guess; last I looked into this, it ranged between 1% and 12% of the drive’s capacity. With 3 sectors reallocated, I don’t freak out, but I do start closely monitoring the drive. If a drive has 3 reallocated sectors today and 4, or 5, or more tomorrow, something is wrong and the drive is probably on its way out. In my case, this drive reallocated 3 sectors years ago and hasn’t reallocated a single one since. It is rock solid and I trust it with important data.
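The rule of thumb above (a stable count is tolerable, a growing one is not) fits in a few lines. A minimal sketch, assuming you log the raw Reallocated Sectors count once per check into a list; the 30-reading window is an arbitrary illustrative choice, not something DriveDx or any other tool is known to use.

```python
from typing import Sequence


def reallocation_trend(counts: Sequence[int], window: int = 30) -> str:
    """Classify a time-ordered series of raw Reallocated Sectors counts.

    "clean":   the latest reading is zero
    "growing": the count rose within the last `window` readings (watch out)
    "stable":  sectors were reallocated at some point, but not recently
    """
    if not counts or counts[-1] == 0:
        return "clean"
    recent = counts[-window:]
    if recent[-1] > recent[0]:
        return "growing"
    return "stable"


# A drive that reallocated 3 sectors long ago and never again:
print(reallocation_trend([0] * 10 + [3] * 400))  # stable
# 3 today, 4 tomorrow, 5 the day after:
print(reallocation_trend([3, 3, 4, 5]))          # growing
```

The point of looking at a recent window rather than the whole history is that a drive like my 4TB one, which went from 0 to 3 years ago, should not be flagged forever for something that stopped happening.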
I don’t want to go too far off topic, but if you want to know what a drive sector is, have a look here at this very informative Wikipedia article. In the illustration of a hard drive platter below, ‘C’ is a sector.
Another example: I have hard drives that are unreliable junk according to DriveDx, SMART Utility and every other piece of software that has checked them. Hundreds or thousands of CRC errors: sounds serious, right? It can be, but in these cases it is not. CRC errors happen when there is a transmission problem to or from the HDD or SSD, and all of these drives came from 2009–2012 MacBook Pros, which were notorious for their failing hard drive cables. I know for a fact these hard drives are OK; the cable was the problem. So even though every SMART monitoring application out there will write these drives off as scrap metal, I’ll happily use them. Tellingly, none of these drives has logged any more errors since being moved to other Macs or paired with a new hard drive cable.
An error does not necessarily mean a problem, but once you are alerted to a SMART error it is very important to keep an eye on it. If the error count goes UP, the drive is likely having a problem. If the error was a one-time thing, then depending on the parameter (I/O errors are almost always a hardware failure) you can possibly carry on and use the drive for years to come. Investigate the error: find out what it means and whether a bad cable or an OS issue may be causing it. Once you’re comfortable with your research, make a decision: keep using the drive, or scrap it.
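Comparing two snapshots captures the “does the count go UP?” test, and a small lookup table can remind you where to look first. A sketch under stated assumptions: snapshots are plain dicts of raw counts keyed by smartmontools-style attribute names, and the hint strings are hypothetical first-look guidance distilled from the examples above (CRC errors pointing at cables, reallocated sectors at the media), not a diagnosis.

```python
from typing import Dict, List

Snapshot = Dict[str, int]  # attribute name -> raw value

# Hypothetical first-look hints, based on the examples in this article.
HINTS = {
    "UDMA_CRC_Error_Count": "transfer errors: suspect the cable or connection first",
    "Reallocated_Sector_Ct": "media problem: back up and watch the trend closely",
    "Current_Pending_Sector": "unreadable sectors awaiting remapping: back up now",
}


def increases(previous: Snapshot, current: Snapshot) -> Dict[str, int]:
    """Return {attribute: amount} for every raw counter that went UP."""
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value > previous.get(name, 0)
    }


def review(previous: Snapshot, current: Snapshot) -> List[str]:
    """Turn counter increases into readable notes; an empty list means nothing new."""
    notes = []
    for name, delta in sorted(increases(previous, current).items()):
        hint = HINTS.get(name, "look this attribute up before deciding")
        notes.append(f"{name} +{delta}: {hint}")
    return notes


yesterday = {"UDMA_CRC_Error_Count": 1200, "Reallocated_Sector_Ct": 3}
today = {"UDMA_CRC_Error_Count": 1200, "Reallocated_Sector_Ct": 3}
print(review(yesterday, today))  # [] (old errors, nothing new)

today = {"UDMA_CRC_Error_Count": 1210, "Reallocated_Sector_Ct": 3}
print(review(yesterday, today))
# ['UDMA_CRC_Error_Count +10: transfer errors: suspect the cable or connection first']
```

A high but unchanging count, like the old MacBook Pro drives above, produces no notes at all; only new errors ask for your attention.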
SMART data is open to interpretation and even though some software does a great job at interpreting it, it’s up to you to educate yourself and make the final call on a SMART warning.
When to check the SMART status
This is not something you want to do every few months, or set a calendar alert for. SMART monitoring should be 24/7/365.
A drive may be at 100% at 5 PM today and an unrecoverable paperweight by 11 AM tomorrow. Depending on the issue, a drive can limp along long enough for data recovery, or it can completely run itself into the ground. You want to be alerted to any potential issue as soon as possible, and the only way to do this is to have software keep a close eye on all of your drives whenever the computer is on.
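Round-the-clock monitoring boils down to a loop: read the counters, compare them with the previous reading, alert on any increase, sleep, repeat. Below is a minimal sketch of that loop. `read_snapshot` and `alert` are hypothetical callables you would supply (for example, something that reads SMART data through a tool such as smartctl, and something that sends an email); the 30-minute default interval is an assumption, and `max_checks`/`sleep` exist only so the loop can be tried without actually waiting.

```python
import time
from typing import Callable, Dict, Optional

Snapshot = Dict[str, int]  # attribute name -> raw value


def monitor(
    read_snapshot: Callable[[], Snapshot],
    alert: Callable[[str], None],
    interval_s: float = 30 * 60,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll SMART counters forever (or max_checks times) and alert on any increase."""
    previous = read_snapshot()
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval_s)
        current = read_snapshot()
        for name, value in current.items():
            before = previous.get(name, 0)
            if value > before:
                alert(f"{name} rose from {before} to {value}")
        previous = current
        checks += 1


# Usage with stand-in functions: three readings, no real waiting.
readings = iter([
    {"Reallocated_Sector_Ct": 3},
    {"Reallocated_Sector_Ct": 3},
    {"Reallocated_Sector_Ct": 4},
])
alerts = []
monitor(lambda: next(readings), alerts.append, max_checks=2, sleep=lambda s: None)
print(alerts)  # ['Reallocated_Sector_Ct rose from 3 to 4']
```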
I have DriveDx set to probe the drives every 30 minutes, and an email is sent out if a problem is found. I manually run drive self-tests every 10,000 hours or so, and I typically discontinue full-time use of a drive once it has 40,000 to 50,000 hours on it (my personal comfort level). That’s when it’s moved to older systems or used as a spare for unimportant tasks.
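The schedule above is easy to express in code, and it helps to translate power-on hours into calendar time: 40,000 hours of continuous use is about 4.6 years, and 50,000 hours about 5.7 years. A small sketch; the 10,000-hour self-test interval and the 40,000-hour retirement point are the personal comfort levels described above, not manufacturer ratings, and the function names are hypothetical.

```python
from typing import List

HOURS_PER_YEAR = 24 * 365   # continuous, 24/7 use
SELF_TEST_EVERY_H = 10_000  # "every 10,000 hours or so"
RETIRE_FROM_H = 40_000      # low end of the 40,000 to 50,000 hour comfort zone


def years_of_continuous_use(power_on_hours: int) -> float:
    """Convert power-on hours to years, assuming the drive runs around the clock."""
    return power_on_hours / HOURS_PER_YEAR


def drive_plan(power_on_hours: int, last_self_test_h: int) -> List[str]:
    """Return the maintenance steps due for a drive under this schedule."""
    actions = []
    if power_on_hours - last_self_test_h >= SELF_TEST_EVERY_H:
        actions.append("run a self-test")
    if power_on_hours >= RETIRE_FROM_H:
        actions.append("retire from full-time use (spare or unimportant tasks)")
    return actions


print(round(years_of_continuous_use(40_000), 1))  # 4.6
print(round(years_of_continuous_use(50_000), 1))  # 5.7
print(drive_plan(31_000, 20_000))  # ['run a self-test']
```

A drive that is only powered on during working hours accumulates hours much more slowly, so the same thresholds correspond to considerably more calendar time.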
Keeping a close eye on the health of these vitally important but rather fragile devices (HDD or SSD) is the best way to prepare for a potential problem. Of course, a solid backup strategy will further minimize the risk of data loss, as long as the backups are tested and verified!
I hope this was useful to you. Let me know in the comments if you have any cool SMART-related stories; I’d love to hear them 🙂