A better way to check your backups


Last year I wrote the piece “Check your backups”, which focused on optical media backups. Backing up and checking those backups is an ever-evolving process. We always look for faster, easier and more reliable ways to back up our data, but checking backups has always been pretty much the same tedious, time-consuming process.

Today I want to tell you about a little utility that I’ve had for years but never used: IntegrityChecker by diglloydTools. It is part of a suite that I’ve been using for years, but for some reason I never explored this particular utility.

The concept is extremely simple. IntegrityChecker computes a SHA-1 hash for each file you point it to and stores it in a small invisible file alongside that file. That invisible file lives there indefinitely and holds the hashes for comparison the next time you run IntegrityChecker to see if any files have changed. Let’s see this utility in action so you can see how invaluable it is when it comes time to check your backups.
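To make the idea concrete, here is a minimal Python sketch of the same concept: hash every file and keep the results in a hidden file next to the data. This is not how IntegrityChecker is implemented and it does not read or write the .ic format; the manifest name `.hashes.json`, the JSON layout and the function names are all assumptions made up for this illustration.

```python
import hashlib
import json
import os

# Hypothetical manifest name; this is NOT IntegrityChecker's .ic format.
MANIFEST_NAME = ".hashes.json"


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_all(folder):
    """Hash every file under `folder`; write one hidden manifest per directory.

    Returns the number of files hashed.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        entries = {}
        for name in sorted(filenames):
            if name == MANIFEST_NAME:
                continue  # never hash the manifest itself
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            entries[name] = {
                "sha1": sha1_of_file(path),
                "size": info.st_size,
                "mtime": int(info.st_mtime),
            }
        if entries:
            with open(os.path.join(dirpath, MANIFEST_NAME), "w") as out:
                json.dump(entries, out, indent=2)
            total += len(entries)
    return total
```

Calling `update_all("/path/to/folder")` on a folder of your choice leaves a small manifest in each directory, so the hashes travel with the data wherever it gets backed up.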

First, we need something to back up, so I created a folder with super important data.

At this point it’s worth mentioning that it’s not important where this data will get backed up. It can go to an optical disc, another hard drive or a remote server somewhere. For this article, I will keep the focus on optical media as those are the most time-consuming to verify later on.

With our super important data ready to be backed up, I fire up IntegrityChecker and point it to this folder of data. The Command menu is set to “Update All”, which updates all files, whether or not they already have validation info.

Once the Start button is clicked, the app will get to work and spit out a result similar to this:

========================================================================================================================
ic update-all /Volumes/Work and Misc/08. Backup Discs/SUPER IMPORTANT DATA (summary)
Friday, May 28, 2021 at 4:35:13 PM Eastern Daylight Time
========================================================================================================================

# Files with stored hash: 0
# Files missing: 0
# Files hashed: 65
# Files without hashes: 65
# Files whose size has changed: 0
# Files whose date changed: 0
# Files whose content changed (same size): 0
# Suspicious files: 0

Looking at the data with invisible files made visible, I can see the .ic files are now present.

Opening this file in a text editor reveals the file names that were hashed and their hashes, along with some other number sequences.

If even one bit of those files changes, the cryptographic hash changes too, so this is a very effective way of comparing data. A little detour to prove this point is in order, so I will take one of these PDFs, hash it, change something, save it, and hash it again. The file “L0406227.pdf” was hashed and was stored as “38A593454430ABBF6A3CCFDED2F096E1F764F474”. I opened the file and added one word of text, then saved it and hashed it again. The hash is now stored as “E3661CF01F744C67AE98111EB8D6BE7AA20DB872”.
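The same sensitivity can be reproduced in a few lines of Python. This is only a sketch using the standard `hashlib` module on a made-up byte string, not on the PDF from the article: it flips a single bit and counts how many bits of the 160-bit SHA-1 digest end up different.

```python
import hashlib

# Made-up sample data standing in for a file's contents.
original = bytearray(b"super important data")
flipped = bytearray(original)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

hash_a = hashlib.sha1(original).hexdigest()
hash_b = hashlib.sha1(flipped).hexdigest()

# Count how many of the 160 digest bits differ between the two hashes.
differing_bits = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

print(hash_a)
print(hash_b)
print(f"{differing_bits} of 160 bits differ")
```

A one-bit change in the input does not produce a one-bit change in the hash; the two digests look unrelated, which is why comparing them is such a reliable way to spot a change.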

Knowing how sensitive this process is, you can see how even the smallest corruption of a file results in a very different cryptographic hash, which makes it incredibly easy to spot the damaged file. Even when checking terabytes of backed up data, you can now fully automate the check.

So the super important data was hashed, and can now be backed up. I used Toast to burn this data to a CD and ran IntegrityChecker to verify the data.

========================================================================================================================
ic verify /Volumes/SUPER IMPORTANT DATA
Friday, May 28, 2021 at 4:52:44 PM Eastern Daylight Time
========================================================================================================================

# Files with stored hash: 65
# Files missing: 0
# Files hashed: 65
# Files without hashes: 2
=========================
/Volumes/SUPER IMPORTANT DATA/:
Desktop DB
Desktop DF

# Files whose size has changed: 0
# Files whose date changed: 0
# Files whose content changed (same size): 0
# Suspicious files: 0

All the data was re-hashed and found to be identical to the original source data. This data and the optical media are 100%! Now I could wait a few years to re-test this disc and update this article then, but I have a feeling nobody wants to wait that long. I was going to send this disc to Dana for the royal pliers treatment, but that too would take longer than I’d like. Instead, yours truly did some scratching with a screwdriver and the disc was re-verified. IntegrityChecker was unable to complete the job.

Disc successfully damaged. Now we do what we would normally do when encountering a damaged disc: try to save the data (and not forget to recover the .ic files!). Of course one would have multiple backups of such important data, but let’s try recovery anyway as we’re trying to test an app here. As much data as possible was extracted from the disc, and now we need to see how much of the salvaged data is still intact. IntegrityChecker did its thing:

========================================================================================================================
ic verify /Desktop/Super Important Data Recovery
Friday, May 28, 2021 at 8:42:04 PM Eastern Daylight Time
========================================================================================================================

# Files with stored hash: 65
# Files missing: 43
===================
/Desktop/Super Important Data Recovery/Photos/:
GDFD3555.HEIC
IMG_7115.HEIC
–I removed a long file list here to keep the log short–
IMG_7224.MOV

# Files hashed: 22
# Files without hashes: 2
=========================
/Desktop/Super Important Data Recovery/:
Desktop DB
Desktop DF

# Files whose size has changed: 0
# Files whose date changed: 0
# Files whose content changed (same size): 0
# Suspicious files: 1
=====================
The following file contents have changed, but file dates and size have not changed. This could indicate data corruption.
However, some programs do alter files while keeping the same date, so it usually is innocuous.
/Desktop/Super Important Data Recovery/Photos/:
IMG_7212.HEIC

========================================================================================================================
ic verify /Desktop/Super Important Data Recovery (summary)
Friday, May 28, 2021 at 8:42:04 PM Eastern Daylight Time
========================================================================================================================

# Files with stored hash: 65
# Files missing: 43
# Files hashed: 22
# Files without hashes: 2
# Files whose size has changed: 0
# Files whose date changed: 0
# Files whose content changed (same size): 0
# Suspicious files: 1

So, 43 files could not be copied and we have a full list of exactly which files they are. And 1 file reports that its contents have changed, which could mean data corruption. As it’s an image file I verified this and sure enough, the file is damaged.
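For readers curious how such a report could be produced, here is a rough Python sketch that compares a folder against previously stored records and sorts files into categories loosely modeled on the log above. The classification rules, the record layout (`sha1`, `size`, `mtime`) and the function names are my own assumptions for illustration; IntegrityChecker's actual logic may well differ.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(folder, stored):
    """Compare the files in `folder` against `stored` records.

    `stored` maps file name -> {"sha1": str, "size": int, "mtime": int}
    (a hypothetical layout, not the .ic format).
    """
    report = {"ok": [], "missing": [], "size_changed": [],
              "date_changed": [], "suspicious": [], "without_hash": []}
    for name, record in sorted(stored.items()):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            report["missing"].append(name)
            continue
        info = os.stat(path)
        if info.st_size != record["size"]:
            report["size_changed"].append(name)
        elif sha1_of_file(path) == record["sha1"]:
            report["ok"].append(name)
        elif int(info.st_mtime) != record["mtime"]:
            # Content and date both differ: probably a deliberate edit.
            report["date_changed"].append(name)
        else:
            # Same size, same date, different content: possible corruption.
            report["suspicious"].append(name)
    for name in sorted(os.listdir(folder)):
        if name not in stored and os.path.isfile(os.path.join(folder, name)):
            report["without_hash"].append(name)
    return report
```

The "suspicious" bucket is the interesting one: it catches files that look untouched from the outside but no longer match their stored hash.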

I included this to show that just because a file was copied does not mean it isn’t damaged. In this case the file thumbnail was present, the file size was the same and the file date/time were the same. Unless I had opened that file to check, I would have found this corruption only by chance. IntegrityChecker compared hashes and immediately found that something was different, which takes all the guesswork out of it. Luckily of course this super important data was backed up elsewhere. A quick scan of the disk image on the server showed the server backup data was OK, so a restore of the lost and damaged files was possible.
The lost data was recovered, a new disc was burned and all is right in the world once again.

When you have large amounts of data to check, having IntegrityChecker do it for you will save you tons of time. No need to copy data to see if there are transmission errors, no need to pick random files to open and check to see if data was corrupted. Just let IntegrityChecker loose on the data and wait for it to finish. Then act accordingly if something is found that is not as it should be.

I now run IntegrityChecker on all the data that needs to be backed up, and I have hashed and re-backed up the data that goes to the server or other local drives. This is not going to help with my large existing archive of optical media, but going forward all optical media backups will include the .ic files as well.

IntegrityChecker is paid software and you can get it here.
It works on Mac OS X 10.7 Lion through macOS 11 Big Sur, which allows an enormous range of hardware and OS versions to use it. Even many years from now it should be no problem to get your hands on a Mac that runs any of these OS versions to verify your backups. And of course there’s no reason to think they won’t support future versions of macOS. As an added bonus, and I quote their website,

“There is no “time out” bomb in diglloydTools; once installed it will work with in perpetuity no further restrictions. Does not “phone home”, does not use the internet.”

Something I can certainly appreciate in this time of subscription models and dodgy phone-home connections.

So, get IntegrityChecker. Hash all your data prior to backing it up. And save yourself a lot of time and effort in the future!

One thought on “A better way to check your backups”

  1. Hello Jay,
    This is quite an innovative design concept (Hash) utilized in Backup software.
    It is an integral part of the design structure in Blockchain Technology because of its contribution to accuracy.
    However, you stated that IntegrityChecker is designed to work in Mac OS 7 onwards.
    What I want to know is what do you know to be the best Backup software available for use in OS 5.8 Leopard. That is my ultimate Retro Power Mac and I would really love to have a practical effective software that will help me with my data.
    I’ll appreciate your guidance on this.
    Please e-mail me a reply as short as it may be (smile).
