
Hands Off & Google Chrome

Every few hours now, "Hands Off!" warns me about the following:

ksfetch wants to resolve dl.google.com

This is Google Software Update, asking permission for ksfetch to check for/retrieve updates for installed Google products.

But every time ksfetch starts, it starts from a random directory somewhere under /private/tmp - so there's no way for "Hands Off!" (or any other "application firewall") to create a path-based rule without being overly broad. If we allowed /private/tmp/*/ksfetch, we might as well allow everything - not a good idea.

Our best bet would be to change how often updates are checked, e.g. to every two days (172800 seconds):
  defaults write com.google.Keystone.Agent checkInterval 172800
Now the screen pops up only every other day, much better than every few hours :-)
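
To verify that the new interval is in place, the key can simply be read back and should now report 172800:
  defaults read com.google.Keystone.Agent checkInterval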

Filesystem data checksumming

While Linux has many filesystems to offer, almost none of them features data checksumming.
Of course, everybody is looking at ZFS: Solaris has had it since 2005, FreeBSD introduced it in 2007. What about Linux? The ZFSonLinux project looks quite promising, while ZFS-Fuse seems to be more of a proof of concept.

On MacOS X we have a similar picture: there's MacZFS, which I haven't looked into in a long time. But apparently it's supported for 64-bit kernels now, so maybe that's something to try again. And then there's Zevo, a commercial port of ZFS for Mac.

All in all, I wouldn't use these experiments for actual data storage just yet. Thus I decided to have data checksums in userspace - that is, generating a checksum for a file, storing it and checking it once in a while. Of course, this has various implications and drawbacks:
  • The filesystem is already in place and lots of files are already stored. When generating a checksum now, we cannot say for sure whether a file is still "correct" - we may well end up generating a "valid" checksum for an already corrupted file.
  • Generating a checksum for new files doesn't happen on-the-fly. Instead it has to be done regularly (and possibly automatically too) whenever new files are added to the filesystem. While not generating checksums on-the-fly could translate to better performance compared to data-checksum enabled filesystems, there's serious I/O to be expected once our "automatic" checksum generating scripts kick in.
  • After generating checksums we also have to implement a regular (and possibly automatic) verification and some kind of remediation process for when a checksum doesn't match. Is there a backup available? Does the checksum of the backup file validate? What if there are two backup locations and both of them have different (but validating) checksums? Mayhem!
  • Where do we store the checksums? In a separate file on the same filesystem? On a different filesystem, on some offsite storage? Where do we store the checksum for the checksum file?
Except for the last question, there are probably no good answers, and these may turn into major issues depending on the setup. However, for me this was the only viable way to go for now: there's no ZFS port for this 12" PowerBook G4 and I didn't trust btrfs enough to hold my data.

In light of all these obstacles I wrote a small shell script that generates a checksum for a file and stores it as an extended attribute. Most filesystems support extended attributes, and the script tries to accommodate MacOS X, Linux and Solaris (just in case UFS is in use).
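
The basic idea looks roughly like this - a minimal sketch, not the actual checksum_file.sh: the attribute name, the choice of SHA-256 and the tool detection are assumptions, and the Solaris/runat branch is omitted:
  #!/bin/sh
  # Rough sketch of checksum_file.sh - not the original script.
  # Stores a SHA-256 checksum in an extended attribute and verifies it later.
  # Covers MacOS X (xattr) and Linux (setfattr/getfattr); Solaris is left out.
  ATTR="checksum.sha256"           # attribute name is an assumption
  MODE="$1"; FILE="$2"

  hash_file() {
      # sha256sum comes with GNU coreutils, shasum ships with MacOS X
      if command -v sha256sum >/dev/null 2>&1; then
          sha256sum "$1" | awk '{print $1}'
      else
          shasum -a 256 "$1" | awk '{print $1}'
      fi
  }

  get_attr() {
      if command -v getfattr >/dev/null 2>&1; then
          getfattr --only-values -n "user.$ATTR" "$1" 2>/dev/null
      else
          xattr -p "$ATTR" "$1" 2>/dev/null
      fi
  }

  set_attr() {
      if command -v setfattr >/dev/null 2>&1; then
          setfattr -n "user.$ATTR" -v "$2" "$1"
      else
          xattr -w "$ATTR" "$2" "$1"
      fi
  }

  case "$MODE" in
      set)       set_attr "$FILE" "$(hash_file "$FILE")" ;;
      check-set) # only add a checksum if none is stored yet
                 [ -n "$(get_attr "$FILE")" ] || set_attr "$FILE" "$(hash_file "$FILE")" ;;
      check)     [ "$(get_attr "$FILE")" = "$(hash_file "$FILE")" ] || echo "MISMATCH: $FILE" ;;
      *)         echo "Usage: $0 set|check-set|check <file>" >&2; exit 1 ;;
  esac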

The script needs to be run once for the full filesystem:
 find . -type f | while read a; do checksum_file.sh set "$a"; done
...and regularly when new files are added:
 find . -type f | while read a; do checksum_file.sh check-set "$a"; done
...and again to validate already stored checksums:
 find . -type f | while read a; do checksum_file.sh check "$a"; done
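To make the "regularly (and possibly automatically)" part happen, these loops could be driven by cron - a rough sketch, with /data and the schedule being arbitrary examples and checksum_file.sh assumed to be in cron's PATH:
  # checksum newly added files every night at 03:00
  0 3 * * *  cd /data && find . -type f | while read a; do checksum_file.sh check-set "$a"; done
  # verify all stored checksums every Sunday at 04:00
  0 4 * * 0  cd /data && find . -type f | while read a; do checksum_file.sh check "$a"; done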
Enjoy! :-)

Notes:
  • On XFS, one needs to pay attention when EAs are used. Usually the attributes are stored within the inode itself - but when an attribute is too large for the inode, it has to be stored elsewhere and performance suffers :-\ Better use a bigger inode size when creating the XFS filesystem (see the example after these notes). This might or might not be true for other filesystems.

  • For JFS, the inode is fixed at 512 bytes and space for inline xattr data is limited to 128 bytes. Anything bigger will require more data blocks for the extended attributes to be stored.

  • While checksums for files may be important, this won't address corruption in other places of your (and my) machine. #tinfoilhat
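
For the XFS case above, the inode size can only be chosen when the filesystem is created, e.g. (the device name is just a placeholder):
  mkfs.xfs -i size=512 /dev/sdXY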

Update: FWIW, on this particular machine of mine (throttled to 750 MHz), with an external SATA disk attached via FireWire 400, a full initialization or verification of 800 GB of data takes about one day. Yes, a full day. The major bottleneck seems to be the CPU though: the disk delivers around 30 MB/s, but the dm-crypt layer slows this down to ~8 MB/s - and 800 GB at ~8 MB/s works out to roughly 28 hours. With a newer machine this should be much faster.

GNU/screen & UTF-8

In a GNU/screen session on this box, this happens sometimes:
$ rm foo
rm: remove regular empty file ?foo??
Why does it show question marks instead of apostrophes?

Running the same command with a different or no LANG setting seems to help:
$ LANG=C rm foo
rm: remove regular empty file 'foo'?

$ LANG="" rm foo
rm: remove regular empty file 'foo'?

$ echo $LANG
en_US.UTF-8
I usually start my screen sessions with -U:
  > Run screen in UTF-8 mode. This option tells screen that your terminal sends and 
  > understands UTF-8 encoded characters. It also sets the default encoding for new
  > windows to ‘utf8’. 
So, we have LANG set to en_US.UTF-8 and screen(1) started in UTF-8 mode and still have character issues? Weird. But then I remembered: I use -U only when starting the screen session, not when resuming the session:
% screen -US MySession

$ echo $LANG; rm foo
en_US.UTF-8
rm: remove regular empty file 'foo'?
Upon resuming the session with "screen -dr" the question marks were back again. Resuming with -drU fixes that.
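
Since that flag is easy to forget, a small shell alias can take care of both cases (the alias names are just examples):
  alias scr='screen -US MySession'
  alias sdr='screen -drU'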