Hands Off & Google Chrome

Every few hours now, "Hands Off!" warns me about the following:

ksfetch wants to resolve dl.google.com

This is Google Software Update, asking permission for ksfetch to check for/retrieve updates for installed Google products.

But every time ksfetch starts, it starts from a random directory somewhere under /private/tmp - so there's no way for "Hands Off!" (or any other application firewall) to create a path-based rule that isn't overly broad. If we allowed /private/tmp/*/ksfetch, we might as well allow everything - not a good idea.

Our best bet is to lower the frequency of the update checks, e.g. to every two days:

  defaults write com.google.Keystone.Agent checkInterval 172800
Now the prompt pops up only every other day, much better than every few hours :-)
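To double-check that the new interval stuck, the preference can simply be read back:

  defaults read com.google.Keystone.Agent checkInterval
  172800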

Filesystem data checksumming

While Linux has many filesystems to offer, almost none of them features data checksumming.

Of course, everybody is looking at ZFS: Solaris has had it since 2005, FreeBSD introduced it in 2007. What about Linux? The ZFSonLinux project looks quite promising, while ZFS-Fuse seems to be more of a proof of concept.

On MacOS X we have a similar picture: there's MacZFS, which I haven't looked into in a long time. But apparently it supports 64-bit kernels now, so maybe that's something to try again. And then there's Zevo, a commercial port of ZFS for the Mac.

All in all, I wouldn't use these experiments for actual data storage just yet. Thus I decided to have data checksums in userspace - that is, generating a checksum for a file, storing it and checking it once in a while. Of course, this has various implications and drawbacks:
  • The filesystem is already in place and lots of files are already stored. When generating a checksum now, we cannot say for sure whether a file is still "correct" - we may well generate a "correct" checksum for an already corrupted file.
  • Generating a checksum for new files doesn't happen on-the-fly. Instead, it has to be done regularly (and possibly automatically) whenever new files are added to the filesystem. While not generating checksums on-the-fly could translate to better performance compared to filesystems with built-in data checksumming, there's serious I/O to be expected once our "automatic" checksum-generating scripts kick in.
  • After generating checksums we also have to implement a regular (and possibly automatic) verification, plus some kind of remediation process for when a checksum doesn't match. Is there a backup available? Does the checksum of the backup file validate? What if there are two backup locations and both of them have different (but validating) checksums? Mayhem!
  • Where do we store the checksums? In a separate file on the same filesystem? On a different filesystem, on some offsite storage? Where do we store the checksum for the checksum file?
Except for the last question, there are probably no good answers, and there may be major issues depending on the setup. However, for me this was the only viable way to go for now: there's no ZFS port for this 12" PowerBook G4 and I didn't trust btrfs enough to hold my data.

In light of all these obstacles I wrote a small shell script that generates a checksum for a file and stores it as an extended attribute. Most filesystems support extended attributes, and the script tries to accommodate MacOS X, Linux and Solaris (just in case UFS is in use).

The script needs to be run once over the full filesystem:
 find . -type f | while read a; do checksum_file.sh set "$a"; done
...and regularly when new files are added:
 find . -type f | while read a; do checksum_file.sh check-set "$a"; done
...and again to validate already stored checksums:
 find . -type f | while read a; do checksum_file.sh check "$a"; done
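For illustration, here's a stripped-down sketch of the idea. This is not the full checksum_file.sh - the attribute name, the choice of shasum/sha256sum and the xattr/getfattr/setfattr calls are just one way to do it, and the Solaris bits are omitted:
#!/bin/sh
# Minimal sketch: store or verify a SHA-256 checksum in an extended attribute.
MODE=$1 FILE=$2
ATTR="user.checksum.sha256"            # illustrative attribute name

case "$(uname -s)" in
  Darwin)
    sum()   { shasum -a 256 "$FILE" | awk '{print $1}'; }
    get()   { xattr -p "$ATTR" "$FILE" 2>/dev/null; }
    store() { xattr -w "$ATTR" "$1" "$FILE"; }
    ;;
  *)
    sum()   { sha256sum "$FILE" | awk '{print $1}'; }
    get()   { getfattr --only-values -n "$ATTR" "$FILE" 2>/dev/null; }
    store() { setfattr -n "$ATTR" -v "$1" "$FILE"; }
    ;;
esac

case "$MODE" in
  set)       store "$(sum)" ;;
  check-set) [ -n "$(get)" ] || store "$(sum)" ;;
  check)     [ "$(get)" = "$(sum)" ] || echo "CHECKSUM MISMATCH: $FILE" ;;
esac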
Enjoy! :-)

Notes:
  • On XFS, one needs to pay attention when EAs are used. Usually the attributes are stored in the inode - but when an attribute is too large for the inode, it has to be stored somewhere else and performance suffers :-\ Better to use a bigger inode size when creating the XFS filesystem (see the example right after these notes). This might or might not be true for other filesystems.

  • For JFS, the inode is fixed at 512 bytes and space for inline xattr data is limited to 128 bytes. Anything bigger will require more data blocks for the extended attributes to be stored.

  • While checksums for files may be important, this won't address corruption in other places of your (and my) machine. #tinfoilhat
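Regarding the XFS note above: the inode size can only be chosen when the filesystem is created, and the current value can be checked with xfs_info afterwards. Device and mount point here are placeholders, and 512 bytes is just an example value:
 # mkfs.xfs -i size=512 /dev/sdXY
 # xfs_info /srv/data | grep isize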

Update: FWIW, on this particular machine of mine (throttled to 750 MHz), with an external SATA disk attached via FireWire 400, a full initialization or verification of 800 GB of data takes about one day. Yes, a full day. The major bottleneck seems to be the CPU though: the disk delivers around 30 MB/s, but the dm-crypt layer slows this down to ~8 MB/s. With a newer machine this should be much faster.

GNU/screen & UTF-8

In a GNU/screen session on this box, this happens sometimes:

$ rm foo
rm: remove regular empty file ?foo??
Why does it show a question mark instead of an apostrophe?

Running the same command with a different or no LANG setting seems to help:
$ LANG=C rm foo
rm: remove regular empty file 'foo'?

$ LANG="" rm foo
rm: remove regular empty file 'foo'?

$ echo $LANG
en_US.UTF-8
I usually start my screen sessions with -U:
  > Run screen in UTF-8 mode. This option tells screen that your terminal sends and 
  > understands UTF-8 encoded characters. It also sets the default encoding for new
  > windows to ‘utf8’. 
So, we have LANG set to en_US.UTF-8 and screen(1) started in UTF-8 mode, and we still have character issues? Weird. But then I remembered: I use -U only when starting a screen session, not when resuming it:
% screen -US MySession

$ echo $LANG; rm foo
en_US.UTF-8
rm: remove regular empty file 'foo'?
Upon resuming the session with "screen -dr" the question marks were back again. Resuming with -Udr fixes that.

A simpler test is to just print some characters:
$ echo -e '\xe2\x82\xac' 
€

$ screen -S foo
# echo -e '\xe2\x82\xac' 
�
^ad

$ screen -Udr foo
# echo -e '\xe2\x82\xac' 
â¬
And conversely, this time we start our session in UTF-8 mode:
$ screen -US foo
# echo -e '\xe2\x82\xac' 
€
^ad

$ screen -dr foo
# echo -e '\xe2\x82\xac' 
?
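By the way: according to the screen manpage, the encoding can also be changed from within a running session via the encoding command (^a, then a colon command), which should help when a session was already resumed without -U:
^a :encoding utf8 utf8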

whois: Invalid charset for response

As if MacOS didn't have enough charset problems, here's another one:

$ /usr/bin/whois denic.de
% Error: 55000000013 Invalid charset for response
Although the problem was reported to DENIC years ago, they still send out UTF-8 data if the handles contain e.g. umlauts.

But why can't the MacOS version of whois(1) handle UTF-8 data? A quick look at the binary reveals:
$ strings /usr/bin/whois
[...]
de.whois-servers.net
-T dn,ace -C US-ASCII %s
So, the -T dn,ace -C US-ASCII seems to be hardcoded, as we can see in the source:
#define GERMNICHOST	"de.whois-servers.net"
[...]
if (strcmp(hostname, GERMNICHOST) == 0) {
		fprintf(sfo, "-T dn,ace -C US-ASCII %s\r\n", query);
There's no -C switch to pass to whois(1) to change this behaviour, and experimenting with the LC_ALL environment variable did not help either.

What did help was passing options directly to their whois server:
$ /usr/bin/whois -h whois.denic.de -- "-T dn,ace denic.de"
This way, -C US-ASCII is skipped and the (UTF-8) output can be displayed just fine.
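To save some typing, this can be wrapped in a tiny shell function (the name is of course made up):
$ denic() { /usr/bin/whois -h whois.denic.de -- "-T dn,ace $1"; }
$ denic denic.de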

Of course, we could also install whois from MacPorts; it seems to handle UTF-8 data just fine (although it had a similar problem years ago):
$ sudo port install whois

$ /opt/local/bin/whois denic.de | file -
/dev/stdin: UTF-8 Unicode text

$ /opt/local/bin/whois denic.de
[...]
[Tech-C]
Type: PERSON
Name: Business Services
Organisation: DENIC eG
Address: Kaiserstraße 75-77

Mozilla defaults

Every now and then I come across a new machine I've never logged in to before and start Firefox for the first time. And then I always have to make my way through oh so many preference knobs and about:config entries just to get it into a usable state.

So, while I knew the configuration could be tweaked via user.js, I never got around to actually creating this file and adding some sensible defaults to it. Well, that's been done now. And with site-wide defaults, it's even more fun!

In short:

  • Create local-settings.js in defaults/pref/ underneath the Firefox installation directory.
  • Create firefox.cfg in the Firefox installation directory.
  • Create user.js inside your profile directory and fill it with some sensible defaults.
Of course, we can skip the first two steps and just fill user.js with the contents of firefox.cfg, but then we have to replace the defaultPref and lockPref entries with user_pref.
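For completeness, a minimal example of the first two files - the two prefs in firefox.cfg are just placeholders for whatever defaults one considers sensible:

// defaults/pref/local-settings.js
pref("general.config.obscure_value", 0);
pref("general.config.filename", "firefox.cfg");

// firefox.cfg - note that the first line has to be a comment
defaultPref("browser.startup.homepage", "about:blank");
lockPref("network.prefetch-next", false);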

You don't exist, go away!

After opening my laptop today, the first thing I did was of course log in to various systems, as I usually do. But this time I couldn't, and instead was greeted with:

  $ ssh foobar
  You don't exist, go away!
At first I thought the remote system was at fault, but ssh would print the same message for every other system I was trying to log in to. This had been reported by others already, and after just clicking through those links I tried again - this time ssh was able to log in w/o a problem. So, while this was only a temporary issue, let's recap and dig into it once again.

Apparently, the error message is generated by the client:
$ strings `which ssh` | grep away
You don't exist, go away!
It's right there in ssh.c:
pw = getpwuid(original_real_uid);
	if (!pw) {
		logit("You don't exist, go away!");
		exit(255);
	}
So, the call to getpwuid() failed. Now, why would it do that? In the manpage it says:
   These functions obtain information from DirectoryService(8),
   including records in /etc/passwd
And /etc/passwd was there all the time (hah!), so maybe DirectoryService(8) screwed up? Let's see if we find something in /var/log/system.log:
14:59:57 coreservicesd[54]: _scserver_ServerCheckin: client uid validation failure; getpwuid(502) == NULL
14:59:58 loginwindow[376]: resume called when there was already a timer
14:59:58 coreservicesd[54]: _scserver_ServerCheckin: client uid validation failure; getpwuid(502) == NULL
There it is. Now, restarting coreservicesd (or securityd) would have helped, but by now the system had fully woken up from sleep and getpwuid() was able to do what it does - and ssh was working again, too. If it happens again and won't recover by itself - we know what to do :-)
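For the record, the blunt way to force a restart would probably be to kill the daemon and let launchd respawn it - untested so far, since the problem went away on its own:
$ sudo killall coreservicesd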

Zero padding shell snippets

I was looking for a way to zero-pad a number sequence in bash. While the internet was helpful as usual, one particular post had lots of examples in its comments - very neat stuff.

Of course, with so many different approaches, this called for a benchmark! :-)

$ time bash padding_examples.sh bash41 1000000 > /dev/null
real    7m38.238s
user    3m7.056s
sys     0m7.884s

$ time   sh padding_examples.sh printf 1000000 > /dev/null
real    1m39.314s
user    0m41.244s
sys     0m2.064s

$ time   sh padding_examples.sh    seq 1000000 > /dev/null
real    0m10.883s
user    0m5.016s
sys     0m0.040s
So, seq(1) is of course the fastest - if it's not installed, use printf.
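For reference, the contenders boil down to something like the following - this is not padding_examples.sh verbatim, just the gist of each approach:

# pure bash, no external commands:
for ((i=1; i<=1000000; i++)); do n="000000$i"; echo "${n: -7}"; done

# printf, works in any POSIX shell:
i=1; while [ $i -le 1000000 ]; do printf '%07d\n' "$i"; i=$((i+1)); done

# seq(1), if available:
seq -w 1 1000000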

Update: with bash-4.0, the following is also possible:
$ time echo {01..1000000} > /dev/null
real    0m38.852s
user    0m14.948s
sys     0m0.260s
However, this will consume a lot of memory:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
23468 dummy     25   5  189m 186m 1380 R  43.1 14.9  0:28.48  bash

rTorrent: Hash check on download completion found bad chunks, consider using "safe_sync"

rTorrent would not complete a download and print the following:
* file.foo
* [OPEN]  506.3 /  785.3 MB Rate: 0.0 / 0.0 KB Uploaded: 248.8 MB  [T R: 0.49]
* Inactive: Hash check on download completion found bad chunks, consider using "safe_sync".
Initiating a check of the torrent's hash (^R) succeeded and then rTorrent tried to download the remaining part of the file - only to fail again, printing the same message :-\

Setting safe_sync (which got renamed to pieces.sync.always_safe) did not help. There's a longish and old ticket that got closed as "invalid". While this might have been the Right Thing™ to do (see the LKML discussion related to that issue), there was another hint: decreasing max_open_files (which got renamed to network.max_open_files) to a lower value, say 64. Needless to say, this didn't help either, so maybe there's something else going on here.

strace might be able to shed some light on this, so let's give it a try. After several hours (and a good night's sleep) a 2 GB strace(1) logfile was waiting to be analyzed. I only needed the part of the logfile up to where the error message occurred first - and from there on upwards I'd search for negative return values, as they denote some kind of error. And lo and behold, there it was:
    mmap2(NULL, 292864, PROT_READ, MAP_SHARED, 13, 0x31100) = -1 ENOMEM (Cannot allocate memory)
Before we continue to find out why we failed, let's see how much memory we tried to allocate here. mmap2() is supposed to "map files or devices into memory":
    void *mmap2(void *addr, size_t length, int prot, int flags, int fd, off_t pgoffset);
In our case, the length is 292864 bytes, with an offset of 0x31100. However, this offset is given in pagesize units. So, what is our page size?
$ uname -rm && getconf PAGE_SIZE
3.9.0-rc4 ppc
4096
Let's calculate the size rTorrent was trying to mmap2() here:
$ bc
obase=16
4096
1000

ibase=16
obase=A
1000 * 31100                <== PAGE_SIZE * 0x31100
823132160

ibase=A
823132160 + 292864          <== add size_t
823425024
So, 823425024 bytes are roughly 785 MB - we have 1.2 GB RAM on this machine and some swap space too. Not too much, but this box mmap()'ed larger files than this before - why would mmap2() fail with ENOMEM here?

Maybe it was the "reduce max_open_files" hint that tipped me off, but now I remembered playing around with ulimit(3) a while ago. So maybe these ulimits were too tight?

And they were! Setting ulimit -v ("per process address space") to a larger value made the ENOMEM go away and rTorrent was able to complete the download:
$ ls -lgo file.foo
-rw-r----- 1 823425024 Apr  1 11:38 file.foo
...with the exact same size mmap2() was trying to allocate. Btw, we could've checked the file size before rTorrent completed the download, because it's a sparse file anyway.
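For the record, checking and raising the limit in the shell that starts rTorrent is a one-liner:
$ ulimit -v                   # the current limit, in kilobytes
$ ulimit -v unlimited         # or any sufficiently large value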

Update: while raising the ulimit(3) certainly resolved the ENOMEM issue, the torrent would still not complete successfully. Turns out it was a kernel bug after all, but it was resolved rather quickly.

Linux symlink restrictions

This happened:

$ uname -sr && id
Linux 3.8-trunk-amd64
uid=1000(dummy) gid=1000(dummy) groups=1000(dummy)

$ ln -s /usr/local/src /tmp/foo
$ ^D

# id
uid=0(root) gid=0(root) groups=0(root)

# ls -l /tmp/foo
lrwxrwxrwx 1 dummy dummy 14 Mar  7 21:30 /tmp/foo -> /usr/local/src

# ls -l /tmp/foo/
ls: cannot access /tmp/foo/: Permission denied
Huh? root is not allowed to follow a user's symlink? Turns out this is a feature now:
  > The solution is to permit symlinks to only be followed when outside
  > a sticky world-writable directory, or when the uid of the symlink and
  > follower match, or when the directory owner matches the symlink's owner.
And since /tmp usually has the sticky bit set (and is world-writable), access to /tmp/foo is denied. This access restriction is accompanied by audit messages:
 type=1702 audit(1362720689.110:28): op=follow_link action=denied \
           pid=22758 comm="ls" path="/tmp/foo" dev="sda1" ino=252
This can be tweaked via sysctl:
# sysctl -w fs.protected_symlinks=0
fs.protected_symlinks = 0

# ls -l /tmp/foo/
total 12092
-rw-r--r-- 1 root  staff    6794 Jan  9 22:59 bar
[...]

# sysctl -a | grep protected
fs.protected_hardlinks = 1
fs.protected_symlinks = 0
...but the default (fs.protected_symlinks=1) is kinda neat, now that I know about it :-)
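For completeness: sysctl -w only changes the running kernel. If one really wanted the old behaviour back permanently, the setting would go into /etc/sysctl.conf or a snippet under /etc/sysctl.d/ (the file name below is just an example):
# echo 'fs.protected_symlinks = 0' > /etc/sysctl.d/90-symlinks.conf
# sysctl -p /etc/sysctl.d/90-symlinks.conf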