/bin/ls --wtf

So, I noticed this:
$ env -i /bin/bash                 # Clear the environment
$ touch foo bar\ baz               # Creates two files, "foo" 
                                   # and "bar baz"
$ ls -1
'bar baz'
Why is ls(1) suddenly quoting filenames that contain spaces? After a bit of digging, it turns out that this commit introduced the change into GNU coreutils, but at least Debian is on the case and fixed it in their version:
$ ls
bar baz

$ ls --quoting-style=shell
'bar baz'
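On systems where ls still quotes by default, the old behaviour can be requested explicitly. A minimal sketch, assuming GNU coreutils; the function name list_literal and the scratch directory are made up for illustration:

```shell
# list_literal: hypothetical wrapper that always prints raw file names.
# --quoting-style=literal is a GNU coreutils option (BSD/macOS ls lacks it).
list_literal() {
    ls -1 --quoting-style=literal -- "$@"
}

# Demo in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/foo" "$demo_dir/bar baz"
list_literal "$demo_dir"
```

Setting QUOTING_STYLE=literal in the environment should have the same effect for GNU ls, without needing a wrapper.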

Mediawiki Upgrade

Upgrading Mediawiki through Git seemed like a cool idea and worked quite well for a long time. But since Mediawiki 1.25 the update process changed considerably and just wasn't fun any more. As updates are a rare occurrence anyway, I decided to switch back to tarballs instead. Let's try this, for Mediawiki 1.27:

 curl | gpg --import
 gpg --verify mediawiki-1.27.1.tar.gz.sig
 export DOCROOT=/var/www/
 cd $DOCROOT/mediawiki
 tar --strip-components=1 -xzf ~/mediawiki-1.27.1.tar.gz
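Since the tarball is unpacked straight over the live installation, it may be worth checking that the archive is readable before anything gets overwritten. A small sketch under that assumption; unpack_release and the demo paths are hypothetical, only the tar options are taken from the command above:

```shell
# unpack_release: hypothetical helper; extracts TARBALL into DEST, dropping
# the archive's top-level directory (same effect as the tar call above).
unpack_release() {
    tarball=$1 dest=$2
    # List the archive first; a truncated or missing download fails here,
    # before anything in DEST is touched.
    tar -tzf "$tarball" > /dev/null || return 1
    mkdir -p "$dest" &&
        tar --strip-components=1 -xzf "$tarball" -C "$dest"
}

# Demo with a throwaway archive instead of a real Mediawiki release.
work=$(mktemp -d)
mkdir -p "$work/src/mediawiki-x.y.z"
echo '<?php' > "$work/src/mediawiki-x.y.z/index.php"
tar -czf "$work/release.tar.gz" -C "$work/src" mediawiki-x.y.z
unpack_release "$work/release.tar.gz" "$work/docroot"
ls "$work/docroot"
```

This does not replace the gpg signature check; it only catches damaged archives.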
Perform the necessary (database) updates:
 cd $DOCROOT/mediawiki
 script -a -c "date; php maintenance/update.php --conf `pwd`/LocalSettings.php" ~/mwupdate.log 
While we're at it, re-generate the sitemap:
 cd $DOCROOT/mediawiki
 mkdir -p sitemap && chmod 0770 sitemap && sudo chgrp www-data sitemap
 sudo -u www-data MW_INSTALL_PATH=`pwd` php maintenance/generateSitemap.php \
     --conf `pwd`/LocalSettings.php --fspath `pwd`/sitemap --server \
     --urlpath --skip-redirects
Remove/disable clutter:
 cd $DOCROOT/mediawiki
 chmod 0 docs maintenance tests
 sudo touch {cache,images}/index.html
Don't forget to upgrade the extensions as well:
 cd ../piwik-mediawiki-extension-git
 git checkout master && git pull && git clean -dfx
 git archive --prefix=piwik-mediawiki-extension/ --format=tar HEAD | tar -C $DOCROOT/mediawiki/extensions/ -xvf -
 cd ../MobileFrontend-git
 git checkout master && git pull && git clean -dfx
 git archive --prefix=MobileFrontend/ --format=tar origin/REL1_27  | tar -C $DOCROOT/mediawiki/extensions/ -xvf -
And with that, the new version should be online :-)

Installing NRPE in OpenWRT

With at least OpenWRT 15.05, the NRPE package appears to be unmaintained. We could (or should) build the package manually, but before we do this, let's install an older version from our backups. For example:
$ ( cd ../backup/router/ && find . -name "*nrpe*" -o -name "check_*" | xargs tar -cf - ) | \
    ssh router "tar -C / -xvf -"
This should restore the NRPE binary, its configuration files and init scripts and all the check_* monitoring plugins. Did I mention that backups are important? :-)
With that, we're almost there:
 $ ldd /usr/sbin/nrpe => not found => not found => not found => /lib/ (0x77a64000) => /lib/ (0x779f7000) => /lib/ (0x77a88000)
Let's install the dependencies:
opkg install libopenssl libwrap
Add the nagios user:
echo 'nagios:x:50:' >> /etc/group
echo 'nagios:x:50:50:nagios:/var/run/nagios:/bin/false' >> /etc/passwd
echo 'nagios::16874:0:99999:7:::' >> /etc/shadow
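Running these echo lines twice would leave duplicate entries behind. A possible guard, shown as a sketch: add_line_once is a hypothetical helper, and the demo writes to a scratch file rather than the real /etc/group:

```shell
# add_line_once: hypothetical helper; appends LINE to FILE unless an entry
# with the same first field ("name:") is already present.
add_line_once() {
    file=$1 line=$2
    name=${line%%:*}
    grep -q "^${name}:" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file, not the real account database.
scratch=$(mktemp)
add_line_once "$scratch" 'nagios:x:50:'
add_line_once "$scratch" 'nagios:x:50:'   # second call is a no-op
cat "$scratch"
```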
Configure nrpe:
 $ grep ^[a-z] /etc/nrpe.cfg
 command[check_dummy]=/usr/libexec/nagios/check_dummy 0
 command[check_dns]=/usr/libexec/nagios/check_dns -H -s localhost -w 0.1 -c 0.5
 command[check_entropy]=/root/bin/ -w 1024 -c 512
 command[check_http]=/usr/libexec/nagios/check_http -H localhost -w 0.1 -c 0.5
 command[check_load]=/usr/libexec/nagios/check_load -w 4,3,2 -c 5,4,3
 command[check_ntp_time]=/usr/libexec/nagios/check_ntp_time -H -w 0.5 -c 1.0
 command[check_ssh]=/usr/libexec/nagios/check_ssh -4 router
 command[check_softwareupdate_opkg]=/root/bin/ opkg
 command[check_users]=/usr/libexec/nagios/check_users -w 3 -c 5
Let's try to start it, and enable it if it works:
 $ /etc/init.d/nrpe start
 $ ps | grep nrp[e]
 5320 nagios    2908 S    /usr/sbin/nrpe -c /etc/nrpe.cfg -d
 $ /etc/init.d/nrpe enable
And that's about it. Of course: since we're using an outdated NRPE version, we won't receive any (security) updates - so this setup should only be used in a trusted environment, i.e. not over the internet.

gpgkeys: HTTP fetch error 60: SSL certificate problem: Invalid certificate chain

After installing GnuPG from Homebrew, gpg was unable to connect to one of its key servers:
$ gpg --refresh-keys
gpg: refreshing 47 keys from hkps://
gpgkeys: HTTP fetch error 60: SSL certificate problem: Invalid certificate chain
The trick was to install their root certificate and mark it "trusted":
$ wget
$ open sks-keyservers.netCA.pem
	=> Trust always
Now the operation was able to complete:
$ gpg --refresh-keys
gpg: Total number processed: 47
gpg:              unchanged: 19
gpg:           new user IDs: 5
gpg:            new subkeys: 4
gpg:         new signatures: 1698
gpg:     signatures cleaned: 2
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:  19  signed:  12  trust: 0-, 0q, 0n, 0m, 0f, 19u
gpg: depth: 1  valid:  12  signed:   4  trust: 12-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2018-08-19

MacOS Gatekeeper: Verifying...

There's VLC installed on this Mac via Homebrew Cask and every time VLC starts up, the dreaded Verifying... progress bar comes up:
VLC verifying...
Now, this message of course is generated by MacOS Gatekeeper, trying to do its job. Eventually the verification completes and VLC is started - but the process repeats every time VLC starts! And it's only happening for VLC, it doesn't appear for other applications installed with Homebrew Cask.

Fortunately, there's an easy workaround to stop that behaviour - we need to remove the extended attribute:
$ xattr -l /Applications/BrewBundle/ 0002;5123a312;Safari;4CC444EB-4444-44A4-4C44-4B444FBC4444

$ sudo xattr -d /Applications/BrewBundle/
Now VLC can be started w/o the verification delay :-)

XFS: Corruption warning: Metadata has LSN ahead of current LSN

This just happened again on a different machine, right after running xfs_repair:
$ sudo xfs_repair /dev/mmcblk0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 0
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

$ echo $?

$ sudo mount -t xfs /dev/mmcblk0 /mnt/disk
mount: wrong fs type, bad option, bad superblock on /dev/mmcblk0,

$ sudo dmesg -t | tail 
XFS (mmcblk0): Mounting V5 Filesystem
XFS (mmcblk0): Corruption warning: Metadata has LSN (20:50596) ahead of current LSN (1:2). Please unmount and run xfs_repair (>= v4.3) to resolve.
XFS (mmcblk0): log mount/recovery failed: error -22
XFS (mmcblk0): log mount failed
What happened here? Apparently, with the XFS v5 superblock the userspace tools (xfsprogs) also changed.

And so it happened that xfs_repair version 3.2.1 tried to check an XFS file system that had already enabled its v5 superblock format. But the version is too old to handle v5 superblocks and left the file system in a corrupt state.

Luckily it's easy to fix:
 > Kernel v4.4 and later detects an XFS log problem which is only fixed by
 > xfsprogs v4.3 or later. If you have encountered the inability to mount an
 > xfs filesystem, please update to this version of xfsprogs and run
 > xfs_repair against the filesystem.
And indeed:
$ /opt/xfsprogs/sbin/xfs_repair -V
xfs_repair version 4.5.0

$ sudo /opt/xfsprogs/sbin/xfs_repair /dev/mmcblk0
Phase 7 - verify and correct link counts...
Maximum metadata LSN (20:50596) is ahead of log (1:2).
Format log to cycle 23.

$ sudo mount -t xfs /dev/mmcblk0 /mnt/disk
$ mount | tail -1
/dev/mmcblk0 on /mnt/disk type xfs (rw,relatime,attr2,discard,inode64,noquota)
Phew! :-)
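To avoid repeating this, a wrapper could refuse to call an xfs_repair older than the 4.3 named in the kernel message. A minimal sketch, assuming GNU sort (for -V) and the "xfs_repair version X.Y.Z" output format shown above; both function names are made up:

```shell
# version_ge A B: succeeds if version A >= version B.
# Relies on "sort -V" (version sort), a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# repair_version_ok: hypothetical guard; takes the line printed by
# "xfs_repair -V" and compares its last word against the 4.3 minimum.
repair_version_ok() {
    version_ge "${1##* }" 4.3
}

repair_version_ok "xfs_repair version 4.5.0" && echo "new enough"
repair_version_ok "xfs_repair version 3.2.1" || echo "too old, do not run"
```

In a real script the argument would come from "$(xfs_repair -V)" of whichever binary is about to be used.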

Character collation

So, recently I came across this funny behaviour on a SLES11sp4 machine:
sles11$ netstat -ni | awk '/^[a-z]/' 
Kernel Interface table
eth0   1500   0     3562      0      0      0     1955      0      0      0 BMRU
lo    16436   0       20      0      0      0       20      0      0      0 LRU
Wait, what? Why is the (uppercase) string "Kernel" matched against the lowercase "[a-z]" search expression? The same command on a SLES12sp1 machine does the Right Thing:
sles12$ netstat -ni | awk '/^[a-z]/' 
eth0   1500   0      685      0      0      0      438      0      0      0 BMRU
lo    65536   0       12      0      0      0       12      0      0      0 LRU
Apparently, this is not an unknown problem and can indeed be fixed by setting a different LC_COLLATE value:
$ netstat -ni | LC_COLLATE=C awk '/^[a-z]/' 
eth0   1500   0     3711      0      0      0     2032      0      0      0 BMRU
lo    16436   0       20      0      0      0       20      0      0      0 LRU
While providing a different LC_COLLATE variable did help, this still smells like a bug in SLES11, as the configured locales were exactly the same:
sles11$ locale 

sles11$ locale -k LC_COLLATE

sles11$ locale | md5sum 
677d9b3dbdf9759c8b604f294accd102  -

sles12$ locale | md5sum 
677d9b3dbdf9759c8b604f294accd102  -
Interestingly enough, both installations differ greatly in the way they look up locale information:
sles11$ echo | strace -e open awk '/^[a-z]/' 
open("/etc/", O_RDONLY)      = 3
open("/lib64/", O_RDONLY)     = 3
open("/lib64/", O_RDONLY)      = 3
open("/lib64/", O_RDONLY)      = 3
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3

sles12$ echo | strace -e open awk '/^[a-z]/' 2>&1 | grep -v ENOENT
open("/etc/", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/en_US.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
open("/usr/lib/locale/en_US.utf8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/en_US.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/en_US.utf8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/en_US.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
open("/dev/null", O_RDWR)               = 3
+++ exited with 0 +++
Alas, no bug has been reported yet :-\

While this appears to be documented behaviour, it's still very confusing and may even violate the Principle of Least Surprise. FWIW, GNU/grep behaves as expected on both systems, no matter the collation:
$ echo Abc | egrep --color '[[:lower:]]'
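For scripts that have to behave the same on both systems, pinning the locale for the one command that uses a range expression seems the safest route. A sketch; lower_start is a made-up name and the input is shortened netstat-like text:

```shell
# lower_start: print only lines that begin with an ASCII lowercase letter.
# Pinning LC_ALL=C for this one command makes [a-z] mean the byte range
# a..z, independent of the system's collation tables.
lower_start() {
    LC_ALL=C awk '/^[a-z]/'
}

printf 'Kernel Interface table\neth0 1500\nlo 16436\n' | lower_start
```

LC_ALL is used here instead of LC_COLLATE because LC_ALL, when already set in the environment, takes precedence over the individual LC_* variables.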

PS: I forgot to mention how cool SUSE Studio is - this SLE12 test VM was up & running in minutes and accessible via SSH too and I didn't even have to fire up my local VirtualBox instance! :-)

umask & symbolic links on MacOS X

This just annoyed me again:
$ umask 0022
$ touch foo
$ umask 0066
$ ln -s foo bar

$ ls -lgo foo bar
-rw-r--r--  1   0 Mar  9 14:17 foo
lrwx--x--x  1   3 Mar  9 14:17 bar -> foo

$ sudo -u nobody cat foo bar
OK, this seems to work (the permissions are checked on the target, not the symlink), but not so with directories:
$ umask 0022
$ mkdir -p foo/file
$ umask 0066
$ ln -s foo bar

$ ls -ldgo foo bar
drwxr-xr-x  3   102 Mar  9 15:02 foo
lrwx--x--x  1     3 Mar  9 15:03 bar -> foo

$ sudo -u nobody ls -l bar
ls: bar: Permission denied
lrwx--x--x  1 admin  wheel  3 Mar  9 14:23 bar
Interestingly enough, it works if we append a slash to the symlink:
$ sudo -u nobody ls -lgo bar/
total 0
drwxr-xr-x  2  68 Mar  9 14:24 dir
This is annoying when a user has a more stringent umask for normal use, but temporarily elevates their privileges to install software without adjusting the umask first. To clean up this mess afterwards, we can re-create the affected symbolic links:
$ umask 0022
$ find . -type l ! -perm -g+r | while read -r l; do
   target=$(readlink "$l") && rm -f "$l" && ln -svf "$target" "$l"
done
./bar -> foo

$ ls -ld foo bar
drwxr-xr-x  4 admin  wheel  136 Mar  9 14:37 foo
lrwxr-xr-x  1 admin  wheel    3 Mar  9 14:38 bar -> foo
Note: this has been seen in MacOS 10.10.5 on a Journaled HFS+ file system.
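The read loop above trips over file names containing newlines or leading blanks. A variant that hands the names to a small inline script via find -exec avoids that. This is only a sketch: relink_all is a hypothetical name, and the permission filter from the original command is left out here (it could be added back to the find expression):

```shell
# relink_all DIR: re-create every symlink below DIR with its old target,
# so the new link picks up the current umask (relevant on macOS/HFS+;
# on Linux symlink permissions are always lrwxrwxrwx).
relink_all() {
    find "$1" -type l -exec sh -c '
        for link; do
            target=$(readlink "$link") &&
                rm -f "$link" && ln -s "$target" "$link"
        done' sh {} +
}

# Demo in a scratch directory, with a space in the link name.
demo=$(mktemp -d)
mkdir "$demo/foo" && ln -s foo "$demo/bar baz"
relink_all "$demo"
readlink "$demo/bar baz"
```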


Every now and then I start up my OpenBSD VM to see how things are in BSD-land. And of course, after the VM has been asleep for a few months, I'd like to update the system too. As OpenBSD still uses CVS to manage their source repositories (for various reasons), we may have no other choice but to use it:
$ cd /usr/src/
$ time cvs -q up -rOPENBSD_5_8 -Pd
U usr.sbin/zic/zic.8
U usr.sbin/zic/zic.c
P usr.sbin/ztsscale/ztsscale.c
  158m51.96s real     0m16.85s user     7m34.07s system
The tree is about 780 MB in size and the update took 2.6 hours to complete. And we haven't even started the build yet. Wat?

There's an unofficial Git tree for openbsd-src, but before we revert to that, let's try the recommended alternative, CVSync.

Let's look at the available repositories first:
$ cvsync cvsync://
Name: openbsd, Release: rcs
 Comment: OpenBSD CVS Repository
Name: openbsd-cvsroot, Release: rcs
Name: openbsd-ports, Release: rcs
Name: openbsd-src, Release: rcs
Name: openbsd-www, Release: rcs
Name: openbsd-x11, Release: rcs
Name: openbsd-xf4, Release: rcs
Name: openbsd-xenocara, Release: rcs
We're just interested in openbsd-src for now:
$ sudo mkdir -m0775 /cvs && sudo chgrp wsrc /cvs       # We're not using doas yet.
$ cat /etc/cvsync_openbsd.conf
config {
       base-prefix /cvs

       collection {
               name openbsd-src release rcs
               umask 002
       }
}

$ cvsync -c /etc/cvsync_openbsd.conf 
The initial sync took well over 3 hours to complete, but successive runs tend to complete in a few minutes, much less than updating with plain cvs.

However, the result is not usable yet:
$ ls -1 /cvs/src/sys/arch/`uname -m`/conf          
No, we have to checkout a local copy now, before we can start using it:
$ cd /usr/src
$ cvs -d /cvs checkout -P src
$ cvs -d /cvs up -Pd
Only now will we be able to actually update the system. By comparison, the Git checkout was quick and so much less painful:
$ time git clone openbsd-src-git
real    12m57.329s
user    4m5.468s
sys     0m54.316s

$ cd $_
$ ls -1 sys/arch/`uname -m`/conf

Vacation pictures

The holidays are over and I had to dig through heaps of vacation pictures and wanted to create a little photo gallery for my fellow relatives to click through. After past experiments with Zenphoto and Piwigo, I wanted to switch to a much simpler solution. One that wouldn't require a database backend and maybe didn't break after a few update cycles.

Looking at static image gallery generators I decided to try llgal again. The command line switches are more difficult to remember than tar, but here we go:
llgal --www --sort revtime --ct %Y-%m-%d -a -d . -k --title "Pictures of Foo"
This will process pictures in the current directory, with the following options:
--www           make all llgal files world-readable
--sort revtime  sort pictures in reverse-mtime (oldest pictures on top)
--ct %Y-%m-%d   use image timestamps as captions, YYYY-mm-dd
-a              write image sizes under thumbnails on index page
-d              operate in directory <dir>
-k              use the image captions for the HTML slide titles
--title         title of the index of the gallery
So far, so good. But some obstacles had to be tackled first:
  • Each picture on the camera was ~3-5 MB and I didn't want to upload these large files to the gallery. So I resized the pictures with some photo program (not with GraphicsMagick) but now the files' mtime got mangled. GNU/touch was able to fix this.
  • The pictures were taken with two cameras. Unfortunately, one of the cameras had its system time off by two hours - this had to be fixed as well.
As all the pictures (from both cameras) are now in one directory, this is what it looked like:
$ exiftool -s DSCN_001.jpg IMG_002.jpg | grep ^DateTimeOriginal
DateTimeOriginal                : 2015:12:23 18:01:00
DateTimeOriginal                : 2015:12:23 16:03:00
In reality, DSCN_001.jpg was taken at 16:01 and should be listed before IMG_002.jpg. Luckily exiftool is able to correct the EXIF data:
export delta="00:00:00 02:00:00"            # format is YY:mm:dd HH:MM:SS
ls DSCN* | while read -r f; do
  echo "FILE: $f"
  exiftool -P -ModifyDate-="$delta" -DateTimeOriginal-="$delta" -CreateDate-="$delta" "$f"
  touch -r "$f" -d '-2 hours' "$f"
done
Although we corrected the file's mtime already, it was still mangled by the previous export step. Let's just extract the exact date from the EXIF data and correct the mtime again:
ls *JPG | while read -r f; do
  echo "FILE: $f"
  TZ=PST8PDT touch -d "$(exiftool -d %Y-%m-%d\ %H:%M:%S -s "$f" | awk '/^DateTimeOriginal/ {print $3,$4}')" "$f"
done
After another llgal run, the pictures were now listed in their correct order and ready to be consumed :-)
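When exiftool is called without -d, it prints dates in the EXIF form with colons in the date part, which touch -d does not accept. The conversion is a one-line sed; a sketch with a made-up function name:

```shell
# exif_to_iso: turn an EXIF timestamp ("YYYY:MM:DD HH:MM:SS") into the
# "YYYY-MM-DD HH:MM:SS" form that GNU touch -d and date -d understand.
# Only the two colons in the date part are replaced.
exif_to_iso() {
    printf '%s\n' "$1" | sed 's/^\([0-9]\{4\}\):\([0-9]\{2\}\):/\1-\2-/'
}

exif_to_iso '2015:12:23 18:01:00'    # prints 2015-12-23 18:01:00
```

With that, something like touch -d "$(exif_to_iso "$stamp")" "$f" would set the mtime, where stamp is a hypothetical variable holding the DateTimeOriginal value.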

RTNETLINK answers: No such process

A colleague of mine presented me with a weird routing problem today and it took me a while to understand what was going on. The task was simple: add a network route via a certain gateway that can only be reached via a certain network interface. Let's re-create the setup:
# ip addr change dev eth2
# ip link set eth2 up
# ip addr show dev eth2 scope global
3: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 08:00:27:d0:34:51 brd ff:ff:ff:ff:ff:ff
    inet scope global eth2
Let's add a new route then:
# ip route add via dev eth2
RTNETLINK answers: No such process
Huh? Our eth2 is UP and should be able to reach, right? Let's look at the routing table1):
# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
                                                U         0 0          0 eth0
                                                UG        0 0          0 eth0
Aha! For some reason the machine has lost its network route on the eth2 interface. Well, the machine has been online for a while and we don't know which admin did what and why. But although eth2 is configured and UP, it cannot reach its own network w/o a network route. Of course, the "ip addr change" does that automatically2) and we staged the whole thing for illustration purposes.

Let's add the missing route and try again:
# ip route add dev eth2 
# netstat -rn 
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
                                                U         0 0          0 eth2
                                                U         0 0          0 eth0
                                                UG        0 0          0 eth0

# ip route add via dev eth2
# netstat -rn 
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
                                                UG        0 0          0 eth2
                                                U         0 0          0 eth2
                                                U         0 0          0 eth0
                                                UG        0 0          0 eth0
Yay! :-)

1) Sometimes the output from the iproute2 tools is not as easy to parse and I'll use good ol' net-tools again.
2) Unless we were to assign a /32 address to the interface, e.g. "ip addr change dev eth2"
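The underlying rule is that a gateway is only accepted if it is reachable through an existing route, and the connected route comes from the address prefix on the interface. The check can be mimicked in plain shell arithmetic. This is a sketch for illustration, not how the kernel implements it; the function names are made up and the addresses are from the RFC 5737 documentation ranges, not from the setup above:

```shell
# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
# The subshell body keeps the IFS change and "set --" local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_subnet ADDR NET/PREFIX: succeeds if ADDR lies inside NET/PREFIX.
in_subnet() (
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
)

# Interface address with a /24 prefix: the gateway is covered.
in_subnet 192.0.2.254 192.0.2.10/24 && echo "gateway is on-link"
# Same address as a /32: no connected network, so no way to the gateway.
in_subnet 192.0.2.254 192.0.2.10/32 || echo "gateway needs an explicit route"
```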

You say Tomato, I say Shibby

So, this Netgear router went on to become a brick and could not be resurrected from the dead. But I had an old WRT54GL still lying around that could be used until I decide which router to buy as a replacement.

Unfortunately, as the WRT54GL is now 10 years old, neither DD-WRT nor OpenWRT ships any recent version of their firmware for this model. So what else is out there?

Enter Tomato, yet another firmware-for-your-router-project. And while the original project appears to be dormant, many mods were created and some of them are still active. I went with the most recent one, called Tomato by Shibby which offers images with both the Linux 2.4 and 2.6 kernel. I went with the latest K26 release (for MIPS1) that would fit into the 4 MB of flash memory on this router:
$ w3m | grep MIPSR1 | sort -nk4
file         2015-08-06 3610
file             2015-08-06 3786
file          2015-08-06 4157
Verify the checksum:
$ wget
$ md5sum -c MD5SUM 2>&1 | grep OK
image/tomato-K26-1.28.RT-MIPSR1-131-MiniIPv6.trx: OK
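Grepping for OK works for a quick look, but before flashing a router an exit code is easier to act on. A sketch assuming GNU md5sum; verified is a hypothetical helper and the demo uses a throwaway file instead of a firmware image:

```shell
# verified: hypothetical gate; succeeds only if every file listed in the
# checksum file matches. --status (GNU md5sum) suppresses all output and
# reports the result via the exit code alone.
verified() (
    cd "$(dirname "$1")" && md5sum --status -c "$(basename "$1")"
)

# Demo with a throwaway "image".
demo=$(mktemp -d)
echo 'firmware bytes' > "$demo/image.trx"
( cd "$demo" && md5sum image.trx > MD5SUM )
verified "$demo/MD5SUM" && echo "checksum OK, safe to flash"
echo 'bit rot' >> "$demo/image.trx"
verified "$demo/MD5SUM" || echo "checksum mismatch, do not flash"
```

Note that this fails when the checksum file lists images that were never downloaded; newer GNU coreutils releases have an --ignore-missing option for that case.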
The Tomato image can be installed through the DD-WRT or OpenWRT GUI, as the firmware should be recognized by both systems. On OpenWRT, this can also be done from the command line:
sysupgrade -v -n tomato.trx
With that, Tomato by Shibby was up & running on this old WRT54GL :-)
$ ssh router
Tomato v1.28.0000 MIPSR1-131 K26 MiniIPv6
 Welcome to the Linksys WRT54G/GS/GL [router]
 Uptime:  14:15:23 up 1 day, 14:40
 Load average: 0.22, 0.05, 0.01
 Mem : used 92.2% (11.78 of 12.78 MB)
 WAN : @ 00:12:23:34:45:56
 LAN : @ DHCP: -
 WL0 : SSID @ channel: Worldwide9 @ 00:12:34:56:78:90

VirtualBox network performance

Some time ago I had some network performance issues with a VirtualBox guest and I was able to solve them by switching to a different NIC type. But I wanted to find out how the different types perform and whether there's a difference between the network modes too. And yes, there is! :-)

Results I

After some test runs, here are the results:
 HOST: Debian/GNU Linux 8.2 / Kernel 4.2.0 x86_64 (vanilla) / VirtualBox 5.0.4
GUEST: Debian/GNU Linux unstable / Kernel 4.2.0-trunk-amd64

Am79C970A / hostonly  580 Mbits/sec
Am79C970A / bridged  473 Mbits/sec
Am79C970A / natnetwork  640 Kbits/sec 1)
Am79C970A / nat  396 Mbits/sec

Am79C973 / hostonly  569 Mbits/sec
Am79C973 / bridged  285 Mbits/sec
Am79C973 / natnetwork  640 Kbits/sec
Am79C973 / nat  438 Mbits/sec

82540EM / hostonly  1.89 Gbits/sec
82540EM / bridged  1.86 Gbits/sec
82540EM / natnetwork  640 Kbits/sec
82540EM / nat  449 Mbits/sec

82543GC / hostonly  1.85 Gbits/sec
82543GC / bridged  1.91 Gbits/sec
82543GC / natnetwork  640 Kbits/sec
82543GC / nat  357 Mbits/sec

82545EM / hostonly  1.85 Gbits/sec
82545EM / bridged  1.90 Gbits/sec
82545EM / natnetwork  640 Kbits/sec
82545EM / nat  389 Mbits/sec

virtio / hostonly  705 Mbits/sec
virtio / bridged  682 Mbits/sec
virtio / natnetwork  640 Kbits/sec
virtio / nat  129 Mbits/sec
The clear winner appears to be 82543GC (Intel PRO/1000 T Server) for bridged mode or 82540EM (Intel PRO/1000 MT Desktop) for hostonly mode.

Results II

And again on a (slower) MacOS X host:
 HOST: MacOS 10.10.5 / X86_64 / VirtualBox 5.0.4
GUEST: Debian/GNU Linux 8.0 / Kernel 4.1

NIC: Am79C970A / MODE: hostonly  29.6 MBytes/sec
NIC: Am79C970A / MODE: bridged  29.9 MBytes/sec
NIC: Am79C970A / MODE: natnetwork  25.2 MBytes/sec
NIC: Am79C970A / MODE: nat  25.8 MBytes/sec

NIC: Am79C973 / MODE: hostonly  28.7 MBytes/sec
NIC: Am79C973 / MODE: bridged  30.0 MBytes/sec
NIC: Am79C973 / MODE: natnetwork  1.38 MBytes/sec
NIC: Am79C973 / MODE: nat  23.4 MBytes/sec

NIC: 82540EM / MODE: hostonly  45.4 MBytes/sec
NIC: 82540EM / MODE: bridged  38.2 MBytes/sec
NIC: 82540EM / MODE: natnetwork  61.3 MBytes/sec
NIC: 82540EM / MODE: nat  47.0 MBytes/sec

NIC: 82543GC / MODE: hostonly  43.0 MBytes/sec
NIC: 82543GC / MODE: bridged  44.7 MBytes/sec
NIC: 82543GC / MODE: natnetwork  64.7 MBytes/sec
NIC: 82543GC / MODE: nat  49.3 MBytes/sec

NIC: 82545EM / MODE: hostonly - (VM would not start)
NIC: 82545EM / MODE: bridged - (VM would not start)
NIC: 82545EM / MODE: natnetwork - (VM would not start)
NIC: 82545EM / MODE: nat - (VM would not start)

NIC: virtio / MODE: hostonly  43.3 MBytes/sec
NIC: virtio / MODE: bridged  46.6 MBytes/sec
NIC: virtio / MODE: natnetwork  10.9 MBytes/sec
NIC: virtio / MODE: nat  13.8 MBytes/sec
Here, the winner appears to be virtio for bridged mode and again 82540EM (Intel PRO/1000 MT Desktop) for hostonly mode. This time, both nat and natnetwork were working, with very different performance patterns.

Results III

On a different system, the iperf results varied greatly and I decided to run the test script longer and multiple times:
for a in {1..10}; do
   echo "### $a -- `date`"
   ~/bin/ vm0 300 2>&1 | tee vbox_nic_"$a".log
done
Looking at the report files we can already see that the "hostonly" network mode was the fastest, so let's run the report function over all the output files and sort by the fastest NIC:
$ for a in vbox_nic_*.log; do
   ~/bin/ report $a | grep hostonly | sort -u
done | sort -nk6 | tail -5
NIC: 82540EM / MODE: hostonly  228 MBytes/sec
NIC: 82540EM / MODE: hostonly  228 MBytes/sec
NIC: 82545EM / MODE: hostonly  228 MBytes/sec
NIC: 82543GC / MODE: hostonly  229 MBytes/sec
NIC: 82540EM / MODE: hostonly  231 MBytes/sec
So, either NIC (82540EM or 82543GC) should be the fastest in our setup.
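For reference, the test matrix behind these numbers is every NIC type combined with every network mode. The following is not the script used above, just a hypothetical dry-run sketch that prints the VBoxManage calls for one VM. Host-only, bridged and NAT-network modes also need their adapter or network name set, which is omitted here:

```shell
# nic_matrix VM: print (not execute) one VBoxManage call per NIC type and
# network mode, in the same order as the result tables.
nic_matrix() {
    vm=$1
    for nic in Am79C970A Am79C973 82540EM 82543GC 82545EM virtio; do
        for mode in hostonly bridged natnetwork nat; do
            echo "VBoxManage modifyvm $vm --nictype1 $nic --nic1 $mode"
        done
    done
}

nic_matrix vm0
```

Each printed command would have to be run while the VM is powered off, followed by a VM start and an iperf run.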
1) For some reason, I couldn't get the new natnetwork mode to work on Linux. iperf measured "640 Kbits/sec" while in fact no data was transferred:
HOST$ iperf -t 3 -c -p 15001
Client connecting to, TCP port 15001
TCP window size: 2.50 MByte (default)
[  3] local port 51056 connected with port 15001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-18.5 sec  3.06 MBytes  1.39 Mbits/sec

GUEST$ sudo tcpdump -ni eth2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 262144 bytes
17:05:36.569862 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0
17:05:39.574354 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0
17:05:42.579472 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0
17:05:45.584319 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0
17:05:48.589318 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0
17:05:51.593294 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0
17:05:54.594851 IP > Flags [S], seq 6583, win 32768, options [mss 1460], length 0

More memory with ZRAM

Some of my machines have only little memory and I looked for a better way to utilize what little memory there is in the system. Without being able to increase the physical memory available, there are basically 3 options here:

  • Disable unused applications
  • Reduce the memory footprint of running applications
  • Cheat :-)
After exhausting the first two options, I remembered some memory management mechanisms for the Linux kernel that were introduced a while ago but I've never used them so far:


KSM (Kernel Samepage Merging) is a memory-saving de-duplication feature but it's only really useful for applications using the madvise(2) system call and is often used when hosting virtual machines, e.g. KVM. I'm not running KVM on this machine but activated it anyway - but no application seems to use madvise(2) and it didn't help anything regarding memory usage.


There's zswap, a lightweight compressed cache for swap pages. But instead of a real swap device, the compression takes place in a dynamically allocated memory pool. To enable it, the system must be booted with zswap.enabled=1 but I didn't want to reboot my system just yet, so I skipped that option for now.

Update: I've enabled zswap in the same VM from the zram test below and ran the same test - but the results are rather puzzling:
$ awk '{print $NF}' /proc/cmdline 

$ i=0; while true; do
   [ $i -gt 2000 -a `expr $i % 50` = 0 ] && printf "$i  "
   bash -c "sleep 10000 &"; i=$((i+1))
done
2050  2100  2150  2200  2250  2300  ^C

$ free -m
              total    used  free  shared  buff/cache  available
Mem:            241    192       3       0     46        7
Swap:           127    127       0

$ pgrep -c sleep

$ grep -r . /sys/kernel/debug/zswap/
We max out at ~2300 instances of bash & sleep which is even less than when running without any compression...?


zram has been around for a while now and looked like the most promising contender. On a machine with 1GB RAM, I'd allocate 75% for our compressed swap device:
$ modprobe zram
$ echo 768M > /sys/block/zram0/disksize
$ mkswap /dev/zram0
$ swapon -p2 /dev/zram0
The machine is quite busy and it doesn't take long until it starts swapping to our new swap device1):
$ grep . /sys/block/zram0/{num_{reads,writes},{compr,orig}_data_size}
The compression ratio is quite good, we're using only 42% of our precious real memory. I wanted to do some tests though to see if this can be measured in some kind of micro benchmark. In a 256MB Fedora Linux VM, we started GNU/bash along with /bin/sleep over and over again, let's see how far we got:
$ i=0; while true; do
   [ $i -gt 2400 -a `expr $i % 50` = 0 ] && printf "$i  "
   bash -c "sleep 10000 &"; i=$((i+1))
done
2450  2500  2550  2600  2650  2700 ^C

$ pgrep -c sleep

$ free -m
              total    used  free  shared  buff/cache  available
Mem:            241    192       3       0     45        5
Swap:           127    127       0
All memory is used up and starting any more programs is almost impossible now. This was repeatable, it always stopped around ~2700 instances and then came to a grinding halt. Let's try again with ZRAM:
$ pkill sleep
$ modprobe zram && echo 128M > /sys/block/zram0/disksize && mkswap /dev/zram0 && swapon -p2 /dev/zram0
$ i=0; while true; do
   [ $i -gt 2500 -a `expr $i % 100` = 0 ] && printf "$i  "
   bash -c "sleep 10000 &"; i=$((i+1))
done
2600  2700  2800  2900  3000  3100  3200 ^C

$ pgrep -c sleep

$ free -m
              total    used  free  shared  buff/cache  available
Mem:            241    186       2       0     52        6
Swap:           255    209      46
With ZRAM enabled, it maxes out at ~3100, and makes it up to 3200 if we wait a bit longer (although we still seem to have 46MB free swap available). Again, this is also repeatable. And since we're only starting the same program over and over again, our compression ratio is even better1):
$ grep . /sys/block/zram0/{num_{reads,writes},{compr,orig}_data_size}
Btw, did someone say DriveSpace? :-)
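The compression ratio quoted above is simply compr_data_size divided by orig_data_size. A small sketch of that arithmetic; zram_ratio is a made-up name and the byte counts are invented for illustration, not measurements from this machine:

```shell
# zram_ratio COMPR ORIG: compressed size as a percentage of original size.
zram_ratio() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f%%\n", 100 * c / o }'
}

# Invented byte counts: 42 MiB compressed out of 100 MiB original.
# On a live system (with kernels that still provide these files) the
# arguments would be the contents of /sys/block/zram0/compr_data_size
# and /sys/block/zram0/orig_data_size.
zram_ratio 44040192 104857600    # prints 42.0%
```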

1) Note: these sysfs entries will be deprecated in future kernel versions.

Filesystem data checksumming, pt. II

After my last post on filesystem data checksumming it took me a while until I could convince myself to actually set up regular checks of all the (important) files on my filesystems. The "fileserver" is a somewhat older machine and checksumming ~1.5TB of data takes almost 4 (!) days. Admittedly, the fact that I chose SHA-256 as a hashing algorithm seems to contribute to this long runtime. This being a private file server, MD5 would probably have been more than enough.

But I wanted to know if this would really make a difference and wrote a small benchmark script, testing different programs and different digests on a particular machine. As always, the results will differ greatly from machine to machine - the following results are for this PowerBook of mine:
$ time ./ test.img 30 2>&1 | tee out.log
=> This took 3.5 hours to complete!

$ grep ^TEST out.log | egrep -v 'rhash_benchmark|SKIPPED' | sort -nk7
TEST: coreutils / DIGEST: md5 / 58 seconds over 30 runs
TEST: openssl / DIGEST: sha1 / 64 seconds over 30 runs
TEST: rhash / DIGEST: sha1 / 64 seconds over 30 runs
TEST: openssl / DIGEST: md5 / 75 seconds over 30 runs
TEST: rhash / DIGEST: md5 / 84 seconds over 30 runs
TEST: perl / DIGEST: sha1 / 121 seconds over 30 runs
TEST: rhash / DIGEST: sha224 / 140 seconds over 30 runs
TEST: openssl / DIGEST: sha224 / 141 seconds over 30 runs
TEST: rhash / DIGEST: sha256 / 141 seconds over 30 runs
TEST: openssl / DIGEST: sha256 / 169 seconds over 30 runs
TEST: coreutils / DIGEST: sha1 / 177 seconds over 30 runs
TEST: rhash / DIGEST: ripemd160 / 305 seconds over 30 runs
TEST: openssl / DIGEST: ripemd160 / 447 seconds over 30 runs
TEST: perl / DIGEST: sha256 / 637 seconds over 30 runs
TEST: perl / DIGEST: sha224 / 641 seconds over 30 runs
TEST: coreutils / DIGEST: sha256 / 653 seconds over 30 runs
TEST: coreutils / DIGEST: sha224 / 657 seconds over 30 runs
TEST: perl / DIGEST: sha384 / 660 seconds over 30 runs
TEST: perl / DIGEST: sha512 / 661 seconds over 30 runs
TEST: rhash / DIGEST: sha512 / 693 seconds over 30 runs
TEST: openssl / DIGEST: sha384 / 694 seconds over 30 runs
TEST: rhash / DIGEST: sha384 / 695 seconds over 30 runs
TEST: openssl / DIGEST: sha512 / 696 seconds over 30 runs
TEST: coreutils / DIGEST: sha512 / 1513 seconds over 30 runs
TEST: coreutils / DIGEST: sha384 / 1515 seconds over 30 runs
I've marked two entries here:
  • Originally I used coreutils to calculate a SHA-256 checksum of each file. In the test run above this takes 11 times longer to complete than MD5 would have taken.
  • Even if I decide against MD5 and choose SHA-1 instead, I'd have to switch to openssl because for some reason coreutils takes almost 3 times longer to complete.
The outcome of these tests means that I'll probably switch to MD5 for my data checksums - this also means that I have to 1) re-generate an MD5 checksum for all files and 2) remove the now-obsolete SHA-256 from all files :-\

Update 1: I omitted cksum and sum from the tests above, as they're not necessarily faster than other checksum tools:
$ n=30
$ for t in sum cksum openssl\ {md4,md5}; do
    START=$(date +%s)
    for a in `seq 1 $n`; do
        $t test.img > /dev/null
    done
    END=$(date +%s)
    echo "TEST: $t / $(echo $END - $START | bc -l) seconds over $n runs"
done | sed 's/ md/_md/' | sort -nk4
TEST: openssl_md4 / 56 seconds over 30 runs
TEST: md5sum / 58 seconds over 30 runs
TEST: sum / 75 seconds over 30 runs
TEST: openssl_md5 / 76 seconds over 30 runs
TEST: cksum / 78 seconds over 30 runs
But again: these tests will have to be repeated on different systems, it could very well be that cksum might really be faster than everything else on another machine - maybe not :-)
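To make repeating the measurement on another machine less tedious, the timing part can be wrapped in a function. A sketch only; time_runs is a made-up name, and like the loop above it measures whole wall-clock seconds via date +%s:

```shell
# time_runs N CMD...: run CMD N times, discard its output and print the
# elapsed wall-clock time in whole seconds. Stops early if CMD fails.
time_runs() {
    n=$1; shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical usage; test.img stands in for the benchmark file:
#   echo "TEST: md5sum / $(time_runs 30 md5sum test.img) seconds over 30 runs"
#   echo "TEST: openssl_md5 / $(time_runs 30 openssl md5 test.img) seconds over 30 runs"
time_runs 3 true
```

Passing the command as separate arguments avoids the word-splitting trick needed for "openssl md5" in the loop above.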

Update 2: And it helped indeed: removing the SHA-256 checksum and calculating & attaching the MD5 checksum on 1.5TB of data (88k files) took "only" 31 hours. Which is still a lot, but a lot shorter than the "almost 4 days" we had with SHA-256 :-) Also, the next run won't have to remove the old checksum - it only has to do the verification step. What skewed this number even more was the fact that backups were running on the machine while it was doing the re-calculating stuff, so hopefully the next run will be even shorter.

Update 3: Just to document the cronjob running these regular checks from now on:
0 4 1 */2 *  root  /usr/local/sbin/ all
This will be run on the first day of every second month at 4am.

Update 4: I just had to verify the checksum of two ISO images and did another comparison on the same PowerBook G4 machine:
$ ls -goh *.iso
-rw-r--r-- 1 3.2G Jul 24 16:20 file1.iso
-rw-r--r-- 1  440 Jul 24 19:58 file1.iso.sum
-rw-r--r-- 1 4.2G Jul 24 16:51 file2.iso
-rw-r--r-- 1  440 Jul 24 19:58 file2.iso.sum

$ for a in md5 sha1 sha256 sha512; do echo "$a"; time "$a"sum -c *.iso.sum; echo; done
real    8m17.404s
user    0m56.588s
sys     0m28.444s

real    11m12.638s
user    3m20.220s
sys     0m28.044s

real    21m12.057s
user    12m47.092s
sys     0m37.156s

real    40m56.836s
user    29m55.444s
sys     0m39.684s
So, each of the chosen "stronger" algorithms basically doubles the execution time of the next "weaker" one. Again, md5 is more than enough for our use case here.