Splitting and joining tar archives

Filed under: Linux

This is just a note to myself about splitting a tar archive into multiple smaller files on the fly and joining them again. It’s based on an answer to a question on Stack Overflow.

# create archives
$ tar cz . | split -b 100GiB - /mnt/8TB/backup.tgz_

This uses the tar command to create a gzipped archive of all files in the directory tree starting at the current directory. The output is written to stdout.
This output is then piped into the split command, which splits it into multiple files of 100 GiB each and writes them using the given prefix, appending aa, ab, ac etc. The "-" tells split to read from stdin rather than from a file.
The input is massive and contains very many hard links (it’s a daily dirvish backup dating back 2 years).

To restore from this backup, the following command will be used:

# uncompress
$ cat /mnt/8TB/backup.tgz_* | tar xz

This uses the cat command to pipe all input files to stdout, implicitly concatenating them in the process. The result is then piped into tar again, which extracts the gzipped archive to the current directory. (I have yet to try that, the backup is still running.)
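
Since the restore is still untested, the split/join round trip can at least be rehearsed on throwaway data first. A minimal sketch (the temp directory, file names and the tiny 1 KiB chunk size are made up for the rehearsal; GNU tar and split are assumed):

```shell
# rehearse the split/join round trip on throwaway data
cd "$(mktemp -d)"
mkdir data && echo hello > data/file.txt

# create and split the archive, as above, but with tiny 1 KiB chunks
tar czf - data | split -b 1KiB - backup.tgz_

# join the pieces again and list the contents instead of extracting
cat backup.tgz_* | tar tzf -
```

If the listing shows the expected files, the pieces concatenate back into a valid archive, and the same cat | tar xz pipeline will extract them.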

Posted on 2015-04-27 at 10:49

SyncThing as an alternative to BitTorrentSync (btsync)

Filed under: Android, Linux, Windows

A while ago I blogged about using BitTorrentSync as a privacy-conscious alternative to the more popular cloud services like Google Drive or DropBox.

BitTorrent recently released btsync version 2 which, apart from trying to sell you a so-called PRO version, changed the user interface yet again and also significantly changed the way you set up your installations. There seems to be no upgrade path; you have to configure your peers all over again. And, in case that’s not enough incentive to look for an alternative, the new Windows version no longer runs on Windows server OSes.

One possible alternative is SyncThing. It’s also peer-to-peer and the configuration is quite similar, but in contrast to btsync it is open source. It is available for most desktop OSes and also for Android.

I tested the (official) 64 bit versions for Windows and Linux and so far they have worked. For our Ubuntu server I used the init.d script from the SyncThing forum. On Windows I just started the program in the console.

The next step is the Android version.

Posted on 2015-04-08 at 16:52

On expiring Dirvish images

Filed under: Linux

Dirvish is a backup solution for Linux (and probably other Unix-like OSes). I use it to make a daily backup of one server to another server located in a different building (it’s not the only backup solution we use, but the most convenient one because we can access these files easily). Once set up, it runs automatically, and I have configured it to send me an email with the result of the backup and the remaining free hard disk space. I’m not the only one who does that.

The backup server has multiple 2 TB disks mounted as a single btrfs volume, so the resulting disk space is huge. The backup ran flawlessly for over a year until it started running out of space. So now is the first time I actually have to think about expiring old backups. (Bad Thomas, you should have given that a little more thought from the beginning.)

The way Dirvish handles expiry is like this:

You specify a default expiry rule and optionally more detailed expiry rules in either /etc/dirvish/master.conf or in the vault’s default.conf file. If you don’t change anything, most Dirvish installations will set the default to +1 year:

expire-default: +1 year

All this will do is add an Expire entry to the summary file of each image. To actually expire anything you must call dirvish-expire, but that call is usually added to cron automatically by the dirvish package via the /etc/dirvish/dirvish-cron shell script:

# other stuff ...
/usr/sbin/dirvish-expire --quiet && /usr/sbin/dirvish-runall --quiet

Now, expiring every image after one year is probably not the best backup strategy. You usually want to keep one copy per week, per month and per year for a longer period, maybe even forever. So more complex rules are required; they must be added to either /etc/dirvish/master.conf or the default.conf of an individual vault. I opted for keeping Friday backups forever and deleting everything else after 9 months, so the configuration looks like this:

expire-default: +9 months

# keep Friday backups forever
# (for everything else we use the default from above)
#       MIN    HR      DOM     MON     DOW     STRFTIME_FMT
        *       *       *       *       fri     never

The rules follow crontab format, so the line above means:

  • ignore the minute
  • ignore the hour
  • ignore the day of the month
  • ignore the month
  • only for Friday

Here is a good explanation about how these rules work.

Now, since all this does is add an entry to the summary file of each image, I have a problem: this takes care of all future backups, but all existing images were created with an expire-default of +1 year, so they contain corresponding entries like this:

Image: 2014-02-21_12-00
Reference: 2014-02-20_12-00
Image-now: 2014-02-21 12:00:02
Expire: +1 year == 2015-02-21 12:00:02

So the image was taken on 21 FEB 2014 and will be expired on 21 FEB 2015. That is a Friday backup, so I want it to be kept forever. Other images also have an expire entry of +1 year but I have to free up some disk space and therefore want to expire them after 9 months already.

What this means is that I need to change the expire entry in the summary files. All two hundred and something of them. That’s not something you want to do by hand because it’s boring and error prone.

I could probably write a shell script (but since I rarely do that, it would also be quite error prone) or a Perl script (same problem, even though I have more practice with those). So I’ll write a Delphi program and access the files via a Samba share.

Posted on 2015-02-16 at 12:09

mounting a Samba share

Filed under: Linux

The Linux mount command can also access Samba (Windows) shares, but in contrast to the smbclient command it does not do a NetBIOS-based lookup for machine names. So while

smbclient //server/share

will work, the corresponding

mount -t cifs //server/share /mnt/point

will tell you that it can’t resolve the host name (unless you add the host to your hosts file or it can be looked up via DNS).

A StackExchange answer pointed me in the right direction:

There is a command for actually doing that lookup. It’s called nmblookup.
It returns the IP address of the server like this:

nmblookup server
192.0.2.1 server<00>

While this is fine for looking it up manually, it won’t do if you want to mount a share multiple times or in a shell script, because you need the IP address only, not the suffix after it.

It gets even worse if the machine in question has more than one IP address:

nmblookup server2
192.0.2.2 server2<00>
192.0.2.3 server2<00>
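
Peeling the first IP address out of that output is a one-liner. A sketch using canned sample output (the 192.0.2.x addresses are documentation placeholders):

```shell
# canned two-address nmblookup output (documentation IPs)
RES="192.0.2.2 server2<00>
192.0.2.3 server2<00>"

# unquoted expansion folds the lines into one; cut keeps the first word,
# i.e. the first IP address
IP=$(echo $RES | cut -d ' ' -f 1)
echo "$IP"   # → 192.0.2.2
```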

Bash to the rescue (I found the solution via a StackOverflow question and an article):

#!/bin/bash

MACHINE=$1
shift  # Remove machine name from argument list

SHARE=$1
shift  # Remove share name from argument list

# nmblookup the machine
RES=$(nmblookup $MACHINE)
#echo "RES=\"${RES}\""

# remove everything but the ip address (the first word of the output);
# note that this will not return anything meaningful
# if nmblookup returns an error (e.g. cannot find the machine)
IP=$(echo ${RES} | cut -d ' ' -f 1)
#echo "IP=\"${IP}\""

# Mount the share read-only (passing any additional arguments on to mount)
mount -t cifs -r //${IP}/${SHARE} "$@"

Put this code into a file, e.g. mount-win-share, make it executable and call it like

mount-win-share server share /mnt/point

If you add additional parameters or options, they will be passed on to mount.

Posted on 2014-10-09 at 18:18

Creating a new RAID 5

Filed under: Linux

Another reminder to myself, so I don’t forget it again.

Warning: Use this at your own risk! You might lose all the data stored on any of the hard disk drives if you make a mistake!

To create a new RAID, all the disks must be partitioned first.

To actually create the RAID we need the tool mdadm, which is not installed (on Ubuntu Server) by default.

apt-get install mdadm

This will also install a few dependencies, in particular it will install a mail transfer agent (MTA, postfix in my case). This MTA needs to be configured so it can send e-mails to the administrator (root).

Creating the RAID is as easy as typing:

mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

mdadm might detect that the disks have already been used in a different RAID and will warn you. It then gives you the option to continue creating the new array or to abort.

Create an ext3 file system on the newly created RAID device with the label “daten1”:

mkfs --type=ext3 -L daten1 /dev/md0

This takes quite a while.

To automatically start the RAID, it must be added to mdadm.conf:

mdadm -Es | grep 'md[0-9]' >> /etc/mdadm/mdadm.conf

Note that this appends to mdadm.conf, so if you execute it multiple times you will get duplicate entries. Make sure to check the file afterwards. (The grep pattern is quoted so the shell cannot expand it as a glob.)

To mount the partition, it must be added to /etc/fstab like this:

/dev/md0    /mnt/daten1   ext3    defaults,noauto

noauto means that it will not be mounted automatically on boot. This is a safeguard against boot failures on headless servers if any of the automatically mounted devices fails. We don’t reboot our servers very often, so we will just ssh into the machine after a reboot and mount the partition manually with

mount /mnt/daten1

To check the RAID status, use

cat /proc/mdstat
Posted on 2014-10-08 at 12:18

Using parted to partition a drive

Filed under: Linux

This is just a reminder to myself so I don’t forget again.

Warning: Use this at your own risk! You might lose all the data stored on any of the hard disk drives if you make a mistake!

On Linux, hard drives larger than 2 TB must be partitioned with parted and a partition table in GPT format.

Create a new GPT partition table (this deletes everything on the drive!):

parted /dev/sdX mklabel gpt

Create a new partition spanning the entire drive and using optimal alignment:

parted -a opt /dev/sdX mkpart primary 0% 100%

Set a partition’s raid flag:

parted /dev/sdX set 1 raid on
Posted on 2014-10-08 at 11:19

Using a bar code scanner? Watch out for your keyboard layout!

Filed under: Linux, Windows

We are using a bar code scanner to scan the serial numbers of all hard disk drives we buy. This is supposed to make it easier and less error prone to put them into an Excel list for tracking their whereabouts (we use quite a lot of drives for storing video data).

Of course, when I bought 4 TB SATA drives for setting up yet another Linux server, I put these drives in the list too. And since I am a lazy bastard™ I borrowed the scanner to scan them. It worked like a charm, so I returned the scanner and started building the RAID.

I put labels on the drive bays with our internal number, so in the case of a drive failure I could use the list I mentioned to find out which drive to swap out.

One of the drives apparently was defective to start with, so what did I do? I asked mdadm which drive it was (/dev/sdg) and used

ls -l /dev/disk/by-id

to find its serial number.

lrwxrwxrwx 1 root root  9 Aug 14 08:59 ata-ST4000DM000-1F2168_Z301W61Y -> ../../sdg

Then I opened up the list to find the drive’s internal number. To my surprise, none of the serial numbers in the list seemed to match.

It turned out that on my computer, because I use a UK keyboard layout, the scanner swapped Y and Z. So, in the list the drive had the serial number Y301W61Z while in reality it was Z301W61Y.
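
In hindsight the mix-up is trivial to detect and undo: swapping Y and Z back is a one-line tr job. A sketch, using the serial number from above:

```shell
scanned="Y301W61Z"   # what ended up in the Excel list
# undo the QWERTY/QWERTZ mix-up by swapping Y and Z (both cases)
fixed=$(echo "$scanned" | tr 'YZyz' 'ZYzy')
echo "$fixed"   # → Z301W61Y
```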

Fun with computers, not.

Posted on 2014-08-14 at 10:15

headless server fun

Filed under: Linux

A new Ubuntu-based server I set up recently had a power failure which unexpectedly resulted in the box not booting again. There were actually two problems:

  • fsck failed on the data mount because one of the data drives apparently had failed. It took forever but eventually prompted for user input “S” to skip or “M” to fix manually.
  • The first time this happened I just tried power-cycling the computer, hoping it would just come up. Unfortunately Grub detected a failure and disabled the timeout for the boot menu, so the box was sitting there in the Grub boot menu.

Unfortunately this server is supposed to be headless (and is mounted to the wall 4 m above ground), so there was not even a keyboard where somebody could blindly press one of these keys or press return to select an option in the Grub menu. And sshd wasn’t started yet, so I could ping the server (the IP stack was working) but not ssh into it to fix the problem. So I got myself a really long VGA cable and a USB extension cable to connect a monitor and a keyboard to look at the actual console.

The second issue can be solved easily:

In /etc/default/grub add the following entry:

GRUB_RECORDFAIL_TIMEOUT=5

Then run update-grub to make it effective. This lets Grub show the boot menu for 5 seconds even after a failed boot and then try to boot normally. I used 5 seconds rather than 0 so I can actually use that menu if the need arises.

The first issue is a bit more involved. I want the box to at least boot to a state where I can access it through ssh, even if the data drives fail. That means I have to remove the mount point from /etc/fstab and put the mount command somewhere later in the boot process. One option is to mount it in /etc/rc.local like this (suggested here):

fsck -n UUID=...
RET=$?
if [[ $RET != 0 ]]; then
    logger -p user.warning "/etc/rc.local: fsck fail $RET"
else
    mount ....
fi

I’ll not be going that way because the system is not that critical. If it doesn’t come up, we will notice and just ssh into it and fsck and mount the data volume manually.

Posted on 2013-10-17 at 12:23

Bittorrent Sync, a secure DropBox alternative

Filed under: Linux

The company I work for recently had the requirement to securely exchange files between several computers, some on site, several others off site. This data consisted partly of sensitive data which is covered by the German Bundesdatenschutzgesetz (federal data protection act). Somebody suggested using DropBox because it is so simple to use. I had to deny this request because DropBox stores the data "in the cloud" and we have no control over where. Also, the data would not have been backed up.

One option would have been to use DropBox with e.g. BoxCryptor encryption, but that was too complex to set up (for some mostly computer-illiterate people). It also assumes I trust the company that produces it, but if that’s a problem, why should I trust Microsoft? (In fact, I don’t trust them, but I don’t have a choice.)

After looking into several alternatives (Strato’s HiDrive is not an alternative in my book), I found BitTorrent Sync. This is a simple-to-install program which uses the BitTorrent protocol over an encrypted connection to synchronize files between several computers. Setup is as easy as DropBox or other file sync software. It also offers the option to sync not just one folder but several. It comes in variants for Windows 32/64 bit, Linux, FreeBSD, Mac OS X, Android and iOS.

When installed on Windows it can be configured to start automatically and then sit in the system tray doing its job without getting in your hair. It just works.


The Linux version comes with a simple web server for configuring the folders to sync and some settings. To sync with other computers, all you have to do is transmit a “secret”, which is a long string of characters. If you want to sync with mobile devices, it can also generate a QR code which you can scan with the mobile app on that device (no, I am not going to show you a screenshot 😉 ).

There is only one drawback in my opinion: by default there is no central place to store the files. If a new computer wants to join in the folder sharing, one of the existing ones must be online for the synchronization to work. (With DropBox the cloud storage is always available.) I worked around this issue by installing the Linux application on our server, which is always available. This has the positive side effect that the shared folders are automatically included in the daily backups. (If you don’t have a server, there is also the option to install it on a Raspberry Pi connected to the Internet.)

Of course there is the trust issue again. Do I trust BitTorrent Labs? They could of course add a back door to their software which, despite claims to the contrary, copies all shared files to a central server somewhere. The source code is not available, so there is no way to actually be sure. Do I trust them? Do I have a choice? Is there any alternative? I found none.

Posted on 2013-10-09 at 17:53