Unlinked files and seemingly empty full partitions

$ df -P /dev/sda3
Filesystem   1024-blocks   Used      Available   Capacity Mounted on
/dev/sda3        8256952   7007368   830160      90%      /var

$ df -i /dev/sda3
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda3             524288   21548  502740    5% /var

So far, so bad. The disk is full and we can’t blame it on inodes. Since it’s /var, it’s probably just some log files…

$ du -sh /var
1.9G	/var

…or not?

One of the features of Unix filesystems (compared to Windows) is that “deleting” a file does not necessarily delete it. The file is unlinked (so it no longer appears in the file system), but remains usable. Open file handles to the file are tracked, and the file is only actually deleted when the last handle is closed (this is why Windows has the annoying “Cannot delete file, it’s open somewhere [and I won't tell you where]” and Linux doesn’t). df reports the blocks the file system has allocated, so it includes unlinked files; du walks the directory tree and therefore can’t see them. To find them, we have lsof:

When +L is followed by a number, only files having a link count less than that number will be listed. (No number may follow -L.) A specification of the form “+L1” will select open files that have been unlinked. A specification of the form “+aL1 <file_system>” will select unlinked open files on the specified file system.

$ lsof +L1
COMMAND   PID     USER   FD   TYPE DEVICE  SIZE/OFF NLINK   NODE NAME
apache2  5102 www-data    2w   REG    8,3    627670     0 450637 /var/log/apache2/error.log.1 (deleted)
apache2  5102 www-data    7w   REG    8,3   2030629     0 450922 /var/log/apache2/other_vhosts_access.log.1 (deleted)
apache2  5102 www-data    8w   REG    8,3   2030629     0 450922 /var/log/apache2/other_vhosts_access.log.1 (deleted)
…and fifty other big log files…

So log files actually were to blame: Apache keeps holding handles to its old log files after logrotate has rotated (and deleted) them. Restarting Apache after logrotate runs fixes this particular problem.

(Thanks to ServerFault for pointing to lsof +L1!)
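
The unlink-while-open behaviour is easy to reproduce in a throwaway shell session. This is a minimal sketch, assuming bash and a writable temp directory; file descriptor 3 and the sample text are arbitrary choices and have nothing to do with the Apache setup above:

```shell
#!/bin/bash
# Show that an unlinked file stays alive as long as a handle is open.
tmp=$(mktemp)
echo "still here" > "$tmp"

exec 3< "$tmp"      # open a read handle on fd 3
rm "$tmp"           # unlink: the name is gone ...

if [ -e "$tmp" ]; then name_exists=yes; else name_exists=no; fi
content=$(cat <&3)  # ... but the data is still readable through the handle

exec 3<&-           # closing the last handle is what actually frees the blocks
echo "name exists: $name_exists, content: $content"
```

On Linux, the space held by such a file can usually be released without a restart by truncating it through /proc/PID/fd/FD, but I'd treat that as an emergency measure; restarting or reloading the service is the cleaner fix.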


RSA encryption in JavaScript and C++

JS is not exactly blindingly fast, that much is commonly agreed. But how slow is slow? For our password management webapp, I wrote a Qt client, which uses the QtWebKit native bridge to access a native RSA implementation (specifically, CryptoPP) and compared it to JSRSA.

Benchmarking JS engines against C++
For the test, I used Firefox 15, Opera 12.02, Chromium 21 and WebkitGTK 1.8.3 on an Intel® Xeon® X3440 (4 cores @ 2.53GHz), with Linux as OS (kernel 3.0.42). WebkitGTK (used by Midori, uzbl and others) uses the JSCore engine, which is also used by Safari, QtWebkit (in turn used by Arora and Konqueror) and other Webkit ports, so its results should be representative for most non-IE browsers.

The text to encrypt was

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu. In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus. Phasellus viverra nulla ut metus varius laoreet. Quisque rutrum. Aenean imperdiet. Etiam ultricies nisi vel augue. Curabitur ullamcorper ultricies nisi. Nam eget dui.

The text was en- and decrypted using a 512-, a 2048- and an 8192-bit key. Wherever feasible, the test was repeated multiple times and the results averaged.

Results (times in ms):

Engine                         Encrypt   Decrypt   Encrypt   Decrypt   Encrypt   Decrypt
                               512 bit   512 bit  2048 bit  2048 bit  8192 bit  8192 bit
CryptoPP (QtWebKit bridge)           2         8       0.7        13       1.1       112
V8 (Chromium 21)                     5        43        12       359        70      9000
Firefox 15                          52       522       156      5166       578     74000
JSCore (WebkitGTK)                  55       625       194      5356      1200    138000
Opera                               18        94        29       936       102     13000

Observations:

  • Unlike the other browsers, Opera executes JS in a separate thread and stayed responsive; all other browsers blocked the tab (or the entire browser) completely while the benchmark was running.
  • 2048- and 8192-bit encryption in CryptoPP is consistently faster than 512-bit encryption. I assume this is because with longer keys the text can be encrypted in one chunk (or at least far fewer chunks), while with 512 bits it has to be split up, and the code I wrote for that is anything but optimized. I’ll have a look into that.
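
To get a feeling for how much chunking is involved, the number of RSA blocks per message can be estimated. This is only a back-of-the-envelope sketch: it assumes PKCS#1 v1.5 padding (11 bytes of overhead per block), which may not be what either library in the benchmark actually uses, and the 870-byte message length is a made-up stand-in for the test text.

```shell
# chunks_needed MESSAGE_BYTES KEY_BITS
# Assumption: PKCS#1 v1.5 padding, so each block carries (key bytes - 11) payload bytes.
chunks_needed() {
    msg_len=$1
    key_bits=$2
    payload=$(( key_bits / 8 - 11 ))
    # Integer ceiling division
    echo $(( (msg_len + payload - 1) / payload ))
}

# 870 is a hypothetical message length in bytes
for bits in 512 2048 8192; do
    echo "$bits-bit key: $(chunks_needed 870 "$bits") block(s)"
done
```

Under these assumptions a 512-bit key needs 17 blocks, a 2048-bit key 4 and an 8192-bit key 1, which would fit the theory that per-chunk overhead dominates the 512-bit numbers.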

Conclusion
While Chrome and Opera are pretty fast, RSA in JavaScript is not quite ready for production use – at least not with key lengths >2048 bits. I am going to push the native client for our password management system, and only keep the JS version as a fallback. While 2048 bits are still considered acceptable for most use cases, I’d rather have the option to switch to 4k or 8k keys if necessary.

The code will be published as soon as it is in a readable and usable state.


Windows: System Error 1219

Probably the single most dreaded error when it comes to handling network shares.

Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again.

Apart from the fact that this »security measure« is a complete joke, you can get the error even when using the same credentials, once implicitly (i.e. no /USER argument, which means it automatically uses your current account) and once explicitly (i.e. /USER:%Username%, which resolves to exactly the same credentials).
You also get the error when using the same explicit account on two different domain controllers belonging to the same domain (e.g. log into server A, mount a drive from A: fine; mount a drive from B: ERROR).

TL;DR: Never use “/USER:%Username%”, it will blow up in your face. Or, actually, never use “/USER” at all, since it’s unreliable in any case.

(Note that this »feature« is client-side. OSX and Linux clients will, as usual, have no problems.)


SOGo: A free groupware solution

It’s kinda depressing what a long way a seemingly trivial matter like “shared address books and calendars” can go.

History

Originally, our company used Microsoft Exchange and Outlook, but this was apparently (I wasn’t in the company back then) a major disaster and was after a while replaced with Kolab + Horde + Outlook + some glue.

This worked somewhat better, but was still a royal pain in the backside – Outlook 2003 isn’t exactly the best client in the world, and the proprietary third-party Kolab connectors did their best to make things worse (no multi-user support, extremely slow, and similar fun). And Horde… was just being Horde. We ended up resetting Outlook profiles pretty regularly, resulting in long downtimes for the victims (since the supplied Horde version was so slow, confusing and awkward to use that nobody ever bothered with it – I don’t blame them). Thunderbird support was mediocre at best, with the SyncKolab addon being limited to just one calendar and no address book sync.

Alternatives

When I started working for the company, I started looking for alternative groupware solutions – both on the client and the server side. There are lots, actually, but most either bring a whole software stack from mail server to desktop client (like Zimbra) or are too Outlook-centric (like Zarafa).
There are also some pure calendaring/address book solutions, but most of them are (or were) not stable enough, not exactly what I’d call user friendly, or both (like DAViCal).

Enter SOGo

SOGo is a refreshing alternative, inserting itself into existing infrastructure and reusing it as much as possible – its server only provides a generic web client (able to plug into any IMAP server), address books (using the open CardDAV standard) and calendars (using the related open CalDAV standard). Mails are handled by your existing SMTP/IMAP servers, it uses CAS/LDAP/AD/SQL for authentication and is mostly client-agnostic (apart from ACLs, which aren’t covered by Cal/CardDAV, and have to be configured with a SOGo client). Unlike most “pure” address book/calendar solutions, its UI is also user friendly enough to allow non-technical users to easily configure their shares and it is quite stable.

For the client, we use Mozilla Thunderbird on Windows. Mind you, this decision was made before Mozilla decided to panic and run around flailing like a headless chicken (fun stuff like “we change to a new release scheme that serves no purpose other than breaking addon compatibility and showing everyone we have the longest… version number. We’ll figure out later how to implement an update scheme that doesn’t piss off every single user”) – and depressingly, it still works better than Outlook ever did with Kolab.
SOGo provides a complete integration suite with their Thunderbird addons, which gives you a smooth calendar and address book sharing solution in most cases.
It is, however, not without problems: synchronization errors are rare but opaque, and in most cases they are solved by updating and emptying the cache. The performance of the initial sync, on the other hand, is extremely bad – the limitations of using a single JavaScript thread for the entire code are painfully obvious here (I hope XUL dies soon). But as said, it’s still better than syncing Outlook with Kolab (which didn’t lock up the entire UI, but took just about as long until it was usable), and complete profile resets are rarely needed (especially after we switched to Thunderbird 10 ESR).

Starting with version 2, SOGo also supports Outlook natively, though I’m not sure whether I’d want to try that. Would be good for incremental migrations, however.

Apple supports Cal- and CardDAV natively in iOS and OSX, thus giving you full desktop integration out of the box (apart from ACLs, which need the web UI).

Speaking of the web UI – it is pretty much the best web mailer I’ve seen so far, with a clean UI (modeled after Thunderbird 2) and a pretty decent performance with large mail boxes. It also has Sieve support for vacation mode – sadly, no support for user-defined filters. If it had that, I’d have completely switched over to it (without filters, 200 automated mails per day and half a dozen mailing lists are a huge pain to deal with). It is fine for light duty mail management, however, and the calendar/address book functionality is so far without any issues.

For Android, there are clients made by DMFS, which integrate Cal- and CardDAV into the native calendar and address book. This is actually the only part of the software stack you have to pay for, and at around 5 bucks per user not a particularly big investment.
(There’s also an experimental ActiveSync and a Funambol interface, which would allow for native Android/Nokia/… support, but as of now neither is terribly well-suited for production use.)

With that, you have a very flexible groupware stack that covers just about any relevant client architecture, without requiring major changes to your existing server infrastructure. Installation and configuration are pretty easy, too. More on that in a later blog post.


Account multiplexing (ab-)using SSH key authentication

We’ve often run into the problem of how to deal with multiple users sharing one account, since I don’t really want to deploy LDAP auth for external servers – pam_ldap is notoriously unstable and a PITA to debug, and I don’t particularly like the idea of making those servers dependent on auth servers which may or may not crash and/or run into other problems.

Thankfully, SSH’s key authentication allows you to launch a custom command on login. Thus, I wrote a small wrapper script:

#!/bin/bash
# shmux: forced-command wrapper that maps one SSH key to one real person.

if [ $# -lt 3 ]; then
        echo "Usage: shmux 'Full User Name' shell vimmode [commandstring]"
        echo "  With vimmode being 'full' or 'minimal' and 'commandstring' being a string to be fed into SHELL -c."
        exit 1
fi

export TRUEUSER="$1"

# "Full User Name" -> "full.user.name@tao.at"
user=$(printf '%s' "$TRUEUSER" | tr '[:upper:]' '[:lower:]')
export TRUEMAIL="${user// /.}@tao.at"

export GIT_COMMITTER_NAME="$TRUEUSER"
export GIT_AUTHOR_NAME="$TRUEUSER"
export GIT_COMMITTER_EMAIL="$TRUEMAIL"
export GIT_AUTHOR_EMAIL="$TRUEMAIL"

SH="$2"

export VIMMODE="$3"
shift 3

# Ensure compatibility with SCP/SFTP/SSH custom commands.
# The unquoted ${SSH_ORIGINAL_COMMAND:-} in authorized_keys is word-split by
# the login shell, so "scp -t dir" arrives as several arguments; rejoin them.
if [ $# -gt 0 ]; then
        "$SH" -c "$*"
else
        echo "[shmux] Authenticated as $TRUEUSER"
        "$SH" -l
fi

…which multiplexes the one shared account into multiple ones. The $TRUEUSER variable can be used for further customization (e.g.: source /etc/profile.d/$TRUEUSER.sh for user-specific commands). The VIMMODE variable seen in the code is used with another multiplexer aliased to vim:

#! /bin/sh

case $VIMMODE in
        minimal)
                vim -u /etc/vimrc.minimal "$@"
                ;;
        *)
                vim "$@"
                ;;
esac

This allows having different vimrcs depending on the user preferences (or abilities). This could again be expanded to load user-specific settings (or launch emacs, if you really want to ruin someone’s day).
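
The per-user customization via $TRUEUSER could look roughly like the following profile hook. It is only a sketch: the PROFILE_DIR variable, its default path and the file naming scheme (lowercase name with dots, mirroring how the wrapper builds TRUEMAIL) are assumptions for illustration, not part of our actual setup.

```shell
# Hypothetical profile hook: load per-user settings for a shmux session.
load_user_profile() {
    # PROFILE_DIR and its default are made up for this example
    _shmux_dir="${PROFILE_DIR:-/etc/profile.d/shmux}"
    # "Sven Schwedas" -> "sven.schwedas", avoiding spaces in file names
    _shmux_key=$(printf '%s' "$TRUEUSER" | tr '[:upper:]' '[:lower:]' | tr ' ' '.')
    _shmux_file="$_shmux_dir/$_shmux_key.sh"
    # Only source the file if a user is set and the file is readable
    if [ -n "$_shmux_key" ] && [ -r "$_shmux_file" ]; then
        . "$_shmux_file"
    fi
}
```

Calling load_user_profile from the shared account's shell startup file would then pick up, say, sven.schwedas.sh whenever that key was used to log in, and silently do nothing for users without a file.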

The actual user settings are then configured in the authorized_keys of the to-be-multiplexed account (which is distributed over our internal package repository):

[…]
command="/usr/bin/shmux 'Sven Schwedas' zsh full ${SSH_ORIGINAL_COMMAND:-}" ssh-rsa …
command="/usr/bin/shmux 'Foobar Foo' tcsh minimal ${SSH_ORIGINAL_COMMAND:-}" ssh-rsa …
[…]

The only downside compared to LDAP is that it takes some minutes to distribute the updated authorized_keys file to all hosts, but apart from that it’s been working fine for some months on our servers.


Improving Apache2 log format

While the default format for Apache logs has some benefits (the CLF is understood by many log analyzers), it’s very awkward to read (only one space separating entries) and difficult to parse on the command line (spaces everywhere, and no hostname in the actual lines, which makes parsing multiple files a slight hassle).
As I had to change the log format anyway to include the response time (for identifying slow requests), I changed it to the following:

#ServerName	date time	port	r_ip	status	rtime	request	referer		user_agent	rsize
LogFormat "[%v]	%{%F %T}t	%p	%a	%>s	%D	%r	%{Referer}i	%{User-agent}i	%B" combined

The big difference? The fields are tab-delimited. They can thus be parsed easily with cut -f, which avoids all the awkward awk/grep/sed hassle of the CLF, or imported into spreadsheet software (as tab-separated values) for visualization. This saves a lot of time if someone needs some ad-hoc statistics again and I don’t want to run webalizer or something similar over the aggregated logfiles. Also, the bigger gap between entries makes the logs much easier to read manually.
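
As a rough illustration of the kind of ad-hoc statistics this enables, here is a sketch that counts status codes and lists slow requests. The two log lines are made-up sample data in the format above (not real traffic), and the one-second threshold is an arbitrary choice; %D logs the response time in microseconds.

```shell
# Made-up sample lines in the tab-delimited format
log=$(mktemp)
printf '[example.org]\t2012-10-01 12:00:00\t80\t192.0.2.1\t200\t1500000\tGET /slow HTTP/1.1\t-\tcurl\t512\n' >> "$log"
printf '[example.org]\t2012-10-01 12:00:01\t80\t192.0.2.2\t404\t900\tGET /missing HTTP/1.1\t-\tcurl\t0\n' >> "$log"

# Field 5 is the status code: count how often each one occurs
statuses=$(cut -f5 "$log" | sort | uniq -c | awk '{print $2 ":" $1}' | tr '\n' ' ')

# Field 6 is the response time, field 7 the request line:
# list everything that took longer than one second
slow=$(awk -F'\t' '$6 > 1000000 {print $7}' "$log")

echo "status counts: $statuses"
echo "slower than 1s: $slow"
rm -f "$log"
```

With a single-character delimiter, even the awk call stays a one-liner, because the field numbers match the header comment in the LogFormat file.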

(Since LogFormat directives can override each other, deployment can be reduced to “throw two lines into a file in apache2/conf.d, throw file into one of our deb packages, update package on servers”).


RSA encryption in JavaScript and PHP

For a new project (password management application), I’m investigating the usage of RSA in JavaScript and PHP. Had I known the amount of bullshit ahead of me beforehand, I’d have resorted to sticky notes for password management…

Anyway. I quickly settled on Tom Wu’s Bigint and RSA library. It is, for the most part, pretty decent. However, it’s still somewhat buggy (a leading zero in BigIntegers is sometimes skipped, which makes messages unparseable), some functions simply make no sense (UTF-8 in the decryption function might be well meant, but is kinda useless if you have non-UTF-8 strings… like binary data) and it’s lacking functionality for messages longer than the key size. This blog post thankfully outlines how to add this functionality for encryption (I added similar functions for decryption, see the source).
For signing/verifying, I chose to ignore the RFCs (implementing another padding scheme in both PHP and JS would have taken too long) and just used swapped encrypt/decrypt functions.

Another limitation of Wu’s lib was the lack of a “proper” import function – I use a modified version (making use of Wu’s jsbn library) of Lapo Luchini’s ASN1 library plus some glue code to extract the actual key data.


For the PHP side, I use PHPSecLib, which actually works pretty well (I only replaced the user_error reporting with proper Exceptions). The only tricky part is that you need to liberally use hex2bin/bin2hex (I built a thin wrapper around it to do exactly that).


After only… oh, about five days of debugging, the libs work the way I expected them to work from the beginning. Project deadlines are overrated anyway.

Code, including examples, on GitHub


Mac OS X and Univention Samba

It took us a while to get this hipster crap OS X to play nice with our Samba server (UCS 2.4). In the default settings, the access rights would be completely insane (732 and similar impossible combinations). By deactivating the UNIX extensions (unix extensions = no) we achieved slightly less insane rights (755, which is better, but doesn’t allow others to modify the folders/files). By modifying the creation mask (create mode = 0775) we got that, too (it would be better if OS X users could set the permissions via the Finder UI, but that doesn’t seem to be possible at the moment; maybe with UCS3?).


TAOFirewall2 Revision 3 released

Today, Windows guests virtualized with Qemu were protected by TAOFirewall2 for the first time. In the process, two bugs surfaced in configure_iface.sh, caused by a peculiarity of test and by an oversight. TAOFirewall2 automatically detects when a virtual machine uses a network card emulated by Qemu and then sets the IOEMU flag, which is supposed to disable antispoofing. The check whether the flag is set did not work properly, however; as an example:

testvar="false"
echo $([ $testvar != "true" ]) $?
0
testvar="true"
echo $([ $testvar != "true" ]) $?
0

Apparently the true is recognized and the rest is not evaluated at all.

testvar="false"
echo $([ ! $testvar ]) $?
1
testvar="true"
echo $([ ! $testvar ]) $?
1

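Coming from C, this behaviour is baffling as well: with a single operand, test only checks whether the string is non-empty, so both values count as true and the negation yields 1 either way. In the end, we use this variant: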

testvar="false"
echo $([ ! $testvar = "true" ]) $?
0
testvar="true"
echo $([ ! $testvar = "true" ]) $?
1

This check was also extended to interfaces for which the firewall has not been disabled.
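
The working variant can be wrapped in a small function and tried against a few values. This is a hypothetical helper for illustration only, not code from configure_iface.sh; the function name and the sample values are assumptions. Quoting the variable is an addition of mine that keeps the test well-formed even if the flag is empty or unset.

```shell
# Hypothetical helper, not taken from configure_iface.sh.
# Returns 0 (success) when antispoofing should stay enabled,
# i.e. when the IOEMU-style flag is anything other than "true".
antispoofing_enabled() {
    flag="$1"
    # Quoted, so an empty flag is still a valid operand for test
    [ ! "$flag" = "true" ]
}

for value in "false" "true" ""; do
    if antispoofing_enabled "$value"; then
        echo "flag='$value' -> antispoofing on"
    else
        echo "flag='$value' -> antispoofing off"
    fi
done
```

Using the function in an if statement also sidesteps the question of when $? gets expanded inside an echo line.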

The latest version of the firewall can always be found at http://www.tao.at/special/TAOFirewall2.tar.bz2.


Exporting Outlook data

For our migration away from Kolab and Outlook towards SOGo and Thunderbird, we had to find out (somewhat late) that there are no decent converters: Thunderbird can import Outlook CSVs only in theory, and other programs are rather limited (we originally tested freeMiCal for calendars, but it can only ever export the first calendar, and not always even that).

For that reason I wrote a Python script that exports Outlook CSVs to LDIF (for address books) and CSV (for calendars and task lists). The script was thrown together in a hurry and converts nowhere near all fields (and it is written for mod_python, for which the WSGI crowd will probably stone me), but it should be extensible enough that missing fields can be added as needed; which fields are read can be controlled with the *.in templates.
The only external library the script needs is the iCalendar package for Python (and mod_python, although the script can easily be adapted for command-line use or given a GUI).

Download v0.1