fixing a scrambled IPython command history on stock OS X 10.6

So I started over with a fresh install of OS X 10.6 recently, and wanted to restore my Python development environment. For that, IPython is absolutely essential if you want a sane interpreter environment to test out code. I had a bit of trouble with it though.

The Problem

The stock Python 2.6 shipped with OS X 10.6 Snow Leopard has a readline module linked to libedit, the BSD alternative to the GPL’ed readline. The readline module, if you are not aware, is (among other things) responsible for keeping command history in the IPython interpreter. This causes command history in the IPython 0.10 interpreter to behave in very odd ways. When backtracking through the command history buffer using the up-arrow key, for example, the previous command is only partially recalled, and appears completely scrambled. Indents, too, seem off — in a whitespace-sensitive language like Python, this is annoying. (See first figure)

IPython command interpreter is broken when using libedit with command history
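A quick way to check which library your readline module is backed by is sketched below. It assumes the module advertises libedit in its docstring, which is true of later Python builds; I have not confirmed that Apple's stock 2.6.1 does, so treat the docstring check as an assumption. The function name readline_backend is mine.

```python
def readline_backend():
    """Return 'libedit', 'gnu', or None if no readline module is importable.

    Assumption: a libedit-backed readline module mentions "libedit" in its
    module docstring. Anything else is reported as GNU readline.
    """
    try:
        import readline
    except ImportError:
        return None
    doc = readline.__doc__ or ""
    return "libedit" if "libedit" in doc else "gnu"


if __name__ == "__main__":
    print(readline_backend())
```

If this reports libedit, the scrambled history described above is the likely result.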


Fixing IPython’s bugs is beyond my ability. While I certainly don’t want to delve into the quagmire that is GPL vs BSD licensing, I do understand why Apple would want to avoid the viral nature of the GPL and ship libedit instead. However, using a genuine Readline library is going to be the best recourse for this problem. I already have a copy of readline compiled and ready to go, and just need a new version of readline.so, the library that links Python to readline.

The easy solution

Sifting through my records, I came across a SelfSolved problem record from my good friend Hannes who had issues with his IPython command history.

The solution: sudo easy_install readline, which uses setuptools to install a precompiled package of readline.so statically linked to genuine GNU readline. Restart your IPython console and everything should work. (See second figure)

IPython with readline


The hard solution

Being the inquisitive sort, I also wondered how I would be able to reproduce this work from scratch. readline.so ships with the Python source package, but surely I would not be required to compile a whole new copy of Python for one measly module library?

I documented this process in SelfSolved again: building readline.so for Python. At some point I should write an interface between SelfSolved and Wordpress so that I don’t have to reproduce a lot of my work here manually.

Compiling readline.so

This is actually fairly easy.

  1. Get a copy of the Python source code. OS X 10.6 ships with Python 2.6.1, so get the source for that version.
  2. Unpack it and go into its directory. You should find a Modules subdirectory. In it is readline.c, the source file for readline.so.
  3. Compile the source file. The appropriate incantation is:
    gcc -O2 -arch x86_64 -arch i386 -shared -o readline.so readline.c -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -L/usr/local/lib -lreadline -ltermcap -framework Python

    where the -arch flags should be whatever processors you wish to support, the -I arguments should point to the directories that contain header files for the readline library and the Python framework, and the -L argument should point to the path for the readline library. Use whatever optimization flags you feel comfortable with, instead of -O2, if you wish.

Replacing readline.so

So now we have a readline.so that’s properly linked to the genuine readline library (libreadline.dylib). The thornier question is how to override the system-provided readline.so. The system version is located at /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/readline.so, and the naive would simply overwrite it with their new readline.so. This is a bad idea.

As I have mentioned in the past, overwriting system libraries in OS X is an unhealthy thing to do. The problem is that Apple furnishes no official package management system — anything you personally change is considered fair game for the next official system update. On the next system update, if the Python component is affected by the update, the Apple updater will happily clobber your compiled files with its own, leaving you suddenly back at square one. You don’t know how many times I’ve had to recompile emacs (for X11 support) on OS X 10.4 because of this little annoyance. Leave things in the /System/Library directory hierarchy alone, for your own sanity.

However, in this case /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload comes ahead of the user-modifiable /Library/Python/2.6/site-packages directory on Python’s sys.path. So if you just drop readline.so into site-packages, the system version still takes priority.

There are a few ways to work around this. For one, you can create a sitecustomize.py in /Library/Python/2.6/site-packages. Arbitrary Python statements can be written in this file, and the interpreter will automatically execute them at startup. So, you can add a sys.path = ["/dir/here"] + sys.path statement and point it to a directory containing your readline.so file. Alternatively, you can abuse the technique used in the easy_install.pth file. It turns out that if you have ever used easy_install, directories pointed to by the easy_install.pth file take priority over the system paths. They use an interesting way to accomplish this, which you can copy. Or, you can just insert your directory containing readline.so into easy_install.pth. In any case, this will force the readline-based readline.so to take precedence over the libedit-based readline.so, without overwriting anything.
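A minimal sketch of the sitecustomize.py approach follows. The directory name is hypothetical; substitute wherever you actually put your compiled readline.so. The helper prepend_path is my own name, not anything Python provides.

```python
# sitecustomize.py: executed automatically when the interpreter starts.
import sys

# Hypothetical directory holding the readline.so linked against GNU readline.
READLINE_DIR = "/usr/local/lib/python2.6/readline-override"


def prepend_path(directory, path):
    """Return a new list with directory first and any later duplicate removed."""
    return [directory] + [p for p in path if p != directory]


# Mutate sys.path in place so every later import sees the new order.
sys.path[:] = prepend_path(READLINE_DIR, sys.path)
```

Because this runs before anything imports readline, the first readline.so found on sys.path should be the one in the override directory rather than the one in lib-dynload.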

Discussion

So for any sane person, the easy solution should be enough. For the rest, the hard solution is an interesting exploration of how some of Python’s built-in modules can be compiled and inserted individually.

Upgrading the Seagate Barracuda 7200.11 to firmware SD1A

So perhaps you have heard of Seagate’s little manufacturing issue with its internal 3.5-inch Barracuda 7200.11 1TB drives a while back — namely, that some drives shipping with SD15 firmware are dying horribly. I had the unfortunate experience of buying such a hard drive — the ST31000340AS — as a scratch disk for my main machine, a MacBook Pro with a mere 240 GB internal drive (a pre-unibody revision, where the HD is insanely difficult to replace).

Seagate did in fact issue a firmware update — SD1A — that supposedly addressed this issue, but of course, there’s one catch: you can’t install the firmware through an external drive enclosure. In communication with Seagate support, a representative confirmed that for those of us without a desktop tower that has a SATA bay, we’re hosed:

Unfortunately, due to the nature of firmware updates and the way external drives work, the firmware update program cannot directly communicate with the drive in the manner it needs to in order to be able to upload the new firmware to the drive. It must be plugged into an internal SATA controller in order to update the drive.

Fair enough. That makes technical sense — but of course, it doesn’t work for me. I asked whether they would handle a mail-in repair, given that I have no easy access to such a desktop. The answer, of course, is No.

I have to find a desktop, open it up, jam this baby in (possibly in place of the existing drive if there’s only one bay), update the firmware, and put everything back together. Sadly, most of my friends who still own desktops would not trust me that far.

Half a year passes, and I finally find a good friend who’s awesome enough to let me try this procedure on his machine. The fellow owns a nice if aging Dell Precision T5400, which comes with two SATA bays (so I don’t have to inflict undue harm onto the existing system). Since this thing can run two drives at once, I can use the first method (a Windows-based firmware updater), though I burned a boot CD for the second method just in case. I popped in the drive, fired up Windows XP, downloaded the Windows-based Firmware Update Utility, double-clicked, and thought it was the (triumphant) end. In fact, it took 3 hours of my life to find out just how deep this rabbit hole goes.

Problem 1: The lying updater

The firmware updater will give a bunch of scary warnings and then reboot the machine. It will automatically reboot to a Seagate Loader screen, which attempts to apply the patch to all eligible SATA drives. To its credit, it’ll skip the non-qualifying (i.e. non-Seagate, non-Barracuda, etc.) drives, but it’ll still try them out first. At the end of the process, it will report “firmware downloaded” and “SUCCESSFUL” or some variant thereof, and automatically reboot back into Windows.

At this point, I advise you to use the SeaTools utility to verify that the firmware update actually applied. If you are on a stock Dell T5400 (or perhaps other models as well), this will prove that, despite its claims, the updater is a lying scumbag. And in fact, this particular drive still reported firmware SD15, the broken one.

Problem 2: The broken Boot CD

To save both me and my gracious host (who’s starting to suspect my computer-fixin’ skills now) some time, I decided to try the boot CD method, rather than pounding my head trying to see why the updater was lying. I downloaded the boot CD from the same Seagate Support site above, burned it to disc, and tried it out.

The result is a new SelfSolved posting: SelfSolved #59: getFatBlock error when upgrading Seagate Barracuda 7200.11 firmware. In essence:

The FreeDOS boot CD reports a number of “error reading partition table drive 01 sector 0” errors. This is followed by “get Fatblock failed:0x000000e8” or some variant of “getFatBlock failed :”. The FreeDOS boot process appears to stall at this stage, and does not continue to the firmware flasher program.

That was lovely.

The Solution

I chased some red herrings. I came across postings about failures in various FreeDOS-based Seagate tools. One such post mentioned that it took a long time for the boot disc to get over the “error reading partition table” errors, but I waited forever (well, 15 minutes) and the boot process did appear to be frozen / stalled. I reformatted the drive via diskpart clean, thinking that the getFatBlock and error reading partition errors were related to a non-MBR partition table (I had it set to GPT). I should have realized, of course, that the errors were completely unrelated to filesystems, despite the “fat block” to which it refers.

The actual solution is deceptively simple: the boot disc and flasher appear to handle AHCI-based SATA mode badly. The Dell I was using was set to AHCI mode, out of the three possible Legacy, AHCI, and RAID options for SATA. Apparently the boot disc simply doesn’t handle this mode correctly on the Dell machine (which may also be related to why the Windows-based updater lies). When the machine switches on, use F12 to enter the boot menu, and select Setup to enter the BIOS. Then, on the list of Drive Options, skip past the SATA drives and down to SATA options. Pick the Legacy option to use ATA mode, instead of AHCI. Once this is done, the boot disc will function correctly, and the updated firmware will be applied without incident. Remember to switch the mode back to AHCI afterwards; it’s the default for a reason, no doubt.

The “error reading partition” messages were complete red herrings. They appear whether you are in the right SATA mode or not, and do not appear to affect the operation of the firmware updater or the boot process. It should not take very long to get to the flasher on this particular setup, so don’t wait on that message too long; a long stall there is a good sign something’s not quite right.

In the end, I did recover my $100 hard drive, and the confidence of my peer in my mad hardware skillz (actually, quite non-existent).

Discussion

In the end, I’m quite appalled at Seagate. This sort of failure shouldn’t have happened, of course. Once it did, Seagate should have offered to take back and replace broken drives; the data I had on there was non-critical. I would have been perfectly willing to pay shipping costs to get a fixed replacement through mail-in service. I should not have been forced to search my social network for a person willing to let me tear his desktop computer apart, for a dubious and unsure firmware update procedure that fails mysteriously. I spent an additional 3 hours tracing mystery failures, for which the error messages were rather useless. Without my trusty iPhone and access to the Internet, I would not have been able to solve this problem. How should I have known what “getFatBlock failed” means?

This little episode has convinced me to never buy a Seagate drive again: I simply cannot afford the time and energy for this sort of firmware upgrade adventure. While I was looking for a desktop to tear apart, I bought a Western Digital Caviar Black 1TB drive instead. Another $100, but at least I had a scratch drive for my work.

The moral of the story: Seagate, you are the worst storage vendor I’ve had to work with so far. I hope this record is not broken in the future.

Subversion 1.6.2 runtime error on network access on OS X 10.5

A new SelfSolved solution is up for perusal. The problem I tried to solve:

After compiling Subversion 1.6.2 from source on OS X 10.5 Leopard, the compilation is apparently successful, but svn dies when it tries to connect to the network for the first time. Crash log reports that symbols are missing from libneon.dylib.

Crash report from shell:

dyld: lazy symbol binding failed: Symbol not found: _ne_set_connect_timeout
Referenced from: /usr/local/lib/libsvn_ra_neon-1.0.dylib
Expected in: dynamic lookup

dyld: Symbol not found: _ne_set_connect_timeout
Referenced from: /usr/local/lib/libsvn_ra_neon-1.0.dylib
Expected in: dynamic lookup

Check out the places that I googled and my final solution writeup … at SelfSolved #49: Subversion 1.6.2 explodes on first network access.

The problem is very similar to a previous compilation issue I solved for PHP. In essence, the -L library search path passed to GCC at compilation time has /usr/lib in front of everything else. This means whatever library path you might have given to it at configure time, it’ll always look for the library in /usr/lib first, picking up the old system libneon in the process. Since the bad libneon is dynamically linked, the problem doesn’t manifest itself until runtime, and only at runtime with network access involved.

As with the PHP issue, change the very first -L/usr/lib to -L/usr/local/lib (or wherever your newer libneon is located), and it’ll link correctly.

Out of curiosity, I checked MacPorts first. The MacPorts solution of disabling libneon version checking is odd — it also works, but I dunno if it’s linking to the right thing or not.

finding a fault-tolerant HTML parser for iPhone SDK

A new SelfSolved problem is ready for perusal:

A couple of my iPhone projects require a decent HTML/XHTML parser. On OS X, Cocoa ships with NSXMLDocument, which includes dirty HTML parsing functionality from libtidy. Unfortunately, NSXMLDocument is not part of the actual iPhone 2.2 SDK (though it is part of the 2.2 Simulator — so it’ll compile just fine at dev time but break when deploying — a big gotcha if you never tested against a real iPhone).

NSXMLParser is a part of the iPhone SDK…This is not a reasonable alternative.

Check out my writeup at SelfSolved #42: HTML or XHTML Parser for iPhone SDK 2.x

Finally, out of all the potential alternatives I found (all referenced at the SelfSolved writeup, including one that requires a license fee to use), this one seems to be the most promising and requires the least amount of pain (read: interaction with the libxml C API; god knows I’ve done enough of that while building prototypes at Yahoo! Research Berkeley).

MenuMeters integer overflow in memory stats

MenuMeters is a very cool, free (as in freedom) system monitoring tool for OS X that sits in the menu bar and shows you live statistics, including such values as current bandwidth usage, current network activity, memory usage, page faults, etc.

One thing that has been irritating me lately is that there’s a cosmetic error in MenuMeters 1.3 that causes negative values to appear in the VM Statistics section of the memory stats display. For example, the page faults value can roll over INT_MAX to report -1,800,000 page faults, when I’ve used the same OS X session for a long time without rebooting.
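To illustrate the rollover itself (not the MenuMeters code, which I am not reproducing here), the sketch below assumes the counter ends up in a signed 32-bit integer. The function name as_signed_32 is mine.

```python
import ctypes


def as_signed_32(counter):
    """Reinterpret the low 32 bits of counter as a signed 32-bit integer."""
    return ctypes.c_int32(counter & 0xFFFFFFFF).value


INT_MAX = 2**31 - 1

print(as_signed_32(INT_MAX))            # 2147483647, still positive
print(as_signed_32(INT_MAX + 1))        # -2147483648, one step past INT_MAX
print(as_signed_32(2**32 - 1800000))    # -1800000, like the display above
```

So a count of roughly 4.29 billion events would show up as about -1.8 million under that assumption, which matches what a long-running session produces.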

Since MenuMeters is GPL’ed, a quick look through its codebase reveals the problem. The details of this problem and its solution are currently documented as #32 MenuMeters Memory Meter reports negative page faults at SelfSolved, a new web application I’ve written to keep track of these things.

More details to follow.

SSH, Subversion through SOCKS proxy on Mac OS X

One persistent problem that I run into is that I need to access certain network resources through a SOCKS proxy server. This is all well and good if they are web resources — Safari, Firefox, etc. support SOCKS proxies quite well. However, I also need, for example, SSH and Subversion access to some resources. SOCKS support is woefully inadequate or nonexistent in these tools.

In the case of SSH, even if you google for this, you’ll run through thousands of examples of using ssh as a SOCKS server, but not through one as a SOCKS client. There are some convoluted solutions, but none of them I can use directly on an OS X 10.5 machine.

TSocks: the solution…if it were that easy

Now, tsocks is a nifty little tool to transparently divert network calls through a SOCKS 4 or SOCKS 5 proxy. This allows even non-SOCKS-aware applications to function through a SOCKS server.

Unfortunately it is very old, unmaintained code (1.8 beta 5 was released in 2002). Because of this, it doesn’t compile cleanly on OS X, nor will it compile under GCC 4.x. Further, it won’t work out of the box even if you do manage to compile it. The problem is that it relies on the LD_PRELOAD functionality found on Linux to use a shared library to hijack network system calls. The OS X equivalent is DYLD_INSERT_LIBRARIES, and it only works if DYLD_FORCE_FLAT_NAMESPACE is active.

Getting a working tsocks: MacPorts

There is an easy way to get tsocks. MacPorts ships a ported tsocks package. If you use MacPorts, sudo port install tsocks should do it.

Unfortunately on several machines I don’t use MacPorts, and don’t want to pull down an entire third-party package manager with its own library tree on each of these boxes. So I have to do this the hard way.

Getting a working tsocks: rolling my own

First to notice is that there are two tsocks distributions. One is the original tsocks 1.8b5, last updated in the first half of this decade. To make it work, follow the instructions provided by Marc Abramowitz in 2006. Note that his patch is actually located at his new domain address instead of the old, linked one.

The MacPorts distribution, on the other hand, is based on R. Garcia’s patched tsocks distribution, incorporating some modernization and new features by the Tor team. This distribution is numbered 1.8.x, with the last being 1.8.4. Unfortunately it is also no longer maintained, as the Tor devs forked this into a custom version to use with the Tor network only. Unfortunate, but for now, it still compiles, and works a bit better than the 2002 original.

To roll your own tsocks via source out of the MacPorts distribution, you will want the patches from the MacPorts repository. An outline of the compilation procedure:

  1. Download tsocks 1.8.4 from the author’s page
  2. Download all the patches from the MacPorts repository
  3. Concatenate all of the patches together:
    cat patch-* > tsocks.osx.patch
  4. Put the concatenated tsocks.osx.patch file into the tsocks source directory. Apply the patches:
    patch -p0 < tsocks.osx.patch
  5. Regenerate the configure script:
    autoreconf
  6. Configure the package:
    ./configure --prefix=/usr/local --bindir=/usr/local/bin --mandir=/usr/local/man --sysconfdir=/etc --libdir=/usr/local/lib
  7. Install the library and binaries:
    sudo make install
  8. Install the conf file:
    sudo cp ./tsocks.conf.complex.example /etc/tsocks.conf
  9. Edit the conf file. If you’re not using Tor, make sure the conf file contains:
    tordns_enable = false

Configuring tsocks

The complex configuration file example should have explained all of the features to be set. For my configuration:

Some important settings:

  • local – this setting, in the format of IP/netmask can be repeated several times, each time to exclude a set of IPs from being diverted to the SOCKS server. For obvious reasons, your SOCKS server will have to exist in one of these excluded IP ranges – otherwise you will never even reach your proxy server.
  • server and server_port – these should point to the IP address and port of your SOCKS server
  • server_type – tsocks defaults to SOCKS4 mode. You may wish to set it to 5 for SOCKS5 usage.
  • tordns_enable – this needs to be set as false if you don’t use Tor.

Using tsocks

Once this is set up, simply prefixing the network command you want to run with tsocks will force a diversion through the proxy connection. For example:

tsocks ssh example.com

The same can be applied to Subversion.

tsocks svn update

will force the svn client to act through the proxy set in tsocks.conf.

SOCKS on localhost

Note that SOCKS services on 127.0.0.1 have a minor gotcha. Sometimes, you are able to SSH into a remote machine, and use that connection as your SOCKS server. This is described in my post about using SSH as a pseudo-VPN, which describes the -D switch. My use case here is that once you do this, all further local SSH connections to other machines should be diverted through the first SSH. For example, I’d like to do:

my-machine$ ssh -D 40000 gateway.example.com # establish a SOCKS server on localhost:40000 to the gateway host

and then:

my-machine$ ssh lan-1.example.com # access the protected lan-1 machine through the SOCKS, which will see me as gateway.example.com 

This is very doable in the tsocks setup if you set tsocks.conf:

server = 127.0.0.1/255.255.255.255
server_port = 40000

and then:

my-machine$ ssh -D 40000 gateway.example.com
my-machine$ tsocks ssh lan-1.example.com

This is the gotcha: make sure the netmask is set correctly to 255.255.255.255. Otherwise tsocks will die with a cryptic:

IP (127.0.0.1) & SUBNET (0.0.0.0) != IP on line 22 in configuration file, ignored

It is apparently fairly strict about the subnet mask matching the address exactly.
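Going only by the wording of that error message (I have not traced the tsocks source for this), the check appears to be that the address ANDed with the netmask must equal the address. A sketch of that test, with function names of my own choosing:

```python
import socket
import struct


def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to an unsigned 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


def passes_mask_check(ip, netmask):
    """True when (ip & netmask) == ip, the condition implied by the error."""
    addr = ip_to_int(ip)
    return (addr & ip_to_int(netmask)) == addr


print(passes_mask_check("127.0.0.1", "255.255.255.255"))  # True
print(passes_mask_check("127.0.0.1", "0.0.0.0"))          # False
```

Under this reading, a single host address only passes with a full 255.255.255.255 mask, while a network address such as 10.0.0.0 passes with 255.0.0.0.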

With this tsocks setup, you won’t have to create special VPNs to lock a LAN machine behind a gateway. As long as you can SSH into the gateway machine from your local machine, you can access the resources behind it with any application on your local machine via tsocks. Nifty, huh?

Fixing undefined library symbols for compiling PHP 5.2.8

So while compiling PHP 5.2.8 on OS X 10.5, you might run into something like:

Undefined symbols for architecture i386:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture i386
collect2: ld returned 1 exit status
Undefined symbols for architecture x86_64:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture x86_64

The MacPorts folks have encountered similar issues in ticket 15891, but WONTFIX‘ed the issue. Apparently the PHP devs are also punting on the problem.

The immediate cause is that you have multiple versions of some shared libraries. For example, in the case above, I have two libxml versions — one in /usr/lib, and another in /usr/local/lib. This is because I do not want to overwrite the Apple-provided libxml version, but still needed new features provided in later libxml versions. The arrangement works fine in every other software compile except this one, so I investigated further.

The root of the problem

Despite the developers’ airy dismissal of the issue, the underlying problem is indeed that the Makefile generated by PHP at configure time is slightly broken. In Makefile and Makefile.global, you’re going to see this line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(MH_BUNDLE_FLAGS) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

where $MH_BUNDLE_FLAGS is usually defined as something like

MH_BUNDLE_FLAGS = -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib \
 -L/usr/lib -laprutil-1 -lsqlite3 -lexpat -liconv -L/usr/lib -lapr-1 -lpthread

The problem is that this hardcodes the search paths for linking shared libraries. GCC searches for shared libraries to link in the order of the provided -L paths. In this case, MH_BUNDLE_FLAGS is expanded immediately after $CC — so the load order is:

  1. /usr/lib
  2. /usr/lib (these are redundant, and so will probably be collapsed into one path)
  3. …every other custom library path you specify

Now you see the issue. No matter what your library paths are set to, the PHP compilation system will insist that whatever shared libraries are in /usr/lib take precedence. Therefore, even if you specified that another version (say, libxml2.dylib in /usr/local/lib) should be used instead, the invocation to link against -lxml2 will search in /usr/lib first. And since it finds the old version, which may be missing a number of symbols, the compilation blows up right there.
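The first-match-wins behaviour can be modelled in a few lines. This is a toy simulation of the search order, not how the linker is actually implemented; the installed dictionary and the resolve_library helper are made up for illustration.

```python
def resolve_library(name, search_dirs, installed):
    """Return the first directory in search_dirs that provides name, or None."""
    for directory in search_dirs:
        if name in installed.get(directory, ()):
            return directory
    return None


# Hypothetical layout: an old libxml2 in /usr/lib, a newer one in /usr/local/lib.
installed = {
    "/usr/lib": {"libxml2.dylib"},
    "/usr/local/lib": {"libxml2.dylib"},
}

# Order produced by the generated Makefile versus the order we want.
broken_order = ["/usr/lib", "/usr/lib", "/usr/local/lib"]
fixed_order = ["/usr/local/lib", "/usr/lib", "/usr/lib"]

print(resolve_library("libxml2.dylib", broken_order, installed))  # /usr/lib
print(resolve_library("libxml2.dylib", fixed_order, installed))   # /usr/local/lib
```

The library you asked for at configure time is never reached in the broken order, because the search stops at the first directory that has a match.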

Evidence

And indeed, if you look at the (rather long and massive) compilation/link command right before it fails, you’ll see:

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib -L/usr/lib \
-laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64 -L/usr/local/lib ... 

Note that -L/usr/lib appears well ahead of -L/usr/local/lib, where /usr/local/lib might be /opt/lib or whatever custom path you provided to configure.

Solutions

The trivial solution is to manually invoke that last line of compilation, but with the -L load paths swapped so that your custom path comes first.

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/local/lib -L/usr/lib \
-L/usr/lib -laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64  ... 

This is easy to do and takes just a second.

Another possible solution is to patch the Makefile, such that MH_BUNDLE_FLAGS comes later in the compilation line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) $(MH_BUNDLE_FLAGS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

This will force your library paths to be searched before /usr/lib, thus resolving the link problem.

update 7/18/09
An anonymous reader mentions that you could also specify the right libxml2 by full path, instead of letting it use -lxml2. Basically, in the last compilation line, you would remove any mentions of -lxml2 and replace that with the full path to your library e.g. /usr/local/lib/libxml2.dylib. In fact, this is probably the way that has the least possible side-effects, since you aren’t changing the search order for any other libraries.

Discussion

This is not the first time that PHP core developers have refused to fix a compilation issue that is arguably preventable through actual testing under different installation scenarios. This is an “edgier” edge case than the tidy.h issue, but still should be fairly noticeable for a substantial number of people.

The “You should only have one library installed” argument is, to be honest, unnecessarily arrogant (sadly, not as rare a problem as you’d like in some open source development projects). On OS X, due to the lack of official Apple package management systems, no one should be overwriting system default libraries. Down that way lies insanity, especially at every system or security update. And yet, this build system is obviously broken any time there is a substantial difference between user-installed libraries and system libraries. This bad behavior is especially egregious, because the configure command allows you to specify your own library path, misleading users into thinking that the path they specified would be obeyed at compile time. If you only intend for the system library to be used and no other, perhaps the configure script should auto-detect this on OS X and disable that command-line option. Basic user interface design should apply even to command-line interfaces.

Note that changing link ordering may have some unforeseen consequences, since the devs obviously never tested this path. For example, you should make sure the dynamic libraries are loaded in the right order at runtime. On OS X, the load path is typically hard-coded into the dylib, so usually there won’t be a problem, but there may be edge cases. Test your build (and any PHP extensions you built) before using it in production!

Testing a POP3 server via telnet or OpenSSL

Sometimes you can’t be bothered to install and set up a command-line mail client and/or VPN, but you still need to access a POP3 server from a remote machine. Sometimes you just need to know if a POP3 server is working or not. Since POP3 is a largely text-based protocol, much like HTTP, telnet or openssl can be used to talk to a POP3 server and read some mail directly from the command line.

Establishing a connection

To start with, the usual process is to telnet to a POP3 server port, usually on TCP port 110. This would be very simple:
telnet mail.example.com 110

Nowadays, though, most POP3 servers are secured via SSL, usually sitting on port 995. If you try to use telnet on an SSL-only POP3 server, you’ll either get an error “Command is not valid in this state”, such as:


Trying 127.0.0.1...
Connected to mail.example.com.
+OK The Microsoft Exchange POP3 service is ready.
USER yiming
-ERR Command is not valid in this state.

or you’ll get a rather brusque brushoff


Trying 10.0.1.202...
Connected to mail2.example.com.
Escape character is '^]'.
USER yiming
Connection closed by foreign host.

When this is encountered, OpenSSL’s s_client should be used instead to perform the necessary SSL negotiations.

openssl s_client -connect mail.example.com:995

or

openssl s_client -crlf -connect mail.example.com:110 -starttls pop3

The second incantation is typically used for Microsoft Exchange servers. Note the -crlf option, which tells s_client to send \r\n line endings. If the wrong line ending is used for a server, the symptom is that the server will not respond to any commands. It will only sit there and wait for further input, while you are staring at blank lines in your session.

Authentication

Having established a connection, it is now necessary to authenticate as a POP3 user. In the simplest case, plain text authentication is used. In this case, the command USER [username] is used to establish the username, and PASS [password] is used to establish the password in plaintext. (Since the connection is under SSL encryption, presumably this plaintext won’t matter).


+OK Server ready
USER yiming
+OK
PASS foobar
+OK Logged in.

Server interactions

Several commands are useful here.

  • LIST – lists the messages available in the user’s account, returning a status message and list with each row containing a message number and the size of that message in bytes
  • STAT – returns a status message, the number of messages in the mailbox, and the size of the mailbox in bytes
  • RETR [message_num] – returns the message identified by the message number, which is the same as the message number shown in the LIST command output
  • TOP [message_num] [n] – returns the top n lines of the message denoted by message number.

When finished, the QUIT command will end the session.
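When the replies to STAT and LIST need to be checked by a script rather than by eye, they are easy to pick apart. A rough sketch, assuming replies shaped like the examples in RFC 1939; the function names parse_stat and parse_list are hypothetical:

```python
def parse_stat(line):
    """Parse a STAT reply such as '+OK 2 320' into (message_count, mailbox_bytes)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "+OK":
        raise ValueError("unexpected STAT reply: %r" % line)
    return int(parts[1]), int(parts[2])


def parse_list(lines):
    """Parse the lines of a LIST reply into {message_number: size_in_bytes}."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if line == ".":
            # A lone dot terminates a multi-line POP3 reply.
            break
        if not line or line.startswith("+OK"):
            # Skip the status line and any blank lines.
            continue
        number, size = line.split()[:2]
        sizes[int(number)] = int(size)
    return sizes
```

The message numbers that come back as keys are the ones to feed to RETR and TOP.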

Conclusion

For other POP3 commands, such as DELE, which marks a message for deletion, refer to RFC 1939, the canonical document defining the Post Office Protocol Version 3 (POP3). At some point, if the commands to be tested become complicated, it may be a more efficient use of time to install a mail client such as alpine.

See also my previous post on chatting with HTTP / HTTPS servers.

Building from source package on Debian / Ubuntu to fix sudo PATH issue

So I’ve been kicking around an Ubuntu installation, hoping to replace my aging Fedora 5 deployment. Last time I touched a Debian distro was…well…sufficiently long ago that it’s more or less all new to me.

What’s less new is the sudo path inheritance issue; this one’s been around. Ubuntu’s sudo hard-codes its PATH variable at compile-time with the --with-secure-path configure option. I’m sure this sounded like a good idea to the security goon who decided to fix this at fsckin’ COMPILE TIME with no way to override it in sudoers, or at runtime with -E after an env_reset. The policy may have been reasonable when it was set on a typical Debian stable server (where software is basically left to fossilize over decades), but certainly not on a constantly changing desktop distro. You can’t even sudo to any /opt/bin binaries! Read the Ubuntu bug report on sudo not preserving PATH.

Long story short, after a lot of experiments looking for workarounds (that won’t eventually take years off my life, one sudo command at a time), I decided to cut the Gordian knot and recompile sudo. Since I didn’t want to roll this from source (and incur all the maintenance hassle of removing/updating the software later on), this meant figuring out compiling source packages with dpkg — oh joy.

Debian source package compilation: the general process

It’s surprisingly non-painful compared with my RPM experience. The long way around:

  1. cd into a temp or source-keeping directory in your user account
  2. retrieve the source package: apt-get source [packagename]
  3. grab missing build dependencies: sudo apt-get build-dep [packagename]
  4. cd into the directory created for the package in your pwd (you can safely ignore the original tarball and the patch file, which have been untarred and applied for you already, respectively). Make edits to the source as needed.
  5. If you need to change configure options for the source package, look in the file debian/rules in the source directory
  6. when satisfied, build the binary package by issuing this incantation in the $PWD (you’ll need the fakeroot package if you don’t already have it):
    dpkg-buildpackage -rfakeroot -uc -b
  7. The completed .deb packages are placed in the parent directory, one level up from the source directory. cd back up one level.
  8. install: sudo dpkg -i [packagename].deb

If you’re screwing around with sudo, you will want to have a sudo tty session open before installing your replacement package, in case you screw up everything and lock yourself out.

A shortcut is potentially available using the -b switch to apt-get when you grab from source. However, I needed to look through configuration files and source code, so I took the long way around.

The easiest way to fix the sudo secure_path issue is to remove the --with-secure-path configuration option in debian/rules, in two places in that file. If you do this, pay attention to your $PATH and make sure it is sane (for example: it shouldn’t contain a world-writable directory), as it will be inherited in sudo shells. In sudo 1.7, there is a runtime secure_path option for the sudoers file, so that would be the ideal, non-annoying solution to this issue.
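For the sanity check on $PATH, something along these lines would do. It is only a sketch for POSIX systems, and the function name world_writable_dirs is made up for this post:

```python
import os
import stat


def world_writable_dirs(path_value=None):
    """Return the PATH entries that are directories writable by everyone."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    risky = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            # An empty entry means the current directory; ignored in this sketch.
            continue
        try:
            mode = os.stat(entry).st_mode
        except OSError:
            # Entry does not exist or cannot be read; nothing to report.
            continue
        # S_IWOTH is the "other users may write" permission bit.
        if stat.S_ISDIR(mode) and mode & stat.S_IWOTH:
            risky.append(entry)
    return risky
```

An empty list back from world_writable_dirs() is what you want to see before letting sudo inherit your PATH. Note that it only looks at the directories themselves, not at their parents or at the binaries inside them.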

Hard-coding the sudo PATH at compile-time tilts heavily toward security in the security/usability tradeoff — YMMV, but I find it entirely not worth it on a desktop distribution.

check last exit status code in bash shell

In case you’re curious (while debugging a program or a script) about the exit status code returned by the last shell command you ran, the incantation to retrieve it in the bash shell is:

echo $?

Given the nature of this variable (no one indexes text like ‘$?’), it’s annoyingly hard to Google for.
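If the command is being launched from a Python script rather than from an interactive shell, the same number is available without $?. A small sketch using the standard subprocess module (the wrapper name exit_status is my own):

```python
import subprocess
import sys


def exit_status(argv):
    """Run a command and return its exit status, the value bash exposes as $?."""
    return subprocess.run(argv).returncode


# Example: a child process that exits with status 3.
status = exit_status([sys.executable, "-c", "import sys; sys.exit(3)"])
print(status)
```

One difference to keep in mind: when the child is killed by a signal, Python reports a negative returncode, while bash reports 128 plus the signal number.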