fixing a scrambled IPython command history on stock OS X 10.6

So I started over with a fresh install of OS X 10.6 recently, and wanted to restore my Python development environment. In doing so, IPython is absolutely essential if you want a sane interpreter environment to test out code. I had a bit of trouble with it though.

The Problem

The stock Python 2.6 shipped with OS X 10.6 Snow Leopard has a readline module linked to libedit, the BSD alternative to the GPL’ed readline. The readline module, if you are not aware, is (among other things) responsible for keeping command history in the IPython interpreter. This causes command history in the IPython 0.10 interpreter to behave in very odd ways. When backtracking through the command history buffer using the up-arrow key, for example, the previous command is only partially recalled, and appears completely scrambled. Indents, too, seem off — in a whitespace-sensitive language like Python, this is annoying. (See first figure)

IPython command interpreter is broken when using libedit with command history
IPython command interpreter is broken when using libedit with command history

Fixing IPython’s bugs are beyond my ability. While I certainly don’t want to delve into the quagmire that is GPL vs BSD licensing, I do understand why Apple would want to avoid the viral nature of the GPL and ship libedit instead. However, using a genuine Readline library is going to be the best recourse for this problem. I already have a copy of readline compiled and ready to go, and just need a new version of readline.so, the library that links Python to readline.

The easy solution

Sifting through my records, I came across a SelfSolved problem record from my good friend Hannes who had issues with his IPython command history.

The solution: sudo easy_install readline, which uses setuptools to install a precompiled package of readline.so statically linked to genuine GNU readline. Restart your IPython console and everything should work. (See second figure)

IPython with readline
IPython with readline

The hard solution

Being the inquisitive sort, I also wondered how I would be able to reproduce this work from scratch. readline.so ships with the Python source package, but surely I would not be required to compile a whole new copy of Python for one measly module library?

I documented this process in SelfSolved again: building readline.so for Python. At some point I should write an interface between SelfSolved and WordPress so that I don’t have to reproduce a lot of my work here manually.

Compiling readline.so

This is actually fairly easy.

  1. Get a copy of the Python source code. In OS X 10.6, it ships with Python 2.6.1.
  2. Unpack it and go into its directory. You should find a Modules subdirectory. In it is readline.c, the source file for readline.so.
  3. Compile the source file. The appropriate incantation is:
    gcc -O2 -arch x86_64 -arch i386 -shared -o readline.so readline.c -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -L/usr/local/lib -lreadline -ltermcap -framework Python

    where the -arch flags should be whatever processors you wish to support, the -I arguments should point to the directories that contain header files for the readline library and the Python framework, and the -L argument should point to the path for the readline library. Use whatever optimization flags you feel comfortable with, instead of -O2, if you wish.

Replacing readline.so

So now we have a readline.so that’s properly linked to readline.dylib. The thornier question is how to override the system-provided readline.so. The system version is located at /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/readline.so, and the naive would simply overwrite it with their new readline.so. This is a bad idea.

As I have mentioned in the past, overwriting system libraries in OS X is an unhealthy thing to do. The problem is that Apple furnishes no official package management system — anything you personally change is considered fair game for the next official system update. On the next system update, if the Python component is affected by the update, the Apple updater will happily clobber your compiled files with its own, leaving you suddenly back at square one. You don’t know how many times I’ve had to recompile emacs (for X11 support) on OS X 10.4 because of this little annoyance. Leave things in the /System/Library directory hierarchy alone, for your own sanity.

However, in this case /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload comes ahead of the user-modifiable /Library/Python/2.6/site-packages directory on Python’s sys.path. So if you just drop readline.so into site-packages, the system version still takes priority.

There are a few ways to do this. For one, you can create a sitecustomize.py in /Library/Python2.6/site-packages. In this file, arbitrary Python statements can be written, and the interpreter will automatically execute them at runtime. So, you can add a sys.path = ["/dir/here"] + sys.path statement and point it to a directory containing your readline.so file. Alternatively, you can abuse the technique used in the easy_install.pth file. It turns out that if you ever used easy_install, directories pointed to by the easy_install.pth file takes priority over the system paths. They use an interesting way to accomplish this, which you can copy. Or, you can just insert your directory containing readline.so into easy_install.pth. In any case, this will force the readline-based readline.so to take precedence over the libedit-based readline.so, without overwriting anything.

Discussion

So for any sane person, the easy solution should be enough. For the rest, the hard solution is an interesting exploration of how some of Python’s built-in modules can be compiled and inserted individually.

Subversion 1.6.2 runtime error on network access on OS X 10.5

A new SelfSolved solution is up for perusal. The problem I tried to solve:

After compiling Subversion 1.6.2 from source on OS X 10.5 Leopard, the compilation is apparently successful, but svn dies when it tries to connect to the network for the first time. Crash log reports that symbols are missing from libneon.dylib.

Crash report from shell:

dyld: lazy symbol binding failed: Symbol not found: _ne_set_connect_timeout
Referenced from: /usr/local/lib/libsvn_ra_neon-1.0.dylib
Expected in: dynamic lookup

dyld: Symbol not found: _ne_set_connect_timeout
Referenced from: /usr/local/lib/libsvn_ra_neon-1.0.dylib
Expected in: dynamic lookup

Check out the places that I googled and my final solution writeup … at SelfSolved #49: Subversion 1.6.2 explodes on first network access.

The problem is very similar to a previous compilation issue I solved for PHP. In essence, the -L library search path passed to GCC at compilation time has /usr/lib in front of everything else. This means whatever library path you might have given to it at configure time, it’ll always look for the library in /usr/lib first, picking up the old system libneon in the process. Since the bad libneon dynamically linked, the problem doesn’t manifest itself until runtime — and only at runtime with network access involved.

As with the PHP issue, change the very first -L/usr/lib to -L/usr/local/lib (or wherever your newer libneon is located), and it’ll link correctly.

Out of curiosity, I checked MacPorts first. The MacPorts solution of disabling libneon version checking is odd — it also works, but I dunno if it’s linking to the right thing or not.

SSH, Subversion through SOCKS proxy on Mac OS X

UPDATE Apr 2, 2012
Due to the complete lack of updates for tsocks, I recommend the use of proxychains over tsocks. It accomplishes the same thing but works out of the box.

One persistent problem that I run into is that I need to access certain network resources through a SOCKS proxy server. This is all well and good if they are web resources — Safari, Firefox, etc. support SOCKS proxies quite well. However, I also need, for example, SSH and Subversion access to some resources. SOCKS support is woefully inadequate or nonexistent in these tools.

In the case of SSH, even if you google for this, you’ll run through thousands of examples of using ssh as a SOCKS server, but not through one as a SOCKS client. There are some convoluted solutions, but none of them I can use directly on an OS X 10.5 machine.

TSocks: the solution…if it were that easy

Now, tsocks is a nifty little tool to transparently divert network calls through a SOCKS 4 or SOCKS 5 proxy. This allows even non-SOCKS-aware applications to function through a SOCKS server.

Unfortunately it is very old, unmaintained code (1.8 beta 5 was released in 2002). It doesn’t compile cleanly on OS X due to this, nor will it compile under GCC 4.x. Further, it won’t work out of the box either if you do manage to compile it. The problem is that it relies on the Linux-only LD_PRELOAD functionality to use a shared library to hijack network system calls. This mechanism is called DYLD_INSERT_LIBRARIES on OS X and only works if DYLD_FORCE_FLAT_NAMESPACES is active.

Getting a working tsocks: MacPorts

There is an easy way to get tsocks. MacPorts ships a ported tsocks package. If you use MacPorts, sudo port install tsocks should do it.

Unfortunately on several machines I don’t use MacPorts, and don’t want to pull down an entire third-party package manager with its own library tree on each of these boxes. So I have do to this the hard way.

Getting a working tsocks: rolling my own

First to notice is that there are two tsocks distributions. One is the original tsocks 1.8b5, last updated in the first half of this decade. To make it work, follow the instructions provided by Marc Abramowitz in 2006. Note that his patch is actually located at his new domain address instead of the old, linked one.

The MacPorts distribution, on the other hand, is based on R. Garcia’s patched tsocks distribution, incorporating some modernization and new features by the Tor team. This distribution is numbered 1.8.x, with the last being 1.8.4. Unfortunately it is also no longer maintained, as the Tor devs forked this into a custom version to use with the Tor network only. Unfortunate, but for now, it still compiles, and works a bit better than the 2002 original.

To roll your own tsocks via source out of the MacPorts distribution, you will want the patches from the MacPorts repository. An outline of the compilation procedure:

  1. Download tsocks 1.8.4 from the author’s page
  2. Download all the patches from the MacPorts repository
  3. Concatenate all of the patches together:
    cat patch-* > tsocks.osx.patch
  4. Put the concatenated tsocks.osx.patch file into the tsocks source directory. Apply the patches:
    patch -p0 < tsocks.osx.patch
  5. Regenerate the configure script:
    autoreconf
  6. Configure the package:
    ./configure --prefix=/usr/local --bindir=/usr/local/bin --mandir=/usr/local/man --sysconfdir=/etc --libdir=/usr/local/lib
  7. Install the library and binaries:
    sudo make install
  8. Install the conf file:
    sudo cp ./tsocks.conf.complex.example /etc/tsocks.conf
  9. Edit the conf file. Make sure that if you’re not using tor, that you write in the conf file
    tordns_enable = false

Configuring tsocks

The complex configuration file example should have explained all of the features to be set. For my configuration:

Some important settings:

  • local – this setting, in the format of IP/netmask can be repeated several times, each time to exclude a set of IPs from being diverted to the SOCKS server. For obvious reasons, your SOCKS server will have to exist in one of these excluded IP ranges – otherwise you will never even reach your proxy server.
  • server and server_port – these should point to the IP address and port of your SOCKS server
  • server_typetsocks defaults to SOCKS4 mode. You may wish to set it to 5 for SOCKS5 usage.
  • tordns_enable – this needs to be set as false if you don’t use Tor.

Using tsocks

Once this is set up, simply prefixing the network command you want to run with tsocks will force a diversion through the proxy connection. For example:

tsocks ssh example.com

The same can be applied to Subversion.

tsocks svn update

will force the svn client to act through the proxy set in tsocks.conf.

SOCKS on localhost

Note that SOCKS services on 127.0.0.1 has a minor gotcha. Sometimes, you are able to SSH into a remote machine, and use that connection as your SOCKS server. This is described in my post about using SSH as a pseudo-VPN, which describes the -D switch. My use case here is that once you do this, all further local SSH connections to other machines should be diverted through the first SSH. For example, I’d like to do:

my-machine$ ssh -D 40000 gateway.example.com # establish a SOCKS server on localhost:40000 to the gateway host

and then:

my-machine$ ssh lan-1.example.com # access the protected lan-1 machine through the SOCKS, which will see me as gateway.example.com 

This is very doable in the tsocks setup if you set tsocks.conf:

server = 127.0.0.1/255.255.255.255
server_port = 40000

and then:

my-machine$ ssh -D 40000 gateway.example.com
my-machine$ tsocks ssh lan-1.example.com

This is the gotcha: make sure the netmask is set correctly to 255.255.255.255. Otherwise tsocks will die with a cryptic:

IP (127.0.0.1) & SUBNET (0.0.0.0) != IP on line 22 in configuration file, ignored

It is apparently fairly sensitive about the subnet mask setup to conform to exact standards.

With this tsocks setup, you won’t have to create special VPNs to lock a LAN machine behind a gateway. As long as you can SSH into the gateway machine from your local machine, you can access the resources behind it with any application on your local machine via tsocks. Nifty, huh?

Fixing undefined library symbols for compiling PHP 5.2.8

So while compiling PHP 5.2.8 on OS X 10.5, you might run into something like:

Undefined symbols for architecture i386:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture i386
collect2: ld returned 1 exit status
Undefined symbols for architecture x86_64:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture x86_64

This doesn’t only happen with libxml. If you’ve installed any extra updated libraries, like iconv or tidy or any library that has significant symbol changes between versions, it’ll die in similar ways. The MacPorts folks have encounted similar issues in ticket 15891, but WONTFIX‘ed the issue. Apparently the PHP devs are also punting on the problem.

The immediate cause is that you have multiple versions of some shared libraries. For example, in the case above, I have two libxml versions — one in /usr/lib, and another in /usr/local/lib. This is because I do not want to overwrite the Apple-provided libxml version, but still needed new features provided in later libxml versions. The arrangement works fine in every other software compile except this one, so I investigated further.

The root of the problem

Despite the developers’ airy dismissal of the issue, the underlying problem is indeed that the Makefile generated by PHP at configure time is slightly broken. In Makefile and Makefile.global, you’re going to see this line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(MH_BUNDLE_FLAGS) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

where $MH_BUNDLE_FLAGS is usually defined as something like

MH_BUNDLE_FLAGS = -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib \
 -L/usr/lib -laprutil-1 -lsqlite3 -lexpat -liconv -L/usr/lib -lapr-1 -lpthread

The problem is that this hardcodes the search paths for linking shared libraries. GCC searches for shared libraries to link in the order of the provided -L paths. In this case, MH_BUNDLE_FLAGS is expanded immediately after $CC — so the load order is:

  1. /usr/lib
  2. /usr/lib (these are redundant, and so will probably be collapsed into one path)
  3. …every other custom library path you specify

Now you see the issue. No matter what your library paths are set to, the PHP compilation system will insist that whatever shared libraries in /usr/lib take precedence. Therefore, even if you specified that another version (say, libxml.dylib in /usr/local/lib) should be used instead, the invocation to link against -lxml2 will search in /usr/lib first. And since it finds the old version, which may be missing a number of symbols, the compilation blows up right there.

Evidence

And indeed, if you look at the (rather long and massive) compilation/link command right before it fails, you’ll see:

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib -L/usr/lib \
-laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64 -L/usr/local/lib ... 

emphasis mine, where /usr/local/lib might be /opt/lib or whatever custom path you provided to configure.

Solutions

The trivial solution is to manually invoke that last line of compilation, but swapping the -L load paths.

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/local/lib -L/usr/lib \
-L/usr/lib -laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64  ... 

This is easy to do and takes just a second.

Another possible solution is to patch the Makefile, such that MH_BUNDLE_FLAGS comes later in the compilation line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) $(MH_BUNDLE_FLAGS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

This will force your library paths to be searched before /usr/lib, thus resolving the link problem.

update 7/18/09
An anonymous reader mentions that you could also specify the right libxml by full path, instead of letting it use -lxml. Basically, in the last compilation line, you would remove any mentions of -lxml and replace that with the full path to your library e.g. /usr/local/lib/libxml.dylib. In fact, this is probably the way that has the least possible side-effects, since you aren’t changing the search order for any other libraries.

Discussion

This is not the first time that PHP core developers have refused to fix a compilation issue that is arguably preventable through actual testing under different installation scenarios. This is an “edgier” edge case than the tidy.h issue, but still should be fairly noticeable for a substantial number of people.

The “You should only have one library installed” argument is, to be honest, unnecessarily arrogant (sadly, not as a rare a problem as you’d like in some open source development projects ). I understand that it’s an open source project, and no self-respecting software engineer likes to use time on project plumbing / build systems rather than work on the product. However, on OS X, due to the lack of official Apple package management systems, no one should be overwriting system default libraries — down that way lies insanity, especially at every system or security update. PHP’s build system is obviously broken any time there is a substantial difference between user-installed libraries and system libraries. This bad behavior is especially egregious, because the configure command allows you specify your own library path — misleading users into thinking that the path they specified would be obeyed at compile time. If you only intend for the system library to be used and no other, perhaps the configure script should auto-detect this on OS X and disable that command-line option. Basic user interface design should apply even to command-line interfaces.

Note that changing link ordering may have some unforseen consequences, since the devs obviously never tested this path. For example, you should make sure the dynamic libraries are loaded in the right order at runtime. On OS X, the load path is typically hard-coded into the dylib, so usually there won’t be a problem — but there may be edge cases. Test your build (and any PHP extensions you built) before using it in production!

Building from source package on Debian / Ubuntu to fix sudo PATH issue

So I’ve been kicking around an Ubuntu installation, hoping to replace my aging Fedora 5 deployment. Last time I touched a Debian distro was…well…sufficiently long ago that it’s more or less all new to me.

What’s less new is the sudo path inheritance issue — this one’s been around. Ubuntu’s sudo hard-codes its PATH variable at compile-time with a --secure-path option. I’m sure this sounded like a good idea to the security goon who decided to fix this at fsckin’ COMPILE TIME with no way to override it in sudoers, or at runtime with -E after an env_reset. The policy may have been reasonable when it was set on a typical Debian stable server (where software is basically left to fossilize over decades), but certainly not on a constantly changing desktop distro. You can’t even sudo to any /opt/bin binaries! Read the Ubuntu bug report on sudo not preserving PATH.

Long story short, after a lot of experiments looking for workarounds (that won’t eventually take years off my life, one sudo command at a time), I decided to cut the Gordian knot and recompile sudo. Since I didn’t want to roll this from source (and incur all the maintenance hassle of removing/updating the software later on), this meant figuring out compiling source packages with dpkg — oh joy.

Debian source package compilation: the general process

It’s surprisingly non-painful compared with my RPM experience. The long way around:

  1. cd into a temp or source-keeping directory in your user account
  2. retrieve the source package: apt-get source [packagename]
  3. grab missing build dependencies: sudo apt-get build-dep [packagename]
  4. cd into the directory created for the package in your pwd (you can safely ignore the original tarball and the patch file, which have been untarred and applied for you already, respectively). Make edits to the source as needed.
  5. If you need to change configure options for the source package, look in the file debian/rules in the source directory
  6. when satisfied, build the binary package by issuing this incantation in the $PWD ( you’ll need the fakeroot package if you don’t already have it ):
    dpkg-buildpackage -rfakeroot -uc -b
    Use -nc if you mess up and need to continue a build.
  7. The completed .deb packages are placed in the parent directory, one level up from the source directory. cd back up one level.
  8. install: sudo dpkg -i [packagename].deb

If you’re screwing around with sudo, you will want to have a sudo tty session open before installing your replacement package, in case you screw up everything and lock yourself out.

A shortcut is potentially available using the -b switch to apt-get when you grab from source. However, I needed to look through configuration files and source code, so I took the long way around.

The easiest way to fix the sudo secure_path issue is to remove the --with-secure-path configuration option in debian/rules, in two places in that file. If you do this, pay attention to your $PATH and make sure they are sane (for example: it shouldn’t contain a globally writeable directory), as it will be inherited in sudo shells. In sudo 1.7, there is a runtime secure_path option for the sudoers file, so that would be the ideal, non-annoying solution to this issue.

Hard-coding the sudo PATH at compile-time tilts heavily toward security in the security/usability tradeoff — YMMV, but I find it entirely not worth it on a desktop distribution.

APR and 32-bit/64-bit universal binary compilation

When compiling APR, the Apache Portable Runtime 1.3.3 (as a part of Subversion 1.5.3 as I am doing here, or not) on OS X 10.5 Leopard, you may encounter the following error at compile time.

/bin/sh /tmp/subversion-1.5.3/apr/libtool --silent --mode=compile gcc-4.2 -Os -arch i386 -arch x86_64 -DHAVE_CONFIG_H -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp -I./include -I/tmp/subversion-1.5.3/apr/include/arch/unix -I./include/arch/unix -I/tmp/subversion-1.5.3/apr/include/arch/unix -I/tmp/subversion-1.5.3/apr/include -o strings/apr_snprintf.lo -c strings/apr_snprintf.c && touch strings/apr_snprintf.lo
strings/apr_snprintf.c: In function ‘conv_os_thread_t’:
strings/apr_snprintf.c:500: error: duplicate case value
strings/apr_snprintf.c:498: error: previously used here
strings/apr_snprintf.c: In function ‘conv_os_thread_t_hex’:
strings/apr_snprintf.c:671: error: duplicate case value
strings/apr_snprintf.c:669: error: previously used here

This will most likely happen when you are configured to build a dual 32-bit / 64-bit universal binary, whether it be ppc / ppc64, or i386 / x86_64, or any permutation thereof. This ticket over at MacPorts documents a particular instance of this problem, with no apparent solution.

The symptom is easy to explain. Somehow, two case labels in the relevant switch statement in strings/apr_snprintf.c:500:

switch(sizeof(u.tid)) {
    case sizeof(apr_int32_t):
        return conv_10(u.u32, TRUE, &is_negative, buf_end, len);
    case sizeof(apr_int64_t):
        return conv_10_quad(u.u64, TRUE, &is_negative, buf_end, len);
    default:
        /* not implemented; stick 0 in the buffer */
        return conv_10(0, TRUE, &is_negative, buf_end, len);
    }

have evaluated to the same value. In particular, it believes that sizeof(apr_int32_t) and sizeof(apr_int64_t) are the same value. As we all know in C, you cannot have two identical case labels in the same switch statement. However, the root of the problem is a bit more subtle.

In $SRCDIR/include/apr.h, you’re likely to see this fragment of code.

typedef  long       apr_int64_t;
typedef  unsigned long  apr_uint64_t;

Notice that it has typdef’ed apr_int64_t as a long and apr_uint64_t as unsigned long. This is because at configure time, the script detected that long values are 64-bit on this system, so it assigned the apache 64-bit types to longs. However, this only holds true for half of the compilation – because you are building a universal binary for a 32-bit architecture as well. Remember that in 32-bit GCC on OS X, longs are 32-bit rather than 64-bit. Your run-of-the-mill autoconf script, done by a non-OS X programmer, isn’t going to be able to detect this subtlety – if the 64-bit part worked, it’ll keep thinking longs are 64-bit, end of story – and happily generate the incorrect typedef expressions. When you apply sizeof to these types in apr_snprintf.c, both evaluate to 4 bytes under 32-bit compilation, thus blowing up the compile run.

To truly fix the root of the problem requires rewriting the autoconf script to detect Mac OS X and its universal binary building, which can potentially throw quadruple architectures at the same compilation script. However, a quick hack to make this particular problem go away is to change apr.h such that:

typedef  long long       apr_int64_t;
typedef  unsigned long long apr_uint64_t;

Now that we ensure in either 32-bit or 64-bit compilation, apr_int64_t and apr_uint64_t are always typedef’ed to appropriate, guaranteed 64-bit types. The compilation of APR (and Subversion) will proceed normally.

Note that long long is not an standard C type. As a GCC extension, this fix is a kludge. A kludge that works (for me), though.

UPDATE:
There may also be an issue with sizeof definitions that may cause the library to crash. In particular, there may be occurrences of

#define APR_SIZEOF_VOIDP 8

that were generated by configure. To fix this, you will need to remove the define and have the compiler check for 64-bit at compile-time:

#ifndef __LP64__
    #define APR_SIZEOF_VOIDP 4
#else
    #define APR_SIZEOF_VOIDP 8
#endif

In general, any predefined sizeofs need to be changed. I am not sure why the APR developers do hard-coded defines like this, given that the point of having sizeof() calls is to avoid such issues.

GNUMake 3.80 and failed malloc

Wacky compilation error:

Creating config.mak and config.h...
make(4162) malloc: *** vm_allocate(size=4272971776) failed (error code=3)
make(4162) malloc: *** error: can't allocate region
make(4162) malloc: *** set a breakpoint in szone_error to debug
make[1]: *** virtual memory exhausted. Stop.

This error has nothing to do with the particular code you’re compiling. Turns out GNU Make 3.80 has a bug, in which it sometimes sends a ridiculous requested block size to malloc(), which then fails to allocate the space.

The solution is to upgrade to GNUMake 3.81. Particularly pernicious on OS X 10.4 and Xcode 2.4.1.

AVCodecContext, Perian, and the Xcode dylib fetish

The Context

So recently I was trying to interact with an avi file with a video stream encoded in H.264 and an audio stream encoded in MP3. To my surprise, my trusty Perian 1.1 failed to even play the video and gave me a black screen (though the audio played just fine). With some Google-fu, it was found in a forum post that the SVN version of Perian fixed this error.

So compile my own Perian 1.1.1b1 from SVN, right? No big deal. Heh. And thus begins our story. (Note that this procedure is accurate as of the time of writing. OSS projects change rapidly as bugfixes are checked in. You may not have this problem at all. But it’s still useful as a reference in the future, in case similar problems arise).

Problem 1: AVCodecContext.bits_per_sample

The first problem you hit is probably:
/tmp/perian-orig/ff_private.c:181: error: ‘struct AVCodecContext’ has no member named ‘bits_per_sample’

/tmp/perian-orig/FFissionCodec/FFissionDecoder.cpp:289: error: ‘struct AVCodecContext’ has no member named ‘bits_per_sample’

Turns out that the upstream project FFMPEG, who is responsible for libavcodec, has very recently changed the API in one of its latest revisions. bits_per_sample has become
bits_per_coded_sample in revision 15262. Because Perian’s SVN does an external SVN checkout of the HEAD of libavcodec, it pulled these API-breaking changes without considering whether that breaks its own compilation or not.

The fix is simple enough: global search and replace the member variable to bits_per_coded_sample.

Problem 2: Xcode’s unnatural fondness for dynamic libraries

At this point, libavcodec compiles, so you think you’re home free. For most users, that is probably the case. For some of us, though, there’s one more hurdle.

While compiling Perian.component, you might get something like:
ld warning: in /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libavcodec.dylib, file is not of required architecture
Undefined symbols:
"_avcodec_close", referenced from:

…and so on

Closer examination reveals that at some point, you might have compiled FFMPEG or MPlayer or any of the many media projects that depend on FFMPEG libraries (for me, that was transcode, whose own compilation idiosyncracies I documented in an earlier post). When you installed these libraries, they went into your library search path. Note the Perian compilation line immediately preceding the compile failure:
/Developer/usr/bin/g++ ... -read_only_relocs suppress -lavcodec -lavformat -lavutil -lebml -lmatroska ... /tmp/perian/UniversalDetector/build/Release/libuniversaldetector.a -lbz2 -o /tmp/perian/build/Deployment/Perian.component/Contents/MacOS/Perian

As you can see, the -lavcodec linker argument will cause the linker go on the search path and try to find the library. It of course first finds your pre-installed library in /usr/local/lib or some other system lib path, and tries to use it. If you compiled that library as Intel only or PPC only and you tried to build a universal Perian, that would obvious fail and generate an architecture error. If you, like me, compiled only for x86_64 or ppc64, that would also trigger the same architecture problem (as Perian builds as x86 32-bit).

This problem stumped a user “vs”, in this Perian-discuss Google groups thread. Notice how the devs didn’t get the root of the problem, and answered snarkily and defensively to the user bug report.

The devs aren’t wrong, though. Didn’t we just see a statically compiled target, libavcodec.a, in the target list for the Perian Xcode project? Why yes, yes we did! In fact, there are static targets for all of the -lav* libraries in the Xcode project. So why did the linker go off and use -lavcodec rather than, say, $PERIAN_DIR/build/Universal/libavcodec.a, as instructed?

The workaround

In short: Xcode is screwing things up. Xcode 3.1 has a distinct (some say, unnatural) preference for dynamic libraries. When you add a library to the “Link Binary With Library” listing under Targets, it simply adds a -l<libname> to the link line to pass to ld at link time. When this happens, it will look for lib<libname>.dylib (in this case, libavcodec.dylib, libavformat.dylib, libavutil.dylib, etc in your library path. If it finds one, it will use that one rather than the static library you provided in the Xcode project file. Obviously, this ends up linking the wrong library, perhaps with the wrong architecture or the wrong version. Compilation explodes.

The workarounds are not very clean. There seems to be no good way to instruct Xcode to forget its dynamic library fetish and use the static versions you provide. You could remove the dylibs from the system library path entirely — when the linker fails to find the .dylib version, it will fall back to the .a.

Alternatively, you can explicitly instruct it to use static linking as an environment flag and remove the built-in Xcode linker relationships:

  1. Right-click on your Target (in this case, Perian) and Get Info to bring up the Target Info box.
  2. Go to the Build tab
  3. Select your Build configuration – in this case, probably Deployment
  4. Find or locate OTHER_LD_FLAGS, also called “Other Linker Flags”
  5. Manually add the paths to the .a files in this field, one at a time. If you don’t know the path, right-click -> Get Info on the desired libraries under Link Binary With Library, and it should show you a field called Full Path on the General tab. That’s the path to use. Should look something like /tmp/perian/build/Universal/libavcodec.a
  6. Do the same for all offending libraries – this may include libavcodec, libavformat, libavutil, libmatroska, etc.
  7. Remove the libraries you added to Other Linker Flags from your Xcode’s Target -> Target Name -> Link Binary With Library section, to stop Xcode from auto-generating broken linker flags. This is important.

If done correctly, this should no longer find the wrong type of the library. Your compilation of Perian should proceed as normal.

The Moral of the Story

Xcode has some weird idiosyncratic ways of building its projects. It’s useful to poke at things and find out about these behaviors. Its unnatural preference for dynamic libraries over your explicit instructions is disturbing, though.

Don’t attack your users when they pose a problem, and drop the attitude of: ooo, we’re doing this for free, so fsck off, users. In both problems, the exact same attitude surfaced amongst a few of the devs. Arrogance (“we don’t have the responsibility to do anything for you”) and snarky dismissal (‘Well, there’s your problem. We don’t build libavcodec.dylib nor do we place it in /usr/local/lib.’) helps no one. No matter what you thought your build script did, the linker is heading off toward /usr/local/lib, and you better damn well find out why.

It’s obvious that there’s something peculiar going on with the link process, and the user tried to make it clear multiple times (“You missed my point. We build a .a file and we don’t put it in /usr/local/lib.”), but no one seems to pay attention. Following that thread, the user is increasingly frustrated and expresses that, and now the dev’s hammer of arrogance comes down hard ( “You looked like a prick here, to be honest.” — really? not your obtuse-ness or your blindness to a user need?)

When you have a “works-for-me” situation, it’s especially important to communicate effectively with the user. In most cases, you have no idea why it blows up for the user and not for you. Admitting it is fine — could be a very localized issue for that user, or could reveal a subtle systemic problem in your code – either way, it’s useful to have that established. I realize we’re all very busy people, but at the end of the day it’s still your project. Either acknowledge you have no idea what’s going on and solicit help and cooperation with the user who’s having this issue, or man up, put on your detective hat, and fix your own damn problems. The rest of us are busy with our own issues.

The Free Software culture seems to breed this sort of contempt for the user in some devs. Might be a paper in analyzing how corrosive this can be.

UPDATED: I certainly do not mean this last section as a criticism for all Free Software devs. The vast majority of people I’ve talked to in the various communities are really knowledgeable, helpful folks. There are just a few people in some cases, though, who seem to think that if the user can’t make the software work, it’s somehow the user’s fault. It’s a joint problem to be solved together, not “you’re an idiot if you can’t make it work”.

UPDATED 2 (9-29-2008): Perian 1.1.1 has been released. This should fix the original problem with H.264 and .avi files that prompted this post.

transcode 1.0.5 invalid immediate error on Intel OS X

If you’re compiling transcode 1.0.5 from source on OS X Intel (for some odd reason, as I’ve just had to do on 10.5.4), your compile run may blow up in aclib with:


tcmemcpy.c:30:missing or invalid immediate expression `0b111' taken as 0
tcmemcpy.c:30:suffix or operands invalid for `and'
tcmemcpy.c:42:missing or invalid immediate expression `0b1000' taken as 0
tcmemcpy.c:42:suffix or operands invalid for `test'
tcmemcpy.c:52:missing or invalid immediate expression `0b1111' taken as 0

The problem here appears to be that the code in tcmemcpy.c under x86 and x86_64 relies on some inline assembly features not supported by all compiler/assemblers. Namely, the use of 0b (binary immediate) operands in assembly instructions is not apparently supported during compilation under Apple gcc and gcc-4.2.

Setting the CCAS flag did not seem to help. A quick fix, then, is to convert all binary immediates in the asm instructions to hex or decimal.

Thankfully, these operands only existed in tcmemcpy, so there’s not too much work. Look for 0bxxxx immediates and convert them to their hex or decimal equivalent. For example, 0b111 is 0x7, 0b1000 is 0x8, 0b1111 is 0xf. Thus, a line like:
and $0b111, %%eax # ... which is the number of bytes to copyn

becomes
and $0x7, %%eax # ... which is the number of bytes to copyn

Once these operands are modified, the compiler should no longer complain about invalid immediate expressions.

In any case, the latest CVS version of transcode should have resolved these problems already. The workaround should only be necessary if you still must compile the release tarball for 1.0.5.

Updated: There is a second potential problem if you were using ffmpeg-SVN and transcode 1.0.5, in which its hard-coded include lookup for avcodec.h always looks for ffmpeg/avcodec.h. The SVN version of ffmpeg has moved these headers to libavcodec/avcodec.h. A path patch for transcode 1.0.5 and ffmpeg is available, which basically boils down to changing the #include headers and the configure.in file to look for the new path.

Updated: A third problem has now cropped up, with the ffmpeg API change that moved AVCodecContext’s bits_per_sample to bits_per_coded_sample. A global search and replace seems to work for now. See this also in my post on compiling Perian.

libpng12.*.dylib related compile failures on OS X 10.5

broken libpng screenshot

If you’ve been having problems compiling various Unix packages on OS X 10.5.4, and that your compile run fails mysteriously with something like:

i686-apple-darwin9-gcc: /usr/X11/lib/libpng12.0.26.0.dylib: No such file or directory

One strange yet very likely explanation: your libtool archive file /usr/X11/lib/libpng12.la is lying about the location of the shared library for libpng12 — namely, that a file called /usr/X11/lib/libpng12.0.26.0.dylib exists and should be used for linking against libpng12. However, if you actually look in /usr/X11/lib, no such file exists – perhaps you might have libpng.0.24.0.dylib, but not .26. Therefore, packages that make use of this incorrect libtool archive metadata are suitably confused, causing the compiler to bail out when trying to link against this non-existent file.

Since libtool archive (.la) files are text-based, you can open it up in emacs. The quick and dirty fix to this is to simply change the offending library_names and the current and age properties to the correct numbers. In my case, the libpng sitting in /usr/X11/lib was .24, and so I string-replaced all the values in those three properties from 26 to 24. The compilation then proceeded normally.

The long term solution, of course, is to track down what put a wrong .la file there in the first place. I suspect Xcode 3.1 and Mac OS X SDK 10.5, which shipped with the latest iPhone SDK.

UPDATED 9/8/2008:
In the comments section, I’ve been informed that the X11SDK package in Xcode 3.1 in the culprit. Thanks Anonymous!