Fixing undefined library symbols for compiling PHP 5.2.8

So while compiling PHP 5.2.8 on OS X 10.5, you might run into something like:

Undefined symbols for architecture i386:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture i386
collect2: ld returned 1 exit status
Undefined symbols for architecture x86_64:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture x86_64

This doesn’t only happen with libxml. If you’ve installed any extra updated libraries, like iconv or tidy or any library that has significant symbol changes between versions, it’ll die in similar ways. The MacPorts folks have encounted similar issues in ticket 15891, but WONTFIX‘ed the issue. Apparently the PHP devs are also punting on the problem.

The immediate cause is that you have multiple versions of some shared libraries. For example, in the case above, I have two libxml versions — one in /usr/lib, and another in /usr/local/lib. This is because I do not want to overwrite the Apple-provided libxml version, but still needed new features provided in later libxml versions. The arrangement works fine in every other software compile except this one, so I investigated further.

The root of the problem

Despite the developers’ airy dismissal of the issue, the underlying problem is indeed that the Makefile generated by PHP at configure time is slightly broken. In Makefile and Makefile.global, you’re going to see this line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(MH_BUNDLE_FLAGS) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

where $MH_BUNDLE_FLAGS is usually defined as something like

MH_BUNDLE_FLAGS = -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib \
 -L/usr/lib -laprutil-1 -lsqlite3 -lexpat -liconv -L/usr/lib -lapr-1 -lpthread

The problem is that this hardcodes the search paths for linking shared libraries. GCC searches for shared libraries to link in the order of the provided -L paths. In this case, MH_BUNDLE_FLAGS is expanded immediately after $CC — so the load order is:

  1. /usr/lib
  2. /usr/lib (these are redundant, and so will probably be collapsed into one path)
  3. …every other custom library path you specify

Now you see the issue. No matter what your library paths are set to, the PHP compilation system will insist that whatever shared libraries in /usr/lib take precedence. Therefore, even if you specified that another version (say, libxml.dylib in /usr/local/lib) should be used instead, the invocation to link against -lxml2 will search in /usr/lib first. And since it finds the old version, which may be missing a number of symbols, the compilation blows up right there.

Evidence

And indeed, if you look at the (rather long and massive) compilation/link command right before it fails, you’ll see:

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib -L/usr/lib \
-laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64 -L/usr/local/lib ... 

emphasis mine, where /usr/local/lib might be /opt/lib or whatever custom path you provided to configure.

Solutions

The trivial solution is to manually invoke that last line of compilation, but swapping the -L load paths.

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/local/lib -L/usr/lib \
-L/usr/lib -laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64  ... 

This is easy to do and takes just a second.

Another possible solution is to patch the Makefile, such that MH_BUNDLE_FLAGS comes later in the compilation line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) $(MH_BUNDLE_FLAGS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

This will force your library paths to be searched before /usr/lib, thus resolving the link problem.

update 7/18/09
An anonymous reader mentions that you could also specify the right libxml by full path, instead of letting it use -lxml. Basically, in the last compilation line, you would remove any mentions of -lxml and replace that with the full path to your library e.g. /usr/local/lib/libxml.dylib. In fact, this is probably the way that has the least possible side-effects, since you aren’t changing the search order for any other libraries.

Discussion

This is not the first time that PHP core developers have refused to fix a compilation issue that is arguably preventable through actual testing under different installation scenarios. This is an “edgier” edge case than the tidy.h issue, but still should be fairly noticeable for a substantial number of people.

The “You should only have one library installed” argument is, to be honest, unnecessarily arrogant (sadly, not as a rare a problem as you’d like in some open source development projects ). I understand that it’s an open source project, and no self-respecting software engineer likes to use time on project plumbing / build systems rather than work on the product. However, on OS X, due to the lack of official Apple package management systems, no one should be overwriting system default libraries — down that way lies insanity, especially at every system or security update. PHP’s build system is obviously broken any time there is a substantial difference between user-installed libraries and system libraries. This bad behavior is especially egregious, because the configure command allows you specify your own library path — misleading users into thinking that the path they specified would be obeyed at compile time. If you only intend for the system library to be used and no other, perhaps the configure script should auto-detect this on OS X and disable that command-line option. Basic user interface design should apply even to command-line interfaces.

Note that changing link ordering may have some unforseen consequences, since the devs obviously never tested this path. For example, you should make sure the dynamic libraries are loaded in the right order at runtime. On OS X, the load path is typically hard-coded into the dylib, so usually there won’t be a problem — but there may be edge cases. Test your build (and any PHP extensions you built) before using it in production!

php 5.2.5 compile error – macro issue

Like the previously mentioned compile problem with transcode, PHP’s tidy extension appears to have a macro-induced collision problem on OS X Tiger. In particular the compile run blows up with:

In file included from /usr/include/tidy/tidy.h:70,
from /tmp/php-5.2.5/ext/tidy/tidy.c:34:
/usr/include/tidy/platform.h:515: error: duplicate 'unsigned'
/usr/include/tidy/platform.h:515: warning: useless type name in empty declaration

This exhibits a similar red herring, in that the compiler says a system include file (tidy/platform.h) is causing the issue. In actuality, this is most likely due to a preprocessor macro issue in PHP. In platform.h:515, we see:
typedef unsigned long ulong;

Now, in main/php_config.h:121:
#define ulong unsigned long

Since php_config.h is set to be included before platform.h, on this particular build configuration, platform.h:515 now becomes:
typedef unsigned long unsigned long;

Hence the compiler error message and the red-herring about your system include file (platform.h).

Since php_config.h is included by just about every main PHP source file, the easier solution is to switch the include order of tidy.h and the php default includes. As such, in tidy.c, we go from this:
#include "php.h"
#include "php_tidy.h"

#if HAVE_TIDY
...
#include "tidy.h"
#include "buffio.h"

to this:
#include "tidy.h"

#include "php.h"
#include "php_tidy.h"

#if HAVE_TIDY
...
#include "buffio.h"

You may also wish to wrap it with #if HAVE_TIDY and #endif, to preserve the original logic (note that tidy.h was originally within the #if), but in my case it seemed to have gone okay without it.

In my case, this compiled just fine with no further complaints. I don’t like doing this – perhaps the PHP-tidy devs had reasons for putting the includes in this order. But from a pragmatic point of view… hey, it compiles.

Again, the moral of the story: when screwing around with macros, try to avoid naming it something that will collide with system libraries.

Updated Feb 6, 2008
As jhardi notes in the comments section, the folks behind tidy have patched their latest version to work around this issue. Kudos to the tidy devs, and to the others who found this bug way before I even had to care about it.

Minor rant: so…this workaround requires that we upgrade tidy. Since Mac OS X doesn’t regularly update its Unix-y interiors , we’re left with the choice of overwriting an Apple system library, or shadowing it and remembering that we shadowed it. I picked option #1, since it’s unlikely that anything is going to blow up due to a new tidy library, but some people are understandably wary of overwriting system libraries (this is, in fact, one of the reasons why package managers like MacPorts stick copies in alternate directories instead).

This still leaves the current stable PHP not compiling with older tidy versions, on the Mac and any other platform using that typedef. And, as a bonus, there is a possibility that we might be doing this again if PHP ever decides to add another library that uses the word “ulong” somewhere.

That’s just lovely. I hope they address this in their future releases.

Getting custom HTTP variables out of PHP

PHP 5.0 stores HTTP headers in the $_SERVER variable as key-value pairs. It mangles their field names, however, by:

  • prepending “HTTP_” to the key
  • replacing “-” with “_” in the key
  • uppercasing all letters

Say that your custom HTTP client sends X-Hello: World as a header. To retrieve the value (e.g. “world”) from PHP, the correct key to use is $_SERVER["HTTP_X_HELLO"].

This does fit with the existing access pattern (User-Agent: is retrieved by $_SERVER['HTTP_USER_AGENT']). But it was not well documented in corresponding page for reserved variables (as of today, October 7, 2007). Took a bit of trial and error for me to figure this out.

I’m sure that amongst the insanely numerous and ill-organized set of functions that PHP provides, there is one to do this exact task without reverse-engineering its key-mangling algorithm. But this way works too.

Exchanging objects between PHP and Python

So over the course of my various projects, personal or otherwise, I’ve collected an assortment of information that may or may not be of interest to others or to myself in the future. What does end up happening is that I would make notes about it in a file or (dear god) on a random piece of paper, post a message to some forum or mailing list, or just plain put it in that lossy storage medium of my own mind…and then promptly forget all about it. For a would-be information specialist, this ironic lack of information organization has caused many problems and continues to do so, especially at retrieval time.

Now in 2007, as New Year’s Day draws to a close, I am putting my laziness to the test again by resolving to begin this project to document Random Things That I Somehow Know About. Some of this is trivial, some of this is not. But for one reason or another, I intend to keep track of it.

Starting with something recent. To workaround a problem when deploying a Python-based service on a server that disallowed Python CGI execution, the Python driver program had to be wrapped around a PHP frontend (which the server did allow). However, the driver needed to accept a number of parameters, and the PHP wrapping must conveniently pass these parameters via a call to system() and print the resulting output from the Python driver to stdout. In a previous, similar project, I engineered a set of subroutines in the Python driver to parse options on the command line, and had the PHP script put those options on the commandline at invocation time. It was tedious, error-prone, and remarkably insecure, even with Python variants on getopt() to help

Since this was a proof-of-concept project in any case, there had to be a faster, friendlier hack. It would be great if the wrapper and the driver could exchange data objects directly – in this case, PHP associative arrays and Python dicts. Then the Python driver can simply ask for the necessary values bound to fixed options/keys.

Enter Armin Ronacher’s phpserialize.py, conveniently under BSD license. It exposes two functions, serialize()and unserialize(), which encodes and decodes data created by PHP’s own serialize() function.

The solution comes together as follows. On the PHP side:

//something to ensure that we've got a correct parameter object
$params = array('phpArgs' => 'yes'); 
$params['foo'] = 'bar'; //check and populate $params
$s = serialize($params);
//so that if we're passing on cmdline
//things don't blow up if weird bytes encountered
$s = base64_encode($s);
//...and so on

Once the $s in base64 text is passed to the driver script, the Python side will simply call this:

def getParameters(php_base64_str):
    php_parameters = base64.decodestring(php_base64_str)
    parameters = unserialize(php_parameters)
    return parameters

 

and parameters will be a Python dictionary with all the values in place, mapped to their original keys.

>>> params = getParameters(s) # retrieve s from argv first
>>> params['foo']
'bar'

This makes exchanging data between frontend and backend a lot less headache inducing.