libpng12.*.dylib related compile failures on OS X 10.5

broken libpng screenshot

If you’ve been having problems compiling various Unix packages on OS X 10.5.4, and your compile run fails mysteriously with something like:

i686-apple-darwin9-gcc: /usr/X11/lib/libpng12.0.26.0.dylib: No such file or directory

One strange yet very likely explanation: your libtool archive file /usr/X11/lib/libpng12.la is lying about the location of the shared library for libpng12 – namely, it claims that a file called /usr/X11/lib/libpng12.0.26.0.dylib exists and should be used for linking against libpng12. However, if you actually look in /usr/X11/lib, no such file exists – you might have libpng12.0.24.0.dylib, but not .26. Packages that rely on this incorrect libtool archive metadata are therefore suitably confused, and the compiler bails out when trying to link against the non-existent file.

Since libtool archive (.la) files are text-based, you can open the file in emacs. The quick and dirty fix is to simply change the offending library_names, current, and age properties to the correct numbers. In my case, the libpng sitting in /usr/X11/lib was .24, so I string-replaced the values in those three properties from 26 to 24. The compilation then proceeded normally.
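
If you’d rather not hand-edit, here is the same fix as a scripted sketch (the path and the 26 -> 24 version numbers are the ones from my machine; check what your .la file actually says first):

# back up, then rewrite 26 -> 24 on just the three offending properties
sudo cp /usr/X11/lib/libpng12.la /usr/X11/lib/libpng12.la.orig
sudo perl -pi -e 's/\b26\b/24/g if /^(library_names|current|age)=/' /usr/X11/lib/libpng12.la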

The long term solution, of course, is to track down what put a wrong .la file there in the first place. I suspect Xcode 3.1 and the Mac OS X 10.5 SDK, which shipped with the latest iPhone SDK.

UPDATED 9/8/2008:
In the comments section, I’ve been informed that the X11SDK package in Xcode 3.1 is the culprit. Thanks, Anonymous!

Process monitoring and Settlers of Catan save failure

process monitor finding Catan issues
Some versions of Big Fish Games’ Settlers of Catan (a faithful reproduction of the board game) have a strange issue, in which under certain operating contexts it will not save a game. The error message reported is a generic and not-at-all-useful “an error has occurred while saving”. I suspected this was due to a failure to create a savegame directory, and indeed, a bit of sleuthing indicates that on Windows XP, the directory at C:\Documents and Settings\All Users\Application Data\Microsoft\MSN Games\Catan is missing (obviously on Vista, this would be somewhere else – probably C:\Users\…). Instead of creating this directory, Catan simply fails to save the game. The program runs fine otherwise.

Of course, it was not obvious where Catan was trying to save its games – finding out that the missing directory was the culprit took a bit of investigation. I took a wild stab at the start by creating a “save” directory under its own program files directory. No such luck. Time to bring out the big guns.

A number of approaches could have worked, but one is the awesome Sysinternals tool Process Monitor, or Procmon.exe. It tracks events and calls from a process, such as filesystem accesses, and has advanced filtering capabilities to organize and show only the events of interest to a debugging human.

With ProcMon, I simply filtered on the Catan process and tried to save a game as foo. Then, viewing the event log (screenshot 1), it was obvious that the CreateFile calls to create foo.sav failed, with the exact target path specified. A quick Windows Explorer excursion confirmed that the path did not exist. Creating that directory, of course, solved the savegame problem.
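
Once ProcMon reveals the path, the fix itself is a one-liner at a Windows command prompt (this is the XP path from above; adjust for your own system, and note that cmd’s mkdir creates intermediate directories as long as command extensions are on, which is the default):

mkdir "C:\Documents and Settings\All Users\Application Data\Microsoft\MSN Games\Catan"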

The moral of the story is that ProcMon is a fine tool for tracking mysterious interactions between an application and the system. For something like a failure to make a saved game (in this narrow gaming context), or various system-related errors in general (especially when you lack the source code to debug in depth), it sometimes pays to examine the exact sequence of calls and events that led up to the failure. The solution could be very trivial, if you only knew what failed and where.

Changing keyboard shortcut for "Find Next" in Word 2008

changing Find Next... shortcut in Microsoft Word 2008
Pierre Igot over at Betalogue writes of a way to reassign the keyboard shortcut for the “Find Next” command in Word 2008.

This has bothered me for a very long time. In essence, Office 2008 maps Command-G, the customary Mac OS keyboard shortcut for “Find Next” (that is, find the next instance of a string match; part of the set of standard Find/Replace [All] functions in any good text editor or word processor), to “Go To”.

To its credit, Word allows customization of its keyboard shortcuts. Unfortunately, I just could not locate a “FindNext” keyboard shortcut in Tools -> Customize Keyboard to apply customizations to. It turns out the command is actually named “RepeatFind”, as opposed to “EditFindNext”, which is what you’d expect given that the “Find…” command is simply “EditFind”.

Further, it turns out that wouldn’t have mattered anyway. Microsoft overrides the keyboard shortcut selection behavior in View -> Customize Toolbars and Menus. It does so in a ridiculously arcane way, by which the menu name “&Go To” fixes the shortcut key to Command-G. Therefore, no matter what shortcut is set in Customize Keyboard, Command-G will always map to Go To and not Find Next. ARGH.

Even knowing this, it was not obvious how to edit the name either (screenshot). Apparently the solution is to use the fake menu bars that pop up after selecting Customize Toolbars and Menus, and then right-click or Control-click the Go To menu item. That will pop up an edit box with the name of the menu item, at which point you can remove the gratuitous &, such that Command-G will map correctly based on Customize Keyboard.

Baldur’s Gate 1 graphics glitch and disabling NVidia hardware acceleration

If you have a series 8 NVidia graphics card (say, an 8600M GT) with current drivers (as of the time of this post, of course), you’re likely to see graphics glitches (screenshot 1) in Baldur’s Gate 1. One workaround is to use 16-bit color and software transparent BLT. Another strategy, if your CPU is powerful enough to shoulder some 2D graphics work, is to temporarily turn off hardware acceleration for DirectDraw and avoid the bug entirely.
black boxes in Baldur's Gate under Nvidia 2D acceleration

However, disabling hardware acceleration under Vista is apparently easier said than done. Instead of using the Personalization -> Display Settings control panel (as one might expect from Windows XP experience), the correct solution is to use the DirectX SDK and dxcpl.exe, the DirectX Control Panel (located within the SDK distribution under Utilities\bin\[cpu_arch]). From within this control panel, pick DirectDraw on the upper tab bar. Amongst the various configuration options on that tab, the only one you care about is the box that turns hardware acceleration on or off. Turn that off (temporarily, of course) and you’re good to go.

The Context

Baldur’s Gate 1 performs surprisingly well under Windows Vista, despite being a venerable (some might say ‘ancient’) 10-year-old RPG. Unfortunately, it’s plagued by a number of graphics glitches when running on NVidia cards. In essence, a number of items and sprites (items on the belt, the timepiece in the lower left corner, birds flying overhead, for example) will be surrounded by black outlines. Further, on your character paper doll in the Inventory screen, giant black boxes obscure much of the figure. In some cases, the ‘fog of war’ over unexplored regions of a map will be rectangular black boxes, rather than the ‘foggy’ darkness you’re used to. These glitches are widely experienced.

For this very annoying problem, two workarounds are available.

1. Trade color depth for correctness

The prevalent strategy, as noted in a forum post at Spellhold Studios, is to switch on Software Transparent BLT and use 16-bit color depth. This apparently routes around whatever strange bug NVidia managed to introduce in their graphics acceleration layer.

This method works just fine, but it was not ideal for me. The game runs at 640 x 480 and is already quite pixelated when scaled up to full screen. The difference between 16-bit and 32-bit color is somewhat noticeable at that scale.

2. Trade performance for correctness

Here’s another classic trade-off. Since the problem obviously arises from the DirectX layer and its interaction with NVidia graphics hardware (boot into safe mode and run BG1 to verify this), another solution is to just kick NVidia out of the loop by disabling hardware acceleration for DirectDraw. This is feasible if you’re doing this on a fast machine (and if you’re running a series-8 card in that box, I’d assume it’s pretty fast anyway) – after all, BG1 is a 10-year-old game. Your Core 2 Duo or quad-core Xeon can use some exercise anyway.

Of Hardware Acceleration Controls

In the glory days of XP, this simply meant right-click on Desktop -> Properties -> Settings -> Advanced -> Troubleshoot -> Hardware Acceleration slider (dear god that’s convoluted). In Vista, the analogous experience is right-click on Desktop -> Personalize -> Display Settings -> Advanced Settings -> Troubleshoot -> Change Settings (I see you still haven’t hired a good user experience designer, Microsoft).

The problem now is that if you try this with your NVidia card and Vista, you’ll just be staring at a disabled ‘Change Settings’ button and a terse message: 'Your current display driver does not allow changes to be made to hardware acceleration settings.' If you also try the old standby dxdiag, you might be surprised to know that the Disable buttons have been removed from the DirectX Features box on the Display tab. Thanks, NVidia and Microsoft. Apparently they really don’t want us changing these settings.

But we don’t really want to change much – just the DirectX technologies, and in fact, just the 2D-based DirectDraw (since 3D is more or less irrelevant for BG1), and only for a short while. Enter the DirectX SDK.

The SDK is meant for developing and debugging DirectX-based programs, but it comes with a fair suite of nifty utilities, one of which is the DirectX Control Panel: dxcpl.exe. Download the 400+ MB SDK (this is the 2008 version of the SDK; if you’re on Win 7, look for a newer release, as I believe the control panel still ships with the latest versions) and grab the control panel in Utilities\bin (relative to wherever you installed the SDK). If you are on x64 (x86_64), make sure you use the x86 architecture control panel for BG1; it seems to be required to affect 32-bit mode apps (see the comments for this post for more details).

In the control panel, use the DirectDraw tab – in the set of checkboxes, uncheck the “Use Hardware Acceleration” box. Fire up BG1 and see the non-black-boxed goodness of 1998 graphics (screenshot 2).
Baldur's Gate playing normally after disabling Nvidia-accelerated DirectDraw
If you were disabling acceleration for another reason, this control panel should work for you too – pick the appropriate tab and have at it. Do not forget to turn acceleration back on after the session. You probably do not want unaccelerated graphics performance in your normal, non-glitchy apps.

Turning off hardware accelerated DirectDraw avoids the BG1 black-box bug. You’ll have to assess for yourself which is more expendable: color depth (trivial to change directly from BG1’s Options configuration) or graphics performance (more difficult to tweak, but perhaps compensated by CPU performance).

DirectX SDK vs Settings slider

In any case, the DirectX control panel is a somewhat useful trick to know in general, especially when faced with Vista’s obstinate insistence on not letting you change graphics acceleration settings. The control panel provides all the functionality that the old XP Settings slider would have given you – except in a much more technical interface. In fact, the old slider more or less tweaked settings in the Direct3D and DirectDraw tabs, except in a coarse-grained, all-or-nothing kind of way. Here at least you have fine-grained control over most of the detailed options in each panel.

Still, such a pain.

Know that tweaking these settings is done at your own risk (NVidia and MS obviously are against it), and it may or may not work at all depending on your setup, your driver version, and pure luck. On the plus side, if you ever need to write some DirectX apps, the SDK is now just a few clicks and SDK path text fields away from Visual Studio 2005/2008, so the 400 MB of bandwidth and disk space isn’t completely wasted. Hopefully.

UPDATE 9/8/2008
It’s come to my attention that some people still have problems after applying this fix – namely, there are cursor trails on menu screens. I cannot reproduce this issue locally on my 8600M GT, but it’s possible that there are new problems introduced with newer series-8 cards and drivers. If this is the case, I’d recommend using the first workaround — that is, using Software BLT and 16-bit color, rather than the DirectDraw workaround.

UPDATE 7/15/2009
There is now an Nvidia driver .dll patcher available for BG1-era Infinity Engine games at Spellhold Studios. They also explain what the underlying issue is and what the DLL wrapper does to work around the bug. I have not personally tested this fix. It does install a new graphics DLL that overrides existing calls, so it is theoretically possible that it may introduce other issues, but I am told by others that it works quite well. Check it out – it saves you a lot of the trouble of ticking DirectDraw boxes on and off, if you don’t mind unofficial DLL patching.

UPDATE 2015
There are still people arriving at this page, 7 years after the initial post. At this point, you should probably just buy the remastered Baldur’s Gate Enhanced Edition on Steam instead. Much, much less hassle.

SSH and SOCKS proxy – almost as good as a VPN

OpenSSH has a port forwarding feature, which can be used as a SOCKS proxy server. This is useful if you are trying to reach a firewalled server that only accepts connections from within its own local network (but doesn’t offer a VPN service to let you onto that network).

If you have SSH access to any other machine on that local network, you can use the forwarding feature and the SOCKS 4 or 5 protocol to get to the server from your home box. The connection is mediated and forwarded by the machine on the network that you can reach, and to the firewalled server, you appear to be this internal machine.

The appropriate incantation is simply:
ssh -D port_num ssh_hostname

where port_num is a local port number (I like 50000, but any non-privileged port would do).

Then, simply point your system or browser (in Firefox, for example, this would be in Preferences/Options -> Advanced -> Network -> Settings) to use a SOCKS proxy at localhost, port port_num. Accesses from that browser will now be proxied through the ssh_hostname machine to the actual destination host.

The context is that there was an application server that I had to reach from my home machine. The application server sits on machine R, which is restricted to an organization internal network I. There is no VPN service for I. SSH to machine H was available, which is also in I and is reachable from the public Internet. For small things, I could run commands from H, but it would have been really helpful to reach R directly from my home development box. I could use X11 forwarding to get an xterm for various tools there, but the overhead is huge. The administrator of network I has yet to grant me external access.

With this trick, I just SSH’ed into machine H, set up the proxy port via -D, pointed my browser at the local port, and easily accessed R from home. Nifty.
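
For the record, here’s the whole dance as commands (the hostnames are placeholders for H and R; curl is just one convenient way to poke at the proxy, and its --socks5-hostname flag needs curl 7.18 or newer):

# open the tunnel; -N skips the remote shell, -f backgrounds ssh after auth
ssh -D 50000 -N -f user@h.example.org
# fetch a page on the internal server through the SOCKS proxy
# (--socks5-hostname resolves DNS names on the far side as well)
curl --socks5-hostname localhost:50000 http://r.internal/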

If you happen to have SSH access to a number of servers (as I seem to have for some reason…), this same trick can be used as a way to rotate through them fairly quickly. Just log out of your existing connection and ssh into a new host with the -D switch. This allows you to test various network apps from a number of different machines.

Cannot populate a select element under IE 6 using the innerHTML property

On the other hand, here is an obvious bug in IE 6 that I managed to run into. In IE 6, if you attempt to dynamically change the contents of a <select> element in Javascript, by assigning a new string containing <option> elements to its innerHTML property, the select element will not be populated. In fact, if you output the string after the assignment, it will be truncated and malformed.

This is documented in Microsoft KB article 276228. Microsoft’s proposed fix, in abstract: “don’t do that”.

So I was trying my hand at some fancy AJAX to dynamically populate a select dropdown menu based on a previous select menu. An XMLHttpRequest is triggered onChange of the first menu. We hit an API, parse and transform the response into an HTML fragment consisting of <option> tags and values, and assign it to the innerHTML property. Simple and quick. Works on Safari, Firefox… and, of course, not on IE 6.

Some wasted time later, an alert() on the innerHTML property showed that the string there was indeed malformed: the first <option> start tag had been truncated. No wonder it didn’t work. And the string was fine when my transformation finished and delivered it.

The apparent solution is to follow Microsoft’s advice on that page and do something else:
– assign to the options collection
– work around using outerHTML

I’m sure there’s a way to jury-rig it so that IE is fooled into concatenating an “<option>” back onto the innerHTML, with the workaround disabled for other user agents. But that’s rather inelegant, no? I opted to use the options collection, which seems a reasonable (if slightly more complicated) alternative to a simple assignment to innerHTML.

The KB article lists this as a problem with IE 5, but it recurs in IE 6. Does it persist in IE 7?

IE 6 renders a blank page on XHTML-style script end tag

10-01-2011: And the world slides backwards. I believe all major browsers, including the latest Firefox and Safari, now have this behavior. If you see a blank XHTML-served-as-HTML page in Safari or Firefox, check the script tags and make sure they are not self-closing: always use <script> ... </script>

On IE 6, a well-formed and validated web page may be rendered as a blank page if you close <script> tags in XHTML style. As in, <script type="text/javascript" ... src="foo.js" />, rather than the HTML style <script type="text/javascript" ... src="foo.js"></script>

So one of my web pages renders great in Safari and Firefox, but in IE 6, it is a completely blank page, devoid of content. Puzzled, I ran it through the W3C validator – no problem at all. A View Source in IE showed that the entire HTML output looked OK.

Eventually I narrowed down the problem to a <script> tag in the markup. Namely, a <script type="text/javascript" ... src="foo.js" /> kind of tag. IE rendered the page when I removed the tag, and went blank when I put it back. Curiously enough, I hadn’t actually invoked any functions from that .js file, so it was definitely not any code I was executing. Replacing the .js file with a dummy .js file also triggered the blank page. Changing or omitting the other attributes did not help.

The problem is fairly obvious now. When I close the tag in HTML style, with an actual </script> tag, IE proceeds to render just fine.

The obvious conclusion is that IE is buggy, but that may not necessarily be true (well, in this one instance anyway). Despite most pages’ “compliance” with XHTML, DOCTYPE’ed and all, most web servers still serve these “XHTML” files as mimetype text/html instead of the recommended application/xhtml+xml. This is pragmatic, since IE 6 doesn’t even bother to render application/xhtml+xml, and user agents are required to stop rendering upon encountering non-valid markup (imagine the chaos that would cause).

However, this mismatch seems to introduce the gotcha. Interpreted in actual text/html mode, <script .... /> doesn’t really close the <script> tag at all to an HTML parser – it merely looks like a rather malformed start tag with no end tag. If I were a dumbly compliant parser and renderer, I might just start walking down the response string looking for that mythical end tag, and end up rendering nothing. Of course, if I were a slightly smarter parser, I would look for a DOCTYPE, but then I’d contradict the server’s mimetype, and down that road lies even more madness.

Nevertheless, when you’re staring at a blank page in IE and the markup seems fine, the solution is to check your script tags, if any.

I’m no expert at this soup of SGML/HTML/XHTML/XML standards, so the above is just my random opinion plus some observations. Still, it seems that MS should patch this particular problem, since it’s fairly non-obvious (many people, I’d surmise, would use this kind of shorthand close tag in an XHTML file, especially since it validates fine) and upsets the status quo compromise of incremental Web standards compliance through browser compliance modes, content negotiation, and occasionally bad mimetype service. But of course, that’s never going to happen.

Update: I’ve been made aware in the comments section that the same issue occurs in IE 7. Just great.

testing HTTPS with openssl

It’s often possible to emulate a web client by talking to a web server by hand, via telnet.

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 04 Feb 2008 09:18:05 GMT
Server: Apache/2.2.7 (Unix) mod_ssl/2.2.7 OpenSSL/0.9.7l DAV/2 mod_python/3.3.1 Python/2.5.1
Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
Accept-Ranges: bytes
Content-Length: 44
Connection: close 
Content-Type: text/html

<html><body><h1>It works!</h1></body></html>
Connection closed by foreign host.
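
A note on the request itself: with HTTP/1.1 instead of 1.0, a Host header is mandatory, and sending Connection: close asks the server to hang up after the response. A minimal 1.1 request would look like:

GET / HTTP/1.1
Host: localhost
Connection: close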

Talking to the server by hand gives you the full output of the web server, headers and all. This is sometimes useful for debugging web apps without having to turn on a packet sniffer. As long as you know how to talk HTTP (and there are differences between 1.0 and 1.1, as sketched above), you can observe some of these underlying outputs directly. Trouble comes if you want to do the same with an SSL-enabled host. If you have such a server and try to telnet to a secured port (say, 443), you should get an error message along the lines of:

Bad Request
You're speaking plain HTTP to an SSL-enabled server port.

The solution is to use openssl instead. Among the wonderful grab-bag of functionality implemented in the openssl command-line tool is a secure client for testing SSL connections.

$ openssl s_client -connect localhost:443
CONNECTED(00000003)
...lots of certificate-related stuff here...
---
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 04 Feb 2008 09:19:01 GMT
Server: Apache/2.2.7 (Unix) mod_ssl/2.2.7 OpenSSL/0.9.7l DAV/2 mod_python/3.3.1 Python/2.5.1
Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
Accept-Ranges: bytes
Content-Length: 44
Connection: close
Content-Type: text/html

<html><body><h1>It works!</h1></body></html>
Connection closed by foreign host.

Note that this is a general SSL client. I used HTTPS as a concrete example, but the same can be applied to other SSL-secured ports. If you’re designing a server application or protocol that works through a TLS or SSL layer, chances are this client can be a good debugging tool.
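
For instance (the hostnames here are placeholders), the same client handles other SSL-wrapped services, and reasonably recent openssl builds can even upgrade a plaintext session with -starttls:

$ openssl s_client -connect imap.example.com:993
$ openssl s_client -starttls smtp -connect mail.example.com:25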

UPDATE: I’m really going to miss being able to do this when HTTP/2 becomes fully deployed. HTTP/2 is a binary protocol, which means that you can no longer type text at the server and expect a response. Binary protocols are friendly for performance, not developer sanity. – yliu, May 2015

php 5.2.5 compile error – macro issue

Like the previously mentioned compile problem with transcode, PHP’s tidy extension appears to have a macro-induced collision problem on OS X Tiger. In particular, the compile run blows up with:

In file included from /usr/include/tidy/tidy.h:70,
from /tmp/php-5.2.5/ext/tidy/tidy.c:34:
/usr/include/tidy/platform.h:515: error: duplicate 'unsigned'
/usr/include/tidy/platform.h:515: warning: useless type name in empty declaration

This exhibits a similar red herring, in that the compiler says a system include file (tidy/platform.h) is causing the issue. In actuality, this is most likely due to a preprocessor macro issue in PHP. In platform.h:515, we see:
typedef unsigned long ulong;

Now, in main/php_config.h:121:
#define ulong unsigned long

Since php_config.h is set to be included before platform.h, on this particular build configuration, platform.h:515 now becomes:
typedef unsigned long unsigned long;

Hence the compiler error message and the red herring about your system include file (platform.h).
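
You can reproduce the mangled expansion in isolation by piping the two lines through the preprocessor (a quick sanity check; any gcc will do):

# feed the #define and the typedef to the C preprocessor via stdin
printf '#define ulong unsigned long\ntypedef unsigned long ulong;\n' | gcc -E -

The last line of the output is precisely the doubled-up typedef the compiler chokes on.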

Since php_config.h is included by just about every main PHP source file, the easier solution is to switch the include order of tidy.h and the php default includes. As such, in tidy.c, we go from this:
#include "php.h"
#include "php_tidy.h"

#if HAVE_TIDY
...
#include "tidy.h"
#include "buffio.h"

to this:
#include "tidy.h"

#include "php.h"
#include "php_tidy.h"

#if HAVE_TIDY
...
#include "buffio.h"

You may also wish to wrap it with #if HAVE_TIDY and #endif, to preserve the original logic (note that tidy.h was originally within the #if), but in my case it seemed to have gone okay without it.

With this change, everything compiled just fine with no further complaints. I don’t like doing this – perhaps the PHP tidy devs had reasons for putting the includes in this order. But from a pragmatic point of view… hey, it compiles.

Again, the moral of the story: when screwing around with macros, try to avoid names that will collide with system libraries.

Updated Feb 6, 2008
As jhardi notes in the comments section, the folks behind tidy have patched their latest version to work around this issue. Kudos to the tidy devs, and to the others who found this bug way before I even had to care about it.

Minor rant: so…this workaround requires that we upgrade tidy. Since Mac OS X doesn’t regularly update its Unix-y interiors, we’re left with the choice of overwriting an Apple system library, or shadowing it and remembering that we shadowed it. I picked option #1, since it’s unlikely that anything will blow up due to a new tidy library, but some people are understandably wary of overwriting system libraries (this is, in fact, one of the reasons why package managers like MacPorts stick copies in alternate directories instead).

This still leaves the current stable PHP not compiling with older tidy versions, on the Mac and any other platform using that typedef. And, as a bonus, there is a possibility that we might be doing this again if PHP ever decides to add another library that uses the word “ulong” somewhere.

That’s just lovely. I hope they address this in their future releases.

building a dynamic library on OS X

[UPDATE: 02/26/2010
As of Xcode 3.x, the -shared argument seems to work on OS X for creating shared objects. The -dynamiclib switch still works for creating dylibs. This post is left for historical curiosity.]


Some free software packages have an optional make target to build shared libraries out of their core application functions. Unfortunately, some of these packages are set up to compile for typical Linux shared objects. For example:

gcc -shared -Wl,-soname,libfoo.so.1 -o libfoo.so.1 $(OBJ)

where $(OBJ) is your set of object files.

On Apple’s GCC (for Tiger and Leopard, at least), there is no -shared switch, and no -soname switch either. Compilation will fail at this step. The incantation to build a shared object would translate to something like:

gcc -dynamiclib -Wl,-headerpad_max_install_names,-undefined,dynamic_lookup,-compatibility_version,1.0,-current_version,1.0,-install_name,/usr/local/lib/libfoo.1.dylib -o libfoo.1.dylib $(OBJ)

headerpad_max_install_names will allow you to change the install path with install_name_tool later, should the install_name change during the life of the compiled object.

You must also ensure that your object files are compiled correctly for shared library usage. Usually this means compiling with -fPIC and -fno-common, depending on your code.
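
Putting the whole thing together for a hypothetical one-file library (foo.c, libfoo, and the install paths are made-up names; substitute your own):

# compile the object for shared-library use
gcc -fPIC -fno-common -c foo.c -o foo.o

# link the dylib, as above
gcc -dynamiclib -Wl,-headerpad_max_install_names,-compatibility_version,1.0,-current_version,1.0,-install_name,/usr/local/lib/libfoo.1.dylib -o libfoo.1.dylib foo.o

# inspect the install name and linked libraries you ended up with
otool -L libfoo.1.dylib

# rewrite the install name later if the library moves; this is exactly
# what the headerpad flag above reserves room for
install_name_tool -id /opt/local/lib/libfoo.1.dylib libfoo.1.dylib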

I’ve had to do this infrequently, which means I forget the syntax the next time and have to google for it again. Too many “cache-misses”, so to speak.