macOS “Optimized Battery Charging” in Windows 10 / Boot Camp causes MacBook Pro battery to not charge

On a MacBook Pro 16-inch 2019 (Intel), under macOS Ventura 13.1, if

  • the “Optimized Battery Charging” option is turned on in System Settings -> Battery
  • the power adapter is plugged in
  • the battery has charged to 80% or more under macOS
  • the machine is then rebooted into Windows 10, using Boot Camp

then, in Windows, the battery will not charge at all. Windows reports the battery as “Plugged in”, but it does not charge, and it slowly drains to 0% as the machine is used.

Solution

Disable Optimized Battery Charging, temporarily or permanently, before rebooting into Windows 10 under Boot Camp. The setting is under System Settings -> Battery -> Battery Health -> Info disclosure.

Discussion

Optimized Battery Charging is a new feature introduced in recent macOS versions to preserve the health of Intel MacBook Pro batteries. It is known that if a laptop is not being used on battery, then it is best to charge to 80% instead of full. macOS purports to learn its user’s laptop battery usage patterns, and will charge the battery to 80% under normal adapter usage, and only begins charging to full when it expects the user to go to battery power soon.

However, it seems to do this by instructing some SMC/firmware level controller to stop charging the battery once current capacity hits 80% or more. When rebooted into Windows 10, which does not understand this optimized charging feature, there is no corresponding instruction to begin charging the battery again when the capacity falls below 80%. The consequence is that the laptop battery will steadily drain to 0%, and nothing will make it charge again until the machine is rebooted back into macOS.

This is an understandable edge case that Apple engineers didn’t test for. However, it seems to have started only recently (I don’t recall macOS Monterey having this behavior). Until it is fixed, if regular Windows 10 / Boot Camp usage is expected, it is best to leave the Optimized Battery Charging feature turned off on the macOS side — temporarily or permanently.

It is worth noting that there are several other possible reasons for battery drain under Windows 10 / Boot Camp on a MacBook Pro. For example, the 16-inch 2019 MBP’s white 96W USB-C charger looks identical to the 87W USB-C charger from the previous-generation 15-inch MacBook Pro. If the 87W adapter is used, mistakenly or deliberately, to power the 16-inch MBP, it cannot supply enough power to run the laptop under full load. In this scenario the machine draws on the battery to supplement the adapter, causing a steady drain. Windows also runs less efficiently than macOS on a MacBook Pro, so under full CPU/GPU usage it sometimes seems to take more power than even the 96W adapter can provide.

However, in either of those cases, the drain is very slow. In the case of the Optimized Battery Charging bug described above, under full CPU/GPU usage, the battery drains extremely quickly, with 30 minutes of usage causing 50% or more battery drain sometimes. This is because in this case, the laptop is not drawing power from the adapter at all.

Outlook 2011 for Mac still adding arbitrary line breaks into plaintext emails

Outlook 2011 on Mac OS X, v14.1.3, for whatever reason, still does not properly support the “format=flowed” Content-Type parameter or the “quoted-printable” transfer encoding for plaintext emails. This causes plaintext emails to be sent as mangled messes, full of arbitrarily inserted line breaks. As far as I recall, this is a regression from Entourage, which never handled plaintext quite this badly, and it comes despite Microsoft’s promises to have “implemented format=flowed”.

This is the last straw. I’ve been a loyal MS Entourage / MS Outlook user since the days of Outlook Express for Mac and Office 2001. But at this point, this software has actively impeded my communications with my friends and colleagues. We’re done.

The Problem

Here’s a really simple illustration of the problem, from the receiver’s end:

See how the URL, which was composed as one plaintext line, gets split up into two lines?

Here is another example, purely from the editor UI (and not even being sent yet). I start with a perfectly good reply saved as a draft:

I make a small wording change and resave:

See that third line? Thanks to the hard line breaks inserted by Outlook (even at composition stage), the line wrap has been mangled. This draft has to be re-wrapped manually, by the tedious process of deleting the newline-based hard line breaks from every line following in the paragraph. That was a short paragraph. Imagine doing that in a long paragraph, from the first line.

To add insult to injury, there is not even a “re-wrap” functionality in the editor, to at least solve this user-interface level problem (as opposed to the protocol level problem). Obviously no one at Microsoft sends plaintext emails anymore.

The Issue

Back when email was first devised, servers didn’t have a lot of memory, and people had pretty tiny terminals with fixed line widths and not a whole lot of processing power to deal with it. The Internet standard for email messages, RFC 2822 (http://www.ietf.org/rfc/rfc2822.txt), Section 2.1.1, defines recommendations for email body text transferred over SMTP:

There are two limits that this standard places on the number of
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.

The 998 character limit is due to limitations in many implementations
which send, receive, or store Internet Message Format messages that
simply cannot handle more than 998 characters on a line. Receiving
implementations would do well to handle an arbitrarily large number
of characters in a line for robustness sake…

The more conservative 78 character recommendation is to accommodate
the many implementations of user interfaces that display these
messages which may truncate, or disastrously wrap, the display of
more than 78 characters per line…

…it is incumbent upon implementations which display messages
to handle an arbitrarily large number of characters in a line
(certainly at least up to the 998 character limit) for the sake of
robustness.

Basically, the SMTP server can count on messages that come in 80 characters per line (and always less than 1000 characters per line), and email clients can trust that they only have to render up to the 78th column of text. This limitation is hardly useful in the modern age, but persists since it’s part of the standard. And it’s a fine, conservative design model. But now we write some pretty long lines without linebreaking ourselves, so something magical has to happen in the email client itself, like Outlook 2011.
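To make the two limits concrete, here is a minimal sketch that checks a message body against them. The function name `classify_lines` and the sample text are made up for illustration, and the sketch counts characters rather than octets, which is only the same thing for plain US-ASCII bodies.

```python
# Limits quoted from RFC 2822 section 2.1.1; both exclude the trailing CRLF.
MUST_LIMIT = 998
SHOULD_LIMIT = 78


def classify_lines(body):
    """Return (line_number, length, verdict) for every line over 78 characters.

    `body` is assumed to use CRLF line endings, as it would on the wire.
    """
    findings = []
    for number, line in enumerate(body.split("\r\n"), start=1):
        if len(line) > MUST_LIMIT:
            findings.append((number, len(line), "violates MUST"))
        elif len(line) > SHOULD_LIMIT:
            findings.append((number, len(line), "exceeds SHOULD"))
    return findings


if __name__ == "__main__":
    # Hypothetical body: one short line, one long line, one over-long line.
    sample = "short line\r\n" + "x" * 100 + "\r\n" + "y" * 1200
    for finding in classify_lines(sample):
        print(finding)
```

A line that only exceeds the SHOULD limit is still a legal message; it is the sending client's choice whether to wrap it, and how.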

The naive solution, of course, is to slap arbitrary line breaks into the user’s email message at every 78 characters, which is what ye olde email clients of yesteryear did (looking at you, pine; how did I ever put up with you…), and what Outlook 2011 still does. It’s a matter of personal preference whether this is a reasonable solution. Proponents argue that the email will “always look the same” on all devices, including those limited to 78 chars per line.

I (and many others), on the other hand, think the spirit of the RFC is to allow the client that actually handles the message to decide where to break lines. With the exception of source code, it is almost always better for the email client to use the full width of its display, however many characters that might be. Source code, too, should not be mangled by arbitrary inserted line breaks: what if newlines are meaningful in the language, and the author used more than 78 characters per line? The example with the URI illustrates the problem: the URI got an arbitrary newline in the middle, destroying its meaning. Users who copy-paste the two lines will end up getting a 404, due to that stupid inserted newline in the middle of it. This should not be allowed to happen.

Because this naive solution was not perfect, an extension was proposed as RFC 2646. This format of email is characterized by the content-type:

Content-type: text/plain; charset=US-ASCII; format=flowed

In format=flowed emails, the sending and receiving email clients are allowed to reflow the text based on user linebreaks. It follows some simple reflowing rules, but in short it will preserve user-inserted hard line breaks while adjusting the rest of the message for the proper line length while the message is “on the wire”, and recombining the lines on receipt and display. Modern email clients like Thunderbird, designed for user comfort and the generous system limitations of the year 2011, implement this standard.
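The receiving side of that reflow rule can be sketched in a few lines. This is a deliberately simplified reading of RFC 2646, not a complete implementation: a line ending in a space is treated as a soft break and joined to the next line, and a leading space is treated as space-stuffing and removed. Quoted (“>”) lines and the “-- ” signature separator are ignored here, and the function name `unflow` is my own.

```python
def unflow(lines):
    """Rejoin format=flowed lines into paragraphs (simplified sketch).

    `lines` is a list of body lines with the CRLF already stripped.
    Quote depth and the signature separator are NOT handled.
    """
    paragraphs = []
    current = ""
    for line in lines:
        if line.startswith(" "):
            line = line[1:]  # undo space-stuffing
        current += line
        if not line.endswith(" "):
            # No trailing space: this is a hard break written by the user.
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    wire = [
        "This is a long paragraph that the sender ",
        "wrapped for the wire.",
        "Hard break here.",
    ]
    for paragraph in unflow(wire):
        print(paragraph)
```

The point is that the trailing space carries the information a naive hard-wrapper throws away: the receiver can tell which breaks the user typed and which ones were only added for transport.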

Guess what format Outlook 2011 sends?

Content-type: text/plain; charset="US-ASCII"

Not even an option to change that behavior. It does not appear that Outlook 2011 deals with any of this. It just inserts some line breaks and calls it a day.

An alternative, implemented by Apple’s Mail.app, is to send messages with the Content-Transfer-Encoding header set to “quoted-printable”, as per RFC 2045. In this model, soft line breaks are marked explicitly with a trailing “=” character, inserted so that no encoded line exceeds 76 characters. On the receiving end, the client drops each “=” together with the line break that follows it, and joins the pieces back into a single line for display.
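Python’s standard `quopri` module happens to implement this encoding, so the round trip is easy to try. The URL below is a made-up placeholder; the sketch only shows that a long line survives the trip intact, not what Mail.app itself does internally.

```python
import quopri

# A single long line with no spaces, like the URL that Outlook split in two.
long_line = b"http://example.com/" + b"a" * 100 + b"\n"

# Encoding inserts "=" soft line breaks so each encoded line stays short.
encoded = quopri.encodestring(long_line)

# Decoding removes the soft breaks and restores the original line.
decoded = quopri.decodestring(encoded)

if __name__ == "__main__":
    print(encoded.decode("ascii"))
    print(decoded == long_line)
```

Unlike a hard wrap, nothing is lost here: a client that understands quoted-printable shows the URL as one line again.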

Outlook doesn’t do that either. It just wants to mangle your emails.

Conclusion

The world moved on and adopted HTML emails, which don’t have this newline problem. For those of us who do think HTML emails are an atrocity to be used sparingly, if at all, the idiosyncrasies of plaintext email have to be addressed. Outlook 2011 appears to do even worse than Entourage 2008 at this problem, by not dealing with it at all, and apparently by getting a bunch of Microsoft “MVPs” on their forums to cloud the issue with promises of support and unrelated commentary.

Given the sad state of email clients on the Mac, I believe Thunderbird is now my only option for sane plaintext messaging.

Fixing a scrambled IPython command history on stock OS X 10.6

So I started over with a fresh install of OS X 10.6 recently, and wanted to restore my Python development environment. In doing so, IPython is absolutely essential if you want a sane interpreter environment to test out code. I had a bit of trouble with it though.

The Problem

The stock Python 2.6 shipped with OS X 10.6 Snow Leopard has a readline module linked to libedit, the BSD alternative to the GPL’ed readline. The readline module, if you are not aware, is (among other things) responsible for keeping command history in the IPython interpreter. This causes command history in the IPython 0.10 interpreter to behave in very odd ways. When backtracking through the command history buffer using the up-arrow key, for example, the previous command is only partially recalled, and appears completely scrambled. Indents, too, seem off — in a whitespace-sensitive language like Python, this is annoying. (See first figure)

IPython command interpreter is broken when using libedit with command history

Fixing IPython’s bugs is beyond my ability. While I certainly don’t want to delve into the quagmire that is GPL vs BSD licensing, I do understand why Apple would want to avoid the viral nature of the GPL and ship libedit instead. However, using a genuine Readline library is going to be the best recourse for this problem. I already have a copy of readline compiled and ready to go, and just need a new version of readline.so, the library that links Python to readline.

The easy solution

Sifting through my records, I came across a SelfSolved problem record from my good friend Hannes who had issues with his IPython command history.

The solution: sudo easy_install readline, which uses setuptools to install a precompiled package of readline.so statically linked to genuine GNU readline. Restart your IPython console and everything should work. (See second figure)

IPython with readline

The hard solution

Being the inquisitive sort, I also wondered how I would be able to reproduce this work from scratch. readline.so ships with the Python source package, but surely I would not be required to compile a whole new copy of Python for one measly module library?

I documented this process in SelfSolved again: building readline.so for Python. At some point I should write an interface between SelfSolved and WordPress so that I don’t have to reproduce a lot of my work here manually.

Compiling readline.so

This is actually fairly easy.

  1. Get a copy of the Python source code. In OS X 10.6, it ships with Python 2.6.1.
  2. Unpack it and go into its directory. You should find a Modules subdirectory. In it is readline.c, the source file for readline.so.
  3. Compile the source file. The appropriate incantation is:
    gcc -O2 -arch x86_64 -arch i386 -shared -o readline.so readline.c -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -L/usr/local/lib -lreadline -ltermcap -framework Python

    where the -arch flags should be whatever processors you wish to support, the -I arguments should point to the directories that contain header files for the readline library and the Python framework, and the -L argument should point to the path for the readline library. Use whatever optimization flags you feel comfortable with, instead of -O2, if you wish.

Replacing readline.so

So now we have a readline.so that’s properly linked to readline.dylib. The thornier question is how to override the system-provided readline.so. The system version is located at /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/readline.so, and the naive would simply overwrite it with their new readline.so. This is a bad idea.

As I have mentioned in the past, overwriting system libraries in OS X is an unhealthy thing to do. The problem is that Apple furnishes no official package management system — anything you personally change is considered fair game for the next official system update. On the next system update, if the Python component is affected by the update, the Apple updater will happily clobber your compiled files with its own, leaving you suddenly back at square one. You don’t know how many times I’ve had to recompile emacs (for X11 support) on OS X 10.4 because of this little annoyance. Leave things in the /System/Library directory hierarchy alone, for your own sanity.

However, in this case /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload comes ahead of the user-modifiable /Library/Python/2.6/site-packages directory on Python’s sys.path. So if you just drop readline.so into site-packages, the system version still takes priority.

There are a few ways to work around this. For one, you can create a sitecustomize.py in /Library/Python/2.6/site-packages. In this file, arbitrary Python statements can be written, and the interpreter will automatically execute them at startup. So, you can add a sys.path = ["/dir/here"] + sys.path statement and point it to a directory containing your readline.so file. Alternatively, you can abuse the technique used in the easy_install.pth file. It turns out that if you have ever used easy_install, directories pointed to by the easy_install.pth file take priority over the system paths. They use an interesting way to accomplish this, which you can copy. Or, you can just insert your directory containing readline.so into easy_install.pth. In any case, this will force the readline-based readline.so to take precedence over the libedit-based readline.so, without overwriting anything.
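For the sitecustomize.py route, a minimal sketch might look like the following. The directory name is purely hypothetical (put your own readline.so wherever you like), and the helper `prepend_to_path` is my own naming, not something Python provides.

```python
# sitecustomize.py: a sketch. The site module imports a module of this name
# automatically at interpreter startup if it can find one on sys.path.
import sys

# Hypothetical directory holding the GNU-readline build of readline.so.
READLINE_DIR = "/usr/local/lib/python-readline"


def prepend_to_path(directory, path=None):
    """Move or add `directory` to the front of the module search path."""
    path = sys.path if path is None else path
    if directory in path:
        path.remove(directory)  # avoid duplicate entries
    path.insert(0, directory)
    return path


# Runs at startup, so "import readline" finds our copy before lib-dynload.
prepend_to_path(READLINE_DIR)
```

To confirm which copy actually got loaded, start the interpreter and print `readline.__file__` after importing readline; it should point into your directory rather than into lib-dynload.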

Discussion

So for any sane person, the easy solution should be enough. For the rest, the hard solution is an interesting exploration of how some of Python’s built-in modules can be compiled and inserted individually.

Upgrading the Seagate Barracuda 7200.11 to firmware SD1A

TL;DR: If you’re applying firmware upgrade SD1A to Seagate drives, you need to double-check the firmware actually applied properly. If the Seagate patcher doesn’t work, make sure to use Legacy mode on SATA in the BIOS, instead of the more modern AHCI mode.

So perhaps you have heard of Seagate’s little manufacturing issue with its internal 3.5-inch Barracuda 7200.11 1TB drives a while back — namely, that some drives shipping with SD15 firmware are dying horribly. I had the unfortunate experience of buying such a hard drive — the ST31000340AS — as a scratch disk for my main machine, a MacBook Pro with a mere 240 GB internal drive (a pre-unibody revision, where the HD is insanely difficult to replace).

Seagate did in fact issue a firmware update — SD1A — that supposedly addressed this issue, but of course, there’s one catch: you can’t install the firmware through an external drive enclosure. In communication with Seagate support, a representative confirmed that for those of us without a desktop tower that has a SATA bay, we’re hosed:

Unfortunately, due to the nature of firmware updates and the way external drives work, the firmware update program cannot directly communicate with the drive in the manner it needs to in order to be able to upload the new firmware to the drive. It must be plugged into an internal SATA controller in order to update the drive.

Fair enough. That makes technical sense — but of course, it doesn’t work for me. I asked whether they would handle a mail-in repair, given that I have no easy access to such a desktop. The answer, of course, is No.

I have to find a desktop, open it up, jam this baby in (possibly in place of the existing drive if there’s only one bay), update the firmware, and put everything back together. Sadly, most of my friends who still own desktops would not trust me that far.

Half a year passes, and I finally find a sucker (er, good friend) who’s gullible (er, awesome) enough for me to try this procedure on his machine. The fellow owns a nice if aging Dell Precision T5400, which comes with two SATA bays (so I don’t have to inflict undue harm onto the existing system). Since this thing can run two drives at once, I can use the first method (a Windows-based firmware updater), though I burned a boot CD for the second method just in case. I popped in the drive, fired up Windows XP, downloaded the Windows-based Firmware Update Utility, double-clicked, and thought it was the (triumphant) end. In fact, it took 3 hours of my life to find out just how deep this rabbit hole goes.

Problem 1: The lying updater

The firmware updater will give a bunch of scary warnings and then reboot the machine. It will automatically reboot to a Seagate Loader screen, which attempts to apply the patch to all eligible SATA drives. To its credit, it’ll skip the non-qualifying (i.e. non-Seagate, non-Barracuda, etc.) drives, but it’ll still try them out first. At the end of the process, it will report “firmware downloaded” and “SUCCESSFUL” or some variant thereof, and automatically reboot back into Windows.

At this point, I advise you to use the SeaTools utility to verify that the firmware update actually applied. If you are on a stock Dell T5400 (or perhaps other models as well), this will prove that, despite its claims, the updater is a lying scumbag. In my case, the drive still reported firmware SD15, the broken one.

Problem 2: The broken Boot CD

To save both me and my gracious host (who’s starting to suspect my computer-fixin’ skills now) some time, I decided to try the boot CD method, rather than pounding my head trying to see why the updater was lying. I downloaded the boot CD from the same Seagate Support site above, burned it to disc, and tried it out.

The result is a new SelfSolved posting: SelfSolved #59: getFatBlock error when upgrading Seagate Barracuda 7200.11 firmware. In essence:

The FreeDOS boot CD reports a number of “error reading partition table drive 01 sector 0” errors. This is followed by “get Fatblock failed:0x000000e8” or some variant of “getFatBlock failed :”. The FreeDOS boot process appears to stall at this stage, and does not continue to the firmware flasher program.

That was lovely.

The Solution

I chased some red herrings. I came across postings about failures in various FreeDOS-based Seagate tools. One such post mentioned that it took a long time for the boot disc to get over the “error reading partition table” errors, but I waited forever (well, 15 minutes) and the boot process did appear to be frozen / stalled. I reformatted the drive via diskpart clean, thinking that the getFatBlock and error reading partition errors were related to a non-MBR partition table (I had it set to GPT). I should have realized, of course, that the errors were completely unrelated to filesystems, despite the “fat block” to which it refers.

The actual solution is deceptively simple: the boot disc and flasher appear to handle AHCI-based SATA mode badly. The Dell I was using was set to AHCI mode, out of the three possible options for SATA: Legacy, AHCI, and RAID. Apparently the boot disc simply doesn’t handle this mode correctly on the Dell machine (which may also be why the Windows-based updater lies). When the machine switches on, use F12 to enter the boot menu, and select Setup to enter the BIOS. Then, on the list of Drive Options, skip past the SATA drives and down to SATA options. Pick the Legacy option to use ATA mode, instead of AHCI. Once this is done, the boot disc will function correctly, and the updated firmware will be applied without incident. Remember to switch the mode back to AHCI afterwards; it’s the default for a reason, no doubt.

The “error reading partition” messages were complete red herrings. They appear whether you are in the right SATA mode or not, and do not appear to affect the operation of the firmware updater or the boot process. It should not take very long to get to the flasher on this particular setup, so don’t wait on that message too long; a long stall is a good sign something’s not quite right.

In the end, I did recover my $100 hard drive, and the confidence of my peer in my mad hardware skillz (actually, quite non-existent).

Discussion

In the end, I’m quite appalled at Seagate. This sort of failure shouldn’t have happened, of course. Once it did, Seagate should have offered to take back and replace broken drives; the data I had on there was non-critical. I would have been perfectly willing to pay shipping costs to get a fixed replacement through mail-in service. I should not have been forced to search my social network for a person willing to let me tear his desktop computer apart, for a dubious and unsure firmware update procedure that fails mysteriously. I spent an additional 3 hours tracing mystery failures, for which the error messages were rather useless. Without my trusty iPhone and access to the Internet, I would not have been able to solve this problem. How should I have known what “getFatBlock failed” means?

This little episode has convinced me to never buy a Seagate drive again: I simply cannot afford the time and energy for this sort of firmware upgrade adventure. While I was looking for a desktop to tear apart, I bought a Western Digital Caviar Black 1TB drive instead. Another $100, but at least I had a scratch drive for my work.

The moral of the story: Seagate, you are the worst storage vendor I’ve had to work with so far. I hope this record is not broken in the future.

Subversion 1.6.2 runtime error on network access on OS X 10.5

A new SelfSolved solution is up for perusal. The problem I tried to solve:

After compiling Subversion 1.6.2 from source on OS X 10.5 Leopard, the compilation is apparently successful, but svn dies when it tries to connect to the network for the first time. Crash log reports that symbols are missing from libneon.dylib.

Crash report from shell:

dyld: lazy symbol binding failed: Symbol not found: _ne_set_connect_timeout
Referenced from: /usr/local/lib/libsvn_ra_neon-1.0.dylib
Expected in: dynamic lookup

dyld: Symbol not found: _ne_set_connect_timeout
Referenced from: /usr/local/lib/libsvn_ra_neon-1.0.dylib
Expected in: dynamic lookup

Check out the places that I googled and my final solution writeup … at SelfSolved #49: Subversion 1.6.2 explodes on first network access.

The problem is very similar to a previous compilation issue I solved for PHP. In essence, the -L library search path passed to GCC at compilation time has /usr/lib in front of everything else. This means that whatever library path you might have given at configure time, the linker will always look for the library in /usr/lib first, picking up the old system libneon in the process. Since the bad libneon is dynamically linked, the problem doesn’t manifest itself until runtime, and then only when network access is involved.

As with the PHP issue, change the very first -L/usr/lib to -L/usr/local/lib (or wherever your newer libneon is located), and it’ll link correctly.
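If you want to check which copy of libneon actually has the missing function before relinking, a quick sketch with Python’s ctypes can do it. The two paths below are assumptions about where the old and new libraries live; substitute your own. Note that dyld reports the symbol with a leading underscore, while the lookup here uses the plain C name.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if the shared library at `library_path` exports `symbol`."""
    library = ctypes.CDLL(library_path)
    # ctypes looks the name up in the loaded library on attribute access;
    # a missing symbol raises AttributeError, which hasattr() reports as False.
    return hasattr(library, symbol)


if __name__ == "__main__":
    # Hypothetical install locations; adjust to wherever each libneon lives.
    for candidate in ("/usr/lib/libneon.dylib", "/usr/local/lib/libneon.dylib"):
        try:
            found = exports_symbol(candidate, "ne_set_connect_timeout")
            print(candidate, found)
        except OSError as error:
            print(candidate, "could not be loaded:", error)
```

If the library that svn picks up at runtime reports False here, that is the old one being linked in.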

Out of curiosity, I checked MacPorts first. The MacPorts solution of disabling libneon version checking is odd — it also works, but I dunno if it’s linking to the right thing or not.

Fixing undefined library symbols for compiling PHP 5.2.8

So while compiling PHP 5.2.8 on OS X 10.5, you might run into something like:

Undefined symbols for architecture i386:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture i386
collect2: ld returned 1 exit status
Undefined symbols for architecture x86_64:
  "_xmlTextReaderSchemaValidate", referenced from:
      _zim_xmlreader_setSchema in php_xmlreader.o
  "_xmlTextReaderSetup", referenced from:
      _zim_xmlreader_XML in php_xmlreader.o
ld: symbol(s) not found for architecture x86_64

This doesn’t only happen with libxml. If you’ve installed any extra updated libraries, like iconv or tidy or any library that has significant symbol changes between versions, it’ll die in similar ways. The MacPorts folks have encountered similar issues in ticket 15891, but WONTFIX’ed the issue. Apparently the PHP devs are also punting on the problem.

The immediate cause is that you have multiple versions of some shared libraries. For example, in the case above, I have two libxml versions — one in /usr/lib, and another in /usr/local/lib. This is because I do not want to overwrite the Apple-provided libxml version, but still needed new features provided in later libxml versions. The arrangement works fine in every other software compile except this one, so I investigated further.

The root of the problem

Despite the developers’ airy dismissal of the issue, the underlying problem is indeed that the Makefile generated by PHP at configure time is slightly broken. In Makefile and Makefile.global, you’re going to see this line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(MH_BUNDLE_FLAGS) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

where $MH_BUNDLE_FLAGS is usually defined as something like

MH_BUNDLE_FLAGS = -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib \
 -L/usr/lib -laprutil-1 -lsqlite3 -lexpat -liconv -L/usr/lib -lapr-1 -lpthread

The problem is that this hardcodes the search paths for linking shared libraries. GCC searches for shared libraries to link in the order of the provided -L paths. In this case, MH_BUNDLE_FLAGS is expanded immediately after $CC — so the load order is:

  1. /usr/lib
  2. /usr/lib (these are redundant, and so will probably be collapsed into one path)
  3. …every other custom library path you specify

Now you see the issue. No matter what your library paths are set to, the PHP build system will insist that whatever shared libraries are in /usr/lib take precedence. Therefore, even if you specified that another version (say, libxml2.dylib in /usr/local/lib) should be used instead, the -lxml2 link flag will be resolved against /usr/lib first. And since the linker finds the old version there, which may be missing a number of symbols, the build blows up right there.
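The first-match behavior can be modeled in a few lines. This is only a toy model of the search, assuming the linker looks for lib<name>.dylib in each -L directory from left to right and stops at the first hit; a real linker also considers static archives and its built-in default directories. The function name `resolve_library` is mine.

```python
import os


def resolve_library(name, search_dirs, suffix=".dylib"):
    """Return the first lib<name><suffix> found, scanning dirs left to right."""
    filename = "lib" + name + suffix
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate  # first match wins; later directories are ignored
    return None


if __name__ == "__main__":
    # Order as generated by the PHP Makefile, then the order we actually want.
    print(resolve_library("xml2", ["/usr/lib", "/usr/local/lib"]))
    print(resolve_library("xml2", ["/usr/local/lib", "/usr/lib"]))
```

With both versions installed, the two calls return different files, which is the whole problem in miniature.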

Evidence

And indeed, if you look at the (rather long and massive) compilation/link command right before it fails, you’ll see:

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/lib -L/usr/lib \
-laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64 -L/usr/local/lib ... 

Note that -L/usr/lib comes before -L/usr/local/lib, where /usr/local/lib might be /opt/lib or whatever custom path you provided to configure.

Solutions

The trivial solution is to manually re-run that last link command, but with the -L paths swapped so that your custom path comes first:

gcc -bundle -bundle_loader /usr/sbin/httpd -L/usr/local/lib -L/usr/lib \
-L/usr/lib -laprutil-1 -lsqlite3 -lexpat  -liconv -L/usr/lib -lapr-1 -lpthread -O2 -I/usr/include -DZTS   \
-arch i386 -arch x86_64  ... 

This is easy to do and takes just a second.

Another possible solution is to patch the Makefile, such that MH_BUNDLE_FLAGS comes later in the compilation line:

libs/libphp$(PHP_MAJOR_VERSION).bundle: $(PHP_GLOBAL_OBJS) $(PHP_SAPI_OBJS)
        $(CC) $(CFLAGS_CLEAN) $(EXTRA_CFLAGS) $(LDFLAGS) $(EXTRA_LDFLAGS) $(PHP_GLOBAL_OBJS:.lo=.o) $(PHP_SAPI_OBJS:.lo=.o) $(PHP_FRAMEWORKS) $(EXTRA_LIBS) $(ZEND_EXTRA_LIBS) $(MH_BUNDLE_FLAGS) -o $@ && cp $@ libs/libphp$(PHP_MAJOR_VERSION).so

This will force your library paths to be searched before /usr/lib, thus resolving the link problem.

update 7/18/09
An anonymous reader mentions that you could also specify the right libxml2 by full path, instead of letting the linker resolve -lxml2. Basically, in the last link command, you would remove any mention of -lxml2 and replace it with the full path to your library, e.g. /usr/local/lib/libxml2.dylib. In fact, this is probably the approach with the fewest possible side effects, since you aren’t changing the search order for any other libraries.

Discussion

This is not the first time that PHP core developers have refused to fix a compilation issue that is arguably preventable through actual testing under different installation scenarios. This is an “edgier” edge case than the tidy.h issue, but still should be fairly noticeable for a substantial number of people.

The “You should only have one library installed” argument is, to be honest, unnecessarily arrogant (sadly, not as rare a problem as you’d like in some open source development projects). I understand that it’s an open source project, and no self-respecting software engineer likes to spend time on project plumbing / build systems rather than work on the product. However, on OS X, due to the lack of an official Apple package management system, no one should be overwriting system default libraries; down that way lies insanity, especially at every system or security update. PHP’s build system is obviously broken any time there is a substantial difference between user-installed libraries and system libraries. This bad behavior is especially egregious because the configure command allows you to specify your own library path, misleading users into thinking that the path they specified would be obeyed at compile time. If you only intend for the system library to be used and no other, perhaps the configure script should auto-detect this on OS X and disable that command-line option. Basic user interface design should apply even to command-line interfaces.

Note that changing link ordering may have some unforeseen consequences, since the devs obviously never tested this path. For example, you should make sure the dynamic libraries are loaded in the right order at runtime. On OS X, the load path is typically hard-coded into the dylib, so usually there won’t be a problem, but there may be edge cases. Test your build (and any PHP extensions you built) before using it in production!

Apple Remote Desktop black screen and old machines

There appears to be some sort of limitation on screen colors when using ARD 3.2.2 to control an older Mac remotely. The symptom of this is a black screen when you attempt to Observe or Control the remote machine. Unfortunately, this same symptom usually appears when you have a blocked network port (ARD uses TCP and UDP ports 3283 and 5900), so it may be confusing as to which is the issue.

After verifying all network settings and router port forwardings are set up correctly, you might try this if you have an older Mac as the target: move the color slider on the top-right corner of your ARD admin panel to a lower value, and then try to reconnect.

The story is that I was trying to remote control a G4 dual 500 (Mystic) from a MacBook Pro (early 2008). This used to work until recently, when I started getting nothing but a black screen. Keyboard commands still worked (I could blindly log in from the loginwindow), though mouse movements did not pass through to the old Mac.

After fruitlessly chasing network issues with my AirPort router, the last post in an Apple Discussions thread pointed me in the right direction. Once I used the color control in the ARD application to lower the color depth by one notch, the next connection worked just fine, with the screen showing up and behaving normally.

Now this poor old G4 tower is running 10.4.11 with an ancient Rage 128 Pro graphics card, but it handles its 17-inch screen just fine at “millions of colors” color depth when I am sitting in front of it. Very odd how it just stopped working at that depth over ARD.

Bad Google cookie kills Safari

03-10-2010: I believe this is fixed in latest Safari versions. The contents of this post remain for historical purposes only.

In a bizarre case of digital food poisoning, I experienced a series of mysterious, persistent, reproducible crashes with Safari 3.2.1 this morning, traceable to a bad Google cookie.

The symptoms

Google has a nifty query suggestion feature that is turned on by default on its homepage search box. Whenever I typed in a phrase query (e.g. +"query suggestion" +"Google features") with the suggestion feature turned on, the browser crashed with a SIGSEGV around 30% of the time.

Excerpt from the crash log:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x000000001bdca240
Crashed Thread:  0

Thread 0 Crashed:
0   ???                            0x16619e75 0 + 375496309
1   com.apple.WebCore              0x94325ea0 WebCore::AutoTableLayout::fullRecalc() + 704
2   com.apple.WebCore              0x9432581a WebCore::AutoTableLayout::calcPrefWidths(int&, int&) + 26
3   com.apple.WebCore              0x943252b8 WebCore::RenderTable::calcPrefWidths() + 56
4   com.apple.WebCore              0x9431c04b WebCore::RenderBox::minPrefWidth() const + 27
5   com.apple.WebCore              0x9432507c WebCore::RenderTable::calcWidth() + 124
6   com.apple.WebCore              0x943241a8 WebCore::RenderTable::layout() + 392
...

In the remainder of the cases, when it does not crash immediately, a JavaScript error is logged to the browser error console (to access, go to Develop -> Show Error Console)
SyntaxError: Invalid regular expression: nothing to repeat
http://www.google.com/extern_js/f/CgJlbhICdXMrMAc4AiwrMAo4EywrMA44AywrMBg4Ayw/nMD0sKnpeG0.js (line 21)

for every letter that I type into the search box. During this time, no query suggestion is made.

Diagnostics

  • I have never used an InputManager or “plug-in” to Safari
  • The same crash does NOT happen under a fresh new user account created for diagnostic purposes
  • Clearing the browser cache, temp files, hidden cache files ( getconf DARWIN_USER_CACHE_DIR ), etc. did not help.
  • Deleting Safari preferences did not help.

Solution

After applying a divide-and-conquer strategy to the entire ~/Library directory, I traced the problem to the ~/Library/Cookies directory. (This was not made any easier by Finder’s obstinate resistance to my attempts to move subdirectories within the Library directory, despite my having the appropriate permissions; I had to drop to Terminal for this.) Moving away the Cookies.plist file contained within cured the crash, the lack of query suggestions, and the JavaScript error. More specifically, deleting just the Google-related cookies within the Cookies file accomplished the same thing.

Remarks

Some combination of a bad cookie and bad regexes appears to have triggered a crash bug in this version of WebKit / WebCore. You wouldn’t think a bad cookie could take down a browser. But apparently it can.

I dearly hope this is not a potential buffer overflow or other security problem within WebKit.

redhillonrails_core and broken MySQL empty string defaults

While hacking on a side project in Ruby on Rails, I ran across this weird error when trying to insert new data:

ActiveRecord::StatementInvalid: Mysql::Error: Column 'attr2' cannot be null: INSERT INTO `foo` (`attr1`, `attr2`) ... VALUES ('1', NULL)

where attr2 is a varchar (or t.string, in Rails lingo) and set to not null default '' (or, in other words, :null=>false, :default=>''). Strangely enough, instead of the default value of '', ActiveRecord was setting the value to nil, which translates into a NULL. Since the schema explicitly forbids NULLs on that column, the statement explodes.

After an hour of poking around and hacking up a spike solution, it turns out a plugin was to blame. I’d pulled in the foreign_key_migrations plugin (a highly recommended add-on) to automatically install foreign key constraints (in this day and age, the foremost web framework still can’t automatically handle FOREIGN KEY constraints, the most basic tool for ensuring data integrity in relational databases, for its migrations? Bah!).

This plugin has a dependency on redhillonrails_core, which has a known bug: Incorrectly overwrites mysql empty-string default with nil for string/text/binary types.

The bug is apparently not being worked on as of the time of this writing. The dev doesn’t consider this a bug, as he claims that “[he considers] empty strings to be semantically identical to NULL”.

This position, unfortunately, is not supported by the SQL standard. Wikipedia has a section on common SQL NULL mistakes documenting some of the potential problems involved in making such an assumption. Philip Greenspun has further notes on this. Using NULLs triggers all the arcane annoyances of three-valued logic, and you must be prepared to consider True, False, and NULL (unknown) as comparison outcomes. Someone not well versed in three-valued logic can easily make a number of subtle mistakes when comparing values.
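To make the difference concrete, here is a minimal sketch in C of how SQL-style comparisons behave once NULL is involved. It is only a model, not anything from Rails, MySQL, or the plugin: the names tri_t, sql_eq, tri_not, tri_and and where_keeps_row are made up for illustration, and a C NULL pointer stands in for SQL NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three truth values an SQL comparison can produce (hypothetical names). */
typedef enum { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN } tri_t;

/* Model of SQL "a = b" on nullable strings; a C NULL pointer plays SQL NULL. */
tri_t sql_eq(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return TRI_UNKNOWN;               /* comparing with NULL is never TRUE */
    return strcmp(a, b) == 0 ? TRI_TRUE : TRI_FALSE;
}

/* NOT leaves UNKNOWN as UNKNOWN. */
tri_t tri_not(tri_t v)
{
    if (v == TRI_UNKNOWN)
        return TRI_UNKNOWN;
    return v == TRI_TRUE ? TRI_FALSE : TRI_TRUE;
}

/* AND: FALSE dominates, then UNKNOWN, then TRUE. */
tri_t tri_and(tri_t a, tri_t b)
{
    if (a == TRI_FALSE || b == TRI_FALSE)
        return TRI_FALSE;
    if (a == TRI_UNKNOWN || b == TRI_UNKNOWN)
        return TRI_UNKNOWN;
    return TRI_TRUE;
}

/* A WHERE clause keeps a row only when its condition is TRUE. */
int where_keeps_row(tri_t condition)
{
    return condition == TRI_TRUE;
}

/* Usage sketch: call this from your own main() to check the claims. */
void tri_demo(void)
{
    assert(sql_eq("", "") == TRI_TRUE);             /* '' = ''   -> TRUE    */
    assert(sql_eq(NULL, "") == TRI_UNKNOWN);        /* NULL = '' -> UNKNOWN */
    /* Neither "attr = ''" nor "NOT (attr = '')" selects a NULL row. */
    assert(!where_keeps_row(sql_eq(NULL, "")));
    assert(!where_keeps_row(tri_not(sql_eq(NULL, ""))));
}
```

The last two assertions are the trap: a row whose column is NULL is matched neither by a condition nor by its negation, which is exactly where an empty string and a NULL stop being interchangeable.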

To work around this bug, you will need to comment out the initializer function in

vendor/plugins/redhillonrails_core/lib/red_hill_consulting/core/active_record/connection_adapters/mysql_column.rb

Alternatively, you can delete the file and remove the relevant references to it.

If you agree with the dev’s position that NULLs are “semantically identical” to empty strings, remember to pay attention when you formulate your SQL queries (and when Rails formulates those queries). Your results may not be what you expect if implemented naively. Get your three-valued logic truth tables out 🙂

APR and 32-bit/64-bit universal binary compilation

When compiling APR, the Apache Portable Runtime 1.3.3 (as a part of Subversion 1.5.3 as I am doing here, or not) on OS X 10.5 Leopard, you may encounter the following error at compile time.

/bin/sh /tmp/subversion-1.5.3/apr/libtool --silent --mode=compile gcc-4.2 -Os -arch i386 -arch x86_64 -DHAVE_CONFIG_H -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp -I./include -I/tmp/subversion-1.5.3/apr/include/arch/unix -I./include/arch/unix -I/tmp/subversion-1.5.3/apr/include/arch/unix -I/tmp/subversion-1.5.3/apr/include -o strings/apr_snprintf.lo -c strings/apr_snprintf.c && touch strings/apr_snprintf.lo
strings/apr_snprintf.c: In function ‘conv_os_thread_t’:
strings/apr_snprintf.c:500: error: duplicate case value
strings/apr_snprintf.c:498: error: previously used here
strings/apr_snprintf.c: In function ‘conv_os_thread_t_hex’:
strings/apr_snprintf.c:671: error: duplicate case value
strings/apr_snprintf.c:669: error: previously used here

This will most likely happen when you are configured to build a dual 32-bit / 64-bit universal binary, whether it be ppc / ppc64, or i386 / x86_64, or any permutation thereof. This ticket over at MacPorts documents a particular instance of this problem, with no apparent solution.

The symptom is easy to explain. Somehow, two case labels in the relevant switch statement in strings/apr_snprintf.c:500:

switch(sizeof(u.tid)) {
    case sizeof(apr_int32_t):
        return conv_10(u.u32, TRUE, &is_negative, buf_end, len);
    case sizeof(apr_int64_t):
        return conv_10_quad(u.u64, TRUE, &is_negative, buf_end, len);
    default:
        /* not implemented; stick 0 in the buffer */
        return conv_10(0, TRUE, &is_negative, buf_end, len);
    }

have evaluated to the same value. In particular, it believes that sizeof(apr_int32_t) and sizeof(apr_int64_t) are the same value. As we all know in C, you cannot have two identical case labels in the same switch statement. However, the root of the problem is a bit more subtle.

In $SRCDIR/include/apr.h, you’re likely to see this fragment of code.

typedef  long       apr_int64_t;
typedef  unsigned long  apr_uint64_t;

Notice that it has typedef’ed apr_int64_t as a long and apr_uint64_t as unsigned long. This is because at configure time, the script detected that long values are 64-bit on this system, so it assigned the APR 64-bit types to longs. However, this only holds true for half of the compilation, because you are building a universal binary for a 32-bit architecture as well. Remember that in 32-bit GCC on OS X, longs are 32-bit rather than 64-bit. Your run-of-the-mill autoconf script, done by a non-OS X programmer, isn’t going to be able to detect this subtlety: if the 64-bit part worked, it’ll keep thinking longs are 64-bit, end of story, and happily generate the incorrect typedef expressions. When you apply sizeof to these types in apr_snprintf.c, both evaluate to 4 bytes under 32-bit compilation, so the two case labels collide and the compile run blows up.

To truly fix the root of the problem requires rewriting the autoconf script to detect Mac OS X and its universal binary building, which can potentially throw four architectures at the same compilation script. However, a quick hack to make this particular problem go away is to change apr.h such that:

typedef  long long       apr_int64_t;
typedef  unsigned long long apr_uint64_t;

This ensures that, in either 32-bit or 64-bit compilation, apr_int64_t and apr_uint64_t are always typedef’ed to appropriate 64-bit types. The compilation of APR (and Subversion) will then proceed normally.

Note that long long is not a standard C89 type. It was only standardized in C99, and GCC accepts it in older modes as an extension, so this fix is still a kludge. A kludge that works (for me), though.
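If you want the compiler to catch this kind of mismatch on every architecture slice, a compile-time width check is one option. The following is a minimal sketch, not part of APR: the demo_ names are hypothetical stand-ins for the apr.h typedefs, and it uses the old negative-array-size trick so that it also works on compilers that predate C11’s static_assert.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the typedefs in apr.h (not the real header). */
typedef int        demo_int32_t;
typedef long long  demo_int64_t;

/* Compile-time check: the array size is -1 (an error) if cond is false.
 * On a C11 compiler, static_assert from <assert.h> does the same job. */
#define DEMO_STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

/* These are evaluated once per -arch slice, so a wrong width for either
 * the 32-bit or the 64-bit half of a universal build stops the compile. */
DEMO_STATIC_CHECK(demo_check_int32, sizeof(demo_int32_t) * CHAR_BIT == 32);
DEMO_STATIC_CHECK(demo_check_int64, sizeof(demo_int64_t) * CHAR_BIT == 64);

/* Same shape as the switch in apr_snprintf.c. With distinct sizes the
 * two case labels differ, so there is no "duplicate case value" error. */
int demo_width_bits(size_t nbytes)
{
    switch (nbytes) {
    case sizeof(demo_int32_t):
        return 32;
    case sizeof(demo_int64_t):
        return 64;
    default:
        return 0;   /* not a width this sketch knows about */
    }
}

/* Usage sketch: call from your own main(). */
void demo_width_self_check(void)
{
    assert(demo_width_bits(sizeof(demo_int32_t)) == 32);
    assert(demo_width_bits(sizeof(demo_int64_t)) == 64);
}
```

Had demo_int64_t been typedef’ed as long, the second check would fail on the 32-bit slice with an error that points at the typedef itself, rather than at a switch statement far away from the cause.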

UPDATE:
There may also be an issue with sizeof definitions that may cause the library to crash. In particular, there may be occurrences of

#define APR_SIZEOF_VOIDP 8

that were generated by configure. To fix this, you will need to remove the define and have the compiler check for 64-bit at compile-time:

#ifndef __LP64__
    #define APR_SIZEOF_VOIDP 4
#else
    #define APR_SIZEOF_VOIDP 8
#endif

In general, any predefined sizeofs need to be changed. I am not sure why the APR developers do hard-coded defines like this, given that the point of having sizeof() calls is to avoid such issues.
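One plausible reason for the hard-coded defines is that the preprocessor cannot evaluate sizeof inside an #if, so any code that branches on pointer size at preprocessing time needs a plain integer macro. A minimal sketch of keeping such a macro honest follows. DEMO_SIZEOF_VOIDP is a hypothetical stand-in for APR_SIZEOF_VOIDP, and the sketch assumes a GCC-style compiler that defines __LP64__ on 64-bit targets, as in the fix above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for APR_SIZEOF_VOIDP, selected per architecture
 * slice at compile time instead of once at configure time. */
#ifdef __LP64__
    #define DEMO_SIZEOF_VOIDP 8
#else
    #define DEMO_SIZEOF_VOIDP 4
#endif

/* sizeof is not usable in #if, but it is usable in a constant expression.
 * The array size becomes -1 (a compile error) if the macro ever disagrees
 * with the real pointer size for the slice being compiled. */
typedef char demo_voidp_check[(DEMO_SIZEOF_VOIDP == sizeof(void *)) ? 1 : -1];

/* Code that only needs the value, not a preprocessor test, can simply
 * ask the compiler directly. */
size_t demo_pointer_bytes(void)
{
    return sizeof(void *);
}

/* Usage sketch: call from your own main(). */
void demo_voidp_self_check(void)
{
    assert(demo_pointer_bytes() == DEMO_SIZEOF_VOIDP);
}
```

With a check like this in the header, a stale value generated by configure fails the build instead of crashing the library at runtime.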