duplicates and invoking LaunchServices

If you deal with duplicate files a lot, then fdupes is a very quick command-line solution. Among other things, it compares the MD5 signatures of files, can recurse into directories and follow symlinks, and can be driven from scripts as well as interactively from standard input.


Bonaparte-Prime:/tmp $ fdupes -r -d ./
[1] ./foo.mp3
[2] ./bar/baz.mp3

Set 1 of 4, preserve files [1 - 2, all]:

However, sometimes I forget what the hell foo.mp3 and bar/baz.mp3 really were. Then I have to fire up the Finder (or another Terminal window) and go chase down the file. What I really want is an “open one of these files and let me see what it is, then I’ll make a decision” option.

The beauty of open source is that I can patch fdupes myself and scratch this itch.

Finding the right app to hand off to might be annoying. On OS X, if you want to use the nice automagical way that Finder opens files (meaning that it generally knows what applications are appropriate to open an arbitrary file), you’ll have to link against LaunchServices, a part of the ApplicationServices framework. The code to do that is actually quite trivial. Simply add the ApplicationServices.h header, and then a function such as:


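#include <ApplicationServices/ApplicationServices.h>

/* Ask LaunchServices to open filepath with its default application. */
OSStatus open_target_file(const char* filepath)
{
    FSRef ref;
    OSStatus err;
    Boolean isDirectory;

    /* Convert the POSIX path into an FSRef; stop if it cannot be resolved. */
    err = FSPathMakeRef((const UInt8 *)filepath, &ref, &isDirectory);
    if (err != noErr)
        return err;

    /* Hand the file to LaunchServices, which picks the application. */
    err = LSOpenFSRef(&ref, NULL);

    return err;
}

will do the job. This is deliberately minimal (it only bails out when the path cannot be resolved), but you get the idea. LaunchServices will then handle the tedium of finding the right application to open the file.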

The other half of the patch involves modifying fdupes’s command parser to accept an additional variant. I chose a scheme in which, if I precede the choice numbers with the character ‘o’, the parser hands the filepath to the aforementioned open_target_file function and then returns to its previous state. Since the parser already has such a case to handle mistaken input, the patch is also quite simple: check for ‘o’ in the first byte of the preservestr array at the appropriate place and, if it matches, call open_target_file() instead of setting preserve[number] = 1. If you’re the portability stickler type, wrap these new lines in #ifdef __APPLE__ guards so that they are only compiled on OS X.
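
The control flow of that change can be sketched in Python for brevity. This is not fdupes’s actual C code: handle_choice, files, preserve and open_file are hypothetical names chosen for illustration, and the real prompt also understands answers such as “all”, which this sketch leaves out.

```python
def handle_choice(line, files, preserve, open_file):
    """Process one line typed at the 'preserve files' prompt.

    Returns True when the line marked at least one file to preserve, and
    False when it only opened files, so the caller can show the prompt again.
    """
    line = line.strip()
    open_only = line.startswith("o")
    if open_only:
        line = line[1:]  # drop the 'o' prefix, keep the choice numbers

    marked = False
    for token in line.replace(",", " ").split():
        if not token.isdigit():
            continue  # skip mistaken input
        number = int(token)
        if not 1 <= number <= len(files):
            continue  # out-of-range choice
        if open_only:
            open_file(files[number - 1])   # stand-in for open_target_file()
        else:
            preserve[number - 1] = True    # stand-in for preserve[number] = 1
            marked = True
    return marked


# Example session: look at file 1 first, then decide to keep file 2.
files = ["./foo.mp3", "./bar/baz.mp3"]
preserve = [False, False]
opened = []
first = handle_choice("o1", files, preserve, opened.append)
second = handle_choice("2", files, preserve, opened.append)
```

The point of returning False for an ‘o’ line is that the prompt state is left untouched, so the same set of duplicates is offered again after the file has been inspected.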

Having the source code to simple utilities like these is really quite helpful. It lets someone with just a modicum of programming ability tweak the behavior and address some common annoyances, without having to re-invent the thing from scratch or pester the original dev, who probably doesn’t care nearly enough about little features for specific platforms that only specific people might find useful. Write the extension code, create a patch, and keep your patch around in case the utility is ever updated. Scratch your own itch as free software intended.

Turning off the Unused Icons Wizard

Having been pestered by the Windows XP Desktop Cleaning Wizard one too many times, I wanted to turn the bloody thing off. Turns out it was under Display -> Desktop tab -> Customize Desktop -> a check box to turn off the wizard.

Interesting that a system maintenance service is actually controlled by a setting in the same tab that sets my Desktop picture – a similar jarring mix of purposes would be rare to find in a Mac system preference panel. I ended up having to Google “unused icons” and “Windows” to find out this little piece of information.

In a timed event like this, a logical “opt-out” solution would have been to have the rather intrusive wizard offer an option to deactivate itself, permanently if necessary. Instead, I had to go hunt down the magical control that triggers this wizard once every so many days. This is really poorly designed, but rather symptomatic of typical Windows UIs for preferences.

Without strong cues, users tend to overlook settings that are deeply hidden. A check box on a tab, from a dialog box that is opened by some button, on a tab of a control panel, is already pushing it. Who usually explores beyond what he can skim at a glance from a control panel window? Who actually drills down regularly into all those “Advanced” and “Properties” buttons, which hide dialog boxes, which themselves possess tabs (or, dear god, another “Properties” button that opens up yet another dialog)? The problem is compounded by the fact that the operating system lacks a sufficiently powerful search interface for finding the specific setting you want.

Conventional wisdom holds that most users never change preferences from their defaults. I keep wondering if it’s that they simply can’t find the damn thing in a reasonable amount of time, or that they lack the cues to know that these things can be changed. The Desktop Cleaning Wizard certainly never hinted that it can be made to go away. If I were a less motivated user, I’d probably settle for ignoring the notification until it goes away on its own.

/Library/Java/Extensions, MATLAB errors, and you

If you installed MATLAB 2006b on OS X and encountered this error when Matlab starts up:

java.lang.IllegalArgumentException: http://java.sun.com/xml/jaxp/properties/schemaSource
at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.setAttribute(DocumentBuilderFactoryImpl.java:118)
at com.mathworks.xml.XMLValidator.validateFile(XMLValidator.java:54)
at com.mathworks.mlwidgets.util.productinfo.Product.getItems(Product.java:310)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.pathChanged(ProductInfoUtils.java:160)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.<init>(ProductInfoUtils.java:67)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.getAllProductsInfo(ProductInfoUtils.java:910)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.<clinit>(ProductInfoUtils.java:53)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:164)
at com.mathworks.mde.desk.StartupClassLoader.callClassForName(StartupClassLoader.java:304)
at com.mathworks.mde.desk.StartupClassLoader.access$000(StartupClassLoader.java:27)
at com.mathworks.mde.desk.StartupClassLoader$LoadInfo.<init>(StartupClassLoader.java:80)
at com.mathworks.mde.desk.StartupClassLoader.addLoadInfo(StartupClassLoader.java:219)
at com.mathworks.mde.desk.StartupClassLoader.createLoadInfos(StartupClassLoader.java:195)
at com.mathworks.mde.desk.StartupClassLoader.access$500(StartupClassLoader.java:27)
at com.mathworks.mde.desk.StartupClassLoader$2.run(StartupClassLoader.java:147)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
Exception in thread "Timer-2" java.lang.ExceptionInInitializerError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:164)
at com.mathworks.mde.desk.StartupClassLoader.callClassForName(StartupClassLoader.java:304)
at com.mathworks.mde.desk.StartupClassLoader.access$000(StartupClassLoader.java:27)
at com.mathworks.mde.desk.StartupClassLoader$LoadInfo.<init>(StartupClassLoader.java:80)
at com.mathworks.mde.desk.StartupClassLoader.addLoadInfo(StartupClassLoader.java:219)
at com.mathworks.mde.desk.StartupClassLoader.createLoadInfos(StartupClassLoader.java:195)
at com.mathworks.mde.desk.StartupClassLoader.access$500(StartupClassLoader.java:27)
at com.mathworks.mde.desk.StartupClassLoader$2.run(StartupClassLoader.java:147)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
Caused by: java.lang.NullPointerException
at com.mathworks.mlwidgets.util.productinfo.Product.getItems(Product.java:317)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.pathChanged(ProductInfoUtils.java:160)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.<init>(ProductInfoUtils.java:67)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.getAllProductsInfo(ProductInfoUtils.java:910)
at com.mathworks.mlwidgets.util.productinfo.ProductInfoUtils.<clinit>(ProductInfoUtils.java:53)
... 11 more

or something similar, you may have hit one of the mysterious gotchas of Mac OS X 10.4. In /Library/Java/Extensions (or another one of these system Java directories), you have a conflicting copy of xerces.jar, probably from another software package that rudely installed these extensions without informing you of it. Due to the way Java classpaths are resolved, the copy in Extensions apparently overrides the copy in Matlab’s directory.

The symptoms of this: among other things, Matlab will throw these exceptions at startup; you can’t open certain panels or windows via the menu bar; you also cannot select certain panels in the Preferences window. Whenever you perform one of these operations, you’ll receive a NoClassDefFoundError on one of the Matlab widget classes, even though the class exists in the Matlab directory.

This is a rather classic gotcha, but a lot of people wouldn’t even realize /Library/Java/Extensions actually exists. If you went off chasing the red herring of the NoClassDefFoundError, or went tracing StartupClassLoader.java, you’d never find the root cause of the issue.
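
One way to check for this kind of clash is to compare jar filenames between the extensions directory and the application’s own jar directory. The following Python sketch does that on a scratch directory tree. The function name find_shadowing_jars and the directory layout are hypothetical (the real Matlab layout is not assumed here), and a matching filename is only a hint, since the same classes can also ship in a jar with a different name.

```python
import tempfile
from pathlib import Path


def find_shadowing_jars(extensions_dir, app_dir):
    """Map each jar filename found in extensions_dir to the jars of the
    same name found anywhere under app_dir."""
    ext_names = {p.name.lower() for p in Path(extensions_dir).glob("*.jar")}
    shadowed = {}
    for jar in Path(app_dir).rglob("*.jar"):
        name = jar.name.lower()
        if name in ext_names:
            shadowed.setdefault(name, []).append(jar)
    return shadowed


# Demo on a throwaway layout that mimics the situation described above.
with tempfile.TemporaryDirectory() as root:
    ext = Path(root, "Extensions")
    app = Path(root, "app", "java", "jar")
    ext.mkdir()
    app.mkdir(parents=True)
    (ext / "xerces.jar").touch()      # the copy some installer dropped in
    (app / "xerces.jar").touch()      # the copy the application ships
    (app / "widgets.jar").touch()     # unrelated jar, no conflict
    conflicts = find_shadowing_jars(ext, Path(root, "app"))
    conflict_names = sorted(conflicts)
```

On a real machine you would pass /Library/Java/Extensions as the first argument and the application’s own directory as the second, then inspect whatever names come back.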

/Library is sometimes an incomprehensible morass of things that cause mysterious failures in other applications (InputManagers, anyone?) and end up exceedingly difficult to trace.

Exchanging objects between PHP and Python

So over the course of my various projects, personal or otherwise, I’ve collected an assortment of information that may or may not be of interest to others or to myself in the future. What usually ends up happening is that I make notes about it in a file or (dear god) on a random piece of paper, post a message to some forum or mailing list, or just plain put it in that lossy storage medium of my own mind…and then promptly forget all about it. For a would-be information specialist, this ironic lack of information organization has caused many problems and continues to do so, especially at retrieval time.

Now in 2007, as New Year’s Day draws to a close, I am putting my laziness to the test again by resolving to begin this project to document Random Things That I Somehow Know About. Some of this is trivial, some of this is not. But for one reason or another, I intend to keep track of it.

Starting with something recent. To work around a problem when deploying a Python-based service on a server that disallowed Python CGI execution, the Python driver program had to be wrapped in a PHP frontend (which the server did allow). However, the driver needed to accept a number of parameters, and the PHP wrapper had to pass these parameters conveniently via a call to system() and print the resulting output from the Python driver to stdout. In a previous, similar project, I engineered a set of subroutines in the Python driver to parse options on the command line, and had the PHP script put those options on the command line at invocation time. It was tedious, error-prone, and remarkably insecure, even with Python variants on getopt() to help.

Since this was a proof-of-concept project in any case, there had to be a faster, friendlier hack. It would be great if the wrapper and the driver could exchange data objects directly – in this case, PHP associative arrays and Python dicts. Then the Python driver can simply ask for the necessary values bound to fixed options/keys.

Enter Armin Ronacher’s phpserialize.py, conveniently under a BSD license. It exposes two functions, serialize() and unserialize(), which encode and decode data in the format used by PHP’s own serialize() function.

The solution comes together as follows. On the PHP side:

//something to ensure that we've got a correct parameter object
$params = array('phpArgs' => 'yes'); 
$params['foo'] = 'bar'; //check and populate $params
$s = serialize($params);
//so that if we're passing on cmdline
//things don't blow up if weird bytes encountered
$s = base64_encode($s);
//...and so on

Once the $s in base64 text is passed to the driver script, the Python side will simply call this:

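import base64
from phpserialize import unserialize

def getParameters(php_base64_str):
    # b64decode: decodestring is gone in newer Pythons; decode_strings: str, not bytes, on Python 3
    php_parameters = base64.b64decode(php_base64_str)
    parameters = unserialize(php_parameters, decode_strings=True)
    return parameters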

and parameters will be a Python dictionary with all the values in place, mapped to their original keys.

>>> params = getParameters(s) # retrieve s from argv first
>>> params['foo']
'bar'
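
To make the wire format concrete, here is a self-contained Python sketch of what travels between the two sides for the $params array above. It is not phpserialize’s implementation: parse_php_string_array is a hypothetical helper that only understands a flat array whose keys and values are all strings, which is the shape PHP’s serialize() produces for that example.

```python
import base64


def parse_php_string_array(data: bytes) -> dict:
    """Decode a flat PHP-serialized array of string keys and string values."""
    pos = 0

    def expect(token: bytes) -> None:
        nonlocal pos
        if not data.startswith(token, pos):
            raise ValueError(f"expected {token!r} at offset {pos}")
        pos += len(token)

    def read_int(terminator: bytes) -> int:
        nonlocal pos
        end = data.index(terminator, pos)
        value = int(data[pos:end])
        pos = end + len(terminator)
        return value

    def read_string() -> str:
        nonlocal pos
        expect(b"s:")
        length = read_int(b":")        # length is counted in bytes
        expect(b'"')
        raw = data[pos:pos + length]
        pos += length
        expect(b'";')
        return raw.decode("utf-8")

    expect(b"a:")
    count = read_int(b":")             # number of key/value pairs
    expect(b"{")
    result = {}
    for _ in range(count):
        key = read_string()
        result[key] = read_string()
    expect(b"}")
    return result


# What PHP's serialize() emits for array('phpArgs' => 'yes', 'foo' => 'bar').
serialized = b'a:2:{s:7:"phpArgs";s:3:"yes";s:3:"foo";s:3:"bar";}'
encoded = base64.b64encode(serialized)   # the PHP side's base64_encode step
params = parse_php_string_array(base64.b64decode(encoded))
```

For anything beyond this toy case (nested arrays, integers, objects), a real decoder such as phpserialize is the better choice.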

This makes exchanging data between frontend and backend a lot less headache-inducing.