The Sarth Repository – Page 7 – source control for my (useless) knowledge

GUI cues for block-level copy in Disk Utility

Posted on December 19, 2007 (updated December 24, 2009) by yiming

In Mac OS X 10.4 Tiger (PPC edition), the only UI difference between a block-level copy and a file-level copy when using Disk Utility’s Restore (a.k.a. disk cloning) mode is that the progress bar label reports "Copying Blocks..." for the former and just plain "Copying..." for the latter. The difference is significant, especially for full-disk cloning operations.

To invoke block-level copy, you:

must not be booted from either the source or the destination partition (the Mac OS X DVD is good for this if you do not have a different partition to boot from)
must tick the Erase Destination checkbox when setting up Restore mode in Disk Utility
may or may not need to select “Skip Checksum”

Unfortunately, the progress bar message that tells you whether you got a file-level or a block-level copy is displayed only after you have already invoked the Restore procedure, and there isn’t a “Cancel” button anywhere in sight. As if the potential for hours of wasted time (and quite possibly loss of metadata, since asr in file-copy mode doesn’t bother preserving such trivial things as file creation dates) isn’t good enough to warrant a button to express one’s regret.

One wonders why Apple did not simply put in a checkbox for “Block-level copy”, warn you about the requirements for block-level copy, and ask whether you wish to proceed with a file-level copy anyway if you don’t meet them.

Probably because they thought (in their infinite Apple wisdom) that it would scare and confuse the “normals” who can’t tell the difference. Never mind that the people who would try to clone their entire drives for backup (as opposed to copying files here and there to a USB key) are probably savvy enough to care.

Wonder if this changed in Leopard.

Update
Many seem to arrive at this post wondering what a block-level copy is. Here it is in a nutshell. A typical hard disk is divided into a linear set of n logical blocks, m bytes each. In short, your files are recorded within these blocks. To keep track of things, the filesystem is responsible for maintaining more metadata on top of this. This lets it create such niceties as “folders”, and forms the overall tree of folders nested within folders that you see in Finder.

A file-level copy means that the copying program loads in the directory tree and walks the tree. When it finds a directory, it’ll load metadata to find all the files contained in it. When it finds a file, it’ll go look for the blocks that contain the file’s data, and start copying. This has a lot of overhead, since the program has to load in the nice tree abstraction first, descend into each folder looking for files to copy, and then go find the proper blocks to copy, and then finally copy the data and any metadata associated with it. A block-level copy, on the other hand, recognizes that if you literally want to copy everything from one disk to another, it’s a lot easier and faster to just start copying at block 1 until you get to block n at the end — rather than running up and down that directory tree.

A block-level copy is a literal byte-for-byte (well, one would hope) copy of one disk to another, while file-level copying creates a copy of each file and folder from one disk to another. The distinction here is subtle but important. A file-level copy from disk A to B does not necessarily result in A == B, while a block-level copy (for all intents and purposes) does. As for performance, for a full-disk clone, block-level copying should be dramatically faster than file-by-file copying.
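As a rough illustration of the two strategies, here is a minimal Python sketch. It assumes the “disk” is just an ordinary image file and that blocks are 512 bytes each. The names block_copy and file_copy are made up for this example, and none of this reflects how asr or Disk Utility are actually implemented.

```python
import os

BLOCK_SIZE = 512  # assumed block size in bytes (the "m" in the explanation above)


def block_copy(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst one fixed-size block at a time; return the number of blocks copied."""
    blocks = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # ran off the end of the "disk"
            dst.write(block)
            blocks += 1
    return blocks


def file_copy(src_root, dst_root):
    """Walk the directory tree and copy each file's contents; return the number of files copied.

    Only file contents are copied here. Metadata such as creation dates
    is not carried over, which is exactly the kind of loss described above.
    """
    files = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_file = os.path.join(dirpath, name)
            dst_file = os.path.join(target_dir, name)
            with open(src_file, "rb") as src, open(dst_file, "wb") as dst:
                dst.write(src.read())
            files += 1
    return files
```

The block copier never looks at what the bytes mean, which is why it needs no knowledge of folders or files, while the file copier has to walk the whole tree first.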

Cisco VPN behind a NAT

Posted on November 23, 2007 (updated January 13, 2009) by yiming

Useful if you’re:

on OS X
using the Cisco VPN Client 4.9.01 or below
behind a router/NAT
having intermittent connectivity issues with the Berkeley Campus Full Tunnel VPN

You might also be able to use this info if you have a similar network setup and are having similar problems, but I’m not going to claim that.

Basically, the problem for me was that three connections out of four would get an IP address from the VPN, but the actual network would be unreachable. No IP could be pinged successfully. The VPN GUI would report “Bytes In: 0, Bytes Out: xxxx”, and the VPN log would be stuck in a loop of:

Sending DPD request to xx.xx.xx.xx, our seq# = 1234 ... Received DPD ACK from xx.xx.xx.xx, seq# received = 1234, seq# expected = 1234 ...

The solution that I’ve found is to switch on Enable Transparent Tunneling -> IPSec over UDP ( NAT / PAT ). This can be done by hitting Modify in the GUI for the appropriate Connection Entry, then going to the Transport tab and ticking the appropriate box. For good measure, I also forwarded ports 500 and 4500 on my router’s NAT, to ensure that the conventional Cisco VPN ports are open to the network (and just to do some irrational voodoo). The IPSec over TCP option, btw, does not appear to work, despite what Berkeley IT says on the instructions page. The client refuses to connect with that option active, though in theory it should have worked. Perhaps I’m not forwarding the right ports for it.

In any case, finally, after 1.5 years of this nonsense, the Berkeley VPN doesn’t choke on me anymore (too bad I’ll be leaving here in 6 months. Argh.). Every connection I make gets through on the first time, rather than on the fourth or fifth time. It still doesn’t make sense how I was able to connect to the VPN before, though. Why would it fail intermittently, and not always?

This is why I am not a network engineer. It already gives me a headache.

Entourage sent-mail archival, episode 2

Posted on November 1, 2007 (updated January 13, 2009) by yiming

Previously, on The Sarth Repository…

I had this setup going on to automatically redirect most messages I send to a repository for later search and retrieval…A month later, by pure chance, I realized that Entourage wasn’t quite deactivating the CC field on the [redirected] archival email. In essence, all the people I cc’ed on anything got spammed with a duplicate every time I sent a message…

And now, the continuation…

So Google finally enabled IMAP for my accounts on thallos.org, which allowed me to test a new strategy for archiving sent mail. Again, the goal is to have a copy archived straight from Entourage, whenever I send a new email, to my mail repository. With proper IMAP access, however, this became much easier.

First, configure Entourage for IMAP access to Gmail / Google Apps. This is surprisingly non-trivial, since Entourage is not a supported client as of the time of this post. Rather strange, considering that Entourage must be at least second or third place in terms of install base for Mac email clients. Follow the generic instructions for IMAP setup, and you should do okay. If you’re on Google Apps, the username is your_name@your_domain.tld, as per this configuration instruction.

You should have an IMAP structure for your Gmail boxes once this is complete. Simply set a rule in Rules -> Outgoing, for all messages, to copy the message to the Gmail/Sent Mail folder. In fact, this is the exact same approach you would use if you were backing up to any IMAP-enabled mail server.

Unfortunately, it broke for me on a couple of messages. Gmail servers reported inconsistent failure messages, such as “Connection to the server failed or was dropped” and “The message could not be copied.” Some message headers also seemed to be mangled in transit, with the sender’s name dropped and so forth. The messages themselves were innocuous, text-only messages with no attachments, HTML, or any other random nonsense, so I find it very curious that the copy would fail on them. Will have to look into it a bit more.

UPDATED Nov 22, 2007
See the exciting (yet depressing) episode 3 of my adventures in email archival.

getting my act together: papers and projects

Posted on October 18, 2007 (updated January 13, 2009) by yiming

Honestly, this was one of my more productive years. While I still haven’t gotten any further on either of my private projects (and in fact abandoned one of them, having been beaten to the punch by a startup this summer. You ambitious entrepreneurs should leave me some hobbies), I did manage to get some reasonably meaningful things out.

Yiming Liu, David A. Shamma, Peter Shafton, Jeannie Yang. Zync: the design of synchronized video sharing. (DUX 2007), November 2007, Chicago, USA.
David A. Shamma, Ryan Shaw, Peter Shafton, Yiming Liu. Watch what I watch: Using Community Activity To Understand Content. In the workshop Multimedia Information Retrieval (MIR): Special Session on Semantic Indexing of Consumer and Web Videos at the Fifteenth ACM International Conference on Multimedia, (ACM MM 07), September 2007, Augsburg, Germany.

Worked on some pretty awesome stuff at Y!RB. Programming, data analysis, and social media research – it doesn’t get any better than that. Now I just need to figure out what I want to be doing after this year, which may in fact mark the end of my affiliation with graduate school and possibly academia in general. Scary thought. Where to go from here…

Getting custom HTTP variables out of PHP

Posted on October 6, 2007 (updated January 13, 2009) by yiming

PHP 5.0 stores HTTP request headers in the $_SERVER variable as key-value pairs. It mangles their field names, however, by:

prepending “HTTP_” to the key
replacing “-” with “_” in the key
uppercasing all letters

Say that your custom HTTP client sends X-Hello: World as a header. To retrieve the value (i.e. “World”) from PHP, the correct key to use is $_SERVER["HTTP_X_HELLO"].

This does fit with the existing access pattern (User-Agent: is retrieved by $_SERVER['HTTP_USER_AGENT']). But it was not well documented in the corresponding page for reserved variables (as of today, October 7, 2007). Took a bit of trial and error for me to figure this out.

I’m sure that amongst the insanely numerous and ill-organized set of functions that PHP provides, there is one to do this exact task without reverse-engineering its key-mangling algorithm. But this way works too.
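For reference, the three mangling steps above can be sketched in a few lines. This is written in Python rather than PHP, purely to model the naming rule. The function name php_server_key is invented for this example and is not part of PHP.

```python
def php_server_key(header_name):
    """Return the $_SERVER key under which the given request header should appear.

    Models the rule described above: uppercase everything,
    replace "-" with "_", and prepend "HTTP_".
    """
    return "HTTP_" + header_name.upper().replace("-", "_")


if __name__ == "__main__":
    for name in ("X-Hello", "User-Agent"):
        print(name, "->", php_server_key(name))
```

One caveat I am aware of: CGI-style servers expose Content-Type and Content-Length without the HTTP_ prefix, so this sketch does not cover those two headers.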

Entourage: thwarting archival strategies since 2004

Posted on September 18, 2007 (updated October 11, 2010) by yiming

So I use Microsoft Entourage as my main email client, and had been wanting for some time to get my messages exported off my local drive. As much as I trust my laptop and my backups, one good earthquake and all of that would be futile.

Getting my message archives preserved (with all metadata intact, like Sent and Received dates, etc) was the easy part. Grabbing all future messages was the hard one. Of course, Microsoft, in its infinite wisdom, didn’t include an auto-bcc for Entourage.

I had this setup going on to automatically redirect most messages I send to a repository for later search and retrieval. I had a process set up where, except for select messages that I mark as confidential, the above rule gets triggered.

A month later, by pure chance, I realized that Entourage wasn’t quite deactivating the CC field on the redirect for archival. There is a bug that resends the message to all CC’ed addresses on redirect. For example, if I were sending to recipient a, cc’ed to recipient b, and redirecting to an archive address:

1. The first copy goes out to a and b.
2. Then, the redirected copy is sent to archive and b, as b appears on the CC list.
3. End result: a receives 1 copy, b receives 2 copies, and archive receives 1 copy.

In essence, all the people I cc’ed on anything got spammed with a duplicate every time I sent a redirected copy via Entourage’s Outgoing rule. This is stupid, and Microsoft’s website doesn’t warn you about this. Try it for yourself if you don’t believe me.
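To make the arithmetic concrete, here is a small Python sketch that counts copies per address under the behavior described above. It is only a model of what I observed, not Entourage’s actual logic, and the function name deliveries is made up for the example.

```python
from collections import Counter


def deliveries(to, cc, archive):
    """Count how many copies each address receives when a sent message is redirected."""
    counts = Counter()
    # Step 1: the original message goes to the To and CC recipients.
    for addr in to + cc:
        counts[addr] += 1
    # Step 2: the redirected copy goes to the archive address,
    # and again to everyone still listed on the CC line.
    counts[archive] += 1
    for addr in cc:
        counts[addr] += 1
    return counts


if __name__ == "__main__":
    print(deliveries(["a"], ["b"], "archive"))
```

With one To recipient and one CC recipient, only the CC’ed address ends up with a duplicate, which matches the end result listed above.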

Had I been more diligent at searching the web or even just testing out this archival strategy, this wouldn’t have happened. Plus, I would have noticed one fellow complaining that all contacts on the CC list, for every email, received a copy of his archived messages. Ouch. I’m glad I didn’t try redirecting all of my sent box (there is another strategy, which I will outline sometime, that is far easier, but it can’t do real-time auto-bcc).

To all the people whom I inadvertently spammed, I’m awfully sorry. This won’t happen again.

Windows IE 6 ignores text/plain mimetype

Posted on August 9, 2007 (updated April 27, 2011) by yiming

A fairly border-case scenario that probably rarely comes up, but appears to be another gotcha. So apparently IE 6 for Windows, on occasion, decides it knows better than the web server what format a file is. Instead of using the mimetype supplied by the web server, as all good browsers tend to do, IE performs some heuristics on the file and overrides the mimetype with its own guess. The type text/plain is one such stupid circumstance.

Annoyingly, IE will insist on downloading plaintext files in some cases, instead of rendering them in the browser. This usually occurs if a script is attempting to generate a “text/plain” document on the fly, but can also happen under other circumstances if IE’s hard-coded heuristics come up with a different result than the server-proclaimed mimetype.

A client-side workaround for text/plain is possible. You’d need to edit the Windows Registry (oh joy). In HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings, add the DWORD value IsTextPlainHonored and set it to 0x1. This will make IE behave correctly for text/plain mimetypes. This solution comes per the MS Knowledgebase article, “Text/Plain” Content-Type Header Field Is Ignored. There are also some further explanations of how mimetypes are resolved in the MSDN article on mimetype detection in IE.
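If you have several test machines to change, generating a .reg file beats clicking through regedit on each one. Below is a minimal Python sketch that builds the text of such a file for the key and value named above. The function name reg_file_text is invented here, and I have not tested the resulting file against every Windows or IE version, so treat it as a starting point.

```python
KEY_PATH = r"HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings"


def reg_file_text(key_path=KEY_PATH, name="IsTextPlainHonored", value=1):
    """Build the text of a .reg file that sets one DWORD value under key_path."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key_path + "]",
        # DWORD values are written as eight hexadecimal digits.
        '"%s"=dword:%08x' % (name, value),
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(reg_file_text())
```

Save the output to a file with a .reg extension and import it on each machine, after checking the contents by eye.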

Unfortunately, this is not a solution if this behavior comes up in a web-based tool for external use – as every client machine registry will have to be thus modified. This change may also carry security implications (actually, I’m completely guessing here, because I don’t quite see why the IE team decided to “not honor” mimetypes for text/plain…).

The context:

A PHP script in a project I maintain pulls a text file from a remote location, and then prints it to the browser as Content-type: text/plain. A hack to be sure, but simple enough to get the job done. This works out fine in Firefox, etc, but not in Windows IE. IE insists that this is a PHP script file that must be downloaded. Of course, once downloaded, you can fire up Notepad and see that it’s bloody plaintext. Firefox et al will render it in browser as expected.

In this case, the script was only used for internal testing, so I switched all the test machines to honor plaintext mimetypes. A longer term workaround would probably involve porting the output to XML instead.

Fixing FilePlanet’s stupidity on the Mac

Posted on July 22, 2007 (updated November 15, 2010) by yiming

Lately I haven’t been able to download files from fileplanet.com via my Mac. It’s inane, because downloads apparently require an ActiveX control. I’m appalled at the utter stupidity of excluding all non-Windows platform users from your download service, just to set up a download queue. Can’t you put up a Flash control instead? Just as shiny and unusable, but actually compatible with other operating systems.

It gets better. The good news: the designer had the foresight to set up a fallback mechanism that uses a plain old HTML queue. The bad news: it simply presents you with a 403 Forbidden when clicked.

As it turns out, I found a post that contained a possible solution. Actually, that post is a bit unnecessarily complicated. Apparently, they’re blocking all browsers without a Windows user agent. On the fallback solution that was supposed to work for all platforms. Argh.

Until FilePlanet addresses this issue (which could be tomorrow, or never), the simplest solution (that worked for me) was to switch my user agent in Firefox (via the aforementioned and highly recommended User Agent Switcher) to a Windows browser (try the default Opera XP user agent). Then, click on the fallback queuing link, and it should kick you into the download page.
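The same trick works outside the browser. Here is a minimal Python sketch of building a request that claims to come from a Windows browser. The user agent string is just an assumed example of a Windows one, the function name windows_request is made up, and I have not tried this against FilePlanet’s queue itself.

```python
import urllib.request

# Assumed example of a Windows browser user agent; any Windows one should do.
WINDOWS_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.5) "
    "Gecko/20070713 Firefox/2.0.0.5"
)


def windows_request(url, user_agent=WINDOWS_UA):
    """Build (but do not send) a request whose User-Agent header claims a Windows browser."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Placeholder URL; substitute the fallback queue link from the download page.
    req = windows_request("http://example.com/fallback-queue")
    print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would actually send it. Whether the site also checks cookies or referrers is something I haven’t verified.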

Note that the User Agent option in Safari’s Debug menu (at least in v2.0.x) will not work straight up. Believe me, as a primarily Safari user, I’ve tried hard to make this work. The trouble is that FilePlanet pops up a new window when a download is selected, and the Debug menu setting only applies to the active window. As the download window is a pop-up, you do not get a chance to intervene and change the user agent before FilePlanet denies you access. So for now, Firefox + User Agent Switcher is the solution. If you have a browser (or Safari, in the future) that allows the fake user agent setting to persist across windows spawned from the initial window, that browser would work too.

UPDATE 3/27/2009:
Feedback in the comments section reports that this is still a problem for many users. Appalling. It’s been 2 years.

Information Services and Design symposium, brief retrospect

Posted on March 8, 2007 (updated January 13, 2009) by yiming

Participated in the Information Services and Design symposium last week. Gave a short talk on intellectual property rights in context of economic development and the knowledge economy, as well as submitted a small paper that is now part of the UCB iSchool Report Series, 2007-011 (PDF).

In truth, I’m not terribly confident about the paper. The domain was too large to be effectively covered in either a 5000-word paper or a 15-minute talk. IPR in development has significant implications that fall more into international political economy. The problem is very multifaceted, and there are a number of IPE perspectives that can be used as a lens on this. I think my conclusions here derive from a realist position that national interest has trumped any pretenses to free market liberal economics in this case, but that would be a fairly gross simplification of the factors at work.

Eh, but it’s a tangible output that’s seen more publicity than anything else I’ve done lately.

CVS, Cygwin, and error code 0xc0000022

Posted on February 23, 2007 (updated February 2, 2009) by yiming

In short, if your project crashes at library load time after a round trip through CVS, you might want to check the NTFS execute permissions on the DLLs that the project depends on. Also, if your application mysteriously blows up with error code 0xc0000022, you’d do well to make sure that:

all DLLs that your program depends on are valid and locatable
all of those DLLs are executable by your user, i.e. none of them has a permissions problem

In one of my Windows projects, I wrote some code that relied on a number of DLLs. To save myself some time, I compiled these DLLs and checked the compiled binaries into the CVS repository.

On another machine, I checked out the project via the cvs utility, under Cygwin, to work on it. As a Unix-y kind of guy, I prefer the tools that I’m used to. Everything compiled fine, but at runtime the application crashed before it got to main(), with “The application failed to initialize properly (0xc0000022) …”. I did some dependency tracking to find out if I had lost a DLL somewhere, first via Dependency Walker, then via gflags, but nothing unusual turned up.

Then I noticed that replacing the checked-out libraries with fresh copies of the same DLLs fixed the issue. The puzzle was that, upon comparing md5 sums of the old and new libraries, they were exactly the same. There was no damage or corruption.

Turns out, of course, that Execute permissions were off on all those DLLs that I checked out. Apparently Cygwin’s cvs does not set execute bits on DLL files, and since you’re usually using ntsec settings with Cygwin, this causes a security/permissions problem on the Windows side. As a result, the project compiles just fine, fails at runtime, and gives you a completely obtuse error message that means very little unless you’ve done this sort of thing before. Cygwin and cvs’s role in this was also not a very obvious thing to deduce.
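If you’d rather not find and chmod each DLL by hand after a checkout, a small script can do it. The sketch below is Python, meant to be run from within Cygwin on the checked-out tree. It assumes that Cygwin maps the Unix execute bit onto the NTFS permission (which is what ntsec did for me), and the function name fix_dll_permissions is made up.

```python
import os
import stat


def fix_dll_permissions(root):
    """Add execute bits to every .dll under root that lacks the owner execute bit.

    Returns the sorted list of paths that were changed.
    """
    fixed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".dll"):
                continue
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if not mode & stat.S_IXUSR:
                # Equivalent to "chmod a+x" on this one file.
                os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
                fixed.append(path)
    return sorted(fixed)


if __name__ == "__main__":
    for changed in fix_dll_permissions("."):
        print("made executable:", changed)
```

Run it from the top of the working copy. It only touches files ending in .dll and leaves everything else alone.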

Two hours of my life, right there.
