Leeeeeeeroy Jenkins!
Loosely Typed in Ohio

Software Leopard Upgrade Pain

So, this morning, my shiny new copy of Mac OS X 10.5 Leopard arrived. And holy crap, is it awesome! When it’s up, that is.

Kids, do yourself a favor: if you’re installing Leopard, back up your data, do a clean install, and then copy back the stuff you want. This morning I attempted an Upgrade from Tiger to Leopard, and while it worked, there are a lot of really creepy issues. For example, it sometimes takes anywhere from 1 to 5 minutes between “I type in my login and press Enter” and “login window goes away.” Not even to a usable desktop, just for the login window to accept my login.

Also, Finder started boning itself hard. Like, I’d try to mount a network drive and it would just take a massive dive. No “Finder is not responding”, and no “Hey, this network operation is taking a while.” Just Magic Beachball of Doom forever. Once, the Apple menu refused to open, which means Finder’s borked. Instead of recognizing that Finder was in a bad state, the system identified the active application as the culprit. So I force-restarted Finder, and it wouldn’t restart until I went down the list and force-quit every running application.

So, yeah, I’m not too well pleased with the upgrade. I’m convinced the upgrade itself is the problem, though, so I went out and picked up a 250GB external drive, split it into two partitions, and am now making a bootable clone of my drive onto one of them. (Because I finally, somehow, got the machine to boot right. It took about seven tries. I don’t know which magic keys made it work.)

I’ll do a writeup of the awesome bits of Leopard once I do my clean install. I’m sort of afraid to make it do any real work until my backup finishes, because I don’t want to wedge it. Also, I think I came down with a virus. (One of the biological ones, not related to computers. Strange, eh?) For now, please enjoy the awesome BSOD icon that made it from the Leopard beta!

Software Designing tag-based search systems

Tagging that works

A large project we launched recently has me thinking a lot about search. Search on most commercial websites has historically been a kind of cruel joke on users, both because search is usually an afterthought, implemented as a rudimentary full-text database query, and because web search is hard.

In general, good web search requires either machine learning (see Google) or careful “results tuning” by site designers. The machine learning approach only works with huge budgets and huge user bases, so for MemberHealth, we used Lucene complemented by semantic analysis (woot!). We implemented that through a combination of page-specific “tags” and domain-specific keywords. For example, the site is for a major pharmaceutical insurance company, so we check search queries against a global list of prescription drugs and return “drug search application” results if the query matches.
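To make the “tags plus domain keywords” idea concrete, here is a minimal Python sketch. It is not the MemberHealth code (that used Lucene); the naive term-count scoring stands in for a real full-text engine, and DRUG_NAMES, TAG_BOOST, the /drug-search URL, and the Page type are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the global list of prescription drug names.
DRUG_NAMES = {"lipitor", "metformin", "zocor"}

# Arbitrary weight: an editor-assigned tag match outranks a stray body-text hit.
TAG_BOOST = 5.0


@dataclass
class Page:
    url: str
    body: str
    tags: set = field(default_factory=set)  # page-specific tags chosen by editors


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, terms):
    """Naive full-text score (term counts) plus a boost per matching tag."""
    body_terms = tokenize(page.body)
    text_score = sum(body_terms.count(t) for t in terms)
    tag_score = sum(TAG_BOOST for t in terms if t in page.tags)
    return text_score + tag_score


def search(pages, query):
    terms = tokenize(query)
    results = []
    # Domain keyword check: a drug-name match puts the drug search application first.
    if any(t in DRUG_NAMES for t in terms):
        results.append("/drug-search")
    scored = [(score(p, terms), p.url) for p in pages]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    results.extend(url for _, url in scored)
    return results


pages = [
    Page("/claims", "How to file a claim for a prescription", {"claims", "reimbursement"}),
    Page("/copay", "Copay amounts vary by plan and prescription tier", {"copay", "cost"}),
]
print(search(pages, "lipitor copay"))  # ['/drug-search', '/copay']
print(search(pages, "reimbursement"))  # ['/claims']
```

The second query shows why the tags matter: “reimbursement” never appears in the body text, so a plain full-text query would miss that page entirely.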

Both efforts add up to a kind of uber tagging system, where results are served according to carefully designed matches, courtesy of painstaking content editing and a $100K text file.

Tagging that doesn’t

So, I’ve been using Delicious for some time now, and despite its annoying URI, I like how quick it is to post bookmarks. The problem with Delicious, though, is that it doesn’t work. Well, it works for sharing links with friends (see popular), but it doesn’t work for classifying documents for later retrieval. In fact, relying on a personal bookmark cache is sort of like using a Groundhog Day version of Excite that searches only <meta name="keywords" content="Web20, ToRead, Apple, NataliePortman">.

It could be that I’m an idiot, but I think the problem more likely stems from the fact that classifying documents is subjective, emotional, and too much work to do carefully when all I’m trying to do is read Slashdot. Moreover, making links useful takes more than classification: I need to anticipate how I will think about them in the future (so that I can retrieve them). Two weeks ago I was thinking about The Office as “hilarious,” “can’t miss,” and “smart”; now I’m thinking about it as “boring,” “go out instead,” and “insipid.”

In my Delicious account, I have 374 links and 218 distinct tags, for roughly 1.7 links per tag, but the median is much lower, closer to 1 link per tag. Of course, a link can have more than one tag, but the tags-per-link ratio is also about 1:1. So what we have is essentially a simple directory-based filesystem. But with one file per directory. And with search that only operates on directory names. Kind of like Windows Explorer.
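These ratios are easy to reproduce. Here is a small Python sketch, assuming the bookmarks have been exported into a hypothetical dict mapping each URL to its list of tags (this is not a Delicious API). Both ratios share the same numerator, the total number of (link, tag) pairs, so when tags per link is about 1, links per tag comes out to roughly 374 / 218 ≈ 1.7.

```python
from collections import Counter
from statistics import mean, median


def tag_stats(bookmarks):
    """bookmarks maps each URL to the list of tags applied to it."""
    # How many links carry each distinct tag (a tag repeated on one link counts once).
    links_per_tag = Counter(tag for tags in bookmarks.values() for tag in set(tags))
    # How many distinct tags each link carries.
    tags_per_link = [len(set(tags)) for tags in bookmarks.values()]
    return {
        "links": len(bookmarks),
        "distinct_tags": len(links_per_tag),
        "mean_links_per_tag": mean(links_per_tag.values()),
        "median_links_per_tag": median(links_per_tag.values()),
        "mean_tags_per_link": mean(tags_per_link),
    }


# Toy data: one tag per link, mostly unique tags.
bookmarks = {
    "http://example.com/a": ["Apple"],
    "http://example.com/b": ["Apple"],
    "http://example.com/c": ["ToRead"],
    "http://example.com/d": ["Web20"],
}
print(tag_stats(bookmarks))
```

With mostly unique tags, the median sits at 1 even though the mean is above 1, which is the same skew described above.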

Wrapping up

Users can’t self-classify documents; we’ve known that forever. Classification is particularly difficult when tag-based systems resemble directory-based systems, where each document has one and only one tag. Designing systems that don’t rely on smarter semantic search, trained by site designers, is nearly always a mistake.

Here are some things to reconsider when you design search for your web application:

  • Do I have only one search box on the site? Multiple search boxes confuse users who don’t understand your implementation model. Make sure search looks in all data sources on the site, not just your bodyText database column: you need to search vertical applications and .pdf documents from one single text input. Remember that users don’t think about your application architecture; they see one search box and will use it to find everything on your site.
  • Does the language my users speak match the way my site is written? Consider adding page tags to full-text search to make sure visitors find the right pages. In particular, think about translating jargon and acronyms into layman’s terms.
  • Are you doing the work of classifying search results, or are your users? Asking users to bookmark, tag, save, navigate to, or otherwise go out of their way to find content is just wrong. Even if you aren’t actively classifying content, the ordering of your results is itself a way to help your users find what they need.
  • Lastly, don’t forget about Google. Even if your site isn’t SEO’d, make sure you know what the page summaries look like in Google. In particular, look at page titles and summaries for common searches like: foo site:mydomain.com
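The first two checklist items (one search box over every data source, and layman’s terms mapped onto site jargon) can be sketched together in Python. Everything here is hypothetical: the SYNONYMS glossary, the naive substring-matching make_source helper, and the source names stand in for whatever real indexes (a page index, a PDF index, a vertical application’s own search) a site actually has.

```python
# Hypothetical glossary mapping layman's terms to the site's jargon and acronyms.
SYNONYMS = {
    "water pill": ["diuretic"],
    "cholesterol": ["statin", "ldl"],
}


def expand_query(query):
    """Return the query plus any jargon equivalents for phrases it contains."""
    q = query.lower()
    terms = [q]
    for phrase, alternatives in SYNONYMS.items():
        if phrase in q:
            terms.extend(alternatives)
    return terms


def make_source(docs):
    """Wrap a {url: text} dict as a naive search function returning (url, score) pairs."""
    def search_fn(term):
        return [
            (url, float(text.lower().count(term)))
            for url, text in docs.items()
            if term in text.lower()
        ]
    return search_fn


def unified_search(query, sources):
    """Fan one query out to every source and merge the hits into one ranked list."""
    best = {}  # url -> (score, source name)
    for term in expand_query(query):
        for name, search_fn in sources.items():
            for url, score in search_fn(term):
                # Keep the highest score a URL earns from any term or source.
                if url not in best or score > best[url][0]:
                    best[url] = (score, name)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(url, name) for url, (_, name) in ranked]


sources = {
    "pages": make_source({"/benefits": "Your plan covers statin therapy."}),
    "pdfs": make_source({"/formulary.pdf": "Statin tiers: statin brands, statin generics, LDL targets."}),
}
print(unified_search("cholesterol", sources))
# [('/formulary.pdf', 'pdfs'), ('/benefits', 'pages')]
```

Neither document contains the word “cholesterol”, yet both are found through the glossary. Taking the maximum score per URL is only one simple merge policy; scores from different indexes are rarely directly comparable, so a real site would need to normalize or tune them.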

phpSprockets Intro to phpSprockets

On one of my recent days off, I decided to figure out how these screencast things work. To that end, I threw together a quick Introduction to phpSprockets. It’s obviously my first screencast, so be gentle. You’re watching about Take 45, if it matters.

If you want, just download the video.

Like what you see? Head over to our Google Code page to download the latest release, read more tutorials, and provide feedback!

Also, I’m certain our resident graphic artists wouldn’t mind throwing together some awesome “Made with phpSprockets” buttons, so expect to see those and an awesome download link within the next few days.

Software Real Development!!

I’m ashamed to admit that sometimes, I hit the programming reddit and comment on stories. I try to keep my comments relevant, on-topic, and related to the discussion, in that order.

Someone (who is not important, nor is the discussion) called me out and told me I wasn’t doing “real” development because I developed on a laptop, and that any “real” development must take place on a desktop machine with the power to handle thousands of source files, the debugger to interactively prod the application, and so on.

I’d like to take this moment to remind everyone that our biggest client won the 2007 Inc. 500, so yeah, maybe I’m not developing simulations of the human brain, maybe I’m not developing “enterprise-level code” (and thank all that is good for that blessing), but my clients are reaping the benefits of my work, and that’s all that matters.

That, and that I get to swim in a vault of money Scrooge McDuck-style by the time I’m 30.

I’d have thought that the whole “real development” thing would have popped with the first bubble back in the ’90s. Seriously, we develop software! Some people develop software that requires the state of the art, and some people develop equally awesome software that does not.

My software should not be judged by how many source files or lines of code I have. It should not be judged by how much it costs or how much I got paid to make it. It may even matter how optimized or clean my code is. My software should be judged on the value it brings to people. All other things come after that.

Networking/Systems Setting up Bandwidth.com IP trunks on Trixbox

About a month and a half ago I launched a fully functional IP phone system here at Innova. It’s Trixbox CE, an open-source package (though corporate-owned and backed; see Trixbox Pro and Fonality) built on Asterisk. Basically, it’s a preloaded version of CentOS with Asterisk and a few GUIs that let you configure and manage the system without much command-line work. I know, I know, you’re already saying “A GUI? You must be a Windows user!” I am, but that’s beside the point. Truth be told, I love the command line, but have you ever set up something as big as a PBX from the command line? If you answered yes, click here, because you aren’t going to enjoy much more of this.

