Archive for August, 2009


Coding for the unexpected

You could write a piece of code, run it a million times, wait ten years, run it again and get exactly the same results.  At least that’s what I used to believe.

One of the things I love about computers is that they are boringly consistent.  Given the same input, a computer will return the same output EVERY SINGLE TIME.  Likewise, code doesn’t change.  I could save a file on my computer and if it weren’t for hardware failure it would remain the same byte-for-byte until the end of time.  Code doesn’t change. It’s just a bunch of mathematical statements bound together by rules of logic that are burned into a tiny computer chip.  Or in geek terminology, code is immutable.

In theory this sounds great.  The problem is it doesn’t mesh with the everyday reality of my life.  My code breaks all the time without me changing a thing.

I remember getting an email from someone complaining that they couldn’t log in to an application I had written.  The strange thing about this was that I hadn’t touched that code in years.  The servers were being managed and had been pretty reliable.  How could my application break if no one had broken it?  I SSH’d into the server and quickly realized my server logs had gotten so large that there wasn’t any room left on the hard drive for new session files to be created.  I deleted the log files and changed my server settings to stop it from happening again.

Since then I’ve had countless experiences where code broke unexpectedly.  The culprits vary.  Sometimes it’s hardware failure.  Sometimes an unchecked log file.  More often, it’s the result of user input that I didn’t anticipate or an integration with an external service that fails.

You would think that we would be getting better at anticipating and preventing these sorts of issues.  But from what I can tell, they are happening MORE OFTEN these days, not less.  On one hand, we’re getting smarter.  We’ve learned from our mistakes: we truncate server logs and bake in automatic fail-over for hardware issues.  But there’s a bigger trend happening on the web right now that is throwing some huge variables into the equation.  Very few applications stand alone anymore.  Every application now has a million integrations with Twitter, Facebook, Flickr, YouTube… you name it!  And guess what?  Every one of those services throws another kink into the chain, giving us more uncertainty and more points of failure to try and anticipate.

The integrated web is here to stay.  As developers, we need to figure out how we’re going to deal with this new layer of uncertainty in our applications.


My wish list for Facebook

I’m a big fan of Facebook.  It’s one of the few services I use every day and I find it to be invaluable for keeping up with people. One result of using any service a lot is that it becomes really easy to point out its flaws and missing features. Here are a few things I’d love to see implemented in Facebook:

I wish Facebook had nested comments on newsfeed items. I often find myself annoyed by the flood of notifications I receive after other people interact with an item I commented on or liked. As a result, I find myself commenting less when I think it’s likely I’m going to be spammed.  Nested comments would solve this problem as I could receive notifications only for direct replies to what I said.

I wish I could untag myself in pictures. This feature doesn’t seem to be working consistently.  I click “remove tag” and Facebook redirects me to my profile page without untagging me.

I wish the vertical spacing on the top header was even in the new design. I’m OCD. Small details like this tend to drive me crazy.

I wish the messaging system worked better. The two main things I want are reply-by-email and for my inbox count in Facebook to be automatically decremented after I read a notification email. This could easily be implemented by adding a simple beacon to the notification emails.

I wish chat worked with XMPP as promised over a year ago.

I really wish Facebook would let me use the API to map a given list of email addresses to Facebook IDs.

That’s my list.  What did I miss?


Restarting mysql master-master replication

These are the steps for restarting master-master replication where replication has stopped due to a slave IO issue.  You can tell if this solution applies to you by typing “SHOW SLAVE STATUS\G” at the mysql prompt.  If it reports “Slave_IO_Running: No” on one or both databases, try this:

on db1:

mysql
STOP SLAVE;
# keep this mysql session open: the read lock is released as soon as the client disconnects
FLUSH TABLES WITH READ LOCK;

on db2:

mysql
STOP SLAVE;
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;

on db1:

# get the values for log_file & log_pos from db2's master status in the previous step
CHANGE MASTER TO MASTER_LOG_FILE ='mysql-bin.000...',MASTER_LOG_POS=123456;
SHOW MASTER STATUS;

on db2:

# get the values for log_file & log_pos from db1's master status in the previous step
CHANGE MASTER TO MASTER_LOG_FILE ='mysql-bin.000...',MASTER_LOG_POS=123456;
START SLAVE;
UNLOCK TABLES;

on db1:

START SLAVE;
UNLOCK TABLES;
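
To check whether these steps apply (or whether they worked), the Slave_IO_Running field can be pulled out of the status output.  Here is a rough sketch in JavaScript, assuming you have captured the “\G” output as a string; the function names and the sample text are made up for illustration.

```javascript
// Parse the vertical ("\G") output of SHOW SLAVE STATUS into a plain object.
// Assumes each field sits on its own line in the form "  Field_Name: value".
function parseSlaveStatus(output) {
  const status = {};
  for (const line of output.split(/\r?\n/)) {
    const match = line.match(/^\s*(\w+):\s?(.*)$/);
    if (match) status[match[1]] = match[2].trim();
  }
  return status;
}

// True when the IO thread is reported as stopped, which is the case
// the restart steps are meant for.
function ioThreadStopped(output) {
  return parseSlaveStatus(output).Slave_IO_Running === "No";
}

// Hypothetical sample output, trimmed to three fields for illustration.
const sample = [
  "*************************** 1. row ***************************",
  "             Slave_IO_Running: No",
  "            Slave_SQL_Running: Yes",
  "        Seconds_Behind_Master: NULL",
].join("\n");

console.log(ioThreadStopped(sample)); // true
```

Run it against the output from both db1 and db2, since either side can be the one that stopped.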

I’m posting this mainly for personal reference.  Keep in mind I’m a sysadmin by necessity, not preference.  Attempt this at your own risk.  Don’t expect to see many posts like this from me.


Choosing your audience

For every blog post I write, I pick an audience.

For me, it boils down to two options: the first is to write for my RSS subscribers, my loyal readers, my friends, my community.  The second option is to write for strangers, random people searching Google, the people who stumble upon my blog every day because of something I wrote that matches what they were seeking.

TechCrunch writes for the community. Mashable writes for Google.

I realize that’s a pretty broad generalization, but just look at the data.  The three featured posts on TechCrunch right now are No User Updates?, Location and Comparisons.  They may be catchy titles, but probably not anything that would rank well in Google.  Mashable, in comparison, has an entire How-to series permalinked from their top navigation.  This series contains posts like HOW TO: Retweet on Twitter and HOW TO: Download YouTube Videos to Your Desktop.  Guess what?  Almost every post in Mashable’s How-to section is the top result for its title phrase in Google.  Mashable is obviously milking the Google traffic for all it’s worth.  Meanwhile, TechCrunch doesn’t seem to even consider the SEO implications of their posts.

I think it’s obvious that both strategies can work.  What I find interesting is how hard it is to write for both audiences at the same time.

If you decide to start writing for Google, the first thing you should do is turn your title into something that looks more like a search query.  If you’re really thorough, you will Google the title you want to use and check the PageRank of the existing results to make sure you can dominate that phrase.  It’s amazing how well this simple strategy can work.  Sure, your newly devised titles aren’t as engaging as they used to be, but you’ll start seeing far more traffic from Google.  Best of all, the traffic shows up regardless of whether you create new content or not.

The danger, of course, is that you’ll start alienating your core audience.  This happens when you give in to the temptation to write broader and broader content.  There are far more beginners than intermediates, so the fastest way to get more traffic is to dumb things down.  After all, the top “how to” result on Google isn’t “how to implement pubsubhubbub”; it’s “how to tie a tie”.  Traffic is addictive.  If you’re not careful, you’ll soon find yourself writing at the intermediate level instead of expert.  You’ll slowly lose your original audience, which is now getting content it didn’t sign up for, but don’t worry – you’ll more than make up for the pageviews you lose.  Heck, the new audience is much better at clicking on ads anyway.  Everyone has to decide for themselves whether it is worth the trade.

When I write on this blog, I’m usually torn about which audience to pick.  50% of my traffic comes from Google, and my posts are pretty evenly split between the two audiences.  A lot of times when I solve a tricky problem with some code, I want to share my solution to save someone out there from wasting hours of their life like I did.  It’s my way of giving back to the countless strangers that have helped me out by documenting their solutions along the way.  Posts written for Google send along a good bit of traffic, but I also enjoy writing posts that will never be found via Google.  My post “The protocols powering the real-time web” doesn’t get much traffic from Google, but the discussion that post generated was amazing.  My goal for this blog is simply to share what I am learning, and that holds true regardless of who reads it.

I’d love to hear your thoughts. Is one audience inherently better than the other?  Do you think it’s possible to maintain both?


Splitting names

You can find the latest version of this code on Github. There are libraries for both PHP and JavaScript.

The quest

I’m on an ongoing search to find the best algorithm for splitting a full name into a first name and a last name. I’m sure this sounds like a ridiculously trivial quest — just explode the string on a space, right?

The challenge

But how do you tell the difference between people with double first names like Jo Ann Smith and people with double last names like Jo Von Trapp? What would you do if I gave you a double first name AND a double last name at the same time?

Did you remember that you might need to parse out prefixes (Mr, Mrs, etc) and suffixes (II, Jr, PhD, etc)?

How do you turn Paul T. S. Williams into Paul Williams while intelligently deducing that T. James Adams probably wants to go by James Adams, but T. Adams should probably stay as T. Adams?

And how do you straighten out the capitalization? I MIGHT WRITE IN ALL CAPS or all lowercase. Most names have the first letter capitalized and everything else in lowercase, but of course there are exceptions. J.P. likes to have both initials capitalized and Mr. McDonald always gets fussy when you forget to capitalize the D. Oh, and I hope you’re prepared for other anomalies like people with dashes in their name.

As you have probably realized by now, splitting a full name into its proper parts is a little more complicated than it appears on the surface.

I wrote the first version of my name-parsing algorithm two years ago and I’ve been gradually refining it ever since. It’s not perfect, but it’s improved a lot over time. I’m posting this code along with a demo in hopes that it will spur contributions to improve its accuracy even more. Throw the hardest names you know at it and let me know how it performs. I know I’m missing words for the various dictionaries of prefixes, suffixes, and compound name identifiers. Please let me know what I missed.

The algorithm

We start by splitting the full name into separate words. We then do a dictionary lookup on the first and last words to see if they are a common prefix or suffix. Next, we take the middle portion of the string (everything minus the prefix & suffix) and look at everything except the last word of that string. We then loop through each of those words, concatenating them together to make up the first name. While we’re doing that, we watch for any indication of a compound last name. It turns out that almost every compound last name starts with 1 of 15 prefixes (Von, Van, Vere, etc). If we see one of those prefixes, we break out of the first name loop and move on to concatenating the last name. We handle the capitalization issue by checking for camel-case before uppercasing the first letter of each word and lowercasing everything else. I wrote special cases for periods and dashes. We also have a couple of other special cases, like ignoring words in parentheses altogether.
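
The core of that loop can be sketched in a few lines of JavaScript.  This is a simplified illustration rather than the library code on Github: it skips salutations, suffixes, initials and case fixing, and the names splitName and COMPOUND_PREFIXES are made up for the example.  The prefix list is the one used by is_compound_lname in the PHP code.

```javascript
// Prefixes that signal the start of a compound last name.
const COMPOUND_PREFIXES = new Set([
  "vere", "von", "van", "de", "del", "della", "di", "da",
  "pietro", "vanden", "du", "st.", "st", "la", "ter",
]);

function splitName(fullName) {
  // Split on whitespace and drop any words in parentheses.
  const words = fullName
    .trim()
    .split(/\s+/)
    .filter((w) => w && !w.startsWith("("));

  // Single-word names are assumed to be first names.
  if (words.length <= 1) {
    return { first: words[0] || "", last: "" };
  }

  // By default only the final word is the last name. A compound prefix
  // found after the first word moves the boundary earlier.
  let boundary = words.length - 1;
  for (let i = 1; i < words.length - 1; i++) {
    if (COMPOUND_PREFIXES.has(words[i].toLowerCase())) {
      boundary = i;
      break;
    }
  }

  return {
    first: words.slice(0, boundary).join(" "),
    last: words.slice(boundary).join(" "),
  };
}

console.log(splitName("Jo Ann Smith")); // { first: 'Jo Ann', last: 'Smith' }
console.log(splitName("Jo Von Trapp")); // { first: 'Jo', last: 'Von Trapp' }
console.log(splitName("Von Fabella"));  // { first: 'Von', last: 'Fabella' }
```

Starting the prefix check at the second word is what lets “Von Fabella” keep Von as a first name.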

The code

<?php

// split full names into the following parts:
// - prefix / salutation  (Mr., Mrs., etc)
// - given name / first name
// - middle initials
// - surname / last name
// - suffix (II, Phd, Jr, etc)
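function split_full_name($full_name) {
    $full_name = trim($full_name);
    // start with empty parts so we never append to an undefined variable
    $name_parts = array();
    $fname = $initials = $lname = "";
    // split into words
    $unfiltered_name_parts = explode(" ",$full_name);
    // completely ignore empty words (from repeated spaces) and any words in parentheses
    foreach ($unfiltered_name_parts as $word) {
        if ($word !== "" && $word[0] != "(")
            $name_parts[] = $word;
    }
    $num_words = sizeof($name_parts);
    // nothing left to parse
    if ($num_words == 0)
        return array('salutation' => false, 'fname' => '', 'initials' => '', 'lname' => '', 'suffix' => false);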

    // is the first word a title? (Mr. Mrs, etc)
    $salutation = is_salutation($name_parts[0]);
    $suffix = is_suffix($name_parts[sizeof($name_parts)-1]);

    // set the range for the middle part of the name (trim prefixes & suffixes)
    $start = ($salutation) ? 1 : 0;
    $end = ($suffix) ? $num_words-1 : $num_words;

    // concat the first name
    for ($i=$start; $i < $end-1; $i++) {
        $word = $name_parts[$i];
        // move on to parsing the last name if we find an indicator of a compound last name (Von, Van, etc)
        // we use $i != $start to allow for rare cases where an indicator is actually the first name (like "Von Fabella")
        if (is_compound_lname($word) && $i != $start)
            break;
        // is it a middle initial or part of their first name?
        // if we start off with an initial, we'll call it the first name
        if (is_initial($word)) {
            // is the initial the first word?  
            if ($i == $start) {
                // if so, do a look-ahead to see if they go by their middle name
                // for ex: "R. Jason Smith" => "Jason Smith" & "R." is stored as an initial
                // but "R. J. Smith" => "R. Smith" and "J." is stored as an initial
                if (is_initial($name_parts[$i+1]))
                    $fname .= " ".strtoupper($word);
                else
                    $initials .= " ".strtoupper($word);
            // otherwise, just go ahead and save the initial
            } else {
                $initials .= " ".strtoupper($word);
            }
        } else {
            $fname .= " ".fix_case($word);
        }  
    }

    // check that we have more than 1 word in our string
    if ($end-$start > 1) {
        // concat the last name
        for ($i; $i < $end; $i++) {
            $lname .= " ".fix_case($name_parts[$i]);
        }
    } else {
        // otherwise, single word strings are assumed to be first names
        $fname = fix_case($name_parts[$i]);
    }

    // return the various parts in an array
    $name['salutation'] = $salutation;
    $name['fname'] = trim($fname);
    $name['initials'] = trim($initials);
    $name['lname'] = trim($lname);
    $name['suffix'] = $suffix;
    return $name;
}

// detect and format standard salutations
// I'm only considering english honorifics for now & not words like
function is_salutation($word) {
    // ignore periods
    $word = str_replace('.','',strtolower($word));
    // returns normalized values
    if ($word == "mr" || $word == "master" || $word == "mister")
        return "Mr.";
    else if ($word == "mrs")
        return "Mrs.";
    else if ($word == "miss" || $word == "ms")
        return "Ms.";
    else if ($word == "dr")
        return "Dr.";
    else if ($word == "rev")
        return "Rev.";
    else if ($word == "fr")
        return "Fr.";
    else
        return false;
}

//  detect and format common suffixes
function is_suffix($word) {
    // ignore periods
    $word = str_replace('.','',$word);
    // these are some common suffixes - what am I missing?
    $suffix_array = array('I','II','III','IV','V','Senior','Junior','Jr','Sr','PhD','APR','RPh','PE','MD','MA','DMD','CME');
    foreach ($suffix_array as $suffix) {
        if (strtolower($suffix) == strtolower($word))
            return $suffix;
    }
    return false;
}

// detect compound last names like "Von Fange"
function is_compound_lname($word) {
    $word = strtolower($word);
    // these are some common prefixes that identify a compound last name - what am I missing?
    $words = array('vere','von','van','de','del','della','di','da','pietro','vanden','du','st.','st','la','ter');
    // use in_array so a match at index 0 ('vere') still counts as true
    return in_array($word,$words);
}

// single letter, possibly followed by a period
function is_initial($word) {
    return ((strlen($word) == 1) || (strlen($word) == 2 && $word[1] == "."));
}

// detect mixed case words like "McDonald"
// returns false if the string is all one case
function is_camel_case($word) {
    if (preg_match("|[A-Z]+|s", $word) && preg_match("|[a-z]+|s", $word))
        return true;
    return false;
}

// ucfirst words split by dashes or periods
// ucfirst all upper/lower strings, but leave camelcase words alone
function fix_case($word) {
    // uppercase words split by dashes, like "Kimura-Fay"
    $word = safe_ucfirst("-",$word);
    // uppercase words split by periods, like "J.P."
    $word = safe_ucfirst(".",$word);
    return $word;
}

// helper function for fix_case
function safe_ucfirst($separator, $word) {
    // uppercase words split by the separator (ex. dashes or periods)
    $parts = explode($separator,$word);
    $words = array();
    foreach ($parts as $part) {
        $words[] = (is_camel_case($part)) ? $part : ucfirst(strtolower($part));
    }
    return implode($separator,$words);
}

?>
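
For comparison, the capitalization helpers port naturally to JavaScript.  This is a rough sketch that mirrors is_camel_case, safe_ucfirst and fix_case from the PHP above; the camelCase function names are made up for the sketch and are not taken from the Github library.

```javascript
// True for mixed-case words like "McDonald"; false when all one case.
function isCamelCase(word) {
  return /[A-Z]/.test(word) && /[a-z]/.test(word);
}

// Capitalize each piece between separators, leaving mixed-case pieces alone.
function safeUcfirst(separator, word) {
  return word
    .split(separator)
    .map((part) =>
      isCamelCase(part)
        ? part
        : part.charAt(0).toUpperCase() + part.slice(1).toLowerCase()
    )
    .join(separator);
}

// Handle dashes first ("Kimura-Fay"), then periods ("J.P.").
function fixCase(word) {
  return safeUcfirst(".", safeUcfirst("-", word));
}

console.log(fixCase("SMITH"));      // Smith
console.log(fixCase("McDonald"));   // McDonald
console.log(fixCase("j.p."));       // J.P.
console.log(fixCase("kimura-fay")); // Kimura-Fay
```

Note that an all-lowercase “mcdonald” comes out as “Mcdonald”; the camel-case check only preserves capitalization the user already typed.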

Stop Flash from covering HTML content

Browse through a few developer forums and you will find lots of people pulling their hair out over Flash content covering up their dropdown menus or modal windows.  The problem is especially bad in IE (surprise, surprise). The fix is actually quite simple as long as you can edit the embed code for your Flash. The trick is to change the wmode parameter to either “transparent” or “opaque”.  You should do this for both the <embed> and <object> tags like this:

<object width="200" height="300" data="example.swf" type="application/x-shockwave-flash">
    <param name="quality" value="high" />
    <param name="wmode" value="transparent" />
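    <param name="src" value="example.swf" />
    <embed src="example.swf" width="200" height="300" quality="high" wmode="transparent" type="application/x-shockwave-flash" />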
</object>

The problem is you don’t always have the option of changing the embed code.  What if you’re developing a JavaScript widget and don’t have any control over how the Flash is embedded?  The solution is more complicated than you would expect.  It turns out there are a slew of IE bugs around the <object> tag that escalate what should be a simple task into a real headache.  Basically you need to replace each <embed> and <object> tag with a cloned version that has had its wmode parameter fixed. Here’s how you do it:

<script type="text/javascript">

function fix_flash() {
    // loop through every embed tag on the site
    var embeds = document.getElementsByTagName('embed');
    for(var i=0; i<embeds.length; i++)  {
        var embed = embeds[i];
        var new_embed;
        // everything but Firefox & Konqueror
        if(embed.outerHTML) {
            var html = embed.outerHTML;
            // replace an existing wmode parameter
            if(html.match(/wmode\s*=\s*('|")[a-zA-Z]+('|")/i))
                new_embed = html.replace(/wmode\s*=\s*('|")window('|")/i,"wmode='transparent'");
            // add a new wmode parameter
            else
                new_embed = html.replace(/<embed\s/i,"<embed wmode='transparent' ");
            // replace the old embed object with the fixed version
            embed.insertAdjacentHTML('beforeBegin',new_embed);
            embed.parentNode.removeChild(embed);
        } else {
            // cloneNode is buggy in some versions of Safari & Opera, but works fine in FF
            new_embed = embed.cloneNode(true);
            if(!new_embed.getAttribute('wmode') || new_embed.getAttribute('wmode').toLowerCase()=='window')
                new_embed.setAttribute('wmode','transparent');
            embed.parentNode.replaceChild(new_embed,embed);
        }
    }
    // loop through every object tag on the site
    var objects = document.getElementsByTagName('object');
    for(var i=0; i<objects.length; i++) {
        var object = objects[i];
        var new_object;
        // object is an IE specific tag so we can use outerHTML here
        if(object.outerHTML) {
            var html = object.outerHTML;
            // replace an existing wmode parameter
            if(html.match(/<param\s+name\s*=\s*('|")wmode('|")\s+value\s*=\s*('|")[a-zA-Z]+('|")\s*\/?\>/i))
                new_object = html.replace(/<param\s+name\s*=\s*('|")wmode('|")\s+value\s*=\s*('|")window('|")\s*\/?\>/i,"<param name='wmode' value='transparent' />");
            // add a new wmode parameter
            else
                new_object = html.replace(/<\/object\>/i,"<param name='wmode' value='transparent' />\n</object>");
            // loop through each of the param tags, skipping text nodes and elements without a name
            var children = object.childNodes;
            for(var j=0; j<children.length; j++) {
                var child = children[j];
                if(child.nodeType == 1 && child.getAttribute('name') && child.getAttribute('name').match(/flashvars/i)) {
                    new_object = new_object.replace(/<param\s+name\s*=\s*('|")flashvars('|")\s+value\s*=\s*('|")[^'"]*('|")\s*\/?\>/i,"<param name='flashvars' value='"+child.getAttribute('value')+"' />");
                }
            }
            // replace the old embed object with the fixed version
            object.insertAdjacentHTML('beforeBegin',new_object);
            object.parentNode.removeChild(object);
        }
    }
}

</script>
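
The string rewriting at the heart of fix_flash can be tried on its own, outside a browser.  The sketch below pulls the <embed> branch into a standalone function using the same regular expressions; the name fixEmbedWmode and the sample markup are made up for illustration.

```javascript
// Return embed markup with wmode forced to "transparent" when it is
// missing or set to the default "window"; other values are left alone.
function fixEmbedWmode(html) {
  // An explicit wmode attribute exists: only rewrite the "window" value.
  if (/wmode\s*=\s*('|")[a-zA-Z]+('|")/i.test(html)) {
    return html.replace(/wmode\s*=\s*('|")window('|")/i, "wmode='transparent'");
  }
  // No wmode attribute at all: add one right after the tag name.
  return html.replace(/<embed\s/i, "<embed wmode='transparent' ");
}

console.log(fixEmbedWmode('<embed src="a.swf" wmode="window">'));
// <embed src="a.swf" wmode='transparent'>
console.log(fixEmbedWmode('<embed src="a.swf">'));
// <embed wmode='transparent' src="a.swf">
console.log(fixEmbedWmode('<embed src="a.swf" wmode="opaque">'));
// <embed src="a.swf" wmode="opaque">
```

An embed that already asks for “opaque” is left untouched, since that mode also lets HTML content sit on top of it.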

This solution is adapted from code I found on QIndex. Hopefully this version is a little cleaner and easier for people to find.

Note: There is also a jQuery version of this code, compliments of José Nobile.

Search plugin for Google Caffeine

I read on Mashable and TechCrunch tonight that Google has released a developer preview of the next version of Google (code name Caffeine).  Wanting to try it out, I did a quick search to see if I could find a search plugin for Firefox.  Finding none, I decided to write one myself.  If you’re using Firefox, you should be able to click here to install my Google Sandbox search plugin.

And yes, I tried to upload my plugin to the Mozilla directory but I keep getting a “Not a valid install package” error.  I don’t have time to troubleshoot this right now.  If anyone can tell me what I’m doing wrong, I would much appreciate it.  I also didn’t have time to figure out Google’s URL to enable auto-completion.  If you happen to know the URL for the auto-complete AJAX request, please let me know in the comments.
