Managing code releases

by Josh Fraser on March 25, 2009


Recently I decided to streamline my code release process. I use Subversion for my source control, which means I push code live by running svn up on each of our production servers. I’m lazy, so I wanted an easier way to do this all at once. The end result is a simple shell script that lets me run svn update commands on multiple servers at once. It shows me the status of svn on each server and gives me a chance to confirm that everything is okay before going ahead with the launch.

This example assumes you have two servers (app1 and app2) that are using public key authentication. Obviously, you’ll need to modify this script to work in your own environment. Make sure you replace “/var/www/” with your own document root and change appX.yourdomain.com to the hostname or IP address of each production server.

#!/bin/sh

# connect to each server and echo their current status
printf 'Connecting to app1...\n\n'
ssh app1.yourdomain.com 'cd /var/www/ && svn status --show-updates'
printf '\nConnecting to app2...\n\n'
ssh app2.yourdomain.com 'cd /var/www/ && svn status --show-updates'
# add additional servers here as needed

# confirm the release before publishing (the prompt is shown in standout mode)
tput smso
printf '\nDo you want to publish these changes to production? (y/n)\n'
tput rmso
read answer
if [ "$answer" = "y" ]; then
  # if "y", proceed with the release
  printf '\nPublishing to production...\n'
  printf '\nPublishing to app1...\n'
  ssh app1.yourdomain.com 'cd /var/www/ && svn up'
  printf '\nPublishing to app2...\n'
  ssh app2.yourdomain.com 'cd /var/www/ && svn up'
  # add additional servers here as needed
  printf '\nDone\n'
else
  # anything other than "y" cancels the release
  printf '\nCanceled\n'
  exit
fi
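
Once you have more than a couple of servers, the copy-and-paste blocks get unwieldy. Here is a minimal sketch of the same flow written as a loop. The SERVERS, DOCROOT and REMOTE variables and the two function names are made up for this example; REMOTE exists only so you can swap ssh for echo and do a dry run.

```shell
#!/bin/sh
# Hypothetical settings: replace with your own hosts and document root.
SERVERS="app1.yourdomain.com app2.yourdomain.com"
DOCROOT="/var/www/"
# Command used to reach each host. Set REMOTE=echo to print instead of connecting.
REMOTE="${REMOTE:-ssh}"

# run_on_all CMD: run CMD inside the document root on every server in SERVERS
run_on_all() {
  for host in $SERVERS; do
    printf '\nConnecting to %s...\n' "$host"
    $REMOTE "$host" "cd $DOCROOT && $1"
  done
}

# release: show the pending changes everywhere, then update only after a "y"
release() {
  run_on_all 'svn status --show-updates'
  printf '\nDo you want to publish these changes to production? (y/n)\n'
  read answer
  if [ "$answer" = "y" ]; then
    run_on_all 'svn up'
    printf '\nDone\n'
  else
    printf '\nCanceled\n'
  fi
}
```

Add a call to release at the bottom of the file to run it. Adding a server then means adding one hostname to SERVERS instead of two more ssh lines.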

Too many DNS lookups in an SPF record

by Josh Fraser on March 20, 2009


I recently noticed I was having new email deliverability issues. It surprised me since things had been going well since switching to AuthSMTP for our outgoing mail. The first thing I checked was my SPF record. It looked like this:

v=spf1 a mx include:aspmx.googlemail.com include:authsmtp.com include:salesforce.com -all

At first glance everything seems okay. Basically it says to include all A records, MX records, and to include the SPF records provided by Google Apps, AuthSMTP and Salesforce. Since that covers every legitimate sender, I finish it off with the -all which indicates a hard fail. Ok, so the syntax is good. You can’t tell that anything is wrong without digging a little deeper. When you actually try to evaluate it you’ll get this error message:

Results - PermError SPF Permanent Error: Too many DNS lookups

After a little research I found out that you are only allowed 10 DNS lookups, and fetching the TXT and SPF records counts toward that total. That means after you add in the A and MX lookups, we’re at 7 before we even look inside the includes. Let’s pull up the SPF record for Google Apps:

v=spf1 redirect=_spf.google.com

That redirect counts as another DNS lookup. That puts me up to 8 DNS lookups. Thankfully the Salesforce SPF record is nice and clean:

v=spf1 ip4:204.14.232.0/25 ip4:204.14.234.0/25 ip4:63.150.46.16 ip4:207.126.144.0/20 ip4:64.18.0.0/20 mx ~all

That leaves AuthSMTP:

v=spf1 include:spf-a.authsmtp.com include:spf-b.authsmtp.com include:spf-c.authsmtp.com include:spf-d.authsmtp.com ~all

Ouch! That’s 4 more lookups and the worst part of it is that spf-d.authsmtp.com doesn’t even do anything!

The first thing I did was take out the MX lookup since it’s redundant. I also replaced aspmx.googlemail.com with _spf.google.com which is what it redirects to anyway. Technically, this isn’t a good idea since Google could change it on me — but remember I don’t have a lot of options here. I’m just happy to see my revised record pass the test:

v=spf1 a include:_spf.google.com include:authsmtp.com include:salesforce.com -all

I also sent an email to the AuthSMTP team. They responded within 30 minutes saying that they would remove the extra DNS record and look at how they can clean things up.

I learned something tonight. Remember to count the DNS lookups in your SPF record. It turns out they can add up faster than points on a teenager’s driver’s license. And if you’re using a lot of includes like I am, remember to do periodic checks to make sure nothing has changed.
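
Counting by hand gets tedious, so here is a rough sketch of a counter. It assumes the rule from the SPF specification (RFC 7208) that each a, mx, ptr, exists and include mechanism and each redirect modifier costs one lookup. The function name count_spf_lookups is made up for this example. It only looks at the single record you pass in: it does not count the initial fetch of the record itself, and it does not follow includes or redirects, so you have to run it on each nested record and add up the results yourself.

```shell
#!/bin/sh
# count_spf_lookups RECORD: print how many terms in RECORD trigger a DNS lookup.
# Assumption: a, mx, ptr, exists, include and redirect each cost one lookup;
# ip4, ip6 and all cost none. Nested records are NOT followed.
count_spf_lookups() {
  # put one term per line, then count the lines that name a lookup term
  printf '%s\n' "$1" | tr -s ' ' '\n' |
    grep -Eic '^[-+~?]?((a|mx|ptr)([:/].*)?|(include|exists):.+|redirect=.+)$' || true
}
```

Run against my original record it reports 5, and the AuthSMTP record on its own reports 4, so those two alone leave almost no headroom before Google’s redirect and the mx in the Salesforce record are added.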

Resources:

  • I wrote about Sending email through Gmail over a year ago. While I absolutely don’t recommend you try this anymore, it has some useful information on SPF records and email deliverability in general.
  • Kitterman has a great tool to help validate your SPF records.

Doing cool stuff with Flash

by Josh Fraser on March 8, 2009


A few months ago I encouraged my friend Kevin Musselman to start blogging. He had been doing a lot of interesting stuff with Flash and I told him he should be writing it down and sharing it with the community. He took my advice and has some interesting posts up.

If that’s the sort of stuff that interests you, head over there and subscribe. I’m looking forward to seeing what else he’s got up his sleeve.

Permanent links to profile pictures on twitter

by Josh Fraser on February 27, 2009


Twitter currently does not offer permanent links for their users’ profile pictures. This means that if you do any caching of twitter profile pictures, you stand a good chance of the image being gone by the time you try to display it. We’ve been running into this problem a lot lately at EventVue, but until recently I hadn’t taken the time to try and fix it. Thankfully, someone else solved the problem for me.

Last week Pete Warden wrote about a project that Shannon Whitley started that provides a simple solution for the roaming profile picture. The SPIURL project is a small Python script that is designed to run on Google App Engine. It caches the profile URLs, but checks that the profile image still exists before returning the picture. The end result is a static URL that can be used to retrieve any profile picture from twitter. For example, http://purl.org/net/spiurl/joshfraz returns my profile picture even if I upload a new one to twitter.

It’s a great script as it is, but I made a few modifications of my own. The main thing I added was the ability to specify which size of picture you want — either the 48×48 thumbnail or the original. I also added a content-type header to make it easier to view the picture in a browser. You can download my modified version if you’d like.

Thanks Shannon and Pete for sharing! I hope this helps someone else as much as it helped me.

Update 3/13/09: Pete discovered that you may need to add authentication to avoid bumping into the rate limits. You can keep up with the latest updates to this project over at Google Code.

How to use curl_multi() without blocking

by Josh Fraser on January 26, 2009


A more efficient implementation of curl_multi()
curl_multi is a great way to process multiple HTTP requests in parallel in PHP. curl_multi is particularly handy when working with large data sets (like fetching thousands of RSS feeds at one time). Unfortunately there is very little documentation on the best way to implement curl_multi. As a result, most of the examples around the web are either inefficient or fail entirely when asked to handle more than a few hundred requests.

The problem is that most implementations of curl_multi wait for each set of requests to complete before processing them. If there are too many requests to process at once, they usually get broken into groups that are then processed one at a time. The problem with this is that each group has to wait for the slowest request to download. In a group of 100 requests, all it takes is one slow one to delay the processing of 99 others. The larger the number of requests you are dealing with, the more noticeable this latency becomes.

The solution is to process each request as soon as it completes. This eliminates the wasted CPU cycles from busy waiting. I also created a queue of cURL requests to allow for maximum throughput. Each time a request is completed, I add a new one from the queue. By dynamically adding and removing requests, we keep a constant number of them downloading at all times. This gives us a way to throttle the number of simultaneous requests we are sending. The result is a faster and more efficient way of processing large quantities of cURL requests in parallel.

function rolling_curl($urls, $callback, $rolling_window = 100) {

    $urls = array_values($urls);
    $total = count($urls);

    // make sure the rolling window isn't greater than the # of urls
    $rolling_window = min($total, $rolling_window);

    $master = curl_multi_init();

    // add additional curl options here
    $options = array(CURLOPT_RETURNTRANSFER => true,
                     CURLOPT_FOLLOWLOCATION => true,
                     CURLOPT_MAXREDIRS => 5);

    // start the first batch of requests
    for ($i = 0; $i < $rolling_window; $i++) {
        $ch = curl_init();
        $options[CURLOPT_URL] = $urls[$i];
        curl_setopt_array($ch, $options);
        curl_multi_add_handle($master, $ch);
    }

    do {
        while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
        if ($execrun != CURLM_OK) {
            break;
        }

        // one or more requests just completed -- find out which
        while ($done = curl_multi_info_read($master)) {
            $info = curl_getinfo($done['handle']);
            if ($info['http_code'] == 200) {
                // request successful.  process output using the specified callback function.
                $callback(curl_multi_getcontent($done['handle']));
            } else {
                // request failed.  add error handling.
            }

            // start a new request if any urls are left
            // (it's important to do this before removing the old one)
            if ($i < $total) {
                $ch = curl_init();
                $options[CURLOPT_URL] = $urls[$i++];  // increment i
                curl_setopt_array($ch, $options);
                curl_multi_add_handle($master, $ch);
                $running = true;  // keep looping so the new request gets executed
            }

            // remove the curl handle that just completed, whether it succeeded or failed
            curl_multi_remove_handle($master, $done['handle']);
        }

        // wait for activity on any connection instead of spinning
        if ($running) {
            curl_multi_select($master, 1.0);
        }
    } while ($running);
    curl_multi_close($master);
}

Note: I set my max number of parallel requests ($rolling_window) to 100. Be sure to update this value according to the bandwidth available on your specific server.

Updated 3/6/09: Fixed a missing semi-colon. Thanks to Steve Gricci for catching the typo.

Updated 4/2/09: Improved code reusability by adding a $callback parameter for processing successful requests.

3 web design firms I regularly recommend

by Josh Fraser on January 24, 2009


I’ve spent the last 10 years of my life writing code for money. I sold my first website when I was 14, paid my way through college doing contract work and haven’t stopped coding since. When I started EventVue, I made the tough decision to not take on any contract work. I understood that for EventVue to be a success it would take my full and undivided attention (along with a lot of hard work / luck). Apparently it’s taking a while for the word to get out, because I still get regular emails asking me to design a website or code up some widget. My response is always the same. “I don’t do that anymore, but here are 3 companies I recommend. Let me know if you want me to make an introduction.”

  • OrangeCoat (Greenville, SC)
    I’ve known the OrangeCoat guys for a while now. I’ve written code for them. They’ve done design work for me. I love working with these guys.
  • Engenius Development (Greenville, SC)
    Chase did some contract work for me while we were both at Clemson. He’s since graduated and launched out on his own. He’s a great guy and you won’t find anyone who will work harder to make your website a success.
  • Slice of Lime (Boulder, CO)
    I’ve not yet had a chance to work with them personally, but I’ve admired a lot of the work they’ve done for other companies around Boulder. They make good stuff.

How to start MAMP on port 80 without a password

by Josh Fraser on December 2, 2008


I’m a big fan of MAMP. It’s the fastest way for anyone to get set up with a local PHP/MySQL development environment on a Mac. One of the small annoyances with MAMP is that it requires you to enter your password all the time if you want to run it on port 80 (which I do). To be fair, it’s got more to do with UNIX security than MAMP… but it’s still bloody annoying!

I tried Steve Stringer’s technique of using launch daemons, but I just couldn’t get it to work for me.

The trick to getting MAMP to start behind the scenes is knowing that all that pretty GUI does is call a couple shell scripts. Specifically, those scripts are /Applications/MAMP/bin/startApache.sh and /Applications/MAMP/bin/startMysql.sh (assuming you installed MAMP at the default location).

The second thing you should know is that startApache.sh must be run as root, but startMysql.sh must be run as the current user. I created a new shell script to call those scripts appropriately:

#!/bin/sh
# Apache needs root to bind to port 80; MySQL runs as the current user
sudo /Applications/MAMP/bin/startApache.sh
/Applications/MAMP/bin/startMysql.sh
exit 0

I then added an exception for that script to my sudoers file so I didn’t need to enter a password when I used sudo. The easiest way to add this exception is to use the ‘visudo’ command as root.
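
If you haven’t edited a sudoers file before, the exception is a single line. Here is a small sketch that prints one. It assumes the rule should cover startApache.sh, since that is the command being run through sudo, and the sudoers_entry helper and the username josh are made up for this example.

```shell
#!/bin/sh
# sudoers_entry USER COMMAND: print a sudoers rule that lets USER run
# COMMAND as root without being asked for a password.
sudoers_entry() {
  printf '%s ALL=(root) NOPASSWD: %s\n' "$1" "$2"
}

# Example with a hypothetical username:
#   sudoers_entry josh /Applications/MAMP/bin/startApache.sh
# prints:
#   josh ALL=(root) NOPASSWD: /Applications/MAMP/bin/startApache.sh
```

Paste the printed line into the file that visudo opens. visudo checks the syntax when you save, which is a lot safer than editing /etc/sudoers directly.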

Finally, I used Automator to wrap the whole thing up as an application I could add to my dock. It works! One less daily annoyance in my life!

Hello world. It’s been a while.

by Josh Fraser on October 15, 2008


As you may have noticed, I’ve been spending less and less time blogging these days. On the other hand, I’m still pretty active on twitter. Since I have a lot more readers here than I have followers on twitter, I thought it might be a good time to remind you about all the witty, insightful comments you are missing if you aren’t following joshfraz on twitter. I’m cranking on some really important milestones right now and blogging tends to be the first thing to go when I get busy. Thanks for understanding. I’ll be back soon.

How to detect the RSS feed for a blog

by Josh Fraser on June 28, 2008


Ever wondered how to automatically figure out the RSS feed for a blog?

Generally speaking, it’s a simple task — just download the HTML for the given blog and use a fancy regular expression to find the associated RSS feed. In PHP, it looks something like this:

$bloghtml = file_get_contents($blogurl);
preg_match('/<link.*type\s*=\s*["\']*application\/rss\+xml["\']*.*href\s*=\s*["\']?([^\'" >]+)[\'" >]/i', $bloghtml, $match);
// $match[1] is only set when a feed link was found
$rssurl = isset($match[1]) ? $match[1] : null;

The main problem with this approach is that some blogs take a long time to load — and that often translates to your application being slow as well. On top of that, it’s frustrating to have to download and process an entire page of HTML just to extract one URL.

Recently Google came out with a better solution in the form of their AJAX Feed API. Using their API, detecting feeds is now easier, faster and more reliable:

$lookup_url = "http://ajax.googleapis.com/ajax/services/feed/lookup?v=1.0&q=".urlencode($blogurl);
// fetch the JSON response; any HTTP client works here
$result = file_get_contents($lookup_url);

I’ve been using this API for about a month now and have really appreciated the improvements. If you need to detect feeds, give it a try. I think you’ll like it.

5 things I wish someone had told me

by Josh Fraser on April 18, 2008


Root-relative navigation is the way to go.
There are several different ways to organize your navigation system. Some people use relative URLs (../index.php), others use absolute URLs (http://www.onlineaspect.com/index.php). I’ve found that using root-relative navigation (/index.php) works best for me. It makes it easy to maintain multiple development environments without having to store a path variable and you don’t get into messy situations when you try to include templates from within nested folders.

User generated content should always be stored in your database as UTF-8.
Otherwise you’ll wake up one day and you’ll have people from 92 different countries using your application and you’ll have funky characters all over the place.

One day memcached is going to be your best friend. Start preparing now.
You probably don’t need to implement memcached for your early prototype, but that doesn’t mean you shouldn’t be already thinking about it. Understanding how memcached works now will help you design an architecture that will maximize the benefits of it later.

Don’t ever send an email right now if you can delay it an hour.
The 2 biggest mistakes you could make as a web startup are 1) deleting data that shouldn’t be deleted and 2) sending an email that shouldn’t be sent. You can protect your company and your reputation by backing up your data and buffering every email that you’re likely to mess up. Does that newsletter email really need to be sent right now? What if you made a mistake? Remember, there’s no undo button on email. Why accelerate the “oh crap!” moment? Sure, you will make mistakes, but sometimes a simple delay can prevent the whole world from knowing about it.

What you learned in database class isn’t necessarily best for your application.
Relational databases are great, especially in theory. Just be prepared to trade in your favorite normal form when you need to achieve speed and scalability. In particular, avoid joins and cache any counts that you find yourself using on a frequent basis.