Archive for January, 2009

How to use curl_multi() without blocking

You can find the latest version of this library on Github.

A more efficient implementation of curl_multi()
curl_multi is a great way to process multiple HTTP requests in parallel in PHP. curl_multi is particularly handy when working with large data sets (like fetching thousands of RSS feeds at one time). Unfortunately there is very little documentation on the best way to implement curl_multi. As a result, most of the examples around the web are either inefficient or fail entirely when asked to handle more than a few hundred requests.

The problem is that most implementations of curl_multi wait for each set of requests to complete before processing them. If there are too many requests to process at once, they usually get broken into groups that are then processed one at a time. The problem with this is that each group has to wait for the slowest request to download. In a group of 100 requests, all it takes is one slow one to delay the processing of 99 others. The larger the number of requests you are dealing with, the more noticeable this latency becomes.

The solution is to process each request as soon as it completes. This eliminates the wasted CPU cycles from busy waiting. I also created a queue of cURL requests to allow for maximum throughput. Each time a request is completed, I add a new one from the queue. By dynamically adding and removing links, we keep a constant number of links downloading at all times. This gives us a way to throttle the amount of simultaneous requests we are sending. The result is a faster and more efficient way of processing large quantities of cURL requests in parallel.

function rolling_curl($urls, $callback, $custom_options = null) {

    // make sure the rolling window isn't greater than the # of urls
    $rolling_window = 5;
    $rolling_window = (sizeof($urls) < $rolling_window) ? sizeof($urls) : $rolling_window;

    $master = curl_multi_init();

    // add additional curl options here
    $std_options = array(CURLOPT_RETURNTRANSFER => true,
                         CURLOPT_FOLLOWLOCATION => true,
                         CURLOPT_MAXREDIRS => 5);
    $options = ($custom_options) ? ($std_options + $custom_options) : $std_options;

    // start the first batch of requests
    for ($i = 0; $i < $rolling_window; $i++) {
        $ch = curl_init();
        $options[CURLOPT_URL] = $urls[$i];
        curl_setopt_array($ch, $options);
        curl_multi_add_handle($master, $ch);
    }

    do {
        while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
        if ($execrun != CURLM_OK)
            break;
        // a request was just completed -- find out which one
        while ($done = curl_multi_info_read($master)) {
            $info = curl_getinfo($done['handle']);
            if ($info['http_code'] == 200) {
                $output = curl_multi_getcontent($done['handle']);

                // request successful.  process output using the callback function.
                $callback($output);

                // start a new request (it's important to do this before removing the old one)
                if ($i < sizeof($urls)) {
                    $ch = curl_init();
                    $options[CURLOPT_URL] = $urls[$i++];  // increment i
                    curl_setopt_array($ch, $options);
                    curl_multi_add_handle($master, $ch);
                }

                // remove the curl handle that just completed
                curl_multi_remove_handle($master, $done['handle']);
            } else {
                // request failed.  add error handling.
            }
        }
        // block until there is activity on one of the handles
        // (this is what avoids busy-waiting between completions)
        if ($running)
            curl_multi_select($master);
    } while ($running);

    curl_multi_close($master);
    return true;
}

Note: I set my max number of parallel requests ($rolling_window) to 5. Be sure to update this value according to the bandwidth available on your server / the servers you are curling. Be nice and read this first.
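A hypothetical call might look like this (the URLs and the callback name are illustrative, not part of the function above):

```php
<?php
// Example callback: process each response as it completes.
function process_response($output) {
    // e.g. count the <item> elements in a fetched RSS feed
    return substr_count($output, '<item>');
}

$urls = array(
    'http://example.com/feed1.xml',
    'http://example.com/feed2.xml',
);

// rolling_curl() is defined above; the guard lets this
// snippet also run standalone.
if (function_exists('rolling_curl')) {
    rolling_curl($urls, 'process_response');
}
```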

Updated 3/6/09: Fixed a missing semi-colon. Thanks to Steve Gricci for catching the typo.

Updated 4/2/09: Made some changes to increase reusability. rolling_curl now expects a $callback parameter for a function that will process each response. It also accepts a $custom_options array that lets you add custom curl options such as authentication, custom headers, etc.
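For example, a call with custom options might look like this (the credentials, header, and URL are placeholders; the CURLOPT_* constants are standard PHP cURL options):

```php
<?php
// Hypothetical callback for this example.
function handle_response($output) {
    return strlen($output);
}

$custom_options = array(
    CURLOPT_USERPWD    => "user:secret",                       // HTTP auth
    CURLOPT_HTTPHEADER => array("Accept: application/rss+xml"), // custom header
    CURLOPT_TIMEOUT    => 10,                                   // seconds
);

$urls = array('http://example.com/feed.xml');

// rolling_curl() is defined above; the guard lets this
// snippet also run standalone.
if (function_exists('rolling_curl')) {
    rolling_curl($urls, 'handle_response', $custom_options);
}
```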

Updated 4/8/09: Fixed a new bug that was introduced with the last update. Thanks to Damian Clement for alerting me to the problem.


3 web design firms I regularly recommend

I’ve spent the last 10 years of my life writing code for money. I sold my first website when I was 14, paid my way through college doing contract work and haven’t stopped coding since. When I started EventVue, I made the tough decision to not take on any contract work. I understood that for EventVue to be a success it would take my full and undivided attention (along with a lot of hard work / luck). Apparently it’s taking a while for the word to get out, because I still get regular emails asking me to design a website or code up some widget. My response is always the same. “I don’t do that anymore, but here are 3 companies I recommend. Let me know if you want me to make an introduction.”

  • OrangeCoat (Greenville, SC)
    I’ve known the OrangeCoat guys for a while now. I’ve written code for them. They’ve done design work for me. I love working with these guys.
  • Engenius Development (Greenville, SC)
    Chase did some contract work for me while we were both at Clemson. He’s since graduated and launched out on his own. He’s a great guy and you won’t find anyone who will work harder to make your website a success.
  • Slice of Lime (Boulder, CO)
    I’ve not yet had a chance to work with them personally, but I’ve admired a lot of the work they’ve done for other companies around Boulder. They make good stuff.