How to use curl_multi() without blocking

January 26, 2009

You can find the latest version of this library on GitHub.

A more efficient implementation of curl_multi()
curl_multi is a great way to process multiple HTTP requests in parallel in PHP. curl_multi is particularly handy when working with large data sets (like fetching thousands of RSS feeds at one time). Unfortunately there is very little documentation on the best way to implement curl_multi. As a result, most of the examples around the web are either inefficient or fail entirely when asked to handle more than a few hundred requests.

Most implementations of curl_multi wait for each set of requests to complete before processing them. If there are too many requests to process at once, they usually get broken into groups that are then processed one at a time. The catch is that each group has to wait for its slowest request to download. In a group of 100 requests, all it takes is one slow one to delay the processing of 99 others. The larger the number of requests you are dealing with, the more noticeable this latency becomes.

The solution is to process each request as soon as it completes. This eliminates the time wasted waiting for the slowest request in a group. I also created a queue of cURL requests to allow for maximum throughput. Each time a request is completed, I add a new one from the queue. By dynamically adding and removing links, we keep a constant number of links downloading at all times. This gives us a way to throttle the number of simultaneous requests we are sending. The result is a faster and more efficient way of processing large quantities of cURL requests in parallel.
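The rolling-window idea described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the function name rolling_curl_sketch(), its parameters, and the choice to pass a null body to the callback on transport errors are all assumptions made for the example. It also waits with curl_multi_select() between passes so the loop does not spin.

```php
<?php
// Sketch only: rolling_curl_sketch() and its parameters are illustrative names,
// not the API of the RollingCurl library.
function rolling_curl_sketch(array $urls, callable $callback, int $window = 5, array $customOptions = []): int
{
    $urls  = array_values($urls);   // force 0-based keys so the queue index works
    $total = count($urls);
    if ($total === 0) {
        return 0;
    }

    // Custom options win over these defaults (array union keeps the left-hand keys).
    $options = $customOptions + [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
    ];

    $master    = curl_multi_init();
    $next      = 0;  // index of the next queued URL
    $active    = 0;  // handles currently attached to $master
    $completed = 0;

    // Attach the next queued URL, if any are left.
    $startNext = function () use (&$next, &$active, $total, $urls, $options, $master) {
        if ($next >= $total) {
            return;
        }
        $ch = curl_init();
        curl_setopt_array($ch, [CURLOPT_URL => $urls[$next++]] + $options);
        curl_multi_add_handle($master, $ch);
        $active++;
    };

    // Fill the window (at least one request, at most one per URL).
    for ($i = 0, $n = min(max(1, $window), $total); $i < $n; $i++) {
        $startNext();
    }

    while ($active > 0) {
        if (curl_multi_exec($master, $running) !== CURLM_OK) {
            break;
        }

        // Drain every transfer that has finished, successful or not.
        while ($done = curl_multi_info_read($master)) {
            $ch   = $done['handle'];
            $info = curl_getinfo($ch);
            $body = ($done['result'] === CURLE_OK) ? curl_multi_getcontent($ch) : null;
            $callback($body, $info);
            $completed++;

            $startNext();                           // refill before removing the old handle
            curl_multi_remove_handle($master, $ch);
            $active--;
        }

        // Sleep until there is socket activity instead of spinning.
        if ($active > 0 && curl_multi_select($master, 1.0) === -1) {
            usleep(1000);  // select can fail immediately on some builds
        }
    }

    curl_multi_close($master);
    return $completed;
}
```

A caller would pass a list of URLs and a callback that receives the response body and the curl_getinfo() array, for example rolling_curl_sketch($urls, $my_callback, 5). Tracking the number of attached handles separately from the $running value reported by curl_multi_exec() is a deliberate choice here: a handle added after the last exec call is not yet reflected in $running, so relying on $running alone could end the loop with URLs still queued.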

Note: I set my max number of parallel requests ($rolling_window) to 5 (it was previously 100). Be sure to update this value according to the bandwidth available on your server / servers you are curling. Be nice and read this first.

Updated 3/6/09: Fixed a missing semi-colon. Thanks to Steve Gricci for catching the typo.

Updated 4/2/09: Made some changes to increase reusability. rolling_curl now expects a $callback parameter for a function that will process each response. It also accepts an array called $options that lets you add custom curl options such as authentication, custom headers, etc.

Updated 4/8/09: Fixed a new bug that was introduced with the last update. Thanks to Damian Clement for alerting me to the problem.

Josh Fraser
Entrepreneur, world traveler and rock climber.
Software engineer and co-founder of Din, Torbit and EventVue.

Comments

Michael said at 1:26 pm on December 13th, 2011:

Hey Josh,

Is it possible to pass a value (e.g. $row['id']) into RollingCurl so that it's available for use within the callback function?

foreach ($rows as $row) {
// Add each request to the RollingCurl object.
$request = new RollingCurlRequest($row['url']);
$rc->add($request);
}

(Basically, it's the MySQL primary key for each row ($row['id']) that I'm trying to pass and make available within the callback function.)

Thanks.
Josh Fraser said at 2:13 pm on December 13th, 2011:

Sure, an easy way to do this is to add a GET variable to the end of the URL you are fetching (e.g. ?mysql_id=42) and then parse that ID out of the cURL info array when the request completes.
Michael said at 4:47 pm on December 16th, 2011:

OK, I got it implemented by using a hash fragment to pass the monitor ID and then reading this value back from the $request ($info['url'] doesn't retain the fragment on the URL for whatever reason). Because it is a fragment, I figure there is no possibility that it'll ever change the URL that is checked. Still, it'd be cool if RollingCurl had a way to pass a value without affecting the URL. But this is working for now. Thanks for sharing RC!
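For anyone working with raw cURL handles rather than the RollingCurl classes, one way to carry a value alongside a request without touching the URL is PHP's CURLOPT_PRIVATE option, which can be read back with curl_getinfo() and CURLINFO_PRIVATE. The sketch below assumes that setup; the $rows data and the row_id_for_handle() helper are hypothetical, and whether RollingCurl itself exposes this option is not something this thread confirms.

```php
<?php
// Hypothetical helper: read back the ID that was attached to a handle.
function row_id_for_handle($ch): int
{
    return (int) curl_getinfo($ch, CURLINFO_PRIVATE);
}

// Hypothetical rows, standing in for the result of a MySQL query.
$rows = [
    ['id' => 42, 'url' => 'http://example.com/feed-a'],
    ['id' => 43, 'url' => 'http://example.com/feed-b'],
];

$handles = [];
foreach ($rows as $row) {
    $ch = curl_init($row['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Attach the primary key to the handle itself; the URL stays untouched.
    curl_setopt($ch, CURLOPT_PRIVATE, (string) $row['id']);
    $handles[] = $ch;
}

// In a curl_multi loop, the same lookup would be applied to $done['handle']
// from curl_multi_info_read() before invoking the callback.
```

Because the value lives on the handle, it survives redirects and does not depend on how the server treats query strings or fragments.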
Elad Karako said at 7:37 pm on December 1st, 2015:

just pass it along….

function rolling_curl($urls, $callback, $additional_callback_variables = [], $custom_options = null) {
    // make sure the rolling window isn't greater than the # of urls
    $rolling_window = min(5, count($urls));

    $master = curl_multi_init();

    // add additional curl options here
    $std_options = array(CURLOPT_RETURNTRANSFER => true,
                         CURLOPT_FOLLOWLOCATION => true,
                         CURLOPT_MAXREDIRS => 5);
    $options = ($custom_options) ? ($std_options + $custom_options) : $std_options;

    // start the first batch of requests
    for ($i = 0; $i < $rolling_window; $i++) {
        $ch = curl_init();
        $options[CURLOPT_URL] = $urls[$i];
        curl_setopt_array($ch, $options);
        curl_multi_add_handle($master, $ch);
    }

    do {
        while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
        if ($execrun != CURLM_OK)
            break;

        // a request was just completed: find out which one
        while ($done = curl_multi_info_read($master)) {
            $info = curl_getinfo($done['handle']);
            if ($info['http_code'] == 200) {
                $output = curl_multi_getcontent($done['handle']);
                // request successful. process output using the callback function.
                $callback($output, $additional_callback_variables);
            } else {
                // request failed. add error handling.
            }

            // start a new request if any urls are left
            // (it's important to do this before removing the old one)
            if ($i < count($urls)) {
                $ch = curl_init();
                $options[CURLOPT_URL] = $urls[$i++]; // increment i
                curl_setopt_array($ch, $options);
                curl_multi_add_handle($master, $ch);
                $running = true; // the new handle is not counted by curl_multi_exec yet
            }

            // remove the curl handle that just completed, whether it succeeded or failed
            curl_multi_remove_handle($master, $done['handle']);
        }

        // wait for activity instead of spinning the CPU
        if ($running)
            curl_multi_select($master);
    } while ($running);

    curl_multi_close($master);
    return true;
}
Bogdan said at 7:20 am on May 11th, 2012:

For the simple script below, how can I use the RollingCurl library to make a POST request to each of http://www.site_01.com, http://www.site_02.com and http://www.site_03.com, using variables that the "my_request" function parsed from the GET response? Thank you.

<?PHP
require("RollingCurl.php");

function my_request($response) {

……………
……………

(code used to parse some variables to use later in POST request)

……………
……………

}

$urls = array("http://www.site_01.com",
"http://www.site_02.com",
"http://www.site_03.com");

$rc = new RollingCurl("my_request");

$rc->window_size = 3;

foreach ($urls as $url) {
$request = new RollingCurlRequest($url, "GET");
$rc->add($request);
}

$rc->execute();
?>
Bob said at 6:00 am on June 13th, 2012:

Hi, nice job.
Just some changes for me:
// fixes the problem where the number of $urls must be greater than the rolling window
$rolling_window = min(array(5,count($urls)));

// this line goes just after the "for" (start the first batch of requests)
// because if you have 5 windows, you leave this for loop with $i == 5 (last $i++)
// then, when you get the next url in the do while, you do another $i++ which does not take the 5th url !!!
$i--;

That's all for me! Thank you again, this saved me some hours!!!!
Zeke said at 4:13 pm on November 4th, 2012:

Not sure if I did it correctly, but my problem with the code is with the callback function:

for example:
call_user_func($callback,$urls[$z],$output);

When I call the callback function, the $output does not match the URL. I want to display each link together with its own output, but what I am getting is $output arriving either before or after the next URL…

I tried to fix it with sleep and curl_multi_select (which is supposed to wait for activity on the connection), but couldn't fix the problem…
Zeke said at 6:23 pm on November 4th, 2012:

If you're trying to return the links like I am, don't do it the stupid way that I was doing >.<!

use the url (from getInfo) parameter instead…
$info['url'].
Zeke said at 6:29 pm on November 4th, 2012:

Forgot to mention use: curl_multi_select($master);
to lower your CPU spike when running…
Saagar said at 12:22 am on February 3rd, 2013:

Hey, I am using curl_multi_exec for processing thousands of URLs. Currently it is breaking down at around 15 to 20k. Please help me with that.
pokemon said at 5:09 pm on March 24th, 2014:

This is a very old post, but I thought I might as well reply, as I think I know which part you don't understand.
When you say "overwrite $ch every time", you must take notice of "curl_multi_add_handle($master, $ch);".
Before $ch is overwritten, its "data" is added to $master.
From the opening part of your comment, it seems to me that you noticed a variable being overwritten and wondered how its old contents still exist for cURL to see. You are correct that the variable's old value is gone, but the handle was already added to $master before that happened.
Viacheslav said at 10:01 am on April 27th, 2014:

You should be careful with while loops. Without an appropriate sleep they become insane))

You should insert
usleep(10000);
before
} while ($running);

Spent an hour investigating why my simple scripts are using processor to 100% )
Manu said at 9:24 am on July 3rd, 2014:

Hi, I like the idea, however I was wondering how I know which handle/url made the original call?

You only process the returned data, as it seems to me? However for me it's important to know which made it since I have to write data back into the corresponding record in the database. Any idea on how to achieve this? Can I access the url the handle called?
Jonathan Rodan said at 4:35 pm on August 6th, 2014:

The line
while(($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);

is a CPU killer. You can probably use curl_multi_select instead, or suffer the lost time of a short usleep().
Kaolin Fire's solution is much better than the line I specified.
ترافل said at 3:15 am on March 1st, 2015:

I like your idea of a rolling window
Paris Nakita Kejser said at 5:20 am on July 7th, 2015:

I have used this method to handle my image downloader for external partners. I use Xdebug, so if you use this, remember to disable xdebug.max_nesting_level. I have set a cap of 1 million and made an inner loop to call the next curl instance. Over our company internet line (not so good) I can get through about 1,500-2,000 images a minute.

Thanks a lot for this guide, it's helping a lot! 🙂
eliosh said at 10:04 am on August 26th, 2015:

This is my solution for Multi Head Requests:

https://gist.github.com/anonymous/1a9eb381f6a5f260bd20
Oded Arbel said at 6:08 am on June 27th, 2016:

This is a brilliant article and sums up very clearly how to use curl_multi_exec() correctly, unlike all the examples on php.net, which do a horrible job of that.

I’ve seen many times implementations of curl multi that are based on just a copy&paste of some php.net example code and that completely misses the point of being able to pipe requests through the multi processor. Thanks for writing this!
Artjom Kurapov said at 12:14 am on August 26th, 2016:

Btw, if someone has problems with PHP, you may look into disabling session blocking for the same user
ecatombe said at 10:42 pm on January 2nd, 2017:

Some problems.

1.- If some result is anything other than 200, no more requests are added.

2.- $options[CURLOPT_URL] = $urls[$i++]; When do you check that there are no more elements in the array? Oh my god.
JohnShroff said at 12:30 am on February 12th, 2018:

Absolutely fantastic and insightful post! I was basing my implementation off of another example and was having problems. When sending 100 simultaneous requests, I was getting back 70-80 right away, then the last 20 or so would stall and eventually come back empty. Batching them like this solved the issue and actually made it much faster overall!
