PHP cURL: parallel requests

Making Concurrent cURL Requests Using PHP’s curl_multi* Functions

The cURL library proves a valuable resource for developers needing to make use of common URL-based protocols (e.g., HTTP, FTP, etc.) for exchanging data. PHP provides a set of curl* wrapper functions in an extension that nicely integrates cURL’s functionality.

When you have to make multiple requests in a script, it's often more efficient to use the curl_multi* functions (e.g., curl_multi_init), which make it possible to process requests concurrently. For example, if a script has to make 2 web requests and each one takes 2 seconds to complete, making 2 separate cURL requests one right after the other takes 4 seconds. With the curl_multi* functions, however, the requests are made concurrently (i.e., we no longer have to wait for one request to finish before starting the next one), so the whole thing takes only about 2 seconds. The actual execution time depends on whether the requests truly run in parallel or merely concurrently.

Because the curl_multi* functions can be cumbersome to use directly, let's look at a function that wraps cURL's concurrent capabilities in a simple interface and can be extended to cover most situations.

/**
 * Simple wrapper function for concurrent request processing with PHP's cURL functions (i.e., using the curl_multi* functions).
 *
 * @param array $requests Array of requests; each entry contains 'url' and, optionally, 'post_data' and 'opts' (per-request cURL options).
 * @param array $opts Optional array containing general cURL options for all requests.
 * @return array Array with the same keys as $requests; each value is an array containing data (response body, null if the response is empty or there was an error), info (curl info, null if error), and error (error string if there was an error, otherwise null).
 */
function multi(array $requests, array $opts = [])
{
    // create array for curl handles
    $chs = [];
    // merge general curl options with defaults
    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => 1];
    // create array for responses
    $responses = [];
    // init curl multi handle
    $mh = curl_multi_init();
    // create running flag
    $running = null;
    // cycle through requests and set up
    foreach ($requests as $key => $request) {
        // init individual curl handle
        $chs[$key] = curl_init();
        // set url
        curl_setopt($chs[$key], CURLOPT_URL, $request['url']);
        // check for post data and handle if present
        if (!empty($request['post_data'])) {
            curl_setopt($chs[$key], CURLOPT_POST, 1);
            curl_setopt($chs[$key], CURLOPT_POSTFIELDS, $request['post_data']);
        }
        // set opts (per-request options take precedence over general ones)
        curl_setopt_array($chs[$key], (isset($request['opts']) ? $request['opts'] + $opts : $opts));
        curl_multi_add_handle($mh, $chs[$key]);
    }
    do {
        // execute curl requests
        $status = curl_multi_exec($mh, $running);
        // block until there is activity, but only while transfers remain
        if ($running > 0) {
            curl_multi_select($mh);
        }
        // check flag to see if we're done
    } while ($running > 0 && $status === CURLM_OK);
    // each finished transfer leaves a completion message whose 'result'
    // field is that request's own cURL error code (CURLE_OK on success)
    $results = [];
    while ($msg = curl_multi_info_read($mh)) {
        $results[array_search($msg['handle'], $chs, true)] = $msg['result'];
    }
    // cycle through requests
    foreach ($chs as $key => $ch) {
        $result = $results[$key] ?? null;
        // handle error (no completion message means the transfer never finished)
        if ($result !== CURLE_OK) {
            $responses[$key] = ['data' => null, 'info' => null, 'error' => $result === null ? 'transfer did not complete' : curl_strerror($result)];
        } else {
            // save successful response (an empty body becomes null, as documented above)
            $data = curl_multi_getcontent($ch);
            $responses[$key] = ['data' => ($data === '' ? null : $data), 'info' => curl_getinfo($ch), 'error' => null];
        }
        // detach individual handle
        curl_multi_remove_handle($mh, $ch);
    }
    // close multi handle
    curl_multi_close($mh);
    // return responses
    return $responses;
}

To use this function, you can call it like so:

$responses = multi([
    'google' => ['url' => 'http://google.com', 'opts' => [CURLOPT_TIMEOUT => 2]],
    'msu' => ['url' => 'http://msu.edu']
]);

And, then you can cycle through the responses:

foreach ($responses as $response) {
    if ($response['error']) {
        // handle error
        continue;
    }
    // check for empty response
    if ($response['data'] === null) {
        // examine $response['info']
        continue;
    }
    // handle data
    $data = $response['data'];
    // do something extraordinary
}

While the above function is helpful for a few requests, if you need to make a large number of requests (perhaps more than 5), you should instead look at the rolling curl library, which makes better use of resources by keeping only a limited window of requests in flight at any one time.
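The rolling curl library's own API is not reproduced here. As a rough illustration of the idea behind it, the sketch below (a hypothetical function fetch_rolling(); the default window of 5 is an arbitrary choice echoing the rule of thumb above) keeps at most $window transfers attached to the multi handle and starts the next queued URL as soon as one finishes, using curl_multi_info_read() to detect completions.

```php
<?php
// Hypothetical sketch of a "rolling window": at most $window transfers are in
// flight; each completion frees a slot for the next queued URL.
// Returns [key => body] (null for a failed transfer), in completion order.
function fetch_rolling(array $urls, int $window = 5): array
{
    $mh = curl_multi_init();
    $pending = $urls;   // URLs not started yet, original keys preserved
    $inFlight = [];     // key => curl handle currently attached to $mh
    $results = [];

    do {
        // Top the window up from the pending queue.
        while (count($inFlight) < $window && $pending) {
            $key = array_key_first($pending);
            $url = $pending[$key];
            unset($pending[$key]);
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_multi_add_handle($mh, $ch);
            $inFlight[$key] = $ch;
        }

        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // sleep until there is socket activity
        }

        // Collect finished transfers; each one frees a slot in the window.
        while ($done = curl_multi_info_read($mh)) {
            $ch = $done['handle'];
            $key = array_search($ch, $inFlight, true);
            $results[$key] = $done['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            unset($inFlight[$key]);
        }
    } while (($pending || $inFlight) && $status === CURLM_OK);

    curl_multi_close($mh);
    return $results;
}
```

For example, fetch_rolling($listOfUrls, 10) would keep ten requests open at a time. Because results arrive in completion order, sort them by key (e.g., with ksort()) if the original order matters.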


And your significant other said you couldn’t multitask 🙂


curl_multi_exec

Processes each of the handles in the stack. This method can be called whether or not a handle needs to read or write data.

Parameters

multi_handle: The cURL multi handle returned by curl_multi_init().

still_running: A reference to a flag that indicates whether any operations are still running.

Return values

Returns a cURL multi code (one of the CURLM_* constants, such as CURLM_OK).

Note:

This only returns errors regarding the whole multi stack. Problems can still occur on individual transfers even when this function returns CURLM_OK.
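Because CURLM_OK only says that the multi stack as a whole is healthy, failures of individual transfers have to be read per transfer. A minimal sketch, assuming the easy handles have already been added to $mh and using a hypothetical helper name run_and_collect_errors(), reads each transfer's own result code from curl_multi_info_read():

```php
<?php
// Hypothetical helper: run all transfers on $mh and return per-transfer
// failures as [effective URL => error message].
function run_and_collect_errors($mh): array
{
    $errors = [];
    do {
        // $status only reports problems with the multi stack as a whole
        $status = curl_multi_exec($mh, $stillRunning);
        if ($stillRunning > 0) {
            curl_multi_select($mh, 1.0);
        }
        // every finished transfer leaves a CURLMSG_DONE message with its own result code
        while ($msg = curl_multi_info_read($mh)) {
            if ($msg['msg'] === CURLMSG_DONE && $msg['result'] !== CURLE_OK) {
                $url = curl_getinfo($msg['handle'], CURLINFO_EFFECTIVE_URL);
                $errors[$url] = curl_strerror($msg['result']);
            }
        }
    } while ($stillRunning > 0 && $status === CURLM_OK);

    if ($status !== CURLM_OK) {
        $errors['(multi handle)'] = curl_multi_strerror($status);
    }
    return $errors;
}
```

Keying the errors by URL is only a convenience for the sketch; two handles with the same URL would overwrite each other's entry.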

Changelog

Examples

Example #1 curl_multi_exec() example

This example creates two cURL handles, adds them to a multi handle, and then runs them asynchronously.

// create both cURL handles
$ch1 = curl_init();
$ch2 = curl_init();

// set the URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://example.com/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);

// create the multiple cURL handle
$mh = curl_multi_init();

// add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

// execute the multi handle
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // wait for activity on any connection
        curl_multi_select($mh);
    }
} while ($active && $status == CURLM_OK);

// close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);

See also

  • curl_multi_init() - Creates a set of cURL handles
  • curl_multi_select() - Waits for activity on any curl_multi connection
  • curl_exec() - Performs a cURL request

User Contributed Notes 17 notes

To avoid 100% CPU usage, here is a simpler and correct way:

do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
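The loop above still calls curl_multi_select() once more after the last transfer has finished. A slightly more defensive sketch (hypothetical helper run_multi(); the 1 ms back-off is an arbitrary choice) only waits while transfers remain, and backs off briefly if curl_multi_select() returns -1, which some PHP/libcurl combinations have done when libcurl had no socket to wait on yet:

```php
<?php
// Hypothetical helper: drive every transfer on $mh to completion without
// busy-waiting, and return the last CURLM_* status code.
function run_multi($mh, float $timeout = 1.0): int
{
    do {
        $status = curl_multi_exec($mh, $running);
        // Only wait while transfers remain; once $running is 0 there is
        // nothing left to wait for.
        if ($running > 0 && curl_multi_select($mh, $timeout) === -1) {
            // -1 means select could not wait; sleep 1 ms rather than
            // looping straight back into curl_multi_exec().
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    return $status;
}
```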

Probably you also want to be able to download the HTML content into buffers/variables, for parsing the HTML or for other processing in your program.

The example code on this page only outputs everything to the screen, without giving you a way to save the downloaded pages in string variables. Because downloading multiple pages is what I wanted to do (not a big surprise, huh? that's the reason for using multi-page parallel Curl), I was initially baffled, because this page doesn't point to a guide on how to do that.

Fortunately, there's a way to download content with parallel Curl requests (just like you would do for a single download with the regular curl_exec). You need to set CURLOPT_RETURNTRANSFER on each handle and then use: http://php.net/manual/en/function.curl-multi-getcontent.php

The function curl_multi_getcontent should definitely be mentioned in the "See Also" section of curl_multi_exec. Probably most people who find their way to the docs page of curl_multi_exec actually want to download the multiple HTML pages (or other content from the multiple parallel Curl connections) into buffers, one page per buffer.
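A minimal sketch of that approach, assuming a hypothetical helper fetch_all() that takes an array of URLs: setting CURLOPT_RETURNTRANSFER on every handle makes curl_multi_getcontent() return each body as a string instead of printing it, as Example #1 does.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel and return the bodies in
// an array with the same keys as $urls ('' if nothing was received).
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        // without RETURNTRANSFER the body would be printed, as in Example #1
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $handles[$key]);
    }

    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        // one buffer per page
        $bodies[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $bodies;
}
```

For example, fetch_all(['home' => 'http://example.com/', 'php' => 'http://www.php.net/']) would return an array with the keys home and php, each holding that page's HTML.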

// Set the default options for every URL and add each handle to the queue for processing
// ($url is an array of URLs to check; it is not defined in this excerpt)
$mh = curl_multi_init();
foreach ($url as $key => $value) {
    $ch[$key] = curl_init($value);
    curl_setopt($ch[$key], CURLOPT_NOBODY, true);
    curl_setopt($ch[$key], CURLOPT_HEADER, true);
    curl_setopt($ch[$key], CURLOPT_RETURNTRANSFER, true);
    // note: the next two lines disable TLS certificate checks (insecure)
    curl_setopt($ch[$key], CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch[$key], CURLOPT_SSL_VERIFYHOST, false);
    curl_multi_add_handle($mh, $ch[$key]);
}

// Run the requests
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

// Read the result of every request and remove it from the queue
foreach (array_keys($ch) as $key) {
    echo curl_getinfo($ch[$key], CURLINFO_HTTP_CODE);
    echo curl_getinfo($ch[$key], CURLINFO_EFFECTIVE_URL);
    echo "\n";
    curl_multi_remove_handle($mh, $ch[$key]);
}
curl_multi_close($mh);

Just for people struggling to get this to work, here is my approach:
no infinite loops, no 100% CPU usage, and the speed can be tweaked.

<?php
// curl_multi_wait() is this note's own helper, not a built-in PHP function.
// Its signature was lost from the page; the parameters and defaults below
// are a reconstruction. It waits for activity with curl_multi_select() but
// never returns sooner than $minTime seconds, so the loop cannot spin.
function curl_multi_wait($mh, $minTime = 0.001, $maxTime = 1)
{
    $umin = $minTime * 1000000; // minimum wait in microseconds

    $start_time = microtime(true);
    $num_descriptors = curl_multi_select($mh, $maxTime);
    if ($num_descriptors === -1) {
        usleep((int) $umin);
    }

    // elapsed time, converted to microseconds to match $umin
    $timespan = (microtime(true) - $start_time) * 1000000;
    if ($timespan < $umin) {
        usleep((int) ($umin - $timespan));
    }
}

$handles = [
    [
        CURLOPT_URL => "http://example.com/",
        CURLOPT_HEADER => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false,
    ],
    [
        CURLOPT_URL => "http://www.php.net",
        CURLOPT_HEADER => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false,
    ],
];

// create the multi handle and attach one easy handle per option set
$mh = curl_multi_init();
$chandles = [];
foreach ($handles as $opts) {
    $ch = curl_init();
    curl_setopt_array($ch, $opts);
    curl_multi_add_handle($mh, $ch);
    $chandles[] = $ch;
}

// The note called a helper curl_multi_exec_full() whose definition is missing
// from this page; the built-in curl_multi_exec() is used here instead.
$prevRunning = count($chandles);
do {
    $status = curl_multi_exec($mh, $running);
    // a transfer finished since the last pass: read the completion messages
    if ($running < $prevRunning) {
        while ($read = curl_multi_info_read($mh, $msgs_in_queue)) {
            $info = curl_getinfo($read['handle']);

            if ($read['result'] !== CURLE_OK) {
                print "Error: " . $info['url'] . PHP_EOL;
            }

            if ($read['result'] === CURLE_OK) {
                /*
                if (isset($info['redirect_url']) && trim($info['redirect_url']) !== '') {
                    print "running redirect: " . $info['redirect_url'] . PHP_EOL;
                    $ch3 = curl_init();
                    curl_setopt($ch3, CURLOPT_URL, $info['redirect_url']);
                    curl_setopt($ch3, CURLOPT_HEADER, 0);
                    curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1);
                    curl_setopt($ch3, CURLOPT_FOLLOWLOCATION, 0);
                    curl_multi_add_handle($mh, $ch3);
                }
                */

                print_r($info);
                //echo curl_multi_getcontent($read['handle']);
            }
        }
    }
    $prevRunning = $running;

    if ($running > 0) {
        curl_multi_wait($mh);
    }
} while ($running > 0 && $status == CURLM_OK);

foreach ($chandles as $ch) {
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
?>

