PHP: check request URI

How can I check if a URL exists via PHP?

Note: Several servers do not send any headers back (empty header), so answers that rely on headers are not guaranteed to work. The site can still exist.

22 Answers

```php
$file = 'http://www.example.com/somefile.jpg';
$file_headers = @get_headers($file);
if (!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
} else {
    $exists = true;
}
```

From here and right below the above post, there’s a curl solution:

Some websites have a different $file_headers[0] on their error pages. For example, youtube.com: its error page has the value HTTP/1.0 404 Not Found (the difference is 1.0 vs. 1.1). What to do then?

@alexandru.topliceanu The "Not Found" reason phrase is optional; developers can put whatever they want in there, and it's still valid.
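One way around both the protocol-version and the reason-phrase differences is to compare only the numeric status code. A minimal sketch, assuming the status line looks like what get_headers() returns (for example "HTTP/1.0 404 Not Found"); the helper name httpStatusCode() is hypothetical:

```php
<?php
// Hypothetical helper: extract the 3-digit status code from a status line,
// ignoring the protocol version (1.0, 1.1, ...) and the optional reason phrase.
function httpStatusCode(string $statusLine): ?int
{
    // "HTTP/<version>", whitespace, then exactly three digits.
    if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not a status line
}

$file_headers = ['HTTP/1.0 404 Not Found']; // e.g. what @get_headers($file) might return
$exists = httpStatusCode($file_headers[0]) !== 404;
var_dump($exists); // bool(false)
```

This only inspects the first status line; when get_headers() follows redirects, the array can contain several.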

When figuring out if a URL exists from PHP, there are a few things to pay attention to:

  • Is the URL itself valid (a non-empty string with good syntax)? This is quick to check server-side.
  • Waiting for a response might take time and block code execution.
  • Not all headers returned by get_headers() are well formed.
  • Use cURL (if you can).
  • Avoid fetching the entire body/content; request only the headers.
  • Consider redirecting URLs:
      • Do you want the first code returned?
      • Or follow all redirects and return the last code?
  • You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is tough.
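The first point (a quick server-side syntax check before any network I/O) could look like the sketch below. It swaps in PHP's built-in filter_var() with FILTER_VALIDATE_URL rather than a hand-written regex, and assumes only http/https URLs are of interest; the function name is hypothetical:

```php
<?php
// Hypothetical helper: cheap, local sanity check of a URL (no network access).
// Assumption: only http and https URLs should pass.
function isSyntacticallyValidHttpUrl($url): bool
{
    // Must be a non-empty string.
    if (!is_string($url) || $url === '') {
        return false;
    }
    // FILTER_VALIDATE_URL checks the general URL syntax.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL does not restrict the scheme, so check it explicitly.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(isSyntacticallyValidHttpUrl('http://www.example.com/somefile.jpg')); // bool(true)
var_dump(isSyntacticallyValidHttpUrl('stp://google.com'));                    // bool(false)
var_dump(isSyntacticallyValidHttpUrl(''));                                    // bool(false)
```

FILTER_VALIDATE_URL only accepts ASCII URLs, so internationalized domain names would have to be punycode-encoded before this check.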

Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt until you either know the result or the requests have timed out.


For example: the code below could take a LONG time to display the page if the urls are invalid or unreachable:

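```php
// $urls: an array of URLs to check (assumed to be defined earlier)
foreach ($urls as $k => $url) {
    // this could potentially take 0-30 seconds each
    // (more or less depending on connection, target site, timeout settings...)
    if (!isValidUrl($url)) {
        unset($urls[$k]);
    }
}

echo "yay all done! now show my site";
foreach ($urls as $url) {
    echo '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($url) . "</a><br>\n";
}
```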

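One way to bound that waiting time when using get_headers() is a stream context with a timeout. A minimal sketch, assuming PHP 7.1+ (where get_headers() accepts a context argument); the function name statusCodeWithTimeout() and the 5-second default are hypothetical:

```php
<?php
// Hypothetical helper: get_headers() with a HEAD request and a bounded wait.
// Returns the first status code, or null if no usable response arrived.
function statusCodeWithTimeout(string $url, float $timeoutSeconds = 5.0): ?int
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'HEAD',          // headers only, no body
            'timeout' => $timeoutSeconds, // http wrapper timeout, in seconds
        ],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false || !isset($headers[0])) {
        return null; // unreachable host, DNS failure, timeout, ...
    }
    // $headers[0] is the FIRST status line; get_headers() follows redirects
    // by default, so later entries may contain further status lines.
    if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $headers[0], $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}
```

The timeout limits how long a single check can block; the total for a whole list of URLs still grows with the number of URLs checked one after another.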
The functions below could be helpful; you will probably want to modify them to suit your needs:

```php
function isValidUrl($url) {
    // first do some quick sanity checks:
    if (!$url || !is_string($url)) {
        return false;
    }
    // quick check that the url is roughly a valid http request: ( http://blah/... )
    if (!preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url)) {
        return false;
    }
    // the next bit could be slow:
    if (getHttpResponseCode_using_curl($url) != 200) {
    // if (getHttpResponseCode_using_getheaders($url) != 200) { // use this one if you can't use curl
        return false;
    }
    // all good!
    return true;
}

function getHttpResponseCode_using_curl($url, $followredirects = true) {
    // returns int responsecode, or false (if url does not exist or connection timeout occurs)
    // NOTE: could potentially take up to 0-30 seconds, blocking further code execution
    //       (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if (!$url || !is_string($url)) {
        return false;
    }
    $ch = @curl_init($url);
    if ($ch === false) {
        return false;
    }
    @curl_setopt($ch, CURLOPT_HEADER, true);         // we want headers
    @curl_setopt($ch, CURLOPT_NOBODY, true);         // don't need body
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // catch output (do NOT print!)
    if ($followredirects) {
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        @curl_setopt($ch, CURLOPT_MAXREDIRS, 10); // fairly random number, but could prevent unwanted endless redirects with followlocation=true
    } else {
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    }
    // @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // fairly random number (seconds), but could prevent waiting forever to get a result
    // @curl_setopt($ch, CURLOPT_TIMEOUT, 6);        // fairly random number (seconds), but could prevent waiting forever to get a result
    // @curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"); // pretend we're a regular browser
    @curl_exec($ch);
    if (@curl_errno($ch)) { // should be 0
        @curl_close($ch);
        return false;
    }
    $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
    @curl_close($ch);
    return $code;
}

function getHttpResponseCode_using_getheaders($url, $followredirects = true) {
    // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
    // NOTE: could potentially take up to 0-30 seconds, blocking further code execution
    //       (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if (!$url || !is_string($url)) {
        return false;
    }
    $headers = @get_headers($url);
    if ($headers && is_array($headers)) {
        if ($followredirects) {
            // we want the last errorcode, reverse array so we start at the end:
            $headers = array_reverse($headers);
        }
        foreach ($headers as $hline) {
            // search for things like "HTTP/1.1 200 OK", "HTTP/1.0 200 OK", "HTTP/1.1 301 PERMANENTLY MOVED", "HTTP/1.1 404 Not Found", etc.
            // note that the exact syntax/version/output differs, so there is some string magic involved here
            if (preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches)) { // "HTTP/*** ### ***"
                $code = $matches[1];
                return $code;
            }
        }
        // no HTTP/xxx found in headers:
        return false;
    }
    // no headers:
    return false;
}
```
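The $followredirects logic above relies on get_headers() returning one status line per response when redirects are followed. A minimal sketch of that first-vs-last selection as a pure function, so it can be tried on a hand-written header array without network access; the name statusCodeFromHeaderList() is hypothetical:

```php
<?php
// Hypothetical helper: return the FIRST or LAST status code found in a
// get_headers()-style array (numerically indexed lines, the default format).
function statusCodeFromHeaderList(array $headers, bool $followRedirects = true): ?int
{
    $codes = [];
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently".
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m) === 1) {
            $codes[] = (int) $m[1];
        }
    }
    if ($codes === []) {
        return null; // no status line found
    }
    // Last code = final response after redirects; first code = initial response.
    return $followRedirects ? $codes[count($codes) - 1] : $codes[0];
}

$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
var_dump(statusCodeFromHeaderList($headers, false)); // int(301)
var_dump(statusCodeFromHeaderList($headers, true));  // int(200)
```

Collecting all codes first, instead of reversing the array, keeps both behaviours in one pass.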

If anyone has the same problem, check your DNS nameservers (for example, use OpenDNS with followredirects disabled): stackoverflow.com/a/11072947/1829460

+1 for being the only answer to deal with redirects. Changed the return $code to if($code == 200) return false; to sort out only successes

@PKHunter : No. My quick preg_match regex was a simple example and will not match all the urls listed in there. See this test url: regex101.com/r/EpyDDc/2 If you want a better one, replace it with the one listed on your link ( mathiasbynens.be/demo/url-regex ) from diegoperini ; it seems to match all of them, see this testlink: regex101.com/r/qMQp23/1

I'm finding that a lot of valid URLs return cURL error 60 on exec: "SSL certificate problem: unable to get local issuer certificate".

```php
$headers = @get_headers($this->_value);
if (strpos($headers[0], '200') === false) {
    return false;
}
```

So any time you contact a website and get something other than 200 OK, the check returns false.

Above on one line: return strpos(@get_headers($url)[0], '200') === false ? false : true; Might be useful.

$this in PHP is a reference to the current object. Reference: php.net/manual/en/language.oop5.basic.php Primer: phpro.org/tutorials/Object-Oriented-Programming-with-PHP.html Most likely the code snippet was taken from a class and not adapted accordingly.

If you cannot use cURL on certain servers, you can use this code:

```php
/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }

    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue', 101 => 'Switching Protocols', 102 => 'Processing',
        200 => 'OK', 201 => 'Created', 202 => 'Accepted', 203 => 'Non-Authoritative Information',
        204 => 'No Content', 205 => 'Reset Content', 206 => 'Partial Content', 207 => 'Multi-Status',
        208 => 'Already Reported', 226 => 'IM Used',
        300 => 'Multiple Choices', 301 => 'Moved Permanently', 302 => 'Found', 303 => 'See Other',
        304 => 'Not Modified', 305 => 'Use Proxy', 306 => 'Switch Proxy', 307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request', 401 => 'Unauthorized', 402 => 'Payment Required', 403 => 'Forbidden',
        404 => 'Not Found', 405 => 'Method Not Allowed', 406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required', 408 => 'Request Timeout', 409 => 'Conflict',
        410 => 'Gone', 411 => 'Length Required', 412 => 'Precondition Failed',
        413 => 'Payload Too Large', 414 => 'Request-URI Too Long', 415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable', 417 => 'Expectation Failed', 418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity', 423 => 'Locked', 424 => 'Failed Dependency',
        425 => 'Unordered Collection', 426 => 'Upgrade Required', 428 => 'Precondition Required',
        429 => 'Too Many Requests', 431 => 'Request Header Fields Too Large', 449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error', 501 => 'Not Implemented', 502 => 'Bad Gateway',
        503 => 'Service Unavailable', 504 => 'Gateway Timeout', 505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates', 507 => 'Insufficient Storage', 508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded', 510 => 'Not Extended', 511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }
    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}
```
```php
$url = 'http://google.com';
$not_url = 'stp://google.com';

if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;

if (@file_get_contents($not_url)): echo "Found '$not_url'!";
else: echo "Can't find '$not_url'.";
endif;

// Found 'http://google.com'!Can't find 'stp://google.com'.
```
```php
function URLIsValid($URL) {
    $exists = true;
    $file_headers = @get_headers($URL);
    if ($file_headers === false) {
        return false; // no response at all (unreachable host, DNS failure, ...)
    }
    $InvalidHeaders = array('404', '403', '500');
    foreach ($InvalidHeaders as $HeaderVal) {
        if (strstr($file_headers[0], $HeaderVal)) {
            $exists = false;
            break;
        }
    }
    return $exists;
}
```

The PHP manual advises against using strstr() just to check whether a substring exists; it recommends the faster and less memory-intensive strpos() instead.
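Following that advice, the loop in URLIsValid() could use strpos(). A minimal sketch that works on an already-fetched status line such as $file_headers[0], so it runs without network access; the name statusLineIsInvalid() is hypothetical:

```php
<?php
// Hypothetical rewrite of URLIsValid()'s loop using strpos() instead of strstr().
function statusLineIsInvalid(string $statusLine, array $invalidCodes = ['404', '403', '500']): bool
{
    foreach ($invalidCodes as $code) {
        // strpos() returns int|false, and a match at offset 0 returns 0,
        // so the result must be compared with !== false, not used as a boolean.
        if (strpos($statusLine, (string) $code) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(statusLineIsInvalid('HTTP/1.1 404 Not Found')); // bool(true)
var_dump(statusLineIsInvalid('HTTP/1.1 200 OK'));        // bool(false)
```

Like the original, this is a substring match on the whole line, so it is a cheap check rather than a strict comparison of the numeric status code.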

```php
function urlIsOk($url) {
    $headers = @get_headers($url);
    if ($headers === false) {
        return false; // no response at all
    }
    $httpStatus = intval(substr($headers[0], 9, 3));
    if ($httpStatus < 400) {
        return true;
    }
    return false;
}
```

Maybe the server is using HTTP/1.11? Or some HTTP version with 3+ digits? It's safer to use $httpStatus = intval(explode(" ", $headers[0], 2)[1]); (-:

karim79's get_headers() solution didn't work for me, as I got crazy results with Pinterest.

```text
get_headers(): SSL operation failed with code 1. OpenSSL Error messages:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Array ( [url] => https://www.pinterest.com/jonathan_parl/ [exists] => )
get_headers(): Failed to enable crypto
Array ( [url] => https://www.pinterest.com/jonathan_parl/ [exists] => )
get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
Array ( [url] => https://www.pinterest.com/jonathan_parl/ [exists] => )
```

Anyway, this developer demonstrates that cURL is way faster than get_headers():

Since many people asked karim79 to fix his cURL solution, here's the solution I built today.

```php
/**
 * Send an HTTP request to the $url and check the header posted back.
 *
 * @param $url String url to which we must send the request.
 * @param $failCodeList Int array list of codes for which the page is considered invalid.
 *
 * @return Boolean
 */
public static function isUrlExists($url, array $failCodeList = array(404)) {
    $exists = false;

    if (!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")) {
        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)) {
        $handle = curl_init($url);

        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($handle, CURLOPT_HEADER, true);
        curl_setopt($handle, CURLOPT_NOBODY, true);
        curl_setopt($handle, CURLOPT_USERAGENT, true); // note: true is converted to the string "1"

        $headers = curl_exec($handle);
        curl_close($handle);

        if (empty($failCodeList) or !is_array($failCodeList)) {
            $failCodeList = array(404);
        }

        if (!empty($headers)) {
            $exists = true;
            $headers = explode(PHP_EOL, $headers);
            foreach ($failCodeList as $code) {
                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false) {
                    $exists = false;
                    break;
                }
            }
        }
    }

    return $exists;
}
```

Let me explain the cURL options:

CURLOPT_RETURNTRANSFER: return the response as a string instead of printing it to the page.

CURLOPT_SSL_VERIFYPEER: cURL won't verify the peer's SSL certificate.

CURLOPT_HEADER: include the headers in the returned string.

CURLOPT_NOBODY: don't fetch the body (cURL sends a HEAD request instead of a GET).

CURLOPT_USERAGENT: some sites need a user agent to function properly (for example: https://plus.google.com).
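The same kind of options can be set in one call with curl_setopt_array(). A minimal sketch, assuming ext-curl is available (it returns null otherwise); the function name headStatusCode(), the timeout values, and the User-Agent string are hypothetical. Unlike the function above, it leaves CURLOPT_SSL_VERIFYPEER at its default (enabled):

```php
<?php
// Hypothetical helper: HEAD-only status check with cURL.
// Returns the final HTTP status code, or null if it could not be determined.
function headStatusCode(string $url, int $connectTimeout = 5, int $timeout = 10): ?int
{
    if (!function_exists('curl_init')) {
        return null; // ext-curl not available
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,            // HEAD request, no body
        CURLOPT_RETURNTRANSFER => true,            // don't print anything
        CURLOPT_FOLLOWLOCATION => true,            // report the code after redirects
        CURLOPT_MAXREDIRS      => 10,              // guard against redirect loops
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds to establish the connection
        CURLOPT_TIMEOUT        => $timeout,        // seconds for the whole transfer
        CURLOPT_USERAGENT      => 'url-check-sketch/1.0', // hypothetical UA string
    ]);
    curl_exec($ch);
    $failed = curl_errno($ch) !== 0;
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.
    return ($failed || $code === 0) ? null : $code;
}
```

If valid HTTPS URLs then fail with cURL error 60, pointing CURLOPT_CAINFO at an up-to-date CA bundle is generally preferable to disabling peer verification.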

Additional note: In this function I’m using Diego Perini’s regex for validating the URL before sending the request:

```php
const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
```

Additional note 2: I explode the header string and use $headers[0] to be sure to validate only the status code and message (example: 200, 404, 405, etc.).

Additional note 3: Sometimes validating only the code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter to supply the full list of codes to reject.

And, of course, here's the unit test (including all the popular social networks) to validate my code:

```php
public function testIsUrlExists() {
    // invalid
    $this->assertFalse(ToolManager::isUrlExists("woot"));
    $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
    $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
    $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
    $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
    $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
    $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));

    // valid
    $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
    $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
    $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}
```

Jonathan Parent-Lévesque from Montreal
