PHP: Getting Page Data

parse_url

This function parses a URL and returns an associative array containing any of the various components of the URL that are present. The values of the array elements are not URL decoded.

This function is not meant to validate the given URL; it only breaks it up into the parts listed below. Partial and invalid URLs are also accepted, and parse_url() tries its best to parse them correctly.

Parameters

url: The URL to parse.

component: Specify one of PHP_URL_SCHEME, PHP_URL_HOST, PHP_URL_PORT, PHP_URL_USER, PHP_URL_PASS, PHP_URL_PATH, PHP_URL_QUERY or PHP_URL_FRAGMENT to retrieve just a specific URL component as a string (except when PHP_URL_PORT is given, in which case the return value will be an int).

Return Values

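On seriously malformed URLs, parse_url() may return false.

If the component parameter is omitted, an associative array is returned. At least one element will be present within the array. Potential keys within this array are: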

  • scheme — e.g. http
  • host
  • port
  • user
  • pass
  • path
  • query — after the question mark ?
  • fragment — after the hashmark #

If the component parameter is specified, parse_url() returns a string (or an int, in the case of PHP_URL_PORT) instead of an array. If the requested component doesn’t exist within the given URL, null will be returned. As of PHP 8.0.0, parse_url() distinguishes absent and empty queries and fragments:

http://example.com/foo → query = null, fragment = null
http://example.com/foo? → query = "", fragment = null
http://example.com/foo# → query = null, fragment = ""
http://example.com/foo?# → query = "", fragment = ""

Previously all cases resulted in query and fragment being null.
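The absent-versus-empty distinction can be checked directly. The following is a minimal sketch that assumes PHP 8.0 or later; the helper name describe_component() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: reports how parse_url() sees a single component.
function describe_component(string $url, int $component): string
{
    $value = parse_url($url, $component);
    if ($value === false) {
        return 'malformed'; // parse_url() rejected the whole URL
    }
    if ($value === null) {
        return 'absent';    // the component does not occur in the URL
    }
    return $value === '' ? 'empty' : 'present';
}

$urls = [
    'http://example.com/foo',
    'http://example.com/foo?',
    'http://example.com/foo?#',
];
foreach ($urls as $url) {
    printf(
        "%s query: %s, fragment: %s\n",
        $url,
        describe_component($url, PHP_URL_QUERY),
        describe_component($url, PHP_URL_FRAGMENT)
    );
}
```

Comparing with === matters here: a loose comparison would treat null and "" as equal and hide the distinction.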

Note that control characters (cf. ctype_cntrl()) in the components are replaced with underscores (_).

Changelog

Version Description
8.0.0 parse_url() will now distinguish absent and empty queries and fragments.

Examples

Example #1 A parse_url() example

<?php
$url = 'http://username:password@hostname:9090/path?arg=value#anchor';

var_dump(parse_url($url));
var_dump(parse_url($url, PHP_URL_SCHEME));
var_dump(parse_url($url, PHP_URL_USER));
var_dump(parse_url($url, PHP_URL_PASS));
var_dump(parse_url($url, PHP_URL_HOST));
var_dump(parse_url($url, PHP_URL_PORT));
var_dump(parse_url($url, PHP_URL_PATH));
var_dump(parse_url($url, PHP_URL_QUERY));
var_dump(parse_url($url, PHP_URL_FRAGMENT));
?>

The above example will output:

array(8) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(8) "hostname"
  ["port"]=>
  int(9090)
  ["user"]=>
  string(8) "username"
  ["pass"]=>
  string(8) "password"
  ["path"]=>
  string(5) "/path"
  ["query"]=>
  string(9) "arg=value"
  ["fragment"]=>
  string(6) "anchor"
}
string(4) "http"
string(8) "username"
string(8) "password"
string(8) "hostname"
int(9090)
string(5) "/path"
string(9) "arg=value"
string(6) "anchor"

Example #2 A parse_url() example with missing scheme

<?php
$url = '//www.example.com/path?googleguy=googley';

// Prior to 5.4.7 this would show the path as "//www.example.com/path"
var_dump(parse_url($url));
?>

The above example will output:

array(3) {
  ["host"]=>
  string(15) "www.example.com"
  ["path"]=>
  string(5) "/path"
  ["query"]=>
  string(17) "googleguy=googley"
}

Notes

This function may not give correct results for relative or invalid URLs, and the results may not even match common behavior of HTTP clients. If URLs from untrusted input need to be parsed, extra validation is required, e.g. by using filter_var() with the FILTER_VALIDATE_URL filter.
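The extra validation step mentioned above could look like the following minimal sketch. The helper name parse_untrusted_url() is hypothetical; the sketch only combines filter_var() and parse_url() and is not a complete defence for untrusted input.

```php
<?php
// Hypothetical helper: returns the parsed components, or null when the
// input fails FILTER_VALIDATE_URL or parse_url() itself rejects it.
function parse_untrusted_url(string $url): ?array
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $parts = parse_url($url);
    return $parts === false ? null : $parts;
}

var_dump(parse_untrusted_url('http://example.com/path?arg=value'));
var_dump(parse_untrusted_url('www.example.com')); // no scheme, so validation fails
```

FILTER_VALIDATE_URL does not restrict the scheme, so an application that expects only http or https would likely also compare the parsed scheme against its own allowlist.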

Note:

This function is intended specifically for the purpose of parsing URLs and not URIs. However, to comply with PHP’s backwards compatibility requirements it makes an exception for the file:// scheme where triple slashes (file:///...) are allowed. For any other scheme this is invalid.

See Also

  • pathinfo() — Returns information about a file path
  • parse_str() — Parses the string into variables
  • http_build_query() — Generate URL-encoded query string
  • dirname() — Returns a parent directory’s path
  • basename() — Returns trailing name component of path
  • RFC 3986

User Contributed Notes 35 notes

[If you haven’t yet] been able to find a simple conversion back to string from a parsed url, here’s an example:

function unparse_url($parsed_url) {
    $scheme   = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
    $host     = isset($parsed_url['host']) ? $parsed_url['host'] : '';
    $port     = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
    $user     = isset($parsed_url['user']) ? $parsed_url['user'] : '';
    $pass     = isset($parsed_url['pass']) ? ':' . $parsed_url['pass'] : '';
    $pass     = ($user || $pass) ? "$pass@" : '';
    $path     = isset($parsed_url['path']) ? $parsed_url['path'] : '';
    $query    = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
    $fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
    return "$scheme$user$pass$host$port$path$query$fragment";
}

Here is a UTF-8 compatible parse_url() replacement function based on the work of "laszlo dot janszky at gmail dot com". The original incorrectly handled URLs with user:pass. It has also been made PHP 5.5 compatible (the now-deprecated regex /e modifier is gone).

$parts = parse_url($enc_url);

if ($parts === false) {
    throw new \InvalidArgumentException('Malformed URL: ' . $url);
}

foreach ($parts as $name => $value) {
    $parts[$name] = urldecode($value);
}
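The fragment above starts after $enc_url has already been computed. One way to fill in the missing part is sketched below, under stated assumptions: the function name parse_url_utf8() and the delimiter pattern are not taken from the fragment, and the sketch simply percent-encodes everything between URL delimiters before calling parse_url(), then decodes the results.

```php
<?php
// Sketch of a UTF-8 tolerant wrapper. The name parse_url_utf8() and the
// delimiter pattern below are assumptions made for this illustration.
function parse_url_utf8(string $url): array
{
    // Percent-encode every run of characters that is not a URL delimiter,
    // so that parse_url() only ever sees ASCII.
    $enc_url = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function (array $matches): string {
            return urlencode($matches[0]);
        },
        $url
    );

    // preg_replace_callback() returns null on error, e.g. invalid UTF-8.
    $parts = $enc_url === null ? false : parse_url($enc_url);

    if ($parts === false) {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }

    foreach ($parts as $name => $value) {
        // The port is an int; only string components need decoding.
        if (is_string($value)) {
            $parts[$name] = urldecode($value);
        }
    }

    return $parts;
}

print_r(parse_url_utf8('http://user:pass@пример.рф:8080/путь?ключ=значение'));
```

Skipping non-string values keeps the port as an int, which matches what parse_url() itself returns.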

It may be worth remembering that the value of the #fragment is never sent to the server. Anchor processing is exclusively client-side.

Here’s a function which implements resolving a relative URL according to RFC 2396 section 5.2. No doubt there are more efficient implementations, but this one tries to remain close to the standard for clarity. It relies on a function called "unparse_url" to implement section 7, left as an exercise for the reader (or you can substitute the "glue_url" function posted earlier).

/**
 * Resolve a URL relative to a base path. This happens to work with POSIX
 * filenames as well. This is based on RFC 2396 section 5.2.
 */
function resolve_url($base, $url) {
    if (!strlen($base)) return $url;
    // Step 2
    if (!strlen($url)) return $base;
    // Step 3
    if (preg_match('!^[a-z]+:!i', $url)) return $url;
    $base = parse_url($base);
    if ($url[0] == "#") {
        // Step 2 (fragment)
        $base['fragment'] = substr($url, 1);
        return unparse_url($base);
    }
    unset($base['fragment']);
    unset($base['query']);
    if (substr($url, 0, 2) == "//") {
        // Step 4: network-path reference, keep only the base scheme
        // (passing "//host/..." as a path to unparse_url() would double the slashes)
        return $base['scheme'] . ':' . $url;
    } else if ($url[0] == "/") {
        // Step 5
        $base['path'] = $url;
    } else {
        // Step 6
        $path = explode('/', $base['path']);
        $url_path = explode('/', $url);
        // Step 6a: drop file from base
        array_pop($path);
        // Step 6b, 6c, 6e: append url while removing "." and ".." from
        // the directory portion
        $end = array_pop($url_path);
        foreach ($url_path as $segment) {
            if ($segment == '.') {
                // skip
            } else if ($segment == '..' && $path && $path[sizeof($path) - 1] != '..') {
                array_pop($path);
            } else {
                $path[] = $segment;
            }
        }
        // Step 6d, 6f: remove "." and ".." from file portion
        if ($end == '.') {
            $path[] = '';
        } else if ($end == '..' && $path && $path[sizeof($path) - 1] != '..') {
            $path[sizeof($path) - 1] = '';
        } else {
            $path[] = $end;
        }
        // Step 6h
        $base['path'] = join('/', $path);
    }
    // Step 7
    return unparse_url($base);
}

Unfortunately, parse_url() does NOT correctly parse URLs without a scheme or '//'. For example, 'www.xyz.com' is treated as a path, not a host:

Code:
<?php
var_dump(parse_url('www.xyz.com'));
?>
Output:
array(1) {
  ["path"]=>
  string(11) "www.xyz.com"
}

To get better output, change the URL to:
'//www.xyz.com' or 'http://www.xyz.com'
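The workaround can be wrapped in a small helper. This is a minimal sketch that assumes the input is meant to be a web address with an optional scheme; the name parse_url_assume_host() is hypothetical.

```php
<?php
// Hypothetical helper: treats a scheme-less address such as "www.xyz.com"
// as a host by prepending "//" before parsing.
function parse_url_assume_host(string $url)
{
    // Leave the input alone if it already starts with "//" or "scheme://".
    if (!preg_match('!^([a-z][a-z0-9+.-]*:)?//!i', $url)) {
        $url = '//' . $url;
    }
    return parse_url($url);
}

var_dump(parse_url_assume_host('www.xyz.com'));
var_dump(parse_url_assume_host('http://www.xyz.com/path'));
```

The helper would mis-handle inputs that are genuinely relative paths or non-hierarchical URIs such as mailto: addresses, so it only fits cases where a host is expected.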

I was writing unit tests and needed to cause this function to kick out an error and return FALSE in order to test a specific execution path. If anyone else needs to force a failure, the following inputs will work:
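The inputs from this note are not reproduced here. As an illustration only, the sketch below uses three URLs that are assumed, not quoted from the note; each leaves the host empty after the "//", which parse_url() rejects.

```php
<?php
// Assumed inputs (not taken from the note): every one has an empty host.
$malformed = [
    'http:///example.com', // triple slash outside the file:// scheme
    'http://:80',          // port without a host
    'http://user@:80',     // user info and port without a host
];

foreach ($malformed as $url) {
    // Strict comparison: false means "seriously malformed", not "empty".
    var_dump(parse_url($url) === false);
}
```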

URLs in the query string of a relative URL will cause a problem.

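Unset a query variable from a passed-in URL or from the current URL:

function unsetqueryvar($var, $url = null) {
    if (null === $url) $url = $_SERVER['REQUEST_URI'];
    // split the URL into its components
    $url = parse_url($url);
    $rq = [];
    parse_str($url['query'] ?? '', $rq);
    unset($rq[$var]);
    // rebuild, restoring the separators that parse_url() strips
    $prefix = isset($url['scheme'], $url['host']) ? $url['scheme'] . '://' . $url['host'] : '';
    $query = $rq ? '?' . http_build_query($rq) : '';
    $fragment = isset($url['fragment']) ? '#' . $url['fragment'] : '';
    return $prefix . ($url['path'] ?? '') . $query . $fragment;
}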

I have coded a function which converts relative URL to absolute URL for a project of mine. Considering I could not find it elsewhere, I figured I would post it here.

The following function takes in 2 parameters, the first parameter is the URL you want to convert from relative to absolute, and the second parameter is a sample of the absolute URL.

Currently it does not resolve '../' in the URL, only because I do not need it. Most webservers will resolve this for you. If you want it to resolve the '../' in the path, it just takes minor modifications.

function relativeToAbsolute($inurl, $absolute) {
    // Get all parts so not getting them multiple times :)
    $absolute_parts = parse_url($absolute);
    // Test if URL is already absolute (contains host)
    // Strict comparison: strpos() returns 0, not false, for a match at the start
    if (strpos($inurl, $absolute_parts['host']) === false) {
        // Define $tmpurlprefix to prevent errors below
        $tmpurlprefix = "";
        // Formulate URL prefix (SCHEME)
        if (!(empty($absolute_parts['scheme']))) {
            // Add scheme to tmpurlprefix
            $tmpurlprefix .= $absolute_parts['scheme'] . "://";
        }
        // Formulate URL prefix (USER, PASS)
        if ((!(empty($absolute_parts['user']))) and (!(empty($absolute_parts['pass'])))) {
            // Add user:pass to tmpurlprefix
            $tmpurlprefix .= $absolute_parts['user'] . ":" . $absolute_parts['pass'] . "@";
        }
        // Formulate URL prefix (HOST, PORT)
        if (!(empty($absolute_parts['host']))) {
            // Add host to tmpurlprefix
            $tmpurlprefix .= $absolute_parts['host'];
            // Check for a port, add if exists
            if (!(empty($absolute_parts['port']))) {
                // Add port to tmpurlprefix
                $tmpurlprefix .= ":" . $absolute_parts['port'];
            }
        }
        // Formulate URL prefix (PATH) and only add it if the path to image does not start with /
        if ((!(empty($absolute_parts['path']))) and (substr($inurl, 0, 1) != '/')) {
            // Get path parts
            $path_parts = pathinfo($absolute_parts['path']);
            // Add path to tmpurlprefix
            $tmpurlprefix .= $path_parts['dirname'];
            $tmpurlprefix .= "/";
        }
        else {
            $tmpurlprefix .= "/";
        }
        // Lets remove the '/'
        if (substr($inurl, 0, 1) == '/') { $inurl = substr($inurl, 1); }
        // Lets remove the './'
        if (substr($inurl, 0, 2) == './') { $inurl = substr($inurl, 2); }
        return $tmpurlprefix . $inurl;
    }
    else {
        // Path is already absolute. Return it :)
        return $inurl;
    }
}

// Define a sample absolute URL
$absolute = "http://" . "user:pass@example.com:8080/path/to/index.html"; // Just evading php.net spam filter, not sure how example.com is spam.

/* EXAMPLE 1 */
echo relativeToAbsolute($absolute, $absolute) . "\n";
/* EXAMPLE 2 */
echo relativeToAbsolute("img.gif", $absolute) . "\n";
/* EXAMPLE 3 */
echo relativeToAbsolute("/img.gif", $absolute) . "\n";
/* EXAMPLE 4 */
echo relativeToAbsolute("./img.gif", $absolute) . "\n";
/* EXAMPLE 5 */
echo relativeToAbsolute("../img.gif", $absolute) . "\n";
/* EXAMPLE 6 */
echo relativeToAbsolute("images/img.gif", $absolute) . "\n";
/* EXAMPLE 7 */
echo relativeToAbsolute("/images/img.gif", $absolute) . "\n";
/* EXAMPLE 8 */
echo relativeToAbsolute("./images/img.gif", $absolute) . "\n";
/* EXAMPLE 9 */
echo relativeToAbsolute("../images/img.gif", $absolute) . "\n";

?>

OUTPUTS:
http://user:pass@example.com:8080/path/to/index.html
http://user:pass@example.com:8080/path/to/img.gif
http://user:pass@example.com:8080/img.gif
http://user:pass@example.com:8080/path/to/img.gif
http://user:pass@example.com:8080/path/to/../img.gif
http://user:pass@example.com:8080/path/to/images/img.gif
http://user:pass@example.com:8080/images/img.gif
http://user:pass@example.com:8080/path/to/images/img.gif
http://user:pass@example.com:8080/path/to/../images/img.gif

Sorry if the above code is not your style, or if you see it as "messy" or you think there is a better way to do it. I removed as much of the white space as possible.
