Parsing domain from a URL
@LightnessRacesinOrbit This is a bit more than just «looking in the manual». PHP’s parse_url() returns the host, not the domain.
@w3dk: It would still have been a fantastic starting point, allowing this question to be about that limitation of parse_url rather than a vague «what can I do».
@LightnessRacesinOrbit your defense is disingenuous given your reputation — more simply you can admit that you did not read the question completely
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'; $parse = parse_url($url); echo $parse['host']; // prints 'google.com'
parse_url doesn’t handle really badly mangled urls very well, but is fine if you generally expect decent urls.
One thing parse_url() does not do is only return the domain. If you add www.google.com or www.google.co.uk, it will return the host as well. Any suggestions for that?
parse_url() would possibly parse URLs with a domain that contains dashes wrongly. Could not find definite proof, but check out this bug. FILTER_VALIDATE_URL uses parse_url() internally.
Or simply: print parse_url($url, PHP_URL_HOST); if you don’t need the $parse array for anything else.
$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
This returns google.com for both http://google.com/ and http://www.google.com/.
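One caveat with the str_ireplace() approach: it removes "www." wherever it appears in the host, not only at the start. A minimal sketch of an anchored alternative follows; the helper name hostWithoutWww is made up for illustration and is not part of any answer above.

```php
<?php
// Hypothetical helper: returns the host of $url without a leading "www." label.
function hostWithoutWww(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host found (e.g. the scheme is missing) or the URL is malformed
    }
    // The ^ anchor ensures only a leading "www." is removed.
    return preg_replace('/^www\./i', '', $host);
}

echo hostWithoutWww('http://www.google.com/'), "\n";   // google.com
echo hostWithoutWww('http://awww.example.com/'), "\n"; // awww.example.com
```

With str_ireplace('www.', '', ...) the second host would come back as aexample.com, which is probably not what you want.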
For some odd reason, parse_url returns the host (e.g. example.com) as the path when no scheme is provided in the input URL. So I’ve written a quick function to get the real host:
function getHost($Address) {
    $parseUrl = parse_url(trim($Address));
    if (!empty($parseUrl['host'])) {
        return trim($parseUrl['host']);
    }
    // Without a scheme, parse_url() puts the host at the start of 'path'.
    $parts = explode('/', $parseUrl['path'] ?? '', 2);
    return trim($parts[0]);
}

getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com
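function get_domain($url = SITE_URL) {
    // Match the last two labels of the host, e.g. "cdl.gr" in "www2.cdl.gr".
    $host = (string) parse_url($url, PHP_URL_HOST);
    if (preg_match("/[a-z0-9\-]+\.[a-z]+$/", $host, $_domain_tld)) {
        return $_domain_tld[0];
    }
    return ''; // no host found, e.g. when the URL has no scheme
}

get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr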
Not working for me either: example.com // Incorrect: empty string example.com // Correct: example.com www.example.com // Incorrect: empty string example.com/xyz // Correct: example.com
This is a great answer and deserves more credit. Just add this line as the first line in the function and it also solves the problems of MangeshSathe and jenlampton: if ((substr($url, 0, strlen('http://')) <> 'http://') && (substr($url, 0, strlen('https://')) <> 'https://')) $url = 'http://' . $url;
The code that was meant to work 100% of the time didn’t cut it for me. I patched the example a little, but found code that wasn’t helping and other problems with it, so I replaced it with a couple of functions (to avoid requesting the list from Mozilla every time, and to remove the cache system). This has been tested against a set of 1000 URLs and seemed to work.
function domain($url) {
    global $subtlds;
    $url = strtolower($url);
    $host = parse_url('http://' . $url, PHP_URL_HOST);
    // Default: keep the last two labels (example.com).
    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach ($subtlds as $sub) {
        // If the host ends in a known multi-part suffix (co.uk), keep three labels.
        if (preg_match('/\.' . preg_quote($sub) . '$/', $host)) {
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        }
    }
    return $matches[0] ?? null;
}

function get_tlds() {
    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    $content = file($address);
    $subtlds = array();
    foreach ($content as $num => $line) {
        $line = trim($line);
        if ($line == '') continue;
        if ($line[0] == '/') continue; // skip comment lines
        $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
        if ($line == '') continue;
        if ($line[0] == '.') $line = substr($line, 1); // strip the dot left over from wildcard entries
        if (!strstr($line, '.')) continue; // keep only multi-part suffixes
        $subtlds[] = $line;
    }
    $subtlds = array_merge(array(
        'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk',
        'asn.au', 'com.au', 'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
    ), $subtlds);
    $subtlds = array_unique($subtlds);
    return $subtlds;
}
$subtlds = get_tlds();
echo domain('www.example.com'); // outputs: example.com
echo domain('www.example.uk.com'); // outputs: example.uk.com
echo domain('www.example.fr'); // outputs: example.fr
I know I should have turned this into a class, but didn’t have time.
Please consider replacing the accepted solution with the following:
parse_url() will always include any sub-domain(s), so this function doesn’t parse domain names very well. Here are some examples:
$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'
echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST); // Output: subdomain.example.com
echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST); // Output: subdomain.example.co.uk
Instead, you may consider this pragmatic solution. It will cover many, but not all domain names — for instance, lower-level domains such as ‘sos.state.oh.us’ are not covered.
function getDomain($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (filter_var($host, FILTER_VALIDATE_IP)) {
        // IP address returned as domain
        return $host; //* or replace with null if you don't want an IP back
    }
    $domain_array = explode(".", str_replace('www.', '', $host));
    $count = count($domain_array);
    if ($count >= 3 && strlen($domain_array[$count - 2]) == 2) {
        // SLD (example.co.uk)
        return implode('.', array_splice($domain_array, $count - 3, 3));
    } else if ($count >= 2) {
        // TLD (example.com)
        return implode('.', array_splice($domain_array, $count - 2, 2));
    }
}

// Your domains
echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk

// TLD
echo getDomain('https://shop.example.com'); // example.com
echo getDomain('https://foo.bar.example.com'); // example.com
echo getDomain('https://www.example.com'); // example.com
echo getDomain('https://example.com'); // example.com

// SLD
echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://bbc.co.uk'); // bbc.co.uk

// IP
echo getDomain('https://1.2.3.45'); // 1.2.3.45
Finally, Jeremy Kendall’s PHP Domain Parser allows you to parse the domain name from a url. League URI Hostname Parser will also do the job.
How to get the current domain name using PHP
The $_SERVER array is a global variable that contains server and header information.
To get the domain name where you run PHP code, you need to access the SERVER_NAME or HTTP_HOST index from the $_SERVER array.
Suppose your website has the URL of https://example.com/post/1 . Here’s how you get the domain name:
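A minimal sketch of reading the SERVER_NAME index is shown here. The serverName() wrapper is a hypothetical helper, not something PHP provides. It takes the array as a parameter so the code can also be tried from the command line, where SERVER_NAME is usually not set.

```php
<?php
// Hypothetical helper: returns the SERVER_NAME entry, or null when it is absent.
function serverName(array $server): ?string
{
    return $server['SERVER_NAME'] ?? null;
}

// In a real request you would pass the $_SERVER superglobal.
$domain = serverName($_SERVER);
echo $domain ?? '(SERVER_NAME is not set)', "\n";
```

For a request to https://example.com/post/1, you would expect this to print example.com, assuming the server is configured with that name.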
If you are using Apache 2, you need to set the UseCanonicalName directive to On and set the ServerName directive.
Otherwise, the SERVER_NAME value reflects the hostname supplied by the client, which can be spoofed.
Aside from the SERVER_NAME index, you can also get the domain name using HTTP_HOST like this:
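A minimal sketch of reading the HTTP_HOST index follows. The httpHost() wrapper is a hypothetical helper used only so the example can run outside a web request.

```php
<?php
// Hypothetical helper: returns the raw HTTP_HOST entry, or null when it is absent.
// The value is copied from the client's Host header and may include a port,
// for example "example.com:8080".
function httpHost(array $server): ?string
{
    return $server['HTTP_HOST'] ?? null;
}

// In a real request you would pass the $_SERVER superglobal.
$host = httpHost($_SERVER);
echo $host ?? '(HTTP_HOST is not set)', "\n";
```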
The difference is that HTTP_HOST is controlled from the browser, while SERVER_NAME is controlled from the server.
If you need the domain name for business logic, you should use SERVER_NAME because it is more secure.
Finally, you can combine SERVER_NAME with HTTPS and REQUEST_URI indices to get your website’s complete URL.
See the code example below:
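One way to combine the three indices is sketched here. The currentUrl() helper and its fallback values are assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds the full URL from the HTTPS, SERVER_NAME
// and REQUEST_URI entries of the given server array.
function currentUrl(array $server): string
{
    // HTTPS is non-empty (and not "off") when the request came in over HTTPS.
    $isHttps = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme  = $isHttps ? 'https' : 'http';
    $host    = $server['SERVER_NAME'] ?? 'localhost'; // assumed fallback for CLI runs
    $path    = $server['REQUEST_URI'] ?? '/';         // assumed fallback for CLI runs

    return $scheme . '://' . $host . $path;
}

// In a real request you would pass the $_SERVER superglobal.
echo currentUrl($_SERVER), "\n";
```

Note that this sketch leaves out the port, so it only fits sites served on the default ports.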
Now you’ve learned how to get the domain name of your website using PHP. Nice!
Get current domain
@TonyEvyght that’s the point infgeoax and I are trying to make: you should get the host name you’re connecting with in $_SERVER['HTTP_HOST']. If the sites one.com and two.com are «redirecting» using an (i)frame, the page itself still comes from myserver.uk.com, so you won’t get the real domain. What is the HTML source for one.com?
-1: With this answer alone, I do not know exactly what the different suggestions I am looking at do. Sure, this gives me a point to continue looking from, but by itself this is really not a good answer.
@SarahLewis HTTP_X_ORIGINAL_HOST can be modified by the user, and cannot be trusted. This may not always be a problem, but it’s something to be aware of.
@Sumurai8 Can you share more information on that? How can it be spoofed? As far as I know a visitor cannot change it. Some other script may change it, yes, but that goes with all other environment variables.
The only secure way of doing this
The only guaranteed secure method of retrieving the current domain is to store it in a secure location yourself.
Most frameworks take care of storing the domain for you, so you will want to consult the documentation for your particular framework. If you’re not using a framework, consider storing the domain in one of the following places:
| Secure methods of storing the domain | Used By |
|---|---|
| A configuration file | Joomla, Drupal/Symfony |
| The database | WordPress |
| An environment variable | Laravel |
| A service registry | Kubernetes DNS |
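If you’re not on a framework, the environment variable option could look roughly like the sketch below. The variable name APP_DOMAIN and the configuredDomain() helper are assumptions made for illustration, not a convention of any of the systems in the table.

```php
<?php
// Hypothetical helper: reads the site domain from an environment array.
// APP_DOMAIN is an assumed variable name; use whatever your deployment defines.
function configuredDomain(array $env): string
{
    $domain = $env['APP_DOMAIN'] ?? '';
    if ($domain === '') {
        // Fail loudly rather than fall back to a request header.
        throw new RuntimeException('APP_DOMAIN is not configured');
    }
    return strtolower($domain);
}
```

In an application you would call it as configuredDomain(getenv()), since getenv() without arguments returns all environment variables as an array.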
The following work… but they’re not secure
Hackers can make the following variables output whatever domain they want. This can lead to cache poisoning and barely noticeable phishing attacks.
This gets the domain from the request headers which are open to manipulation by hackers. Same with:
This one can be made better if the Apache setting UseCanonicalName is turned on, in which case $_SERVER['SERVER_NAME'] will no longer be allowed to be populated with arbitrary values and will be secure. This is, however, non-default and not as common a setup.
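If you do have to read the host from the request, one mitigation is to accept it only when it matches a list of domains you stored yourself. This is a rough sketch, assuming a hypothetical trustedHost() helper and an allow-list that comes from your own configuration:

```php
<?php
// Hypothetical helper: returns the requested host only if it is on the allow-list.
function trustedHost(array $server, array $allowedHosts): ?string
{
    $host = strtolower($server['HTTP_HOST'] ?? '');
    $host = preg_replace('/:\d+$/', '', $host); // drop an optional port

    // Strict comparison against values you control; anything else is rejected.
    return in_array($host, $allowedHosts, true) ? $host : null;
}

// Usage: the allow-list should come from configuration, never from the request.
$host = trustedHost($_SERVER, ['example.com', 'www.example.com']);
if ($host === null) {
    // Reject the request or fall back to the stored canonical domain here.
}
```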
In popular systems
Below is how you can get the current domain in the following frameworks/systems:
$urlparts = wp_parse_url(home_url()); $domain = $urlparts['host'];
If you’re constructing a URL in WordPress, just use home_url or site_url, or any of the other URL functions.
The request()->getHost function is inherited from Symfony, and has been secure since CVE-2013-4752 was patched in 2013.
The installer does not yet take care of making this secure (issue #2404259). But in Drupal 8 there is documentation you can follow at Trusted Host Settings to secure your Drupal installation after which the following can be used:
Other frameworks
Feel free to edit this answer to include how to get the current domain in your favorite framework. When doing so, please include a link to the relevant source code or to anything else that would help me verify that the framework is doing things securely.
Addendum
- Cache poisoning can happen if a botnet continuously requests a page using the wrong Host header. The resulting HTML will then include links to the attacker’s website, where they can phish your users. At first the malicious links will only be sent back to the hacker, but if the hacker makes enough requests, the malicious version of the page will end up in your cache, where it will be distributed to other users.
- A phishing attack can happen if you store links in the database based on the Host header. For example, let’s say you store the absolute URL to a user’s profile on a forum. By using the wrong header, a hacker could get anyone who clicks on their profile link to be sent to a phishing site.
- Password reset poisoning can happen if a hacker uses a malicious Host header when filling out the password reset form for a different user. That user will then get an email containing a password reset link that leads to a phishing site. Another, more complex form of this skips the user having to do anything by getting the email to bounce and resend to one of the hacker’s SMTP servers (for example, CVE-2017-8295).
- Here are some more malicious examples
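To make the password reset case concrete, here is a minimal sketch of building the reset link from a stored domain instead of the Host header. The passwordResetUrl() helper and the /reset-password path are made up for illustration.

```php
<?php
// Hypothetical helper: builds a password reset link from a domain you configured
// yourself, so a forged Host header cannot change where the link points.
function passwordResetUrl(string $configuredDomain, string $token): string
{
    return 'https://' . $configuredDomain . '/reset-password?token=' . rawurlencode($token);
}

echo passwordResetUrl('example.com', 'abc 123'), "\n";
// https://example.com/reset-password?token=abc%20123
```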
Additional Caveats and Notes:
- When UseCanonicalName is turned off, $_SERVER['SERVER_NAME'] is populated with the same header $_SERVER['HTTP_HOST'] would have used anyway (plus the port). This is Apache’s default setup. If you or DevOps turn this on then you’re okay-ish, but do you really want to rely on a separate team, or yourself three years in the future, to keep what would appear to be a minor configuration at a non-default value? Even though this makes things secure, I would caution against relying on this setup.
- Red Hat, however, does turn UseCanonicalName on by default [source].
- If ServerAlias is used in the virtual hosts entry, and the aliased domain is requested, $_SERVER['SERVER_NAME'] will not return the current domain, but will return the value of the ServerName directive.
- If the serverName cannot be resolved, the operating system’s hostname command is used in its place [source].
- If the Host header is left out, the server will behave as if UseCanonicalName were on [source].
- Lastly, I just tried exploiting this on my local server, and was unable to spoof the Host header. I’m not sure if there was an update to Apache that addressed this, or if I was just doing something wrong. Regardless, this header would still be exploitable in environments where virtual hosts are not being used.
This question received hundreds of thousands of views without a single mention of the security problems at hand! It shouldn’t be this way, but just because a Stack Overflow answer is popular, that doesn’t mean it is secure.