Get HTML code using JavaScript with a URL
I am trying to get the HTML source of a page from its URL using XMLHttpRequest. How can I do that? I am new to programming and I am not sure how to do it without jQuery.
You may want to look into the same-origin policy. Just search on SO and you will find tons of info.
But is there any other way of going about this, like not using XMLHttpRequest? With just JavaScript?
No. XMLHttpRequest and iframes are the only ways (the newer fetch API follows the same rules), and all of them are limited by the same-origin policy. If you want to get around this, the remote server needs to cooperate, either by serving JSONP or by sending a special header such as Access-Control-Allow-Origin with the data it serves.
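The "origin" the policy compares is the scheme, host and port of a URL, and all three must match. A rough sketch of that comparison using the standard URL class (isSameOrigin is a hypothetical helper, not a browser API; real browsers also handle special cases such as file: URLs differently):

```javascript
// Hypothetical helper: returns true when targetUrl has the same origin
// (scheme + host + port) as pageUrl. In a browser, pageUrl would
// normally be window.location.href.
function isSameOrigin(targetUrl, pageUrl) {
  // Relative URLs such as "../other.html" are resolved against the page URL.
  const target = new URL(targetUrl, pageUrl);
  return target.origin === new URL(pageUrl).origin;
}

console.log(isSameOrigin("../other.html", "https://example.com/a/b.html")); // true
console.log(isSameOrigin("https://www.google.com/", "https://example.com/")); // false
console.log(isSameOrigin("http://example.com/", "https://example.com/")); // false (scheme differs)
```

Only the same-origin case can be read by plain XMLHttpRequest or fetch without the remote server's cooperation.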
Without jQuery (just JavaScript):
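function makeHttpObject() {
  try { return new XMLHttpRequest(); }
  catch (error) {}
  // Fallbacks for old Internet Explorer versions
  try { return new ActiveXObject("Msxml2.XMLHTTP"); }
  catch (error) {}
  try { return new ActiveXObject("Microsoft.XMLHTTP"); }
  catch (error) {}

  throw new Error("Could not create HTTP request object.");
}

var request = makeHttpObject();
request.open("GET", "your_url", true);
// Attach the handler before send() so no state change is missed
request.onreadystatechange = function() {
  // readyState 4: the response has been fully received
  if (request.readyState == 4)
    alert(request.responseText);
};
request.send(null);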
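The same request can also be wrapped in a Promise using the load and error events instead of checking readyState. A minimal sketch, assuming a browser with a native XMLHttpRequest (no ActiveX fallback); getText is a hypothetical name and "your_url" is a placeholder:

```javascript
// Hypothetical helper: resolves with the response text, rejects on
// HTTP errors or network failures.
function getText(url) {
  return new Promise((resolve, reject) => {
    const request = new XMLHttpRequest();
    request.open("GET", url, true);
    // load fires once the whole response has arrived (readyState 4).
    request.onload = () => {
      if (request.status >= 200 && request.status < 300) {
        resolve(request.responseText);
      } else {
        reject(new Error(`HTTP ${request.status}`));
      }
    };
    // error fires on network failures, including blocked cross-origin requests.
    request.onerror = () => reject(new Error("Network error"));
    request.send();
  });
}

// Usage in a browser:
// getText("your_url").then(alert, (error) => alert(error.message));
```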
@Senad Meskin Thanks for your answer, but is it possible to do it with jQuery? I was wondering if there are other methods to do it.
No, it's not possible. The only thing you can do is call your own URL, and in your server-side code request www.google.com and write google.com's content into the response.
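That server-side workaround can be sketched in Node.js. This is an assumption: the comment does not name a server language, the sketch needs Node 18+ for the global fetch, and createProxy is a hypothetical name. The browser only talks to your own server, and the server fetches the remote page, where the same-origin policy does not apply:

```javascript
const http = require("node:http");

// Hypothetical helper: an HTTP server that answers every request with the
// content of one fixed remote page (e.g. "https://www.google.com/").
function createProxy(targetUrl) {
  return http.createServer(async (req, res) => {
    try {
      const upstream = await fetch(targetUrl);
      const body = await upstream.text();
      res.writeHead(upstream.status, {
        "Content-Type": upstream.headers.get("content-type") ?? "text/html; charset=utf-8",
      });
      res.end(body);
    } catch (error) {
      // The remote server could not be reached.
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end(`Bad gateway: ${error.message}`);
    }
  });
}

// Usage (not run here): createProxy("https://www.google.com/").listen(8080);
```

In a real app this handler would live on the same server (scheme, host and port) that serves your page, so the browser's request stays same-origin. This version forwards one fixed URL; proxying arbitrary user-supplied URLs would need validation to avoid creating an open proxy.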
fetch('some_url')
  .then(function (response) {
    switch (response.status) {
      // status "OK"
      case 200:
        return response.text();
      // status "Not Found", or any other non-200 status
      case 404:
      default:
        throw response;
    }
  })
  .then(function (template) {
    console.log(template);
  })
  .catch(function (response) {
    // "Not Found" (or another failed Response), or a network error
    console.log(response.statusText || response);
  });
Asynchronous with arrow function version:
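One way to write the request above with async/await and an arrow function is the following sketch. It assumes a global fetch (modern browsers, Node 18+); getHtml is a hypothetical name and "some_url" is a placeholder:

```javascript
const getHtml = async (url) => {
  const response = await fetch(url);
  // response.ok is true only for 2xx statuses, so 404 "Not Found" and
  // every other error status are thrown here.
  if (!response.ok) {
    throw new Error(`${response.status} ${response.statusText}`);
  }
  return response.text();
};

(async () => {
  try {
    const html = await getHtml("some_url");
    console.log(html);
  } catch (error) {
    // An HTTP error thrown above, or a network failure from fetch itself.
    console.log(error.message);
  }
})();
```

Checking response.ok instead of switching on individual codes keeps every non-2xx status on the error path.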
There is a tutorial on how to use Ajax here: https://www.w3schools.com/xml/ajax_intro.asp
This is an example code taken from that tutorial:
I had problems with the fetch API: it seems that an async function always returns a promise, even when it returns text ("return await response.text();"), and to get at the text you have to handle that promise with .then (or await it inside another async function).
It uses the $.ajax() function, so it requires jQuery.
First, you must know that JavaScript on your page cannot read the source code of a page on a different domain unless that server explicitly allows it, for example with CORS headers (see http://en.wikipedia.org/wiki/Same_origin_policy).
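A small illustration of why that happens: an async function always wraps its return value in a Promise, so callers must use .then or await. The fakeResponse object is a hypothetical stand-in for a fetch Response so the sketch runs without a network:

```javascript
// Stand-in for a fetch Response; real code would use: await fetch(url)
const fakeResponse = { text: async () => "<html>...</html>" };

// An async function always returns a Promise, even when its body
// returns a plain string.
async function readBody(response) {
  return await response.text();
}

const result = readBody(fakeResponse);
console.log(result instanceof Promise); // true

// Option 1: consume the Promise with .then
result.then((html) => console.log("via then:", html));

// Option 2: await it inside another async function
(async () => {
  const html = await readBody(fakeResponse);
  console.log("via await:", html);
})();
```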
In PHP, this is how you do it:
In JavaScript, there are three ways:
Firstly, by XMLHttpRequest: http://jsfiddle.net/635YY/1/
var url = "../635YY", xmlhttp; // Remember, same domain
if ("XMLHttpRequest" in window) xmlhttp = new XMLHttpRequest();
else if ("ActiveXObject" in window) xmlhttp = new ActiveXObject("Msxml2.XMLHTTP"); // old IE only
xmlhttp.open('GET', url, true);
xmlhttp.onreadystatechange = function() {
  if (xmlhttp.readyState == 4) alert(xmlhttp.responseText);
};
xmlhttp.send(null);
Secondly, by an iframe:
var url = "../XYjuX"; // Remember, same domain
var iframe = document.createElement("iframe");
iframe.onload = function() {
  alert(iframe.contentWindow.document.body.innerHTML);
};
iframe.src = url;
iframe.style.display = "none";
document.body.appendChild(iframe);
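The iframe approach can also be wrapped in a Promise that removes the hidden frame afterwards and rejects when the frame is cross-origin. A sketch; getIframeHtml is a hypothetical name, and the doc parameter (defaulting to the page's document) exists only so it can be exercised outside a browser:

```javascript
// Hypothetical helper: loads url in a hidden iframe and resolves with its HTML.
function getIframeHtml(url, doc = document) {
  return new Promise((resolve, reject) => {
    const iframe = doc.createElement("iframe");
    iframe.style.display = "none";
    iframe.onload = () => {
      try {
        // contentDocument is null for a cross-origin frame, so reading
        // documentElement throws and the promise rejects instead.
        resolve(iframe.contentDocument.documentElement.outerHTML);
      } catch (error) {
        reject(error);
      } finally {
        iframe.remove(); // clean up the hidden frame either way
      }
    };
    iframe.src = url; // must be same-origin
    doc.body.appendChild(iframe);
  });
}

// Usage in a browser:
// getIframeHtml("../XYjuX").then(alert, (error) => alert("Blocked: " + error.message));
```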
Thirdly, by jQuery: http://jsfiddle.net/edggD/2/
$.get('../edggD', function(data) { // Remember, same domain
  alert(data);
});
get web page text via javascript [closed]
You could do it with Ranges/TextRanges. This has the advantage of only getting the visible text on the page (unlike, for example, the textContent property of elements in non-IE browsers, which will also get you the contents of <script> and possibly other elements). The following will work in all mainstream browsers, although I can’t make any guarantees about the consistency of line breaks between different browsers.
UPDATE November 2012
I don’t think this is a good idea these days. While Selection is now specified, its toString() method is not, and for some time (including when Microsoft were implementing it for IE 9) it was specified to behave like textContent. For this particular method, browser consistency has got worse rather than better since 2009.
function getBodyText(win) {
  var doc = win.document, body = doc.body, selection, range, bodyText;
  if (body.createTextRange) {
    // Old IE: a TextRange exposes the visible text directly
    return body.createTextRange().text;
  } else if (win.getSelection) {
    // Other browsers: select the whole body and read the selection's text
    selection = win.getSelection();
    range = doc.createRange();
    range.selectNodeContents(body);
    selection.removeAllRanges(); // clear any existing selection first
    selection.addRange(range);
    bodyText = selection.toString();
    selection.removeAllRanges();
    return bodyText;
  }
}
alert( getBodyText(window) );
How to get the entire document HTML as a string?
Stop upvoting John’s bolded comment! The answer he links to replaces && with &amp;&amp;, and so it breaks any inline scripts! You should use document.documentElement.outerHTML instead, but note that it doesn’t include the <!DOCTYPE> declaration, so you’ll need to add that yourself.
Get the root element with document.documentElement, then get its .innerHTML:
const txt = document.documentElement.innerHTML; alert(txt);
or its .outerHTML to get the <html> tag as well
const txt = document.documentElement.outerHTML; alert(txt);
Worked like a charm, thank you! Is there any way to get the size of any/all files linked to the document as well, including JS and CSS files?
@CMCDragonkai: You could get the doctype separately and prepend it to the markup string. Not ideal, I know, but possible.
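A sketch of that idea, with hypothetical helper names: document.doctype is the page's DocumentType node (or null), and its name, publicId and systemId properties are enough to rebuild the declaration, which is then prepended to document.documentElement.outerHTML. The demo uses a stand-in object so it runs outside a browser:

```javascript
// Rebuilds a DOCTYPE string from a DocumentType-like object.
function doctypeToString(doctype) {
  if (!doctype) return ""; // the document has no doctype
  let text = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    text += ` PUBLIC "${doctype.publicId}"`;
  } else if (doctype.systemId) {
    text += " SYSTEM";
  }
  if (doctype.systemId) {
    text += ` "${doctype.systemId}"`;
  }
  return text + ">";
}

// Prepends the doctype (plus a newline) to the root element's markup.
// In a browser: getFullHtml(document)
function getFullHtml(doc) {
  const doctype = doctypeToString(doc.doctype);
  return (doctype ? doctype + "\n" : "") + doc.documentElement.outerHTML;
}

const fakeDoc = {
  doctype: { name: "html", publicId: "", systemId: "" },
  documentElement: { outerHTML: "<html><head></head><body></body></html>" },
};
console.log(getFullHtml(fakeDoc));
// <!DOCTYPE html>
// <html><head></head><body></body></html>
```

In browsers, new XMLSerializer().serializeToString(document.doctype) produces a similar doctype string.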
Note that none of these answers necessarily gives you content that is byte-for-byte identical (same hash) to saving the page to a file or to the source shown by view-source. The DOM normalizes some parts of the literal response content, such as capitalizing DOCTYPE declarations.
new XMLSerializer().serializeToString(document)
in browsers newer than IE 9
This was the first correct answer according to the date/time stamps. Parts of the page such as the XML declaration will not be included, and browsers will manipulate the code when using the other "answers". This is the only post that should be upvoted (dos’s was posted three days later). People need to pay attention!
This is not entirely correct, since serializeToString performs HTML encoding. For example, if your code contains styles defining fonts such as "Times New Roman", Times, serif, the quotes will get HTML-encoded. Perhaps that is not important to some of you, but to me it is.
@John Well, the OP actually asks for "the entire HTML within the html tags", and the accepted answer by Colin Burnett does achieve this. This particular answer (Erik’s) will include the html tags and the doctype. That said, this was totally a diamond in the rough for me and exactly what I was looking for! Your comment helped too, because it made me spend more time with this answer, so thanks 🙂
I think people should be careful with this one, specifically because it returns a value that is not the actual html that your browser receives. In my case, it added attributes to the html tag that the server never actually sent 🙁
I tried the various answers to see what is returned. I’m using the latest version of Chrome.
The suggestion document.documentElement.innerHTML; returned .