Get HTML code using JavaScript with a URL
I am trying to get the source code of an HTML page using an XMLHttpRequest with a URL. How can I do that? I am new to programming and I am not too sure how I can do it without jQuery.
You may want to look into the problem of the same origin policy. Just search on SO and you will find tons of info.
But is there any other way of going about this? Like not using XMLHttpRequest, with just JavaScript?
No. XMLHttpRequest and iframes are the only way, and both are limited by the same-origin policy. If you want to get around this, the remote server needs to cooperate (by serving the data as JSONP, or by putting a special header on the data it serves).
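For illustration (not from the original thread), a minimal sketch of the JSONP approach mentioned above; the endpoint https://example.com/data and the callback parameter name are assumptions and only work if the remote server actually supports JSONP:

// JSONP sketch: the remote server must wrap its response in the named callback.
function handleData(data) {
  // Invoked by the script the server returns
  console.log(data);
}

var script = document.createElement('script');
// Hypothetical endpoint; a real JSONP service documents its own callback parameter
script.src = 'https://example.com/data?callback=handleData';
document.body.appendChild(script);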
7 Answers
Without jQuery (just JavaScript):
function makeHttpObject() {
  // Try the standard object first, then older IE ActiveX fallbacks
  try { return new XMLHttpRequest(); }
  catch (error) {}
  try { return new ActiveXObject("Msxml2.XMLHTTP"); }
  catch (error) {}
  try { return new ActiveXObject("Microsoft.XMLHTTP"); }
  catch (error) {}

  throw new Error("Could not create HTTP request object.");
}

var request = makeHttpObject();
request.open("GET", "your_url", true);
request.send(null);
request.onreadystatechange = function () {
  if (request.readyState == 4)
    alert(request.responseText);
};
@Senad Meskin thanks for your answer, but is it possible to do it with jQuery? I was wondering if there are other methods to do it.
No, it's not possible. The only thing you can do is call your own URL, and in your server-side code call www.google.com and write the content of google.com to the response.
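As a rough sketch of that server-side idea (the comment above gives no code; Node.js and the port number are assumptions here), your page would call your own server, and the server would fetch google.com and write the HTML back:

// Minimal Node.js relay: the browser requests this server, which fetches
// www.google.com and writes the HTML back to the response.
const http = require('http');
const https = require('https');

http.createServer(function (req, res) {
  https.get('https://www.google.com/', function (remote) {
    res.writeHead(remote.statusCode, { 'Content-Type': 'text/html' });
    remote.pipe(res); // stream the remote HTML back to the client
  }).on('error', function (err) {
    res.writeHead(502);
    res.end('Proxy error: ' + err.message);
  });
}).listen(8080); // example port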
fetch('some_url')
  .then(function (response) {
    switch (response.status) {
      // status "OK"
      case 200:
        return response.text();
      // status "Not Found"
      case 404:
        throw response;
    }
  })
  .then(function (template) {
    console.log(template);
  })
  .catch(function (response) {
    // "Not Found"
    console.log(response.statusText);
  });
Asynchronous with arrow function version:
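A sketch along those lines, using async/await and arrow functions with the same placeholder URL:

const getTemplate = async (url) => {
  const response = await fetch(url);
  switch (response.status) {
    // status "OK"
    case 200:
      return response.text();
    // status "Not Found"
    case 404:
      throw response;
  }
};

getTemplate('some_url')
  .then((template) => console.log(template))
  .catch((response) => console.log(response.statusText));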
There is a tutorial on how to use Ajax here: https://www.w3schools.com/xml/ajax_intro.asp
This is example code along the lines of that tutorial:
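A minimal sketch in the spirit of the tutorial's XMLHttpRequest example (the element id "demo" and the requested path "ajax_info.txt" are tutorial-style placeholders):

function loadDoc() {
  var xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function () {
    // readyState 4 and status 200 mean the response has fully arrived
    if (this.readyState == 4 && this.status == 200) {
      document.getElementById("demo").innerHTML = this.responseText;
    }
  };
  xhttp.open("GET", "ajax_info.txt", true);
  xhttp.send();
}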
I had problems with the fetch API: it seems that it always returns a promise, even when it returns text («return await response.text();»), and to handle that promise with the text it needs to be handled in an async method using .then.
It uses the $.ajax() function, so it includes jQuery.
First, you must know that you will never be able to get, with JavaScript, the source code of a page that is not on the same domain as your page. (See http://en.wikipedia.org/wiki/Same_origin_policy.)
In PHP, this is typically done with file_get_contents($theUrl).
In JavaScript, there are three ways:
Firstly, by XMLHttpRequest: http://jsfiddle.net/635YY/1/
var url="../635YY",xmlhttp;//Remember, same domain if("XMLHttpRequest" in window)xmlhttp=new XMLHttpRequest(); if("ActiveXObject" in window)xmlhttp=new ActiveXObject("Msxml2.XMLHTTP"); xmlhttp.open('GET',url,true); xmlhttp.onreadystatechange=function() < if(xmlhttp.readyState==4)alert(xmlhttp.responseText); >; xmlhttp.send(null);
var url="../XYjuX";//Remember, same domain var iframe=document.createElement("iframe"); iframe.onload=function() < alert(iframe.contentWindow.document.body.innerHTML); >iframe.src=url; iframe.style.display="none"; document.body.appendChild(iframe);
Thirdly, by jQuery: http://jsfiddle.net/edggD/2/
$.get('../edggD', function (data) { // Remember, same domain
  alert(data);
});
Can Javascript read the source of any web page?
I am working on screen scraping and want to retrieve the source code of a particular page. How can I achieve this with JavaScript? Please help me.
Here is a similar page where you may find your answer, as it solved my problem of getting the source of an HTML page: stackoverflow.com/questions/1367587/javascript-page-source-code
@mikenvck Why did you even mention PHP when the question was about JavaScript? The answers below show how to do this with JavaScript.
To get the source of a link, you may need to use $.ajax for external links. Here is the solution: stackoverflow.com/a/18447625/2657601
jQuery is native JavaScript. It’s just JavaScript you can copy from jquery.com instead of from stackoverflow.com.
17 Answers
$("#links").load("/Main_Page #jq-p-Getting-Started li");
Another, much more structured way to do screen scraping is to use YQL (Yahoo Query Language). It will return the scraped data structured as JSON or XML.
e.g.
Let’s scrape stackoverflow.com
select * from html where url="http://stackoverflow.com"
will give you a JSON array (I chose that option) of the scraped page contents.
The beauty of this is that you can do projections and "where" clauses, which ultimately gets you the scraped data structured and limited to only the data you need (much less bandwidth over the wire, ultimately).
e.g.
select * from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
Now, to get only the questions, we do
select title from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
Note the title in the projection.
Once you write your query, it generates a URL for you.
So ultimately you end up doing something like this
$.getJSON(theAboveUrl, function (titleList) { console.log(titleList); }); // the results arrive asynchronously in the callback
Beautiful, isn’t it?
Brilliant, especially for hinting to the poor-man’s solution at yahoo that eliminates the need for a proxy to fetch the data. Thank you!! I took the liberty to fix the last demo-link to query.yahooapis.com: it was missing a % sign in the url-encoding. Cool that this still works!!
query.yahooapis has been retired as of Jan. 2019. Looks really neat, too bad we can’t use it now. See tweet here: twitter.com/ydn/status/1079785891558653952?ref_src=twsrc%5Etfw
JavaScript can be used, as long as you grab whatever page you're after via a proxy on your domain:
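A rough sketch of that idea, assuming a hypothetical same-domain endpoint /proxy.php that fetches the remote page server-side and returns its HTML:

// '/proxy.php' is a placeholder for your own same-domain proxy endpoint
$.get('/proxy.php', { url: 'http://www.google.com/' }, function (html) {
  console.log(html); // source of the proxied page
});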
that’s really interesting. presumably there is some code to install on the server to make that happen?
You will get a "from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource." error if you are not on the same domain, though.
You could simply use XmlHttp (AJAX) to hit the required URL and the HTML response from the URL will be available in the responseText property. If it’s not the same domain, your users will receive a browser alert saying something like «This page is trying to access a different domain. Do you want to allow this?»
const URL = 'https://www.sap.com/belgique/index.html';

fetch(URL)
  .then(res => res.text())
  .then(text => {
    console.log(text);
  })
  .catch(err => console.log(err));
As a security measure, Javascript can’t read files from different domains. Though there might be some strange workaround for it, I’d consider a different language for this task.
If you absolutely need to use JavaScript, you could load the page source with an Ajax request.
Note that with JavaScript, you can only retrieve pages that are located under the same domain as the requesting page.
You can't request a page outside of your domain this way; you have to do it via a proxy, e.g. $.get('mydomain.com/?url=www.google.com')
I used ImportIO. They let you request the HTML from any website if you set up an account with them (which is free). They let you make up to 50k requests per year. I didn't take the time to find an alternative, but I'm sure there are some.
In your Javascript, you’ll basically just make a GET request like this:
var request = new XMLHttpRequest();
request.onreadystatechange = function () {
  if (request.readyState == 4) { // wait until the response has fully arrived
    jsontext = request.responseText;
    alert(jsontext);
  }
};
request.open("GET", "https://extraction.import.io/query/extractor/THE_PUBLIC_LINK_THEY_GIVE_YOU?_apikey=YOUR_KEY&url=YOUR_URL", true);
request.send();
Sidenote: I found this question while researching what I felt like was the same question, so others might find my solution helpful.
UPDATE: I created a new one which they just allowed me to use for less than 48 hours before they said I had to pay for the service. It seems that they shut down your project pretty quick now if you aren’t paying. I made my own similar service with NodeJS and a library called NightmareJS. You can see their tutorial here and create your own web scraping tool. It’s relatively easy. I haven’t tried to set it up as an API that I could make requests to or anything.
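For anyone curious what the NightmareJS route can look like, here is a rough sketch (the target URL is a placeholder, and the exact options may differ from what the author used):

// NightmareJS sketch (Node.js): load a page in a headless browser and grab its HTML
const Nightmare = require('nightmare');
const nightmare = Nightmare({ show: false });

nightmare
  .goto('https://example.com/')                        // placeholder URL
  .evaluate(() => document.documentElement.outerHTML)  // runs inside the loaded page
  .end()
  .then(html => console.log(html))
  .catch(err => console.error(err));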