Too Many Requests in Python

Python HTTP error 429 (Too Many Requests)



I used to fetch a CSV file from a URL and load it directly into a Pandas DataFrame like this:

import pandas as pd

grab_csv = 'https://XXXX.****/data.csv'
pd_data = pd.read_csv(grab_csv).drop(columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 4', 'Column 5', 'Column 6', 'Column 7'])

Since today, I get urllib.error.HTTPError: HTTP Error 429: Too Many Requests. Here is what I tried in order to fix it:

import pandas as pd
import requests
from io import StringIO

grab_csv = 'https://XXXX.****/data.csv'
headers = { ... }
res_grab_data = requests.get(StringIO(grab_csv), headers=headers).text
pd_data = pd.read_csv(res_grab_data).drop(columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 4', 'Column 5', 'Column 6', 'Column 7'])

This time, I get the error requests.exceptions.MissingSchema: Invalid URL: No schema supplied. Perhaps you meant http://?


Any idea how I can solve the HTTP Error 429 with pandas and requests?

The error is being thrown by the web server that you are making the requests to, almost certainly because you’re issuing requests too quickly and they don’t like it. It’s not because of an error in your code.

Your attempt at fixing it doesn't make much sense: StringIO allows you to use an in-memory string as if it were a file object. Passing it as a parameter to requests.get isn't a valid use case; you should be using requests.get(grab_csv, ...) as you were previously, since .get() expects the url parameter to be a string.
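For illustration, a minimal sketch of the corrected approach: request the URL as a plain string, then wrap the downloaded text in StringIO before handing it to pandas. The User-agent value and the raise_for_status() call are assumptions added here, not part of the original answer.

import pandas as pd
import requests
from io import StringIO

grab_csv = 'https://XXXX.****/data.csv'    # placeholder URL from the question
res = requests.get(grab_csv, headers={'User-agent': 'my-script 0.1'})   # assumed header value
res.raise_for_status()                     # raises immediately on a 429 or any other HTTP error
pd_data = pd.read_csv(StringIO(res.text))  # StringIO wraps the response body, not the URL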

I'd consult the documentation for the API you're using (if there is any), and slow down your rate of requests to stay within their limits.

There is a neat Python package (aptly named ratelimit) that lets you decorate your function to enforce the rate limiting: https://pypi.org/project/ratelimit/
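As a rough sketch of how that decorator is typically combined with requests; the 10-calls-per-minute figure is an assumed placeholder, not a limit taken from any particular API:

import requests
from ratelimit import limits, sleep_and_retry

@sleep_and_retry                 # wait until a call is allowed again instead of raising an exception
@limits(calls=10, period=60)     # assumed limit: at most 10 calls per 60 seconds
def fetch_csv(url):
    return requests.get(url).text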



Script either gives 429 error (too many requests) or takes too long. How to chain function calls?

First of all, I’m not an experienced programmer and I’m very new to Google Apps Script.

I’m running a Google Apps Script and I’m stuck. What the script does: it copies a part of a sheet to a temp sheet, makes that into a PDF and sends it by mail.

I want to do this for (right now) 40 mail addresses. If I run the script, it gives me a 429 (too many requests) error after 5 to 8 addresses. This is the heavy part, I found out: var response = UrlFetchApp.fetch(url, params).getBlob(); If I comment that line out, it works great, even copy-pasting the temp sheet and sending the emails.

To prevent this I added a sleep timer. I had to go up to 12 seconds and didn’t get the error. Great. But now the script takes more than 6 min (the maximum time), so it takes too long and doesn’t finish (gets about halfway).

After reading up a bit, I think the script (correct me if I'm wrong) is pretty optimal, and that I need to "chain function calls". But I have no idea how to go about that. I assigned this script to a button in the sheet, but I can't see how I can run one function and have it trigger other functions without it all counting as the same function (and thus stopping after 6 min). How do I go about this?

Here’s the full code. Sorry for the Dutch text (they are just some confirmation windows and such):

function exportNamedRangesAsPDF() {
  var y = 1
  var sec = 40
  var ui = SpreadsheetApp.getUi();
  var result = ui.alert('Weet je zeker dat je alle maandstaten wil versturen via mail?', ui.ButtonSet.YES_NO);
  var html = HtmlService.createHtmlOutputFromFile('Page')
      .setWidth(400)
      .setHeight(200);
  if (result == ui.Button.YES) {
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    var sheet = ss.getSheetByName('Maandstaten print');
    var namenSheet = ss.getSheetByName('Alle_namen');
    var namenSheetLastRow = namenSheet.getLastRow();
    var namenSheetAantalX = namenSheet.getRange("Q1").getValue()
    var startKol = 4
    var namenVerzonden = [];
    var mailOntbreekt = [];
    //Logger.log(namenSheetLastRow)
    var newSheet = ss.getSheetByName('print');
    if (!newSheet) {
      newSheet = ss.insertSheet('print');
    }
    newSheet.showSheet();
    var gid = newSheet.getSheetId();
    var ssID = "138zfRxR_SQ6oRJouQsMwKQZdyZYbqarUuMCfZTc8fGs";
    var url = "https://docs.google.com/spreadsheets/d/"+ssID+"/export"+
        "?format=pdf&"+
        "size=7&"+
        "fzr=false&"+
        "portrait=true&"+
        "fitw=true&"+
        "gridlines=false&"+
        "printtitle=false&"+
        "sheetnames=false&"+
        "pagenum=UNDEFINED&"+
        "gid=1186495600&"+
        "top_margin=0.75&"+
        "bottom_margin=0.75&"+
        "left_margin=0.2&"+
        "right_margin=0.2&"+
        "attachment=true";
    var params = { /* ... */ };
    var i;
    for (i = 1; i < namenSheetLastRow; i++) { //////// START FOR LOOP
      if (namenSheet.getRange(i+1,16, 1, 1).getValue() == "x") { /////// START IF 1
        var startRij = namenSheet.getRange(i+1,15, 1, 1).getValue()
        var voornaam = sheet.getRange(startRij+10, startKol-1, 1, 1).getValues();
        var zoeknaam = sheet.getRange(startRij+4, startKol-1, 1, 1).getValues();
        ui.showModalDialog(html, "Bezig met versturen. " + y + " van " + namenSheetAantalX);
        /*if (y % 5 == 0) { // Wait (sleep) every 5th time the script runs, to prevent 429 error (too many requests)
          ui.showModalDialog(html, "Wachten. "+ sec + " sec");
          Utilities.sleep(sec*1000); // https://stackoverflow.com/questions/47648338/creating-multiple-google-sheets-pdfs-throws-429-error
          Logger.log("Sleep: "+ sec + " sec")
        }
        y = y+1*/
        if (namenSheet.getRange(i+1,16, 1, 1).getValue() == "x" && namenSheet.getRange(i+1,14, 1, 1).getValue() != "") { ////////// START IF 2
          var volleNaam = sheet.getRange(startRij+2, startKol-2, 1, 1).getValues();
          var maand = sheet.getRange(1, 2, 1, 1).getValues();
          var mailAdres = sheet.getRange(startRij+9, startKol-1, 1, 1).getValue();
          Logger.log(mailAdres + " " + volleNaam);
          namenVerzonden.push(" " + zoeknaam);
          sheet.getRange(startRij, startKol, 39, 16).copyTo(newSheet.getRange(1, 1, 39, 16), {contentsOnly: true}); // copy the right part of the sheet to the new sheet, content only
          sheet.getRange(startRij, startKol, 39, 16).copyTo(newSheet.getRange(1, 1, 39, 16), {formatsOnly: true}); // copy the right part of the sheet to the new sheet, formatting only
          var response = UrlFetchApp.fetch(url, params).getBlob(); // This is the super heavy part, running it too often causes a 429 (too many requests) error
          //DriveApp.createFile(response); // save to drive
          var message = { // send as email
            to: mailAdres,
            subject: "Maandstaat "+ maand,
            body: "Beste "+ voornaam + ",\n\nIn de bijlage vind je de maandstaat van maand " + maand + ".\n\nMet vriendelijke groet,\nCJ Hendriks Group",
            name: "CJ Hendriks",
            attachments: [{
              fileName: "Maandstaat - " + maand + " - " + volleNaam + ".pdf",
              content: response.getBytes(),
              mimeType: "application/pdf"
            }]
          }
          //MailApp.sendEmail(message); // This is the actual mail action
        } ////////// END IF 2
      } /////// END IF 1
      else if (namenSheet.getRange(i+1,16, 1, 1).getValue() == "x" && namenSheet.getRange(i+1,14, 1, 1).getValue() == "") {
        mailOntbreekt.push(" " + zoeknaam);
      }
    } //////// END FOR LOOP
    ui.showModalDialog(html, "Maandstaten verzonden naar: " + namenVerzonden);
    Logger.log('Maandstaten verzonden naar: \n'+namenVerzonden);
    if (mailOntbreekt.length != 0) {
      ui.alert('Mail adres ontbreekt bij: \n'+mailOntbreekt);
    }
    newSheet.hideSheet(); // hide the "print" sheet
  } else {
    ui.alert('Maandstaten NIET verzonden.');
  }
  Logger.log("Succesvol voltooid")
}

Your script is inefficient

As mentioned in the comment section, you should read the data into arrays instead of calling getRange().getValue() for every cell; the current approach is pretty infeasible for your code. What's more, on line 80 (var response = UrlFetchApp.fetch(url, params).getBlob();) you should make this request outside the loop. That is why you are getting a 429: your script makes the same request namenSheetLastRow times and always gets the same response. Keep in mind that the Sheets API has limits for non-billing accounts: 100 requests per 100 seconds per user.

As a workaround

Move line 80 up to line 50, like this:

...
var params = { /* ... */ };
var response = UrlFetchApp.fetch(url, params).getBlob();
...

In doing so, you make the request only once and the bytes are stored in response.



How can I bypass a 429 Too Many Requests error in C#?

So, I am trying to make a plagiarism checker with C# and Google, and after a bit of debugging my code does not work anymore. I looked at the exception and saw that it is a 429 Too Many Requests error. What I want to do is either bypass this error somehow (as I can still access Google from the same PC) or get a time at which I can try again. How can I do that?

private void SearchAndWrite(string text)
{
    string txtKeyWords = text;
    listBox1.Items.Clear();
    StringBuilder sb = new StringBuilder();
    byte[] ResultsBuffer = new byte[8192];
    string SearchResults = "http://google.com/search?q=" + txtKeyWords.Trim();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
    //request.Headers["X-My-Custom-Header"] = "'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)\\AppleWebKit / 537.36(KHTML, like Gecko) Cafari / 537.36'";
    try
    {
        int count = 0;
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        Stream resStream = response.GetResponseStream();
        count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
        string tempString = null;
        do
        {
            if (count != 0)
            {
                tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                sb.Append(tempString);
            }
        } while (count > 0);
        string sbb = sb.ToString();
        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(sbb);
        HtmlNode doc = html.DocumentNode;
        foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
        {
            //HtmlAttribute att = link.Attributes["href"];
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
            {
                int index = hrefValue.IndexOf("&");
                if (index > 0)
                {
                    hrefValue = hrefValue.Substring(0, index);
                    listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                }
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.ToString(), "An error has occurred!", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
    }
}

While I can't see all the code, and can't see any recursion, I assume you are making multiple calls to SearchAndWrite; you probably need to rate limit your queries.

Try putting a 5 or 10 second wait between each request; if the problem goes away, then you need to find a way not to hammer Google with queries.

Consider using a queue and a worker loop.
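To make the queue-and-worker idea concrete, here is a small sketch written in Python (the language used elsewhere in this article); the queries, the worker function, and the 10-second pause are assumptions, not part of the original answer:

import time
from queue import Queue

def search_and_write(query):
    # stand-in for the real search call (SearchAndWrite in the C# code above)
    print("searching:", query)

work = Queue()
for q in ["first query", "second query", "third query"]:
    work.put(q)

while not work.empty():      # worker loop: handle one request at a time
    search_and_write(work.get())
    time.sleep(10)           # assumed pause between requests to stay under the rate limit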



How to avoid HTTP error 429 (Too Many Requests) in Python

I am trying to use Python to login to a website and gather information from several webpages and I get the following error:

Traceback (most recent call last):
  File "extract_test.py", line 43, in <module>
    response=br.open(v)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code

I used time.sleep() and it works, but it seems unintelligent and unreliable. Is there any other way to dodge this error?

import mechanize
import cookielib
import re

first=("example.com/page1")
second=("example.com/page2")
third=("example.com/page3")
fourth=("example.com/page4")
## I have seven URL's I want to open
urls_list=[first,second,third,fourth]

br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Log in credentials
br.open("example.com")
br.select_form(nr=0)
br["username"] = "username"
br["password"] = "password"
br.submit()

for url in urls_list:
    br.open(url)
    print re.findall("Some String")

Python Solutions

Solution 1 — Python

Receiving a status 429 is not an error, it is the other server "kindly" asking you to please stop spamming requests. Obviously, your rate of requests has been too high and the server is not willing to accept this.

You should not seek to "dodge" this, or even try to circumvent server security settings by trying to spoof your IP; you should simply respect the server's answer by not sending too many requests.

If everything is set up properly, you will also have received a "Retry-after" header along with the 429 response. This header specifies the number of seconds you should wait before making another call. The proper way to deal with this "problem" is to read this header and sleep your process for that many seconds.
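A rough sketch of that approach, assuming the server sends Retry-after as a plain number of seconds (some servers send an HTTP date instead) and falling back to an assumed 10-second wait when the header is missing:

import time
import requests

def get_with_retry(url, max_attempts=5):
    for _ in range(max_attempts):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        wait = int(response.headers.get("Retry-After", 10))  # assumed 10 s default
        time.sleep(wait)
    return response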

You can find more information on status 429 here: https://www.rfc-editor.org/rfc/rfc6585#page-3

Solution 2 — Python

Adding this piece of code to my request fixed the problem:

requests.get(link, headers={'User-agent': 'your bot 0.1'})

This works because sites sometimes return a Too Many Requests (429) error when there isn’t a user agent provided. For example, Reddit’s API only works when a user agent is applied.
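If you make many requests, one option (a small sketch, not part of the original answer) is to set the header once on a requests.Session so every call carries it; the URL is a placeholder:

import requests

session = requests.Session()
session.headers.update({'User-agent': 'your bot 0.1'})  # sent with every request from this session
link = 'https://www.reddit.com/.json'                   # placeholder URL
response = session.get(link)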

Solution 3 — Python

As MRA said, you shouldn’t try to dodge a 429 Too Many Requests but instead handle it accordingly. You have several options depending on your use-case:

  1. Sleep your process. The server usually includes a Retry-after header in the response with the number of seconds you are supposed to wait before retrying. Keep in mind that sleeping a process might cause problems, e.g. in a task queue, where you should instead retry the task at a later time to free up the worker for other things.
  2. Exponential backoff. If the server does not tell you how long to wait, you can retry your request using increasing pauses in between. The popular task queue Celery has this feature built right in.
  3. Token bucket. This technique is useful if you know in advance how many requests you are able to make in a given time. Each time you access the API you first fetch a token from the bucket. The bucket is refilled at a constant rate. If the bucket is empty, you know you’ll have to wait before hitting the API again. Token buckets are usually implemented on the other end (the API) but you can also use them as a proxy to avoid ever getting a 429 Too Many Requests. Celery’s rate_limit feature uses a token bucket algorithm.

Here is an example of a Python/Celery app using exponential backoff and rate-limiting/token bucket:

import requests
from requests.exceptions import ConnectTimeout
from celery import shared_task as task  # assumed import; the original snippet did not show it

class TooManyRequests(Exception):
    """Too many requests"""

@task(
    rate_limit='10/s',
    autoretry_for=(ConnectTimeout, TooManyRequests,),
    retry_backoff=True)
def api(*args, **kwargs):
    r = requests.get('placeholder-external-api')
    if r.status_code == 429:
        raise TooManyRequests()
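For a plain script that does not use Celery, the same exponential-backoff idea can be sketched with nothing but requests and time; the retry count and base delay are assumed values:

import time
import requests

def fetch_with_backoff(url, max_retries=5, base_delay=1):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        time.sleep(base_delay * (2 ** attempt))  # waits 1 s, 2 s, 4 s, 8 s, ... between retries
    return response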

Solution 4 — Python

if response.status_code == 429:
    time.sleep(int(response.headers["Retry-After"]))

Solution 5 — Python

Another workaround would be to spoof your IP using some sort of public VPN or the Tor network. This assumes that the rate-limiting on the server is applied at the IP level.

There is a brief blog post demonstrating a way to use tor along with urllib2:

Solution 6 — Python

I've found a nice workaround to IP blocking when scraping sites. It lets you run a scraper indefinitely by running it from Google App Engine and redeploying it automatically when you get a 429.

Solution 7 — Python

In many cases, continuing to scrape data from a website even when the server is requesting you not to is unethical. However, in the cases where it isn’t, you can utilize a list of public proxies in order to scrape a website with many different IP addresses.
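A hedged sketch of what cycling through such a list with requests might look like; the proxy addresses and target URL are hypothetical placeholders:

import requests

proxies = ["203.0.113.10:8080", "198.51.100.23:3128"]   # hypothetical public proxies

for proxy in proxies:
    try:
        r = requests.get(
            "https://example.com/data",                  # placeholder target URL
            proxies={"http": "http://" + proxy, "https": "http://" + proxy},
            timeout=10,
        )
        if r.status_code == 200:
            break      # got a usable response through this proxy
    except requests.RequestException:
        continue       # proxy is dead or blocked; try the next one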

