Python get site url

Get current URL in Python

Also, if you just need the querystring, this will work:

And, lastly, if you know the querystring variable that you’re looking for, you can do this:

self.request.get("name-of-querystring-variable") 
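For context, a minimal sketch of where these request calls live, assuming a Google App Engine webapp2-style handler; the handler name and the "def" parameter are illustrative:

import webapp2

class EchoHandler(webapp2.RequestHandler):
    def get(self):
        full_url = self.request.url                # e.g. http://foo.appspot.com/abc?def=ghi
        query_string = self.request.query_string   # just the query string, e.g. def=ghi
        value = self.request.get("def")            # a single query-string variable by name
        self.response.write(value)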

For anybody finding this via Google,

you can get the query strings on your current request using:

which is a UnicodeMultiDict of your query strings!
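Assuming a Pylons/WebOb-style request object (the framework that exposes UnicodeMultiDict), the idea looks roughly like this; the parameter names are illustrative:

def current_query(request):
    # request is assumed to be a Pylons/WebOb-style request object
    params = request.GET                     # MultiDict/UnicodeMultiDict of query-string values
    name = params.get('name')                # first value for 'name', or None
    all_names = params.getall('name')        # every value for a repeated 'name' key
    return params, name, all_names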

Yes, in general you want to use the tools provided by your framework and not manually parse URLs; this is a solved problem.

I couldn’t get the other answers to work, but here is what worked for me:

import os

url = os.environ['HTTP_HOST']
uri = os.environ['REQUEST_URI']
return url + uri

this appears to ignore any query strings as it's only grabbing the host URL: ParseResult(scheme='localhost', netloc='', path='8080', params='', query='', fragment='')
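That ParseResult comes from parsing a string that has no scheme and no //; a short illustration with Python 3's urllib.parse (the localhost URL is just an example):

from urllib.parse import urlparse

# With no scheme and no '//', the text before the colon can be taken as the
# scheme; this is what produced the ParseResult quoted in the comment above.
print(urlparse('localhost:8080'))

# Adding an explicit scheme gives the expected split, query string included.
print(urlparse('http://localhost:8080/abc?def=ghi'))
# ParseResult(scheme='http', netloc='localhost:8080', path='/abc',
#             params='', query='def=ghi', fragment='')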

This is how I capture (A) the URL, (B) the GET parameters, and (C) the POST data in Python 3 from CGI:

CAPTURE URL

myURLSelf = myDomainSelf + myPathSelf

CAPTURE GET DATA

CAPTURE POST DATA

myTotalBytes = int(os.environ.get('HTTP_CONTENT_LENGTH'))
myPostDataRaw = io.open(sys.stdin.fileno(), "rb").read(myTotalBytes)
myPostData = myPostDataRaw.decode("utf-8")
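A consolidated sketch of all three captures, assuming a standard CGI environment; SERVER_NAME, PATH_INFO, QUERY_STRING and CONTENT_LENGTH are the usual CGI variable names and are assumptions here, since the answer above only shows fragments (HTTP_CONTENT_LENGTH is a server-specific variant):

import io
import os
import sys

# (A) URL: domain plus path of the current request
myDomainSelf = os.environ.get('SERVER_NAME', '')
myPathSelf = os.environ.get('PATH_INFO', '')
myURLSelf = myDomainSelf + myPathSelf

# (B) GET data: the raw query string
myQuerySelf = os.environ.get('QUERY_STRING', '')

# (C) POST data: read exactly Content-Length bytes from stdin
myTotalBytes = int(os.environ.get('CONTENT_LENGTH') or 0)
myPostDataRaw = io.open(sys.stdin.fileno(), "rb").read(myTotalBytes)
myPostData = myPostDataRaw.decode("utf-8")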

Write RAW to FILE

mySpy = "myURLSelf: [" + str(myURLSelf) + "]\n"

mySpy = mySpy + "myQuerySelf: [" + str(myQuerySelf) + "]\n"

mySpy = mySpy + "myPostData: [" + str(myPostData) + "]\n"

You need to define your own myPath here:

myFilePath = myPath + "\\" + myFilename

Here are some other useful CGI environment vars:
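A quick way to see what your server actually sets, together with the CGI variables most commonly available (availability varies by server):

import os

# Commonly available CGI environment variables (availability varies by server)
for key in ('REQUEST_METHOD', 'QUERY_STRING', 'PATH_INFO', 'SCRIPT_NAME',
            'SERVER_NAME', 'SERVER_PORT', 'REMOTE_ADDR', 'HTTP_USER_AGENT',
            'HTTP_REFERER', 'CONTENT_TYPE', 'CONTENT_LENGTH'):
    print(key, '=', os.environ.get(key, ''))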

I am using these methods running Python 3 on Windows Server with CGI via Microsoft IIS.

The requests module's response object has a url attribute, which holds the URL after any redirects.
Just try this:

import requests

current_url = requests.get("some url").url
print(current_url)
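To see why url is described as the changed URL: requests follows redirects by default, and response.url holds the final address. A small sketch (the example URL is only illustrative):

import requests

# http://github.com redirects to https://github.com/, and .url reflects that
resp = requests.get("http://github.com")
print(resp.url)        # final URL after redirects
print(resp.history)    # the redirect responses that led here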

Hi, welcome to SO. Thank you for your answer, but it doesn't really solve the problem. How does this help extract the query string from the URL, which is the main question asked?

If your python script is server side:

import os

url = os.environ
print(url)

With that, you will see all the data os.environ gives you. It looks like you need 'QUERY_STRING'. os.environ behaves like a dictionary, so you can obtain the data like this:

import os

url = os.environ['QUERY_STRING']
print(url)

And if you want a more reusable solution you can use anywhere, you can save the variables into a dictionary (named vars here) like so:

vars = {}
splits = os.environ['QUERY_STRING'].split('&')
for x in splits:
    name, value = x.split('=')
    vars[name] = value
print(vars)
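Manual splitting breaks on empty values, repeated keys and percent-encoding; a sketch of the same idea with the standard library's parse_qs, which handles those cases (the parameter name is illustrative):

import os
from urllib.parse import parse_qs

query = os.environ.get('QUERY_STRING', '')
vars = parse_qs(query)               # e.g. {'param': ['abc'], 'tag': ['a', 'b']}
print(vars)
print(vars.get('param', [''])[0])    # first value for a single parameter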

If you are client side, then any of the other answers involving a GET request will work.


How to get current URL in python web page?

I am a noob in Python. I just installed it, and spent two hours googling how to get at a simple parameter sent in the URL to a Python script. I found this very helpful, except I cannot for the life of me figure out how to replace

import urlparse

url = 'http://foo.appspot.com/abc?def=ghi'
parsed = urlparse.urlparse(url)
print urlparse.parse_qs(parsed.query)['def']

With what do I replace url = 'string' to make it work? I just want to access http://site.com/test/test.py?param=abc and see abc printed. Final code after Alex's answer:

url = os.environ["REQUEST_URI"]
parsed = urlparse.urlparse(url)
print urlparse.parse_qs(parsed.query)['param']
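The snippet above is Python 2 (urlparse module, print statement); on Python 3 the same thing would look roughly like this, assuming the server exposes REQUEST_URI:

import os
from urllib.parse import urlparse, parse_qs

url = os.environ["REQUEST_URI"]
parsed = urlparse(url)
print(parse_qs(parsed.query)['param'])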


2 Answers

If you don’t have any libraries to do this for you, you can construct your current URL from the HTTP request that gets sent to your script via the browser.

The headers that interest you are Host and the request line that follows the HTTP method (probably GET, in your case). Here are some more explanations (that was the first link that seemed OK; you're free to Google some more).

This answer shows you how to get the headers in your CGI script:

If you are running as a CGI script, you can't read the HTTP headers directly, but the web server puts much of that information into environment variables for you. You can just pick it out of os.environ[].
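As a concrete sketch of picking the pieces out of os.environ and reassembling the current URL; the HTTPS check and the fallbacks are assumptions, and the exact variables available depend on the web server:

import os

scheme = 'https' if os.environ.get('HTTPS', 'off') == 'on' else 'http'
host = os.environ.get('HTTP_HOST', 'localhost')                 # the Host header
path = os.environ.get('PATH_INFO', '') or os.environ.get('SCRIPT_NAME', '')
query = os.environ.get('QUERY_STRING', '')

current_url = scheme + '://' + host + path + ('?' + query if query else '')
print(current_url)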

If you’re doing this as an exercise, then it’s fine because you’ll get to understand what’s behind the scenes. If you’re building anything reusable, I recommend you use libraries or a framework so you don’t reinvent the wheel every time you need something.


How do I get the URL of a page?

Please tell me: how can I use Python to get the URL of the page I am currently on?
The web page itself has no links pointing to itself, so there is no link on the page to parse the URL out of.

For example, say I open the site domen.ru/catalog/product-1.
How do I get that URL, domen.ru/catalog/product-1, from the browser's address bar?

tumbler

Judging by the word "parse", what is actually meant is requesting a URL from Python and then processing it. How do you get the URL of the request you just made? I think you save it in a variable before the request and read it from that variable right after. If you are using Scrapy or other web-crawling libraries, please give us the details.

If the question really is about getting the URL from the browser's address bar, then I would first look at the browser API (which you can most likely reach the same way Selenium does) or at browser extensions (which are written in JS rather than Python), and then send that URL to the server.

Sergey, that's right: initially I collect the links with one function on the category page, and after navigating to a product page I parse the product with another function. I just don't quite understand how to attach the link from the first function and output it in the second. Part of the code looks like this:

# collect all product links from one page
def get_all_links(html):
    soup = BeautifulSoup(html, 'lxml')
    links_product_detail = soup.findAll('a', class_='product__list--code')
    links = []
    for hrefs in links_product_detail:
        a = hrefs.get('href')  # string
        link = 'http://www.futureelectronics.com' + a
        links.append(link)
    return links

def get_page_data(html):
    soup = BeautifulSoup(html, 'lxml')
    # 3. product name
    try:
        name = soup.find('h2', class_='product-title').text.strip()
    except:
        name = ''

In get_page_data(html) I collect all the product parameters and build a CSV. What I'm missing is how to also attach the product link to the table row for the corresponding product.

tumbler

import requests
from bs4 import BeautifulSoup
import re
import csv
from datetime import datetime
from multiprocessing import Pool

def get_html(url):
    r = requests.get(url)
    if r.ok:  # 200
        return r.text  # returns the HTML of the page (url)
    print(r.status_code)

# collect all product links from one page
def get_all_links(html):
    soup = BeautifulSoup(html, 'lxml')
    links_product_detail = soup.findAll('a', class_='product__list--code')
    links = []
    for hrefs in links_product_detail:
        a = hrefs.get('href')  # string
        link = 'http://www.futureelectronics.com' + a
        links.append(link)
    return links

def get_page_data(html):
    soup = BeautifulSoup(html, 'lxml')
    links_product_detail = soup.findAll('a', class_='product__list--code')
    for hrefs in links_product_detail:
        a = hrefs.get('href')  # string
        link = 'http://www.futureelectronics.com' + a
        print(link)
    # 3. product name
    try:
        name = soup.find('h2', class_='product-title').text.strip()
    except:
        name = ''
    data = {'name': name}
    return data

def write_csv(data):
    with open('semiconductors_analog.csv', 'a') as file:
        writer = csv.writer(file)
        writer.writerow((data['name']))
        # print(data['name'], 'parsed')

def make_all(url):
    html = get_html(url)
    data = get_page_data(html)
    write_csv(data)

def main():
    start = datetime.now()
    # Current Category: Semiconductors
    # https://www.futureelectronics.com/c/semiconductors/analog/products?q=%3Arelevance&text=&pageSize=25&page=1
    # https://www.futureelectronics.com/c/semiconductors/analog/products?q=%3Arelevance&text=&pageSize=25&page=2
    pattern = 'https://www.futureelectronics.com/c/semiconductors/analog/products?q=%3Arelevance&text=&pageSize=25&page={}'
    # pass the url into get_html(url)
    for i in range(0, 1):
        url = pattern.format(str(i))
        # print(url)
        all_links = get_all_links(get_html(url))
        # before multiprocessing: 0:00:10.555286
        # for index, url in enumerate(all_links):
        #     html = get_html(url)
        #     data = get_page_data(html)
        #     write_csv(data)
        #     print(index)
        with Pool(20) as p:
            p.map(make_all, all_links)
    end = datetime.now()
    total = end - start
    print(str(total))

if __name__ == '__main__':
    main()

tumbler

url is the link to the current page itself. Add a parameter to make_all_data and pass it along in make_all.
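A minimal sketch of that suggestion, reusing the imports, get_html and write_csv from the script above; make_all_data presumably refers to get_page_data, and the extra url parameter and CSV column are the only changes (a sketch, untested against the real site):

def get_page_data(html, url):
    soup = BeautifulSoup(html, 'lxml')
    try:
        name = soup.find('h2', class_='product-title').text.strip()
    except:
        name = ''
    # keep the product link next to the product name
    return {'name': name, 'url': url}

def write_csv(data):
    with open('semiconductors_analog.csv', 'a') as file:
        writer = csv.writer(file)
        writer.writerow((data['name'], data['url']))

def make_all(url):
    html = get_html(url)
    data = get_page_data(html, url)   # pass the current URL along
    write_csv(data)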


Get current URL from browser using python

I am running an HTTP server which serves a bitmap according to the dimensions in the browser URL, i.e. localhost://image_x120_y30.bmp. My server runs in an infinite loop, and I want to get the URL whenever a user requests a bitmap, so that I can extract the image dimensions from it. The question asked here, How to get current URL in python web page?, does not address my problem, as I am running in an infinite loop and want to keep getting the current URL so I can deliver the requested bitmap to the user.
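For context, a hedged sketch of what such a server can look like with only the standard library; the path pattern and the placeholder payload are assumptions based on the example URL, since the actual server code isn't shown:

import re
from http.server import BaseHTTPRequestHandler, HTTPServer

class BitmapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is the URL path of the current request, e.g. /image_x120_y30.bmp
        match = re.match(r'^/image_x(\d+)_y(\d+)\.bmp$', self.path)
        if not match:
            self.send_error(404)
            return
        width, height = int(match.group(1)), int(match.group(2))
        body = f"bitmap {width}x{height}".encode()   # placeholder payload, not a real bitmap
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(('localhost', 8000), BitmapHandler).serve_forever()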

6 Answers

If you use Selenium for web navigation:

from selenium import webdriver

driver = webdriver.Firefox()
print(driver.current_url)

Selenium is compatible with 2.7 as well as with 3.4. You need to install the package first and then import it in your code. Try pip install selenium.

It is now working, but it opens a new Firefox browser window; what I want is to get the URL from the browser I already have open.

You can open the required site with driver.get('put_your_site_name') and then get the page URL with driver.current_url after each loop iteration. P.S. Please add more info about how your script works/should work, or just show part of the existing code.
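Putting those two comments together, a small sketch of reading the URL on each iteration of a loop; the site address and the loop body are placeholders:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://example.com')        # put_your_site_name
for _ in range(3):                       # placeholder loop
    # ... navigate, click, or wait here ...
    print(driver.current_url)            # the URL after each iteration
driver.quit()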

You can get the current URL by doing path_info = request.META.get('PATH_INFO') and http_host = request.META.get('HTTP_HOST'). You can concatenate these two to get the complete URL. Basically, request.META returns a dictionary which contains a lot of information. You can try it.
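A sketch of that inside a Django view; note that Django also provides request.build_absolute_uri(), which assembles the scheme, host, path and query string for you:

from django.http import HttpResponse

def current_url_view(request):
    # manual assembly from request.META, as described above
    path_info = request.META.get('PATH_INFO')
    http_host = request.META.get('HTTP_HOST')
    manual_url = http_host + path_info

    # Django's built-in helper, including scheme and query string
    full_url = request.build_absolute_uri()
    return HttpResponse(manual_url + '\n' + full_url)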

I just solved a class problem similar to this. We've been using Splinter to walk through pages (you will need to install Splinter and Selenium). As I walk through pages, I periodically need to pull the URL of the page I'm currently on. I do that using the command new_url = browser.url. Below is an example of my code.


# import dependencies
from splinter import Browser
import requests

browser = Browser()

# go to original page (url and titles are defined earlier in the original script)
browser.visit(url)

# loop through the page associated with each headline
for headline in titles:
    print(headline.text)
    browser.click_link_by_partial_text(headline.text)

    # now that I'm on the new page, I need to grab the url
    new_url = browser.url
    print(new_url)

    # go back to original page
    browser.visit(url)

Below is the solution I use in Django.

try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse

frontend_url = request.META.get('HTTP_REFERER')
url = urlparse(frontend_url)
print(url)
# ParseResult(scheme='https', netloc='example.com', path='/dashboard', params='', query='', fragment='')
Hello, you can use the code below to get the URL from the open browser:

import os
import webbrowser
import pyperclip
import time
import keyboard
import pygetwindow as gw
import pyautogui

# app is assumed to be an existing Flask application
@app.route("/")
def redirect_to_authorization():
    redirect_url = f"https://www.google.com"
    webbrowser.open(redirect_url)
    time.sleep(5)
    browser_window = gw.getActiveWindow()
    browser_window.activate()
    pyautogui.hotkey('ctrl', 'l')
    time.sleep(2)
    pyautogui.hotkey('ctrl', 'c')
    keyboard.press_and_release('ctrl + c')
    time.sleep(0.5)
    url = pyperclip.paste()
    print(url)
    # os.system("taskkill /f /im chrome.exe")
    index = url.find('code=')
    if index != -1:
        code = url[index + len('code='):]
        print("Code:", code)
        # os.system("taskkill /f /im chrome.exe")
    return

# Or you can use the code below instead
@app.route("/CodeTwo")
def redirect_to_authorization():
    redirect_url = f"https://www.google.com"
    webbrowser.open(redirect_url)
    time.sleep(5)
    active_window = gw.getActiveWindow()
    if active_window is not None:
        title = active_window.title
        if " - Google Chrome" in title:
            # extract the URL from the window title
            url = title.split(" - Google Chrome")[0]
    return


You could use the requests module:

import requests

link = "https://stackoverflow.com"
data = requests.request("GET", link)
url = data.url

This solution doesn't answer what he needs. It also had errors, as you passed url instead of the variable link as the parameter to request.

