Python bs4: getting href

Contents

How to Get href of Element using BeautifulSoup [Easily]
Get the href attribute of a tag
Get the href attribute of multiple tags
Link parser using BeautifulSoup

How to Get href of Element using BeautifulSoup [Easily]

To get the href attribute of a tag, index the tag with the attribute name: tag['href'].

Get the href attribute of a tag

In the following example, we'll use the find() method to locate the first <a> tag and ['href'] to read its href attribute.

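from bs4 import BeautifulSoup

# html is a string holding the page markup (its contents were lost from this listing)
soup = BeautifulSoup(html, 'html.parser')  # Parsing
a_tag = soup.find('a', href=True)  # Find the first <a> tag that has an href attribute
print(a_tag['href'])  # Print href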

href=True: match only tags that have an href attribute.
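The effect of href=True is easiest to see on markup where the first <a> tag has no href. The snippet below is a minimal sketch: the HTML string and the example.com URLs are invented for illustration and do not come from the original tutorial.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration.
html = '''
<a name="top">No href here</a>
<a href="https://example.com/first">First link</a>
<a href="https://example.com/second">Second link</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# Without the filter, find() returns the first <a>, which has no href.
first_any = soup.find('a')

# With href=True, tags lacking the attribute are skipped.
first_with_href = soup.find('a', href=True)

print(first_any.get('href'))    # None
print(first_with_href['href'])  # https://example.com/first
```

Note that find() returns None when nothing matches, so a_tag['href'] would raise a TypeError on a page with no links; checking the result first is a reasonable precaution.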

Get the href attribute of multiple tags

To get the href of multiple tags, we use the find_all() method to find all matching tags and ['href'] to read the href attribute of each one. Let's see an example.

from bs4 import BeautifulSoup

# html is a string holding the page markup (its contents were lost from this listing)
soup = BeautifulSoup(html, 'html.parser')  # Parsing
a_tags = soup.find_all('a', href=True)  # Find all <a> tags that have an href attribute

# Loop over the results
for tag in a_tags:
    print(tag['href'])  # Print href
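As a self-contained illustration of the same loop, the sketch below collects the href values into a list. The link texts echo the ones visible in the listing above, but the list markup and the relative URLs are assumptions made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the href values are invented for illustration.
html = '''
<ul>
  <li><a href="/python-string">Python string</a></li>
  <li><a href="/python-variable">Python variable</a></li>
  <li><a>No link target</a></li>
  <li><a href="/python-list">Python list</a></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# find_all() returns every match; href=True drops the <a> without an href.
hrefs = [tag['href'] for tag in soup.find_all('a', href=True)]

print(hrefs)  # ['/python-string', '/python-variable', '/python-list']
```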

Remember, when you want to get any attribute of a tag, index the tag with the attribute name: tag['attribute_name'].
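The following sketch contrasts the ways of reading attributes on a tag. The markup is hypothetical; the calls shown (indexing, .get() and .attrs) are standard BeautifulSoup tag operations.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration.
html = '<a href="/docs" class="nav main" id="docs-link">Docs</a>'
tag = BeautifulSoup(html, 'html.parser').find('a')

link_id = tag['id']          # 'docs-link'
classes = tag['class']       # ['nav', 'main']: class is multi-valued, so it is a list
missing = tag.get('title')   # None: .get() does not raise for absent attributes
all_attrs = tag.attrs        # dict of every attribute on the tag

# Indexing raises KeyError when the attribute is absent.
try:
    tag['title']
    raised = False
except KeyError:
    raised = True

print(link_id, classes, missing, raised)
```

Indexing suits attributes that must be present, while .get() is the safer choice when an attribute is optional.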

To learn more about working with attributes, see the BeautifulSoup attribute tutorial.

Link parser using BeautifulSoup

This article is a simple one, and for some readers it will fall into the "how to draw an owl" category, but that doesn't bother me, because the material will still come in handy for someone.

We'll be working with the BeautifulSoup library, and the data we're after is link URLs. In HTML, links are marked up with the <a> tag, so we'll look for that tag and read the value of its href attribute.

Import the requests library:

import requests

and the bs4 library, from which we take the soup class:

from bs4 import BeautifulSoup

fetch the page and convert the parsed soup to a string:

url = 'https://yandex.ru/'
r = requests.get(url)
soup_ing = str(BeautifulSoup(r.content, 'lxml'))

first, encode the soup_ing variable to bytes:

soup_ing = soup_ing.encode()

save the content to the file test.html:

with open("test.html", "wb") as file:
    file.write(soup_ing)

create the function fromSoup, which will search for links, and open the saved file:

def fromSoup():
    html_file = ("test.html")
    html_file = open(html_file, encoding='UTF-8').read()

create the soup object and pass it the contents of the file:

soup = BeautifulSoup(html_file, 'lxml')
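The save-and-reload steps above can be tried without a network request. The sketch below is an approximation under stated assumptions: the page bytes are hypothetical and stand in for r.content, a temporary directory replaces the working directory, and 'html.parser' is used instead of 'lxml' so that no extra package is needed.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Hypothetical page bytes standing in for r.content.
page_bytes = (
    '<html><body><a href="https://example.com/page">Ссылка</a></body></html>'
).encode('utf-8')

# Parse the bytes, then turn the soup back into a string.
soup_text = str(BeautifulSoup(page_bytes, 'html.parser', from_encoding='utf-8'))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.html')

    # The file is opened in binary mode, so the string must be encoded first.
    with open(path, 'wb') as file:
        file.write(soup_text.encode('utf-8'))

    # Read the file back as text and build a new soup from it.
    with open(path, encoding='utf-8') as file:
        reloaded = BeautifulSoup(file.read(), 'html.parser')

saved_tag = reloaded.find('a')
print(saved_tag['href'], saved_tag.get_text())
```

Opening the file in a with block also closes it when the block ends, which a bare open(...).read() call leaves to the interpreter.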

state that the search will go through all <a> tags:

for link in soup.find_all('a'):

and print the href of each one. The complete script:

import requests
from bs4 import BeautifulSoup

url = 'https://yandex.ru/'
r = requests.get(url)
soup_ing = str(BeautifulSoup(r.content, 'lxml'))
soup_ing = soup_ing.encode()
with open("test.html", "wb") as file:
    file.write(soup_ing)

def fromSoup():
    html_file = ("test.html")
    html_file = open(html_file, encoding='UTF-8').read()
    soup = BeautifulSoup(html_file, 'lxml')  # name of our soup
    for link in soup.find_all('a'):
        print(link.get('href'))

fromSoup()
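Because link.get('href') returns None for <a> tags without an href, and many pages use relative URLs, a link parser usually needs a little post-processing. The sketch below is one possible approach and is not part of the original article: the function name extract_links, the sample markup and the base URL are all hypothetical, and urljoin comes from the standard library.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for tag in soup.find_all('a'):
        href = tag.get('href')
        if href is None:
            # Skip anchors that carry no href attribute.
            continue
        # Resolve relative paths against the page address.
        links.append(urljoin(base_url, href))
    return links


# Hypothetical markup, invented for illustration.
sample = '''
<a href="/news">News</a>
<a>Anchor without href</a>
<a href="https://example.com/mail">Mail</a>
<a href="maps/">Maps</a>
'''

links = extract_links(sample, 'https://example.org/portal/')
for link in links:
    print(link)
```

With the sample above, the three links resolve to https://example.org/news, https://example.com/mail and https://example.org/portal/maps/.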