BeautifulSoup Python find_all href

How to get all page URLs from a website

Web scraping is the technique of extracting data from a website.

The BeautifulSoup module is designed for web scraping. It can handle HTML and XML, and it provides simple methods for searching, navigating and modifying the parse tree.
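
As a rough illustration of searching, navigating and modifying, the sketch below parses a small made-up HTML string. It assumes the bs4 package (BeautifulSoup 4) and the standard-library html.parser backend; the markup, URLs and variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = """
<html><body>
<p id="intro">Read the <a href="https://example.com/docs">docs</a>.</p>
<p>Or go <a href="/home">home</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find the first <a> tag in the document
first_link = soup.find("a")
found_href = first_link["href"]

# Navigating: move from the link up to its enclosing <p> tag
parent_id = first_link.parent.get("id")

# Modifying: change an attribute directly in the parse tree
first_link["href"] = "https://example.com/manual"
modified_href = soup.find("a")["href"]

print(found_href)
print(parent_id)
print(modified_href)
```

Using an inline string keeps the example independent of any live website.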

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Download the page and parse it
html_page = urlopen("https://arstechnica.com")
soup = BeautifulSoup(html_page, "html.parser")

# Print every link whose href starts with http:// or https://
for link in soup.find_all("a", attrs={"href": re.compile("^https?://")}):
    print(link.get("href"))

The script downloads the raw HTML with the line:

html_page = urlopen("https://arstechnica.com")

A BeautifulSoup object is then created, and we use it to find all links:

soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all("a", attrs={"href": re.compile("^https?://")}):
    print(link.get("href"))

To collect the links in a list instead of printing them one by one:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

html_page = urlopen("https://arstechnica.com")
soup = BeautifulSoup(html_page, "html.parser")
links = []

# Store each matching href in the list
for link in soup.find_all("a", attrs={"href": re.compile("^https?://")}):
    links.append(link.get("href"))

print(links)

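The same logic can be wrapped in a function:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

def getLinks(url):
    # Download and parse the page at the given URL
    html_page = urlopen(url)
    soup = BeautifulSoup(html_page, "html.parser")
    links = []

    # Collect every href that starts with http:// or https://
    for link in soup.find_all("a", attrs={"href": re.compile("^https?://")}):
        links.append(link.get("href"))

    return links

print(getLinks("https://arstechnica.com"))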
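
The pattern used in these scripts keeps only hrefs that begin with a scheme, so relative links such as /about are skipped. One possible way to keep them is to resolve each href against the page URL with urljoin from the standard library. This is a sketch rather than part of the original script; resolve_links, the base URL and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin, urlparse

def resolve_links(base_url, hrefs):
    """Turn raw href values into absolute http(s) URLs, dropping duplicates."""
    resolved = []
    for href in hrefs:
        # Relative hrefs are resolved against the page URL;
        # absolute ones are returned unchanged
        absolute = urljoin(base_url, href)
        # Skip non-web schemes such as mailto: or javascript:
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in resolved:
            resolved.append(absolute)
    return resolved

# Hypothetical href values, as they might come out of link.get("href")
sample_hrefs = [
    "/about",
    "news/today.html",
    "https://example.org/",
    "mailto:me@example.com",
    "/about",
]
links = resolve_links("https://example.com/blog/", sample_hrefs)
print(links)
```

To use it with the scripts above, the regular expression filter would be replaced by href=True so that relative links reach resolve_links at all.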


How to Get href of Element using BeautifulSoup [Easily]

To get the href attribute of a tag, use the following syntax:

tag['href']

Get the href attribute of a tag

In the following example, we'll use the find() function to find the first <a> tag and ['href'] to print its href attribute.

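from bs4 import BeautifulSoup

# Sample HTML (the href value is a placeholder)
html = '''
<a href="/python-string">Python string</a>
'''

soup = BeautifulSoup(html, 'html.parser')  # Parsing
a_tag = soup.find('a', href=True)  # Find the first tag that has an href attribute
print(a_tag['href'])  # Print href

href=True matches only tags that have an href attribute.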
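
To see why the href=True filter matters, the sketch below uses a made-up snippet in which the first <a> tag has no href. Indexing such a tag with ['href'] raises KeyError, while .get('href') returns None. The HTML and the variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first anchor has no href attribute
html = '<a name="top">Top</a> <a href="/python-list">Python list</a>'
soup = BeautifulSoup(html, "html.parser")

first_any = soup.find("a")                   # first <a>, with or without href
first_with_href = soup.find("a", href=True)  # first <a> that has an href

# .get() returns None when the attribute is absent
missing = first_any.get("href")

# Indexing raises KeyError when the attribute is absent
try:
    first_any["href"]
    raised = False
except KeyError:
    raised = True

href_value = first_with_href["href"]
print(missing, raised, href_value)
```

Filtering with href=True up front avoids having to handle the missing-attribute case later.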

Get the href attribute of multiple tags

To get the href of multiple tags, use the find_all() function to find all <a> tags and ['href'] to print each href attribute. Let's see an example.

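from bs4 import BeautifulSoup

# Sample HTML (the href values are placeholders)
html = '''
<a href="/python-string">Python string</a>
<a href="/python-variable">Python variable</a>
<a href="/python-list">Python list</a>
<a href="/python-set">Python set</a>
'''

soup = BeautifulSoup(html, 'html.parser')  # Parsing
a_tags = soup.find_all('a', href=True)  # Find all tags that have an href attribute

# Loop over the results
for tag in a_tags:
    print(tag['href'])  # Print href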

Remember, when you want to get any attribute of a tag, use the following syntax:

tag['attribute_name']
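
As a sketch of attribute access in general, the example below reads several attributes from one made-up <a> tag. The markup and variable names are assumptions. Note that in bs4, class is treated as a multi-valued attribute, so it comes back as a list.

```python
from bs4 import BeautifulSoup

# Hypothetical tag with several attributes
html = '<a id="docs-link" class="nav external" href="https://example.com/docs" title="Docs">Docs</a>'
tag = BeautifulSoup(html, "html.parser").find("a")

tag_id = tag["id"]          # single-valued attribute -> string
title = tag.get("title")    # .get() also works and returns None if absent
classes = tag["class"]      # multi-valued attribute -> list of strings
all_attrs = tag.attrs       # dictionary of every attribute on the tag

print(tag_id)
print(title)
print(classes)
print(sorted(all_attrs))
```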

