
Beautiful Soup — Installation

Beautiful Soup is not part of the Python standard library, so we need to install it first. We are going to install Beautiful Soup 4 (also known as BS4), which is the latest major version.

To isolate our working environment so as not to disturb the existing setup, let us first create a virtual environment.

Creating a virtual environment (optional)

A virtual environment gives us an isolated working copy of Python for a specific project, so packages installed there do not affect the system-wide setup.
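As a rough illustration of what "isolated working copy" means, the sketch below creates an environment from Python itself. It uses the standard library's venv module rather than the virtualenv tool used in this tutorial, and the helper name create_env is made up for the example.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create an isolated environment at env_dir and return its config file path.

    Hypothetical helper for illustration; it relies on the stdlib venv module,
    not on the virtualenv package.
    """
    # with_pip=False keeps the sketch fast; pass with_pip=True to bootstrap pip too.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "myEnv")
        # The presence of pyvenv.cfg marks the directory as a virtual environment.
        print(cfg.exists())
```

The environment gets its own interpreter and site-packages directory, which is why installing packages inside it leaves the outside setup untouched.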

The best way to install any Python package is with pip. If pip is not already installed (you can check by running “pip --version” in your command or shell prompt), you can install it with the command below −

Linux environment

$sudo apt-get install python-pip (for python 2.x)
$sudo apt-get install python3-pip (for python 3.x)

Windows environment

To install pip in windows, do the following −

  • Download the get-pip.py from https://bootstrap.pypa.io/get-pip.py or from the github to your computer.
  • Open the command prompt and navigate to the folder containing get-pip.py file.
  • Run the following command −

>python get-pip.py

That’s it, pip is now installed on your Windows machine.

You can verify your pip installation by running the command below −

>pip --version
pip 19.2.3 from c:\users\yadur\appdata\local\programs\python\python37\lib\site-packages\pip (python 3.7)
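The same check can be done from a script, which is handy when several Python installations exist on one machine. This is only a sketch: the function name pip_version is invented here, and it asks the interpreter that runs the script (sys.executable), which may differ from the pip found on your PATH.

```python
import subprocess
import sys
from typing import Optional


def pip_version() -> Optional[str]:
    """Return the output of `python -m pip --version`, or None if pip is unavailable.

    Hypothetical helper: it checks pip for the interpreter running this script.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means this interpreter has no usable pip module.
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    print(pip_version() or "pip is not installed for this interpreter")
```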

Installing virtual environment

Run the below command in your command prompt −

>pip install virtualenv

The command below creates a virtual environment (“myEnv”) in your current directory −

>virtualenv myEnv

To activate your virtual environment, run the following command (on Linux, use source myEnv/bin/activate instead) −

>myEnv\Scripts\activate

Once the environment is active, the prompt shows “(myEnv)” as a prefix, which tells us that we are inside the virtual environment “myEnv”.

To leave the virtual environment, run deactivate.

(myEnv) C:\Users\yadur>deactivate
C:\Users\yadur>

Now that our virtual environment is ready, let us install Beautiful Soup.

Installing BeautifulSoup

We are going to use the Beautiful Soup 4 package, which is published on PyPI as beautifulsoup4 and imported in code as bs4.

Linux Machine

To install bs4 on Debian or Ubuntu Linux using the system package manager, run the command below −

$sudo apt-get install python-bs4 (for python 2.x)
$sudo apt-get install python3-bs4 (for python 3.x)

You can also install bs4 using easy_install or pip (in case you run into problems with the system packager).

$easy_install beautifulsoup4
$pip install beautifulsoup4

(You may need to use easy_install3 or pip3 respectively if you’re using python3)

Windows Machine

Installing beautifulsoup4 on Windows is very simple, especially if you already have pip installed.

>pip install beautifulsoup4

So now beautifulsoup4 is installed on our machine. Let us talk about some problems encountered after installation.
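Before moving on, it can help to confirm which version, if any, the current interpreter actually sees. The sketch below is one way to do that with the standard library's importlib.metadata (Python 3.8 or later); the function name installed_version is an assumption made for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper; dist_name is the name used with pip, for example
    "beautifulsoup4" rather than the import name "bs4".
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("beautifulsoup4")
    print(version if version else "beautifulsoup4 is not installed in this environment")
```

Running this inside and outside the virtual environment shows whether the package went where you intended.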

Problems after installation

On a Windows machine you might find that the wrong version has been installed, which mainly shows up as one of these errors −

  • ImportError “No module named HTMLParser”: you are running the Python 2 version of the code under Python 3.
  • ImportError “No module named html.parser”: you are running the Python 3 version of the code under Python 2.

The best way out of both situations is to completely remove the existing installation and then re-install Beautiful Soup.
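Both errors come from the same fact: the standard library's HTML parser lives in the module HTMLParser on Python 2 and in html.parser on Python 3. The sketch below shows the two import locations side by side; the TitleParser class and extract_title function are illustrative names, not part of Beautiful Soup.

```python
try:
    from html.parser import HTMLParser  # Python 3 location
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 location


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements (illustrative only)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


def extract_title(html_text):
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()


if __name__ == "__main__":
    print(extract_title("<html><head><title> Demo page </title></head></html>"))
```

Code written for only one of the two import locations fails with exactly the ImportError messages listed above when run on the other major version.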

If you get the SyntaxError “Invalid syntax” on the line ROOT_TAG_NAME = u’[document]’, then you need to convert the Python 2 code to Python 3, either by installing the package −

$python3 setup.py install

or by manually running Python’s 2to3 conversion script on the bs4 directory −

$2to3-3.2 -w bs4

Installing a Parser

By default, Beautiful Soup supports the HTML parser included in Python’s standard library. It also supports several external third-party Python parsers, such as lxml and html5lib.

To install lxml or html5lib parser, use the command −

Linux Machine

$apt-get install python-lxml
$apt-get install python-html5lib

Windows Machine

>pip install lxml
>pip install html5lib


Users generally choose lxml for speed. Using lxml or html5lib is recommended if you are on an older version of Python 2 (before 2.7.3) or Python 3 (before 3.2.2), because Python’s built-in HTML parser is not very good in those older versions.
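A script can pick whichever parser happens to be installed instead of hard-coding one. The sketch below is one possible approach: the function name pick_parser and the preference order (lxml, then html5lib, then the built-in parser) are assumptions for this example. The returned strings are the parser names that BeautifulSoup accepts as its second argument.

```python
import importlib.util


def pick_parser() -> str:
    """Return the name of the first available parser, preferring lxml.

    Hypothetical helper; the preference order is a choice made for this sketch.
    """
    for module_name, parser_name in (("lxml", "lxml"), ("html5lib", "html5lib")):
        # find_spec returns None when the module cannot be imported.
        if importlib.util.find_spec(module_name) is not None:
            return parser_name
    # The standard library parser is always present.
    return "html.parser"


if __name__ == "__main__":
    print(pick_parser())
```

The result could then be passed along as BeautifulSoup(markup, pick_parser()). Keep in mind that different parsers can build different trees from the same invalid markup, so pinning one parser is the safer choice when results must be reproducible.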

Running Beautiful Soup

It is time to test our Beautiful Soup package on an HTML page (we use https://www.tutorialspoint.com/index.htm, but you can choose any other web page) and extract some information from it.

In the below code, we are trying to extract the title from the webpage −

# requests is a third-party package (pip install requests)
from bs4 import BeautifulSoup
import requests

url = "https://www.tutorialspoint.com/index.htm"
# Download the page
req = requests.get(url)
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(req.text, "html.parser")
# Print the <title> element of the page
print(soup.title)

Output

 

One common task is to extract all the URLs within a webpage. For that we just need to add the lines of code below −

for link in soup.find_all('a'):
    print(link.get('href'))

Output

https://www.tutorialspoint.com/index.htm
https://www.tutorialspoint.com/about/about_careers.htm
https://www.tutorialspoint.com/questions/index.php
https://www.tutorialspoint.com/online_dev_tools.htm
https://www.tutorialspoint.com/codingground.htm
https://www.tutorialspoint.com/current_affairs.htm
https://www.tutorialspoint.com/upsc_ias_exams.htm
https://www.tutorialspoint.com/tutor_connect/index.php
https://www.tutorialspoint.com/whiteboard.htm
https://www.tutorialspoint.com/netmeeting.php
https://www.tutorialspoint.com/index.htm
https://www.tutorialspoint.com/tutorialslibrary.htm
https://www.tutorialspoint.com/videotutorials/index.php
https://store.tutorialspoint.com
https://www.tutorialspoint.com/gate_exams_tutorials.htm
https://www.tutorialspoint.com/html_online_training/index.asp
https://www.tutorialspoint.com/css_online_training/index.asp
https://www.tutorialspoint.com/3d_animation_online_training/index.asp
https://www.tutorialspoint.com/swift_4_online_training/index.asp
https://www.tutorialspoint.com/blockchain_online_training/index.asp
https://www.tutorialspoint.com/reactjs_online_training/index.asp
https://www.tutorix.com
https://www.tutorialspoint.com/videotutorials/top-courses.php
https://www.tutorialspoint.com/the_full_stack_web_development/index.asp
….
….
https://www.tutorialspoint.com/online_dev_tools.htm
https://www.tutorialspoint.com/free_web_graphics.htm
https://www.tutorialspoint.com/online_file_conversion.htm
https://www.tutorialspoint.com/netmeeting.php
https://www.tutorialspoint.com/free_online_whiteboard.htm
https://www.tutorialspoint.com
https://www.facebook.com/tutorialspointindia
https://plus.google.com/u/0/+tutorialspoint
http://www.twitter.com/tutorialspoint
http://www.linkedin.com/company/tutorialspoint
https://www.youtube.com/channel/UCVLbzhxVTiTLiVKeGV7WEBg
https://www.tutorialspoint.com/index.htm
/about/about_privacy.htm#cookies
/about/faq.htm
/about/about_helping.htm
/about/contact_us.htm
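The last few entries in the output are relative paths such as /about/faq.htm, because link.get('href') returns the attribute exactly as written in the page. A minimal sketch of turning them into absolute URLs with the standard library's urljoin follows; the helper name absolutize is made up for this example.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url, skipping missing or empty values.

    Hypothetical helper; link.get('href') returns None for <a> tags without
    an href attribute, so those entries are filtered out.
    """
    return [urljoin(base_url, href) for href in hrefs if href]


if __name__ == "__main__":
    base = "https://www.tutorialspoint.com/index.htm"
    sample = ["/about/faq.htm", "https://www.tutorix.com", None]
    for url in absolutize(base, sample):
        print(url)
```

With the soup object from the earlier example, the input list could be built as [link.get('href') for link in soup.find_all('a')].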

Similarly, we can extract useful information using beautifulsoup4.

Now let us understand more about “soup” in the above example.


ImportError: No module named ‘html.parser’; ‘html’ is not a package #25


I conda installed all dependencies in a fresh python=3.5 environment, except dendropy=4.3.0, which I pip installed since I could not find a matching channel for this package.
Next, I cloned the sources and tried python setup.py install, which fails with the following error:

Traceback (most recent call last):
  File "setup.py", line 2, in <module>
    from setuptools import setup, find_packages
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/__init__.py", line 14, in <module>
    from setuptools.dist import Distribution, Feature
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/dist.py", line 24, in <module>
    from setuptools.depends import Require
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/depends.py", line 7, in <module>
    from .py33compat import Bytecode
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/py33compat.py", line 11, in <module>
    from setuptools.extern.six.moves import html_parser
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/_vendor/six.py", line 92, in __get__
    result = self._resolve()
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/_vendor/six.py", line 115, in _resolve
    return _import_module(self.mod)
  File "/home/sjanssen/miniconda3/envs/opal/lib/python3.5/site-packages/setuptools/_vendor/six.py", line 82, in _import_module
    __import__(name)
ImportError: No module named 'html.parser'; 'html' is not a package
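The wording "'html' is not a package" means Python found something named html that is a plain module rather than the standard library's html package. One possible cause, which the report itself does not confirm, is a local file called html.py sitting earlier on the import path. The sketch below shows how one might check what actually gets imported; describe_module is a name invented for this example.

```python
import importlib


def describe_module(name):
    """Report where a module was loaded from and whether it is a package.

    Hypothetical diagnostic helper: packages expose a __path__ attribute,
    plain modules do not.
    """
    module = importlib.import_module(name)
    return {
        "file": getattr(module, "__file__", None),
        "is_package": hasattr(module, "__path__"),
    }


if __name__ == "__main__":
    info = describe_module("html")
    print(info["file"])
    print(info["is_package"])
```

If the reported file lies outside the Python installation, or is_package is False, then some other html module is shadowing the standard library one.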


Show "ImportError: No module named HTMLParser" when python scripts are packed with cx_Freeze #38


I have JIRA 0.39 installed and have the latest six and HTMLParser installed. However, I still get the following error message when my script is packed with cx_Freeze.

  • It works fine when the JIRA-related code is removed.
  • It works fine during debugging (source code level).

I would appreciate your comments on this.

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
    exec code in m.__dict__
  File "FreeMind.py", line 3, in <module>
  File "C:\Python27\lib\site-packages\jira\__init__.py", line 5, in <module>
    from .config import get_jira
  File "C:\Python27\lib\site-packages\jira\config.py", line 17, in <module>
    from jira.client import JIRA
  File "C:\Python27\lib\site-packages\jira\client.py", line 33, in <module>
    from six.moves import html_parser
  File "C:\Python27\lib\site-packages\six.py", line 199, in load_module
    mod = mod._resolve()
  File "C:\Python27\lib\site-packages\six.py", line 113, in _resolve
    return _import_module(self.mod)
  File "C:\Python27\lib\site-packages\six.py", line 80, in _import_module
    __import__(name)
ImportError: No module named HTMLParser
