- How to read XML file in Python
- Introduction to XML
- Read XML File Using MiniDOM
- Syntax
- Example Read XML File in Python
- Read XML File Using BeautifulSoup alongside the lxml parser
- Example Read XML File in Python
- Read XML File Using Element Tree
- Example Read XML File in Python
- Read XML File Using Simple API for XML (SAX)
- Python Code Example
- Conclusion
- Reading and Writing XML Files in Python
- Using BeautifulSoup alongside with lxml parser
- Reading Data From an XML File
- Python3
- Writing an XML File
- Python3
- Using Elementtree
- Reading XML Files
- Python3
- Writing XML Files
How to read XML file in Python
In this article, we will learn various ways to read XML files in Python. We will use some built-in modules and libraries available in Python and some related custom examples as well. Let’s first have a quick look over the full form of XML, introduction to XML, and then read about various parsing modules to read XML documents in Python.
Introduction to XML
XML stands for Extensible Markup Language . It’s needed for keeping track of the tiny to medium amount of knowledge. It allows programmers to develop their own applications to read data from other applications. The method of reading the information from an XML file and further analyzing its logical structure is known as Parsing. Therefore, reading an XML file is that the same as parsing the XML document.
In this article, we would take a look at four different ways to read XML documents using different XML modules. They are:
1. MiniDOM(Minimal Document Object Model)
2. BeautifulSoup alongside the lxml parser
XML File: We are using this XML file to read in our examples.
Read XML File Using MiniDOM
It is Python module, used to read XML file. It provides parse() function to read XML file. We must import Minidom first before using its function in the application. The syntax of this function is given below.
Syntax
xml.dom.minidom.parse(filename_or_file[, parser[, bufsize]])
This function returns a document of XML type.
Example Read XML File in Python
Since each node will be treated as an object, we are able to access the attributes and text of an element using the properties of the object. Look at the example below, we’ve accessed the attributes and text of a selected node.
from xml.dom import minidom # parse an xml file by name file = minidom.parse('models.xml') #use getElementsByTagName() to get tag models = file.getElementsByTagName('model') # one specific item attribute print('model #2 attribute:') print(models[1].attributes['name'].value) # all item attributes print('\nAll attributes:') for elem in models: print(elem.attributes['name'].value) # one specific item's data print('\nmodel #2 data:') print(models[1].firstChild.data) print(models[1].childNodes[0].data) # all items data print('\nAll model data:') for elem in models: print(elem.firstChild.data)
model #2 attribute:
model2
All attributes:
model1
model2
model #2 data:
model2abc
model2abc
All model data:
model1abc
model2abc
Read XML File Using BeautifulSoup alongside the lxml parser
In this example, we will use a Python library named BeautifulSoup . Beautiful Soup supports the HTML parser (lxml) included in Python’s standard library. Use the following command to install beautiful soup and lmxl parser in case, not installed.
#for beautifulsoup pip install beautifulsoup4 #for lmxl parser pip install lxml
After successful installation, use these libraries in python code.
We are using this XML file to read with Python code.
Acer is a laptop Add model number here Onida is an oven Exclusive Add price here Add content here Add company name here Add number of employees here
Example Read XML File in Python
Let’s read the above file using beautifulsoup library in python script.
from bs4 import BeautifulSoup # Reading the data inside the xml file to a variable under the name data with open('models.xml', 'r') as f: data = f.read() # Passing the stored data inside the beautifulsoup parser bs_data = BeautifulSoup(data, 'xml') # Finding all instances of tag b_unique = bs_data.find_all('unique') print(b_unique) # Using find() to extract attributes of the first instance of the tag b_name = bs_data.find('child', ) print(b_name) # Extracting the data stored in a specific attribute of the `child` tag value = b_name.get('qty') print(value)
Read XML File Using Element Tree
The Element tree module provides us with multiple tools for manipulating XML files. No installation is required. Due to the XML format present in the hierarchical data format, it becomes easier to represent it by a tree. Element Tree represents the whole XML document as a single tree.
Example Read XML File in Python
To read an XML file, firstly, we import the ElementTree class found inside the XML library. Then, we will pass the filename of the XML file to the ElementTree.parse() method, to start parsing. Then, we will get the parent tag of the XML file using getroot() . Then we will display the parent tag of the XML file. Now, to get attributes of the sub-tag of the parent tag will use root[0].attrib . At last, display the text enclosed within the 1st sub-tag of the 5th sub-tag of the tag root.
# importing element tree import xml.etree.ElementTree as ET # Pass the path of the xml document tree = ET.parse('models.xml') # get the parent tag root = tree.getroot() # print the root (parent) tag along with its memory location print(root) # print the attributes of the first tag print(root[0].attrib) # print the text contained within first subtag of the 5th tag from the parent print(root[5][0].text)
Read XML File Using Simple API for XML (SAX)
In this method, first, register callbacks for events that occur, then the parser proceeds through the document. this can be useful when documents are large or memory limitations are present. It parses the file because it reads it from disk and also the entire file isn’t stored in memory. Reading XML using this method requires the creation of ContentHandler by subclassing xml.sax.ContentHandler.
Note: This method might not be compatible with Python 3 version. Please check your version before implementing this method.
- ContentHandler — handles the tags and attributes of XML. The ContentHandler is called at the beginning and at the end of every element.
- startDocument and endDocument — called at the start and the end of the XML file.
- If the parser is’nt in namespace mode, the methods startElement(tag, attributes) and endElement(tag) are called; otherwise, the corresponding methods startElementNS and endElementNS
35000 12 Samsung 46500 14 Onida 30000 8 Lenovo 45000 12 Acer
Python Code Example
import xml.sax class XMLHandler(xml.sax.ContentHandler): def __init__(self): self.CurrentData = "" self.price = "" self.qty = "" self.company = "" # Call when an element starts def startElement(self, tag, attributes): self.CurrentData = tag if(tag == "model"): print("*****Model*****") title = attributes["number"] print("Model number:", title) # Call when an elements ends def endElement(self, tag): if(self.CurrentData == "price"): print("Price:", self.price) elif(self.CurrentData == "qty"): print("Quantity:", self.qty) elif(self.CurrentData == "company"): print("Company:", self.company) self.CurrentData = "" # Call when a character is read def characters(self, content): if(self.CurrentData == "price"): self.price = content elif(self.CurrentData == "qty"): self.qty = content elif(self.CurrentData == "company"): self.company = content # create an XMLReader parser = xml.sax.make_parser() # turn off namepsaces parser.setFeature(xml.sax.handler.feature_namespaces, 0) # override the default ContextHandler Handler = XMLHandler() parser.setContentHandler( Handler ) parser.parse("models.xml")
*****Model*****
Model number: ST001
Price: 35000
Quantity: 12
Company: Samsung
*****Model*****
Model number: RW345
Price: 46500
Quantity: 14
Company: Onida
*****Model*****
Model number: EX366
Price: 30000
Quantity: 8
Company: Lenovo
*****Model*****
Model number: FU699
Price: 45000
Quantity: 12
Company: Acer
Conclusion
In this article, we learned about XML files and different ways to read an XML file by using several built-in modules and API’s such as Minidom , Beautiful Soup , ElementTree , Simple API(SAX) . We used some custom parsing codes as well to parse the XML file.
Reading and Writing XML Files in Python
Extensible Markup Language, commonly known as XML is a language designed specifically to be easy to interpret by both humans and computers altogether. The language defines a set of rules used to encode a document in a specific format. In this article, methods have been described to read and write XML files in python.
Note: In general, the process of reading the data from an XML file and analyzing its logical components is known as Parsing. Therefore, when we refer to reading a xml file we are referring to parsing the XML document.
In this article, we would take a look at two libraries that could be used for the purpose of xml parsing. They are:
Using BeautifulSoup alongside with lxml parser
For the purpose of reading and writing the xml file we would be using a Python library named BeautifulSoup. In order to install the library, type the following command into the terminal.
pip install beautifulsoup4
Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser (used for parsing XML/HTML documents). lxml could be installed by running the following command in the command processor of your Operating system:
Firstly we will learn how to read from an XML file. We would also parse data stored in it. Later we would learn how to create an XML file and write data to it.
Reading Data From an XML File
There are two steps required to parse a xml file:-
XML File used:
Python3
Writing an XML File
Writing a xml file is a primitive process, reason for that being the fact that xml files aren’t encoded in a special way. Modifying sections of a xml document requires one to parse through it at first. In the below code we would modify some sections of the aforementioned xml document.
Python3
Using Elementtree
Elementtree module provides us with a plethora of tools for manipulating XML files. The best part about it being its inclusion in the standard Python’s built-in library. Therefore, one does not have to install any external modules for the purpose. Due to the xmlformat being an inherently hierarchical data format, it is a lot easier to represent it by a tree. The module provides ElementTree provides methods to represent whole XML document as a single tree.
In the later examples, we would take a look at discrete methods to read and write data to and from XML files.
Reading XML Files
To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree.parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot(). Then displayed (printed) the root tag of our xml file (non-explicit way). Then displayed the attributes of the sub-tag of our parent tag using root[0].attrib. root[0] for the first tag of parent root and attrib for getting it’s attributes. Then we displayed the text enclosed within the 1st sub-tag of the 5th sub-tag of the tag root.
Python3
Writing XML Files
Now, we would take a look at some methods which could be used to write data on an xml document. In this example we would create a xml file from scratch.
To do the same, firstly, we create a root (parent) tag under the name of chess using the command ET.Element(‘chess’). All the tags would fall underneath this tag, i.e. once a root tag has been defined, other sub-elements could be created underneath it. Then we created a subtag/subelement named Opening inside the chess tag using the command ET.SubElement(). Then we created two more subtags which are underneath the tag Opening named E4 and D4. Then we added attributes to the E4 and D4 tags using set() which is a method found inside SubElement(), which is used to define attributes to a tag. Then we added text between the E4 and D4 tags using the attribute text found inside the SubElement function. In the end we converted the datatype of the contents we were creating from ‘xml.etree.ElementTree.Element’ to bytes object, using the command ET.tostring() (even though the function name is tostring() in certain implementations it converts the datatype to `bytes` rather than `str`). Finally, we flushed the data to a file named gameofsquares.xml which is a opened in `wb` mode to allow writing binary data to it. In the end, we saved the data to our file.