- Python XML File – How to Read, Write & Parse
- How to Parse XML using minidom
- How to Write XML Node
- XML Parser Example
- How to Parse XML using ElementTree
- Summary
- Reading and Writing XML Files in Python
- Using BeautifulSoup alongside with lxml parser
- Reading Data From an XML File
- Python3
- Writing an XML File
- Python3
- Using Elementtree
- Reading XML Files
- Python3
- Writing XML Files
Python XML File – How to Read, Write & Parse
XML stands for eXtensible Markup Language. It was designed to store and transport small to medium amounts of data and is widely used for sharing structured information.
Python enables you to parse and modify XML documents. In order to parse XML document, you need to have the entire XML document in memory. In this tutorial, we will see how we can use XML minidom class in Python to load and parse XML files.
How to Parse XML using minidom
We have created a sample XML file that we are going to parse.
Step 1) Create Sample XML file
Inside the file, we can see the first name, last name, home, and the area of expertise (SQL, Python, Testing and Business)
Step 2) Use the parse function to load and parse the XML file
Once we have parsed the document, we will print out the “node name” of the root of the document and the “firstchild tagname”. Tagname and nodename are the standard properties of the XML file.
- Import the xml.dom.minidom module and declare file that has to be parsed (myxml.xml)
- This file carries some basic information about an employee like first name, last name, home, expertise, etc.
- We use the parse function on the XML minidom to load and parse the XML file
- We have variable doc and doc gets the result of the parse function
- We want to print the nodename and child tagname from the file, so we declare it in print function
- Run the code- It prints out the nodename (#document) from the XML file and the first child tagname (employee) from the XML file
Nodename and child tagname are the standard names or properties of an XML dom.
Step 3) Call the list of XML tags from the XML document and printed out
Next, We can also call the list of XML tags from the XML document and printed out. Here we printed out the set of skills like SQL, Python, Testing and Business.
- Declare the variable expertise, from which we going to extract all the expertise name employee is having
- Use the dom standard function called “getElementsByTagName”
- This will get all the elements named skill
- Declare loop over each one of the skill tags
- Run the code- It will give list of four skills
How to Write XML Node
We can create a new attribute by using the “createElement” function and then append this new attribute or tag to the existing XML tags. We added a new tag “BigData” in our XML file.
- You have to code to add the new attribute (BigData) to the existing XML tag
- Then, you have to print out the XML tag with new attributes appended with the existing XML tag
- To add a new XML and add it to the document, we use code “doc.create elements”
- This code will create a new skill tag for our new attribute “Big-data”
- Add this skill tag into the document first child (employee)
- Run the code- the new tag “big data” will appear with the other list of expertise
XML Parser Example
Python 2 Example
import xml.dom.minidom def main(): # use the parse() function to load and parse an XML file doc = xml.dom.minidom.parse("Myxml.xml"); # print out the document node and the name of the first child tag print doc.nodeName print doc.firstChild.tagName # get a list of XML tags from the document and print each one expertise = doc.getElementsByTagName("expertise") print "%d expertise:" % expertise.length for skill in expertise: print skill.getAttribute("name") #Write a new XML tag and add it into the document newexpertise = doc.createElement("expertise") newexpertise.setAttribute("name", "BigData") doc.firstChild.appendChild(newexpertise) print " " expertise = doc.getElementsByTagName("expertise") print "%d expertise:" % expertise.length for skill in expertise: print skill.getAttribute("name") if name == "__main__": main();
Python 3 Example
import xml.dom.minidom def main(): # use the parse() function to load and parse an XML file doc = xml.dom.minidom.parse("Myxml.xml"); # print out the document node and the name of the first child tag print (doc.nodeName) print (doc.firstChild.tagName) # get a list of XML tags from the document and print each one expertise = doc.getElementsByTagName("expertise") print ("%d expertise:" % expertise.length) for skill in expertise: print (skill.getAttribute("name")) # Write a new XML tag and add it into the document newexpertise = doc.createElement("expertise") newexpertise.setAttribute("name", "BigData") doc.firstChild.appendChild(newexpertise) print (" ") expertise = doc.getElementsByTagName("expertise") print ("%d expertise:" % expertise.length) for skill in expertise: print (skill.getAttribute("name")) if __name__ == "__main__": main();
How to Parse XML using ElementTree
ElementTree is an API for manipulating XML. ElementTree is the easy way to process XML files.
We are using the following XML document as the sample data:
Reading XML using ElementTree:
we must first import the xml.etree.ElementTree module.
import xml.etree.ElementTree as ET
Now let’s fetch the root element:
Following is the complete code for reading above xml data
import xml.etree.ElementTree as ET tree = ET.parse('items.xml') root = tree.getroot() # all items data print('Expertise Data:') for elem in root: for subelem in elem: print(subelem.text)
Expertise Data: SQL Python
Summary
Python enables you to parse the entire XML document at one go and not just one line at a time. In order to parse XML document you need to have the entire document in memory.
- To parse XML document
- Import xml.dom.minidom
- Use the function “parse” to parse the document ( doc=xml.dom.minidom.parse (file name);
- Call the list of XML tags from the XML document using code (=doc.getElementsByTagName( “name of xml tags”)
- To create and add new attribute in XML document
- Use function “createElement”
Reading and Writing XML Files in Python
Extensible Markup Language, commonly known as XML is a language designed specifically to be easy to interpret by both humans and computers altogether. The language defines a set of rules used to encode a document in a specific format. In this article, methods have been described to read and write XML files in python.
Note: In general, the process of reading the data from an XML file and analyzing its logical components is known as Parsing. Therefore, when we refer to reading a xml file we are referring to parsing the XML document.
In this article, we would take a look at two libraries that could be used for the purpose of xml parsing. They are:
Using BeautifulSoup alongside with lxml parser
For the purpose of reading and writing the xml file we would be using a Python library named BeautifulSoup. In order to install the library, type the following command into the terminal.
pip install beautifulsoup4
Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser (used for parsing XML/HTML documents). lxml could be installed by running the following command in the command processor of your Operating system:
Firstly we will learn how to read from an XML file. We would also parse data stored in it. Later we would learn how to create an XML file and write data to it.
Reading Data From an XML File
There are two steps required to parse a xml file:-
XML File used:
Python3
Writing an XML File
Writing a xml file is a primitive process, reason for that being the fact that xml files aren’t encoded in a special way. Modifying sections of a xml document requires one to parse through it at first. In the below code we would modify some sections of the aforementioned xml document.
Python3
Using Elementtree
Elementtree module provides us with a plethora of tools for manipulating XML files. The best part about it being its inclusion in the standard Python’s built-in library. Therefore, one does not have to install any external modules for the purpose. Due to the xmlformat being an inherently hierarchical data format, it is a lot easier to represent it by a tree. The module provides ElementTree provides methods to represent whole XML document as a single tree.
In the later examples, we would take a look at discrete methods to read and write data to and from XML files.
Reading XML Files
To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree.parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot(). Then displayed (printed) the root tag of our xml file (non-explicit way). Then displayed the attributes of the sub-tag of our parent tag using root[0].attrib. root[0] for the first tag of parent root and attrib for getting it’s attributes. Then we displayed the text enclosed within the 1st sub-tag of the 5th sub-tag of the tag root.
Python3
Writing XML Files
Now, we would take a look at some methods which could be used to write data on an xml document. In this example we would create a xml file from scratch.
To do the same, firstly, we create a root (parent) tag under the name of chess using the command ET.Element(‘chess’). All the tags would fall underneath this tag, i.e. once a root tag has been defined, other sub-elements could be created underneath it. Then we created a subtag/subelement named Opening inside the chess tag using the command ET.SubElement(). Then we created two more subtags which are underneath the tag Opening named E4 and D4. Then we added attributes to the E4 and D4 tags using set() which is a method found inside SubElement(), which is used to define attributes to a tag. Then we added text between the E4 and D4 tags using the attribute text found inside the SubElement function. In the end we converted the datatype of the contents we were creating from ‘xml.etree.ElementTree.Element’ to bytes object, using the command ET.tostring() (even though the function name is tostring() in certain implementations it converts the datatype to `bytes` rather than `str`). Finally, we flushed the data to a file named gameofsquares.xml which is a opened in `wb` mode to allow writing binary data to it. In the end, we saved the data to our file.