- Reading and Writing XML Files in Python
- Using BeautifulSoup alongside with lxml parser
- Reading Data From an XML File
- Python3
- Writing an XML File
- Python3
- Using Elementtree
- Reading XML Files
- Python3
- Writing XML Files
- xmltodict Module in Python: A Practical Reference
- Install the xmltodict module using pip
- What is an XML file?
- How to read XML data into a Python dictionary?
- How to convert a Python dictionary to XML?
- How to convert XML to JSON?
- How to convert JSON data to XML?
- Conclusion
Reading and Writing XML Files in Python
Extensible Markup Language, commonly known as XML is a language designed specifically to be easy to interpret by both humans and computers altogether. The language defines a set of rules used to encode a document in a specific format. In this article, methods have been described to read and write XML files in python.
Note: In general, the process of reading the data from an XML file and analyzing its logical components is known as Parsing. Therefore, when we refer to reading a xml file we are referring to parsing the XML document.
In this article, we would take a look at two libraries that could be used for the purpose of xml parsing. They are:
Using BeautifulSoup alongside with lxml parser
For the purpose of reading and writing the xml file we would be using a Python library named BeautifulSoup. In order to install the library, type the following command into the terminal.
pip install beautifulsoup4
Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser (used for parsing XML/HTML documents). lxml could be installed by running the following command in the command processor of your Operating system:
Firstly we will learn how to read from an XML file. We would also parse data stored in it. Later we would learn how to create an XML file and write data to it.
Reading Data From an XML File
There are two steps required to parse a xml file:-
XML File used:
Python3
Writing an XML File
Writing a xml file is a primitive process, reason for that being the fact that xml files aren’t encoded in a special way. Modifying sections of a xml document requires one to parse through it at first. In the below code we would modify some sections of the aforementioned xml document.
Python3
Using Elementtree
Elementtree module provides us with a plethora of tools for manipulating XML files. The best part about it being its inclusion in the standard Python’s built-in library. Therefore, one does not have to install any external modules for the purpose. Due to the xmlformat being an inherently hierarchical data format, it is a lot easier to represent it by a tree. The module provides ElementTree provides methods to represent whole XML document as a single tree.
In the later examples, we would take a look at discrete methods to read and write data to and from XML files.
Reading XML Files
To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree.parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot(). Then displayed (printed) the root tag of our xml file (non-explicit way). Then displayed the attributes of the sub-tag of our parent tag using root[0].attrib. root[0] for the first tag of parent root and attrib for getting it’s attributes. Then we displayed the text enclosed within the 1st sub-tag of the 5th sub-tag of the tag root.
Python3
Writing XML Files
Now, we would take a look at some methods which could be used to write data on an xml document. In this example we would create a xml file from scratch.
To do the same, firstly, we create a root (parent) tag under the name of chess using the command ET.Element(‘chess’). All the tags would fall underneath this tag, i.e. once a root tag has been defined, other sub-elements could be created underneath it. Then we created a subtag/subelement named Opening inside the chess tag using the command ET.SubElement(). Then we created two more subtags which are underneath the tag Opening named E4 and D4. Then we added attributes to the E4 and D4 tags using set() which is a method found inside SubElement(), which is used to define attributes to a tag. Then we added text between the E4 and D4 tags using the attribute text found inside the SubElement function. In the end we converted the datatype of the contents we were creating from ‘xml.etree.ElementTree.Element’ to bytes object, using the command ET.tostring() (even though the function name is tostring() in certain implementations it converts the datatype to `bytes` rather than `str`). Finally, we flushed the data to a file named gameofsquares.xml which is a opened in `wb` mode to allow writing binary data to it. In the end, we saved the data to our file.
xmltodict Module in Python: A Practical Reference
In this tutorial, we will see how to install the xmltodict module and use it in our Python programs to work easily with XML files. We will see how to convert XML to Python dictionaries and to JSON format and vice-versa.
Install the xmltodict module using pip
For Python 3 or above we can use the pip3 command to install xmltodict using the terminal.
For older versions of Python we can use the following command to install xmltodict.
What is an XML file?
XML stands for extensible markup language and it was designed primarily for storing and transporting data.
It’s a descriptive language that supports writing structured data and we have to use other software to store, send, receive, or display XML data.
The following XML file has data for an airplane such as the year, make, model and color.
Cessna Skyhawk Light blue and white
Now, in the following sections, we will play with this airplane data and see how to convert it into Python dictionary and JSON and convert them back to XML format using the xmltodict module.
How to read XML data into a Python dictionary?
We can convert XML files to a Python dictionary using the xmltodict.parse() method in the xmltodict module.
xmltodict.parse() method takes an XML file as input and changes it to Ordered Dictionary.
We can then extract the dictionary data from Ordered Dictionary using dict constructor for Python dictionaries.
#import module import xmltodict #open the file fileptr = open("/home/aditya1117/askpython/plane.xml","r") #read xml content from the file xml_content= fileptr.read() print("XML content is:") print(xml_content) #change xml format to ordered dict my_ordered_dict=xmltodict.parse(xml_content) print("Ordered Dictionary is:") print(my_ordered_dict) print("Year of plane is:") print(my_ordered_dict['plane']['year']) #Use contents of ordered dict to make python dictionary my_plane= dict(my_ordered_dict['plane']) print("Created dictionary data is:") print(my_plane) print("Year of plane is") print(my_plane['year'])
XML content is:Ordered Dictionary is: OrderedDict([('plane', OrderedDict([('year', '1977'), ('make', 'Cessna'), ('model', 'Skyhawk'), ('color', 'Light blue and white')]))]) Year of plane is: 1977 Created dictionary data is: Year of plane is 1977 Cessna Skyhawk Light blue and white
In the above example, we have successfully extracted our airplane data from XML format using xmltodict.parse() method and printed the data both in the form of Ordered Dictionary and dictionary.
How to convert a Python dictionary to XML?
We can convert a python dictionary into the XML format using xmltodict.unparse() method of xmltodict module.
This method accepts the dictionary object as input as returns an XML format data as output.
The only restriction here is that the dictionary should have single root so that XML data can be formatted easily. Otherwise it will cause ValueError .
#import module import xmltodict #define dictionary with all the attributes mydict=> print("Original Dictionary of plane data is:") print(mydict) #create xml format xml_format= xmltodict.unparse(my_ordered_dict,pretty=True) print("XML format data is:") print(xml_format)
Original Dictionary of plane data is: > XML format data is:Cessna Skyhawk Light blue and white
In the above example, we have created airplane data in XML format from simple python dictionary data. Now we will see how to convert XML data into JSON format.
How to convert XML to JSON?
We can convert XML data into JSON format using the xmltodict module and the json module in python. In this process, we first create an ordered dictionary from XML format using xmltodict.parse() method.
Then we convert the ordered dictionary into JSON format using json.dumps() method which takes ordered dictionary as an argument and converts it into JSON string.
#import module import xmltodict import json #open the file fileptr = open("/home/aditya1117/askpython/plane.xml","r") #read xml content from the file xml_content= fileptr.read() print("XML content is:") print(xml_content) #change xml format to ordered dict my_ordered_dict=xmltodict.parse(xml_content) print("Ordered Dictionary is:") print(my_ordered_dict) json_data= json.dumps(my_ordered_dict) print("JSON data is:") print(json_data) x= open("plane.json","w") x.write(json_data) x.close()
XML content is:Ordered Dictionary is: OrderedDict([('plane', OrderedDict([('year', '1977'), ('make', 'Cessna'), ('model', 'Skyhawk'), ('color', 'Light blue and white')]))]) JSON data is: > Cessna Skyhawk Light blue and white
In the above example, we have read XML data into xml_content and then xmltodict.parse() creates an ordered dictionary my_ordered_dict and then JSON data is created using json.dumps() method from ordered dictionary.
How to convert JSON data to XML?
Now, let’s convert JSON data into the XML format using the xmltodict module by first converting the JSON data to Python dictionary using json.load() method and then converting the dictionary to XML using xmltodict.unparse() .
Again, here restriction is that JSON data should have a single root otherwise it will cause ValueError .
#import module import xmltodict import json #define dictionary with all the attributes fileptr = open("/home/aditya1117/askpython/plane.json","r") json_data=json.load(fileptr) print("JSON data is:") print(json_data) #create xml format xml_format= xmltodict.unparse(json_data,pretty=True) print("XML format data is:") print(xml_format)
JSON data is: > XML format data is:Cessna Skyhawk Light blue and white
In the above example, json.load() accepts the file object as argument and parses the data thereby creating a Python dictionary which is stored in json_data . Then we convert the dictionary into XML file using xmltodict.unparse() method.
Conclusion
In this article, we have used the xmltodict module to process XML data. We have seen how to convert XML data to Python dictionary and JSON format and also converted them back into XML format. Happy Learning!