Python pretty print an XML given an XML string
I generated a long and ugly XML string with Python and need to run it through a pretty printer so it looks nicer. I found this post on Python pretty printers, but those tools require me to write the XML string to a file and read it back, which I want to avoid if possible. What Python pretty-printing tools work directly on strings?
3 Answers
Here’s how to parse from a text string to the lxml structured data type.
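Python 2:

    from lxml import etree

    # Element names are illustrative
    xml_str = "<parent><child>text</child><child>other text</child></parent>"
    root = etree.fromstring(xml_str)
    print etree.tostring(root, pretty_print=True)

Python 3:

    from lxml import etree

    # Element names are illustrative
    xml_str = "<parent><child>text</child><child>other text</child></parent>"
    root = etree.fromstring(xml_str)
    # tostring() returns bytes, so decode before printing
    print(etree.tostring(root, pretty_print=True).decode())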
I use the lxml library, and there it’s as simple as
>>> print(etree.tostring(root, pretty_print=True))
You can do this with any etree, which you can either generate programmatically or read from a file.
If you’re using the DOM from PyXML, it’s
    import xml.dom.ext

    xml.dom.ext.PrettyPrint(doc)
That prints to the standard output, unless you specify an alternate stream.
To directly use the minidom, you want to use the toprettyxml() function.
If your xml exists as a minidom node, you can use the toprettyxml() function. If it really only ever exists as a string, you will have to parse it in before you can pretty print it out.
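A minimal sketch of that parse-then-print round trip, using only the standard library. The sample XML and its element names are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical input: any well-formed XML string will do.
xml_str = "<parent><child>text</child><child>other text</child></parent>"

# Parse the string into a minidom Document, then serialize it with indentation.
doc = minidom.parseString(xml_str)
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

Nothing is written to disk here; the string goes in and a re-indented string comes out.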
Here’s a Python 3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries, unlike most other implementations. You mention that you already have an XML string, so I am going to assume you used xml.dom.minidom.parseString().
With the following solution you can avoid writing to a file first:
    import xml.dom.minidom

    def pretty_print_xml_given_string(input_string, output_xml):
        """
        Useful for when you are editing xml data on the fly.
        input_string is the object returned by xml.dom.minidom.parseString().
        """
        xml_string = input_string.toprettyxml()
        # Remove the weird newline issue by dropping whitespace-only lines.
        # Join with "\n": text mode already converts it to the platform line ending.
        xml_string = "\n".join([s for s in xml_string.splitlines() if s.strip()])
        with open(output_xml, "w") as file_out:
            file_out.write(xml_string)
I found how to fix the common newline issue here.
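If the goal is to stay with strings the whole way, the same blank-line filter can return the result instead of writing it. This is only a sketch: the helper name pretty_xml_string and the sample input are assumptions, not part of the answer above.

```python
from xml.dom import minidom


def pretty_xml_string(xml_text, indent="  "):
    """Return xml_text re-indented, with whitespace-only lines dropped."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent=indent)
    # minidom keeps existing indentation as text nodes, so re-indenting
    # leaves whitespace-only lines behind; filter them out.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Hypothetical input that is already partly indented.
already_indented = "<root>\n  <item>1</item>\n  <item>2</item>\n</root>"
print(pretty_xml_string(already_indented))
```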
Background
I am using SQLite to access a database and retrieve the desired information. I’m using ElementTree in Python version 2.6 to create an XML file with that information.
Code
    import sqlite3
    import xml.etree.ElementTree as ET

    # NOTE: Omitted code where I access the database,
    # pull data, and add elements to the tree

    tree = ET.ElementTree(root)

    # Pretty printing to Python shell for testing purposes
    from xml.dom import minidom
    print minidom.parseString(ET.tostring(root)).toprettyxml(indent = " ")

    ####### Here lies my problem #######
    tree.write("New_Database.xml")
Attempts
I’ve tried using tree.write("New_Database.xml", "utf-8") in place of the last line of code above, but it did not change the XML’s layout at all; it’s still a jumbled mess. I also decided to fiddle around and tried doing:

    tree = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")

instead of printing this to the Python shell, which gives the error AttributeError: 'unicode' object has no attribute 'write'.
Questions
When I write my tree to an XML file on the last line, is there a way to pretty print to the XML file as it does to the Python shell? Can I use toprettyxml() here or is there a different way to do this?
8 Answers
Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.
    from xml.dom import minidom

    xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
    with open("New_Database.xml", "w") as f:
        f.write(xmlstr)
There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), then you may want to encode it to a suitable file encoding, such as UTF-8. E.g. replace the one write line with:

    f.write(xmlstr.encode("utf-8"))
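In Python 3, one way to handle the same concern is to pass the encoding when opening the file. The sketch below assumes a small made-up tree with a non-ASCII character and writes to a temporary directory; neither comes from the question.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree with a non-ASCII character, standing in for the real data.
root = ET.Element("database")
ET.SubElement(root, "name").text = "Bogotá"

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")

# Assumed output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "New_Database.xml")

# In Python 3, toprettyxml() returns str; let open() do the UTF-8 encoding.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(xmlstr)
```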
This answer would be clearer if you included the import xml.dom.minidom as minidom statement that appears to be required.
@KenPronovici Possibly. That import appears in the original question, but I’ve added it here so there’s no confusion.
This answer is repeated so often on all kinds of questions, but it is anything but a good answer: you need to convert the whole XML tree to a string and reparse it, just to print it again in a different layout. This is not a good approach. Use lxml instead and serialize directly using the built-in method provided by lxml, eliminating any intermediate printing followed by reparsing.
This is an answer about how serialized XML gets written to file, not an endorsement of the OP’s serialization strategy, which is undoubtedly Byzantine. I love lxml, but being based on C, it’s not always available.
I simply solved it with the indent() function:
xml.etree.ElementTree.indent(tree, space="  ", level=0)
Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.

    tree = ET.ElementTree(root)
    ET.indent(tree, space="\t", level=0)
    tree.write(file_name, encoding="utf-8")
Note, the indent() function was added in Python 3.9.
Note that @Tatarize’s answer has effectively a polyfill for this that works on older Python versions.
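For readers without access to that answer, here is a rough sketch of what such a fallback can look like. It is not necessarily the code referenced above; indent_fallback and indent_tree are hypothetical names, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET


def indent_fallback(elem, space="  ", level=0):
    """Hypothetical stand-in for ET.indent() on Python versions before 3.9."""
    pad = "\n" + level * space
    if len(elem):
        # Whitespace before the first child goes in the parent's text.
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in elem:
            indent_fallback(child, space, level + 1)
        # The last child's tail positions the parent's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = pad
    # Whitespace after an element goes in its own tail (not for the root).
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


def indent_tree(elem, space="  "):
    """Use the built-in ET.indent() when available, else the fallback."""
    if hasattr(ET, "indent"):
        ET.indent(elem, space=space)
    else:
        indent_fallback(elem, space)


root = ET.fromstring("<root><a>1</a><b><c>2</c></b></root>")
indent_tree(root)
print(ET.tostring(root, encoding="unicode"))
```

Both paths modify the tree in place, so a later tree.write() picks up the indentation without any reparsing.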
I found a way using straight ElementTree, but it is rather complex.
ElementTree has attributes that hold the text and tail of elements, for example, element.text="text" and element.tail="tail". You have to use these in a specific way to get things to line up, so make sure you know your escape characters.
I have the following file:
To place a third element in and keep it pretty, you need the following code:
    addElement = ET.Element("data")              # Make a new element
    addElement.set("version", "3")               # Set the element's attribute
    addElement.tail = "\n"                       # Edit the element's tail
    addElement.text = "\n\t\t"                   # Edit the element's text
    newData = ET.SubElement(addElement, "data")  # Make a subelement and attach it to our element
    newData.tail = "\n\t"                        # Edit the subelement's tail
    newData.text = "5431"                        # Edit the subelement's text
    root[-1].tail = "\n\t"                       # Edit the previous element's tail, so that our new element is properly placed
    root.append(addElement)                      # Add the element to the tree
To indent the internal tags (like the internal data tag), you have to add it to the text of the parent element. If you want to indent anything after an element (usually after subelements), you put it in the tail.
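The text/tail rule is easier to see on a tiny tree. The following sketch uses a made-up two-level structure, not the file from this answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level tree, built with no whitespace at all.
root = ET.Element("root")
child = ET.SubElement(root, "data")
child.text = "5431"

# The parent's text is the whitespace between <root> and its first child.
root.text = "\n\t"
# The child's tail is the whitespace between </data> and </root>.
child.tail = "\n"

print(ET.tostring(root, encoding="unicode"))
```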
This code gives the following result when you write it to a file:
76939 266720 3569 5431
As another note, if you wish to make the program uniformly use \t, you may want to read the file as a string first and replace all of the spaces used for indentation with \t.
This code was written in Python 3.7, but still works in Python 2.7.
Use xml.etree.ElementTree to print nicely formatted xml files [duplicate]
I am trying to use xml.etree.ElementTree to write out xml files with Python. The issue is that they keep getting generated in a single line. I want to be able to easily reference them so if it’s possible I would really like to be able to have the file written out cleanly. This is what I am getting:
This is not a true duplicate: The other question leaves the possibility to use any XML library. This question asks specifically for a solution when you are already working with the built-in element tree library. Imho it makes perfect sense to ask this question specifically for this library, because it is apparently a missing feature!?
@bluenote10 When someone reports a question, other people will re-report it just to follow along. I’m from Colombia, so sorry for my bad English.
You can check my answer here: stackoverflow.com/a/63373633/268627. As of Python 3.9 there will be a new indent function built-in.
2 Answers
You can use the function toprettyxml() from xml.dom.minidom in order to do that:
    from xml.etree import ElementTree
    from xml.dom import minidom

    def prettify(elem):
        """Return a pretty-printed XML string for the Element."""
        rough_string = ElementTree.tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        return reparsed.toprettyxml(indent="\t")

The idea is to serialize your Element to a string, parse it using minidom, and convert it back to XML using the toprettyxml function.