Python read xml to string

Contents

Convert Python ElementTree to string
Example usage
Explanation
Why not use str()?
Non-Latin Answer Extension

Convert Python ElementTree to string

The following is compatible with both Python 2 & 3, but only reads cleanly for Latin (ASCII) characters; anything else is written out as numeric character references:

xml_str = ElementTree.tostring(xml).decode()

Example usage

from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
xml_str = ElementTree.tostring(xml).decode()
print(xml_str)

Explanation

Despite what the name implies, ElementTree.tostring() returns a bytestring by default in Python 2 & 3. This is an issue in Python 3, which uses Unicode for strings.

In Python 2 you could use the str type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. [...]
To make the distinction between text and binary data clearer and more pronounced, [Python 3] made text and binary data distinct types that cannot blindly be mixed together.

Source: Porting Python 2 Code to Python 3

If we know which version of Python is being used, we can specify the encoding explicitly: 'unicode' on Python 3 or 'utf-8' on Python 2. Otherwise, if we need compatibility with both Python 2 & 3, we can use decode() to convert into the correct type.
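As a rough sketch of the version-specific route, the helper below picks the encoding from sys.version_info. The name element_to_text is my own invention, not part of ElementTree, and the Python 2 branch only ever runs on a Python 2 interpreter.

```python
import sys
from xml.etree import ElementTree


def element_to_text(element):
    """Serialize an Element to the native text type (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3: 'unicode' asks tostring() for a str instead of bytes.
        return ElementTree.tostring(element, encoding="unicode")
    # Python 2: 'unicode' is not a known encoding; 'utf-8' returns a str.
    return ElementTree.tostring(element, encoding="utf-8")


person = ElementTree.Element("Person", Name="John")
person_text = element_to_text(person)
print(person_text)
```

If the code never has to run on Python 2, the branch can be dropped and encoding="unicode" used directly.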

For reference, I’ve included a comparison of .tostring() results between Python 2 and Python 3.

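ElementTree.tostring(xml)
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml, encoding='unicode')
# Python 3: <Person Name="John" />
# Python 2: LookupError: unknown encoding: unicode

ElementTree.tostring(xml, encoding='utf-8')
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml).decode()
# Python 3: <Person Name="John" />
# Python 2: <Person Name="John" />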

Thanks to Martijn Peters for pointing out that the str datatype changed between Python 2 and 3.

Why not use str()?

In most scenarios, using str() would be the "canonical" way to convert an object to a string. Unfortunately, calling it on an Element returns the default object representation (the tag name plus the object's memory address in hexadecimal) rather than a string representation of the object's data.

from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
print(str(xml))
# <Element 'Person' at 0x...>
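To make the difference concrete, here is a small Python 3 sketch that puts the str() result next to the tostring() result for the same element. The variable names are just for illustration.

```python
from xml.etree import ElementTree

person = ElementTree.Element("Person", Name="John")

# str() falls back to the default repr: tag name plus a memory address.
as_repr = str(person)

# tostring() serializes the element's actual data (Python 3: 'unicode' gives str).
as_xml = ElementTree.tostring(person, encoding="unicode")

print(as_repr)
print(as_xml)
```

The address in the first line changes from run to run, which is another reason it is useless as a serialization.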

I had the same problem in Python 3.8 and none of the previous answers solved it. The issue is that ElementTree is both the name of a module and of a class within it. Using an alias makes it clear:

from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as XET
...
ElementTree.tostring(...)  # AttributeError
XET.tostring(...)          # Works
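A self-contained way to see the name clash, assuming Python 3: the ElementTree class has no tostring attribute, while the module (aliased as XET here) does.

```python
import xml.etree.ElementTree as XET
from xml.etree.ElementTree import ElementTree  # this name is now the class

root = XET.Element("Person", Name="John")
tree = ElementTree(root)

# tostring() is a module-level function, not a method of the class.
class_has_tostring = hasattr(ElementTree, "tostring")
module_has_tostring = hasattr(XET, "tostring")

# Going through the module alias works as expected.
xml_text = XET.tostring(tree.getroot(), encoding="unicode")
print(class_has_tostring, module_has_tostring, xml_text)
```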

Element objects have no .getroot() method. Drop that call, and the .tostring() call works:

xmlstr = ElementTree.tostring(et, encoding='utf8', method='xml')

You only need to use .getroot() if you have an ElementTree instance.
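If the same code path may receive either kind of object, one possible sketch is a small wrapper that calls .getroot() only for ElementTree instances. The function name serialize is hypothetical, and encoding="unicode" assumes Python 3.

```python
import xml.etree.ElementTree as ET


def serialize(node):
    """Return XML text for an Element or an ElementTree (hypothetical helper)."""
    # Only ElementTree instances wrap a root; Elements are serialized directly.
    root = node.getroot() if isinstance(node, ET.ElementTree) else node
    return ET.tostring(root, encoding="unicode")


root = ET.Element("Person", Name="John")
tree = ET.ElementTree(root)

from_element = serialize(root)
from_tree = serialize(tree)
element_has_getroot = hasattr(root, "getroot")  # Elements lack this method
print(from_element, from_tree, element_has_getroot)
```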

This produces a bytestring, which in Python 3 is the bytes type.
If you must have a str object, you have two options:
Decode the resulting bytes value from UTF-8: xmlstr.decode('utf8')
Use encoding='unicode'; this avoids an encode/decode cycle:

xmlstr = ElementTree.tostring(et, encoding='unicode', method='xml')
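Both options side by side, as a Python 3 sketch. Here et is assumed to be a plain Element built for the example. The two results are not guaranteed to be identical: as far as I know, the 'utf8' spelling in option 1 may also prepend an XML declaration, which option 2 omits.

```python
from xml.etree import ElementTree

et = ElementTree.Element("Person", Name="John")

# Option 1: serialize to UTF-8 bytes, then decode back to str.
decoded = ElementTree.tostring(et, encoding="utf8", method="xml").decode("utf8")

# Option 2: ask for str directly, skipping the encode/decode round trip.
direct = ElementTree.tostring(et, encoding="unicode", method="xml")

print(decoded)
print(direct)
```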

Non-Latin Answer Extension

Extension to @Stevoisiak's answer, dealing with non-Latin characters. Only one of the calls displays non-Latin characters readably, and which call that is differs between Python 3 and Python 2.

xml = ElementTree.fromstring('<Person Name="크리스" />')
xml = ElementTree.Element("Person", Name="크리스")  # Read Note about Python 2

NOTE: In Python 2, when calling the tostring(...) code, assigning xml with ElementTree.Element("Person", Name="크리스") will raise an error.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 0: ordinal not in range(128)

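ElementTree.tostring(xml)
# Python 3 (크리스): b'<Person Name="&#53356;&#47532;&#49828;" />'
# Python 3 (John): b'<Person Name="John" />'
# Python 2 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 2 (John): <Person Name="John" />

ElementTree.tostring(xml, encoding='unicode')
# Python 3 (크리스): <Person Name="크리스" />  <-------- Python 3
# Python 3 (John): <Person Name="John" />
# Python 2 (크리스): LookupError: unknown encoding: unicode
# Python 2 (John): LookupError: unknown encoding: unicode

ElementTree.tostring(xml, encoding='utf-8')
# Python 3 (크리스): b'<Person Name="\xed\x81\xac\xeb\xa6\xac\xec\x8a\xa4" />'
# Python 3 (John): b'<Person Name="John" />'
# Python 2 (크리스): <Person Name="크리스" />  <-------- Python 2
# Python 2 (John): <Person Name="John" />

ElementTree.tostring(xml).decode()
# Python 3 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 3 (John): <Person Name="John" />
# Python 2 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 2 (John): <Person Name="John" />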
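The Python 3 half of this comparison can be checked with a short sketch like the one below. It covers Python 3 only, and the variable names are mine.

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="크리스")

# Default (us-ascii): non-ASCII characters become numeric character references.
ascii_bytes = ElementTree.tostring(xml)

# 'utf-8': still bytes, with the characters stored as raw UTF-8 sequences.
utf8_bytes = ElementTree.tostring(xml, encoding="utf-8")

# 'unicode': a str that keeps the characters readable.
text = ElementTree.tostring(xml, encoding="unicode")

print(ascii_bytes)
print(utf8_bytes)
print(text)
```

Decoding ascii_bytes still leaves the character references in place, which is why .decode() alone is not enough for non-Latin text.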
