Python lxml XPath examples

IanHopkinson / lxml_examples.py


The part about the default namespace when using lxml XPath on XML has been very helpful to me. Thanks a lot!

To be honest, handling the default namespace in XML was my main reason for writing this. I spent a very long time getting this to work the first time around.

This is really helpful. So the idea is to pick an arbitrary prefix, such as foo, to stand in for the default namespace, because an empty prefix is not allowed?

The key point is that an XML file can contain elements without prefixes; those elements implicitly take the default namespace. When we query the XML, we have to specify that namespace explicitly. As I recall, my mistake in the past was believing that my query would pick up the default namespace from the XML file it was querying, and that is not the case.
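A minimal sketch of that point, assuming a made-up document and the namespace URI `http://example.com/ns` (neither comes from the gist). The prefix `foo` is arbitrary; only the URI it maps to has to match the document's default namespace.

```python
from lxml import etree

# Hypothetical document: no prefixes, so every element is in the default namespace.
XML = b'<root xmlns="http://example.com/ns"><item>a</item><item>b</item></root>'
root = etree.fromstring(XML)

# An unprefixed XPath name means "no namespace", so this matches nothing.
without_prefix = root.xpath("//item")

# Bind any prefix to the same URI and use it in the query.
ns = {"foo": "http://example.com/ns"}
with_prefix = root.xpath("//foo:item", namespaces=ns)

print(len(without_prefix), [el.text for el in with_prefix])
```

The query does not inherit anything from the file: the mapping passed in `namespaces` is the only place the XPath engine looks up prefixes.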

What is the purpose of the namespace variables on lines 59–61? They’re not used anywhere else that I can see.

@lsloan I think that is probably a hangover from an earlier version of the code; in this version it serves no purpose. The lxml documentation uses that style of namespace definition. I probably intended to use it down at line 74 and then forgot!

Gotcha. I thought that might be the case.

I forked your gist and made some changes. Then I added some other examples of processing XML that contains QTI (Question & Test Interoperability) data. Experimenting with lxml.etree, I found that the default, unnamed namespace in the XML is available in the tree's data in nsmap[None]. See my lxml-test-etree.py, line 11…

defaultNamespace = {'_': root.nsmap[None]}

I found that naming it _ makes it convenient to refer to it in the XPath statement, as on line 23…

items = root.xpath('//_:item', namespaces=defaultNamespace)

I wanted to get that namespace used by default when xpath() is called. I tried setting the key to None or using root.nsmap itself, but those caused an error ("TypeError: empty namespace prefix is not supported in XPath").
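A short sketch of both behaviours, assuming a stand-in document with a made-up namespace URI (real QTI files use their own). The `try` block only records whatever message the installed lxml version produces when `root.nsmap`, with its None key, is passed directly.

```python
from lxml import etree

# Hypothetical stand-in for a QTI file; the URI is an assumption.
root = etree.fromstring(
    b'<questestinterop xmlns="http://example.com/qti">'
    b'<item ident="q1"/><item ident="q2"/></questestinterop>'
)

# The unnamed default namespace is stored under the key None.
default_namespace = {"_": root.nsmap[None]}
items = root.xpath("//_:item", namespaces=default_namespace)

# Passing nsmap unchanged hands xpath() a None prefix, which the
# comment above reports as a TypeError.
try:
    root.xpath("//item", namespaces=root.nsmap)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print([el.get("ident") for el in items], error_message)
```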

I'd like to not need the _: prefix on the element name, but at least it's minimally obtrusive. Trying to set a truly default namespace is a lost cause, apparently. The lxml FAQ has an entry titled "How can I specify a default namespace for XPath expressions?" The short answer: "You can't." 🤷
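Two possible workarounds, sketched under assumptions: `xpath_namespaces` is a hypothetical helper (not part of lxml) that copies `nsmap` and renames the None key, and the document and URIs are invented. The last query uses the standard XPath `local-name()` function, which avoids prefixes entirely but matches the element name in any namespace.

```python
from lxml import etree

def xpath_namespaces(element, default_prefix="_"):
    """Hypothetical helper: copy element.nsmap, renaming the None key."""
    return {
        (default_prefix if prefix is None else prefix): uri
        for prefix, uri in element.nsmap.items()
    }

# Made-up document with a default namespace and one prefixed namespace.
root = etree.fromstring(
    b'<root xmlns="http://example.com/ns" xmlns:x="http://example.com/x">'
    b"<item>a</item><x:item>b</x:item></root>"
)

ns = xpath_namespaces(root)
default_items = root.xpath("//_:item", namespaces=ns)
prefixed_items = root.xpath("//x:item", namespaces=ns)

# Prefix-free alternative: compare local names and ignore the namespace.
any_ns_items = root.xpath("//*[local-name() = 'item']")
```

The helper only sees namespaces declared on or above the element it is given, and it would clash with a document that already declares a `_` prefix, so treat it as a convenience rather than a general solution.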

As it turns out, I may prefer using lxml.objectify rather than lxml.etree, but I need to investigate a little more before I know for sure. See my lxml-test-objectify.py, for example.
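For comparison, a minimal sketch of what lxml.objectify access can look like on a made-up document (this is not taken from lxml-test-objectify.py). As I understand it, attribute-style child lookup uses the parent element's namespace, so no prefix appears in the code.

```python
from lxml import objectify

# Hypothetical document with a default namespace; the URI is an assumption.
root = objectify.fromstring(
    b'<root xmlns="http://example.com/ns"><item>a</item><item>b</item></root>'
)

# root.item is the first <item>; iterating it walks same-tag siblings.
texts = [item.text for item in root.item]
print(texts)
```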
