I’m exporting the tags to XML for a bunch of sites because I need to audit what’s the same and what’s different.
I’ve found that the ordering of the tags in the output seems to be non-deterministic (or rather, I couldn’t see how they are being ordered). An obvious ordering, such as alphabetical by tag name, would make it possible to use automated comparison tools like WinMerge, which would be a huge reduction in workload. Right now I’ve exported the tags for two sites that look like they could be identical, but since the tag ordering is not consistent between them, WinMerge is showing more differences than there actually are.
I suppose using XML-specific analysis tools could alleviate this, but it would be nice to be able to get away with just using WinMerge and Notepad++, for example.
The following Python will open an XML file, sort the child elements of every parent by their ‘name’ attribute, and then save the result to a new file for use in WinMerge or whatever your diff tool of choice is:
from lxml import etree

# Read as bytes so lxml honours the encoding declared in the file
with open('BLRVNTRR_changed.xml', 'rb') as my_file:
    data = my_file.read()
doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))
for parent in doc.xpath('//*[./*]'):  # Search for parent elements
    # Children without a 'name' attribute sort first
    parent[:] = sorted(parent, key=lambda x: x.get('name') or '')
# etree.tostring returns bytes, so write in binary mode
with open('BLRVNTRR_changed_reformatted.xml', 'wb') as my_file:
    my_file.write(etree.tostring(doc, pretty_print=True))
print("Done")
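If you would rather skip the diff tool for a first pass, here is a rough sketch that compares two exports directly and ignores element order. It uses only the standard library. It assumes the tags are direct children of the root element and that each one has a unique ‘name’ attribute, which may not match your export, so adjust as needed. The function names (signature, index_by_name, compare_exports) are just ones I made up for this example.

```python
import xml.etree.ElementTree as ET


def signature(elem):
    """Build an order-independent, comparable summary of an element."""
    # Sorting the child signatures makes sibling order irrelevant.
    children = tuple(sorted(signature(child) for child in elem))
    attributes = tuple(sorted(elem.attrib.items()))
    text = (elem.text or '').strip()
    return (elem.tag, attributes, text, children)


def index_by_name(xml_text):
    """Map each named top-level child to its signature.

    Assumption: tags are direct children of the root and 'name' is unique.
    """
    root = ET.fromstring(xml_text)
    index = {}
    for child in root:
        name = child.get('name')
        if name is not None:
            index[name] = signature(child)
    return index


def compare_exports(xml_a, xml_b):
    """Return (names only in A, names only in B, names whose content differs)."""
    a = index_by_name(xml_a)
    b = index_by_name(xml_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, changed
```

To use it, read each export into a string (or bytes) and pass both to compare_exports. The three lists it returns tell you which tag names exist in only one site and which exist in both but differ, regardless of the order they were written in.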
Thanks. Having the export sort its output would certainly be possible, and I definitely appreciate your script. More examples of working with the files in scripting are always helpful!