This is my input file
<datasource formatted-name='federated.1819qwi0hys5391dzxhl70o95li4' inline='true' source-platform='win' version='18.1' xmlns:user='http://www.tableausoftware.com/xml/user'>
<connection class='federated'>
<named-connections>
<named-connection caption='Sample - Superstore' name='excel.1ew9u4t0tggb9315darmm0nfz2kb'>
<connection class='excel' driver='' filename='C:/Users/XXXX/Downloads/Sample - Superstore.xls' header='yes' imex='1' password='' server='' />
</named-connection>
</named-connections>
<relation connection='excel.1ew9u4t0tggb9315darmm0nfz2kb' name='Custom SQL Query' type='text'>SELECT [Orders$].[Category] AS [Category], [Orders$].[City] AS [City], [Orders$].[Country] AS [Country], [Orders$].[Customer ID] AS [Customer ID], [Orders$].[Customer Name] AS [Customer Name], [Orders$].[Discount] AS [Discount], [Orders$].[Profit] AS [Profit], [Orders$].[Quantity] AS [Quantity], [Orders$].[Region] AS [Region], [Orders$].[State] AS [State], [People$].[Person] AS [Person], [People$].[Region] AS [Region (People)] FROM [Orders$] INNER JOIN [People$] ON [Orders$].[Region] = [People$].[Region]</relation>
<metadata-records>
<metadata-record class='column'>
<remote-name>Category</remote-name>
<remote-type>130</remote-type>
<local-name>[Category]</local-name>
<parent-name>[Custom SQL Query]</parent-name>
<remote-alias>Category</remote-alias>
<ordinal>1</ordinal>
<local-type>string</local-type>
<aggregation>Count</aggregation>
<contains-null>true</contains-null>
<collation>LEN_RUS_S2_WO</collation>
<attributes>
<attribute datatype='string' name='DebugRemoteType'>"WSTR"</attribute>
</attributes>
</metadata-record>
I want to get the attribute tag. I have tried
for x in xmlRoot.findall('./metadata-record'):
    sqlString = x.find('attribute').text
but I am getting only whitespace as the result. I have tried every combination I could think of in findall, but I am still not able to get the result. I want to read that attribute tag dynamically and write it to the output file unchanged. I have retrieved the other tags from metadata-record, but this one alone is not working. Can someone help?
My expected output is
<metadata-records>
<metadata-record class='column'>
<remote-name>Category</remote-name>
<remote-type>130</remote-type>
<local-name>[Category]</local-name>
<parent-name>[Custom SQL Query]</parent-name>
<remote-alias>Category</remote-alias>
<ordinal>1</ordinal>
<local-type>string</local-type>
<aggregation>Count</aggregation>
<contains-null>true</contains-null>
<collation>LEN_RUS_S2_WO</collation>
<attributes>
<attribute datatype='string' name='DebugRemoteType'>"WSTR"</attribute>
</attributes>
</metadata-record>
I have retrieved everything up to the collation tag but do not know how to get the attributes tag. Can someone help?
Thanks, Aarush
First, I would fix the input file. It is not well-formed XML because it is missing some closing tags.
I fixed it for you here:
<datasource formatted-name='federated.1819qwi0hys5391dzxhl70o95li4' inline='true' source-platform='win' version='18.1' xmlns:user='http://www.tableausoftware.com/xml/user'>
<connection class='federated'>
<named-connections>
<named-connection caption='Sample - Superstore' name='excel.1ew9u4t0tggb9315darmm0nfz2kb'>
<connection class='excel' driver='' filename='C:/Users/XXXX/Downloads/Sample - Superstore.xls' header='yes' imex='1' password='' server='' />
</named-connection>
</named-connections>
<relation connection='excel.1ew9u4t0tggb9315darmm0nfz2kb' name='Custom SQL Query' type='text'>SELECT [Orders$].[Category] AS [Category], [Orders$].[City] AS [City], [Orders$].[Country] AS [Country], [Orders$].[Customer ID] AS [Customer ID], [Orders$].[Customer Name] AS [Customer Name], [Orders$].[Discount] AS [Discount], [Orders$].[Profit] AS [Profit], [Orders$].[Quantity] AS [Quantity], [Orders$].[Region] AS [Region], [Orders$].[State] AS [State], [People$].[Person] AS [Person], [People$].[Region] AS [Region (People)] FROM [Orders$] INNER JOIN [People$] ON [Orders$].[Region] = [People$].[Region]
</relation>
</connection>
<metadata-records>
<metadata-record class='column'>
<remote-name>Category</remote-name>
<remote-type>130</remote-type>
<local-name>[Category]</local-name>
<parent-name>[Custom SQL Query]</parent-name>
<remote-alias>Category</remote-alias>
<ordinal>1</ordinal>
<local-type>string</local-type>
<aggregation>Count</aggregation>
<contains-null>true</contains-null>
<collation>LEN_RUS_S2_WO</collation>
<attributes>
<attribute datatype='string' name='DebugRemoteType'>"WSTR"</attribute>
</attributes>
</metadata-record>
</metadata-records>
</datasource>
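A quick way to check whether a file is well-formed before querying it is to let the parser try it and report the error. The sketch below assumes `xml.etree.ElementTree`; the helper name `is_well_formed` and the two short snippets are hypothetical stand-ins for the real input.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as exc:
        return False, str(exc)

# Hypothetical snippets: the first one is missing a closing tag,
# like the input file in the question.
broken = ("<metadata-records><metadata-record class='column'>"
          "<ordinal>1</ordinal></metadata-record>")
fixed = broken + "</metadata-records>"

print(is_well_formed(broken))  # first item is False, second is the parser message
print(is_well_formed(fixed))
```

If the first item of the result is False, the message usually points at the line and column where the parser gave up, which helps locate the missing tag.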
getElementsByTagName
Here is my code
from xml.dom import minidom
mydoc = minidom.parse('x.xml')
items = mydoc.getElementsByTagName('attribute')
print(items)
print(items) will print the object [<DOM Element: attribute at 0x10aad6690>]
items is a NodeList, so printing it only shows the element objects. To get the value between the tags, walk the child nodes of the first element:
# Traverse the childNodes of the tag
for t in items[0].childNodes:
    # if the node is a text node then print it
    if t.nodeType == t.TEXT_NODE:
        print(t.nodeValue)
One Liner
print(''.join((t.nodeValue for t in items[0].childNodes if t.nodeType == t.TEXT_NODE)))
This reference page really helped me get started with XML parsing.
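The element also carries the datatype and name attributes shown in the question, and minidom can read those with `getAttribute`. Here is a minimal sketch that collects them along with the text; the inline `xml_text` is a trimmed, hypothetical stand-in for the fixed file, and `parseString` is used only so the example runs without a file on disk.

```python
from xml.dom import minidom

# Trimmed stand-in for the fixed input file (assumption for illustration).
xml_text = """<metadata-record class='column'>
  <attributes>
    <attribute datatype='string' name='DebugRemoteType'>"WSTR"</attribute>
  </attributes>
</metadata-record>"""

doc = minidom.parseString(xml_text)

results = []
for node in doc.getElementsByTagName('attribute'):
    # Join the text nodes to get the value between the tags
    text = ''.join(c.nodeValue for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE)
    # getAttribute returns the attribute value, or '' if it is absent
    results.append((node.getAttribute('name'),
                    node.getAttribute('datatype'),
                    text))

print(results)
```

Looping over every match instead of taking `items[0]` keeps this working when a record has several attribute elements.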
Using xml.etree.ElementTree, you can try something like this:
import xml.etree.ElementTree as ET
xmlRoot = ET.fromstring(xml)
print(''.join([ET.tostring(x, encoding="unicode") for x in xmlRoot.findall('.//metadata-records//*')]))
Where xml is your XML input data.
The key is the findall pattern: starting from the root, it looks for any descendant called metadata-records and, from there, for any element. The double forward slash // makes sure that not only direct children are matched, but any descendant of the metadata-records element. That is why you were able to find the <attributes> element (a child) but not the <attribute> element (a child of a child). Note that because the pattern matches every descendant, a nested element is serialized both as part of its parent and again on its own.
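To pull out only the nested attribute element rather than every descendant, the path can name both levels explicitly. The following is a minimal sketch under that assumption; `xml_text` is a trimmed, hypothetical stand-in for the fixed input. It may also explain the result in the question: calling `.text` on the outer attributes element returns only the whitespace that precedes its child.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the fixed input file (assumption for illustration).
xml_text = """<datasource>
  <metadata-records>
    <metadata-record class='column'>
      <remote-name>Category</remote-name>
      <attributes>
        <attribute datatype='string' name='DebugRemoteType'>"WSTR"</attribute>
      </attributes>
    </metadata-record>
  </metadata-records>
</datasource>"""

xmlRoot = ET.fromstring(xml_text)

found = []
# './/metadata-record' matches the records at any depth below the root
for record in xmlRoot.findall('.//metadata-record'):
    # 'attributes/attribute' walks child, then child of child
    for attr in record.findall('./attributes/attribute'):
        found.append((attr.get('name'), attr.get('datatype'), attr.text))
        # Serialize the element itself so it can be written to the output file
        print(ET.tostring(attr, encoding='unicode').strip())

print(found)
```

`attr.get` reads the XML attributes and `attr.text` the value between the tags, so both the tag and its content can be written out again.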