Expanding an XML data column in a Pandas DataFrame and saving it as new columns

I have a dataset of shape (11145, 14). One of the columns holds fairly complicated XML values. I am trying to expand this XML column and add its contents as new columns. Here is one example of the XML (I changed the values for privacy reasons, but the structure is the same):

'
<?xml version="1.0" encoding="UTF-8"?>
<modulo
    xmlns="http://www.sadasdasdasdasd.it/12312312312/Fasdasdasda"
    xmlns:xsi="http://www.sss1231231233.org/200232321/XMLSchema-instance">
    <nomeTxt dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="Nome" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasdasdqw">
        <![CDATA[*****]]>
    </nomeTxt>
    <adasdasdasdaq2qwdwasxasxas dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="asdasdasdqweqwe" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasd2szszxc">
        <![CDATA[*****]]>
    </adasdasdasdaq2qwdwasxasxas>
    <qweweqweqweqweqweqwe dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="sdsdsds" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasd">
        <![CDATA[M]]>
    </qweweqweqweqweqweqwe>
    <qewtrweqrqwerqwrqweqw dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="qewtrweqrqwerqwrqweqw" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdas">
        <![CDATA[213123123123]]>
    </qewtrweqrqwerqwrqweqw>
    <qewtrweqrqwerqwrqzxczxcasxcasxweqw dataFill="ew.fill(\'date\')" dataFillMode="auto" modelCodeMeaning="Data di nascita" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasfafassadasdasdasdas">
        <![CDATA[1927-21-13]]>
    </qewtrweqrqwerqwrqzxczxcasxcasxweqw>
    <sadasdasdasdasdsa codeValue="0" codeScheme="asdasdasdasdasdasd" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </rbg_allergiefarmacologiche>
    <xczcxzcxzczxczxcz codeValue="0" codeScheme="asdasdasdasdasdasd" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </xczcxzcxzczxczxcz>
    <asdasfascasasxasx codeValue="0" codeScheme="asdasdasdas" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </asdasfascasasxasx>
    <asdasxcasxasxasxzxxz>
        <![CDATA[false]]>
    </asdasxcasxasxasxzxxz>
    <asxasxasxsaxasx xsi:nil="true"></asxasxasxsaxasx>
    <saxasx>
        <![CDATA[false]]>
    </saxasx>
    <asdasxasxasxas xsi:nil="true"></asdasxasxasxas>
    <asasdasdasdas>
        <![CDATA[false]]>
    </asasdasdasdas>
    <asasdasdasdasasasasd xsi:nil="true"></asasdasdasdasasasasd>
    <asasdasdasasd>
        <![CDATA[false]]>
    </asasdasdasasd>
    <zcxzcxzc xsi:nil="true"></zcxzcxzc>
</modulo>'

I tried going through the column with a for loop, converting each XML value to a dictionary, and then saving the dictionary entries as columns. The problem with this approach is that each row contains different XML elements, and the number of elements varies from row to row, so my solution does not work.

df["XML_column"]
0        <?xml version="1.0" encoding="UTF-8"?><modulo ...
1        <?xml version="1.0" encoding="UTF-8"?><modulo ...
2        <?xml version="1.0" encoding="UTF-8"?><modulo ...
3        <?xml version="1.0" encoding="UTF-8"?><modulo ...
4        <?xml version="1.0" encoding="UTF-8"?><modulo ...
                               ...                        
11140    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11141    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11142    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11143    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11144    <?xml version="1.0" encoding="UTF-8"?><modulo ...

Welcome. Your XML seems to be a bit bumpy. If I take a clean fragment, for instance this:

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<modulo
    xmlns="http://www.sadasdasdasdasd.it/12312312312/Fasdasdasda"
    xmlns:xsi="http://www.sss1231231233.org/200232321/XMLSchema-instance">
    <nomeTxt dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="Nome" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasdasdqw">
        <![CDATA[*****]]>
    </nomeTxt>
    <adasdasdasdaq2qwdwasxasxas dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="asdasdasdqweqwe" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasd2szszxc">
        <![CDATA[*****]]>
    </adasdasdasdaq2qwdwasxasxas>
    <qweweqweqweqweqweqwe dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="sdsdsds" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasd">
        <![CDATA[M]]>
    </qweweqweqweqweqweqwe>
    <qewtrweqrqwerqwrqweqw dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="qewtrweqrqwerqwrqweqw" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdas">
        <![CDATA[213123123123]]>
    </qewtrweqrqwerqwrqweqw>
    <qewtrweqrqwerqwrqzxczxcasxcasxweqw dataFill="ew.fill(\'date\')" dataFillMode="auto" modelCodeMeaning="Data di nascita" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasfafassadasdasdasdas">
        <![CDATA[1927-21-13]]>
    </qewtrweqrqwerqwrqzxczxcasxcasxweqw>
</modulo>'''

I can do the following (showing only the first six columns here):

import pandas as pd
from io import StringIO

# Wrap the string in StringIO: newer pandas versions deprecate passing literal XML
pd.read_xml(StringIO(xml), parser='etree')

          dataFill dataFillMode       modelCodeMeaning modelCodeScheme  modelCodeSchemeVersion            modelCodeValue
0        ew.fill()         auto                   Nome     asdasdasdas                       1      asdasdasdasdasdasdqw
1        ew.fill()         auto        asdasdasdqweqwe     asdasdasdas                       1    asdasdasdasdasd2szszxc
2        ew.fill()         auto                sdsdsds     asdasdasdas                       1                 asdasdasd
3        ew.fill()         auto  qewtrweqrqwerqwrqweqw     asdasdasdas                       1               asdasdasdas
4  ew.fill('date')         auto        Data di nascita     asdasdasdas                       1  asdasfafassadasdasdasdas
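
The question asks for one row per XML document, with a column per element, which is a different shape from the per-element table above. Here is a minimal sketch of one possible way to get there, using only the standard library's `xml.etree.ElementTree`. The helper name `xml_to_record`, the sample rows, and the `example.invalid` namespace are made up for illustration. Keeping only each element's text and skipping rows that fail to parse are assumptions, not something stated in the question.

```python
import xml.etree.ElementTree as ET


def xml_to_record(xml_text):
    """Hypothetical helper: flatten one XML document into {tag: text}."""
    try:
        # strip() because whitespace before <?xml ...?> is a parse error
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError:
        # Malformed row (e.g. mismatched tags): contribute no fields
        return {}
    record = {}
    for child in root:
        # Drop the "{namespace}" prefix ElementTree puts on tag names
        tag = child.tag.split("}", 1)[-1]
        # CDATA arrives as ordinary text; empty or xsi:nil elements become None
        record[tag] = (child.text or "").strip() or None
    return record


# Two made-up rows with different child elements, plus one malformed row
rows = [
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<modulo xmlns="http://example.invalid/ns"><a><![CDATA[M]]></a><b>1</b></modulo>',
    '\n<modulo xmlns="http://example.invalid/ns"><b>2</b><c></c></modulo>',
    "<modulo><a>x</b></modulo>",
]
records = [xml_to_record(r) for r in rows]
print(records)
```

Because the result is a list of dicts, rows with different elements are not a problem: `pd.DataFrame(records, index=df.index)` builds one column per distinct tag and leaves NaN where a row lacks that tag, and `df.join(...)` can then attach those columns to the original frame. For the real data, `records` would be built from `df["XML_column"]` instead of the sample `rows`.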
