
Expanding XML data column in Pandas dataframe and save it as new columns

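I have a dataset of shape (11145, 14). One of its columns holds a fairly complex XML value. I am trying to expand this XML column and add its contents as new columns. Here is an example of the XML (I changed the values for privacy reasons, but the structure is the same):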

'
<?xml version="1.0" encoding="UTF-8"?>
<modulo
    xmlns="http://www.sadasdasdasdasd.it/12312312312/Fasdasdasda"
    xmlns:xsi="http://www.sss1231231233.org/200232321/XMLSchema-instance">
    <nomeTxt dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="Nome" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasdasdqw">
        <![CDATA[*****]]>
    </nomeTxt>
    <adasdasdasdaq2qwdwasxasxas dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="asdasdasdqweqwe" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasd2szszxc">
        <![CDATA[*****]]>
    </adasdasdasdaq2qwdwasxasxas>
    <qweweqweqweqweqweqwe dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="sdsdsds" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasd">
        <![CDATA[M]]>
    </qweweqweqweqweqweqwe>
    <qewtrweqrqwerqwrqweqw dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="qewtrweqrqwerqwrqweqw" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdas">
        <![CDATA[213123123123]]>
    </qewtrweqrqwerqwrqweqw>
    <qewtrweqrqwerqwrqzxczxcasxcasxweqw dataFill="ew.fill(\'date\')" dataFillMode="auto" modelCodeMeaning="Data di nascita" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasfafassadasdasdasdas">
        <![CDATA[1927-21-13]]>
    </qewtrweqrqwerqwrqzxczxcasxcasxweqw>
    <sadasdasdasdasdsa codeValue="0" codeScheme="asdasdasdasdasdasd" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </rbg_allergiefarmacologiche>
    <xczcxzcxzczxczxcz codeValue="0" codeScheme="asdasdasdasdasdasd" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </xczcxzcxzczxczxcz>
    <asdasfascasasxasx codeValue="0" codeScheme="asdasdasdas" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </asdasfascasasxasx>
    <asdasxcasxasxasxzxxz>
        <![CDATA[false]]>
    </asdasxcasxasxasxzxxz>
    <asxasxasxsaxasx xsi:nil="true"></asxasxasxsaxasx>
    <saxasx>
        <![CDATA[false]]>
    </saxasx>
    <asdasxasxasxas xsi:nil="true"></asdasxasxasxas>
    <asasdasdasdas>
        <![CDATA[false]]>
    </asasdasdasdas>
    <asasdasdasdasasasasd xsi:nil="true"></asasdasdasdasasasasd>
    <asasdasdasasd>
        <![CDATA[false]]>
    </asasdasdasasd>
    <zcxzcxzc xsi:nil="true"></zcxzcxzc>
</modulo>'

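I tried looping over every row with a for loop, converting the XML to a dictionary, and then saving that as columns. The problem with this approach is that each row contains different XML elements, and a different number of them, so my solution does not work.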

df["XML_column"]
0        <?xml version="1.0" encoding="UTF-8"?><modulo ...
1        <?xml version="1.0" encoding="UTF-8"?><modulo ...
2        <?xml version="1.0" encoding="UTF-8"?><modulo ...
3        <?xml version="1.0" encoding="UTF-8"?><modulo ...
4        <?xml version="1.0" encoding="UTF-8"?><modulo ...
                               ...                        
11140    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11141    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11142    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11143    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11144    <?xml version="1.0" encoding="UTF-8"?><modulo ...

Welcome. Your XML seems to be a bit wonky. If I take a clean fragment, for example this one:

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<modulo
    xmlns="http://www.sadasdasdasdasd.it/12312312312/Fasdasdasda"
    xmlns:xsi="http://www.sss1231231233.org/200232321/XMLSchema-instance">
    <nomeTxt dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="Nome" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasdasdqw">
        <![CDATA[*****]]>
    </nomeTxt>
    <adasdasdasdaq2qwdwasxasxas dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="asdasdasdqweqwe" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasd2szszxc">
        <![CDATA[*****]]>
    </adasdasdasdaq2qwdwasxasxas>
    <qweweqweqweqweqweqwe dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="sdsdsds" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasd">
        <![CDATA[M]]>
    </qweweqweqweqweqweqwe>
    <qewtrweqrqwerqwrqweqw dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="qewtrweqrqwerqwrqweqw" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdas">
        <![CDATA[213123123123]]>
    </qewtrweqrqwerqwrqweqw>
    <qewtrweqrqwerqwrqzxczxcasxcasxweqw dataFill="ew.fill(\'date\')" dataFillMode="auto" modelCodeMeaning="Data di nascita" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasfafassadasdasdasdas">
        <![CDATA[1927-21-13]]>
    </qewtrweqrqwerqwrqzxczxcasxcasxweqw>
</modulo>'''

I can do the following (only the first six columns are shown here):

pd.read_xml(xml,parser='etree')
   dataFill         dataFillMode  modelCodeMeaning       modelCodeScheme  modelCodeSchemeVersion  modelCodeValue
0  ew.fill()        auto          Nome                   asdasdasdas      1                       asdasdasdasdasdasdqw
1  ew.fill()        auto          asdasdasdqweqwe        asdasdasdas      1                       asdasdasdasdasd2szszxc
2  ew.fill()        auto          sdsdsds                asdasdasdas      1                       asdasdasd
3  ew.fill()        auto          qewtrweqrqwerqwrqweqw  asdasdasdas      1                       asdasdasdas
4  ew.fill('date')  auto          Data di nascita        asdasdasdas      1                       asdasfafassadasdasdasdas
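To apply the same idea to every row of the column, where each row can carry a different set of elements, one option is to parse each value separately and let pandas align the resulting keys. The following is only a minimal sketch under a few assumptions: it uses the standard-library `xml.etree.ElementTree` instead of `pd.read_xml`, it keeps only the text of each direct child of the root (attributes are ignored), and rows that fail to parse simply produce empty values. The helper names `xml_to_record` and `expand_xml_column`, as well as the small demo frame, are hypothetical and not part of the original post.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def xml_to_record(xml_text):
    """Return {local tag name: stripped text} for each direct child of the root."""
    if not isinstance(xml_text, str):
        return {}  # e.g. NaN in the column
    try:
        # Strip surrounding whitespace so the XML declaration is at the very start.
        root = ET.fromstring(xml_text.strip().encode("utf-8"))
    except ET.ParseError:
        # Malformed rows (for example mismatched tags) yield an empty record.
        return {}
    record = {}
    for child in root:
        tag = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix, if any
        text = (child.text or "").strip()
        record[tag] = text or None
    return record


def expand_xml_column(df, column):
    """Replace `column` with one new column per element name found in any row."""
    records = [xml_to_record(value) for value in df[column]]
    # A list of dicts with differing keys becomes a frame with NaN where a key is absent.
    expanded = pd.DataFrame(records, index=df.index)
    return df.drop(columns=[column]).join(expanded)


# Hypothetical demo data: two well-formed rows with different elements, one broken row.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "XML_column": [
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<modulo xmlns="http://example.invalid/ns">'
        '<nomeTxt><![CDATA[A]]></nomeTxt><sesso><![CDATA[M]]></sesso></modulo>',
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<modulo xmlns="http://example.invalid/ns">'
        '<nomeTxt><![CDATA[B]]></nomeTxt></modulo>',
        '<modulo><a>1</b></modulo>',  # mismatched closing tag
    ],
})

result = expand_xml_column(df, "XML_column")
print(result)
```

Because the rows are turned into plain dictionaries first, the differing number of elements per row is not a problem: any element missing from a row just shows up as a missing value in that column. If the attributes are needed as well, the loop in `xml_to_record` could be extended to read `child.attrib`. Note that the join raises an error if an element name collides with an existing column name.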
