How can I use Python to extract node information conditional on information in a sibling node?

Question

I have a list of personIds of interest:

agents = {'id': ['20','32','12']}

Then I have an XML file with household characteristics:

<households>
    <household id="980921">
        <members>
            <personId refId="5"/>
            <personId refId="15"/>
            <personId refId="20"/>
        </members>
        <income currency="CHF" period="month">
                8000.0
        </income>
        <attributes>
            <attribute name="numberOfCars" class="java.lang.String" >2</attribute>
        </attributes>

    </household>
    <household id="980976">
        <members>
            <personId refId="2891"/>
            <personId refId="100"/>
            <personId refId="2044"/>
        </members>
        <income currency="CHF" period="month">
                8000.0
        </income>
        <attributes>
            <attribute name="numberOfCars" class="java.lang.String" >1</attribute>
        </attributes>

    </household>
    <household id="980983">
        <members>
            <personId refId="11110"/>
            <personId refId="32"/>
            <personId refId="34"/>
        </members>
        <income currency="CHF" period="month">
                10000.0
        </income>
        <attributes>
            <attribute name="numberOfCars" class="java.lang.String" >0</attribute>
        </attributes>

    </household>
</households>

What I want is a data frame showing the income of each household that contains a member from the list of agents of interest. Something like this (a plus would be an additional column giving the number of members in each such household):

personId    income
20          8000.0
32          10000.0

My approach did not get very far. I am having trouble filtering for the members and then accessing information from a "sibling" node. My output is an empty data frame.

import xml.etree.ElementTree as ET
import pandas as pd

with open(xml) as fd:
    root = ET.parse(fd).getroot()

xpath_fmt = 'household/members/personId[@refId="{}"]/income'
rows = []
for pid in agents['id']:
    xpath = xpath_fmt.format(pid)
    r = root.findall(xpath)
    for res in r:
        rows.append([pid, res.text])
d = pd.DataFrame(rows, columns=['personId', 'income'])

Thanks a lot for your help!

Answer 1

As stated in the comments, here is a solution using BeautifulSoup (xml_txt is your XML text from the question):

import pandas as pd
from bs4 import BeautifulSoup

agents = {'id': ['20','32','12']}

soup = BeautifulSoup(xml_txt, 'xml')  #xml_txt is your XML text from the question

css_selector = ','.join('household > members > personId[refId="{}"]'.format(i) for i in agents['id'])

data = {'personId':[], 'income':[]}
for person in soup.select(css_selector):
    data['personId'].append( person['refId'] )
    data['income'].append( person.find_parent('household').find('income').get_text(strip=True) )

df = pd.DataFrame(data)
print(df)

Prints:

  personId   income
0       20   8000.0
1       32  10000.0
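
For comparison, here is a minimal sketch that stays with the standard-library xml.etree.ElementTree used in the question. The XPath in the question returns nothing because it looks for income as a child of personId, while income is actually a child of household (a sibling of members). The sketch loops over the households instead, reads both pieces from the same household element, and also records the member count asked for as a plus. It assumes every household has a members block and an income element; the shortened xml_txt and the memberCount key are illustrative names, not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumed to have the same structure).
xml_txt = """<households>
    <household id="980921">
        <members>
            <personId refId="5"/>
            <personId refId="15"/>
            <personId refId="20"/>
        </members>
        <income currency="CHF" period="month">
            8000.0
        </income>
    </household>
    <household id="980983">
        <members>
            <personId refId="11110"/>
            <personId refId="32"/>
            <personId refId="34"/>
        </members>
        <income currency="CHF" period="month">
            10000.0
        </income>
    </household>
</households>"""

agents = {'id': ['20', '32', '12']}
wanted = set(agents['id'])  # set for fast membership checks

root = ET.fromstring(xml_txt)

rows = []
for household in root.findall('household'):
    # All member ids of this household
    members = [p.get('refId') for p in household.findall('members/personId')]
    # income is a direct child of household, so read it from there
    income = float(household.findtext('income').strip())
    for pid in members:
        if pid in wanted:
            rows.append({'personId': pid,
                         'income': income,
                         'memberCount': len(members)})

print(rows)
```

With pandas installed, pd.DataFrame(rows) should turn this list into the requested data frame, including the extra member-count column. Agent '12' does not appear in any household, so it simply produces no row.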
