
How to extract node information conditional to the information of a sibling node using python?

I have a list of personId values of interest:

agents = {'id': ['20','32','12']}

Then I have an XML file with household characteristics:

<households>
    <household id="980921">
        <members>
            <personId refId="5"/>
            <personId refId="15"/>
            <personId refId="20"/>
        </members>
        <income currency="CHF" period="month">
                8000.0
        </income>
        <attributes>
            <attribute name="numberOfCars" class="java.lang.String" >2</attribute>
        </attributes>

    </household>
    <household id="980976">
        <members>
            <personId refId="2891"/>
            <personId refId="100"/>
            <personId refId="2044"/>
        </members>
        <income currency="CHF" period="month">
                8000.0
        </income>
        <attributes>
            <attribute name="numberOfCars" class="java.lang.String" >1</attribute>
        </attributes>

    </household>
    <household id="980983">
        <members>
            <personId refId="11110"/>
            <personId refId="32"/>
            <personId refId="34"/>
        </members>
        <income currency="CHF" period="month">
                10000.0
        </income>
        <attributes>
            <attribute name="numberOfCars" class="java.lang.String" >0</attribute>
        </attributes>

    </household>
</households>

What I want is a data frame showing the income of every household that has a member on my list of agents of interest. Something like this (a plus would be an additional column with the number of members in each matching household):

personId    income
20          8000.0
32          10000.0

My approach did not get very far. I am having trouble filtering on the members and then accessing information from a "sibling" node. My output is an empty data frame.

import xml.etree.ElementTree as ET
import pandas as pd

with open(xml) as fd:
    root = ET.parse(fd).getroot()

xpath_fmt = 'household/members/personId[@refId="{}"]/income'
rows = []
for pid in agents['id']:
    xpath = xpath_fmt.format(pid)
    r = root.findall(xpath)
    for res in r:
        rows.append([pid, res.text])
d = pd.DataFrame(rows, columns=['personId', 'income']) 

Thanks a lot for your help!

As stated in the comments, here is a solution using BeautifulSoup (xml_txt is your XML text from the question):

import pandas as pd
from bs4 import BeautifulSoup

agents = {'id': ['20','32','12']}

soup = BeautifulSoup(xml_txt, 'xml')  #xml_txt is your XML text from the question

css_selector = ','.join('household > members > personId[refId="{}"]'.format(i) for i in agents['id'])

data = {'personId':[], 'income':[]}
for person in soup.select(css_selector):
    data['personId'].append( person['refId'] )
    data['income'].append( person.find_parent('household').find('income').get_text(strip=True) )

df = pd.DataFrame(data)
print(df)

Prints:

  personId   income
0       20   8000.0
1       32  10000.0
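
For comparison, here is a minimal standard-library sketch of the same idea, which also covers the optional member count. It is not part of the answer above. The attempt in the question returns nothing because its XPath looks for income underneath personId, while income is actually a child of household (a sibling of members). Iterating over each household and reading both children from it avoids that. The xml_txt below is a shortened copy of the XML from the question with the root tag closed, and the names wanted, rows and memberCount are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: root tag is closed).
xml_txt = """
<households>
    <household id="980921">
        <members>
            <personId refId="5"/>
            <personId refId="15"/>
            <personId refId="20"/>
        </members>
        <income currency="CHF" period="month">
                8000.0
        </income>
    </household>
    <household id="980983">
        <members>
            <personId refId="11110"/>
            <personId refId="32"/>
            <personId refId="34"/>
        </members>
        <income currency="CHF" period="month">
                10000.0
        </income>
    </household>
</households>
"""

agents = {'id': ['20', '32', '12']}
wanted = set(agents['id'])  # set lookup instead of scanning the list

root = ET.fromstring(xml_txt)

rows = []
# Visit each household once, so members and income come from the same parent.
for household in root.findall('household'):
    member_ids = [p.get('refId') for p in household.findall('members/personId')]
    # The income text is padded with whitespace and newlines in the file.
    income = float(household.findtext('income').strip())
    for pid in member_ids:
        if pid in wanted:
            rows.append({
                'personId': pid,
                'income': income,
                'memberCount': len(member_ids),  # hypothetical column name
            })

for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) should then give the data frame with the extra count column. Agent '12' does not appear because no household in the sample lists that id.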
