
XML parsing in Python 3 with Beautiful Soup, part 2

This is the second part of my previous question (Parsing xml file using Python3 and BeautifulSoup).

I'm wondering how to parse the following lists, given their different XML structures. I also need to tell the different lists (or 'poll titles') apart within the single XML file. I can search for the 'results' element, but that element appears in 3 separate lists in the file.

The first poll title's XML list uses the code below to extract the data. The 'numplayers=True' argument distinguishes this list from the other two, but the 'results' lines of those two have no attribute to filter on.

for result in soup.find_all('results', numplayers = True):
    numplayers = result['numplayers']
    best = result.find('result', {'value': 'Best'})['numvotes']
    recommended = result.find('result', {'value': 'Recommended'})['numvotes']
    not_recommended = result.find('result', {'value': 'Not Recommended'})['numvotes']
    print (numplayers, best, recommended, not_recommended)

I can't figure out how to write something similar to this code for the following two XML lists. Thank you.

<poll title="Language Dependence" name="language_dependence" totalvotes="32">
    <results>
        <result value="No necessary in-game text" numvotes="32" level="1"/>
        <result value="Some necessary text - easily memorized or small crib sheet" numvotes="0" level="2"/>
        <result value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0" level="3"/>
        <result value="Extensive use of text - massive conversion needed to be playable" numvotes="0" level="4"/>
        <result value="Unplayable in another language" numvotes="0" level="5"/>
    </results>
</poll>
<poll title="User Suggested Player Age" name="suggested_playerage" totalvotes="32">
    <results>
        <result value="2" numvotes="0"/>
        <result value="3" numvotes="0"/>
        <result value="4" numvotes="0"/>
        <result value="5" numvotes="1"/>
        <result value="6" numvotes="6"/>
        <result value="8" numvotes="15"/>
        <result value="10" numvotes="10"/>
        <result value="12" numvotes="0"/>
        <result value="14" numvotes="0"/>
        <result value="16" numvotes="0"/>
        <result value="18" numvotes="0"/>
        <result value="21 and up" numvotes="0"/>
    </results>
</poll>

Here's what I thought should work for the language dependence list, but it doesn't.

for result in soup.find_all('result',level=True):
    level = result['level']
    None = result.find('result', {'level': '1'})['numvotes']
    Some = result.find('result', {'level': '2'})['numvotes']
    Mod = result.find('result', {'level': '3'})['numvotes']
    Ext = result.find('result', {'level': '4'})['numvotes']
    Unp = result.find('result', {'level': '5'})['numvotes']

You have to use two different conditions, one per poll; see the code below.

from bs4 import BeautifulSoup
xml = """<poll title="Language Dependence" name="language_dependence" totalvotes="32">
    <results>
        <result value="No necessary in-game text" numvotes="32" level="1"/>
        <result value="Some necessary text - easily memorized or small crib sheet" numvotes="0" level="2"/>
        <result value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0" level="3"/>
        <result value="Extensive use of text - massive conversion needed to be playable" numvotes="0" level="4"/>
        <result value="Unplayable in another language" numvotes="0" level="5"/>
    </results>
</poll>
<poll title="User Suggested Player Age" name="suggested_playerage" totalvotes="32">
    <results>
        <result value="2" numvotes="0"/>
        <result value="3" numvotes="0"/>
        <result value="4" numvotes="0"/>
        <result value="5" numvotes="1"/>
        <result value="6" numvotes="6"/>
        <result value="8" numvotes="15"/>
        <result value="10" numvotes="10"/>
        <result value="12" numvotes="0"/>
        <result value="14" numvotes="0"/>
        <result value="16" numvotes="0"/>
        <result value="18" numvotes="0"/>
        <result value="21 and up" numvotes="0"/>
    </results>
</poll>"""
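# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(xml, 'lxml')
# First condition: pick the poll by its name attribute, then walk its results.
for i in soup.find('poll', {'name': 'language_dependence'}).find_all('result'):
    value = i['value']
    numvotes = i['numvotes']
    level = i['level']  # only this poll's results carry a level attribute
    print('Value:',value,'\n','Numvotes:',numvotes,'\n','Level:',level)
print('--------------------------------------------')
# Second condition: the same lookup for the player-age poll, which has no level.
for i in soup.find('poll', {'name': 'suggested_playerage'}).find_all('result'):
    value = i['value']
    numvotes = i['numvotes']
    print('Value:',value,'\n','Numvotes:',numvotes)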

Output

Value: No necessary in-game text
 Numvotes: 32
 Level: 1
Value: Some necessary text - easily memorized or small crib sheet
 Numvotes: 0
 Level: 2
Value: Moderate in-game text - needs crib sheet or paste ups
 Numvotes: 0
 Level: 3
Value: Extensive use of text - massive conversion needed to be playable
 Numvotes: 0
 Level: 4
Value: Unplayable in another language
 Numvotes: 0
 Level: 5
--------------------------------------------
Value: 2
 Numvotes: 0
Value: 3
 Numvotes: 0
Value: 4
 Numvotes: 0
Value: 5
 Numvotes: 1
Value: 6
 Numvotes: 6
Value: 8
 Numvotes: 15
Value: 10
 Numvotes: 10
Value: 12
 Numvotes: 0
Value: 14
 Numvotes: 0
Value: 16
 Numvotes: 0
Value: 18
 Numvotes: 0
Value: 21 and up
 Numvotes: 0
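
For reference, the attempt in the question fails for two reasons: `None` is a reserved keyword and cannot be assigned to, and calling `find('result', ...)` on a `<result>` element searches its children, of which it has none, so the lookup returns `None` and the `['numvotes']` index raises a `TypeError`. Selecting the parent `<poll>` by its `name` attribute first avoids both problems. The sketch below generalizes that idea into a dictionary keyed by poll name. The helper `poll_votes` is a hypothetical name introduced for illustration, the XML is a shortened copy of the sample above, and `'html.parser'` is assumed here only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample XML from the question.
xml = """<poll title="Language Dependence" name="language_dependence" totalvotes="32">
    <results>
        <result value="No necessary in-game text" numvotes="32" level="1"/>
        <result value="Unplayable in another language" numvotes="0" level="5"/>
    </results>
</poll>
<poll title="User Suggested Player Age" name="suggested_playerage" totalvotes="32">
    <results>
        <result value="8" numvotes="15"/>
        <result value="10" numvotes="10"/>
    </results>
</poll>"""

# Assumption: 'html.parser' (bundled with Python) instead of 'lxml'.
soup = BeautifulSoup(xml, 'html.parser')


def poll_votes(soup, poll_name):
    """Hypothetical helper: map each result's value to its vote count."""
    # Find the poll by its name attribute, as in the answer above.
    poll = soup.find('poll', {'name': poll_name})
    if poll is None:
        return {}
    # Attribute values are strings, so convert the counts to int.
    return {r['value']: int(r['numvotes']) for r in poll.find_all('result')}


# One entry per <poll>, keyed by its name attribute.
votes_by_poll = {p['name']: poll_votes(soup, p['name'])
                 for p in soup.find_all('poll')}

print(votes_by_poll['suggested_playerage'])
```

Because every poll is looked up through its `name` attribute, the same helper handles polls with or without a `level` attribute, and an unknown poll name yields an empty dictionary rather than an exception.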
