
How do I extract/parse dictionary elements using Python?

I would like to extract the 00s from the decade attribute, but none of my attempts produce the intended results.

Here is part of what my XML file looks like, saved as gorillaz_catalog.xml.

<CATALOG>
    <CD decade="00s">
        <TITLE>Gorillaz</TITLE>
        <ARTIST>Gorillaz</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>Virgin</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>2001</YEAR>
    </CD>
    <CD decade="00s">
        <TITLE>Demon Days</TITLE>
        <ARTIST>Gorillaz</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>Parlaphone</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1988</YEAR>
    </CD>
    <!-- more CD records -->
</CATALOG>

My intended results are something like this:

Artist: Gorillaz, Album: Gorillaz, Decade: 00s
Artist: Gorillaz, Album: Demon Days, Decade: 00s

And so on through the rest of my XML file.

I tested each part separately and got as far as the code below:

import xml.etree.ElementTree as ET

tree = ET.parse("gorillaz_catalog.xml")
root = tree.getroot()

for ARTIST in root.iter("ARTIST"):
    print("Artist:", ARTIST.text)

for TITLE in root.iter("TITLE"):
    print("Title:", TITLE.text)

for decade in root.iter("CD"):
    print("Decade:", decade.attrib)

For the decade I get Decade: {'decade': '00s'}, when I just want 00s.
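Element.attrib is a plain dict mapping attribute names to values, which is why printing it shows {'decade': '00s'}. Indexing that dict, or calling Element.get(), returns just the string. A minimal sketch, using a hypothetical one-record inline string in place of the real file:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-record sample; the real data lives in gorillaz_catalog.xml.
xml_text = '<CATALOG><CD decade="00s"><TITLE>Gorillaz</TITLE></CD></CATALOG>'
root = ET.fromstring(xml_text)

for cd in root.iter("CD"):
    print(cd.attrib)            # the whole attribute dict: {'decade': '00s'}
    print(cd.attrib["decade"])  # just the value; raises KeyError if missing
    print(cd.get("decade"))     # just the value; returns None if missing
```

Element.get() also accepts a default, e.g. cd.get("decade", "unknown"), which is handy when some records lack the attribute.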

Then I tried nesting the loops to get my intended results (after commenting out the three for loops above).

for ARTIST in root.iter("ARTIST"):
    for TITLE in root.iter("TITLE"):
        for decade in root.iter("CD"):
            print("Artist:", ARTIST.text,", Title:", TITLE.text, ", Decade:", decade.attrib)

The results I got loop through 20 times too many:

Artist: Gorillaz , Title: Gorillaz , Decade: {'decade': '00s'}

twenty times (that's the number of records in the file), then

Artist: Gorillaz , Title: Demon Days , Decade: {'decade': '80s'}

twenty times...

So this gives me the lines I want, but I only need each one once, not 20 times.
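Each nested for loop restarts from the beginning for every item of the loop around it, so with n CD records the three nested root.iter() loops produce n × n × n lines, pairing every artist with every title and every decade. The sketch below counts the lines for a hypothetical two-record catalog and compares that with a single loop over the CD elements, where each field is read from the same record:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record sample mirroring the structure of gorillaz_catalog.xml.
xml_text = """<CATALOG>
  <CD decade="00s"><TITLE>Gorillaz</TITLE><ARTIST>Gorillaz</ARTIST></CD>
  <CD decade="00s"><TITLE>Demon Days</TITLE><ARTIST>Gorillaz</ARTIST></CD>
</CATALOG>"""
root = ET.fromstring(xml_text)

# Nested loops: every ARTIST combined with every TITLE and every CD.
nested_lines = [
    (artist.text, title.text, cd.get("decade"))
    for artist in root.iter("ARTIST")
    for title in root.iter("TITLE")
    for cd in root.iter("CD")
]

# One loop over the records: the children are read from the same CD element.
per_record_lines = [
    (cd.findtext("ARTIST"), cd.findtext("TITLE"), cd.get("decade"))
    for cd in root.iter("CD")
]

print(len(nested_lines))      # 8 (2 * 2 * 2)
print(len(per_record_lines))  # 2
```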

  1. Clearly my nested loop is incorrect, so how do I get it to produce my intended lines? I'm thinking I might need to put the items in a list of dictionaries, but I'm not too familiar with how to do that.
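A list of dictionaries is not strictly needed to fix the repetition, but if you want one, a minimal sketch is to build one dict per CD element and print from that list. The helper name load_catalog is hypothetical, and an in-memory io.StringIO sample stands in for gorillaz_catalog.xml so the block runs on its own:

```python
import io
import xml.etree.ElementTree as ET


def load_catalog(source):
    """Return a list with one dict per <CD> element.

    `source` may be a file path (e.g. "gorillaz_catalog.xml") or a file object.
    load_catalog is a hypothetical helper, not part of ElementTree.
    """
    root = ET.parse(source).getroot()
    return [
        {
            "artist": cd.findtext("ARTIST"),  # child text, or None if absent
            "title": cd.findtext("TITLE"),
            "decade": cd.get("decade"),       # attribute value only, not the dict
        }
        for cd in root.iter("CD")
    ]


# Hypothetical in-memory sample standing in for the real file.
sample = io.StringIO(
    '<CATALOG>'
    '<CD decade="00s"><TITLE>Gorillaz</TITLE><ARTIST>Gorillaz</ARTIST></CD>'
    '<CD decade="00s"><TITLE>Demon Days</TITLE><ARTIST>Gorillaz</ARTIST></CD>'
    '</CATALOG>'
)
records = load_catalog(sample)

for record in records:
    print(f"Artist: {record['artist']}, Album: {record['title']}, "
          f"Decade: {record['decade']}")
```

With the real file, calling load_catalog("gorillaz_catalog.xml") should give the same kind of list, one dict per record.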

I think you made it a little too complicated; try it with another library (lxml) plus XPath:

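import lxml.html as lh

# Paste the CATALOG XML from the question between the triple quotes.
cds = """[your html above]"""

# lxml's HTML parser lowercases tag names, so <CD> is matched by '//cd'.
doc = lh.fromstring(cds)
for cd in doc.xpath('//cd'):
    decade = cd.xpath('./@decade')[0]        # attribute value as a string
    title = cd.xpath('./title/text()')[0]    # text of the <TITLE> child
    artist = cd.xpath('./artist/text()')[0]  # text of the <ARTIST> child
    print("Title: " + title + ", Artist: " + artist + ", Decade: " + decade)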

Output:

Title: Gorillaz, Artist: Gorillaz, Decade: 00s
Title: Demon Days, Artist: Gorillaz, Decade: 00s
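If lxml is not installed, the standard-library ElementTree supports a limited XPath subset that includes attribute-equality predicates such as [@decade='00s'], which also covers the original goal of pulling out only the 00s records. A sketch, assuming the same CATALOG/CD structure; the second record is a hypothetical placeholder added only to show the filter excluding something:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; with the real file use ET.parse("gorillaz_catalog.xml").getroot().
xml_text = """<CATALOG>
  <CD decade="00s"><TITLE>Demon Days</TITLE><ARTIST>Gorillaz</ARTIST></CD>
  <CD decade="80s"><TITLE>Example Title</TITLE><ARTIST>Example Artist</ARTIST></CD>
</CATALOG>"""
root = ET.fromstring(xml_text)

# ".//CD[@decade='00s']" selects every CD element, at any depth,
# whose decade attribute equals "00s".
noughties = root.findall(".//CD[@decade='00s']")

for cd in noughties:
    print(f"Title: {cd.findtext('TITLE')}, Artist: {cd.findtext('ARTIST')}, "
          f"Decade: {cd.get('decade')}")
```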

Here is my final code, written after reviewing a bit more documentation. Thank you all for the advice.

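import xml.etree.ElementTree as ET

tree = ET.parse("gorillaz_catalog.xml")
root = tree.getroot()

# Visit each <CD> record once and read every field from that same element.
for item in root.iterfind("CD"):
    artist = item.findtext("ARTIST")  # text of the <ARTIST> child
    title = item.findtext("TITLE")    # text of the <TITLE> child
    decade = item.get("decade")       # value of the decade attribute, e.g. "00s"
    print(f"Artist: {artist}, Album: {title}, Decade: {decade}")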

Output:

Artist: Gorillaz, Album: Gorillaz, Decade: 00s
Artist: Gorillaz, Album: Demon Days, Decade: 00s

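Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.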

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM