
Python lxml iterparse: sort by attribute in a large XML file

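I have a large XML file and I am trying to order the icons for each programme. I want to sort the icons in descending order by the value of their width attribute. I have managed to remove certain icons that I don't need, but I am not sure how to order the remaining ones. Any help would be appreciated.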

This is the code I use to remove the icons I don't want, but I am not sure how to order them. I am using iterparse because reading the whole file takes too much memory.

Current removal code:

import lxml.etree as ET
xml_source = 'ss_sky_sw_xmltv.xml'
xml_output = 'ss_sky_sw_xmltv_parsed.xml'

context = ET.iterparse(xml_source, encoding='iso-8859-1', tag='icon')
for event, elem in context:
    if elem.getparent().tag != 'channel' :
        if elem.tag == 'icon':
            if elem.attrib['width'] == '180' and elem.attrib['height'] == '135':
                elem.getparent().remove(elem)
            elif elem.attrib['width'] == '120' and elem.attrib['height'] == '180':
                elem.getparent().remove(elem)
ET.ElementTree(context.root).write(xml_output, xml_declaration=True)

XML file:

<tv source-info-name="Schedules Direct" generator-info-name="mc2xml" generator-info-url="mailto:mc2xml@gmail.com">
    <channel id="I963.24337.schedulesdirect.org">
        <display-name>963 BBC1SE</display-name>
        <display-name>963</display-name>
        <display-name>BBC1SE</display-name>
        <display-name>BBC One South East</display-name>
        <display-name>BBC1</display-name>
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/stationLogos/s24337_h3_aa.png" width="360" height="270" />
    </channel>
    <channel id="I964.24326.schedulesdirect.org">
        <display-name>964 BBC1STH</display-name>
        <display-name>964</display-name>
        <display-name>BBC1STH</display-name>
        <display-name>BBC One South</display-name>
        <display-name>BBC1</display-name>
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/stationLogos/s24326_h3_aa.png" width="360" height="270" />
    </channel>
    <programme start="20191007150000 +0100" stop="20191007154500 +0100" channel="I101.24327.schedulesdirect.org">
        <title lang="en">Escape to the Perfect Town</title>
        <sub-title lang="en">Knaresborough, North Yorkshire</sub-title>
        <desc lang="en">Steve Brown helps a couple feeling the pinch of the London property market to decide on their perfect town and the right property in which to raise their young children. They're amazed by what their £280,000 budget can buy them out of the capital city, and that moving to a desirable town means buzzing high streets, great community spirit and green spaces, as well as a quick commute to York for a teaching job is all on their doorstep.</desc>
        <credits>
            <producer>John Comerford</producer>
            <producer>Eleanor Brocklehurst</producer>
        </credits>
        <date>20191007</date>
        <category lang="en">House/garden</category>
        <category lang="en">Home improvement</category>
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_v6_aa.jpg" width="120" height="180" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_v2_aa.jpg" width="135" height="180" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_h5_aa.jpg" width="180" height="135" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_h14_aa.jpg" width="240" height="135" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_v5_aa.jpg" width="240" height="360" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_v3_aa.jpg" width="270" height="360" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_h3_aa.jpg" width="360" height="270" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_h13_aa.jpg" width="480" height="270" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_v7_aa.jpg" width="480" height="720" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_v4_aa.jpg" width="540" height="720" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_h6_aa.jpg" width="720" height="540" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_h12_aa.jpg" width="960" height="540" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_h11_aa.jpg" width="1280" height="720" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_v8_aa.jpg" width="960" height="1440" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_v9_aa.jpg" width="1080" height="1440" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_h9_aa.jpg" width="1440" height="1080" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p17421608_st_h10_aa.jpg" width="1920" height="1080" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p17421608_st_h2_aa.jpg" width="2048" height="1024" />
        <episode-num system="dd_progid">EP03325404.0001</episode-num>
        <episode-num system="xmltv_ns">0.0.</episode-num>
        <new />
    </programme>
    <programme start="20191007154500 +0100" stop="20191007163000 +0100" channel="I101.24327.schedulesdirect.org">
        <title lang="en">Make Me a Dealer</title>
        <sub-title lang="en">Liverpool: Sarah &amp; Marika</sub-title>
        <desc lang="en">Paul Martin teaches two antiques lovers the tricks of the trade and turns them into successful antiques dealers. In Liverpool, hairdresser Sarah faces off against civil servant Marika.</desc>
        <credits>
            <director>Gabe Crozier</director>
            <director>Dan Donnelly</director>
            <producer>Paul Tucker</producer>
            <producer>Carole Lochhead</producer>
            <producer>Jo Dunscombe</producer>
            <presenter>Paul Martin</presenter>
        </credits>
        <date>20191007</date>
        <category lang="en">How-to</category>
        <category lang="en">Collectibles</category>
        <category lang="en">Art</category>
        <category lang="en">Arts/crafts</category>
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_v6_aa.jpg" width="120" height="180" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_v2_aa.jpg" width="135" height="180" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_h5_aa.jpg" width="180" height="135" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_h14_aa.jpg" width="240" height="135" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_v5_aa.jpg" width="240" height="360" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_v3_aa.jpg" width="270" height="360" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_h3_aa.jpg" width="360" height="270" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_h13_aa.jpg" width="480" height="270" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_v7_aa.jpg" width="480" height="720" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_v4_aa.jpg" width="540" height="720" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_h6_aa.jpg" width="720" height="540" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_h12_aa.jpg" width="960" height="540" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_h11_aa.jpg" width="1280" height="720" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_v8_aa.jpg" width="960" height="1440" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_v9_aa.jpg" width="1080" height="1440" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_h9_aa.jpg" width="1440" height="1080" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_h10_aa.jpg" width="1920" height="1080" />
        <icon src="https://s3.amazonaws.com/schedulesdirect/assets/p16172084_b_v12_aa.jpg" width="1920" height="2880" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_v13_aa.jpg" width="2160" height="2880" />
        <icon src="https://json.schedulesdirect.org/20141201/image/assets/p16172084_b_s4_aa.jpg" width="3000" height="3000" />
        <episode-num system="dd_progid">EP03082486.0021</episode-num>
        <episode-num system="xmltv_ns">1.0.</episode-num>
        <new />
    </programme>
</tv>

Answer:

import lxml.etree as ET
from copy import deepcopy

xml_source = 'ss_sky_sw_xmltv.xml'
xml_output = 'ss_sky_sw_xmltv_parsed.xml'
# Icons with these dimensions (width, height) will be removed:
remove_dimensions = (
    (180, 135),
    (120, 180),
    )

tree = ET.parse(xml_source)
root = tree.getroot()
for programme in root.iterfind('programme'):
    # Copy all icons, sorted by width in descending order as the question asks,
    # so they can be reinserted in the right order
    icons = deepcopy(sorted(programme.findall('icon'),
                            key=lambda x: int(x.attrib['width']), reverse=True))
    # Remove all icons from the programme
    for old_icon in programme.findall('icon'):
        programme.remove(old_icon)

    # Reinsert the icons (note: append() places them at the end of the programme)
    for new_icon in icons:
        # Build a (width, height) tuple to compare
        dimensions = int(new_icon.attrib['width']), int(new_icon.attrib['height'])
        # Skip icons whose dimensions are on the removal list
        if dimensions not in remove_dimensions:
            programme.append(new_icon)

# Save the file
tree.write(xml_output, xml_declaration=True, pretty_print=True)

