I can't seem to delete keys of a dictionary whose value contains a certain string, why?

I have this code that takes an XML file, takes the child elements (the text tags) of each new_line tag, and stores their index as a key in a dictionary and the elements as the values in the same dictionary. I want to delete the keys of the dictionary whose values contain "10.238", but it doesn't seem to work. Everything else works. This is my code:

import re
from xml.dom import minidom
from xml.etree import ElementTree as ET


def filter_values_by_keyword(my_dict, filter_by):
    """
    Return a list of values which contains `filter_by` keyword.

    Arguments:
        my_dict (dict): Dict containing (...data specifics here)
        filter_by (str): Keyword to look for in values of my_dict

    Return:
        List of filtered values
    """
    return [key for key, value in my_dict.items() if filter_by in value]


def get_xml_by_tag_names(xml_path, tag_name_1, tag_name_2):
    """
    Your docstring here.
    """
    data = {}
    xml_tree = minidom.parse(xml_path)
    item_group_nodes = xml_tree.getElementsByTagName(tag_name_1)
    for idx, item_group_node in enumerate(item_group_nodes):
        cl_compile_nodes = item_group_node.getElementsByTagName(tag_name_2)
        for _ in cl_compile_nodes:
            data[idx]=[item_group_node.toxml()]
    return data


def main():
    data = get_xml_by_tag_names('output2.xml', 'new_line', 'text')
    filtered_values = filter_values_by_keyword(data, '10.238')

    for item in filtered_values:
        del data[item]

    for value in data.values():
        myxml = ' '.join(value)
        # print(myxml)

        tree = ET.fromstring(myxml)
        lista = ([text.text for text in tree.findall('text')])
        testo = (' '.join(lista))

        print(testo)


if __name__ == "__main__":
    main()

And this is a sample of the XML:

    <pages>
      <page id="1" bbox="0.000,0.000,462.047,680.315" rotate="0">
        <textbox id="0" bbox="191.745,592.218,249.042,603.578">
          <textline>
            <new_line>
              <text font="NUMPTY+ImprintMTnum" bbox="297.284,540.828,300.188,553.310" colourspace="DeviceGray" ncolour="0" size="12.482">della quale non conosce che una parte;] </text>
              <text font="PYNIYO+ImprintMTnum-Italic" bbox="322.455,540.839,328.251,553.566" colourspace="DeviceGray" ncolour="0" size="12.727">prima</text>
              <text font="NUMPTY+ImprintMTnum" bbox="331.206,545.345,334.683,552.834" colourspace="DeviceGray" ncolour="0" size="7.489">1</text>
              <text font="NUMPTY+ImprintMTnum" bbox="177.602,528.028,180.850,540.510" colourspace="DeviceGray" ncolour="0" size="12.482">che nonconosce ancora appieno;</text>
              <text font="NUMPTY+ImprintMTnum" bbox="189.430,532.545,192.908,540.034" colourspace="DeviceGray" ncolour="0" size="7.489">2</text>
              <text font="NUMPTY+ImprintMTnum" bbox="203.879,528.028,208.975,540.510" colourspace="DeviceGray" ncolour="0" size="12.482">che</text>
            </new_line>
          </textline>
          <textline bbox="68.032,408.428,372.762,421.166">
            <new_line>
              <text font="NUMPTY+ImprintMTnum" bbox="307.143,408.428,310.392,420.910" colourspace="DeviceGray" ncolour="0" size="12.482">viso] vi</text>
              <text font="NUMPTY+ImprintMTnum" bbox="310.280,408.808,313.243,419.046" colourspace="DeviceGray" ncolour="0" size="10.238">-</text>
              <text font="PYNIYO+ImprintMTnum-Italic" bbox="320.072,408.439,325.868,421.166" colourspace="DeviceGray" ncolour="0" size="12.727">su</text>
              <text font="NUMPTY+ImprintMTnum" bbox="328.829,408.428,338.452,420.910" colourspace="DeviceGray" ncolour="0" size="12.482">m</text>
            </new_line>
          </textline>
        </textbox>
      </page>
    </pages>

Here is another example that may help you.

from simplified_scrapy import SimplifiedDoc

html = '''
Your xml
'''
doc = SimplifiedDoc(html)
# texts = doc.getElements('text').notContains('-',attr='text').text # Filter with text
texts = doc.getElements('text').notContains('10.238',attr='size').text # Filter with size
print(texts)

Result:

['della quale non conosce che una parte;]', 'prima', '1', 'che nonconosce ancora appieno;', '2', 'che', 'viso] vi', 'su', 'm']
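For comparison, the same size-based filter can be sketched with only the standard library. This is a minimal sketch, not the method used above: the XML is a shortened stand-in for the question's sample, the helper name texts_without_size is made up for illustration, and it compares the size attribute for exact equality, whereas notContains performs a substring check.

```python
from xml.etree import ElementTree as ET

# Shortened stand-in for the question's XML (only the attribute needed here).
SAMPLE_XML = """
<pages>
  <new_line>
    <text size="12.482">viso] vi</text>
    <text size="10.238">-</text>
    <text size="12.727">su</text>
    <text size="12.482">m</text>
  </new_line>
</pages>
"""


def texts_without_size(xml_string, size):
    """Return the text of every <text> element whose size attribute is not `size`."""
    root = ET.fromstring(xml_string)
    return [
        (node.text or "").strip()
        for node in root.iter("text")
        if node.get("size") != size
    ]


print(texts_without_size(SAMPLE_XML, "10.238"))  # ['viso] vi', 'su', 'm']
```

Filtering at the element level keeps the rest of each new_line intact, which matches the result shown above.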

More simplified_scrapy examples are available here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
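As for the "why" in the question: each value stored in data is a one-element list holding the serialized new_line element, so the test filter_by in value checks whether some list item equals "10.238" exactly, not whether the string occurs inside an item. The filter therefore returns no keys and nothing is deleted. The sketch below reproduces this with a made-up dictionary of the same shape; the helper name keys_containing and the sample values are assumptions for illustration only.

```python
def keys_containing(my_dict, filter_by):
    """Return the keys whose list of strings has an item containing `filter_by`."""
    return [
        key
        for key, value in my_dict.items()
        if any(filter_by in item for item in value)
    ]


# Made-up data shaped like the question's dictionary: index -> [xml string].
data = {
    0: ['<new_line><text size="12.482">che</text></new_line>'],
    1: ['<new_line><text size="10.238">-</text></new_line>'],
}

# List membership compares whole items, so this never matches.
print([key for key, value in data.items() if "10.238" in value])  # []

# A substring test on each item finds the second entry.
matching = keys_containing(data, "10.238")
print(matching)  # [1]

for key in matching:
    del data[key]
print(list(data))  # [0]
```

Note that deleting a key this way drops the whole new_line entry, including its other text elements. If only the single text element should disappear, filtering per element is probably closer to what is wanted.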
