Python regex removing string from a list of dictionaries

I have the following list of dictionaries:

d = [
    {
        "Business": "Company A",
        "Category": "Supply Chain",
        "Date": "Posted Date\nDecember 21 2021",
    },
    {
        "Business": "Company B",
        "Category": "Manufacturing",
        "Date": "Posted Date\nDecember 21 2021",
    }
]

I'm trying to use re to remove the Posted Date\n string from the dictionaries, but I'm getting the following error:

TypeError: expected string or bytes-like object

My code is the following:

regex = re.compile('Posted Date\n')
filtered = [i for i in d if not regex.match(i)]
print(filtered)

If I do the same on a normal list of strings with no dictionaries, it works. Would I have to convert my dictionaries into strings first?

Thanks!

Assuming that d is the list of dictionaries, you're looping over the dictionaries themselves. So for the first iteration:

i = {
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}

And indeed, you cannot use a regex on a dictionary. You need to go one level deeper and loop over the keys and values in each dictionary. But that can also raise a RuntimeError if you add or remove keys while looping over the dictionary.
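
Both failure modes can be reproduced in isolation. The following is a minimal sketch; `record` and `scratch` are made-up sample names, not part of the original question:

```python
import re

regex = re.compile('Posted Date\n')
record = {"Business": "Company A", "Date": "Posted Date\nDecember 21 2021"}

# Passing the dictionary itself (not one of its string values) raises TypeError.
try:
    regex.match(record)
except TypeError as exc:
    type_error = str(exc)

# Deleting keys while iterating over the live dictionary raises RuntimeError.
scratch = dict(record)  # work on a copy so `record` stays intact
try:
    for key in scratch:
        del scratch[key]
except RuntimeError as exc:
    runtime_error = str(exc)

print(type_error)
print(runtime_error)
```

Iterating over `list(scratch.items())` instead takes a snapshot first, which is why that form is safe for deletions.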

import re

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
    "Date": "Posted Date\nDecember 21 2021",
}]

regex = re.compile('Posted Date\n')

for dikt in d:
    for key, value in list(dikt.items()):  # make a list to prevent RuntimeError
        if regex.match(value):  # match() only matches at the start of the string
            del dikt[key]

This would omit the Date key entirely:

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
}]
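
If mutating the original dictionaries is undesirable, the same result can be built with a comprehension that returns new dictionaries. This is only a sketch: `drop_matching` is a hypothetical helper name, and the `Id` field is an invented non-string value added to show why the `isinstance` guard is useful (calling `regex.match` on an integer would raise the same TypeError).

```python
import re

regex = re.compile('Posted Date\n')

def drop_matching(records, pattern):
    """Return new dicts without the string values that start with `pattern`."""
    return [
        {key: value for key, value in rec.items()
         if not (isinstance(value, str) and pattern.match(value))}
        for rec in records
    ]

# Hypothetical sample data; "Id" is not in the original question.
sample = [{"Business": "Company A", "Date": "Posted Date\nDecember 21 2021", "Id": 7}]
cleaned = drop_matching(sample, regex)
print(cleaned)
```

Because nothing is deleted in place, no snapshot via `list(...)` is needed here.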

If you just want to get rid of the "Posted Date\n", this suffices:

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
    "Date": "Posted Date\nDecember 21 2021",
}]


for dikt in d:
    for key, value in dikt.items():
        dikt[key] = value.replace('Posted Date\n', '')  # naively strip the prefix from every value (assumes all values are strings)

Result:

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "December 21 2021",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
    "Date": "December 21 2021",
}]
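
Since the question asked about re specifically, a more targeted variant could apply `re.sub` to the Date field only and leave the other values alone. This is a sketch under assumptions: `strip_posted_date` and its `field` parameter are hypothetical names, and the "Business" value in the sample is contrived to show that other keys are not touched.

```python
import re

# Raw string: the regex engine interprets \n as a newline; ^ anchors at the start.
prefix = re.compile(r'^Posted Date\n')

def strip_posted_date(records, field="Date"):
    """Remove a leading 'Posted Date\\n' from one field, in place."""
    for rec in records:
        value = rec.get(field)
        if isinstance(value, str):  # skip missing or non-string values
            rec[field] = prefix.sub('', value, count=1)
    return records

# Hypothetical sample data for illustration.
sample = [{"Business": "Posted Date\nInc", "Date": "Posted Date\nDecember 21 2021"}]
strip_posted_date(sample)
print(sample)
```

Reassigning the value of an existing key does not change the dictionary's size, so this in-place update is safe.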
