Remove elements from lists based on condition

I have the following code:

from collections import defaultdict
import pandas as pd

THRESHOLD = 3 

item_counts = defaultdict(int)

df = {'col1':['1 2 3 4 5 6 7', '1 3 6 7','2 6 7']}
lines = pd.DataFrame(data=df)

print(lines)

for line in lines['col1']:
    for item in line.split():
        item_counts[item] += 1

print(item_counts)         
for line in lines['col1']:
    for item in line.split():
        if item_counts[item] < THRESHOLD:
            del item

print(lines)

My goal is to count every item and then remove the items whose count is below the threshold from my DataFrame. In this case, only 6 and 7 should be kept and the rest should be removed. The defaultdict counting works fine, but the deletion of items does not.

Do you know what I am doing wrong?

Using del is not the same as removing an element from a list. Consider the following example:

>>> x=1
>>> y=2
>>> lst = [x,y]
>>> del x
>>> print(lst)
[1, 2]
>>> lst.remove(x)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
NameError: name 'x' is not defined
>>> lst.remove(y)
>>> print(lst)
[1]
>>> print(y)
2

As you can see, using del on a variable that refers to an element of the list only deleted that name; the list itself stayed as it was. remove did the opposite: it removed the element from the list but did not delete the variable.
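The same thing happens inside the question's loop. A minimal sketch, assuming a small made-up list named items (not part of the original code): del item only unbinds the loop variable on each pass, so the list is untouched.

```python
# Hypothetical data, chosen only for illustration.
items = ["1", "2", "6", "7"]

for item in items:
    # `del item` removes the name `item` for this iteration only;
    # the list still holds a reference to the same object.
    del item

print(items)  # ['1', '2', '6', '7']
```

In the question, item comes from line.split(), which is a temporary list, so even a real removal there would not reach the DataFrame.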

As for fixing the problem: you should not remove elements from a list while iterating over it.
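To see why, here is a small sketch with hypothetical counts that mirror the question's data (the names counts, mutated and filtered are made up for this example). Removing during iteration shifts the remaining elements left, so the iterator skips the one that follows each removal.

```python
# Hypothetical counts for illustration; THRESHOLD matches the question.
counts = {"1": 2, "2": 2, "6": 3, "7": 3}
THRESHOLD = 3

# Removing while iterating: after "1" is removed, "2" slides into index 0
# and the iterator moves on to index 1, so "2" is never checked.
mutated = ["1", "2", "6", "7"]
for item in mutated:
    if counts[item] < THRESHOLD:
        mutated.remove(item)
print(mutated)  # ['2', '6', '7']

# Building a new list avoids the problem entirely.
filtered = [item for item in ["1", "2", "6", "7"] if counts[item] >= THRESHOLD]
print(filtered)  # ['6', '7']
```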

IMO the best fix is to use a list comprehension to build a new list with only the wanted elements and replace the old one:

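new_col = []  # rebinding `line` alone would not change the DataFrame
for line in lines['col1']:
    line = [item for item in line.split() if item_counts[item] >= THRESHOLD]
    # line = ' '.join(line)
    new_col.append(line)
lines['col1'] = new_col  # replace the old column with the filtered one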

P.S. I added the commented-out line in case you want to turn each line back into a string.

If you don't need a DataFrame (I don't see why you would for this), you can do this:

from collections import Counter

THRESHOLD = 3
lines = {'col1':['1 2 3 4 5 6 7', '1 3 6 7','2 6 7']}

# make proper list of ints
z = {k: [[int(x) for x in v.split()] for v in vals] for k, vals in lines.items()}
print(z)
# {'col1': [[1, 2, 3, 4, 5, 6, 7], [1, 3, 6, 7], [2, 6, 7]]}

# count the items within each value of the dict
z = {k: Counter(x for vals in arr for x in vals) for k, arr in z.items()}
print(z)
# {'col1': Counter({6: 3, 7: 3, 1: 2, 2: 2, 3: 2, 4: 1, 5: 1})}

# select the items that are seen at least THRESHOLD times
z = {col: [k for k, v in cnt.items() if v >= THRESHOLD] for col, cnt in z.items()}
print(z)
# {'col1': [6, 7]}
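The result above lists which items survive. If you also want each original line filtered, as the question asks, something along these lines might work. It is only a sketch: rows and filtered_rows are names invented here, and the items are kept as strings rather than converted to int.

```python
from collections import Counter

THRESHOLD = 3
rows = ['1 2 3 4 5 6 7', '1 3 6 7', '2 6 7']  # same data as the question

# count every item across all rows
counts = Counter(item for row in rows for item in row.split())

# rebuild each row, keeping only items seen at least THRESHOLD times
filtered_rows = [
    ' '.join(item for item in row.split() if counts[item] >= THRESHOLD)
    for row in rows
]
print(filtered_rows)  # ['6 7', '6 7', '6 7']
```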
