How to perform a conditional on a column of lists (considering each item in the list) in Pandas
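Suppose I have a column of lists. If a list has at least one item in a given set, I want to keep the row; otherwise I want to drop it.
Here is a minimal example: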
import pandas as pd

# create the df
d = {'range': list(range(0, 3))}
df = pd.DataFrame(d)
l = [1, 2, 3]
m = [4, 5, 6]
n = [1, 7, 8]
# store one list per row in a single assignment;
# chained assignment such as df['var_list'][0] = l is unreliable
df['var_list'] = [l, m, n]
df.head(3)
Result:
range var_list
0 0 [1, 2, 3]
1 1 [4, 5, 6]
2 2 [1, 7, 8]
This is the set I want to use:
setS = {1, 2}
What I want is this: if the list in a row has at least one item that is in the set, keep the row; otherwise drop it.
So this is the desired result:
range var_list
0 0 [1, 2, 3]
2 2 [1, 7, 8]
What I tried:
df2 = df[df['var_list'].isin(setS)]
This is the error I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: unhashable type: 'list'
The above exception was the direct cause of the following exception:
SystemError Traceback (most recent call last)
<ipython-input-56-90ea3b42ebf3> in <module>()
----> 1 df2 = df[df['var_list'].isin(setS)]
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in isin(self, values)
4512 Name: animal, dtype: bool
4513 """
-> 4514 result = algorithms.isin(self, values)
4515 return self._constructor(result, index=self.index).__finalize__(self)
4516
/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in isin(comps, values)
478 comps = comps.astype(object)
479
--> 480 return f(comps, values)
481
482
/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in <lambda>(x, y)
454
455 # faster for larger cases to use np.in1d
--> 456 f = lambda x, y: htable.ismember_object(x, values)
457
458 # GH16012
pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_object()
SystemError: <built-in method view of numpy.ndarray object at 0x7fcc893844e0> returned a result with an error set
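The first line of the traceback, TypeError: unhashable type: 'list', comes from Python rather than from pandas: the lookup in the traceback goes through a hash table, and a list cannot be hashed. A minimal sketch in plain Python that reproduces only that part (the variable names are illustrative, not from the question):

```python
# Lists are mutable, so Python refuses to hash them.
try:
    hash([1, 2, 3])
    list_is_hashable = True
    message = ""
except TypeError as exc:
    list_is_hashable = False
    message = str(exc)

# A tuple holding the same items can be hashed without error.
tuple_is_hashable = isinstance(hash((1, 2, 3)), int)

# Checking the individual items against the set works,
# which is what the filter has to do row by row.
setS = {1, 2}
any_in_set = any(item in setS for item in [1, 7, 8])

print(list_is_hashable, tuple_is_hashable, any_in_set)
```

So the test has to look inside each list instead of treating the whole list as one value.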
A column of lists is not how pandas usually works. You have to check the items in each list explicitly:
print(df[df["var_list"].transform(lambda x: bool(set(x) & setS))])
#
range var_list
0 0 [1, 2, 3]
2 2 [1, 7, 8]
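A variation on the same idea, as a sketch rather than the answerer's code: set.isdisjoint accepts any iterable, so each list can be tested without first being converted to a set. The DataFrame is rebuilt here so the snippet runs on its own; mask and result are names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'range': [0, 1, 2],
    'var_list': [[1, 2, 3], [4, 5, 6], [1, 7, 8]],
})
setS = {1, 2}

# True when the row's list shares at least one item with setS
mask = df['var_list'].map(lambda items: not setS.isdisjoint(items))

# boolean indexing keeps only the rows where mask is True
result = df[mask]
print(result)
```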
Use a list comprehension with Python set intersection to build a mask, then slice:
m = [len(setS & x) > 0 for x in df.var_list.map(set)]
df[m]
Out[21]:
range var_list
0 0 [1, 2, 3]
2 2 [1, 7, 8]
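If you would rather stay with vectorized pandas calls, one alternative (not taken from the answers here, and assuming pandas 0.25 or later for Series.explode) is to unpack the lists into one item per row, test the items with isin, and collapse the result back to one flag per original row:

```python
import pandas as pd

df = pd.DataFrame({
    'range': [0, 1, 2],
    'var_list': [[1, 2, 3], [4, 5, 6], [1, 7, 8]],
})
setS = {1, 2}

# one row per list item; the original index label repeats for each item
items = df['var_list'].explode()

# flag each item that is in setS, then reduce to one flag per original row
mask = items.isin(setS).groupby(level=0).any()

result = df[mask]
print(result)
```

Because isin now receives plain integers instead of lists, the hashing problem from the question does not arise.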
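You can do this with apply and the set union operator (|) by converting each list to a set and comparing sizes. The union is smaller than the two sizes added together only when the sets share an item:
df[df.var_list.apply(lambda x: len(setS | set(x)) < len(setS) + len(set(x)))]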
Out[3343]:
range var_list
0 0 [1, 2, 3]
2 2 [1, 7, 8]
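The union test works because a union is smaller than the sum of the two set sizes exactly when the sets have an item in common. A minimal sketch that checks this on lists of different lengths; the helper shares_item and the two extra rows are made up for illustration and are not part of the question:

```python
import pandas as pd

setS = {1, 2}

# the last two rows are hypothetical, added to vary the list length
df = pd.DataFrame({
    'var_list': [[1, 2, 3], [4, 5, 6], [1, 7, 8], [9], [2, 2, 5, 6, 7]],
})

def shares_item(items, wanted):
    """Return True when items and wanted have at least one element in common."""
    unique = set(items)
    # overlapping sets lose at least one element when merged
    return len(wanted | unique) < len(wanted) + len(unique)

mask = df['var_list'].apply(lambda items: shares_item(items, setS))
result = df[mask]
print(result)
```

Comparing the union size against a fixed number only works when every list has the same length and no repeated items, which is why the sketch compares against the sum of the two sizes instead.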