Filter pandas dataframe based on column list values
My dataframe has many columns. One of these columns holds an array:
df
Out[191]:
10012005 10029008 10197000 ... filename_int filename result
0 0.0 0.0 0.0 ... 1 1.0 [280, NON]
1 0.0 0.0 0.0 ... 10 10.0 [286, NON]
2 0.0 0.0 0.0 ... 100 100.0 [NON, 285]
3 0.0 0.0 0.0 ... 10000 10000.0 [NON, 286]
4 0.0 0.0 0.0 ... 10001 10001.0 [NON]
... ... ... ... ... ... ...
52708 0.0 0.0 0.0 ... 9995 9995.0 [NON]
52709 0.0 0.0 0.0 ... 9996 9996.0 [NON]
52710 0.0 0.0 0.0 ... 9997 9997.0 [285, NON]
52711 0.0 0.0 0.0 ... 9998 9998.0 [NON]
52712 0.0 0.0 0.0 ... 9999 9999.0 [NON]
[52713 rows x 4289 columns]
The column result is an array of values such as:
[NON]
[123,NON]
[357,938,837]
[455,NON,288]
[388,929,NON,020]
I want to filter my dataframe so that it only displays records that have values other than NON.
Therefore values such as
[NON,NON]
[NON]
[]
will be excluded.
Only values like these should pass the filter:
[123,NON]
[357,938,837]
[455,NON,288]
[388,929,NON,020]
I tried this code:
df[len(df["result"])!="NON"]
but I get this error:
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: True
How can I filter my dataframe?
Try map with a lambda here:
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [[280, 'NON'], ['NON'], [], [285]] })
df
A B
0 1 [280, NON]
1 2 [NON]
2 3 []
3 4 [285]
df[df['B'].map(lambda x: any(y != 'NON' for y in x))]
A B
0 1 [280, NON]
3 4 [285]
The generator expression inside map returns True if at least one item in the list is not "NON". An empty list gives False, so it is dropped as well.
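To make the behaviour of that predicate concrete, here is a minimal sketch in plain Python, run against the list shapes from the question. The helper name has_real_value is only illustrative and is not part of the answer above.

```python
# Hypothetical helper name; the body is the same check passed to map above.
def has_real_value(items):
    """Return True when at least one element differs from 'NON'."""
    return any(item != 'NON' for item in items)

# List shapes taken from the question: three to exclude, two to keep.
samples = [['NON'], ['NON', 'NON'], [], [123, 'NON'], [357, 938, 837]]
results = [has_real_value(s) for s in samples]
print(results)  # [False, False, False, True, True]
```

any() over an empty iterable is False, which is why [] is excluded without a separate length check.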
You can use apply to identify rows that meet your criteria. Here, the filter works because apply returns one boolean per row, which can be used as a mask.
import pandas as pd
import numpy as np
vals = [280, 285, 286, 'NON', 'NON', 'NON']
listcol = [np.random.choice(vals, 3) for _ in range(100)]
df = pd.DataFrame({'vals': listcol})
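def is_non(l):
    # True when the list holds at least one value other than 'NON'
    return len([i for i in l if i != 'NON']) > 0
# Keep only the rows where the mask is True
df.loc[df.vals.apply(is_non), :]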
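Because the sample above uses random data, it may help to look at the mask on a small fixed frame. This is only a sketch: df_demo and has_non_placeholder are made-up names for illustration, and the data mirrors the list shapes from the question.

```python
import pandas as pd

# Small deterministic frame (hypothetical data, not the asker's real dataframe).
df_demo = pd.DataFrame(
    {'vals': [[280, 'NON'], ['NON'], [], ['NON', 'NON'], [285]]}
)

def has_non_placeholder(values):
    # True when at least one element is something other than 'NON'
    return any(v != 'NON' for v in values)

# apply calls the function once per row and returns a Series of True/False
mask = df_demo['vals'].apply(has_non_placeholder)

# Indexing with the mask keeps only the rows marked True
filtered = df_demo.loc[mask]
print(mask.tolist())            # [True, False, False, False, True]
print(filtered.index.tolist())  # [0, 4]
```

Building the mask as a separate variable makes it easy to inspect, or to combine with other conditions using & and |.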
I would do:
s = pd.DataFrame(df.B.tolist())
df = df[(s.ne('NON') & s.notnull()).any(axis=1).to_numpy()].copy()
A B
0 1 [280, NON]
3 4 [285]
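The idea behind this last approach is to spread each list across the columns of a helper frame and then test cell by cell. A rough sketch of the intermediate steps, using the small A/B example from the earlier answer; the names df_demo, expanded and keep are illustrative only:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [[280, 'NON'], ['NON'], [], [285]]}
)

# One column per list position; shorter lists are padded with missing values
expanded = pd.DataFrame(df_demo['B'].tolist(), index=df_demo.index)

# A cell counts only if it is present and is not the 'NON' placeholder
keep = (expanded.ne('NON') & expanded.notna()).any(axis=1)

result = df_demo[keep]
print(keep.tolist())         # [True, False, False, True]
print(result['A'].tolist())  # [1, 4]
```

The notna() check matters because the padding cells are also "not equal to 'NON'" and would otherwise let short or empty lists through. Passing index=df_demo.index keeps the helper frame aligned with the original, so the boolean Series can be used directly as a mask.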