Filter Pandas by a Column with List Values
Given a pandas DataFrame that contains a column with list values:
import pandas as pd

df = pd.DataFrame.from_dict({
    'name': {0: 'foo', 1: 'bar', 2: 'baz', 3: 'foz'},
    'Attributes': {0: ['x', 'y'], 1: ['y', 'z'], 2: ['x', 'z'], 3: []},
})
name Attributes
0 foo ['x', 'y']
1 bar ['y', 'z']
2 baz ['x', 'z']
3 foz []
How can the DataFrame be filtered to keep only the rows whose lists do not contain a certain value, e.g. 'y':
2 baz ['x', 'z']
3 foz []
Thank you in advance for your consideration and response.
You can convert the Series of lists to a DataFrame and check that every column is not equal to 'y':
# if the values are strings rather than actual lists, parse them first:
# import ast; df['Attributes'] = df['Attributes'].apply(ast.literal_eval)
df[pd.DataFrame(df['Attributes'].tolist(), index=df.index).ne('y').all(axis=1)]
  name Attributes
2  baz     [x, z]
3  foz         []
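For reference, here is the same idea as a self-contained sketch on the question's sample data. The names `expanded` and `mask` are introduced only for illustration. Lists of different lengths are padded with missing values when expanded, and a missing value never equals 'y', so the row with the empty list should be kept as well.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foz"],
    "Attributes": [["x", "y"], ["y", "z"], ["x", "z"], []],
})

# Expand each list into one row of columns; shorter lists are padded with None.
# Reusing df.index keeps the mask aligned with df even for a non-default index.
expanded = pd.DataFrame(df["Attributes"].tolist(), index=df.index)

# Keep a row only if none of its cells equals 'y'.
mask = expanded.ne("y").all(axis=1)
result = df[mask]
print(result)
```

This approach builds a temporary frame as wide as the longest list, which is usually fine for short lists but can be wasteful when a few lists are very long.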
If they are not actual lists:
df[df['Attributes'].str.count('y').eq(0)]
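One caveat: `str.count('y')` counts the character anywhere in the string, so it would also drop a row whose list contains a longer value such as 'yes'. The sketch below compares it with a stricter variant, assuming the column holds valid Python list literals stored as strings. The frame `df_str` and its values are made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical data: lists stored as their string representations.
df_str = pd.DataFrame({
    "name": ["foo", "bar", "baz"],
    "Attributes": ["['x', 'y']", "['yes', 'z']", "[]"],
})

# Substring counting: 'yes' is treated as containing 'y'.
by_count = df_str[df_str["Attributes"].str.count("y").eq(0)]

# Parse the strings into real lists, then test exact membership.
parsed = df_str["Attributes"].apply(ast.literal_eval)
by_membership = df_str[parsed.apply(lambda attrs: "y" not in attrs)]

print(by_count["name"].tolist())       # rows kept by substring counting
print(by_membership["name"].tolist())  # rows kept by exact membership
```

If the values are always single characters, the two variants should agree and the shorter `str.count` form is enough.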
This should work (although it's not very elegant):
def filter_data_frame(df):
    # Collect the positions of rows whose list (column position 1) lacks "y".
    good_index = []
    for i in range(len(df)):
        if "y" not in df.iloc[i, 1]:
            good_index.append(i)
    return df.iloc[good_index, :]
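The same loop can also be written as a boolean mask. This is a minimal sketch, assuming the column is named 'Attributes' and holds actual lists. The helper name `rows_without` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def rows_without(df, value, column="Attributes"):
    """Return the rows of df whose list in `column` does not contain `value`."""
    # map() applies the membership test to each list and yields a boolean Series.
    mask = df[column].map(lambda items: value not in items)
    return df[mask]

df = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foz"],
    "Attributes": [["x", "y"], ["y", "z"], ["x", "z"], []],
})

result = rows_without(df, "y")
print(result)
```

Selecting the column by name avoids depending on its position, and the original index labels are preserved in the result.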