Filter Pandas DataFrame based on list of substrings
I have a Pandas DataFrame containing multiple columns of strings. I would now like to check a certain column against a list of allowed substrings and get a new subset with the matching rows.
import pandas as pd

substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter out all rows where the string in the 2nd column doesn't contain one of the substrings
The only approach I found was creating a list from the corresponding column and then doing a list comprehension, but then I lose the other columns. Can I use a list comprehension as part of e.g. df.str.contains()?
year type value price
2000 ty-A 500 10000
2002 ty-Q 200 84600
2003 ty-R 500 56000
2003 ty-B 500 18000
2006 ty-C 500 12500
2012 ty-A 500 65000
2018 ty-F 500 86000
2019 ty-D 500 51900
Expected output:
year type value price
2000 ty-A 500 10000
2006 ty-C 500 12500
2012 ty-A 500 65000
2019 ty-D 500 51900
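You could use pandas.Series.isin, which keeps the rows whose value exactly equals one of the list entries (so the type column must hold the bare values A, C, D, as in the output below):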
>>> df.loc[df['type'].isin(substr)]
year type value price
0 2000 A 500 10000
4 2006 C 500 12500
5 2012 A 500 65000
7 2019 D 500 51900
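For the ty-A style values shown in the question, a substring test is needed rather than an exact comparison. The following is a minimal sketch, assuming pandas.Series.str.contains with a regular expression built from the list; the DataFrame is re-created from the sample rows in the question instead of being read from output.xlsx, and the names pattern, mask and filtered are illustrative only.

```python
import re
import pandas as pd

substr = ['A', 'C', 'D']

# Sample data copied from the question (stands in for output.xlsx)
df = pd.DataFrame({
    'year': [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    'type': ['ty-A', 'ty-Q', 'ty-R', 'ty-B', 'ty-C', 'ty-A', 'ty-F', 'ty-D'],
    'value': [500, 200, 500, 500, 500, 500, 500, 500],
    'price': [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})

# Escape each substring so it is matched literally, then join with "|" (regex OR)
pattern = '|'.join(re.escape(s) for s in substr)

# Boolean mask: True where 'type' contains at least one of the substrings
mask = df['type'].str.contains(pattern)

# Indexing with the mask keeps all columns of the matching rows
filtered = df[mask]
print(filtered)
```

The match is case-sensitive by default, so the lowercase letters in the "ty-" prefix do not trigger a hit on their own.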
You could also use Python's built-in any or all together with pandas.Series.apply.
If you want only the rows where all substrings match:
df.loc[df['type'].apply(lambda x: all(word in x for word in substr))]
Or, if you want the rows where any of the substrings matches:
df.loc[df['type'].apply(lambda x: any(word in x for word in substr))]
Printing or returning either expression should give you the filtered DataFrame.
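To see how the any and all variants differ, here is a small self-contained sketch run against the sample rows from the question. The DataFrame is built by hand rather than loaded from output.xlsx, and any_match and all_match are names chosen for illustration.

```python
import pandas as pd

substr = ['A', 'C', 'D']

# Sample data copied from the question (only the columns needed here)
df = pd.DataFrame({
    'year': [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    'type': ['ty-A', 'ty-Q', 'ty-R', 'ty-B', 'ty-C', 'ty-A', 'ty-F', 'ty-D'],
})

# Keep rows whose 'type' contains at least one of the substrings
any_match = df.loc[df['type'].apply(lambda x: any(word in x for word in substr))]

# Keep rows whose 'type' contains every one of the substrings
all_match = df.loc[df['type'].apply(lambda x: all(word in x for word in substr))]

print(any_match)
print(all_match)
```

With this sample, any_match holds the four rows from the expected output, while all_match is empty because no single type value contains A, C and D at once.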