
Filter Pandas Dataframe based on List of substrings

I have a Pandas Dataframe containing multiple columns of strings. I would now like to check a certain column against a list of allowed substrings and then get a new subset with the result.

import pandas as pd
substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now drop all rows where the string in the 2nd column ('type') doesn't contain any of the substrings

The only approach I found was creating a list from the corresponding column and then doing a list comprehension, but then I lose the other columns. Can I use a list comprehension as part of e.g. df.str.contains()?
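A list comprehension does not have to lose the other columns: if it produces one boolean per row, the resulting list can be used as a mask on the whole DataFrame. A minimal sketch, assuming the question's sample table is built inline instead of being read from output.xlsx:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: stands in for output.xlsx)
df = pd.DataFrame({
    "year": [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    "type": ["ty-A", "ty-Q", "ty-R", "ty-B", "ty-C", "ty-A", "ty-F", "ty-D"],
    "value": [500, 200, 500, 500, 500, 500, 500, 500],
    "price": [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})
substr = ["A", "C", "D"]

# One True/False per row: does the 'type' string contain any allowed substring?
mask = [any(s in t for s in substr) for t in df["type"]]

# Indexing with a boolean list keeps every column of the matching rows
result = df[mask]
print(result)
```

The comprehension only decides which rows survive; the DataFrame itself does the selection, so year, value and price are kept.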

year  type     value   price
2000  ty-A     500     10000
2002  ty-Q     200     84600
2003  ty-R     500     56000
2003  ty-B     500     18000
2006  ty-C     500     12500
2012  ty-A     500     65000
2018  ty-F     500     86000
2019  ty-D     500     51900

Expected output:

year  type     value   price
2000  ty-A     500     10000
2006  ty-C     500     12500
2012  ty-A     500     65000
2019  ty-D     500     51900

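If the type column holds exactly the allowed values (A rather than ty-A, as in the output below), you could use pandas.Series.isin, which keeps rows whose value is equal to one of the entries in substr. Note that it tests for exact matches, not substrings: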

>>> df.loc[df['type'].isin(substr)]
   year type  value  price
0  2000    A    500  10000
4  2006    C    500  12500
5  2012    A    500  65000
7  2019    D    500  51900
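Because isin compares whole values, on the question's actual data (ty-A, ty-C, ...) it matches nothing. Since the question asks about str.contains, here is a sketch that instead joins the list into a regular-expression alternation (A|C|D), escaping each entry with re.escape in case a substring contains regex metacharacters. The inline sample data is an assumption standing in for output.xlsx:

```python
import re
import pandas as pd

# Sample data from the question, built inline (assumption)
df = pd.DataFrame({
    "year": [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    "type": ["ty-A", "ty-Q", "ty-R", "ty-B", "ty-C", "ty-A", "ty-F", "ty-D"],
    "value": [500, 200, 500, 500, 500, 500, 500, 500],
    "price": [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})
substr = ["A", "C", "D"]

# isin tests for exact equality, so 'ty-A' != 'A' and no row matches
exact = df.loc[df["type"].isin(substr)]
print(len(exact))  # 0

# str.contains tests each string against the pattern 'A|C|D'
pattern = "|".join(re.escape(s) for s in substr)
result = df.loc[df["type"].str.contains(pattern, regex=True)]
print(result)
```

The regex approach stays vectorised, so it avoids calling a Python function per row.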

Alternatively, you could use Python's built-in any() or all() inside Series.apply to test each string against every substring.

If you want rows where the value contains all of the substrings:

df.loc[df['type'].apply(lambda x: all(word in x for word in substr))]

or, if you want rows where the value contains any of the substrings:

df.loc[df['type'].apply(lambda x: any(word in x for word in substr))]

Each expression returns a new, filtered DataFrame, which you can print or assign back to df.
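To see how the two variants differ, here is a sketch that runs both on the question's sample data (rebuilt inline as an assumption). No type value contains all three of A, C and D, so the all() version yields an empty DataFrame, while the any() version yields the expected four rows:

```python
import pandas as pd

# Sample data from the question, built inline (assumption)
df = pd.DataFrame({
    "year": [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    "type": ["ty-A", "ty-Q", "ty-R", "ty-B", "ty-C", "ty-A", "ty-F", "ty-D"],
    "value": [500, 200, 500, 500, 500, 500, 500, 500],
    "price": [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})
substr = ["A", "C", "D"]

# Keep rows whose 'type' contains every substring in the list
all_match = df.loc[df["type"].apply(lambda x: all(word in x for word in substr))]

# Keep rows whose 'type' contains at least one substring in the list
any_match = df.loc[df["type"].apply(lambda x: any(word in x for word in substr))]

print(all_match.empty)  # True
print(any_match)
```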

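Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.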

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM