Filter a Pandas DataFrame based on a list of substrings

Question

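I have a Pandas DataFrame with several string columns. I now want to check one specific column against a list of allowed substrings and get a new subset containing only the matching rows.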

import pandas as pd

substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now drop all rows where the string in the 2nd column doesn't contain any of the substrings

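The only way I have found is to build a list from the relevant column and run a list comprehension over it, but then I lose the other columns. Can I use a list comprehension as part of, for example, df.str.contains()?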

year  type     value   price
2000  ty-A     500     10000
2002  ty-Q     200     84600
2003  ty-R     500     56000
2003  ty-B     500     18000
2006  ty-C     500     12500
2012  ty-A     500     65000
2018  ty-F     500     86000
2019  ty-D     500     51900

Expected output:

year  type     value   price
2000  ty-A     500     10000
2006  ty-C     500     12500
2012  ty-A     500     65000
2019  ty-D     500     51900

Answer 1

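You can use pandas.Series.isin. Note that isin tests for exact equality, so this works when the type column holds exactly the values listed in substr, as in the output below; it does not match substrings inside longer values such as 'ty-A'.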

>>> df.loc[df['type'].isin(substr)]
   year type  value  price
0  2000    A    500  10000
4  2006    C    500  12500
5  2012    A    500  65000
7  2019    D    500  51900
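
For the question's data, where the values look like ty-A rather than A, a substring test is needed instead of an equality test. The following is a minimal sketch, not part of the original answer, that uses Series.str.contains with a regex alternation built from substr. The DataFrame is typed in by hand from the question's sample table instead of being read from output.xlsx, and the names pattern and filtered are chosen for illustration.

```python
import re

import pandas as pd

substr = ['A', 'C', 'D']

# Sample data copied from the question (stands in for output.xlsx).
df = pd.DataFrame({
    'year': [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    'type': ['ty-A', 'ty-Q', 'ty-R', 'ty-B', 'ty-C', 'ty-A', 'ty-F', 'ty-D'],
    'value': [500, 200, 500, 500, 500, 500, 500, 500],
    'price': [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})

# Join the substrings into one alternation, escaping each one so that
# it is matched literally rather than interpreted as regex syntax.
pattern = '|'.join(re.escape(s) for s in substr)

# str.contains treats the pattern as a regex by default; boolean indexing
# keeps every column of the rows whose 'type' contains any substring.
filtered = df[df['type'].str.contains(pattern)]
print(filtered)
```

Escaping with re.escape only matters when a substring contains regex metacharacters such as . or +, but it is cheap to include. Because the mask is applied to the whole DataFrame, the other columns are kept.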

Answer 2

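You can use Python's built-in any or all together with pandas.Series.apply.

If you want the rows where every substring matches:

df.loc[df['type'].apply(lambda x: all(word in x for word in substr))]

Or, if you want the rows where at least one substring from substr matches:

df.loc[df['type'].apply(lambda x: any(word in x for word in substr))]

Printing or returning the filtered df should then give the expected result.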
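
To make the difference between the two variants concrete, here is a small self-contained sketch. It assumes a reduced version of the question's table plus one hypothetical row, ty-ACD, which is not in the original data and is added only so that the all variant has something to match.

```python
import pandas as pd

substr = ['A', 'C', 'D']

# Reduced sample; the 'ty-ACD' row is hypothetical and added for illustration.
df = pd.DataFrame({
    'year': [2000, 2002, 2006, 2019],
    'type': ['ty-A', 'ty-Q', 'ty-C', 'ty-ACD'],
})

# True where at least one substring occurs in the value.
any_mask = df['type'].apply(lambda x: any(word in x for word in substr))

# True only where every substring occurs in the value.
all_mask = df['type'].apply(lambda x: all(word in x for word in substr))

any_rows = df.loc[any_mask]
all_rows = df.loc[all_mask]

print(any_rows)
print(all_rows)
```

For the expected output in the question, the any variant is the one that applies, since each type value contains only one of the allowed substrings.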
