Filter rows by matching substrings across all columns in a Pandas DataFrame
How to filter values in DataFrame columns that contain all substrings from a given list in Pandas (Python)?
Any suggestions on how to keep only the values that contain all the substrings from a list, in any of the columns?
import pandas as pd

df = pd.DataFrame(
    [
        [1, 'foollish', 'molish'],
        [2, 'barnylishon', 'chacha'],
        [3, 'bazon', 'gazon'],
    ],
    columns=['id', 'value_1', 'value_2'])
print(df)

search_list = ['a', 'on']

print("Desired result for value_1 column:")
df_desire_result = pd.DataFrame(
    [
        [1, 'barnylishon', 'chacha'],
        [2, 'bazon', 'gazon'],
    ],
    columns=['id', 'value_1', 'value_2'])
print(df_desire_result)
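Going by the statement "contain all substrings from the list in any column": I take it to mean that if any column in a row contains all the substrings in search_list, that row is kept and the remaining rows are dropped.
If I understand correctly (IIUC), then: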
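# check every column except 'id'
cols = df.columns.drop('id').tolist()
# per row: each substring must occur (literally) in at least one of the row's cells
m = df[cols].apply(lambda x: all(x.str.contains(s, regex=False).any() for s in search_list), axis=1)
out = df[m]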
Output:
   id      value_1 value_2
1   2  barnylishon  chacha
2   3        bazon   gazon
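Note that "every substring appears somewhere in the row" and "a single cell contains every substring" are not quite the same condition, even though both give the same rows for the sample data. Below is a minimal sketch contrasting the two, assuming plain (non-regex) matching. The extra row with id 4 and the helper `cell_has_all` are hypothetical and were added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 'foollish', 'molish'],
        [2, 'barnylishon', 'chacha'],
        [3, 'bazon', 'gazon'],
        [4, 'cat', 'moon'],  # hypothetical row: 'a' and 'on' sit in different cells
    ],
    columns=['id', 'value_1', 'value_2'])
search_list = ['a', 'on']
cols = df.columns.drop('id').tolist()


def cell_has_all(cell):
    # True only if this single cell contains every substring (literal match)
    return all(s in str(cell) for s in search_list)


# Per-row check: each substring appears in at least one cell of the row
per_row = df[cols].apply(
    lambda row: all(any(s in str(v) for v in row) for s in search_list), axis=1)

# Same-cell check: at least one cell holds all substrings together
same_cell = df[cols].apply(lambda col: col.map(cell_has_all)).any(axis=1)

print(df.loc[per_row, 'id'].tolist())    # ids kept by the per-row check
print(df.loc[same_cell, 'id'].tolist())  # ids kept by the same-cell check
```

With the hypothetical row included, the per-row check keeps ids 2, 3 and 4, while the same-cell check keeps only 2 and 3. Which one is wanted depends on how the question is read.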
You can use:
# craft regex pattern
import re
pattern = '|'.join(map(re.escape, search_list))
# 'a|on'
out = df.loc[(df
    # extract words from all cells
    .filter(like='value')
    .stack()
    .str.extractall(fr'({pattern})')[0]
    # ensure that each word is present at least once per row
    .groupby(level=0).nunique()
    .eq(len(search_list))
    .reindex(df.index, fill_value=False)
)]
print(out)
Output:
   id      value_1 value_2
1   2  barnylishon  chacha
2   3        bazon   gazon
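If only value_1 needs to be checked, as the "Desired result for value_1 column" in the question suggests, a simpler route might be to build one boolean mask per substring and combine them. This is only a sketch under that assumption. It uses literal matching (`regex=False`), and `out_value_1` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [1, 'foollish', 'molish'],
        [2, 'barnylishon', 'chacha'],
        [3, 'bazon', 'gazon'],
    ],
    columns=['id', 'value_1', 'value_2'])
search_list = ['a', 'on']

# one boolean mask per substring, checked against value_1 only
masks = [df['value_1'].str.contains(s, regex=False) for s in search_list]

# keep the rows where every mask is True
out_value_1 = df[np.logical_and.reduce(masks)]
print(out_value_1)
```

For the sample data this keeps the rows with id 2 and 3, the same rows as the approaches above.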