For each row in Pandas dataframe, check if row contains string from list
I have a given list of strings, like this:
List=['plastic', 'cardboard', 'wood']
I have a column of dtype string in my dataframe, like this:
Column=['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']
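For each row in the column, I want to know whether the row contains one of the words from the list and, if so, keep only the text before that word, like this: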
New_Column=['beer', 'water', 'eggs', 'fruits']
Is there a way to do this systematically for every row of my dataframe (millions of rows)? Thanks
PS. I tried a function like this, using regex pattern matching:
pattern = re.compile('**Pattern to be defined to include element from list**')
def truncate(row, pattern):
    Column = row['Column']
    if bool(pattern.match(Column)):
        Column = Column.replace(**word from list**, "")
    return Column
df['New_column'] = df.apply(truncate, axis=1, pattern=pattern)
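One way the placeholder pattern could be filled in is a single alternation built from the list with `re.escape`, combined with `search` instead of `match` (`match` only succeeds when the word sits at the very start of the string, which is never the case here). This is a minimal sketch assuming the column is named `Column` as in the question; the name `WORDS` and the `\b` word boundaries are illustrative choices, not part of the original.

```python
import re

# Assumption: the list from the question, spelled 'cardboard' to match the data.
WORDS = ['plastic', 'cardboard', 'wood']

# One alternation of the escaped words; \b stops 'wood' from matching inside e.g. 'woodland'.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, WORDS)) + r')\b')

def truncate(row, pattern):
    """Return the text before the first listed word, or the text unchanged if none occurs."""
    text = row['Column']
    m = pattern.search(text)  # search scans the whole string; match only checks position 0
    if m:
        return text[:m.start()].rstrip()
    return text

# Works on any mapping with a 'Column' key, e.g. a DataFrame row or a plain dict.
print(truncate({'Column': 'beer plastic'}, pattern))
print(truncate({'Column': 'milk glass'}, pattern))
```

With pandas, the function plugs into the call from the question unchanged: `df['New_column'] = df.apply(truncate, axis=1, pattern=pattern)`. Row-wise `apply` is still a Python-level loop, though, so on millions of rows a vectorized `.str` method is generally faster.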
# df
                 0
0     beer plastic
1  water cardboard
2     eggs plastic
3      fruits wood
l=['plastic', 'cardboard', 'wood']
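Then use str.findall:
# raw string avoids invalid-escape warnings for \w and \s; the lookahead requires a listed word right after the captured word
df[0].str.findall(r'\w+\s*(?=' + '|'.join(l) + ')').apply(lambda x: x[0].strip() if len(x) else 'NotFound')
# output
0      beer
1     water
2      eggs
3    fruits
Name: 0, dtype: object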
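The `findall` pattern above keeps only the single word immediately before the match. If you want everything before the first listed word, as the question describes, a vectorized `str.extract` with a lazy prefix is one option. A sketch assuming the single column `0` and the list `l` from this answer; the extra row `'milk glass'` is made up to show a non-match, and falling back to the original text for unmatched rows (instead of `'NotFound'`) is an illustrative choice.

```python
import re
import pandas as pd

df = pd.DataFrame(['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood', 'milk glass'])
l = ['plastic', 'cardboard', 'wood']

# Lazy (.*?) captures everything before the first listed word; \s* drops the separating spaces.
pattern = r'^(.*?)\s*(?:' + '|'.join(map(re.escape, l)) + ')'

# expand=False returns a Series; unmatched rows are NaN, so fall back to the original text.
result = df[0].str.extract(pattern, expand=False).fillna(df[0])
print(result.tolist())
```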
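import pandas as pd
...
for index, row in df.iterrows():
    value = row['Column_name']
    for word in List_name:
        if word in value:
            # partition(word)[0] is the text before the first occurrence of word
            value = value.partition(word)[0].rstrip()
    # write back through df.at; assigning to `row` is not guaranteed to update df
    df.at[index, 'Column_name'] = value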
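`str.partition` splits at the first occurrence of its argument and returns a 3-tuple `(before, sep, after)`, where `sep` is empty if the argument is not found. Because `iterrows` builds a Series for every row, it is slow on millions of rows; a plain list comprehension over the column can apply the same per-word logic without it. A sketch; the helper name `cut_before` is hypothetical.

```python
def cut_before(text, words):
    """Mirror the loop above: for each word in turn, keep only the text before it if present."""
    for word in words:
        before, sep, _ = text.partition(word)
        if sep:  # sep is '' when `word` does not occur in `text`
            text = before.rstrip()
    return text

words = ['plastic', 'cardboard', 'wood']
column = ['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']

# With pandas this would be: df['Column_name'] = [cut_before(t, words) for t in df['Column_name']]
new_column = [cut_before(t, words) for t in column]
print(new_column)
```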
If you want to run a working example:
import pandas as pd
List = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame([{'c1': "fun cardboard", 'c2': "jolly plastic"}, {'c1': "meh wood", 'c2': "aba"}, {'c1': "aaa", 'c2': "bbb"}, {'c1': "old wood", 'c2': "bbb"}])
for index, row in df.iterrows():
    for col in ['c1', 'c2']:
        value = row[col]
        for word in List:
            if word in value:
                value = value.partition(word)[0].rstrip()
        # assign through df.at so the change is stored in df, not only in the row Series
        df.at[index, col] = value
df
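To apply the same truncation to several columns without a Python loop, `DataFrame.replace` with `regex=True` can delete the listed word and everything after it in every string cell. A sketch reusing the example data; the pattern (optional leading spaces, any listed word, rest of the string) is an illustrative choice.

```python
import re
import pandas as pd

List = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame([{'c1': "fun cardboard", 'c2': "jolly plastic"}, {'c1': "meh wood", 'c2': "aba"},
                   {'c1': "aaa", 'c2': "bbb"}, {'c1': "old wood", 'c2': "bbb"}])

# \s* eats the space before the word; .* removes the word and anything after it.
pattern = r'\s*(?:' + '|'.join(map(re.escape, List)) + r').*'

# regex=True treats `to_replace` as a regular expression and substitutes matches in each string cell.
df = df.replace(to_replace=pattern, value='', regex=True)
print(df)
```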
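Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.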