Remove substring from string if substring in list in data frame column
I have the following data frame df1:
   string             lists
0  i have a dog       ['fox', 'dog', 'cat']
1  there is a cat     ['dog', 'house', 'car']
2  hello everyone     ['hi', 'hello', 'everyone']
3  hi my name is Joe  ['name', 'was', 'Joe']
I'm trying to return a data frame df2 that looks like this:
   string             lists                        new_string
0  i have a dog       ['fox', 'dog', 'cat']        i have a
1  there is a cat     ['dog', 'house', 'car']      there is a cat
2  hello everyone     ['hi', 'hello', 'everyone']
3  hi my name is Joe  ['name', 'was', 'Joe']       hi my is
I've looked at other questions, such as https://stackoverflow.com/a/40493603/5879909, but I'm having trouble searching through a list stored in a column as opposed to a preset list.
Assuming the dataframe is named df, and that OP's goal is to create a new column named new_string whose values are the strings from the string column with every word that appears in that row's lists column removed, the following will do the work:
df['new_string'] = df['string'].apply(lambda x: ' '.join([word for word in x.split() if word not in df['lists'][df['string'] == x].values[0]]))
[Out]:
   string             lists                  new_string
0  i have a dog       [fox, dog, cat]        i have a
1  there is a cat     [dog, house, car]      there is a cat
2  hello everyone     [hi, hello, everyone]
3  hi my name is Joe  [name, was, Joe]       hi my is
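Note that the one-liner above finds each row's list by matching on the string value and taking the first hit (`.values[0]`), so two rows with the same string would both use the first row's list. As a possible alternative, here is a minimal sketch that pairs each string with the list from its own row. The helper name `remove_listed_words` is made up for illustration, and the data frame is rebuilt from the example in the question so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example data frame from the question.
df = pd.DataFrame({
    "string": [
        "i have a dog",
        "there is a cat",
        "hello everyone",
        "hi my name is Joe",
    ],
    "lists": [
        ["fox", "dog", "cat"],
        ["dog", "house", "car"],
        ["hi", "hello", "everyone"],
        ["name", "was", "Joe"],
    ],
})


def remove_listed_words(text, words):
    """Return text with every whitespace-separated word found in words removed."""
    blocked = set(words)  # set membership checks are fast
    return " ".join(word for word in text.split() if word not in blocked)


# Walk the two columns in parallel so each string is checked
# against the list from the same row.
df["new_string"] = [
    remove_listed_words(text, words)
    for text, words in zip(df["string"], df["lists"])
]

print(df)
```

Like the original answer, this compares whole whitespace-separated words and is case-sensitive, so 'cat' would not be removed from 'cats' and 'joe' would not match 'Joe'.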