Remove substring from string if substring in list in data frame column

I have the following data frame, df1:

       string             lists
0      i have a dog       ['fox', 'dog', 'cat']
1      there is a cat     ['dog', 'house', 'car']
2      hello everyone     ['hi', 'hello', 'everyone']
3      hi my name is Joe  ['name', 'was', 'Joe']

I'm trying to return a data frame, df2, that looks like this:

       string             lists                         new_string
0      i have a dog       ['fox', 'dog', 'cat']         i have a
1      there is a cat     ['dog', 'house', 'car']       there is a cat
2      hello everyone     ['hi', 'hello', 'everyone']   
3      hi my name is Joe  ['name', 'was', 'Joe']        hi my is

I've referenced other questions such as https://stackoverflow.com/a/40493603/5879909 , but I'm having trouble searching through a list stored in a column, as opposed to a preset list.

Assuming the data frame is named df, and that OP's goal is to create a new column named new_string that holds each row's value from the string column with any words found in that same row's lists entry removed, the following will do the work:

df['new_string'] = df['string'].apply(lambda x: ' '.join([word for word in x.split() if word not in df['lists'][df['string'] == x].values[0]]))

[Out]:
              string                  lists      new_string
0       i have a dog        [fox, dog, cat]        i have a
1     there is a cat      [dog, house, car]  there is a cat
2     hello everyone  [hi, hello, everyone]                
3  hi my name is Joe       [name, was, Joe]        hi my is
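
The one-liner above finds each row's list by matching on the value in the string column, so rows that share the same string would all use the list of the first match. A minimal sketch of a row-wise alternative follows, assuming the lists column holds real Python lists and that whole, whitespace-separated words should be dropped. The helper name remove_listed_words is hypothetical, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data reproduced from the question; 'lists' is assumed to hold real Python lists.
df = pd.DataFrame({
    "string": ["i have a dog", "there is a cat", "hello everyone", "hi my name is Joe"],
    "lists": [["fox", "dog", "cat"], ["dog", "house", "car"],
              ["hi", "hello", "everyone"], ["name", "was", "Joe"]],
})


def remove_listed_words(text, words):
    """Return text without the whitespace-separated tokens that appear in words."""
    blocked = set(words)  # set membership checks are cheap
    return " ".join(token for token in text.split() if token not in blocked)


# Pair each string with the list from the same row, so duplicate strings are handled per row.
df["new_string"] = [
    remove_listed_words(text, words)
    for text, words in zip(df["string"], df["lists"])
]

print(df)
```

If the lists column actually holds strings that only look like lists (the quoted display in the question may suggest this), they would need to be parsed first, for example with ast.literal_eval, before a membership test behaves as intended.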
