Python - Remove all sub-strings not in list
I want to remove all sub-strings from a df column that are not present in a defined list. For example:
mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']
Current:                                Desired:
index  content                          index  content
0      a very good idea, I like it      0      good like
1      was the bad thing to do          1      bad
2      I hated it, it was terrible      2      hated terrible
...                                     ...
k      Why do you think she liked it    k      liked
I have managed to define a function that keeps all the words not in the list, but I don't know how to invert it to achieve what I want:
pat = r'\b(?:{})\b'.format('|'.join(mylist))
# Removes every listed word, so only the words NOT in the list remain
df['column1'] = df['column1'].str.replace(pat, '', regex=True)
Any help would be greatly appreciated.
Use str.findall together with str.join:
df['column1'] = df['content'].str.findall('(' + pat + ')').str.join(' ')
print(df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
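For reference, here is a minimal self-contained sketch of the findall approach, using the sample rows from the question. The re.escape call is an addition that is not in the original answer; it is only there on the assumption that the list might someday contain regex metacharacters. Because the pattern has no capture group, findall returns the full matches, so the extra parentheses are optional.

```python
import re
import pandas as pd

# Sample data copied from the question; column names follow the answer above.
mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']
df = pd.DataFrame({'content': [
    'a very good idea, I like it',
    'was the bad thing to do',
    'I hated it, it was terrible',
    'Why do you think she liked it',
]})

# re.escape is an assumption added here, guarding against special characters.
pat = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in mylist))

# findall yields a list of matched words per row; join glues them with spaces.
df['column1'] = df['content'].str.findall(pat).str.join(' ')
print(df)
```

The \b word boundaries are what keep "like" from matching inside "liked".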
Or use split, filter and join in a list comprehension:
df['column1'] = df['content'].apply(lambda x: ' '.join([y for y in x.split() if y in mylist]))
print(df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
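One caveat with the split approach: it splits on whitespace only, so a word with punctuation attached (such as "good,") will not match the list. The sketch below illustrates the difference in plain Python. The function names keep_listed_split and keep_listed_words, and the sample sentence, are hypothetical and not part of the original answer.

```python
import re

mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']
allowed = set(mylist)  # set membership tests are fast

def keep_listed_split(text):
    # Whitespace split: tokens keep attached punctuation ("good," != "good").
    return ' '.join(w for w in text.split() if w in allowed)

def keep_listed_words(text):
    # Variant: \w+ extracts bare words, so trailing punctuation is dropped.
    return ' '.join(w for w in re.findall(r'\w+', text) if w in allowed)

sample = 'It was good, but I hated it.'
print(keep_listed_split(sample))   # hated
print(keep_listed_words(sample))   # good hated
```

Either function can be passed to df['content'].apply(...) in place of the lambda above.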