Python - Remove all sub-strings not in list

I want to remove all sub-strings in a df column that are not present in a defined list. For example:

mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']

Current:                                         Desired:
index      content                               index        content                                          
0          a very good idea, I like it           0            good like
1          was the bad thing to do               1            bad
2          I hated it, it was terrible           2            hated terrible
...                                              ...
k          Why do you think she liked it         k            liked

I have managed to define a function that keeps all the words that are not in the list, but I don't know how to invert it to achieve what I want:

pat = r'\b(?:{})\b'.format('|'.join(mylist))
df['column1'] = df['column1'].str.replace(pat, '', regex=True)

Any help would be appreciated.

Use str.findall with str.join:

df['column1'] = df['content'].str.findall('(' + pat + ')').str.join(' ')
print (df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
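
The same pattern can be checked outside pandas with the standard re module. This is only a minimal sketch, not part of the original answer: the helper name keep_listed_words and the sample sentences are illustrative assumptions, and re.escape is added as a precaution in case a listed word contains regex metacharacters.

```python
import re

mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']

# Build the alternation; re.escape guards against regex metacharacters in the words.
pat = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in mylist))


def keep_listed_words(text):
    """Return only the listed words found in text, joined by single spaces."""
    # The pattern has no capturing group, so findall returns the full matches.
    return ' '.join(re.findall(pat, text))


# Illustrative sample data mirroring the 'content' column above.
sentences = [
    'a very good idea, I like it',
    'was the bad thing to do',
    'I hated it, it was terrible',
    'Why do you think she liked it',
]
results = [keep_listed_words(s) for s in sentences]
print(results)
```

The word boundaries (\b) are what let 'like' and 'liked' coexist in the list: 'like' cannot match inside 'liked', because no boundary follows the 'e'.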

Or use a list comprehension with split, filter, and join:

df['column1'] = df['content'].apply(lambda x: ' '.join([y for y in x.split() if y in mylist]))
print (df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
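
One difference from the regex approach is that split() only breaks on whitespace, so a listed word directly followed by punctuation (for example 'terrible,') is not found. The sketch below illustrates this and one possible workaround. It is an assumption-laden example rather than part of the original answer: the names allowed and filter_tokens are hypothetical, and stripping string.punctuation from each token is just one way to handle it.

```python
import string

# Hypothetical set of allowed words (a set gives fast membership tests).
allowed = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}


def filter_tokens(text):
    """Keep whitespace-separated tokens whose punctuation-stripped form is allowed."""
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return ' '.join(tok for tok in tokens if tok in allowed)


sample = 'it was terrible, I hated it.'

# Plain split/filter/join: 'terrible,' keeps its comma and is dropped.
plain = ' '.join(y for y in sample.split() if y in allowed)

# With punctuation stripped from each token first, both words survive.
stripped = filter_tokens(sample)

print(repr(plain))
print(repr(stripped))
```

With pandas, such a function could be passed to apply, e.g. df['content'].apply(filter_tokens).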

