Pandas: truncate strings in a column based on a substring taken from another column (Python 3)

Question

I have a dataframe with two relevant columns, "rm_word" and "article".

Data sample:

,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super ***crazy***. It goes on and on.",crazy

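I want to check the last 100 characters of each "article" to see whether that row's "rm_word" appears there. If it does, I want to remove the entire sentence in which "rm_word" appears, along with every sentence that follows it in the "article".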

Desired result (when "crazy" is the "rm_word"):

,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence.",crazy
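A minimal sketch of that rule on a single string, assuming a sentence ends with '.', '!' or '?', that a plain substring match (as in the mask below) is good enough, so "crazy" would also match inside a longer word, and that lower-casing does not change the string's length (true for ASCII text). The name truncate_at_word and its tail parameter are hypothetical; the search is case-insensitive and only looks inside the last tail characters.

```python
def truncate_at_word(article: str, word: str, tail: int = 100) -> str:
    """Hypothetical helper: cut `article` at the start of the sentence that
    contains the last occurrence of `word` within its final `tail` characters."""
    lowered = article.lower()
    start = max(len(article) - tail, 0)
    # Case-insensitive search restricted to article[start:]
    pos = lowered.rfind(word.lower(), start)
    if pos == -1:
        return article  # word not in the tail: leave the article alone
    # Index of the last sentence terminator before the match (-1 if none)
    cut = max(article.rfind(p, 0, pos) for p in ".!?")
    # Keep everything up to and including that terminator
    return article[:cut + 1].rstrip()


sample = ("This is the article. This is a sentence. This is a sentence. "
          "This is a sentence. This goes on for awhile and that's super "
          "***crazy***. It goes on and on.")
print(truncate_at_word(sample, "crazy"))
# This is the article. This is a sentence. This is a sentence. This is a sentence.
```

If the word occurs in the very first sentence, there is nothing before it to keep, so the function returns an empty string.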

This mask correctly identifies when an article contains its "rm_word", but I'm stuck on the sentence-removal part.

mask = [str(a) in b[-100:].lower() for a, b in zip(df["rm_word"], df["article"])]

print(df.loc[mask])
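Given that mask, the removal step could be done by splitting each matching article into sentences and keeping only the sentences before the last one that contains rm_word. A minimal sketch, assuming sentences end with '.', '!' or '?' followed by whitespace; drop_from_sentence is a hypothetical helper, the mask here also lower-cases rm_word (the question's mask does not), and rejoining with single spaces normalizes whatever whitespace separated the sentences.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "article": [
        "This is the article. This is a sentence. This is a sentence. This is a sentence.",
        "This is the article. This is a sentence. This is a sentence. This is a sentence. "
        "This goes on for awhile and that's super crazy. It goes on and on.",
    ],
    "rm_word": ["crazy", "crazy"],
})

# Same check as in the question, but lower-casing rm_word as well
mask = [str(a).lower() in b[-100:].lower() for a, b in zip(df["rm_word"], df["article"])]


def drop_from_sentence(article, word):
    """Hypothetical helper: drop the last sentence containing `word` and everything after it."""
    # Assumes sentences end with '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", article)
    hits = [i for i, s in enumerate(sentences) if word.lower() in s.lower()]
    if not hits:
        return article
    return " ".join(sentences[:hits[-1]])


# Rewrite only the rows selected by the mask
df.loc[mask, "article"] = [
    drop_from_sentence(b, str(a))
    for a, b in zip(df.loc[mask, "rm_word"], df.loc[mask, "article"])
]
print(df["article"].tolist())
```

Because the mask already guarantees that rm_word occurs in the tail, the last sentence containing it is always inside those final 100 characters.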

Any help would be greatly appreciated. Thanks so much!

Answer 1

Does this work?

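import pandas as pd

df = pd.DataFrame(
    columns=['article', 'rm_word'],
    data=[["This is the article. This is a sentence. This is a sentence. This is a sentence.", 'crazy'],
          ["This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super crazy. It goes on and on.", 'crazy']]
)

def clean_article(x):
    # Leave the row unchanged if rm_word is not in the last 100 characters
    if x['rm_word'] not in x['article'][-100:].lower():
        return x
    x = x.copy()  # avoid mutating the row object that pandas passes in
    # Keep the text before the LAST occurrence of rm_word (this split is case-sensitive)
    article = x['article'].rsplit(x['rm_word'], 1)[0]
    # Drop the unfinished sentence that contained rm_word, then re-add the final period
    article = article.split('.')[:-1]
    x['article'] = '.'.join(article) + '.'
    return x


df = df.apply(clean_article, axis=1)
df['article'].values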

Returns:

array(['This is the article. This is a sentence. This is a sentence. This is a sentence.',
       'This is the article. This is a sentence. This is a sentence. This is a sentence.'],
      dtype=object)
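One detail in clean_article worth checking: str.rsplit(sep) without a maxsplit argument splits at every occurrence of sep, so [0] is the text before the first occurrence, which can cut too much when rm_word also appears earlier in the article. Passing maxsplit=1 splits only at the last occurrence. A small illustration with a made-up string:

```python
text = "Not crazy yet. Fine. Then super crazy. End."

# No maxsplit: every "crazy" is a split point, so [0] is what precedes the first one
first = text.rsplit("crazy")[0]
# maxsplit=1: only the last "crazy" is a split point
last = text.rsplit("crazy", 1)[0]

print(repr(first))  # 'Not '
print(repr(last))   # 'Not crazy yet. Fine. Then super '
```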

Answer 1: accepted, score 1 (2020-06-30 17:06:12)