Pandas: Truncate string in column based on substring pulled from other column (Python 3)
I have a dataframe with two relevant columns, "rm_word" and "article".
Data sample:
,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super ***crazy***. It goes on and on.",crazy
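I want to check the last 100 characters of each "article" to determine whether that row's "rm_word" appears in them. If it does, I want to delete the entire sentence in which the "rm_word" appears, along with every sentence that follows it in the "article".
Desired result (when "crazy" is the "rm_word"):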
,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence.",crazy
This mask correctly identifies when an article contains its "rm_word", but I'm having trouble with the sentence-removal part.
mask = ([ (str(a) in b[-100:].lower()) for a,b in zip(df["rm_word"], df["article"])])
print (df.loc[mask])
Any help would be greatly appreciated. Thank you very much.
Does this work?
import pandas as pd

df = pd.DataFrame(
    columns=['article', 'rm_word'],
    data=[["This is the article. This is a sentence. This is a sentence. This is a sentence.", 'crazy'],
          ["This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super crazy. It goes on and on.", 'crazy']]
)

def clean_article(x):
    # Leave the row unchanged if rm_word is not in the last 100 characters
    if x['rm_word'] not in x['article'][-100:].lower():
        return x
    # Keep only the text before the last occurrence of rm_word
    article = x['article'].rsplit(x['rm_word'], 1)[0]
    # Drop the partial sentence that contained rm_word
    article = article.split('.')[:-1]
    x['article'] = '.'.join(article) + '.'
    return x

df = df.apply(clean_article, axis=1)
df['article'].values
Returns
array(['This is the article. This is a sentence. This is a sentence. This is a sentence.',
'This is the article. This is a sentence. This is a sentence. This is a sentence.'],
dtype=object)
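One caveat with clean_article: it checks for the word in a lowercased tail but splits on the original-case text, so a capitalized match (for example "Crazy") would pass the check and then be cut in the wrong place. Below is a minimal sketch of a case-insensitive variant. The function name truncate_tail and its tail parameter are made up for illustration, and it assumes sentences end with a period and that lowercasing does not change the string length (true for ordinary English text).

```python
def truncate_tail(article: str, rm_word: str, tail: int = 100) -> str:
    """Drop the sentence containing rm_word, plus everything after it,
    when rm_word occurs in the last `tail` characters of article.

    Hypothetical helper: assumes '.' ends every sentence and that
    str.lower() preserves string length for the input text.
    """
    lowered = article.lower()
    word = str(rm_word).lower()
    # Only look for matches that start inside the tail window.
    start = max(len(article) - tail, 0)
    pos = lowered.find(word, start)
    if pos == -1:
        return article  # word not in the tail: leave the article as-is
    # Cut right after the last period that precedes the match.
    cut = article.rfind('.', 0, pos)
    return article[:cut + 1] if cut != -1 else ''


articles = [
    "This is the article. This is a sentence.",
    "This is the article. This is a sentence. That's super Crazy. It goes on.",
]
words = ["crazy", "crazy"]
cleaned = [truncate_tail(a, w) for a, w in zip(articles, words)]
print(cleaned)
```

With the dataframe above, the same function could be applied without a row-wise apply, for example `df["article"] = [truncate_tail(a, w) for a, w in zip(df["article"], df["rm_word"])]`.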