Pandas: Truncate string in column based on substring pulled from other column (Python 3)
I have a DataFrame with two relevant columns, "rm_word" and "article".
Data sample:
,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super ***crazy***. It goes on and on.",crazy
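To reproduce the sample, one option is to read it back with `pd.read_csv` and `io.StringIO`. This is a sketch that assumes the sample was written by `df.to_csv()`, so the unnamed first column is the saved index (hence `index_col=0`). Passing `dtype={"grouping": str}` is also an assumption, made so the quoted "1" stays text.

```python
import io
import pandas as pd

# The data sample from the question, verbatim.
sample = '''\
,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super ***crazy***. It goes on and on.",crazy
'''

# index_col=0: the unnamed first column is the saved index.
# dtype: keep "grouping" as a string instead of letting pandas parse it as 1.
df = pd.read_csv(io.StringIO(sample), index_col=0, dtype={"grouping": str})
print(df)
```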
I want to check the last 100 characters of each "article" to see whether that row's "rm_word" appears. If it does, I want to delete the sentence in which "rm_word" appears, along with every sentence that follows it, from the "article".
Desired result (when "crazy" is the "rm_word"):
,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence.",crazy
This mask correctly identifies the articles whose last 100 characters contain their "rm_word", but I'm having trouble with the sentence-deletion part.
mask = [str(a) in b[-100:].lower() for a, b in zip(df["rm_word"], df["article"])]
print(df.loc[mask])
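One way to build on the mask is to rewrite only the selected rows through `df.loc[mask, "article"]`. This is a sketch, not the only approach. It assumes every sentence ends with a period followed by whitespace, matches `rm_word` case-insensitively, and cuts at the last sentence that mentions it. `drop_from_last_hit` is a hypothetical helper name.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "article": [
        "This is the article. This is a sentence. This is a sentence. "
        "This is a sentence. This goes on for awhile and that's super crazy. "
        "It goes on and on.",
        "Nothing to remove here. Just two sentences.",
    ],
    "rm_word": ["crazy", "crazy"],
})

# Same idea as the question's mask: is rm_word in the last 100 characters?
mask = [str(w).lower() in str(a)[-100:].lower()
        for w, a in zip(df["rm_word"], df["article"])]

def drop_from_last_hit(article, rm_word):
    # Hypothetical helper. Assumption: a sentence ends with '.' plus whitespace.
    sentences = re.split(r"(?<=\.)\s+", article)
    # Index of the last sentence mentioning rm_word (case-insensitive);
    # if none is found, keep every sentence.
    last = max((i for i, s in enumerate(sentences)
                if rm_word.lower() in s.lower()), default=len(sentences))
    # Keep only the sentences before that one.
    return " ".join(sentences[:last])

# Rewrite only the rows selected by the mask; other rows are untouched.
df.loc[mask, "article"] = [
    drop_from_last_hit(a, w)
    for a, w in zip(df.loc[mask, "article"], df.loc[mask, "rm_word"])
]
print(df["article"].tolist())
```

Because the kept sentences are re-joined with single spaces, any extra whitespace between them is normalized. Abbreviations such as "e.g." would also be treated as sentence ends under this assumption.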
Any help would be much appreciated. Thank you so much.
Does this work?
import pandas as pd

df = pd.DataFrame(
    columns=['article', 'rm_word'],
    data=[["This is the article. This is a sentence. This is a sentence. This is a sentence.", 'crazy'],
          ["This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super crazy. It goes on and on.", 'crazy']]
)

def clean_article(x):
    # Leave the row unchanged if rm_word is not in the last 100 characters
    if x['rm_word'] not in x['article'][-100:].lower():
        return x
    # Keep only the text before the first occurrence of rm_word
    article = x['article'].split(x['rm_word'], 1)[0]
    # Drop the unfinished sentence that contained rm_word
    article = article.split('.')[:-1]
    x['article'] = '.'.join(article) + '.'
    return x

df = df.apply(clean_article, axis=1)
df['article'].values
Returns:
array(['This is the article. This is a sentence. This is a sentence. This is a sentence.',
'This is the article. This is a sentence. This is a sentence. This is a sentence.'],
dtype=object)
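One caveat with the code above: the check lower-cases the article, but the split uses the original case. An article ending in "... That was Crazy. The end." therefore passes the check and still keeps the "Crazy" sentence. A possible case-insensitive variant is sketched below; the name `clean_article_ci` is hypothetical. It searches with `re.IGNORECASE`, escapes `rm_word` with `re.escape`, cuts at the last match, and returns the new string instead of modifying the row.

```python
import re
import pandas as pd

def clean_article_ci(row, window=100):
    """Hypothetical case-insensitive variant of clean_article."""
    article, rm_word = row["article"], row["rm_word"]
    # re.escape so characters such as '.' or '+' in rm_word match literally
    pattern = re.compile(re.escape(rm_word), re.IGNORECASE)
    # Only act if rm_word appears within the last `window` characters
    if not pattern.search(article[-window:]):
        return article
    # Last case-insensitive match in the full article
    last = list(pattern.finditer(article))[-1]
    # Keep everything up to and including the '.' before that sentence
    return article[: article.rfind(".", 0, last.start()) + 1]

df = pd.DataFrame({
    "article": ["This is a sentence. That was Crazy. The end.",
                "Nothing to remove here."],
    "rm_word": ["crazy", "crazy"],
})
# Assign the returned strings back to the column
df["article"] = df.apply(clean_article_ci, axis=1)
print(df["article"].tolist())
```

Returning a value from the function and assigning it to `df["article"]` avoids modifying the row Series inside `apply`.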