Comparing two pandas Series in a DataFrame and applying an operation to them

Question

I have been trying to compare substrings of two Series from a pandas DataFrame. The two Series are "titles" and "News", which hold, respectively, the headline and the body of each article I scraped from a newspaper website. Many of the entries in "News" include the headline at the start of their first line, and I want to remove it from the "News" Series.

For example:

df["News"][0] = "Mother Killed, police official injured in Madaripur road accidentA woman was killed .... flee the scene.AH/MUS"
df["titles"][0] = "Mother Killed, police official injured in Madaripur road accident"

I want to remove the titles from the News. For the example above, the result should be "A woman was killed .... flee the scene.AH/MUS".
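Because the headline in the example sits at the very start of the body, one option is to strip it only when it is an exact prefix. This is a minimal sketch, assuming the headline always appears verbatim at the beginning of the body; `strip_title` is a hypothetical helper, not part of pandas or the original post. Unlike `str.replace`, it leaves a later repetition of the headline inside the article untouched.

```python
def strip_title(news, title):
    """Return `news` without a leading `title`; anything else is returned unchanged.

    Hypothetical helper: only an exact prefix is removed, so a headline repeated
    later in the body is kept (str.replace would remove every occurrence).
    Non-string values, such as NaN from missing scraped text, pass through as-is.
    """
    if isinstance(news, str) and isinstance(title, str) and news.startswith(title):
        return news[len(title):]
    return news


# The example row from the question.
title = "Mother Killed, police official injured in Madaripur road accident"
news = title + "A woman was killed .... flee the scene.AH/MUS"
print(strip_title(news, title))  # A woman was killed .... flee the scene.AH/MUS
```

On Python 3.9+, `news.removeprefix(title)` does the same for two strings; the helper above adds only the non-string guard.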

I have done it like this:

df["replaced"] = [(df["News"][i].replace(df["titles"][i], ""))
                   for i in range(df.shape[0])
                 ]
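Pandas' `Series.str.replace` takes a single pattern rather than one per row, so some per-row iteration is hard to avoid here. One detail of the loop above is worth noting: `df["News"][i]` looks rows up by index label, so with a non-default index (for example after filtering rows) it can raise a `KeyError` or pick the wrong row. A minimal sketch, with hypothetical sample data, that pairs the two columns positionally with `zip` instead:

```python
import pandas as pd

# Hypothetical sample data; the index is deliberately not 0..n-1, which is
# where indexing with range(df.shape[0]) breaks down.
df = pd.DataFrame(
    {
        "titles": ["Headline one", "Headline two"],
        "News": ["Headline oneBody one.", "Headline twoBody two."],
    },
    index=[10, 42],
)

# zip walks both columns in row order, so no index lookups are needed and the
# resulting list lines up with df's rows.
df["replaced"] = [news.replace(title, "") for news, title in zip(df["News"], df["titles"])]
print(df["replaced"].tolist())  # ['Body one.', 'Body two.']
```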

This works, but I want to know the fastest way to do it. Specifically, I am looking for a more idiomatic pandas approach that does not loop or use a list comprehension. Is there a way to apply this to the whole Series at once, without looping?

Answer 1

Try this; it should work like a charm:

def getit(row):
    # Remove this row's title from this row's news body.
    try:
        return row.get("News").replace(row.get("titles"), "")
    except (AttributeError, TypeError):
        # News is not a string (e.g. NaN), or titles is not a string
        return row.get("News")

df["replaced"] = df.apply(getit, axis=1)
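Since the question is about speed, a small timing harness may help decide between this answer and a plain comprehension. This is a sketch under the assumption that your data looks roughly like the synthetic rows below; `with_apply`, `with_zip` and the sample data are hypothetical, and `getit` here is a version of the answer's helper that catches only `AttributeError` and `TypeError`. `DataFrame.apply(..., axis=1)` builds a Series for every row, so it is generally slower than iterating over the two columns directly, but measure on your own data before relying on either number.

```python
import timeit

import pandas as pd


def getit(row):
    # Row-wise helper in the style of the answer.
    try:
        return row["News"].replace(row["titles"], "")
    except (AttributeError, TypeError):
        # Non-string News (AttributeError) or non-string titles (TypeError).
        return row["News"]


def with_apply(df):
    return df.apply(getit, axis=1)


def with_zip(df):
    # Same fallback rule as getit, without building a Series per row.
    return pd.Series(
        [n.replace(t, "") if isinstance(n, str) and isinstance(t, str) else n
         for n, t in zip(df["News"], df["titles"])],
        index=df.index,
    )


# Hypothetical sample: 10,000 rows built from two short template rows.
df = pd.DataFrame({
    "titles": ["Headline one", "Headline two"] * 5_000,
    "News": ["Headline oneBody one.", "Headline twoBody two."] * 5_000,
})

# Check that both versions agree before comparing their speed.
assert with_apply(df).equals(with_zip(df))
for fn in (with_apply, with_zip):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

Both functions pass non-string values through unchanged, so the comparison measures only the cost of the iteration style.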

Accepted answer (score 1), 2021-07-05 17:44:28