
How to remove sentences with a specific character?

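I have a dataframe with article texts. One of the rows has several sentences containing the copyright symbol "©".

article_text
© Aaron Davidson/Getty Images Aaron Davidson/Getty Images Beyond Meat cuts 19% of workforce including disgraced COO, according to a release from the company. CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees.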

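I only want to remove the sentences in the row that contain the copyright symbol, and I want to do this for every row in the dataset. This is what I want it to look like:

article_text
CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. /Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees.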

This is what I tried:

for i in df['article_texts']:
    try:
        paragraph = i
        tokens = paragraph.split(".")
        for sentence in tokens:
            if "©" in sentence:
                tokens.remove(sentence)
                final = (".").join(tokens)
                df['summaries'].loc[(df['summaries'] == i)] = final
    except:
        print("Yeah, we good.")

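However, I still get this:

article_text
CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees.

What am I doing wrong?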

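I'd like to expand on the other answers. Any problem that requires transforming the values of a column is a good fit for .map(). I'd argue it makes the code more readable.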

def remove_sentences_with_copyright(paragraph):
    return '.'.join(sentence for sentence in paragraph.split(".") if "©" not in sentence)

df['summaries'] = df['article_texts'].map(remove_sentences_with_copyright)
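
To see why building a new sequence behaves differently from calling tokens.remove() inside the loop, here is a minimal sketch in plain Python. The sample string and the helper name remove_in_place are made up for illustration; only remove_sentences_with_copyright comes from this answer.

```python
def remove_in_place(paragraph):
    """Hypothetical helper mirroring the question's inner loop."""
    tokens = paragraph.split(".")
    for sentence in tokens:
        if "©" in sentence:
            # Removing an item shifts the later ones left,
            # so the loop skips the element that follows it.
            tokens.remove(sentence)
    return ".".join(tokens)


def remove_sentences_with_copyright(paragraph):
    """Build a new sequence instead of mutating the one being iterated."""
    return ".".join(s for s in paragraph.split(".") if "©" not in s)


sample = "© Photo A. © Photo B. Real sentence."  # made-up sample text
in_place_result = remove_in_place(sample)
filtered_result = remove_sentences_with_copyright(sample)
print(repr(in_place_result))   # still contains one "©" piece
print(repr(filtered_result))   # no "©" pieces left
```

Mutating a list while iterating over it is one likely reason some "©" sentences survive. Another probable factor, assuming df['summaries'] starts out as a copy of df['article_texts'], is that the question's loop writes back through a mask that compares df['summaries'] to the original text, so only the first assignment would find a matching row.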

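Here is a simple process.

Replace © with a mask.

Split the string on ".".

Use a list comprehension to remove the elements that contain the mask.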

text ="""© Aaron Davidson/Getty Images Aaron Davidson/Getty Images Beyond Meat cuts 19% of workforce including disgraced COO, according to a release from the company. CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on
its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how
many employees were let go, the company ended 2021 with about 1,100 employees."""
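# Replace the copyright symbol with a placeholder token.
my_list = text.replace("©", 'mask')
# Split the text into sentence-like pieces on ".".
my_list = my_list.split(".")

# Tokens that mark a piece for removal.
mask = ['mask']

# Keep only the pieces that contain none of the mask tokens.
filtered = [el for el in my_list if not any(ignore in el for ignore in mask)]
print(filtered)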

Output list:

[" CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth", " It was one of the best fast food meals I've ever had", '/Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on\nits own vegetarian options instead of creating new plant-based meat substitutes', ' Although it remains unclear exactly how\nmany employees were let go, the company ended 2021 with about 1,100 employees', '']

Join the list:

filtered = '. '.join(filtered)

Output:

CEO Ethan Brown says the plant-based company is 'significantly reducing 
expenses' in an effort to focus on growth.  It was one of the best fast 
food meals I've ever had. /Yelp In 2019, Taco Bell North America 
president Julie Felss Masino publicly said that the chain was relying on
its own vegetarian options instead of creating new plant-based meat 
substitutes.  Although it remains unclear exactly how
many employees were let go, the company ended 2021 with about 1,100 
employees. 
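
The joined output above contains doubled spaces, because each piece keeps its leading space, and the last piece from the split is an empty string. A small variation, offered as a sketch rather than as part of the original answer, strips each piece and drops empty ones before joining. The function name drop_copyright_sentences and the sample text are assumptions made for illustration.

```python
def drop_copyright_sentences(text, marker="©"):
    """Split on '.', drop pieces containing the marker, and rejoin cleanly."""
    pieces = (piece.strip() for piece in text.split("."))
    # Skip empty pieces (e.g. after the final '.') and pieces with the marker.
    kept = [piece for piece in pieces if piece and marker not in piece]
    # Restore the final full stop that split() consumed.
    return ". ".join(kept) + "." if kept else ""


# Made-up sample text in the same shape as the article rows.
sample = (
    "© Jane Doe/Agency Caption text. First real sentence. "
    "2/5 SLIDES © Someone Second caption. Last real sentence."
)
cleaned = drop_copyright_sentences(sample)
print(cleaned)
```

Splitting on "." still treats abbreviations as sentence ends, which is why a fragment such as "/Yelp" survives in the output above ("Diana G./Yelp" is cut after "G."). Handling that would need a real sentence tokenizer, which is outside what this sketch attempts.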
