How to remove sentences with a specific character?
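I have a dataframe with article texts. In one of the rows, several sentences contain the copyright symbol "©".
article_texts |
---|
© Aaron Davidson/Getty Images Aaron Davidson/Getty Images Beyond Meat cuts 19% of workforce including disgraced COO, according to a release from the company. CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees. |
I want to remove only the sentences that contain the copyright symbol, and I want to do this for every row in the dataset. This is what I want the result to look like:
article_texts |
---|
CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. /Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees. |
Here is what I tried: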
for i in df['article_texts']:
    try:
        paragraph = i
        tokens = paragraph.split(".")
        for sentence in tokens:
            if "©" in sentence:
                tokens.remove(sentence)
        final = (".").join(tokens)
        df['summaries'].loc[(df['summaries'] == i)] = final
    except:
        print("Yeah, we good.")
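However, I still get this:
article_texts |
---|
CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees. |
What am I doing wrong?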
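I'd like to expand on the other answers. Any problem that requires transforming the values of a column is a good fit for .map(). I would argue that it also makes the code more readable.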
def remove_sentences_with_copyright(paragraph):
    # Keep only the '.'-delimited pieces that do not contain the symbol
    return '.'.join(sentence for sentence in paragraph.split(".") if "©" not in sentence)

df['summaries'] = df['article_texts'].map(remove_sentences_with_copyright)
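One likely contributor to the behaviour in the question is that the loop calls tokens.remove() on the same list it is iterating over. The sketch below uses a made-up `tokens` list (not the asker's data) to show how that pattern can skip the element right after a removed one, while building a new list does not. It is meant as an illustration of the pitfall, not a full diagnosis of the original code.

```python
# Hypothetical sample: two consecutive "sentences" contain the symbol.
tokens = ["keep one", " © first credit", " © second credit", " keep two"]

# Pattern from the question: removing from the list being iterated.
buggy = list(tokens)
for sentence in buggy:
    if "©" in sentence:
        # Later items shift left, so the loop skips the next element.
        buggy.remove(sentence)

# Filtering into a new list leaves the iteration undisturbed.
fixed = [sentence for sentence in tokens if "©" not in sentence]

print(buggy)  # the second credit line survives
print(fixed)  # both credit lines are gone
```

This is also why the generator expression inside remove_sentences_with_copyright is the safer shape: it never mutates the sequence it reads from.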
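I'll share a simple process:
1. Replace © with a mask string.
2. Split the string on ".".
3. Use a list comprehension to remove the elements that contain the mask.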
text ="""© Aaron Davidson/Getty Images Aaron Davidson/Getty Images Beyond Meat cuts 19% of workforce including disgraced COO, according to a release from the company. CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on
its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how
many employees were let go, the company ended 2021 with about 1,100 employees."""
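# Step 1: replace the copyright symbol with a plain-text mask
masked = text.replace("©", 'mask')
# Step 2: split the text into sentence-like chunks on "."
sentences = masked.split(".")
# Step 3: drop every chunk that contains any of the mask strings
mask = ['mask']
filtered = [el for el in sentences if not any(ignore in el for ignore in mask)]
print(filtered)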
Output list:
[" CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth", " It was one of the best fast food meals I've ever had", '/Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on\nits own vegetarian options instead of creating new plant-based meat substitutes', ' Although it remains unclear exactly how\nmany employees were let go, the company ended 2021 with about 1,100 employees', '']
Join the list:
filtered = '. '.join(filtered)
Output:
CEO Ethan Brown says the plant-based company is 'significantly reducing
expenses' in an effort to focus on growth. It was one of the best fast
food meals I've ever had. /Yelp In 2019, Taco Bell North America
president Julie Felss Masino publicly said that the chain was relying on
its own vegetarian options instead of creating new plant-based meat
substitutes. Although it remains unclear exactly how
many employees were let go, the company ended 2021 with about 1,100
employees.
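The split, filter, and join steps above can be folded into one helper. The following is a minimal sketch, assuming that splitting on a bare "." is an acceptable definition of a sentence; the function name `remove_copyright_sentences`, its `symbol` parameter, and the sample string are all made up for illustration. It strips each chunk before rejoining, which avoids the doubled spaces that '. '.join produces when the chunks still carry their leading blanks.

```python
def remove_copyright_sentences(paragraph, symbol="©"):
    """Drop every '.'-delimited chunk of `paragraph` that contains `symbol`."""
    kept = []
    for chunk in paragraph.split("."):
        chunk = chunk.strip()  # trim the blank left over from ". "
        if chunk and symbol not in chunk:
            kept.append(chunk)
    if not kept:
        return ""
    # Rejoin with a single space and restore the final period.
    return ". ".join(kept) + "."


# Hypothetical sample text, not taken from the dataset in the question.
sample = (
    "© Photo Agency Company cuts staff. The CEO spoke today. "
    "6/25 SLIDES © Someone/Insider It was tasty. Done."
)
print(remove_copyright_sentences(sample))
```

Because the helper takes a single string and returns a single string, it should also work as the function passed to df['article_texts'].map(...), in the same way as the .map() answer above.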