Split data frame of comments into multiple rows
I have a data frame with long comments, and I want to split them into individual sentences using the spaCy sentencizer.
import pandas as pd
Comments = pd.read_excel('Comments.xlsx', sheet_name='Sheet1')
Comments
>>>
   reviews
0  One of the rare films where every discussion leaving the theater is about how much you just had, instead of an analysis of its quotients.
1  Gorgeous cinematography, insane flying action sequences, thrilling, emotionally moving, and a sequel that absolutely surpasses its predecessor. Well-paced, executed & has that re-watchability factor.
I loaded the model like this:
import spacy
nlp = spacy.load("en_core_web_sm")  # spaCy's small English pipeline is named en_core_web_sm
and used the sentencizer:
from spacy.lang.en import English
nlp = English()
nlp.add_pipe('sentencizer')
Data = Comments.reviews.apply(lambda x: list(nlp(x).sents))
But when I check the result, all the sentences of a review end up together in a single row, like this:
[One of the rare films where every discussion leaving the theater is about how much you just had.,
Instead of an analysis of its quotients.]
Thank you very much for your help. I'm new to using NLP tools on data frames.
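Currently, your Data variable is a pandas Series whose rows are lists of sentences, or more precisely, lists of spaCy Span objects. You probably want to take the text of each sentence and put every sentence on its own row.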
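Starting from the Data Series built in the question, one minimal way to do that is to explode the lists of Span objects and then read each span's .text. The two-review Comments DataFrame below is made-up sample data standing in for the Excel file, and the pipeline is the blank English() plus sentencizer from the question.

```python
import pandas as pd
from spacy.lang.en import English

# Same pipeline as in the question: blank English + rule-based sentencizer
nlp = English()
nlp.add_pipe("sentencizer")

# Hypothetical sample data standing in for Comments.xlsx
Comments = pd.DataFrame({"reviews": [
    "This is the first sentence. And this is the second.",
    "Only one sentence here.",
]})

# As in the question: each row holds a list of spaCy Span objects
Data = Comments.reviews.apply(lambda x: list(nlp(x).sents))

# explode() puts each Span on its own row; .text turns it into a plain string
sentences = Data.explode().map(lambda span: span.text).reset_index(drop=True)
print(sentences.tolist())
```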
import pandas as pd

# building your input DataFrame
comments = [{'reviews': 'This is the first sentence of the first review. And this is the second.'},
            {'reviews': 'This is the first sentence of the second review. And this is the second.'}]
comments = pd.DataFrame(comments)
+----+--------------------------------------------------------------------------+
| | reviews |
|----+--------------------------------------------------------------------------|
| 0 | This is the first sentence of the first review. And this is the second. |
| 1 | This is the first sentence of the second review. And this is the second. |
+----+--------------------------------------------------------------------------+
Now let's define a function that, given a string, returns its sentences as a list of text strings.
def obtain_sentences(s):
    doc = nlp(s)  # nlp is the English() pipeline with the sentencizer
    sents = [sent.text for sent in doc.sents]  # Span -> plain string
    return sents
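As a quick check, calling the function on a single string shows that it returns plain Python strings rather than Span objects. This sketch assumes the same blank English() pipeline with the sentencizer as in the question; the input text is made up.

```python
from spacy.lang.en import English

# Assumed setup, matching the question: blank English pipeline + sentencizer
nlp = English()
nlp.add_pipe("sentencizer")

def obtain_sentences(s):
    doc = nlp(s)
    return [sent.text for sent in doc.sents]

result = obtain_sentences("Great film. Would watch again!")
print(result)                                   # ['Great film.', 'Would watch again!']
print(all(isinstance(x, str) for x in result))  # True
```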
The function can be applied to the comments DataFrame to produce a new DataFrame containing the sentences.
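data = comments.copy()
# replace each review with the list of its sentence strings
data['reviews'] = comments.apply(lambda x: obtain_sentences(x['reviews']), axis=1)
# one row per sentence, then renumber the rows 0..n-1
data = data.explode('reviews').reset_index(drop=True)
data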
I used explode to turn the elements of each list of sentences into separate rows.
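To see what explode and reset_index(drop=True) each contribute, here is a small pandas-only sketch on toy lists (no spaCy needed). explode repeats the original row label for every sentence, which is why the rows are renumbered afterwards; calling reset_index() without drop=True instead keeps that label as a column, which can help trace each sentence back to its review. The review_id column name is just an illustrative choice.

```python
import pandas as pd

# Toy data: each row already holds a list of sentence strings
df = pd.DataFrame({"reviews": [["A.", "B."], ["C."]]})

exploded = df.explode("reviews")
print(exploded.index.tolist())       # [0, 0, 1]: row labels repeat per sentence

# Option 1 (as above): discard the old labels and renumber 0..n-1
flat = exploded.reset_index(drop=True)
print(flat.index.tolist())           # [0, 1, 2]

# Option 2: keep the old label as a column linking each sentence to its review
traced = exploded.reset_index().rename(columns={"index": "review_id"})
print(traced["review_id"].tolist())  # [0, 0, 1]
print(traced["reviews"].tolist())    # ['A.', 'B.', 'C.']
```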
This is the resulting output:
+----+--------------------------------------------------+
| | reviews |
|----+--------------------------------------------------|
| 0 | This is the first sentence of the first review. |
| 1 | And this is the second. |
| 2 | This is the first sentence of the second review. |
| 3 | And this is the second. |
+----+--------------------------------------------------+
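For a large spreadsheet, one option worth trying (an alternative to the row-by-row apply above, not part of the original answer) is spaCy's nlp.pipe, which processes the texts as a stream and yields one Doc per input, in order. This sketch assumes the same blank English() pipeline with the sentencizer; batch_size=50 is an arbitrary illustrative value, and the reviews are the sample data used above.

```python
import pandas as pd
from spacy.lang.en import English

# Assumed setup: blank English pipeline + sentencizer, as in the question
nlp = English()
nlp.add_pipe("sentencizer")

comments = pd.DataFrame({"reviews": [
    "This is the first sentence of the first review. And this is the second.",
    "This is the first sentence of the second review. And this is the second.",
]})

# nlp.pipe yields one Doc per input text, in the same order as the column
comments["reviews"] = [
    [sent.text for sent in doc.sents]
    for doc in nlp.pipe(comments["reviews"], batch_size=50)
]

# One sentence per row, renumbered from 0
data = comments.explode("reviews").reset_index(drop=True)
print(data)
```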