Using Regex to Extract Words from Sentences in Pandas for Network Analysis

Question

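I have a Pandas DataFrame, and I want to extract every word from the sentences in one column and build a new DataFrame in which each word has its own row. In addition, the original DataFrame has a rating that should be added to each new row.

The DataFrame looks like this: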

base_network
Body    Rating
0   Very satisfied  4
1   My daughter lost 2 spoons, so I adjusted them ...   5
2   It was a fiftieth birthday present for my elde...   5
3   Love the shape, shine & elegance of the candle...   5
4   Poor description of what I was buying   3
... ... ...
476 Nice quality but it is too small, description ...   3
477 Edited 6 January 2020As you will have seen, th...   3
478 I love this piece of jewelleryIt is elegant an...   5
479 The leather cord is a little stiff…but I guess...   4
480 Unfortunately the lens is too small and not ve...   1
481 rows × 2 columns

I tried to use a regular expression to split the sentences into words and store them in a new DataFrame, and then to add the matching rating, using this code:

spaces = r"\s+"

words = pd.DataFrame()
df = pd.DataFrame()

for rows in base_network:
    words = re.split(spaces, base_network['Body'])
    words['Rating'] = base_network['Rating']
    df = df.append(words)
    
df.head()

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-19-4ff5191a493d> in <module>()
      5 
      6 for rows in base_network:
----> 7     words = re.split(spaces, base_network['Body'])
      8     words['Rating'] = base_network['Rating']
      9     df = df.append(words)

/usr/lib/python3.7/re.py in split(pattern, string, maxsplit, flags)
    213     and the remainder of the string is returned as the final element
    214     of the list."""
--> 215     return _compile(pattern, flags).split(string, maxsplit)
    216 
    217 def findall(pattern, string, flags=0):

TypeError: expected string or bytes-like object

I tried converting the Body column to string type, but that did not solve the problem.

Answer 1

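Does this meet your needs?

# df is the DataFrame from the question (called base_network there)
# split on runs of whitespace; the raw string avoids an invalid-escape warning
df["Body"] = df["Body"].str.split(pat=r"\s+")

# "explode" the list column into a long format, one word per row.
# The Rating column is repeated accordingly.
df = df.explode("Body")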
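A self-contained sketch of the same approach, for anyone who wants to run it. The two-row frame is a hypothetical stand-in for base_network, built from rows shown in the question's preview. It also illustrates why the original loop failed: re.split expects a single string, while base_network['Body'] is a whole Series, so the split has to go through the .str accessor, which applies it to each element.

```python
import pandas as pd

# Hypothetical stand-in for base_network: two short rows from the question
base_network = pd.DataFrame(
    {
        "Body": ["Very satisfied", "Poor description of what I was buying"],
        "Rating": [4, 3],
    }
)

# .str.split applies the regex to every element of the Series,
# turning each sentence into a list of words
words = base_network.assign(Body=base_network["Body"].str.split(pat=r"\s+"))

# one row per word; Rating is repeated for each word and the index is rebuilt
words = words.explode("Body").reset_index(drop=True)
print(words)
```

Using assign leaves base_network untouched, which helps if the original sentences are still needed later.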

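Some additional thoughts

It may be necessary to adjust the regular expression so that it also splits on punctuation and the like.
Check your input data. In row 477, "Edited 6 January 2020As you..." appears to be missing a space.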
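Both points could be handled roughly as follows. This is only a sketch under two assumptions that do not come from the answer itself: a word is taken to be a run of word characters or apostrophes (so punctuation is dropped rather than split on), and a lost space is guessed wherever a digit is directly followed by an uppercase letter, as in "2020As". The sample frame named reviews is hypothetical, built from fragments of the question's preview.

```python
import pandas as pd

# Hypothetical sample built from fragments of the question's preview
reviews = pd.DataFrame(
    {
        "Body": [
            "Love the shape, shine & elegance",
            "Edited 6 January 2020As you will have seen",
        ],
        "Rating": [5, 3],
    }
)

# Assumption: a digit directly followed by an uppercase letter marks a lost space
body = reviews["Body"].str.replace(r"(\d)([A-Z])", r"\1 \2", regex=True)

# Assumption: a word is a run of word characters or apostrophes;
# findall collects the words instead of splitting on the separators
tokens = reviews.assign(Body=body.str.findall(r"[\w']+"))

# one row per word, Rating repeated for each word
tokens = tokens.explode("Body").reset_index(drop=True)
print(tokens)
```

The digit-then-uppercase rule will also fire on strings such as product codes, so it is worth checking a sample of the changed rows before relying on it.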

Solution 1
1 Accepted 2021-11-20 14:15:13
