Splitting sentences (strings) in pandas into one row per word, with sentence numbering

Question

I have a pandas dataframe like this:

sn  sentence                    entity
1.  an apple is an example of?  an apple is example of fruit
2.  a potato is an example of?  a potato is example of vegetable

I want to create another pandas dataframe like the one below, where the sentence and the entity have the same number of words:

Sentence#   Word    Entity
  1         An      an 
  1         apple   apple
  1         is      is
  1         an      example
  1         example of 
  1         of?     fruit
  2         A       a 
  2         potato  potato
  2         is      is
  2         an      example
  2         example of
  2         of?     vegetable

What I have tried so far:

df = data.sentence.str.split(expand=True).stack()

pd.DataFrame({
    'Sentence': df.index.get_level_values(0) + 1, 
    'Word': df.values, 
    'Entity': 
})

The last part, 'Entity', is the bit I cannot seem to get right.

I also tried splitting and stacking the entity column, like this:

df2 = data.entity.str.split(expand=True).stack()

and then tried to put everything back together:

pd.DataFrame({
    'Sentence': df.index.get_level_values(0) + 1, 
    'Word': df.values, 
    'Entity': df2.values
})

But then I get ValueError: arrays must all be same length

len(df) = 536810, len(df2) = 536802

I am new to Python. Any help or pointers would be appreciated.
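The two lengths quoted above differ by 8, which suggests that in some rows the sentence and the entity do not split into the same number of words. A minimal sketch for locating those rows, assuming the frame is called `data` as in the question and that splitting on whitespace is what is wanted. The sample rows are made up for illustration, with the second entity deliberately one word short:

```python
import pandas as pd

# Hypothetical sample; the second row has one word fewer in "entity".
data = pd.DataFrame({
    "sn": [1, 2],
    "sentence": ["an apple is an example of?", "a potato is an example of?"],
    "entity": ["an apple is example of fruit", "a potato is example vegetable"],
})

# Count whitespace-separated tokens per row in each column.
sentence_len = data["sentence"].str.split().str.len()
entity_len = data["entity"].str.split().str.len()

# Rows where the counts disagree cannot be paired word by word.
mismatched = data[sentence_len != entity_len]
print(mismatched)
```

Rows reported here need to be fixed or dropped before either column is stacked or exploded, because the approaches in the answers pair words purely by position.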

Answer 1

Let's try str.split, then explode, and concat the results back together:

s = df.set_index('sn')
s = pd.concat([s[x].str.split(' ').explode() for x in s.columns], axis=1).reset_index()
s
Out[79]: 
    sn sentence     entity
0    1       an         an
1    1    apple      apple
2    1       is         is
3    1       an    example
4    1  example         of
5    1      of?      fruit
6    2        a          a
7    2   potato     potato
8    2       is         is
9    2       an    example
10   2  example         of
11   2      of?  vegetable
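
For reference, a self-contained sketch of the same idea on the two sample rows from the question. The rename to `Sentence#`, `Word` and `Entity` is an addition here to match the layout the question asks for. The sketch assumes every row has the same number of words in both columns, since the exploded columns must share an identical index for `pd.concat` to line them up.

```python
import pandas as pd

# Sample rows copied from the question, with sn stored as integers.
df = pd.DataFrame({
    "sn": [1, 2],
    "sentence": ["an apple is an example of?", "a potato is an example of?"],
    "entity": ["an apple is example of fruit", "a potato is example of vegetable"],
})

s = df.set_index("sn")
# Split each column on whitespace and explode it to one word per row;
# the sentence number travels along in the index.
parts = [s[col].str.split().explode() for col in s.columns]
out = pd.concat(parts, axis=1).reset_index()
# Rename to the column layout requested in the question.
out.columns = ["Sentence#", "Word", "Entity"]
print(out)
```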

Answer 2

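Here is a simple approach that needs no explicit iteration:

Set sn as the index
Use applymap to apply a string split to every cell of the dataframe
Explode the lists along axis 0
Reset the index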

df.set_index('sn').\
applymap(str.split).\
apply(pd.Series.explode, axis=0).\
reset_index()

    sn sentence     entity
0    1       an         an
1    1    apple      apple
2    1       is         is
3    1       an    example
4    1  example         of
5    1      of?      fruit
6    2        a          a
7    2   potato     potato
8    2       is         is
9    2       an    example
10   2  example         of
11   2      of?  vegetable
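
On pandas 1.3 or later, `DataFrame.explode` accepts a list of columns, which gives a shorter variant of the steps above. This is a different call from the one used in the answer, sketched here on the sample rows from the question, and it likewise requires matching word counts per row.

```python
import pandas as pd

# Sample rows copied from the question, with sn stored as integers.
df = pd.DataFrame({
    "sn": [1, 2],
    "sentence": ["an apple is an example of?", "a potato is an example of?"],
    "entity": ["an apple is example of fruit", "a potato is example of vegetable"],
})

# Split both text columns into lists of words, column by column.
words = df.assign(
    sentence=df["sentence"].str.split(),
    entity=df["entity"].str.split(),
)
# Explode both list columns together so words stay paired by position.
out = words.explode(["sentence", "entity"], ignore_index=True)
print(out)
```

Splitting with the `.str` accessor also avoids `applymap`, which was deprecated in pandas 2.1 in favour of `DataFrame.map`.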

Answer 3

An approach without loops:

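new_df = (df.set_index('sn')
            .stack()                           # long format: one row per (sn, column)
            .str.split(expand=True)            # one column per word position
            .stack()                           # one row per (sn, column, word position)
            .unstack(level=1)                  # 'sentence' and 'entity' back to columns
            .reset_index(level=0, drop=False)  # move sn out of the index
         )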
print(new_df)

Output

    sn sentence     entity
0  1.0       an         an
1  1.0    apple      apple
2  1.0       is         is
3  1.0       an    example
4  1.0  example         of
5  1.0      of?      fruit
0  2.0        a          a
1  2.0   potato     potato
2  2.0       is         is
3  2.0       an    example
4  2.0  example         of
5  2.0      of?  vegetable

3 answers

Answer 1
2 2020-08-07 23:56:04

Answer 2
0 accepted 2020-08-08 00:19:22

Answer 3
0 2020-08-08 01:21:38
