Pandas DataFrame: filtering the n rows that follow a specific column value

Question

I have a pandas DataFrame like this:

       Sentence #           Word     Tag
0     Sentence: 1           This       O
1             NaN             is       O
2             NaN              a       x
3             NaN           test       O
4     Sentence: 2           This       O
5             NaN             is       x
6             NaN        another       x
7             NaN           test       O
...

I want to group by sentence, returning for example:

[['This is a test'], ['This is another test'], ...]

and, for each sentence, get a list of all the words tagged "x", for example:

[['a'], ['is', 'another'], ...]

I have been trying to find a way to do this with groupby(), but without success. What is the best way to solve it? Thanks.

Answer 1

Solved! I used:

df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas

to propagate the non-null values forward, so every row carries its sentence label. Then I built the sentences:

df1 = df.groupby('Sentence #')['Word'].apply(lambda x: ' '.join(x))
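As a self-contained check of the two steps above, here is a minimal sketch that rebuilds the sample frame from the question, forward-fills the `Sentence #` column, and joins the words per sentence. The names `sentences` and `sentence_lists` are illustrative, and `sort=False` is an added assumption so that groups keep their order of appearance (with the default sorting, the string "Sentence: 10" would come before "Sentence: 2").

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Sentence #': ['Sentence: 1', np.nan, np.nan, np.nan,
                   'Sentence: 2', np.nan, np.nan, np.nan],
    'Word': ['This', 'is', 'a', 'test', 'This', 'is', 'another', 'test'],
    'Tag': ['O', 'O', 'x', 'O', 'O', 'x', 'x', 'O'],
})

# Forward-fill only the sentence label so every row knows its sentence
df['Sentence #'] = df['Sentence #'].ffill()

# One joined string per sentence, in order of first appearance
sentences = df.groupby('Sentence #', sort=False)['Word'].apply(' '.join)

# Shape requested in the question: a list of one-element lists
sentence_lists = [[s] for s in sentences]
print(sentence_lists)  # [['This is a test'], ['This is another test']]
```

Filling only the `Sentence #` column, rather than the whole frame, avoids accidentally filling genuine gaps in `Word` or `Tag`.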

To get the tagged words:

# Keep only the rows tagged 'x', then join the words of each sentence
df2 = df[df['Tag'] == 'x'].groupby('Sentence #')['Word'].apply(lambda x: ','.join(x))

I'm not sure whether this is the most efficient solution.
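The question asked for lists of words rather than comma-joined strings. A minimal sketch of one way to get them, under two assumptions that are not in the original post: a sentence with no "x" tag should yield an empty list, and a third sentence ("Done") is added here purely to illustrate that case. The names `tagged`, `all_ids`, and `tagged_lists` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical sentence with no 'x' tag
df = pd.DataFrame({
    'Sentence #': ['Sentence: 1', np.nan, np.nan, np.nan,
                   'Sentence: 2', np.nan, np.nan, np.nan,
                   'Sentence: 3'],
    'Word': ['This', 'is', 'a', 'test',
             'This', 'is', 'another', 'test',
             'Done'],
    'Tag': ['O', 'O', 'x', 'O', 'O', 'x', 'x', 'O', 'O'],
})

df['Sentence #'] = df['Sentence #'].ffill()

# All sentence labels, in order of first appearance
all_ids = df['Sentence #'].unique()

# Filter first, then collect the remaining words of each sentence into a list
tagged = (
    df[df['Tag'] == 'x']
    .groupby('Sentence #', sort=False)['Word']
    .agg(list)
)

# Sentences that had no tagged words are missing from `tagged`; default to []
tagged_lists = [tagged.get(sid, []) for sid in all_ids]
print(tagged_lists)  # [['a'], ['is', 'another'], []]
```

Filtering with a boolean mask before grouping does the work in a single pass, instead of grouping twice as in the answer above.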

Solution 1
0 2020-02-23 23:38:39
