Pandas: split or group a DataFrame at every occurrence of a value (True) in a column

Question

I have a df like this:

import pandas as pd

df = pd.DataFrame({'words':['hi', 'this', 'is', 'a', 'sentence', 'this', 'is', 'another', 'sentence'], 'indicator':[1,0,0,0,0,1,0,0,0]})

This gives me:

      words  indicator
0        hi          1
1      this          0
2        is          0
3         a          0
4  sentence          0
5      this          1
6        is          0
7   another          0
8  sentence          0

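Now I want to merge all values of the "words" column that follow a "1" in the indicator column, up to the point where the next "1" appears. Something like this would be the ideal result: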

                      words  indicator  counter
0     hi this is a sentence          1        5
1  this is another sentence          1        4

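It is not easy to explain, which is why I am relying on this example. I tried groupby and split but could not find a solution. My last attempt was to set up some kind of df.iterrows() loop, but I would like to avoid that for now because the actual df is very large.

Thanks in advance for your help!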

Answer 1

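You can take the cumulative sum of the indicator column, then group by it to join all the words with a space and count the number of words in each sentence.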

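# Each 1 starts a new sentence, so the running total labels the groups 1, 2, ...
# Note: this overwrites the original "indicator" column with those group labels.
df["indicator"] = df["indicator"].cumsum()
df = df.groupby(
    "indicator", as_index=False
).agg(
    words=("words", " ".join),
    counter=("indicator", "size")
)
#    indicator                     words  counter
# 0          1     hi this is a sentence        5
# 1          2  this is another sentence        4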
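If you would rather keep the original df untouched and have "indicator" stay 1 in the result, as in the desired output from the question, one possible variation is sketched below. The names group_id and result are only illustrative and do not come from the answer above. It also assumes that the first row carries a 1, otherwise any leading rows end up in a separate group labelled 0.

```python
import pandas as pd

df = pd.DataFrame({
    "words": ["hi", "this", "is", "a", "sentence",
              "this", "is", "another", "sentence"],
    "indicator": [1, 0, 0, 0, 0, 1, 0, 0, 0],
})

# Running total of the indicator: it increases by one at every 1,
# so all rows of the same sentence share one label.
# (group_id is a hypothetical helper name; df itself is not modified.)
group_id = df["indicator"].cumsum().rename("group_id")

result = (
    df.groupby(group_id)
      .agg(
          words=("words", " ".join),         # join the words of one sentence
          indicator=("indicator", "first"),  # keep the 1 that starts the group
          counter=("words", "size"),         # number of words in the sentence
      )
      .reset_index(drop=True)
)

print(result)
```

With the sample data this should give two rows, with counter values 5 and 4. Because the grouping key is a separate Series, no Python-level loop such as df.iterrows() is needed.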

2 (accepted) 2021-07-08 12:37:21