How to loop over a pandas DataFrame and modify values under a condition?

Question

I have this pandas DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
    "col1": [1,1,2,3,3,3,4,5,5,5,5]
    }
)
df

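If the value in col1 is not equal to the value of col1 in the next row, I want to add "last" in another column. It should look like this:

So far, I can create a column that contains True if the value in col1 is not equal to the value of col1 in the next row, and False otherwise: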

df["last_row"] = df["col1"].shift(-1)
df['last'] = df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

Now something like

df["last_row"] = df["col1"].shift(-1)
df['last'] = "last" if df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

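would be nice, but this is obviously invalid syntax. How can I manage to do this?

Finally, I would also like to add numbers that indicate how many times the value has already appeared before this row, while the last occurrence is always marked "last". It should look like this:

I'm not sure whether this is just another step in what I have so far or whether it needs a new approach. I read that if I want to loop over an array while modifying values, I should use apply(). However, I don't know how to include a condition in it. Can you help me?

Thanks a lot!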

Answer 1

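Here's one way. You can get the cumulative count by defining a custom grouper, based on whether each value in col1 differs from the previous row's, and taking DataFrameGroupBy.cumcount. Then add last using a similar criterion with df.shift: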

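# Give each run of consecutive equal values its own group id
g = df.col1.ne(df.col1.shift(1)).cumsum()
# Position within each run; cast to object so the column can also hold the string 'last'
df['update'] = df.groupby(g).cumcount().astype(object)
# Rows whose next value differs, i.e. the last row of each run
ix = df[df.col1.ne(df.col1.shift(-1))].index
# Int64Index([1, 2, 5, 6, 10], dtype='int64')
df.loc[ix,'update'] = 'last'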

    col1 update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last
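
To see why the grouper works, here is a minimal sketch that prints the intermediate steps on the question's data. The names `starts`, `run_id`, `position`, and `is_last` are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True wherever a row differs from the previous one, i.e. a new run starts
starts = df["col1"].ne(df["col1"].shift(1))
# The cumulative sum of those flags gives every run its own id
run_id = starts.cumsum()
# Position of each row inside its run
position = df.groupby(run_id).cumcount()
# True wherever a row differs from the next one, i.e. a run ends
is_last = df["col1"].ne(df["col1"].shift(-1))

print(pd.DataFrame({"col1": df["col1"], "run_id": run_id,
                    "position": position, "is_last": is_last}))
```

The first and last rows compare against NaN after the shift, so they always count as a run start and a run end respectively.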

Answer 2

Considering that the index is incremental, (1) take the cumcount of each group, then (2) take the max index within each group and set the string:

group = df.groupby('col1')

df['last'] = group.cumcount()
df.loc[group['last'].idxmax(), 'last'] = 'last'
# or: df.loc[group.apply(lambda x: x.index.max()), 'last'] = 'last'


    col1  last
0      1     0
1      1  last
2      2  last
3      3     0
4      3     1
5      3  last
6      4  last
7      5     0
8      5     1
9      5     2
10     5  last
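
Note that `groupby('col1')` groups by value, not by consecutive run, so this approach relies on equal values being contiguous, as they are in the question's data. A small sketch of the difference, using a made-up column `[1, 1, 2, 1]` in which the value 1 comes back later (the data and the names `by_value`, `run_id`, and `by_run` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data, not from the question: the value 1 reappears after a 2
df = pd.DataFrame({"col1": [1, 1, 2, 1]})

# Grouping by value treats all three 1s as one group
by_value = df.groupby("col1").cumcount()

# Grouping by run id restarts the count whenever the value changes
run_id = df["col1"].ne(df["col1"].shift()).cumsum()
by_run = df.groupby(run_id).cumcount()

print(by_value.tolist())  # [0, 1, 0, 2]
print(by_run.tolist())    # [0, 1, 0, 0]
```

If the values can repeat non-contiguously, grouping by a run id is probably the safer choice.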

Answer 3

Use .shift to find where the value changes. Then you can use .where to mask appropriately, followed by .fillna

s = df.col1 != df.col1.shift(-1)
df['Update'] = df.groupby(s.cumsum().where(~s)).cumcount().where(~s).fillna('last')

Output:

    col1 Update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last

Also, update is a method of DataFrames, so you should avoid naming a column 'update'
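
The clash shows up with attribute access. A minimal sketch, assuming a column called 'update' was created anyway on a small made-up frame: bracket indexing still reaches the column, while `df.update` resolves to the DataFrame method.

```python
import pandas as pd

# Small illustrative frame, not the question's full data
df = pd.DataFrame({"col1": [1, 1, 2]})
df["update"] = [0, "last", "last"]

# Bracket access returns the column
print(type(df["update"]).__name__)  # Series
# Attribute access returns the built-in DataFrame.update method instead
print(callable(df.update))          # True
```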

Answer 4

Another possible solution

import numpy as np

df['update'] = np.where(df['col1'].ne(df['col1'].shift(-1)), 'last', 0)
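
As written, this marks the non-final rows with 0 rather than a running count, and since NumPy coerces both branches to a common dtype, the 0 may come back as the string '0'. One possible way to get both the counts and 'last' is sketched below. It borrows the run-based count from Answer 1 and uses `Series.where`; the names `is_last`, `run_id`, and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True on the last row of each run of equal values
is_last = df["col1"].ne(df["col1"].shift(-1))
# Running count within each run, stored as objects so strings can be mixed in
run_id = df["col1"].ne(df["col1"].shift()).cumsum()
counts = df.groupby(run_id).cumcount().astype(object)

# Keep the count where the row is not last, otherwise write 'last'
df["last"] = counts.where(~is_last, "last")
print(df)
```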

Answer 1
3 2019-04-26 15:49:27

Answer 2
2 (accepted) 2019-04-26 15:57:23

Answer 3
2 2019-04-26 16:06:28

Answer 4
1 2019-04-26 15:57:57