pandas: create a new unique identifier column based on the values of two other columns

Question

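I have a DataFrame similar to the one below. I want to create a new unique identifier column based on the other columns. The new column should be the concatenation of the district number, the store number, the string zero ("0"), and an incrementing count.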

   employee       date  district  store
0      1234  2021-12-1       336    450
1      1234  2021-12-1       336    450
2      1234  2021-12-2       336    450
3      5678  2021-12-1       336    650
4      5678  2021-12-2       336    650
5      5678  2021-12-3       336    650

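PS: If an employee has multiple rows on the same day, they should all get the same shiftID; otherwise the shiftID should increase by 1. If the store changes, the count should start again from zero.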
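The rule in the PS amounts to a dense rank of the dates within each (employee, store) pair: same day gives the same rank, each new day adds 1, and grouping by store as well makes the count restart when the store changes. A minimal sketch, assuming dates are strings in `%Y-%m-%d` form (parsed so they sort chronologically) and that `district`/`store` may be integers (hence `astype(str)`); `build_shift_id` is a hypothetical helper name:

```python
import pandas as pd

def build_shift_id(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: district + store + '0' + per-(employee, store) day counter."""
    # Parse the dates so that e.g. '2021-12-10' sorts after '2021-12-2'
    dates = pd.to_datetime(df['date'], format='%Y-%m-%d')
    # Dense rank: rows on the same day share a rank, each new day adds 1.
    # Grouping by store too restarts the counter when the store changes.
    day_idx = (dates.groupby([df['employee'], df['store']])
                    .rank(method='dense')
                    .sub(1)
                    .astype(int))
    return (df['district'].astype(str) + df['store'].astype(str)
            + '0' + day_idx.astype(str))

df = pd.DataFrame({
    'employee': [1234, 1234, 1234, 5678, 5678, 5678],
    'date': ['2021-12-1', '2021-12-1', '2021-12-2',
             '2021-12-1', '2021-12-2', '2021-12-3'],
    'district': [336] * 6,
    'store': [450, 450, 450, 650, 650, 650],
})
df['shiftID'] = build_shift_id(df)
print(df)
```

Ranking parsed timestamps rather than raw strings is the main design choice here; with string dates the order is lexicographic.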

I want the output to look like the DataFrame below:

   employee       date  district  store   shiftID
0      1234  2021-12-1       336    450  33645000
1      1234  2021-12-1       336    450  33645000
2      1234  2021-12-2       336    450  33645001
3      5678  2021-12-1       336    650  33665000
4      5678  2021-12-2       336    650  33665001
5      5678  2021-12-3       336    650  33665002

I tried the following code:

df['shiftid'] = df['district']+df['store']+'0'+ df.groupby(['employee','date']).cumcount().astype(str)

but it does not give the output I want:

   employee       date  district  store   shiftid
0      1234  2021-12-1       336    450  33645000
1      1234  2021-12-1       336    450  33645001
2      1234  2021-12-2       336    450  33645000
3      5678  2021-12-1       336    650  33665000
4      5678  2021-12-2       336    650  33665000
5      5678  2021-12-3       336    650  33665000
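Why the attempt goes wrong: `cumcount()` numbers the rows inside each `(employee, date)` group, so two rows on the same day get 0 and 1, and every new day starts again at 0. The question needs the index of each row's date among the employee's distinct dates instead. A small comparison on the sample data, using `pd.factorize(..., sort=True)` as one possible way to number distinct dates (the string dates sort correctly here only because all days are single-digit):

```python
import pandas as pd

df = pd.DataFrame({
    'employee': [1234, 1234, 1234, 5678, 5678, 5678],
    'date': ['2021-12-1', '2021-12-1', '2021-12-2',
             '2021-12-1', '2021-12-2', '2021-12-3'],
})

# What the attempt computes: position of each row within its (employee, date) group
row_in_day = df.groupby(['employee', 'date']).cumcount()

# What the question asks for: index of the row's date among the employee's distinct dates
day_in_employee = df.groupby('employee')['date'].transform(
    lambda s: pd.factorize(s, sort=True)[0])

print(pd.DataFrame({'cumcount': row_in_day, 'day_index': day_in_employee}))
```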

Any help would be greatly appreciated. Thanks in advance!

Answer 1

Here is one way to do it using rank():

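# dense rank of each date within an employee, shifted to start at 0 and zero-padded to 2 digits
df['shiftID'] = (df['district'].astype(str) + df['store'].astype(str)
                 + df.groupby('employee')['date'].rank(method='dense')
                   .sub(1).astype(int).astype(str).str.zfill(2))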

Output:

>>>
   employee       date  district  store   shiftID
0      1234  2021-12-1       336    450  33645000
1      1234  2021-12-1       336    450  33645000
2      1234  2021-12-2       336    450  33645001
3      5678  2021-12-1       336    650  33665000
4      5678  2021-12-2       336    650  33665001
5      5678  2021-12-3       336    650  33665002
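Two details of this approach are worth checking against real data. If `date` holds strings, the rank is lexicographic, so `'2021-12-10'` ranks before `'2021-12-2'`; converting with `pd.to_datetime` first avoids that (the `%Y-%m-%d` format is an assumption about the data). Also, `.str.zfill(2)` pads to two characters, which matches the question's `'0' + count` only for counts 0 to 9. A short check of both points:

```python
import pandas as pd

dates = pd.Series(['2021-12-2', '2021-12-10', '2021-12-1'])

# Ranking the raw strings compares them character by character
as_text = dates.rank(method='dense').astype(int)
# Ranking parsed timestamps compares them chronologically
as_dates = pd.to_datetime(dates, format='%Y-%m-%d').rank(method='dense').astype(int)

# zfill(2) vs. the question's literal '0' prefix, for a count of 10
padded = str(10).zfill(2)
prefixed = '0' + str(10)

print(as_text.tolist(), as_dates.tolist(), padded, prefixed)
```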

Answer 2

You can do it like this:

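# number each (employee, date) group; ngroup() numbers groups in sorted key order
df['day_id'] = df.groupby(['employee', 'date']).ngroup()
# shift so that each employee's first day is 0
df['day_id'] -= df.groupby('employee')['day_id'].transform('min')

df['shiftid'] = df['district'].astype(str) + df['store'].astype(str) + '0' + df['day_id'].astype(str)

print(df.drop(columns=['day_id']))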

  employee       date district store   shiftid
0     1234  2021-12-1      336   450  33645000
1     1234  2021-12-1      336   450  33645000
2     1234  2021-12-2      336   450  33645001
3     5678  2021-12-1      336   650  33665000
4     5678  2021-12-2      336   650  33665001
5     5678  2021-12-2      336   650  33665001

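Note: the last date in the "expected output" differs from the one in the input, which is why the last shiftid is different. If the input contains 2021-12-3, the result is: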

  employee       date district store   shiftid
0     1234  2021-12-1      336   450  33645000
1     1234  2021-12-1      336   450  33645000
2     1234  2021-12-2      336   450  33645001
3     5678  2021-12-1      336   650  33665000
4     5678  2021-12-2      336   650  33665001
5     5678  2021-12-3      336   650  33665002

This matches your desired output.
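Why subtracting the per-employee minimum works: with the default `sort=True`, `ngroup()` numbers the groups in sorted key order, so each employee's days receive consecutive numbers and subtracting the smallest one rebases them to 0, 1, 2 and so on. Adding `store` to both groupings would make the count restart when the store changes, as the question's PS asks; that extension is an assumption, not part of the answer above. A sketch with a hypothetical store switch on the last day:

```python
import pandas as pd

df = pd.DataFrame({
    'employee': [1234, 1234, 1234, 1234],
    'date': ['2021-12-1', '2021-12-1', '2021-12-2', '2021-12-3'],
    'district': ['336'] * 4,
    'store': ['450', '450', '450', '650'],  # hypothetical: moves to store 650 on 2021-12-3
})

# Consecutive ids for each (employee, store, date) group, in sorted key order
day_id = df.groupby(['employee', 'store', 'date']).ngroup()
# Rebase so the first day at each (employee, store) pair is 0
day_id -= day_id.groupby([df['employee'], df['store']]).transform('min')

df['shiftid'] = df['district'] + df['store'] + '0' + day_id.astype(str)
print(df)
```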


2 solutions

Solution 1
1 2022-05-12 13:52:03

Solution 2
1 2022-05-12 13:53:37
