Pandas: shift by an amount taken from a column value

Question

I have a pandas DataFrame with 5 columns:

Product_ID, Start_Date, End_Date, Turnover, cumcount

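Product_ID is not unique and can appear several times, so cumcount tracks which occurrence of the event each row is; it ranges from 0 to 5. The table is sorted by Product_ID and Start_Date.

Because the Start_Date of one event can overlap with another event for the same Product_ID, I only want to include events that fall outside the first one.

The code snippet is as follows: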

df= df.sort_values(by=[ "Product_ID", "Start_Date"])


check1 = df["Product_ID"] == df["Product_ID"].shift(1)

conditions = [check1 & ( df["End_Date"].shift(df["cumcount"]) > df["Start_Date"]),
check1 & ( df["End_Date"].shift(df["cumcount"]) < df["Start_Date"]),
~check1 ]

choices = [0, 1, 1]

df["result"] = np.select(conditions, choices)

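The idea is to shift back by as many rows as cumcount says, so that each row is compared against the first row of its Product_ID and checked for overlap with it.

When I run this, I get a ValueError:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Does anyone know how I can make this work (without hard-coding a shift of 1 or 2)?

Edit: data sample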

{'Product_ID': {0: 'CJ48HL',
  1: 'CL23P3',
  2: 'CL5WKS',
  3: 'DA0AAM',
  4: 'DA0AAM'},
 'Start_Date': {0: Timestamp('2022-02-11 00:00:00'),
  1: Timestamp('2022-11-11 00:00:00'),
  2: Timestamp('2022-10-24 00:00:00'),
  3: Timestamp('2022-04-01 00:00:00'),
  4: Timestamp('2022-04-06 00:00:00')},
 'Turnover': {0: 1143845.0,
  1: 512476.0,
  2: 178382.0,
  3: 2104083.0,
  4: 1300434.0},
 'count': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1},
 'End_Date': {0: Timestamp('2022-02-25 00:00:00'),
  1: Timestamp('2022-11-25 00:00:00'),
  2: Timestamp('2022-11-07 00:00:00'),
  3: Timestamp('2022-04-15 00:00:00'),
  4: Timestamp('2022-04-20 00:00:00')}}
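
For anyone who wants to reproduce the problem, here is a minimal sketch of loading the sample above into a DataFrame. It assumes that the `count` column in the sample is the cumcount column described in the question, and that it can be derived with `groupby(...).cumcount()`; the question does not say how it was actually computed.

```python
import pandas as pd
from pandas import Timestamp

# The sample dict from the question, copied as-is.
sample = {
    'Product_ID': {0: 'CJ48HL', 1: 'CL23P3', 2: 'CL5WKS', 3: 'DA0AAM', 4: 'DA0AAM'},
    'Start_Date': {0: Timestamp('2022-02-11'), 1: Timestamp('2022-11-11'),
                   2: Timestamp('2022-10-24'), 3: Timestamp('2022-04-01'),
                   4: Timestamp('2022-04-06')},
    'Turnover': {0: 1143845.0, 1: 512476.0, 2: 178382.0, 3: 2104083.0, 4: 1300434.0},
    'count': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1},
    'End_Date': {0: Timestamp('2022-02-25'), 1: Timestamp('2022-11-25'),
                 2: Timestamp('2022-11-07'), 3: Timestamp('2022-04-15'),
                 4: Timestamp('2022-04-20')},
}

df = pd.DataFrame(sample)
df = df.sort_values(by=["Product_ID", "Start_Date"], ignore_index=True)

# Assumption: the occurrence counter is a per-product running count.
df["cumcount"] = df.groupby("Product_ID").cumcount()
```

With this sample, the derived `cumcount` column matches the `count` column that was posted.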

Edit 2: desired output

  Product_ID  Start_Date   Turnover  count    End_Date  result
0     CJ48HL  2022-02-11  1143845.0      0  2022-02-25       1
1     CL23P3  2022-02-11   512476.0      0  2022-11-07       1
2     CL5WKS  2022-10-24   178382.0      0  2022-11-07       1
3     DA0AAM  2022-04-01  2104083.0      0  2022-04-15       1
4     DA0AAM  2022-04-06  1300434.0      1  2022-04-20       0
5     DA0AAM  2022-04-10  1451521.0      2  2022-04-24       0
6     DA0AAM  2022-04-20  2501520.0      3  2022-05-04       1

Answer 1

If I understand correctly what you want, the code below should solve your problem:

import numpy as np

# Sort by Product_ID and Start_Date
df.sort_values(by=['Product_ID', 'Start_Date'], ignore_index=True, inplace=True)

# Add a helper column holding the Product_ID of the next row (shift(-1)), then compare.
# If the next row has the same Product_ID the result is 0.0, otherwise 1.0.
df['Product_lead1'] = df['Product_ID'].shift(-1)
df['result'] = np.where(df['Product_ID'] != df['Product_lead1'], 1.0, 0.0)
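
The error in the question most likely comes from passing a whole Series as the `periods` argument of `shift`, which expects a single integer applied to every row. Since shifting back by cumcount always lands on the first row of the same Product_ID, one possible alternative is to look up that first row's End_Date per group instead. The following is only a sketch under that assumption: it rebuilds the rows from the desired-output table, recomputes cumcount with `groupby(...).cumcount()`, and keeps the strict `<` comparison from the original conditions. The name `first_end` is introduced here for illustration.

```python
import pandas as pd

# Rows taken from the desired-output table in the question.
df = pd.DataFrame({
    "Product_ID": ["CJ48HL", "CL23P3", "CL5WKS", "DA0AAM", "DA0AAM", "DA0AAM", "DA0AAM"],
    "Start_Date": pd.to_datetime(["2022-02-11", "2022-02-11", "2022-10-24", "2022-04-01",
                                  "2022-04-06", "2022-04-10", "2022-04-20"]),
    "End_Date": pd.to_datetime(["2022-02-25", "2022-11-07", "2022-11-07", "2022-04-15",
                                "2022-04-20", "2022-04-24", "2022-05-04"]),
})

df = df.sort_values(by=["Product_ID", "Start_Date"], ignore_index=True)

# Occurrence counter per product (assumed equivalent to the question's cumcount).
df["cumcount"] = df.groupby("Product_ID").cumcount()

# End_Date of the first event of each product, broadcast to every row of that product.
# This replaces the per-row shift(df["cumcount"]) that raised the ValueError.
first_end = df.groupby("Product_ID")["End_Date"].transform("first")

# Keep the first event of each product, plus any event starting after the first one ended.
df["result"] = ((df["cumcount"] == 0) | (first_end < df["Start_Date"])).astype(int)
```

On these rows the `result` column comes out as 1, 1, 1, 1, 0, 0, 1, which agrees with the desired output. Note that every later event is compared only against the first event of its product, as the question describes; comparing against the most recent kept event would need a different approach.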
