How can I introduce a new column that does not count up in every row?

Question

I have the following dataframe about auctions:

id.product_permutation	id.iteration	property.product	property.price
1	1	1	200
1	2	1	300
1	3	1	400
1	4	3	100
1	5	3	200
1	6	3	300
1	7	2	500
1	8	2	600
2	1	3	300
2	2	3	400
2	3	1	200
2	4	1	300
2	5	2	700
2	6	2	800
2	7	2	900
2	8	2	700
3	1	1	200
...	...	...	...

There are 3 different products in the auction, and the column property.product tells which product is currently for sale. If the product number in property.product changes, the previous product has been sold. property.price gives the current price. If the number in id.product_permutation changes, the whole auction is over: all 3 items are sold and a new auction starts (with the same 3 items).

Now I would like to introduce a new column amount_of_sold_items which counts how many products have already been sold (as in the following table). I have tried a lot, but unfortunately I do not get the desired result. Can anyone please help me solve this issue?

id.product_permutation	id.iteration	property.product	property.price	amount_of_sold_items
1	1	1	200	0
1	2	1	300	0
1	3	1	400	0
1	4	3	100	1
1	5	3	200	1
1	6	3	300	1
1	7	2	500	2
1	8	2	600	2
1	NaN	NaN	NaN	3
2	1	3	300	0
2	2	3	400	0
2	3	1	200	1
2	4	1	300	1
2	5	2	700	2
2	6	2	800	2
2	7	2	900	2
2	8	2	700	2
2	NaN	NaN	NaN	3
3	1	1	200	0
...	...	...	...	...

Answer 1

df["n_items_sold"] = (df.groupby("id.product_permutation")["property.product"]
                        .transform(lambda x: x.diff().ne(0, fill_value=0).cumsum()))

For each id.product_permutation group, we assign a new series that finds the turning points, i.e. the rows where the difference to the previous row is not equal to 0 (fill_value=0 is there to prevent counting the very first row as a turning point). The cumulative sum of these turning points keeps track of the items sold thus far.

This gives:

    id.product_permutation  id.iteration  property.product  property.price  n_items_sold
0                        1             1                 1             200             0
1                        1             2                 1             300             0
2                        1             3                 1             400             0
3                        1             4                 3             100             1
4                        1             5                 3             200             1
5                        1             6                 3             300             1
6                        1             7                 2             500             2
7                        1             8                 2             600             2
8                        2             1                 3             300             0
9                        2             2                 3             400             0
10                       2             3                 1             200             1
11                       2             4                 1             300             1
12                       2             5                 2             700             2
13                       2             6                 2             800             2
14                       2             7                 2             900             2
15                       2             8                 2             700             2
16                       3             1                 1             200             0
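
To see why this works, the intermediate steps can be traced on a single group in isolation. The following is only a small sketch using the property.product values of the first auction from the question; the variable names (products, step, changed, alt) are illustrative and not part of the answer above. The last line shows a shift-based comparison, a common substitute that should also cope with non-numeric product labels, where diff() would not apply.

```python
import pandas as pd

# property.product values of the first auction (id.product_permutation == 1)
products = pd.Series([1, 1, 1, 3, 3, 3, 2, 2])

# difference to the previous row; the first row has no predecessor, so NaN
step = products.diff()

# True wherever the product changed; fill_value=0 replaces the leading NaN
# with 0 so the first row is not counted as a change
changed = step.ne(0, fill_value=0)

# running count of changes = number of products sold so far
n_items_sold = changed.cumsum()
print(n_items_sold.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2]

# alternative without diff(): compare each row with the previous one;
# the first row always counts as "changed", hence the -1
alt = products.ne(products.shift()).cumsum() - 1
print(alt.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2]
```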

To put [id_prod_perm, NaN, NaN, NaN, 3] rows at the end of each id.product_permutation, we can detect the changing points of id.product_permutation and insert columns into the transposed frame, which in effect inserts rows into the original frame once it is transposed back:

import numpy as np

# following is [8, 16] for the above example
changing_points = np.where(df["id.product_permutation"]
                             .diff().ne(0, fill_value=0))[0].tolist()

# insert to transpose and then come back
df = df.T
offset = 0  # helper for insertion location
for j, point in enumerate(changing_points, start=1):
    # to the given point, insert a column with the same name
    df.insert(loc=point+offset, column=point, value=[j, *[np.nan]*3, 3],
              allow_duplicates=True)

    # since an insertion enlarges the frame, old changing points
    # need to increase, this is handled by the `offset`
    offset += 1

# go back to original form, and also reset the index to 0..N-1
df = df.T.reset_index(drop=True)

to get

>>> df

    id.product_permutation  id.iteration  property.product  property.price  n_items_sold
0                      1.0           1.0               1.0           200.0           0.0
1                      1.0           2.0               1.0           300.0           0.0
2                      1.0           3.0               1.0           400.0           0.0
3                      1.0           4.0               3.0           100.0           1.0
4                      1.0           5.0               3.0           200.0           1.0
5                      1.0           6.0               3.0           300.0           1.0
6                      1.0           7.0               2.0           500.0           2.0
7                      1.0           8.0               2.0           600.0           2.0
8                      1.0           NaN               NaN             NaN           3.0
9                      2.0           1.0               3.0           300.0           0.0
10                     2.0           2.0               3.0           400.0           0.0
11                     2.0           3.0               1.0           200.0           1.0
12                     2.0           4.0               1.0           300.0           1.0
13                     2.0           5.0               2.0           700.0           2.0
14                     2.0           6.0               2.0           800.0           2.0
15                     2.0           7.0               2.0           900.0           2.0
16                     2.0           8.0               2.0           700.0           2.0
17                     2.0           NaN               NaN             NaN           3.0
18                     3.0           1.0               1.0           200.0           0.0
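
As a possible alternative to the transpose-and-insert loop, the summary rows could also be built as a small frame of their own and merged in with pd.concat and a stable sort. This is a sketch under several assumptions, not the method used above: the rows of each auction are contiguous, the id.product_permutation values appear in ascending order, every auction sells exactly 3 products, and the last auction in the frame is still running (so it gets no summary row). The names finished, summary and out are hypothetical.

```python
import pandas as pd

# sample data from the question (first 17 rows)
df = pd.DataFrame({
    "id.product_permutation": [1] * 8 + [2] * 8 + [3],
    "id.iteration": list(range(1, 9)) * 2 + [1],
    "property.product": [1, 1, 1, 3, 3, 3, 2, 2,
                         3, 3, 1, 1, 2, 2, 2, 2, 1],
    "property.price": [200, 300, 400, 100, 200, 300, 500, 600,
                       300, 400, 200, 300, 700, 800, 900, 700, 200],
})

# running count of sold products within each auction (same as above)
df["n_items_sold"] = (
    df.groupby("id.product_permutation")["property.product"]
      .transform(lambda x: x.diff().ne(0, fill_value=0).cumsum())
)

# assumption: every auction except the last one in the frame is finished
finished = df["id.product_permutation"].unique()[:-1]

# one summary row per finished auction; columns not listed here
# become NaN when the frames are concatenated
summary = pd.DataFrame({
    "id.product_permutation": finished,
    "n_items_sold": 3,  # assumption: always 3 products per auction
})

# the summary rows are appended last, so a stable sort on the auction id
# places each one directly after the rows of its auction
out = (
    pd.concat([df, summary], ignore_index=True)
      .sort_values("id.product_permutation", kind="stable")
      .reset_index(drop=True)
)
print(out)
```

Unlike the transposed version, this keeps id.product_permutation and n_items_sold as integers; only the columns that actually receive NaN are converted to float.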

Answer 1: accepted, 2021-06-20 17:10:24
