How can I introduce a new column which does not count in every line?
I have the following dataframe about auctions:
| id.product_permutation | id.iteration | property.product | property.price |
|---|---|---|---|
| 1 | 1 | 1 | 200 |
| 1 | 2 | 1 | 300 |
| 1 | 3 | 1 | 400 |
| 1 | 4 | 3 | 100 |
| 1 | 5 | 3 | 200 |
| 1 | 6 | 3 | 300 |
| 1 | 7 | 2 | 500 |
| 1 | 8 | 2 | 600 |
| 2 | 1 | 3 | 300 |
| 2 | 2 | 3 | 400 |
| 2 | 3 | 1 | 200 |
| 2 | 4 | 1 | 300 |
| 2 | 5 | 2 | 700 |
| 2 | 6 | 2 | 800 |
| 2 | 7 | 2 | 900 |
| 2 | 8 | 2 | 700 |
| 3 | 1 | 1 | 200 |
| ... | ... | ... | ... |
There are 3 different products in the auction, and the column property.product tells which product is currently for sale. If the product number in property.product changes, the previous product has been sold. property.price gives the current price. If the number in id.product_permutation changes, the whole auction is over: all 3 items are sold and a new auction starts (with the same 3 items).
Now I would like to introduce a new column amount_of_sold_items which counts how many products have already been sold (as in the following table). I have tried a lot, but unfortunately I do not get the desired result. Can anyone please help me solve this?
| id.product_permutation | id.iteration | property.product | property.price | amount_of_sold_items |
|---|---|---|---|---|
| 1 | 1 | 1 | 200 | 0 |
| 1 | 2 | 1 | 300 | 0 |
| 1 | 3 | 1 | 400 | 0 |
| 1 | 4 | 3 | 100 | 1 |
| 1 | 5 | 3 | 200 | 1 |
| 1 | 6 | 3 | 300 | 1 |
| 1 | 7 | 2 | 500 | 2 |
| 1 | 8 | 2 | 600 | 2 |
| 1 | NaN | NaN | NaN | 3 |
| 2 | 1 | 3 | 300 | 0 |
| 2 | 2 | 3 | 400 | 0 |
| 2 | 3 | 1 | 200 | 1 |
| 2 | 4 | 1 | 300 | 1 |
| 2 | 5 | 2 | 700 | 2 |
| 2 | 6 | 2 | 800 | 2 |
| 2 | 7 | 2 | 900 | 2 |
| 2 | 8 | 2 | 700 | 2 |
| 2 | NaN | NaN | NaN | 3 |
| 3 | 1 | 1 | 200 | 0 |
| ... | ... | ... | ... | ... |
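For anyone who wants to reproduce this, the input can be rebuilt roughly as follows. This is only a sketch: it contains the 17 rows visible in the first table, and the rows elided by "..." are left out rather than guessed.

```python
import pandas as pd

# Only the rows shown in the question; the "..." rows are not reproduced.
df = pd.DataFrame({
    "id.product_permutation": [1] * 8 + [2] * 8 + [3],
    "id.iteration": list(range(1, 9)) * 2 + [1],
    "property.product": [1, 1, 1, 3, 3, 3, 2, 2,
                         3, 3, 1, 1, 2, 2, 2, 2,
                         1],
    "property.price": [200, 300, 400, 100, 200, 300, 500, 600,
                       300, 400, 200, 300, 700, 800, 900, 700,
                       200],
})
print(df)
```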
df["n_items_sold"] = (df.groupby("id.product_permutation")["property.product"]
                        .transform(lambda x: x.diff().ne(0, fill_value=0).cumsum()))
For each id.product_permutation group, we assign a new series that marks the turning points, i.e. the rows where the difference from the previous row is not equal to 0 (fill_value=0 is there to prevent counting the very first row, whose difference is NaN, as a turning point). The cumulative sum of these turning points keeps track of the items sold thus far.
This gives:
id.product_permutation id.iteration property.product property.price n_items_sold
0 1 1 1 200 0
1 1 2 1 300 0
2 1 3 1 400 0
3 1 4 3 100 1
4 1 5 3 200 1
5 1 6 3 300 1
6 1 7 2 500 2
7 1 8 2 600 2
8 2 1 3 300 0
9 2 2 3 400 0
10 2 3 1 200 1
11 2 4 1 300 1
12 2 5 2 700 2
13 2 6 2 800 2
14 2 7 2 900 2
15 2 8 2 700 2
16 3 1 1 200 0
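To see why the chain works, it can help to unroll it on a single group. The sketch below uses the property.product values of the first auction; the variable names are only illustrative, and sold_alt is an alternative formulation with shift, not the code used above.

```python
import pandas as pd

# property.product values of the first auction in the example
products = pd.Series([1, 1, 1, 3, 3, 3, 2, 2])

step = products.diff()               # NaN, 0, 0, 2, 0, 0, -1, 0
changed = step.ne(0, fill_value=0)   # True only where the product switches
sold = changed.cumsum()              # running count of switches

# Alternative: compare each row with the previous one and ignore the first row
prev = products.shift()
sold_alt = (products.ne(prev) & prev.notna()).cumsum()

print(sold.tolist())
```

Both variants count a sale only at the row where the product number changes, so the first row of every group starts at 0.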
To put [id_prod_perm, NaN, NaN, NaN, 3] rows at the end of each id.product_permutation group, we can detect the changing points of id.product_permutation and insert columns into the transposed frame, which in effect inserts rows into the original frame once it is transposed back:
import numpy as np

# following is [8, 16] for the above example
changing_points = np.where(df["id.product_permutation"]
                           .diff().ne(0, fill_value=0))[0].tolist()

# insert to transpose and then come back
df = df.T
offset = 0  # helper for insertion location
for j, point in enumerate(changing_points, start=1):
    # to the given point, insert a column with the same name
    df.insert(loc=point+offset, column=point, value=[j, *[np.nan]*3, 3],
              allow_duplicates=True)

    # since an insertion enlarges the frame, old changing points
    # need to increase, this is handled by the `offset`
    offset += 1

# go back to original form, and also reset the index to 0..N-1
df = df.T.reset_index(drop=True)
to get
>>> df
id.product_permutation id.iteration property.product property.price n_items_sold
0 1.0 1.0 1.0 200.0 0.0
1 1.0 2.0 1.0 300.0 0.0
2 1.0 3.0 1.0 400.0 0.0
3 1.0 4.0 3.0 100.0 1.0
4 1.0 5.0 3.0 200.0 1.0
5 1.0 6.0 3.0 300.0 1.0
6 1.0 7.0 2.0 500.0 2.0
7 1.0 8.0 2.0 600.0 2.0
8 1.0 NaN NaN NaN 3.0
9 2.0 1.0 3.0 300.0 0.0
10 2.0 2.0 3.0 400.0 0.0
11 2.0 3.0 1.0 200.0 1.0
12 2.0 4.0 1.0 300.0 1.0
13 2.0 5.0 2.0 700.0 2.0
14 2.0 6.0 2.0 800.0 2.0
15 2.0 7.0 2.0 900.0 2.0
16 2.0 8.0 2.0 700.0 2.0
17 2.0 NaN NaN NaN 3.0
18 3.0 1.0 1.0 200.0 0.0
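As the output shows, going through the transpose turns every column into floats. A possible alternative, sketched here on a small made-up frame, is to build the summary rows separately and merge them in with pd.concat and a stable sort. Two assumptions differ from the loop above: the total of 3 is still hard-coded, and a summary row is appended to every auction, including the last one.

```python
import pandas as pd

# Small hypothetical frame with two auctions of three products each
df = pd.DataFrame({
    "id.product_permutation": [1, 1, 1, 1, 2, 2, 2],
    "id.iteration": [1, 2, 3, 4, 1, 2, 3],
    "property.product": [1, 1, 3, 2, 3, 1, 2],
    "property.price": [200, 300, 100, 500, 300, 200, 700],
})
df["n_items_sold"] = (df.groupby("id.product_permutation")["property.product"]
                        .transform(lambda x: x.diff().ne(0, fill_value=0).cumsum()))

# One summary row per auction; columns missing here become NaN after concat
summary = pd.DataFrame({
    "id.product_permutation": df["id.product_permutation"].unique(),
    "n_items_sold": 3,  # assumption: all 3 items are sold once an auction is over
})

# mergesort is stable, so each summary row lands after its auction's rows
out = (pd.concat([df, summary])
         .sort_values("id.product_permutation", kind="mergesort")
         .reset_index(drop=True))
print(out)
```

With this approach id.product_permutation and n_items_sold keep their integer dtype; only the columns that receive NaN are upcast to float.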