
How can I introduce a new column which does not count in every line?

I have the following dataframe about auctions:

id.product_permutation  id.iteration  property.product  property.price
                     1             1                 1             200
                     1             2                 1             300
                     1             3                 1             400
                     1             4                 3             100
                     1             5                 3             200
                     1             6                 3             300
                     1             7                 2             500
                     1             8                 2             600
                     2             1                 3             300
                     2             2                 3             400
                     2             3                 1             200
                     2             4                 1             300
                     2             5                 2             700
                     2             6                 2             800
                     2             7                 2             900
                     2             8                 2             700
                     3             1                 1             200
                   ...           ...               ...             ...

There are 3 different products in the auction, and the column property.product tells which product is currently for sale. If the product number in property.product changes, the previous product has been sold. property.price gives the current price. If the number in id.product_permutation changes, the whole auction is over: all 3 items are sold and a new auction starts (with the same 3 items).

Now I would like to introduce a new column amount_of_sold_items which counts how many products have already been sold (as shown below). I have tried a lot, but unfortunately I do not get the desired result. Can anyone please help me solve this?

id.product_permutation  id.iteration  property.product  property.price  amount_of_sold_items
                     1             1                 1             200                     0
                     1             2                 1             300                     0
                     1             3                 1             400                     0
                     1             4                 3             100                     1
                     1             5                 3             200                     1
                     1             6                 3             300                     1
                     1             7                 2             500                     2
                     1             8                 2             600                     2
                     1           NaN               NaN             NaN                     3
                     2             1                 3             300                     0
                     2             2                 3             400                     0
                     2             3                 1             200                     1
                     2             4                 1             300                     1
                     2             5                 2             700                     2
                     2             6                 2             800                     2
                     2             7                 2             900                     2
                     2             8                 2             700                     2
                     2           NaN               NaN             NaN                     3
                     3             1                 1             200                     0
                   ...           ...               ...             ...                   ...

# per auction, count how many times property.product has changed so far
df["n_items_sold"] = (df.groupby("id.product_permutation")["property.product"]
                        .transform(lambda x: x.diff().ne(0, fill_value=0).cumsum()))

For each id.product_permutation group, we assign a new series that marks the turning points, i.e. the rows where the difference to the previous row is not 0 (fill_value=0 is there so that the very first row, whose difference is NaN, is not counted as a turning point). The cumulative sum of these turning points keeps track of the items sold so far.
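To make the three steps visible, here is a minimal sketch that runs them one at a time on the products of the first auction from the question. The variable names (`products`, `step`, `changed`, `n_items_sold`) are only for illustration and do not appear in the answer.

```python
import pandas as pd

# Products offered during the first auction, in iteration order
# (values copied from the question).
products = pd.Series([1, 1, 1, 3, 3, 3, 2, 2])

# Step 1: difference to the previous row; the first row has no
# predecessor, so its difference is NaN.
step = products.diff()

# Step 2: a row is a turning point when the difference is not 0.
# fill_value=0 replaces the leading NaN with 0 before comparing,
# so the first row is not flagged.
changed = step.ne(0, fill_value=0)

# Step 3: the running total of turning points is the number of
# products sold so far.
n_items_sold = changed.cumsum()

print(pd.DataFrame({"product": products, "diff": step,
                    "changed": changed, "n_items_sold": n_items_sold}))
```

Inside `groupby(...).transform(...)` the same steps run once per auction, which is why the count starts again at 0 whenever id.product_permutation changes.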

This gives:

    id.product_permutation  id.iteration  property.product  property.price  n_items_sold
0                        1             1                 1             200             0
1                        1             2                 1             300             0
2                        1             3                 1             400             0
3                        1             4                 3             100             1
4                        1             5                 3             200             1
5                        1             6                 3             300             1
6                        1             7                 2             500             2
7                        1             8                 2             600             2
8                        2             1                 3             300             0
9                        2             2                 3             400             0
10                       2             3                 1             200             1
11                       2             4                 1             300             1
12                       2             5                 2             700             2
13                       2             6                 2             800             2
14                       2             7                 2             900             2
15                       2             8                 2             700             2
16                       3             1                 1             200             0

To put [id_prod_perm, NaN, NaN, NaN, 3] rows at the end of each id.product_permutation, we can detect the points where id.product_permutation changes and insert columns into the transposed frame, which in effect inserts rows into the original frame once it is transposed back:

import numpy as np

# following is [8, 16] for the above example
changing_points = np.where(df["id.product_permutation"]
                             .diff().ne(0, fill_value=0))[0].tolist()

# insert to transpose and then come back
df = df.T
offset = 0  # helper for insertion location
for j, point in enumerate(changing_points, start=1):
    # to the given point, insert a column with the same name;
    # `j` doubles as the auction id, which assumes the ids run 1, 2, 3, ...
    df.insert(loc=point+offset, column=point, value=[j, *[np.nan]*3, 3],
              allow_duplicates=True)

    # since an insertion enlarges the frame, old changing points
    # need to increase, this is handled by the `offset`
    offset += 1

# go back to original form, and also reset the index to 0..N-1
df = df.T.reset_index(drop=True)

to get

>>> df

    id.product_permutation  id.iteration  property.product  property.price  n_items_sold
0                      1.0           1.0               1.0           200.0           0.0
1                      1.0           2.0               1.0           300.0           0.0
2                      1.0           3.0               1.0           400.0           0.0
3                      1.0           4.0               3.0           100.0           1.0
4                      1.0           5.0               3.0           200.0           1.0
5                      1.0           6.0               3.0           300.0           1.0
6                      1.0           7.0               2.0           500.0           2.0
7                      1.0           8.0               2.0           600.0           2.0
8                      1.0           NaN               NaN             NaN           3.0
9                      2.0           1.0               3.0           300.0           0.0
10                     2.0           2.0               3.0           400.0           0.0
11                     2.0           3.0               1.0           200.0           1.0
12                     2.0           4.0               1.0           300.0           1.0
13                     2.0           5.0               2.0           700.0           2.0
14                     2.0           6.0               2.0           800.0           2.0
15                     2.0           7.0               2.0           900.0           2.0
16                     2.0           8.0               2.0           700.0           2.0
17                     2.0           NaN               NaN             NaN           3.0
18                     3.0           1.0               1.0           200.0           0.0
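As a possible alternative to the transpose loop, the closing rows could also be built as a small frame of their own and appended with `pd.concat`. This is only a sketch, not the method used above: the names `closing` and `result` are made up for illustration, the sample data is limited to the two complete auctions from the question, and the value 3 is hard-coded as in the loop (every auction sells 3 items).

```python
import pandas as pd

# The two complete auctions from the question.
df = pd.DataFrame({
    "id.product_permutation": [1] * 8 + [2] * 8,
    "id.iteration": list(range(1, 9)) * 2,
    "property.product": [1, 1, 1, 3, 3, 3, 2, 2, 3, 3, 1, 1, 2, 2, 2, 2],
    "property.price": [200, 300, 400, 100, 200, 300, 500, 600,
                       300, 400, 200, 300, 700, 800, 900, 700],
})
df["n_items_sold"] = (df.groupby("id.product_permutation")["property.product"]
                        .transform(lambda x: x.diff().ne(0, fill_value=0).cumsum()))

# One closing row per auction id; the columns that are left out here
# become NaN when the frames are concatenated.
closing = pd.DataFrame({
    "id.product_permutation": df["id.product_permutation"].unique(),
    "n_items_sold": 3,  # assumption: every auction sells exactly 3 items
})

# Append the closing rows, then sort by auction id with a stable sort so
# that each closing row ends up after the rows of its own auction.
result = (pd.concat([df, closing], ignore_index=True)
            .sort_values("id.product_permutation", kind="stable")
            .reset_index(drop=True))

print(result)
```

Unlike the loop, which only inserts a row where the id changes, this adds a closing row for every auction in the frame, including the last one. It also relies on the rows of each auction already being in iteration order, since the stable sort preserves the existing order within an id.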
