
pandas - Copy each row 'n' times depending on column value

I'd like to duplicate the rows of a DataFrame based on the value of a column, in this case orig_qty. So if I have the following DataFrame, using pandas==0.24.2:

import pandas as pd

d = {'a': ['2019-04-08', 4, 115.00], 'b': ['2019-04-09', 2, 103.00]}

df = pd.DataFrame.from_dict(
    d,
    orient='index',
    columns=['date', 'orig_qty', 'price'],
)

Input

>>> print(df)
         date  orig_qty   price
a  2019-04-08         4   115.0
b  2019-04-09         2   103.0

So in the example above, the row with orig_qty=4 should appear 4 times and the row with orig_qty=2 should appear 2 times. After this transformation I'd like a DataFrame that looks like:

Desired Output

>>> print(new_df)
         date  orig_qty  price  fifo_qty
1  2019-04-08         4  115.0         1
2  2019-04-08         4  115.0         1
3  2019-04-08         4  115.0         1
4  2019-04-08         4  115.0         1
5  2019-04-09         2  103.0         1
6  2019-04-09         2  103.0         1

Note: I do not really care about the index after the transformation. I can elaborate more on the use case, but essentially I'm doing some FIFO accounting where important changes can occur between values of orig_qty.

Use Index.repeat, DataFrame.loc, DataFrame.assign and DataFrame.reset_index:

 new_df = df.loc[df.index.repeat(df['orig_qty'])].assign(fifo_qty=1).reset_index(drop=True)

[output]

         date  orig_qty  price  fifo_qty
0  2019-04-08         4  115.0         1
1  2019-04-08         4  115.0         1
2  2019-04-08         4  115.0         1
3  2019-04-08         4  115.0         1
4  2019-04-09         2  103.0         1
5  2019-04-09         2  103.0         1
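To make the one-liner above easier to follow, here is a minimal sketch that splits it into its steps. The frame is rebuilt column by column (an assumption made only so that orig_qty is certain to be an integer column), and the names repeated_labels, expanded and one_based are illustrative, not part of the original answer.

```python
import pandas as pd

# Same data as in the question, built column-wise so dtypes are explicit.
df = pd.DataFrame(
    {
        'date': ['2019-04-08', '2019-04-09'],
        'orig_qty': [4, 2],
        'price': [115.0, 103.0],
    },
    index=['a', 'b'],
)

# Step 1: repeat each index label orig_qty times.
repeated_labels = df.index.repeat(df['orig_qty'])

# Step 2: select rows by label; a label listed k times yields k rows.
expanded = df.loc[repeated_labels]

# Step 3: add the constant fifo_qty column and drop the duplicated labels.
new_df = expanded.assign(fifo_qty=1).reset_index(drop=True)

# Optional: shift to a 1-based index, as in the desired output of the question.
one_based = new_df.copy()
one_based.index = one_based.index + 1

print(new_df)
```

Note that a row whose orig_qty is 0 is repeated zero times, so it would not appear in the result at all.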

Use np.repeat:

import numpy as np

# Repeat every column's values orig_qty times and rebuild the DataFrame.
new_df = pd.DataFrame({col: np.repeat(df[col], df.orig_qty) for col in df.columns})
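A possible variant of the np.repeat idea, sketched here as an assumption rather than as part of the original answer, is to repeat row positions instead of column values and then select with DataFrame.iloc. This avoids rebuilding the frame column by column and does not depend on the index labels being unique. The name positions is illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built column-wise so dtypes are explicit.
df = pd.DataFrame(
    {
        'date': ['2019-04-08', '2019-04-09'],
        'orig_qty': [4, 2],
        'price': [115.0, 103.0],
    },
    index=['a', 'b'],
)

# Repeat each row position (0, 1, ...) orig_qty times.
positions = np.repeat(np.arange(len(df)), df['orig_qty'].to_numpy())

# Select rows by position, add fifo_qty, and reset to a fresh integer index.
new_df = df.iloc[positions].assign(fifo_qty=1).reset_index(drop=True)

print(new_df)
```

Because selection is positional, this version should also behave as expected when the original index contains duplicate labels, where label-based selection with loc would return every matching row for each repeated label.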
