
Duplicating rows in a DataFrame based on column value

Below is a set of sample data I am working with:

import numpy as np
import pandas as pd

sample_dat = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)

I need to transform the data so that each row is duplicated according to the value in the last column. Specifically, I want each row to be repeated based on the value in the cnt column.

My search turned up a lot about melt, split, and similar operations, but I think what I am looking for is fairly basic. Note also that the first column will likely hold some kind of ID, which may be either an integer or a string.

For example, the first record (cnt = 5) should be duplicated 4 more times, and the second record (cnt = 3) should be duplicated twice more.
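As a quick sanity check of this rule (a sketch that simply rebuilds the sample_dat frame above): every row ends up appearing cnt times in total, i.e. cnt - 1 extra copies, so the expanded frame should have 5 + 3 + 1 + 1 + 1 + 1 = 12 rows.

```python
import numpy as np
import pandas as pd

sample_dat = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)

# Each row appears cnt times in total, i.e. cnt - 1 extra copies
extra_copies = sample_dat['cnt'] - 1
# So the expanded frame has sum(cnt) rows in total
expected_rows = int(sample_dat['cnt'].sum())

print(extra_copies.tolist())  # [4, 2, 0, 0, 0, 0]
print(expected_rows)          # 12
```

This matches the 12 rows written out by hand in sample_dat2.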

Written out by hand, the DataFrame I want would look like this:

sample_dat2 = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [0,0,0,0,1,3],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)

Iterate over your data and add each row x times, where x is the number in the 'cnt' column. DataFrame.append was removed in pandas 2.0, so collect the rows in a list and build the new DataFrame once at the end.

rows = []
for index, row in sample_dat.iterrows():
    # add this row once per unit of its 'cnt' value
    for _ in range(row['cnt']):
        rows.append(row)

# build the DataFrame in one step and renumber the index 0..n-1
df = pd.DataFrame(rows).reset_index(drop=True)

Output

>>> df
    var1  var2  var3  var4  var5  cnt
0      1     0     1     1     1    5
1      1     0     1     1     1    5
2      1     0     1     1     1    5
3      1     0     1     1     1    5
4      1     0     1     1     1    5
5      0     0     0     0     1    3
6      0     0     0     0     1    3
7      0     0     0     0     1    3
8      1     0     0     0     1    1
9      1     0     0     1     1    1
10     1     0     0     0     1    1
11     1     1     0     0     1    1

I would use numpy.repeat on the DataFrame's index values, repeating each label according to cnt, select those labels with .loc, and then reset the index.

sample_dat.loc[np.repeat(sample_dat.index.values, sample_dat.cnt)].reset_index(drop=True)

Result:

    var1  var2  var3  var4  var5  cnt
0      1     0     1     1     1    5
1      1     0     1     1     1    5
2      1     0     1     1     1    5
3      1     0     1     1     1    5
4      1     0     1     1     1    5
5      0     0     0     0     1    3
6      0     0     0     0     1    3
7      0     0     0     0     1    3
8      1     0     0     0     1    1
9      1     0     0     1     1    1
10     1     0     0     0     1    1
11     1     1     0     0     1    1
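A pure-pandas variant of the same label-based idea, offered as a sketch: pandas Index objects have their own repeat method, so the explicit NumPy call is optional. Because rows are selected by label rather than converted to a raw NumPy array, each column keeps its own dtype, which matters for the integer or string ID column mentioned in the question. The id and var1 values below are made-up illustration data, not from the question.

```python
import pandas as pd

# Hypothetical data: a string ID column next to integer columns
df = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'var1': [1, 0, 1],
    'cnt': [2, 3, 1],
})

# Index.repeat repeats each index label cnt times; .loc then selects
# those labels (duplicates included) and reset_index renumbers 0..n-1
expanded = df.loc[df.index.repeat(df['cnt'])].reset_index(drop=True)
print(expanded)
```

The id column stays text and var1/cnt stay integer, since no intermediate single-dtype array is created.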

You can use numpy.repeat with axis=0 on the underlying array, passing the column that holds the repetition counts (here arr[:,5], the cnt column) as the number of times to repeat each row.

import numpy as np
import pandas as pd

arr = np.array(
    [[1,0,1,1,1,5],
     [0,0,0,0,1,3],
     [1,0,0,0,1,1],
     [1,0,0,1,1,1],
     [1,0,0,0,1,1],
     [1,1,0,0,1,1]]
    )

df = pd.DataFrame(
    np.repeat(arr, arr[:,5], axis=0),
    columns=['var1','var2','var3','var4','var5','cnt']
    )

print(df)
#     var1  var2  var3  var4  var5  cnt
# 0      1     0     1     1     1    5
# 1      1     0     1     1     1    5
# 2      1     0     1     1     1    5
# 3      1     0     1     1     1    5
# 4      1     0     1     1     1    5
# 5      0     0     0     0     1    3
# 6      0     0     0     0     1    3
# 7      0     0     0     0     1    3
# 8      1     0     0     0     1    1
# 9      1     0     0     1     1    1
# 10     1     0     0     0     1    1
# 11     1     1     0     0     1    1
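Two caveats of this array-based approach, shown as a sketch on hypothetical data: np.repeat silently drops any row whose count is 0, and hard-coding arr[:,5] breaks if the column order changes. The variant below looks up the position of cnt by name instead. Also note that a NumPy array holds a single dtype, so if the first column were a string ID, every value would be coerced to a common type (strings); the label-based .loc approach avoids that.

```python
import numpy as np
import pandas as pd

cols = ['var1', 'var2', 'var3', 'var4', 'var5', 'cnt']
# Hypothetical data; the middle row has cnt == 0
arr = np.array(
    [[1, 0, 1, 1, 1, 5],
     [0, 0, 0, 0, 1, 0],
     [1, 1, 0, 0, 1, 2]]
)

# Locate the count column by name instead of hard-coding index 5
cnt_pos = cols.index('cnt')

# Repeat whole rows (axis=0); a count of 0 removes that row entirely
df = pd.DataFrame(np.repeat(arr, arr[:, cnt_pos], axis=0), columns=cols)
print(df)
```

The result has 5 + 0 + 2 = 7 rows, with no trace of the cnt == 0 row.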
