How to replicate number of rows in dataframe1 to match n rows in dataframe 2 in pandas

I just started learning Python a few months ago and have only just started using StackOverflow as well, so please bear with me.
We have two data frames:

df1:

    0.1,0.2,0.3,0.4  
    1.0,2.0,3.0,4.0
    6.0,7.0,8.0,9.0 

df2:

    Sequence, dataset_ID  
    1,1  
    2,4  
    10,5

I am using the pandas iterrows function to transpose df1 into:

for ind, row in df1.iterrows():
    # raw string so the backslash is not read as an escape sequence
    row.to_csv(path + r'\df1Transposed')

df1Transposed:

    0.1,1.0
    0.2,2.0  
    0.3,3.0  
    0.4,4.0
    0.1,6.0
    0.2,7.0  
    0.3,8.0  
    0.4,9.0

I am trying to find a good way to group/replicate each row in df2 to match the number of rows in the transposed df1. For example, transposing the header and one row of df1 creates 4 rows and 2 columns in df1Transposed (0.1-0.4), and this repeats for the next row in df1. So the first row in df2 should repeat 4 times, and then the second row should repeat another 4 times.

dfout:

    Sequence,dataset_ID,V,I
    1,1,0.1,1.0
    1,1,0.2,2.0
    1,1,0.3,3.0
    1,1,0.4,4.0
    2,4,0.1,6.0
    2,4,0.2,7.0
    2,4,0.3,8.0
    2,4,0.4,9.0

You can use a combination of numpy's repeat and arange to get the index, then concatenate the two dataframes horizontally.

First, get the transpose, thanks to @sammywemmy's handy one-liner:

import numpy as np
import pandas as pd
df1_T = pd.concat([df1.iloc[:2].T,
                   df1.iloc[::2].T.set_axis([0, 1], axis=1)], ignore_index=True)
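
The one-liner above relies on df1 having been read without a header (so the 0.1-0.4 line is row 0) and on there being exactly two data rows. As a rough sketch only, the same pairing could be written for any number of data rows; the helper name `pair_with_first_row` is hypothetical, and the sample frame is rebuilt by hand from the question's data:

```python
import numpy as np
import pandas as pd

# Assumption: df1 has no header row, so the 0.1..0.4 line is row 0.
df1 = pd.DataFrame([[0.1, 0.2, 0.3, 0.4],
                    [1.0, 2.0, 3.0, 4.0],
                    [6.0, 7.0, 8.0, 9.0]])


def pair_with_first_row(df):
    """Pair row 0 with every later row and stack the pairs vertically."""
    first = df.iloc[0].to_numpy()
    rest = df.iloc[1:].to_numpy()
    return pd.DataFrame({
        0: np.tile(first, len(rest)),  # row 0 repeated once per data row
        1: rest.ravel(),               # data rows laid end to end
    })


df1_T = pair_with_first_row(df1)
print(df1_T)
```

For the sample data this should produce the same 8 rows as the one-liner; it is a swapped-in alternative, not the method used in the answer.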

Second, get the length of the transposed dataframe, select the number of rows you want to include from df2, and use the functions mentioned above:

df_1_l = df1_T.shape[0]   # number of rows in the transposed frame
no_rows_from_df2 = 2      # how many rows of df2 to repeat
index = np.repeat(np.arange(no_rows_from_df2), df_1_l // no_rows_from_df2)

df3 = pd.concat([df1_T.reset_index(drop=True),
             df2.iloc[index].reset_index(drop=True)], axis=1)
df3

#     0 1   Sequence  dataset_ID
# 0 0.1 1.0   1       1
# 1 0.2 2.0   1       1
# 2 0.3 3.0   1       1
# 3 0.4 4.0   1       1
# 4 0.1 6.0   2       4
# 5 0.2 7.0   2       4
# 6 0.3 8.0   2       4
# 7 0.4 9.0   2       4
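
For reference, here is a self-contained sketch of the whole approach on the question's sample data. It makes a few assumptions that are not in the answer: df1 and df2 are rebuilt by hand, the block sizes are derived from the shape of df1 instead of being hard-coded, and the columns are reordered and renamed to V and I so the result matches the dfout layout from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand; df1 is assumed to have no header row.
df1 = pd.DataFrame([[0.1, 0.2, 0.3, 0.4],
                    [1.0, 2.0, 3.0, 4.0],
                    [6.0, 7.0, 8.0, 9.0]])
df2 = pd.DataFrame({"Sequence": [1, 2, 10], "dataset_ID": [1, 4, 5]})

# Same transpose as in the answer: pair row 0 with row 1, then row 0 with row 2.
df1_T = pd.concat([df1.iloc[:2].T,
                   df1.iloc[::2].T.set_axis([0, 1], axis=1)], ignore_index=True)

rows_per_block = df1.shape[1]  # 4 values per df1 row
n_blocks = len(df1) - 1        # 2 data rows below the first row

# [0, 0, 0, 0, 1, 1, 1, 1]: each selected df2 row repeated once per value
index = np.repeat(np.arange(n_blocks), rows_per_block)

dfout = pd.concat(
    [df2.iloc[index].reset_index(drop=True),
     df1_T.set_axis(["V", "I"], axis=1)],
    axis=1,
)
print(dfout)
```

Both frames get a fresh 0..7 index before the horizontal concat, so the rows line up by position.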

A few notes: this works because the length of df1_T is a multiple of the selected number of rows in df2. If, for example, you would like to repeat rows 0, 1, 2, then the length of df1_T should be 3, 6, 9, 12, ...
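
If the lengths do not divide evenly, the integer division produces a shorter index, and the horizontal concat (an outer join on the index by default) would pad the missing rows with NaN. One way to catch this early is a small guard; the helper `repeat_rows` below is a hypothetical name used only for illustration, not part of pandas or the answer:

```python
import numpy as np
import pandas as pd


def repeat_rows(df, n_rows, target_len):
    """Repeat the first n_rows of df evenly so the result has target_len rows."""
    if n_rows <= 0 or n_rows > len(df) or target_len % n_rows != 0:
        raise ValueError(
            f"target_len={target_len} must be a multiple of n_rows={n_rows}, "
            f"and n_rows must be between 1 and {len(df)}"
        )
    index = np.repeat(np.arange(n_rows), target_len // n_rows)
    return df.iloc[index].reset_index(drop=True)


df2 = pd.DataFrame({"Sequence": [1, 2, 10], "dataset_ID": [1, 4, 5]})
repeated = repeat_rows(df2, 2, 8)  # 2 rows, each repeated 4 times
print(repeated)
```

Calling `repeat_rows(df2, 3, 8)` raises a ValueError, because 8 is not a multiple of 3.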
