
Merging two rows of data into a single row with Python/Pandas

I have a DataFrame like this:

   ID   A1    A2    A3    A4                                      
0  01  100   101   103   104
1  01  501   502   503   504
2  01  701   702   703   704
3  02  1001  1002  1003  1004
4  03  2001  2002  2003  2004
5  03  5001  5002  5003  5004

I need the rows belonging to the same ID to be merged into a single row. The merged DataFrame should look like this:

   ID   A1    A2    A3    A4    B1    B2    B3    B4    C1    C2    C3    C4
0  01  100   101   103   104   501   502   503   504   701   702   703   704
1  02  1001  1002  1003  1004
2  03  2001  2002  2003  2004  5001  5002  5003  5004

I tried using np.random.permutation, np.roll, etc., but was unable to get the desired result. My original data set has thousands of rows, so looping over it, creating individual data sets, and then merging them is not practical.

unstacked = df.unstack() gives you the first step:

A1  0    1001
    1    5001
    2    7001
A2  0    1002
    1    5002
    2    7002
A3  0    1003
    1    5003
    2    7003
A4  0    1004
    1    5004
    2    7004

Then you can extract the two "levels" of the index:

colname = unstacked.index.get_level_values(0) # A1,A1,A1,A2,...
rownum = unstacked.index.get_level_values(1) # 0,1,2,0,...

Then convert them to the desired format:

idxchr = (rownum + ord('A')).map(chr) # A,B,C,A,...
idxnum = colname.str[1] # 1,1,1,2,...

Finally, overwrite the index of unstacked:

unstacked.index = idxchr + idxnum

Result:

A1    1001
B1    5001
C1    7001
A2    1002
B2    5002
C2    7002
A3    1003
B3    5003
C3    7003
A4    1004
B4    5004
C4    7004
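The steps above can be collected into one function. This is a minimal sketch, not the answer's own code: to_single_row and df_old are hypothetical names, the helper assumes a default 0..n-1 row index and column names made of one letter followed by digits, and df_old is reconstructed from the unstacked output shown above (the earlier version of the question's data, without an ID column). It takes colname.str[1:], everything after the first character, so that suffixes with more than one digit are kept whole.

```python
import pandas as pd


def to_single_row(frame):
    """Hypothetical helper: flatten all rows of frame into one labelled Series.

    Assumes a default 0..n-1 row index and column names made of one letter
    followed by digits. Row 0 keeps prefix 'A', row 1 gets 'B', and so on.
    """
    unstacked = frame.unstack()                    # Series indexed by (column, row)
    colname = unstacked.index.get_level_values(0)  # A1, A1, A1, A2, ...
    rownum = unstacked.index.get_level_values(1)   # 0, 1, 2, 0, ...
    idxchr = (rownum + ord('A')).map(chr)          # A, B, C, A, ...
    idxnum = colname.str[1:]                       # 1, 1, 1, 2, ... (all digits)
    unstacked.index = idxchr + idxnum              # A1, B1, C1, A2, ...
    return unstacked


# Hypothetical reconstruction of the earlier example input (no ID column),
# inferred from the unstacked output shown above.
df_old = pd.DataFrame({
    'A1': [1001, 5001, 7001],
    'A2': [1002, 5002, 7002],
    'A3': [1003, 5003, 7003],
    'A4': [1004, 5004, 7004],
})

result = to_single_row(df_old)
row = result.to_frame().T  # one-row DataFrame with columns A1, B1, C1, A2, ...
print(row)
```

The columns come out in the unstacked order (A1, B1, C1, A2, ...); calling row.sort_index(axis=1) would reorder them to A1..A4, B1..B4, C1..C4 as long as no suffix has more than one digit.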

Edit: You edited your question while I was writing this answer, so you may need to extend it a bit to work with the new example input you've posted.

Here is one way to do it:

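import pandas as pd


def widen(x):
    # x holds the value columns (A1..A4) of all rows that share one ID
    num_rows, num_cols = x.shape

    # Labels in row-major order: A1..A4 for the first row, B1..B4 for the second, ...
    new_index = [
        chr(ord('A') + row_number) + str(col_number + 1)
        for row_number in range(num_rows)
        for col_number in range(num_cols)
    ]

    # to_numpy().ravel() flattens row by row, matching the label order above
    # (x.unstack() would flatten column by column and mislabel the values)
    return pd.Series(x.to_numpy().ravel(), index=new_index)

# Pass only the value columns to widen; because the groups have different sizes,
# apply returns a Series indexed by (ID, label), and unstack() turns labels into columns
res = df.groupby('ID')[['A1', 'A2', 'A3', 'A4']].apply(widen).unstack()

The output is:

        A1      A2      A3      A4      B1  ...      B4     C1     C2     C3     C4
ID                                          ...                                    
1    100.0   101.0   103.0   104.0   501.0  ...   504.0  701.0  702.0  703.0  704.0
2   1001.0  1002.0  1003.0  1004.0     NaN  ...     NaN    NaN    NaN    NaN    NaN
3   2001.0  2002.0  2003.0  2004.0  5001.0  ...  5004.0    NaN    NaN    NaN    NaN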
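Why the flattening order matters: widen builds its labels row by row (A1, A2, ..., then B1, B2, ...), so the values have to be flattened in the same row-major order. A small sketch on a hypothetical two-row group (group, col_major, row_major and labels are illustrative names) comparing DataFrame.unstack(), which walks column by column, with to_numpy().ravel(), which walks row by row:

```python
import pandas as pd

# Hypothetical two-row group with two value columns
group = pd.DataFrame({'A1': [1, 3], 'A2': [2, 4]})

# DataFrame.unstack() walks column by column: A1 row 0, A1 row 1, A2 row 0, ...
col_major = group.unstack().tolist()

# to_numpy().ravel() walks row by row: row 0 A1, row 0 A2, row 1 A1, ...
row_major = group.to_numpy().ravel().tolist()

# Labels built with the row loop outermost are row-major as well
labels = [chr(ord('A') + r) + str(c + 1) for r in range(2) for c in range(2)]

print(dict(zip(labels, row_major)))  # {'A1': 1, 'A2': 2, 'B1': 3, 'B2': 4}
print(dict(zip(labels, col_major)))  # {'A1': 1, 'A2': 3, 'B1': 2, 'B2': 4}
```

Only the row-major pairing puts the second row's first value (3) under B1.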

Caveat: this only works if no ID has more than 26 rows, since each row is labelled with a single letter from A to Z.
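If calling a Python function once per group with groupby().apply() turns out to be slow on thousands of rows, one alternative (a different technique from the answer above, not the answer's own code) is to number the rows inside each ID with groupby().cumcount(), reshape to long format with melt(), and pivot back to wide. A minimal sketch, assuming the question's sample data with ID stored as a string column; letter, long_df, wide and wide_int are illustrative names:

```python
import pandas as pd

# Sample input from the question, with ID stored as a string column
df = pd.DataFrame({
    'ID': ['01', '01', '01', '02', '03', '03'],
    'A1': [100, 501, 701, 1001, 2001, 5001],
    'A2': [101, 502, 702, 1002, 2002, 5002],
    'A3': [103, 503, 703, 1003, 2003, 5003],
    'A4': [104, 504, 704, 1004, 2004, 5004],
})
value_cols = ['A1', 'A2', 'A3', 'A4']

# Number each row within its ID group (0, 1, 2, ...) and map it to A, B, C, ...
letter = df.groupby('ID').cumcount().map(lambda n: chr(ord('A') + n))

# Long format: one row per (ID, letter, original column)
long_df = df.assign(letter=letter).melt(
    id_vars=['ID', 'letter'], value_vars=value_cols, var_name='col'
)
# New label: row letter + digits of the original column, e.g. 'B' + '1' -> 'B1'
long_df['label'] = long_df['letter'] + long_df['col'].str[1:]

# Back to wide: one row per ID, columns sorted as A1..A4, B1..B4, C1..C4
wide = long_df.pivot(index='ID', columns='label', values='value')

# Missing cells force float64; the nullable Int64 dtype keeps integers with <NA>
wide_int = wide.astype('Int64')
print(wide_int)
```

The same 26-row limit applies, and pivot sorts the labels as strings, so with ten or more value columns A10 would sort before A2.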

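Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.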

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM