Merging two rows of data into a single row with Python/Pandas
I have a dataframe like this:
ID A1 A2 A3 A4
0 01 100 101 103 104
1 01 501 502 503 504
2 01 701 702 703 704
3 02 1001 1002 1003 1004
4 03 2001 2002 2003 2004
5 03 5001 5002 5003 5004
I need the rows belonging to the same ID to be merged into a single row. The merged dataframe should look like this:
ID A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4
0 01 100 101 103 104 501 502 503 504 701 702 703 704
1 02 1001 1002 1003 1004
2 03 2001 2002 2003 2004 5001 5002 5003 5004
I tried using np.random.permutation, np.roll, etc., but was unable to get the desired result. My original dataset has thousands of rows, so looping over it, creating individual datasets and then merging them is not practical.
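To experiment with the answers below, the sample input can be rebuilt as a DataFrame. This is a minimal sketch that assumes ID is stored as a zero-padded string (as it is printed above) and that the A columns hold integers.

```python
import pandas as pd

# Sample input from the question; ID is assumed to be a string such as "01"
df = pd.DataFrame({
    "ID": ["01", "01", "01", "02", "03", "03"],
    "A1": [100, 501, 701, 1001, 2001, 5001],
    "A2": [101, 502, 702, 1002, 2002, 5002],
    "A3": [103, 503, 703, 1003, 2003, 5003],
    "A4": [104, 504, 704, 1004, 2004, 5004],
})
print(df)
```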
For the earlier version of the question (three rows, columns A1 to A4, no ID column), df.unstack() gives you the first step:
unstacked = df.unstack()
A1 0 1001
1 5001
2 7001
A2 0 1002
1 5002
2 7002
A3 0 1003
1 5003
2 7003
A4 0 1004
1 5004
2 7004
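The earlier version of the question is not shown here, but its values can be read off the output above: three rows (0 to 2) with columns A1 to A4. The sketch below rebuilds that frame under this assumption. It shows that DataFrame.unstack() on a frame with a flat row index returns a Series indexed by (column, row) pairs, walking down one column at a time.

```python
import pandas as pd

# Assumed reconstruction of the earlier example: rows 0-2, columns A1-A4
old = pd.DataFrame({
    "A1": [1001, 5001, 7001],
    "A2": [1002, 5002, 7002],
    "A3": [1003, 5003, 7003],
    "A4": [1004, 5004, 7004],
})

# With a non-MultiIndex row index, unstack() returns a Series whose
# MultiIndex is (column label, row label), in column-major order
unstacked = old.unstack()
print(unstacked)
```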
Then you can extract the two "levels" of the index:
colname = unstacked.index.get_level_values(0) # A1,A1,A1,A2,...
rownum = unstacked.index.get_level_values(1) # 0,1,2,0,...
Then convert them to the desired format:
idxchr = (rownum + ord('A')).map(chr) # A,B,C,A,...
idxnum = colname.str[1:] # 1,1,1,2,... (also works for multi-digit names such as A10)
And finally, overwrite the unstacked index:
unstacked.index = idxchr + idxnum
Result:
A1 1001
B1 5001
C1 7001
A2 1002
B2 5002
C2 7002
A3 1003
B3 5003
C3 7003
A4 1004
B4 5004
C4 7004
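Putting the steps together gives the sketch below. It again assumes the reconstructed three-row frame from the earlier version of the question. The relabeled Series is still ordered column by column (A1, B1, C1, A2, ...). Sorting the labels and transposing is one way to get the single row the question asks for. That final sort_index().to_frame().T step is an addition for illustration and is not part of the original answer.

```python
import pandas as pd

# Assumed reconstruction of the earlier three-row example
old = pd.DataFrame({
    "A1": [1001, 5001, 7001],
    "A2": [1002, 5002, 7002],
    "A3": [1003, 5003, 7003],
    "A4": [1004, 5004, 7004],
})

unstacked = old.unstack()
colname = unstacked.index.get_level_values(0)  # A1, A1, A1, A2, ...
rownum = unstacked.index.get_level_values(1)   # 0, 1, 2, 0, ...

# Row number -> letter (0 -> A, 1 -> B, ...); column name -> its digits
idxchr = (rownum + ord("A")).map(chr)
idxnum = colname.str[1:]
unstacked.index = idxchr + idxnum              # A1, B1, C1, A2, ...

# Sorting puts all A* labels first, then B*, then C*;
# to_frame().T turns the Series into a one-row DataFrame
one_row = unstacked.sort_index().to_frame().T
print(one_row)
```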
Edit: You edited your question while I was writing this answer, so you may need to adapt it a bit to work with the new example input you've posted.
For the new input, this is how you can do it:
import pandas as pd

def widen(x):
    # Keep only the value columns A1, A2, ... (the group may also contain ID)
    values = x.loc[:, 'A1':]
    num_rows, num_cols = values.shape
    # Labels in row-major order: A1, A2, ..., B1, B2, ..., C1, ...
    new_index = [
        chr(ord('A') + row_number) + str(col_number + 1)
        for row_number in range(num_rows)
        for col_number in range(num_cols)
    ]
    # ravel() flattens row by row, so the values line up with new_index
    return pd.Series(values.to_numpy().ravel(), index=new_index)

res = df.groupby('ID').apply(widen).unstack()
The output would be:
A1 A2 A3 A4 B1 ... B4 C1 C2 C3 C4
ID ...
1 100.0 101.0 103.0 104.0 501.0 ... 504.0 701.0 702.0 703.0 704.0
2 1001.0 1002.0 1003.0 1004.0 NaN ... NaN NaN NaN NaN NaN
3 2001.0 2002.0 2003.0 2004.0 5001.0 ... 5004.0 NaN NaN NaN NaN
Caveat: this only works as long as no ID has more than 26 rows, because the row letters run from A to Z.
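Since the question mentions thousands of rows, here is a vectorized alternative that avoids calling a Python function for every group. It is not part of the original answer and is a swapped-in technique: it numbers each row within its ID using groupby().cumcount(), then moves that number into the columns with unstack(). The sample data and the string-typed ID are assumptions carried over from the question. The sketch keeps the letter naming, so it has the same 26-row limit. No benchmark is given here.

```python
import pandas as pd

# Sample input from the question (ID assumed to be a string such as "01")
df = pd.DataFrame({
    "ID": ["01", "01", "01", "02", "03", "03"],
    "A1": [100, 501, 701, 1001, 2001, 5001],
    "A2": [101, 502, 702, 1002, 2002, 5002],
    "A3": [103, 503, 703, 1003, 2003, 5003],
    "A4": [104, 504, 704, 1004, 2004, 5004],
})
value_cols = ["A1", "A2", "A3", "A4"]

# Position of each row inside its ID group: 0, 1, 2, 0, 0, 1
n = df.groupby("ID").cumcount()

# Index by (ID, position) and move the position level into the columns;
# positions an ID does not have (e.g. "02" has one row) become NaN
wide = df.assign(n=n).set_index(["ID", "n"])[value_cols].unstack("n")

# Columns are (A1, 0), (A1, 1), ...; reorder them to (0, A1), (0, A2), ...
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Rename (position, column) to letter + digits: (1, "A1") -> "B1"
wide.columns = [chr(ord("A") + pos) + col[1:] for pos, col in wide.columns]
print(wide.reset_index())
```

The NaN padding turns the value columns into floats, just as in the output of the apply-based version.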
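Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.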