Python. Merge repeated columns
I have to create a dataframe from a file that contains some repeated columns, with their values split as follows:
As you can see, c1 for example is split into 3 parts and c2 into 2.
What I want to get is something like:
I know that I can merge the columns with:
df.sum(axis=1) or df.max(axis=1)
but I don't know how to specify that I want to do it only with specific columns.
Another possibility could be to create dataframes with only the repeated columns, apply either sum or max to each, and then merge everything.
But I was wondering if there is something less "ugly".
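For reference, the manual approach described above could be sketched roughly as follows. The data and the column names (`c1`, `c2`, `c3`) are made up for illustration, since the original table is not reproduced here; the sketch assumes that each row has a value in only one of the repeated columns and zeros elsewhere.

```python
import pandas as pd

# Hypothetical data: c1 is split across 3 columns, c2 across 2.
df = pd.DataFrame(
    [[1, 0, 0, 4, 0, 7],
     [0, 2, 0, 0, 5, 8],
     [0, 0, 3, 6, 0, 9]],
    columns=["c1", "c1", "c1", "c2", "c2", "c3"],
)

# For each distinct name, pick every column carrying that name with a
# boolean mask, collapse those columns row-wise, then glue the results
# back together side by side.
merged = pd.concat(
    {
        name: df.loc[:, df.columns == name].sum(axis=1)
        for name in df.columns.unique()
    },
    axis=1,
)
print(merged)
```

Replacing `.sum(axis=1)` with `.max(axis=1)` gives the other variant mentioned in the question.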
In a much simpler fashion, you can use groupby for that.
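In [1]: import numpy as np
        import pandas as pd
        # randint's upper bound is exclusive, so 11 yields integers in 0..10
        df = pd.DataFrame(np.random.randint(0, 11, (3, 8)), columns=['C1','C2','C3','C1','C4','C1','C5','C2'])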
In [2]: df
Out[2]:
   C1  C2  C3  C1  C4  C1  C5  C2
0   5   0   9   1   7   3   3   8
1   3   1  10   7   1   2   3   8
2   1   0   0   0   4  10   6  10
In [3]: # Groupby level 0 on axis 1 (columns) and apply a sum
df.groupby(level=0, axis=1).sum()
Out[3]:
   C1  C2  C3  C4  C5
0   9   8   9   7   3
1  12   9  10   1   3
2  11  10   0   4   6
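Note that passing `axis=1` to `groupby` is deprecated in newer pandas releases (since pandas 2.1). A minimal sketch of the same idea that avoids it is to transpose, group on the index, and transpose back. The input below simply hard-codes the values shown in Out[2] so that the result is reproducible; the names `summed` and `maxed` are only illustrative.

```python
import pandas as pd

# Same values as in Out[2], with the repeated column names kept as-is.
df = pd.DataFrame(
    [[5, 0, 9, 1, 7, 3, 3, 8],
     [3, 1, 10, 7, 1, 2, 3, 8],
     [1, 0, 0, 0, 4, 10, 6, 10]],
    columns=["C1", "C2", "C3", "C1", "C4", "C1", "C5", "C2"],
)

# After transposing, the repeated names sit in the index, so a plain
# groupby on index level 0 merges them; transpose back at the end.
summed = df.T.groupby(level=0).sum().T
maxed = df.T.groupby(level=0).max().T

print(summed)
print(maxed)
```

Here `summed` should match Out[3], and `maxed` is the `max` variant mentioned in the question.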