My input Excel (.xlsx) file has a format like:
mz n n n n g_1 g_1 g_2 g_2 g_2
1 2 3 4 5 6 7 8 8 8
1 2 3 4 5 6 7 8 8 8
1 2 3 4 5 6 7 8 8 8
1 2 3 4 5 6 7 8 8 8
When I read the file with pd.read_excel, pandas appends a numeric suffix (.1, .2, ...) to every repeated column name, like this:
mz n n.1 n.2 n.3 g_1 g_1.1 g_2 g_2.1 g_2.2
1 2 3 4 5 6 7 8 8 8
1 2 3 4 5 6 7 8 8 8
1 2 3 4 5 6 7 8 8 8
so I can't use groupby to combine the 'n' columns, the 'g_1' columns, and so forth. Is there a way to make groupby work on these groups? I tried merging the column headers of the same type, but to no avail.
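A minimal sketch that reproduces the renaming without needing an .xlsx file. It assumes pd.read_csv on an in-memory CSV as a stand-in for pd.read_excel (the variable name raw is hypothetical); both readers deduplicate repeated header names by appending ".1", ".2", and so on.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Excel sheet: a CSV with the same duplicate headers.
# read_csv is used only so the example runs without an .xlsx file or openpyxl.
raw = """mz,n,n,n,n,g_1,g_1,g_2,g_2,g_2
1,2,3,4,5,6,7,8,8,8
1,2,3,4,5,6,7,8,8,8
1,2,3,4,5,6,7,8,8,8
"""

df = pd.read_csv(io.StringIO(raw))

# Repeated names get ".1", ".2", ... appended so that every label is unique.
print(list(df.columns))
# ['mz', 'n', 'n.1', 'n.2', 'n.3', 'g_1', 'g_1.1', 'g_2', 'g_2.1', 'g_2.2']
```

The suffix always follows the last dot, which is why splitting on '.' and keeping the first part recovers the original header, as long as the real header names contain no dots themselves.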
Edit: The answer I chose solved the question. However, I have one additional question: when I apply the code from the answer, the columns of the resulting grouped dataframe are out of order. Is there a way to preserve the original order of the column names? Thanks!
IIUC, use split, then group on the first part before the '.':
df.groupby(df.columns.str.split('.').str[0], axis=1).sum()
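The one-liner above passes axis=1 to groupby, which pandas deprecated in version 2.1 (it now emits a FutureWarning). A minimal sketch of the same idea that avoids axis=1 by transposing, assuming the example frame from this answer is rebuilt by hand (columns and base_names are illustrative names):

```python
import pandas as pd

# Rebuild the example frame shown in this answer (three identical rows).
columns = ["mz", "n", "n.1", "n.2", "n.3", "g_1", "g_1.1", "g_2", "g_2.1", "g_2.2"]
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8, 8, 8]] * 3, columns=columns)

# Strip the ".1", ".2", ... suffix to recover the original header names.
base_names = df.columns.str.split(".").str[0]

# Group the rows of df.T (one row per original column) by base name,
# sum each group, then transpose back so the groups become columns again.
result = df.T.groupby(base_names).sum().T
print(result)
```

Transposing is cheap and lossless here because every column is an integer; with mixed dtypes, df.T would upcast everything to object.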
Output:
g_1 g_2 mz n
0 13 24 1 14
1 13 24 1 14
2 13 24 1 14
Where df is:
mz n n.1 n.2 n.3 g_1 g_1.1 g_2 g_2.1 g_2.2
0 1 2 3 4 5 6 7 8 8 8
1 1 2 3 4 5 6 7 8 8 8
2 1 2 3 4 5 6 7 8 8 8
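On the follow-up about column order: groupby sorts the group keys by default, which is why the output lists g_1, g_2, mz, n alphabetically. A minimal sketch, assuming the same example frame, that passes sort=False so the groups keep the order in which their names first appear:

```python
import pandas as pd

# Same example frame as above.
columns = ["mz", "n", "n.1", "n.2", "n.3", "g_1", "g_1.1", "g_2", "g_2.1", "g_2.2"]
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8, 8, 8]] * 3, columns=columns)

base_names = df.columns.str.split(".").str[0]

# sort=False keeps groups in first-appearance order instead of sorting the keys.
result = df.T.groupby(base_names, sort=False).sum().T
print(list(result.columns))
# ['mz', 'n', 'g_1', 'g_2']
```

On pandas versions that still accept axis=1, adding sort=False to the original call should give the same column order.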