Python: Pandas - tidy DataFrame from a multi-header Excel sheet
I have an Excel sheet, exported from a software tool, that is structured in the following multi-header way.
Excel structure:
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | not relevant |
+---+-------+--------------+--------------+
| | | X1 | Y1 |
+---+-------+--------------+--------------+
|fr | Time | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12 | 32 |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23 | 3 |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45 | 4 |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4 | 1 |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | Y2 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3 | |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | X3 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23 | |
+---+-------+--------------+--------------+
CSV structure:
,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
I am looking for a fast way to convert this messy data into a tidy pandas DataFrame.
The end result should look as follows.
Time X1 Y1 Y2 X3
0.000 12 32 5 17
0.010 23 3 89 2
0.020 45 4 5 4
0.030 4 1 3 23
I did the following. I am not super happy about it, but it works.
import numpy as np
import pandas as pd

filename = 'test_data'
df = pd.read_excel(filename + '.xlsx', header=None)

# Split into blocks at every completely empty row.
df_list = np.split(df, df[df.isnull().all(axis=1)].index)
del df_list[0]  # the piece before the first empty row is empty

for i, block in enumerate(df_list):
    # Copy the signal names (X1, Y1, ...) into the 'fr'/'Time' header row.
    block.iloc[3, 2:] = block.iloc[2, 2:]
    block.columns = block.iloc[3]
    block = block.iloc[4:]
    # The frame column is labelled 'fr' in the sample data above.
    block = block.drop(['fr'], axis=1).set_index('Time')
    block = block.dropna(axis=1, how='all')
    block.columns.name = None
    df_list[i] = block

df = pd.concat(df_list, axis=1)
# Sort the columns alphabetically.
df = df.reindex(sorted(df.columns), axis=1)
df.to_csv(filename + '.csv')
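One possible alternative avoids np.split by labelling the blocks with a cumulative count of the empty rows and grouping on that label. This is only a minimal sketch. It assumes every block starts with a completely empty row, followed by one ignorable row, one row of signal names, and the fr/Time header row, exactly as in the sample. The function name tidy_blocks and the inline RAW string are illustrative. For the real file, pd.read_excel(filename + '.xlsx', header=None) should supply an equivalent raw frame.

```python
import io

import pandas as pd

# Sample data copied from the CSV structure in the question.
RAW = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""


def tidy_blocks(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn the stacked multi-header blocks into one frame indexed by Time."""
    # Each completely empty row starts a new block, so a running count of
    # empty rows gives every row the number of the block it belongs to.
    block_id = raw.isna().all(axis=1).cumsum()

    pieces = []
    for _, block in raw.groupby(block_id):
        block = block.dropna(how="all")          # drop the empty separator row
        block = block.dropna(axis=1, how="all")  # drop columns unused in this block
        names = block.iloc[1, 2:].tolist()       # second remaining row: X1, Y1, ...
        data = block.iloc[3:, 1:]                # skip 3 header rows and the fr column
        data = data.set_axis(["Time", *names], axis=1)
        data = data.apply(pd.to_numeric)         # values were read as text
        pieces.append(data.set_index("Time"))

    # Align the blocks side by side on their shared Time index.
    return pd.concat(pieces, axis=1)


raw = pd.read_csv(io.StringIO(RAW), header=None)
tidy = tidy_blocks(raw)
print(tidy)
```

Because the blocks are concatenated in the order they appear, the columns come out as X1, Y1, Y2, X3 without any sorting step. Calling tidy.reset_index() would turn Time back into an ordinary column if that is preferred.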