Python: How to quickly create a pandas data frame with only specific columns from a big excel sheet?
Python: Pandas - tidy data frame from multiple header excel sheet
I have an Excel sheet that is the output of a software tool, and it is laid out with multiple header rows as shown below. Excel structure:
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | not relevant |
+---+-------+--------------+--------------+
| | | X1 | Y1 |
+---+-------+--------------+--------------+
|fr | Time | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12 | 32 |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23 | 3 |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45 | 4 |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4 | 1 |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | Y2 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3 | |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | X3 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23 | |
+---+-------+--------------+--------------+
CSV structure:
,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
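To make the layout reproducible without the original workbook, here is a minimal sketch that loads the CSV text above as a raw frame and locates the empty separator rows. The names `SAMPLE_CSV` and `load_raw` are made up for this illustration, and reading everything as strings is an assumption chosen to keep the header rows and the data rows in one frame.

```python
import io

import pandas as pd

# The CSV sample from above, verbatim.
SAMPLE_CSV = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""


def load_raw(text):
    """Read the text with no header row, so every line stays a data row."""
    return pd.read_csv(io.StringIO(text), header=None, dtype=str)


raw = load_raw(SAMPLE_CSV)

# Rows in which every cell is empty mark the start of a new block.
blank_rows = raw.index[raw.isna().all(axis=1)].tolist()
print(raw.shape, blank_rows)
```

Each block therefore starts with one empty row, followed by a "not relevant" row, the row with the signal names, the `fr`/`Time` header row, and then the data rows.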
I am looking for a quick way to turn this messy data into a tidy pandas dataframe.
The final result should look like this.
Time X1 Y1 Y2 X3
0.000 12 32 5 17
0.010 23 3 89 2
0.020 45 4 5 4
0.030 4 1 3 23
This is what I did. I am not very happy with it, but it works.
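import numpy as np
import pandas as pd

filename = 'test_data'
df = pd.read_excel(filename + '.xlsx', header=None)

# Split the sheet at every completely empty row.
# The piece in front of the first empty row is empty, so drop it.
df_list = np.split(df, df[df.isnull().all(axis=1)].index)
del df_list[0]

for i, block in enumerate(df_list):
    block = block.copy()  # work on a copy, not on a view of the sheet
    # Copy the signal names (X1, Y1, ...) into the 'fr'/'Time' header row.
    block.iloc[3, 2:] = block.iloc[2, 2:]
    block.columns = block.iloc[3]
    block = block.iloc[4:]
    # Drop the frame counter (labelled 'fr' in the sample above) and index by time.
    block = block.drop(columns=['fr']).set_index('Time')
    # Remove the unused, completely empty columns.
    block = block.dropna(axis=1, how='all')
    block.columns.name = None
    df_list[i] = block

df = pd.concat(df_list, axis=1)
df = df.reindex(sorted(df.columns), axis=1)  # sorts the columns alphabetically
df.to_csv(filename + '.csv')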
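One possible alternative, given only as a sketch: label each block with a running count of the empty rows and let `groupby` do the splitting, which avoids `np.split` and keeps the columns in the order they appear in the sheet. The function name `tidy_blocks` and the shortened `demo` text are made up for this illustration. The sketch assumes that every block has the same layout as in the sample (one "not relevant" row, one row of signal names, one `fr`/`Time` row, then data) and that all blocks share the same `Time` values.

```python
import io

import pandas as pd


def tidy_blocks(raw):
    """Turn a header-less frame of stacked blocks into one tidy frame.

    Assumed layout: blocks are separated by completely empty rows, the
    second column holds the time, signals start in the third column.
    """
    is_blank = raw.isna().all(axis=1)
    block_id = is_blank.cumsum()  # increases by one at every empty row
    pieces = []
    for _, block in raw[~is_blank].groupby(block_id[~is_blank]):
        names = block.iloc[1, 2:]      # row with the signal names (X1, Y1, ...)
        data = block.iloc[3:]          # rows below the fr/Time header row
        keep = names.dropna().index    # only columns that carry a signal name
        piece = data[keep].copy()
        piece.columns = names.loc[keep].tolist()
        piece.index = pd.Index(pd.to_numeric(data.iloc[:, 1]), name="Time")
        pieces.append(piece.apply(pd.to_numeric))
    # Align the blocks on Time and put them side by side.
    return pd.concat(pieces, axis=1).rename_axis("Time").reset_index()


# Shortened stand-in for the real sheet: two blocks, two data rows each.
demo = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
"""

raw = pd.read_csv(io.StringIO(demo), header=None, dtype=str)
tidy = tidy_blocks(raw)
print(tidy)
```

The same function should also accept the frame returned by `pd.read_excel(..., header=None)`, provided the sheet really follows the assumed layout. I have not checked that against the real workbook.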