
Python: Pandas - tidy data frame from multiple header Excel sheet

I have an Excel sheet output from a software tool that is structured in the following multi-header way. Excel structure:

+---+-------+--------------+--------------+
|   |       |              |              |
+---+-------+--------------+--------------+
|   |       | not relevant | not relevant |
+---+-------+--------------+--------------+
|   |       | X1           | Y1           |
+---+-------+--------------+--------------+
|fr | Time  | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12           | 32           |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23           | 3            |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45           | 4            |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4            | 1            |
+---+-------+--------------+--------------+
|   |       |              |              |
+---+-------+--------------+--------------+
|   |       | not relevant |              |
+---+-------+--------------+--------------+
|   |       | Y2           |              |
+---+-------+--------------+--------------+
|fr | Time  | not relevant |              |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5            |              |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89           |              |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5            |              |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3            |              |
+---+-------+--------------+--------------+
|   |       |              |              |
+---+-------+--------------+--------------+
|   |       | not relevant |              |
+---+-------+--------------+--------------+
|   |       | X3           |              |
+---+-------+--------------+--------------+
|fr | Time  | not relevant |              |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17           |              |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2            |              |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4            |              |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23           |              |
+---+-------+--------------+--------------+

CSV structure:

,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,

I am looking for a fast way to convert this messy data into a tidy pandas dataframe.

  • The timestamps are identical in value and number for each individual sub-series.
  • The number of sub-series is variable.
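
Since lining the sub-series up side by side relies on the first point, it can be worth checking before reshaping. The following is only a minimal sketch, assuming the CSV export shown above is available as text; the names `SAMPLE_CSV`, `block_id` and `per_block` are made up for illustration.

```python
import io

import pandas as pd

# The CSV structure from the question, held in a string for illustration.
SAMPLE_CSV = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""

raw = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None)

# Every all-empty row opens a new block, so a running count of those rows
# gives each row the number of the block it belongs to.
block_id = raw.isnull().all(axis=1).cumsum()

# Column 1 holds the timestamps; header cells such as "Time" become NaN.
time = pd.to_numeric(raw[1], errors="coerce")
times = pd.DataFrame({"block": block_id, "time": time}).dropna()

# One tuple of timestamps per block; they all have to be the same tuple.
per_block = times.groupby("block")["time"].apply(tuple)
identical = per_block.nunique() == 1
print(len(per_block), "blocks, identical timestamps:", identical)
```

Because the blocks are found by counting blank rows, the check does not need to know the number of sub-series in advance.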

The end result should look as follows.

  Time    X1     Y1     Y2     X3  
  0.000   12     32     5      17    
  0.010   23     3      89     2     
  0.020   45     4      5      4     
  0.030   4      1      3      23 

I did the following... not super happy about it, but it works.

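import pandas as pd

filename = 'test_data'

# Read without a header so every row, including the blank separator rows, is kept.
raw = pd.read_excel(filename + '.xlsx', header=None)

# Each all-empty row marks the start of a new sub-series block.
starts = list(raw.index[raw.isnull().all(axis=1)])
bounds = starts + [len(raw)]
df_list = [raw.iloc[a:b].copy() for a, b in zip(bounds[:-1], bounds[1:])]

for i, block in enumerate(df_list):

    # Copy the series names (X1, Y1, ...) over the "not relevant" cells of the header row.
    block.iloc[3, 2:] = block.iloc[2, 2:]

    # Use the fourth row of the block (fr, Time, X1, ...) as column names.
    block.columns = block.iloc[3]

    # Keep only the data rows, drop the frame counter and index by time.
    block = block.iloc[4:]
    block = block.drop(['fr'], axis=1)  # the frame column is labelled 'fr' in the sheet above
    block = block.set_index('Time')
    block = block.dropna(axis=1, how='all')  # drop unused, empty columns
    block.columns.name = None

    df_list[i] = block

# The timestamps are identical in every block, so the blocks align on the Time index.
df = pd.concat(df_list, axis=1)
# Sorts the series alphabetically (X1, X3, Y1, Y2); drop this line to keep the sheet order.
df = df.reindex(sorted(df.columns), axis=1)

df.to_csv(filename + '.csv')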
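
A possible variation on the same idea, offered only as a sketch: label the blocks with a running count of blank rows and let `groupby` do the splitting, which also keeps the series in the order they appear in the sheet. It assumes the layout shown in the question (a blank row, a "not relevant" row, a row of series names, a header row, then data, with the sheet starting on a blank row) and reads the CSV sample from a string. `tidy_blocks` and `SAMPLE_CSV` are hypothetical names, not part of pandas.

```python
import io

import pandas as pd

# The CSV structure from the question, held in a string for illustration.
SAMPLE_CSV = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""


def tidy_blocks(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn the stacked blocks into one frame with a Time column (assumed layout)."""
    # Every all-empty row opens a new block; cumsum() turns that into a block id.
    block_id = raw.isnull().all(axis=1).cumsum()
    pieces = []
    for _, block in raw.groupby(block_id):
        # Third row of the block: series names in the value columns (e.g. X1, Y1).
        names = block.iloc[2, 2:].dropna()
        # Everything after the four leading rows is data.
        body = block.iloc[4:]
        values = body[names.index].apply(pd.to_numeric)
        values.columns = list(names)
        # Second column holds the timestamps shared by all blocks.
        values.index = pd.Index(pd.to_numeric(body.iloc[:, 1]), name="Time")
        pieces.append(values)
    # Identical timestamps let the blocks line up side by side.
    return pd.concat(pieces, axis=1).rename_axis("Time").reset_index()


raw = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None)
tidy = tidy_blocks(raw)
print(tidy)
```

For the Excel file, the same function could presumably be fed the result of `pd.read_excel(..., header=None)`, provided the blank separator rows survive the import. A block shorter than the assumed layout would raise an `IndexError`, which this sketch does not handle.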
