Python: Pandas - tidy dataframe from multiple headers

Question

I have an Excel sheet exported from a software tool. It is structured in the following multi-header way.

Excel structure:

+---+-------+--------------+--------------+
|   |       |              |              |
+---+-------+--------------+--------------+
|   |       | not relevant | not relevant |
+---+-------+--------------+--------------+
|   |       | X1           | Y1           |
+---+-------+--------------+--------------+
|fr | Time  | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12           | 32           |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23           | 3            |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45           | 4            |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4            | 1            |
+---+-------+--------------+--------------+
|   |       |              |              |
+---+-------+--------------+--------------+
|   |       | not relevant |              |
+---+-------+--------------+--------------+
|   |       | Y2           |              |
+---+-------+--------------+--------------+
|fr | Time  | not relevant |              |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5            |              |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89           |              |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5            |              |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3            |              |
+---+-------+--------------+--------------+
|   |       |              |              |
+---+-------+--------------+--------------+
|   |       | not relevant |              |
+---+-------+--------------+--------------+
|   |       | X3           |              |
+---+-------+--------------+--------------+
|fr | Time  | not relevant |              |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17           |              |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2            |              |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4            |              |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23           |              |
+---+-------+--------------+--------------+

CSV structure:

,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,

I am looking for a fast way to convert this messy data into a tidy pandas dataframe.

The timestamps are identical in value and number for each individual sub-series.
The number of sub-series is variable.

The end result should look as follows.

  Time    X1     Y1     Y2     X3  
  0.000   12     32     5      17    
  0.010   23     3      89     2     
  0.020   45     4      5      4     
  0.030   4      1      3      23

Answer 1

I did the following. I'm not super happy about it, but it works.

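import pandas as pd

filename = 'test_data'

df = pd.read_excel(filename + '.xlsx', header=None)

# Each block starts with a completely empty row; use those rows as split points.
blank_rows = df.index[df.isnull().all(axis=1)].tolist()
bounds = blank_rows + [len(df)]
df_list = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for i, block in enumerate(df_list):
    block = block.copy()

    # Copy the series names (X1, Y1, ...) over the "not relevant" cells of the header row.
    block.iloc[3, 2:] = block.iloc[2, 2:]
    block.columns = block.iloc[3]

    # Keep only the data rows, drop the frame counter, and index by time.
    block = block.iloc[4:]
    # The sample sheet labels the frame column 'fr'; 'Frame' is kept in case the real file uses it.
    block = block.drop(columns=['fr', 'Frame'], errors='ignore')
    block = block.set_index('Time')
    # Padding columns without a series name contain only NaN and are removed here.
    block = block.dropna(axis=1, how='all')
    block.columns.name = None

    df_list[i] = block

df = pd.concat(df_list, axis=1)
# Sorts the columns alphabetically (X1, X3, Y1, Y2); drop this line to keep the sheet order.
df = df.reindex(sorted(df.columns), axis=1)

df.to_csv(filename + '.csv')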
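
For comparison, here is a minimal sketch of the same idea run directly on the CSV text from the question, so it can be tried without the Excel file. It is not the original author's method. It labels blocks with a running count of blank rows and groups on that label. The helper name tidy_blocks and the RAW constant are made up for illustration, and the sketch assumes that every block has the layout shown above: a "not relevant" row, a row of series names, the fr/Time header row, then the data rows, with fr and Time always in the first two columns.

```python
import io

import pandas as pd

# Sample data copied from the CSV structure in the question.
RAW = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""


def tidy_blocks(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn the stacked blocks into one wide frame."""
    # A completely empty row marks the start of a new block, so a running
    # count of blank rows gives every row the number of its block.
    is_blank = raw.isna().all(axis=1)
    block_id = is_blank.cumsum()

    pieces = []
    for _, block in raw[~is_blank].groupby(block_id[~is_blank]):
        # Assumed layout inside a block (blank row already removed):
        # row 0 = "not relevant", row 1 = series names, row 2 = fr/Time header.
        names = block.iloc[1, 2:]
        data = block.iloc[3:].copy()
        data.columns = ["fr", "Time", *names]
        # Columns without a series name are padding; keep only the named ones.
        keep = ["Time", *names.dropna()]
        pieces.append(data[keep].set_index("Time"))

    # All blocks share the same timestamps, so they align on the Time index.
    wide = pd.concat(pieces, axis=1).rename_axis("Time").reset_index()
    return wide.apply(pd.to_numeric)


# dtype=str keeps every cell as text until the blocks have been separated.
raw = pd.read_csv(io.StringIO(RAW), header=None, dtype=str)
tidy = tidy_blocks(raw)
print(tidy)
```

Because the blocks are concatenated in the order they appear and the columns are not sorted afterwards, the result should keep the column order asked for in the question (Time, X1, Y1, Y2, X3). For the Excel file, the same helper could presumably be fed the frame returned by pd.read_excel(..., header=None), although that has not been checked against the real export.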
