Parsing (from text) a table with a two-line header

Question

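I am parsing the output of a .ipynb notebook. The output was produced as plain text (using print) rather than as a dataframe (without print), in the spirit of the following: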

print( athletes.groupby('NOC').count() )

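I have come up with hacks for various cases (e.g. using pandas.read_fwf()), but I was wondering whether anyone has an idea for a more elegant solution.

It keeps bothering me that pandas cannot parse the default printed form of a pandas DataFrame, which seems odd (bad design?).

Edit: added more examples in addition to the first table.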

Table 1

                            Name  Discipline
NOC                                         
United States of America     615         615
Japan                        586         586
Australia                    470         470
People's Republic of China   401         401
Germany                      400         400

Table 2

                                     Name
NOC                      Discipline
United States of America Athletics    144
Germany                  Athletics     95
Great Britain            Athletics     75
Italy                    Athletics     73
Japan                    Athletics     70
Bermuda                  Triathlon      1
Libya                    Athletics      1
Palestine                Athletics      1
San Marino               Swimming       1
Kiribati                 Athletics      1

Table 3

                       Name       NOC Discipline
1410             CA Liliana  Portugal  Athletics
1411   CABAL Juan-Sebastian  Colombia     Tennis
1412        CABALLERO Denia      Cuba  Athletics
1413  CABANA PEREZ Cristina     Spain       Judo
1414          CABECINHA Ana  Portugal  Athletics

Answer 1

Assuming the following input:

text = '''                            Name  Discipline
NOC                                         
United States of America     615         615
Japan                        586         586
Australia                    470         470
People's Republic of China   401         401
Germany                      400         400'''

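You can use pandas.read_csv with a regex separator that matches two or more whitespace characters, r'\s\s+':

import pandas as pd
import io

# A regex separator requires the Python parsing engine.
# The raw string avoids an invalid-escape warning for '\s'.
df = pd.read_csv(io.StringIO(text), sep=r'\s\s+', engine='python')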

Output:

>>> df.index
Index(['United States of America', 'Japan', 'Australia',
       "People's Republic of China", 'Germany'],
      dtype='object', name='NOC')

>>> df.columns
Index(['Name', 'Discipline'], dtype='object')

>>> df
                            Name  Discipline
NOC                                         
United States of America     615         615
Japan                        586         586
Australia                    470         470
People's Republic of China   401         401
Germany                      400         400
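
To see why the separator needs two or more whitespace characters, and where that assumption stops holding, here is a minimal sketch using only the standard library re module. The variable names (two_or_more_spaces, row, multiindex_row) are illustrative and not part of the original answer. The second sample row is copied from Table 2 in the question.

```python
import re

# Same pattern as the sep argument above: a run of two or more whitespace characters.
two_or_more_spaces = re.compile(r'\s\s+')

row = "United States of America     615         615"

# Splitting on any whitespace breaks the multi-word country name apart.
print(re.split(r'\s+', row))

# Splitting on two or more whitespace characters keeps the name in one field.
print(two_or_more_spaces.split(row))

# A row shaped like Table 2: only one space separates the two index levels,
# so both levels end up glued together in a single field.
multiindex_row = "United States of America Athletics    144"
print(two_or_more_spaces.split(multiindex_row))
```

The last case suggests that this separator is only reliable when every pair of adjacent columns is separated by at least two spaces on every row, as in Table 1. For layouts like Table 2, a fixed-width approach such as pandas.read_fwf with explicit column boundaries is probably still needed.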

Solution 1
1 2021-10-02 05:44:33
