How to read a space-delimited file (with some needed, malformed rows) into Pandas?

Question

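I modified a CSV file and copied all of its contents into an Excel file.

I am trying to split the list in one column into 2 other columns.

This is the code I used to copy the CSV file into the new Excel file: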

read_file = pd.read_csv('new_names.csv', sep='\t')

read_file.to_excel('Setup_Loss.xlsx', index=None, header=True)

The problem is that I want each number split into its own column.

This is my result:

Answer 1

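The data in the file test.csv (contents shown at the end of this answer) is space-delimited.
The first two rows are not formatted as data rows, so they are skipped with skiprows=[0, 1].
Two of the columns are both named 'Formatted Data', so Pandas appends '.1' to the duplicate column name.
- Rename the columns with df = df.rename(columns={'Formatted Data': 'some name', 'Formatted Data.1': 'some other name'}).
If sep=' ' does not work, try sep='\\s+'.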

import pandas as pd

# read the data in and skip the first 2 rows
df = pd.read_csv('test.csv', skiprows=[0, 1], sep=' ')

# display(df)
   Frequency  Formatted Data  Formatted Data.1
0        3.0             2.1               0.0
1        3.0             2.1               0.1
2        3.0             2.1               0.2
3        3.0             2.1               0.3
4        3.0             2.1               0.4
5        3.0             2.1               0.5
6        3.0             2.1               0.6

# save to Excel
df.to_excel('Setup_Loss.xlsx', index=None, header=True)
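
The sep and rename notes above can be combined in one small sketch. To stay self-contained, it assumes the sample data is embedded in a string and read through io.StringIO instead of from test.csv, and the new column names value_a and value_b are hypothetical placeholders, not names from the original data.

```python
import io

import pandas as pd

# A shortened copy of the sample data, embedded so the sketch runs on its own
raw = '''# Channel 1
# Trace 1
Frequency "Formatted Data" "Formatted Data"
3.0 2.1 0.0
3.0 2.1 0.1
'''

# r'\s+' treats any run of whitespace as a single delimiter
df = pd.read_csv(io.StringIO(raw), skiprows=[0, 1], sep=r'\s+')

# pandas has already renamed the duplicate header to 'Formatted Data.1';
# value_a and value_b are placeholder names chosen for this example
df = df.rename(columns={'Formatted Data': 'value_a',
                        'Formatted Data.1': 'value_b'})
```

Using a raw string (r'\s+') avoids having to double the backslash, and a whitespace regex also tolerates files where columns are separated by more than one space.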

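Alternatively

If the information in rows 1 and 2 is needed in the new file,
the file can be read with open, the rows cleaned, and then loaded into a dataframe.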

import pandas as pd

# read the file in to clean the headers, and split the data
with open('test.csv') as f:
    rows = list(f.readlines())
    
    # select the header rows and clean them
    h1 = rows[0].strip().split('# ')[1]
    h2 = rows[1].strip().split('# ')[1]
    h3 = [v.replace('"', '') for v in rows[2].strip().split(' "')]
    
    # select and split the data
    data = [r.strip().split(' ') for r in rows[3:]]

# create the dataframe
df = pd.DataFrame(data, columns=h3)

# add h1 and h2 as multi-level headers
df.columns = pd.MultiIndex.from_product([[h1], [h2], df.columns])

# save to Excel
# in order to save multi-level headers to Excel, the index must be True
df.to_excel('Setup_Loss.xlsx', header=True)

# alternatively, save as a csv
df.to_csv('updated_test.csv', index=False)

# display(df)
                                Channel 1                              
                                  Trace 1                              
  Frequency Formatted Data Formatted Data
0       3.0            2.1            0.0
1       3.0            2.1            0.1
2       3.0            2.1            0.2
3       3.0            2.1            0.3
4       3.0            2.1            0.4
5       3.0            2.1            0.5
6       3.0            2.1            0.6
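
One thing to keep in mind with the manual approach: each row comes from str.split, so every cell holds a string and not a number. Below is a minimal sketch of converting the values, assuming every data column is numeric (true for the sample file, but an assumption for other files). The two rows are copied from the sample data for illustration.

```python
import pandas as pd

# Rows as they look after r.strip().split(' ') in the code above
data = [['3.0', '2.1', '0.0'], ['3.0', '2.1', '0.1']]
columns = ['Frequency', 'Formatted Data', 'Formatted Data']

df = pd.DataFrame(data, columns=columns)

# All cells are strings at this point; convert every column to float
df = df.astype(float)
```

Doing the conversion before adding the multi-level headers and calling to_excel should cause the cells to be written to Excel as numbers instead of text.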

`test.csv`

# Channel 1
# Trace 1
Frequency "Formatted Data" "Formatted Data"
3.0 2.1 0.0
3.0 2.1 0.1
3.0 2.1 0.2
3.0 2.1 0.3
3.0 2.1 0.4
3.0 2.1 0.5
3.0 2.1 0.6

Solution 1
1 2020-09-28 05:32:20
