How to read a space separated file, with some needed, misformatted rows, into pandas?
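The space-separated data shown at the end of this answer is stored in a file named test.csv.
The first two rows start with '# ' and are not part of the table, so skip them with skiprows=[0, 1].
Two columns are named 'Formatted Data', so pandas appends '.1' to the duplicate column name.
Rename the columns with df = df.rename(columns={'Formatted Data': 'some name', 'Formatted Data.1': 'some other name'}).
If sep=' ' does not work, try sep='\\s+'.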
import pandas as pd
# read the data in and skip the first 2 rows
df = pd.read_csv('test.csv', skiprows=[0, 1], sep=' ')
# display(df)
Frequency Formatted Data Formatted Data.1
0 3.0 2.1 0.0
1 3.0 2.1 0.1
2 3.0 2.1 0.2
3 3.0 2.1 0.3
4 3.0 2.1 0.4
5 3.0 2.1 0.5
6 3.0 2.1 0.6
# save to Excel
df.to_excel('Setup_Loss.xlsx', index=False, header=True)
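
The rename step mentioned above is not part of the snippet, so here is a minimal sketch of it. It assumes the same layout as test.csv but reads a shortened copy from an in-memory string so that it runs on its own. The names 'trace_a' and 'trace_b' are placeholders chosen for illustration, not names from the original data.

```python
import io
import pandas as pd

# shortened copy of test.csv, held in memory so the sketch is self-contained
raw = '''# Channel 1
# Trace 1
Frequency "Formatted Data" "Formatted Data"
3.0 2.1 0.0
3.0 2.1 0.1
'''

# skip the two '#' rows; the third row becomes the header
df = pd.read_csv(io.StringIO(raw), skiprows=[0, 1], sep=' ')

# pandas has already turned the second 'Formatted Data' into 'Formatted Data.1',
# so both columns can be addressed separately in the rename mapping
# ('trace_a' and 'trace_b' are hypothetical names)
df = df.rename(columns={'Formatted Data': 'trace_a', 'Formatted Data.1': 'trace_b'})
```

After the rename every column has a unique label, which makes later selection by name unambiguous.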
Alternatively, use open to read the file in, clean the rows, and then add them to a dataframe.
import pandas as pd
# read the file in to clean the headers, and split the data
with open('test.csv') as f:
    rows = f.readlines()
# select the header rows and clean them
h1 = rows[0].strip().split('# ')[1]
h2 = rows[1].strip().split('# ')[1]
h3 = [v.replace('"', '') for v in rows[2].strip().split(' "')]
# select and split the data
data = [r.strip().split(' ') for r in rows[3:]]
# create the dataframe
df = pd.DataFrame(data, columns=h3)
# add h1 and h2 as multi-level headers
df.columns = pd.MultiIndex.from_product([[h1], [h2], df.columns])
# save to Excel
# writing multi-level column headers to Excel requires index=True (the default)
df.to_excel('Setup_Loss.xlsx', header=True)
# alternatively, save as a csv
df.to_csv('updated_test.csv', index=False)
# display(df)
Channel 1
Trace 1
Frequency Formatted Data Formatted Data
0 3.0 2.1 0.0
1 3.0 2.1 0.1
2 3.0 2.1 0.2
3 3.0 2.1 0.3
4 3.0 2.1 0.4
5 3.0 2.1 0.5
6 3.0 2.1 0.6
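
One caveat with the open approach: str.split returns text, so every value in the resulting dataframe is still a string rather than a number. The sketch below shows one possible way to convert the values, assuming all data cells are numeric as they are in test.csv. It uses two sample rows in place of the file.

```python
import pandas as pd

# two sample lines in the same layout as the data rows of test.csv
rows = ['3.0 2.1 0.0\n', '3.0 2.1 0.1\n']
data = [r.strip().split(' ') for r in rows]

df = pd.DataFrame(data, columns=['Frequency', 'Formatted Data', 'Formatted Data'])

# the cells hold text such as '3.0'; convert the whole frame to floats
df_numeric = df.astype(float)
```

Doing the conversion before the multi-level header is attached keeps the step simple, and the numeric values are then written to Excel as numbers instead of text.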
test.csv
# Channel 1
# Trace 1
Frequency "Formatted Data" "Formatted Data"
3.0 2.1 0.0
3.0 2.1 0.1
3.0 2.1 0.2
3.0 2.1 0.3
3.0 2.1 0.4
3.0 2.1 0.5
3.0 2.1 0.6