
pandas.read_csv leads to shifted column labels when dropping lines below header

I am trying to read a .csv file with pandas. Its header looks like this:

System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"

I'm using the following code to read it:
df = pd.read_csv('data.csv', sep=';', header=5, skiprows=[6,7], encoding='latin1')

However, my DataFrame ends up with "unit1", "unit2", "unit3", "unit4", "unit5", "unit6" as column labels instead of "Label1", "Label2", "Label3", "Label4", "Label5", "Label6".

With an older version of my csv file, however, the same import code works properly. The only difference I could spot between the files is that the older file has a full set of separators in the first 4 rows:

System Information_1;;;;;
System Information_2;;;;; 
etc.  

Does anyone know where this error comes from and how to solve it?

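You can also skip the first rows, but then do not set header to 5, because the label row is then row 0; you can simply leave header at its default so it is inferred: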

df = pd.read_csv('data.csv', sep=';', skiprows=[0,1,2,3,4,6,7], encoding='latin1')
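
A minimal sketch of this approach, using an in-memory stand-in for data.csv. The sample text, the three-column layout and the name df_skip are assumptions made for illustration, and quotes and spaces are left out for brevity:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-in for data.csv: four info lines, a blank line,
# then label / alternative label / unit rows followed by two data rows.
sample = (
    "System Information_1\n"
    "System Information_2\n"
    "System Information_3\n"
    "System Information_4\n"
    "\n"
    "Label1;Label2;Label3\n"
    "alt1;alt2;alt3\n"
    "unit1;unit2;unit3\n"
    "1;2;3\n"
    "10;20;30\n"
)

# Skip everything except the label row and the data rows. header is left
# at its default, so the first remaining row supplies the column labels.
df_skip = pd.read_csv(StringIO(sample), sep=";", skiprows=[0, 1, 2, 3, 4, 6, 7])

print(df_skip.columns.tolist())
print(df_skip)
```

Because every unwanted line is listed in skiprows, there is no header index to get wrong.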

You could use a list as your header argument:

import pandas as pd
from io import StringIO

data = """System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"
1;2;3;4;5;6
10;20;30;40;50;60
"""

# skiprows=[6, 7] drops the alternative-label and unit lines (0-based file line numbers).
# header=[4] then selects the label row: the blank line is not counted,
# because skip_blank_lines is True by default.
df = pd.read_csv(StringIO(data), sep=';', header=[4], skiprows=[6, 7], encoding='latin1')

gives:

[image of the resulting DataFrame]

The "header" parameter starts counting after the rows listed in "skiprows" have been removed.
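
To illustrate that counting rule, here is a small sketch on a made-up two-column file. The text and the names labels_df and units_df are hypothetical and not taken from the question:

```python
import pandas as pd
from io import StringIO

# Hypothetical file: two info lines, then label, alternative label
# and unit rows, then two data rows.
text = "info_a\ninfo_b\nLabel1;Label2\nalt1;alt2\nunit1;unit2\n1;2\n10;20\n"

# After skipping lines 0 and 1, the label row is row 0 of what remains.
labels_df = pd.read_csv(StringIO(text), sep=";", skiprows=[0, 1], header=0)

# header=2 is also counted on the remaining rows, so it lands on the unit row.
units_df = pd.read_csv(StringIO(text), sep=";", skiprows=[0, 1], header=2)

print(labels_df.columns.tolist())
print(units_df.columns.tolist())
```

A header index that is still counted from the top of the original file can therefore end up a few rows too low, which is one way to get the unit row as column labels.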

If you want to use the label as header:

df = pd.read_csv('pruebasof.csv', sep=';', skiprows=[0,1,2,3,4,6], encoding='latin1')

Otherwise, if you want to use the alternative label as header:

df = pd.read_csv('pruebasof.csv', sep=';', skiprows=6, encoding='latin1')

In both cases I kept the "units" row, so it is read as the first row of data under the labels.
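
As a rough sketch of what keeping the units looks like in practice, the snippet below reads a hypothetical in-memory sample with the label row as header and then separates the unit row from the numeric rows. The sample text and the names with_units, units and values are assumptions for illustration, and the splitting step is an addition, not part of the answer above:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-in for the csv file (quotes and spaces omitted).
raw = (
    "System Information_1\n"
    "System Information_2\n"
    "System Information_3\n"
    "System Information_4\n"
    "\n"
    "Label1;Label2;Label3\n"
    "alt1;alt2;alt3\n"
    "unit1;unit2;unit3\n"
    "1;2;3\n"
    "10;20;30\n"
)

# Drop the preamble and the alternative-label row (line 6) only,
# so the unit row becomes the first data row under the labels.
with_units = pd.read_csv(StringIO(raw), sep=";", skiprows=[0, 1, 2, 3, 4, 6])

# Optional extra step: pull the units out and convert the rest to numbers.
units = with_units.iloc[0]  # Series mapping each label to its unit
values = with_units.iloc[1:].reset_index(drop=True).apply(pd.to_numeric)

print(units.to_dict())
print(values)
```

While the unit row is still part of the data, every column holds text rather than numbers, which is why the sketch converts the remaining rows explicitly.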
