pandas read_csv: header/skiprows not working

Hi all,

This is my first time asking a question here, so I apologize if it is poorly formatted. Please let me know how I can improve it.

I'm looking for a better understanding of the header and skiprows parameters of the pandas.read_csv() function.

Here is a sample of the raw data I'm trying to read into Python:

MiniSonde 5 43656
"Log File Name : lwrhyp_deploy_20170104"
"Setup Date (MMDDYY) : 010417"
"Setup Time (HHMMSS) : 114539"
"Starting Date (MMDDYY) : 010417"
"Starting Time (HHMMSS) : 140000"
"Stopping Date (MMDDYY) : 123169"
"Stopping Time (HHMMSS) : 235959"
"Interval (HHMMSS) : 010000"
"Sensor warmup (HHMMSS) : 000100"
"Circltr warmup (HHMMSS) : 000030"


"Date","Time","","Temp","","SpCond","","Sal","","Dep25","","TDG","","TDG","","LDO%","","LDO","","IBatt",""
"MMDDYY","HHMMSS","","øC","","mS/cm","","ppt","","meters","","mmHg","","psia","","Sat","","mg/l","","Volts",""

01/04/17,14:00:00,"",7.97,"",.0691,"",.02,"",.75,"",735,"",14.22,"",52.7,"",6.15,"",11.4,""
01/04/17,15:00:00,"",7.9,"",.0692,"",.02,"",.76,"",736,"",14.23,"",52.8,"",6.17,"",11.4,""
01/04/17,16:00:00,"",7.89,"",.0694,"",.02,"",.77,"",736,"",14.23,"",52.3,"",6.12,"",11.4,""
01/04/17,17:00:00,"",7.88,"",.0699,"",.02,"",.78,"",735,"",14.21,"",51.8,"",6.06,"",11.4,""
01/04/17,18:00:00,"",7.85,"",.0699,"",.02,"",.78,"",733,"",14.18,"",51.3,"",6.01,"",11.4,""
01/04/17,19:00:00,"",7.83,"",.0706,"",.02,"",.78,"",731,"",14.14,"",51.3,"",6.01,"",11.4,""
01/04/17,20:00:00,"",7.81,"",.0706,"",.02,"",.79,"",730,"",14.12,"",51.1,"",5.99,"",11.4,""
01/04/17,21:00:00,"",7.81,"",.0699,"",.02,"",.79,"",730,"",14.11,"",50.8,"",5.95,"",11.4,""
01/04/17,22:00:00,"",7.76,"",.0702,"",.02,"",.8,"",729,"",14.1,"",50.5,"",5.92,"",11.3,""
01/04/17,23:00:00,"",7.76,"",.0704,"",.02,"",.8,"",729,"",14.09,"",50.5,"",5.93,"",11.3,""
01/05/17,00:00:00,"",7.76,"",.07,"",.02,"",.8,"",729,"",14.09,"",50.5,"",5.92,"",11.3,""

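I'm trying to use either the row starting with "Date" or the row starting with "MMDDYY" as the header row. When I open the raw data in a text editor, the "Date" row is line 14, which would be line 13 in zero-indexed Python land.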

I used the following code, thinking it would skip the first 12 lines and start reading data at line 13:

test = pd.read_csv(filepath, skiprows=12, skip_blank_lines=True)

But this produces the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
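The error points at the file's encoding rather than at line counting: the byte 0xf8 can never start a UTF-8 sequence, and the sample's units row contains "øC". A quick check in plain Python, assuming that cell is stored as the single byte 0xf8 followed by "C" (the "ø" rendering suggests a legacy single-byte encoding; Latin-1 and code page 437 are guesses, not something the question states):

```python
# Assumed raw bytes of the temperature-unit cell in the "MMDDYY" row: 0xf8 then "C".
raw_unit = b"\xf8C"

try:
    raw_unit.decode("utf-8")  # UTF-8 is what read_csv tries by default
except UnicodeDecodeError as exc:
    utf8_error = str(exc)
    print(utf8_error)  # same message as the read_csv traceback

latin1_text = raw_unit.decode("latin-1")  # "øC", matching how the sample renders
cp437_text = raw_unit.decode("cp437")     # "°C", a plausible intended degree sign
print(latin1_text, cp437_text)
```

If that assumption holds, passing encoding="latin-1" (or "cp437", which would turn the cell into "°C") to read_csv should get past the error.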

After a lot of trial and error, I found that the following code produces the kind of result I'm after, but I don't understand why it works:

test = pd.read_csv(filepath, skiprows=[14], header=11, skip_blank_lines=True)

I don't understand how read_csv counts the rows. Am I wrong in thinking the header row is on row 13 rather than row 11? And why does the code only work with skiprows=[14]?

As a side note, is there a way to prevent the blank columns in the raw data from being read into the dataframe?
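For the side note, one option is to let pandas read the empty "" headings (it names them "Unnamed: <position>") and then keep only the columns whose names do not start with "Unnamed". The sketch below is a hedged, self-contained reproduction: it assumes the file is Latin-1 encoded (so the 0xf8 byte decodes instead of raising), reuses the question's header=11 and skiprows=[14], and keeps only three data rows of the sample. The duplicated "TDG" heading comes back as "TDG.1", pandas' default de-duplication.

```python
import io

import pandas as pd

# Sample from the question: 11 metadata lines, 2 blank lines, the "Date" row,
# the "MMDDYY" units row, a blank line, then data (three data rows kept).
SAMPLE_TEXT = '''MiniSonde 5 43656
"Log File Name : lwrhyp_deploy_20170104"
"Setup Date (MMDDYY) : 010417"
"Setup Time (HHMMSS) : 114539"
"Starting Date (MMDDYY) : 010417"
"Starting Time (HHMMSS) : 140000"
"Stopping Date (MMDDYY) : 123169"
"Stopping Time (HHMMSS) : 235959"
"Interval (HHMMSS) : 010000"
"Sensor warmup (HHMMSS) : 000100"
"Circltr warmup (HHMMSS) : 000030"


"Date","Time","","Temp","","SpCond","","Sal","","Dep25","","TDG","","TDG","","LDO%","","LDO","","IBatt",""
"MMDDYY","HHMMSS","","øC","","mS/cm","","ppt","","meters","","mmHg","","psia","","Sat","","mg/l","","Volts",""

01/04/17,14:00:00,"",7.97,"",.0691,"",.02,"",.75,"",735,"",14.22,"",52.7,"",6.15,"",11.4,""
01/04/17,15:00:00,"",7.9,"",.0692,"",.02,"",.76,"",736,"",14.23,"",52.8,"",6.17,"",11.4,""
01/04/17,16:00:00,"",7.89,"",.0694,"",.02,"",.77,"",736,"",14.23,"",52.3,"",6.12,"",11.4,""
'''

# Assumption: the instrument writes Latin-1, so "ø" becomes the raw byte 0xf8.
raw_bytes = SAMPLE_TEXT.encode("latin-1")

df = pd.read_csv(
    io.BytesIO(raw_bytes),
    encoding="latin-1",  # decode 0xf8 instead of failing as UTF-8
    header=11,           # counted over non-blank lines (skip_blank_lines=True by default)
    skiprows=[14],       # the question's value; drops the "MMDDYY" units row
)

# Empty "" headings are named "Unnamed: 2", "Unnamed: 4", ...; drop those columns.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

print(df.columns.tolist())
print(df)
```

Filtering after the read keeps the parsing call identical to the question's; read_csv also accepts a callable for usecols if you prefer to select columns up front.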

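First, skiprows doesn't do what you think it does. When you give it a list, it skips the lines with those numbers while parsing the file. For what you want, just use header.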
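A minimal illustration on a hypothetical four-line CSV held in memory: a list passed to skiprows drops the lines with those zero-based numbers, an integer drops that many lines from the top, and header selects the line that holds the column names.

```python
import io

import pandas as pd

CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n"  # hypothetical example data

# List: drop line 1 ("1,2"); line 0 is still used as the header.
listed = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1])

# Integer: drop the first line, so "1,2" becomes the header.
counted = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1)

# header=1: use line 1 as the header; the line above it is discarded.
headed = pd.read_csv(io.StringIO(CSV_TEXT), header=1)

print(listed)
print(counted)
print(headed)
```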

Second, pandas zero-indexes the lines of the file.

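Third, when you have skip_blank_lines=True, pandas appears to re-index the file's lines before applying the header value. So in your example, it does not count the blank lines 11 and 12 before the header (or the blank line after it). Remembering that pandas zero-indexes the lines, we can see how header=11 lands on the header line: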

line: content
0:MiniSonde 5 43656
1:"Log File Name : lwrhyp_deploy_20170104"
2:"Setup Date (MMDDYY) : 010417"
3:"Setup Time (HHMMSS) : 114539"
4:"Starting Date (MMDDYY) : 010417"
5:"Starting Time (HHMMSS) : 140000"
6:"Stopping Date (MMDDYY) : 123169"
7:"Stopping Time (HHMMSS) : 235959"
8:"Interval (HHMMSS) : 010000"
9:"Sensor warmup (HHMMSS) : 000100"
10:"Circltr warmup (HHMMSS) : 000030"


11:"Date","Time","","Temp","","SpCond","","Sal","","Dep25","","TDG","","TDG","","LDO%","","LDO","","IBatt",""
12:"MMDDYY","HHMMSS","","øC","","mS/cm","","ppt","","meters","","mmHg","","psia","","Sat","","mg/l","","Volts",""

13:01/04/17,14:00:00,"",7.97,"",.0691,"",.02,"",.75,"",735,"",14.22,"",52.7,"",6.15,"",11.4,""
14:01/04/17,15:00:00,"",7.9,"",.0692,"",.02,"",.76,"",736,"",14.23,"",52.8,"",6.17,"",11.4,""
15:01/04/17,16:00:00,"",7.89,"",.0694,"",.02,"",.77,"",736,"",14.23,"",52.3,"",6.12,"",11.4,""
16:01/04/17,17:00:00,"",7.88,"",.0699,"",.02,"",.78,"",735,"",14.21,"",51.8,"",6.06,"",11.4,""
17:01/04/17,18:00:00,"",7.85,"",.0699,"",.02,"",.78,"",733,"",14.18,"",51.3,"",6.01,"",11.4,""
18:01/04/17,19:00:00,"",7.83,"",.0706,"",.02,"",.78,"",731,"",14.14,"",51.3,"",6.01,"",11.4,""
19:01/04/17,20:00:00,"",7.81,"",.0706,"",.02,"",.79,"",730,"",14.12,"",51.1,"",5.99,"",11.4,""
20:01/04/17,21:00:00,"",7.81,"",.0699,"",.02,"",.79,"",730,"",14.11,"",50.8,"",5.95,"",11.4,""
21:01/04/17,22:00:00,"",7.76,"",.0702,"",.02,"",.8,"",729,"",14.1,"",50.5,"",5.92,"",11.3,""
22:01/04/17,23:00:00,"",7.76,"",.0704,"",.02,"",.8,"",729,"",14.09,"",50.5,"",5.93,"",11.3,""
23:01/05/17,00:00:00,"",7.76,"",.07,"",.02,"",.8,"",729,"",14.09,"",50.5,"",5.92,"",11.3,""
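The counting in the listing can be checked in plain Python by locating the rows with and without blank lines. With blanks dropped, the "Date" row sits at 11, matching header=11; with blanks included, the "Date" row is at 13 and the "MMDDYY" row at 14. That is consistent with the question's finding that skiprows=[14] was needed, if skiprows numbers the raw file lines (blanks included). This is an inference from the question's results, not something the answer states. Rows below are trimmed to their first fields, since only their positions matter.

```python
def find_line(rows, prefix):
    """Return the zero-based position of the first row that starts with `prefix`."""
    return next(i for i, row in enumerate(rows) if row.startswith(prefix))


# Physical lines at the top of the question's file ("" is a blank line).
lines = [
    "MiniSonde 5 43656",
    '"Log File Name : lwrhyp_deploy_20170104"',
    '"Setup Date (MMDDYY) : 010417"',
    '"Setup Time (HHMMSS) : 114539"',
    '"Starting Date (MMDDYY) : 010417"',
    '"Starting Time (HHMMSS) : 140000"',
    '"Stopping Date (MMDDYY) : 123169"',
    '"Stopping Time (HHMMSS) : 235959"',
    '"Interval (HHMMSS) : 010000"',
    '"Sensor warmup (HHMMSS) : 000100"',
    '"Circltr warmup (HHMMSS) : 000030"',
    "",
    "",
    '"Date","Time","","Temp"',
    '"MMDDYY","HHMMSS","","øC"',
    "",
    '01/04/17,14:00:00,"",7.97',
]

# What header= counts when skip_blank_lines=True: blank lines are dropped first.
non_blank = [row for row in lines if row.strip()]

raw_date = find_line(lines, '"Date"')         # position in the file, blanks included
header_date = find_line(non_blank, '"Date"')  # position after dropping blank lines
raw_units = find_line(lines, '"MMDDYY"')      # the units row containing "øC"

print(raw_date, header_date, raw_units)  # 13 11 14
```

Skipping raw line 14 also removes the row containing the "øC" cell, which would explain why that particular call avoided the UnicodeDecodeError.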


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.