Reading .csv files with footers of varying length in Python

Question

I'm a beginner in Python, so I apologize if the solution is obvious. I'm trying to read some .csv field data into Python for processing. At the moment I have:

data = pd.read_csv('somedata.csv', sep=' |,', engine='python', usecols=(range(0,10)), skiprows=155, skipfooter=3)

However, depending on whether the data collection was interrupted, the last lines of the file may look like this:

#data_end

Run completed

or

Run interrupted

ERROR

a bunch of error codes

So I can't just use skipfooter=3. Is there a way for Python to detect the length of the footer and skip it? Thanks.

Answer 1

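You can first read the file's contents as plain text into a Python list, drop the lines that don't contain the expected number of separators, and then turn the list into an IO stream. That stream is then passed to pd.read_csv as if it were a file object.

The code could look like this: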

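from io import StringIO
import re
import pandas as pd

# adjust these variables to meet your requirements:
number_of_columns = 11   # fields expected in every data row
separator = r" |,"       # regular expression: a space or a comma, as in the question
rows_to_skip = 155       # leading lines to skip, as in the question

# read the content of the file as plain text, dropping the leading lines first
# so that the filter below cannot change how many rows are skipped:
with open("somedata.csv", "r") as infile:
    raw = infile.readlines()[rows_to_skip:]

# drop the rows that don't split into the expected number of fields
# (str.count() would search for the literal text, not the regular expression):
raw = [x for x in raw
       if len(re.split(separator, x.strip())) == number_of_columns]

# turn the list into an IO stream (after joining the rows into a big string):
stream = StringIO("".join(raw))

# pass the stream as an argument to pd.read_csv():
df = pd.read_csv(stream, sep=separator, engine='python',
                 usecols=range(0, 10))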

If you use Python 2.7, you have to replace the first line, from io import StringIO, with the following two lines:

from __future__ import unicode_literals
from cStringIO import StringIO

This is necessary because StringIO expects a unicode string (which is not the default string type in Python 2.7), and because the StringIO class lives in a different module in Python 2.7.
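
Because the separator in the question is a regular expression, one way to check the filtering step is to run it on a few in-memory lines. The following is only a sketch: the helper name keep_data_rows and the three-column sample are made up for illustration, and it assumes that footer lines never happen to split into the same number of fields as a data row.

```python
import re
from io import StringIO


def keep_data_rows(lines, separator=r" |,", number_of_columns=11):
    """Return only the lines that split into the expected number of fields.

    `separator` is treated as a regular expression, like the `sep`
    argument used in the question.
    """
    pattern = re.compile(separator)
    return [line for line in lines
            if len(pattern.split(line.strip())) == number_of_columns]


# Made-up sample: two 3-column data rows followed by a footer.
sample = ["1,2,3\n", "4,5,6\n", "#data_end\n", "\n", "Run completed\n"]

rows = keep_data_rows(sample, number_of_columns=3)

# File-like object that could be handed to pd.read_csv().
stream = StringIO("".join(rows))
```

Here rows keeps only the two data lines, so the footer never reaches the parser regardless of how long it is.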

Answer 2

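I think you simply have to count the commas in each line and find the last correct line manually. I'm not aware of a read_csv parameter that does this automatically.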
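
The counting idea can be sketched roughly as follows: walk backwards from the end of the file and count lines until one splits into the expected number of fields. The function name footer_length and the sample lines are hypothetical, and the sketch assumes that every data row has exactly number_of_columns fields while no footer line does.

```python
import re


def footer_length(lines, separator=r" |,", number_of_columns=11):
    """Count the trailing lines that do not look like data rows."""
    pattern = re.compile(separator)
    count = 0
    for line in reversed(lines):
        if len(pattern.split(line.strip())) == number_of_columns:
            break  # reached the last well-formed data row
        count += 1
    return count


# Made-up samples with 3-column data and two different footers.
completed = ["1,2,3\n", "4,5,6\n", "#data_end\n", "\n", "Run completed\n"]
interrupted = ["1,2,3\n", "4,5,6\n", "Run interrupted\n", "\n",
               "ERROR\n", "\n", "E12 E40\n"]

n_completed = footer_length(completed, number_of_columns=3)
n_interrupted = footer_length(interrupted, number_of_columns=3)

# Trim the footer before parsing; this also works when the count is 0.
data_lines = interrupted[:len(interrupted) - n_interrupted]
```

The trimmed list can then be joined and wrapped in a StringIO, as in Answer 1, before being passed to pd.read_csv. Trimming the lines directly avoids having to know how skipfooter treats blank lines.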

Answer 1: score 1, accepted, 2017-01-25 17:11:42

Answer 2: score 0, 2017-01-25 16:36:47
