Reading .csv files with footers of varying length in Python

Question

I'm new to Python, so apologies if the solution is obvious. I'm trying to read some .csv field data into Python for processing. Currently I have:

data = pd.read_csv('somedata.csv', sep=' |,', engine='python', usecols=(range(0,10)), skiprows=155, skipfooter=3)

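However, depending on whether data collection was interrupted, the last few lines of the file may look like:

#data_end

run completed

or

run interrupted

ERROR

a bunch of error codes

So I can't just use skipfooter=3. Is there a way for Python to detect the length of the footer and skip it? Thanks.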

Answer 1

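You can first read the file contents as plain text into a Python list, remove the lines that do not contain the expected number of separators, and then turn the list into an IO stream. This stream is then passed to pd.read_csv as if it were a file object.

The code could look like this: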

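from io import StringIO
import re
import pandas as pd

# adjust these variables to meet your requirements:
number_of_columns = 11
separator = " |,"    # regular expression: a space or a comma
rows_to_skip = 155   # leading lines before the data starts

# read the content of the file as plain text and drop the leading lines
# here, so that skiprows is not applied again to already-filtered data:
with open("somedata.csv", "r") as infile:
    raw = infile.readlines()[rows_to_skip:]

# keep only the rows with the expected number of separators
# (n columns means n - 1 separators; the separator is a regular expression,
# so re.findall() is used because str.count() only matches literal text):
raw = [x for x in raw
       if len(re.findall(separator, x)) == number_of_columns - 1]

# turn the list into an IO stream (after joining the rows into a big string):
stream = StringIO("".join(raw))

# pass the stream to pd.read_csv() as if it were a file object:
df = pd.read_csv(stream, sep=separator, engine='python',
                 usecols=range(0, 10))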

If you are using Python 2.7, you have to replace the from io import StringIO line with the following two lines:

from __future__ import unicode_literals
from cStringIO import StringIO

This is necessary because StringIO needs a unicode string (which is not the default string type in Python 2.7), and because the StringIO class lives in a different module in Python 2.7.
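To see how this filter behaves on both footer variants from the question, here is a minimal sketch that runs on in-memory text instead of a file. The helper name read_filtered and the two sample strings are made up for illustration; they are not part of pandas or of the original data.

```python
import re
from io import StringIO

import pandas as pd


def read_filtered(text, sep_regex, n_columns):
    """Keep only lines with n_columns - 1 separators, then parse them."""
    pattern = re.compile(sep_regex)
    kept = [line for line in text.splitlines(keepends=True)
            if len(pattern.findall(line)) == n_columns - 1]
    return pd.read_csv(StringIO("".join(kept)), sep=sep_regex, engine="python")


# Hypothetical samples: a run that completed and a run that was interrupted.
complete = "a,b,c\n1,2,3\n4,5,6\n#data_end\n\nrun completed\n"
interrupted = "a,b,c\n1,2,3\nrun interrupted\nERROR\nE42\n"

df_complete = read_filtered(complete, " |,", 3)
df_interrupted = read_filtered(interrupted, " |,", 3)
print(df_complete)
print(df_interrupted)
```

Note that this drops every malformed line, wherever it appears, not only the footer. It would also keep a footer line that happens to contain exactly the expected number of separators.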

Answer 2

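I think you simply have to count the commas on each line and find the last valid line yourself. I'm not aware of a read_csv parameter that does this automatically.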
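One way to do that counting is sketched below, assuming that a valid data row is one with the expected number of separators. The function name count_footer_lines and the sample lines are hypothetical. The function walks backwards from the end of the file and counts lines until it reaches the first well-formed row.

```python
import re


def count_footer_lines(lines, sep_regex, n_columns):
    """Count trailing lines that lack the expected number of separators."""
    pattern = re.compile(sep_regex)
    footer = 0
    # Walk backwards from the end until a well-formed row appears.
    for line in reversed(lines):
        if len(pattern.findall(line)) == n_columns - 1:
            break
        footer += 1
    return footer


# Hypothetical samples: a run that completed and a run that was interrupted.
complete = ["1,2,3\n", "4,5,6\n", "#data_end\n", "\n", "run completed\n"]
interrupted = ["1,2,3\n", "run interrupted\n", "ERROR\n", "E42\n", "E43\n"]

footer_complete = count_footer_lines(complete, " |,", 3)
footer_interrupted = count_footer_lines(interrupted, " |,", 3)
print(footer_complete, footer_interrupted)
```

The result could then be passed as skipfooter to pd.read_csv, which already uses engine='python' in the question (skipfooter is not supported by the C engine). The check is only a heuristic: a footer line that happens to contain the expected number of spaces or commas would be mistaken for data.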

2 answers

Answer 1
1 (accepted) 2017-01-25 17:11:42

Answer 2
0 2017-01-25 16:36:47