Searching for elements in a large CSV file with Python

Question

I am trying to filter a CSV file and get the fifth value of each list inside another list, but I keep getting an out-of-range error.

import csv
from operator import itemgetter
teste=[]
f = csv.reader(open('power_supply_info.csv'), delimiter =',' )
for word in f:
    teste.append(word)
    #print teste    
    #print ('\n') 
print map( itemgetter(5), teste)

However, I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rafael.paiva\Dev\Python2.7\WinPython-64bit-2.7.6.4\python-2.7.6.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File "C:/Users/rafael.paiva/Desktop/Rafael/CSV.py", line 24, in <module>
    print map( itemgetter(5), teste)
IndexError: list index out of range

The content of the "word" variable that is appended to "teste" at each step is:

[['2015-12-31-21:02:30.754271', '25869', '500000', 'Unknown', '1', '0', '4790780', '1', '0', '0', '375', '0', '-450060', '-326040', '3437000', 'Normal', 'N/A', '93', 'Good', '19', '1815372', 'Unknown', 'Charging', '4195078', '4440000', '4208203', '4171093', '0', '44290', 'Li-ion', '95', '1', '3000000', '1', '375', '-450060', '-326040', '3437000', '93', 'Good', '1815372', '4195000', '4440000', '4208203', '4165625', '0', '44290', '95', '3000000', '1', ''],
 ['2015-12-31-21:03:30.910972', '25930', '500000', 'Unknown', '1', '0', '4794730', '1', '0', '0', '377', '0', '55692', '107328', '3437000', 'Normal', 'N/A', '92', 'Good', '19', '1814234', 'Unknown', 'Charging', '4200390', '4440000', '4207734', '4214062', '0', '41200', 'Li-ion', '95', '1', '3000000', '1', '377', '55692', '107328', '3437000', '92', 'Good', '1814234', '4200390', '4440000', '4207734', '4214062', '0', '41200', '95', '3000000', '1', '']]

Can anybody help me?

Answer 1

You should add some diagnostics to your loop; they will help show you where in the csv file the problem might be:

import csv
from operator import itemgetter

teste = []

with open('power_supply_info.csv', 'rb') as f_input:
    for line, words in enumerate(csv.reader(f_input, delimiter =',' ), start=1):
        if len(words) <= 5:
            print "Line {} only has {} elements".format(line, len(words))
        teste.append(words)

print map(itemgetter(5), teste)

One of your lines is most likely blank or has too few entries; this script will list the line numbers where the problem occurs.
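
For readers on Python 3 (the thread itself uses Python 2.7), the same diagnostic could be sketched roughly as follows. The helper name `find_short_rows` and the in-memory sample data are hypothetical, used only so the snippet runs without power_supply_info.csv; the numbers it reports are row numbers as counted by csv.reader.

```python
import csv
import io


def find_short_rows(rows, index):
    """Return (row_number, field_count) for every row too short to hold rows[index]."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        if len(row) <= index:
            problems.append((row_number, len(row)))
    return problems


# Hypothetical sample standing in for power_supply_info.csv:
# row 2 is blank and row 3 has only three fields.
sample = io.StringIO("a,b,c,d,e,f,g\n\n1,2,3\nh,i,j,k,l,m,n\n")

for row_number, count in find_short_rows(csv.reader(sample), 5):
    print("Line {} only has {} elements".format(row_number, count))
```

With a real file, the csv documentation recommends opening it with `open(path, newline='')` in Python 3 and passing that file object to `csv.reader`.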

Answer 2

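I don't know what your power_supply_info.csv file contains, but it is clear what you have after csv.reader has done its work:

A list containing 2 lists (i.e. 2 elements)

That is why you get an error when accessing the 5th element: there are only 2.

A possible solution to your problem: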

import csv

f = csv.reader(open('power_supply_info.csv'), delimiter =',' )
# First iterate over the rows and then get each list in the row
teste = [x for x in (row for row in f)]
print map(lambda x: x[5], teste)

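The real challenge is to look at the input in the csv file and understand why you end up with these 2 lists inside a list.

Note: if the content you printed belongs to teste rather than word, then the code could be: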

import csv

f = csv.reader(open('power_supply_info.csv'), delimiter =',' )
teste = [row for row in f]
print [x[5] for x in teste]

Best regards

Answer 3

The code you show works correctly with the data sample you provided:

In [8]: l = [['2015-12-31-21:02:30.754271', '25869', '500000', 'Unknown', '1', '0', '4790780', '1', '0', '0'],
   ...:      ['2015-12-31-21:03:30.910972', '25930', '500000', 'Unknown', '1', '0', '4794730', '1', '0', '0']]

In [9]: list(map(itemgetter(5),l))
Out[9]: ['0', '0']

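I suspect that one line in your CSV file (probably the last one) is blank, so the last element of teste is actually an empty list, and itemgetter(5) fails for that row.

Instead of cramming everything into one line, try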

for item in teste:
    if item:
        print item[5]
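
Since the file is described as large, one variation on the loop above is a generator that yields the wanted field row by row instead of building the full teste list first. This is only a sketch in Python 3 syntax (the thread uses Python 2.7); the name `column_values` and the sample data are hypothetical. It checks the row length rather than just emptiness, so rows that are non-empty but still too short are skipped as well.

```python
import csv
import io


def column_values(rows, index):
    """Yield row[index] for each row that is long enough to have that field."""
    for row in rows:
        if len(row) > index:
            yield row[index]


# Hypothetical sample standing in for power_supply_info.csv;
# the blank second line becomes an empty list and is skipped.
sample = io.StringIO("t1,1,2,3,4,0,x\n\nt2,1,2,3,4,7,y\n")

for value in column_values(csv.reader(sample), 5):
    print(value)
```

Skipping short rows silently hides the bad lines, so it is probably worth running a diagnostic pass on the file at least once before relying on this.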
