How do I handle index out of range errors when parsing large quantities of data in Python?

I have a huge amount of data in a .txt file that I'm trying to parse into objects in a list using Python. The data structure looks like this for the most part, and when it does, the parsing is successful.

2315462;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM
778241;1/3/2015 12:30:00 PM;1/3/2015 1:00:00 PM

As you can see, there's an id, a start time and an end time. It is parsed using this code:

import csv

my_array_with_objects = []

with open("test.txt", newline='\n') as f:
    reader = csv.reader(f, delimiter=';')

    # Each row is a list of the fields found on one line of the file
    for row in reader:
        my_array_with_objects.append(Employee(row[0], row[1], row[2]))

Employee is a class that looks like this:

class Employee:

    def __init__(self, id, time_start, time_end):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end

Occasionally though, time_end is missing from the data:

276908;1/3/2015 8:00:00 AM

At this point the program crashes with an index out of range exception. I'm new to Python but heard there is no such thing as a null value. Then why does it crash? I assumed that it could be handled with something along the lines of:

if row[2] is None:
    print("error, do things to fix")

...but it doesn't trigger. How do I handle these errors? I don't want anything special to happen if row[2] is missing. An empty value is fine.

You could add a check such as if len(row) < 3, as suggested by @Torxed. A better solution might be to rewrite the Employee class and use the 'splat' operator to expand the row (a list). For missing values an empty string '' is used.

This also covers the cases where both start_time and end_time, or all 3 values, are missing.

class Employee:
    def __init__(self, id='', start_time='', end_time=''):
        self.id = id
        self.start_time = start_time
        self.end_time = end_time

        # check values and convert to int, datetime...

for row in reader:
    my_array_with_objects.append(Employee(*row))
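
The len(row) < 3 check mentioned above can be sketched as follows. This is only an illustration, not code from the answer: the helper name pad_row and the in-memory sample data (modeled on the rows in the question) are assumptions made for the example. Short rows are padded with empty strings so that every row ends up with three fields.

```python
import csv
import io

# Sample data modeled on the question; the second row has no end time.
SAMPLE = (
    "2315462;3/13/2015 8:00:00 AM;3/13/2015 1:00:00 PM\n"
    "276908;1/3/2015 8:00:00 AM\n"
)


def pad_row(row, width=3, fill=''):
    """Return row padded with `fill` up to `width` fields (hypothetical helper)."""
    if len(row) < width:
        return row + [fill] * (width - len(row))
    return row


# io.StringIO stands in for the opened file so the sketch is self-contained.
rows = [pad_row(row) for row in csv.reader(io.StringIO(SAMPLE), delimiter=';')]
```

After padding, row[0], row[1] and row[2] can all be indexed safely, so the original Employee(row[0], row[1], row[2]) call would work unchanged.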

If you want to cover a missing time_end, this should do the trick:

for row in reader:
    try:
        my_array_with_objects.append(Employee(row[0], row[1], row[2]))
    except IndexError:
        my_array_with_objects.append(Employee(row[0], row[1], None))

You can replace None with a default value, or deal with the missing field however you want in the except block.
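
As for why the if row[2] is None check in the question never triggers: Python does have a null-like value, None, but a list does not return it for a position that doesn't exist. Indexing past the end raises IndexError before any comparison can run. A small sketch of that behavior, where the variable names and the empty-string default are assumptions chosen for the example:

```python
# A short row, as csv.reader would yield it for a line without an end time.
row = ['276908', '1/3/2015 8:00:00 AM']

# Indexing past the end raises IndexError; it does not return None,
# so a test such as `row[2] is None` is never reached.
try:
    time_end = row[2]
except IndexError:
    time_end = ''  # assumed default; the question says an empty value is fine

# An alternative without exceptions: slicing never raises, it just
# returns a shorter (here empty) list, so a fallback can be supplied.
time_end_via_slice = (row[2:3] or [''])[0]
```

Both approaches leave an empty string in place of the missing field; which one to use is largely a matter of taste.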
