Efficiently parsing values in a CSV file

Question

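I have a .csv file in which I need either the index of each header or the headers in a dictionary in order to locate and replace fields. Can someone help me find a better approach?

Example: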

header      1990_X    1991_X    1990_B    1991_B

            ''        1         4         0
            5         0         ''        -3

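For each row, the output should look for the first positive value where the year matches and replace any 'empty' value with 0; everything else is left as it is. So the output below is written out as the new .csv: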

            0         1         4         0
            5         0         0         -3

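I am running into a problem with negative values, and with the fact that the values are not always numeric. I am also concerned about the way I currently produce the output, which builds a dictionary for every row in order to look up the year. The file has 150 columns and 750,000 rows.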

import csv

def stripMatch(match):
    # Convert to a string and strip surrounding quotes and brackets
    string = str(match)
    strip = string.strip("'[]'")
    return strip

if __name__ == '__main__':

    fn = 'test.csv'

    with open(fn) as f:
        reader=csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        header=next(reader)
        print header
        print "Number of fields: %s" %(len(header))

        for row in reader:
            #print row
            posKey = 0
            data={}
            for field,key in zip(row,header):
                #print '%s , %s ' %(field,key)
                value = stripMatch(field)
                data.setdefault(key, []).append(value)
                if value.isalnum() == True and int(value) > 0:
                    print "Above zero found: %d at key: %s \n " %(int(field),key)
                    posKey = key
            print "Before : %s " % data

            for d in data:
                #print d,data[d],posKey
                ##if d.startswith(posKey):
                if d[:4] == posKey[:4]:
                    #print "Found it"
                    print d,data[d],posKey
                    numCheck = stripMatch(data[d])
                    print numCheck
                    print numCheck.isalnum()
                    if numCheck.isalnum() == False:
                        ## Replace it with a 0
                        data[d] = 0
                        print "processing %s " % data[d]

            print "After %s " % data
            print '\n'

Answer 1

I'm not sure what you are trying to accomplish, but let me give you a few hints to get you moving in the right direction:

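Using DictReader instead of the standard reader will give you a dictionary for each row.
reader = csv.DictReader(f, delimiter=',', quoting=csv.QUOTE_NONE)
Unless you explicitly pass other field names, it automatically uses the first row as the header.
What does the stripMatch function do? It seems to be important here.
Also, why is stripMatch called twice, once when building the data dictionary and once when iterating over it?
What does posKey represent? You overwrite its value as you iterate over the row. Are you guaranteed to have only one value that is an int > 0 per row? Your example shows otherwise. All posKey gives you is the key of the last value that is an int > 0.
Is there a specific reason you use setdefault to instantiate a list for every key? That would imply you have columns with the same name.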
Try a try/except clause to check more reliably for an integer value > 0, for example:

valid = False
try:
    valid = int(value) > 0
except ValueError:
    value = 0

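To me, this makes it more obvious that you are looking for integers greater than 0. It also makes it easy to replace non-integer values while still respecting negative numbers.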
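To illustrate why the string test in the question trips over negative values: `str.isalnum()` returns False for "-3" because of the minus sign, so a negative number looks the same as an empty field. Below is a minimal Python 3 sketch of the try/except check. The helper name `parse_int` and its `(value, is_positive)` return shape are assumptions made for illustration and are not part of the original code.

```python
def parse_int(text, default=0):
    """Return (value, is_positive); anything that is not an integer
    falls back to `default`. Hypothetical helper, for illustration only."""
    try:
        # Strip whitespace and stray single quotes before converting
        value = int(text.strip().strip("'"))
    except ValueError:
        return default, False
    return value, value > 0


# isalnum() rejects the minus sign, so it cannot tell "-3" from "''"
print("-3".isalnum())    # False
print(parse_int("-3"))   # (-3, False)
print(parse_int("''"))   # (0, False)
print(parse_int("4"))    # (4, True)
```

Letting `int()` raise keeps negative numbers intact and handles every non-numeric field in one place.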

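I'm still not clear on exactly what problem you are trying to solve, so this may not be entirely helpful. To me, though, it is clearer and more direct. Perhaps you can adapt it to better suit your needs: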

with open(fn) as f:
    reader = csv.DictReader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        out = {}
        for key, val in row.iteritems():
            value = stripMatch(val)

            valid = False
            try:
                # All values become ints. Non-ints raise ValueError
                value = int(value)
                valid = value > 0
            except ValueError:
                # If the value = int(value) statement raises,
                # valid will still be False
                value = 0

            # HERE is a good place for setdefault. This way we can simply see
            # which years have more than one valid value
            if valid:
                out.setdefault(key[:4], []).append(value)

            # The above try-except ensures that all values are now ints.
            # If you need to work with the entire row, this statement will
            # normalize it so that you can work with it as a dict of int values.
            # If you don't need to work with it later, this part can be omitted.
            row[key] = value

        # Now if we need to find the years with more than one valid value,
        # We can simply iterate over the dict `out`
        for year, val_list in out.iteritems():
            if len(val_list) > 1:
                print year
                # Do something with year, or with the values you have
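On the performance concern, one option is to derive the year grouping from the header a single time and then treat each row as a plain list, instead of building a dictionary per row. The Python 3 sketch below does that and writes the result with `csv.writer`. It rests on an assumed reading of the rule: within a row, empty fields are replaced with 0 only in columns whose year prefix also holds a positive value. That reading reproduces the sample output in the question, but it is an interpretation. The function names, the in-memory sample, and the assumption that every row is as long as the header are all illustrative.

```python
import csv
import io
from collections import defaultdict


def is_positive_int(text):
    """True if `text` parses as an integer greater than zero."""
    try:
        return int(text) > 0
    except ValueError:
        return False


def is_empty(text):
    """Treat a blank field or a bare pair of single quotes as 'empty'."""
    return text.strip() in ("", "''")


def fill_empty_by_year(src, dst):
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)

    # Built once from the header: year prefix -> list of column indexes
    columns_by_year = defaultdict(list)
    for index, name in enumerate(header):
        columns_by_year[name[:4]].append(index)

    for row in reader:
        for indexes in columns_by_year.values():
            # Assumed rule: only years that hold a positive value are touched
            if any(is_positive_int(row[i]) for i in indexes):
                for i in indexes:
                    if is_empty(row[i]):
                        row[i] = "0"
        writer.writerow(row)


# In-memory stand-in for the file described in the question
sample = "1990_X,1991_X,1990_B,1991_B\n'',1,4,0\n5,0,'',-3\n"
out = io.StringIO()
fill_empty_by_year(io.StringIO(sample), out)
print(out.getvalue())
```

For the sample this prints the header followed by `0,1,4,0` and `5,0,0,-3`. For a real file, the two `StringIO` objects would be replaced by file handles opened with `newline=""`.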

Solution 1
0 2014-07-07 22:11:05