
Python CSV trouble

I am writing code that reads a very large CSV file with readlines(). One function stores the lines in a global variable, and a second function goes through that variable to search for specific values and count the number of times each one comes up in the file.

def init(filename):
    global lines
    with open(filename) as file:
        lines = file.readlines()


def total():
    males = 0
    females = 0
    for i in range(0, len(lines)):
        current_line = lines[i].split(",")
        if current_line[5] == 'M\n':
            males += 1
        elif current_line[5] == 'F\n':
            females += 1

    total_dict = {"Gender": {"M": males, "F": females}}
    return total_dict

For some reason this code works with a smaller file, but I can't seem to get it to work with a very large one.

If by "super large" you mean a file that does not fit in RAM, then this is expected: you read the whole file into RAM, and only then deal with one row at a time. Why not read the file line by line instead? You can iterate over the file object directly with for line in file: ...

def total(name):
    males = females = 0
    with open(name, "rt") as f:
        for line in f:
            current = line.rstrip("\r\n").split(",")
            if current[5] == "M":
                males += 1
            elif current[5] == "F":
                females += 1
    return {"Gender": {"M": males, "F": females}}

Or with a Counter (it's like a dict, but you don't have to initialize values to zero; entries are added automatically when you do gender[...] += 1):

from collections import Counter

def total(name):
    gender = Counter()
    with open(name, "rt") as f:
        for line in f:
            current = line.rstrip("\r\n").split(",")
            gender[current[5]] += 1
    return {"Gender": gender}
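
To make the Counter behaviour described above concrete, here is a minimal sketch. The sample values are made up for illustration and are not taken from the question's file.

```python
from collections import Counter

gender = Counter()
for value in ["M", "F", "M", "M"]:   # made-up sample values
    gender[value] += 1               # no KeyError: a missing key counts as 0

print(gender["M"])    # 3
print(gender["X"])    # 0, and reading a missing key does not add it
print(dict(gender))   # {'M': 3, 'F': 1}
```

Note that a Counter counts every distinct value it sees in the column, so a header row or an unexpected value would show up as its own entry rather than being ignored.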

Note also that to read a CSV file, you could use the csv module, which also copes with quoted fields that contain commas.

import csv
from collections import Counter

def total(name):
    gender = Counter()
    with open(name, "rt") as f:
        for current in csv.reader(f):
            gender[current[5]] += 1
    return {"Gender": gender}
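
As a rough usage sketch of the csv approach, the snippet below writes a tiny sample file and counts the sixth column. The sample rows, the temporary file, and the check that skips short or blank rows are assumptions added for illustration; the real file's layout may differ (for example, it may have a header row).

```python
import csv
import os
import tempfile
from collections import Counter

def total(name):
    gender = Counter()
    # newline="" is the setting the csv documentation recommends for csv files
    with open(name, "rt", newline="") as f:
        for row in csv.reader(f):
            if len(row) > 5:          # assumption: ignore blank or short rows
                gender[row[5]] += 1
    return {"Gender": gender}

# Hypothetical sample data: six columns, gender in the last one, one blank line
sample = "1,Ann,Lee,30,NY,F\n2,Bob,Ray,41,LA,M\n\n3,Cy,Poe,25,SF,M\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

result = total(path)
os.remove(path)
print(result)  # {'Gender': Counter({'M': 2, 'F': 1})}
```

Because the reader yields one row at a time, memory use stays roughly constant regardless of the file size.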

Another piece of coding advice, not directly related to your current problem: avoid global variables unless there is a very good reason to use one. Here you could simply return the list, if you insist on reading the whole file in init. And when looping over a list, don't use a range as in for i in range(len(a)): but write for x in a: instead, unless you really need the index for some reason. And if you do need the index, it's often better to write for i, x in enumerate(a):
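
The advice above could look something like the following sketch. The names load_lines and bad_rows, and the idea of recording the line numbers of malformed rows, are hypothetical and only there to show a case where the index is actually useful.

```python
def load_lines(filename):
    """Return the file's lines instead of storing them in a global variable."""
    with open(filename) as file:
        return file.readlines()

def total(lines):
    males = females = 0
    bad_rows = []  # hypothetical: line numbers of rows with too few columns
    # enumerate yields (index, item); start=1 gives human-friendly line numbers
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(",")
        if len(fields) < 6:
            bad_rows.append(number)
        elif fields[5] == "M":
            males += 1
        elif fields[5] == "F":
            females += 1
    return {"Gender": {"M": males, "F": females}}, bad_rows

# In real use: lines = load_lines("data.csv")  (file name is an assumption)
lines = ["1,a,b,c,d,M\n", "oops\n", "3,a,b,c,d,F\n"]
counts, bad = total(lines)
print(counts)  # {'Gender': {'M': 1, 'F': 1}}
print(bad)     # [2]
```

Passing the list in as an argument also makes total easy to test with a handful of hand-written lines, without touching a file.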

