
ValueError: invalid literal for int() with base 10: 'FALSE' when removing empty strings from a dataset

import csv

my_dic = []  # despite the name, a list of per-row dictionaries

with open(filename, 'r', newline='') as input_file:
    csv_reader = csv.reader(input_file, delimiter=',')

    for line_number, line in enumerate(csv_reader):
        if line_number == 0: # skip the header
            continue
        #if line[10] == '':
            #line.insert(10,0)
        my_dic.append({
            
            'First Name':line[11],
            'Last name':line[13],
            'Age(Years)':int(line[3]),
            'Sex':line[18],
            'type of car':line[16],
            'Marital Status':line[14],
            'Dependants':line[10],
            'Yearly Salary':int(line[17]),
            'Yearly Pension':int(line[15]),
            'Company':line[5],
            'Commuted Distance':float(line[4]),
            'vehicle':{
                'Make':line[19],
                'model':line[20],
                'year':int(line[21]),
                'category':line[22]
                
            },
            'Credit Card':{
                'Start Date':line[6],
                'End Date':line[7],
                'Card number':line[8],
                'Card CCV':int(line[9]),
                'iban':line[12]
                
            },
            'Address':{
                'Street':line[0],
                'City':line[1],
                'Postcode':line[2]
            }
            
            
            
        })

I have the code above, which converts a CSV file into a list of dictionaries, and I also want to replace the empty strings in column 10 of the dataset. With the two commented-out lines (which try to replace an empty string in column 10 with a number) left commented, the code works. If I uncomment them, I get a ValueError at the 'Yearly Salary' key, whose value I cast to an integer.

1. How else can I replace an empty string in the column with a number? (I don't want to use pandas.)

2. I also want to know the rows where the correction takes place.

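line.insert(10, 0) inserts an additional element into the list instead of overwriting the existing one. For example, if the list had length 20, it has length 21 after the insert. Every value from index 10 onward shifts one position to the right, so the later lookups read the wrong columns, and line[17] now holds a non-integer value that int() cannot parse.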

You want to replace the value, not insert it.

line[10] = 0
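
The difference is easy to see on a short list. This is a minimal sketch with made-up values; the three-element rows are purely illustrative and do not come from the dataset.

```python
# Illustrative rows only; index 1 plays the role of the empty column.
inserted = ['a', '', '42']
inserted.insert(1, 0)   # adds an element: every later value shifts right
# inserted is now ['a', 0, '', '42'], so index 2 no longer holds '42'

replaced = ['a', '', '42']
replaced[1] = 0         # overwrites in place: length and positions unchanged
# replaced is now ['a', 0, '42']
```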

However, the code would be easier to follow if the replacement were done where the dictionary is built:

{
...
"Dependants": line[10] or 0,
...
}
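
The expression `line[10] or 0` works because an empty string is falsy, but a non-empty field stays a string, so the column ends up holding mixed types. If an integer is wanted in both cases, a small helper is one option. This is only a sketch: `to_int_or_default` is a hypothetical name, and it assumes every non-empty value is a valid integer.

```python
def to_int_or_default(value, default=0):
    """Return int(value), or default when the field is empty or whitespace."""
    value = value.strip()
    return int(value) if value else default

row = ['x'] * 10 + ['']   # made-up row whose column 10 is empty
entry = {'Dependants': to_int_or_default(row[10])}
```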
2. To get the line numbers, the easiest way is to store line_number in a list. For example:
missing_data_on_dependants_row_idxs = []
for line_number, line in enumerate(csv_reader):
    if line_number == 0: # skip the header
        continue
    if line[10] == '': # csv.reader yields strings, so an empty field is '' and never None
        missing_data_on_dependants_row_idxs.append(line_number)
...
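
Putting both parts together, here is a self-contained sketch that replaces empty values and records where it did so. The in-memory sample data, the two-column layout, and the names `DEPENDANTS_COL`, `corrected_rows` and `records` are assumptions for illustration; for the dataset in the question the column index would be 10.

```python
import csv
import io

# Made-up sample standing in for the real file; column 1 is 'Dependants'.
sample = io.StringIO("Name,Dependants\nAnn,2\nBob,\nCy,\n")
DEPENDANTS_COL = 1  # would be 10 for the dataset in the question

corrected_rows = []  # line numbers where an empty value was replaced
records = []

csv_reader = csv.reader(sample)
next(csv_reader)  # skip the header
# start=1 keeps the numbering used in the question (header is line 0)
for line_number, line in enumerate(csv_reader, start=1):
    if line[DEPENDANTS_COL] == '':
        line[DEPENDANTS_COL] = 0  # replace the value, do not insert
        corrected_rows.append(line_number)
    records.append({'Name': line[0], 'Dependants': int(line[DEPENDANTS_COL])})
```

After the loop, `corrected_rows` holds the line numbers of the rows that were changed, which can be printed or logged as needed.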

Also, referring to the columns by index makes the code hard to follow. Have you considered using csv.DictReader?
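
As a rough illustration of that suggestion, csv.DictReader reads the header row itself and yields each row as a dictionary keyed by column name. The sample data and header names below are invented for the sketch; the real keys would be whatever the file's first row contains.

```python
import csv
import io

# Made-up sample; real header names come from the file's first row.
sample = io.StringIO(
    "First Name,Dependants,Yearly Salary\n"
    "Ann,,30000\n"
    "Bob,1,45000\n"
)

people = []
for row in csv.DictReader(sample):  # the header row is consumed automatically
    people.append({
        'First Name': row['First Name'],
        'Dependants': int(row['Dependants'] or 0),  # empty string becomes 0
        'Yearly Salary': int(row['Yearly Salary']),
    })
```

With named columns there is no index to shift, so a mistake like the insert above cannot silently move the salary into another field.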

