
How can I remove NaN values from some columns in a csv file?

I am trying to do some basic statistics on a CSV file in Python. What I want to do is build a dictionary of the headers and the values of certain columns. But there are some NaN values, which cause an error in the following code:

import csv
import numpy as np

reader = csv.reader(f, delimiter=',')
header = next(reader)
dataset = []
for line in reader:
    d = dict(zip(header, line))
    for field in ['Reviews', 'Rating']:
        np.isnan('Rating', 'Reviews')  # this call raises the error below
        d[field] = int(float(d[field]))
    dataset.append(d)

I tried to use numpy.isnan to remove the NaN values, but I got this error:

 return arrays must be of ArrayType

So how can I remove the NaN values?

I'm not sure what your data looks like, but I'm guessing the NaN values are strings.

You can do:

d=dict(zip(header,[l for l in line if l != "NaN"]))

while you're reading them in to drop the NaNs

But it would be best to post a sample of your data so we can actually see what you're working with.
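
As a rough sketch of how that filtering could fit into your loop (assuming the missing values really are the literal string "NaN", and using data.csv as a placeholder file name), you could skip whole rows that contain a "NaN" so the header/value pairing stays aligned:

import csv

dataset = []
with open('data.csv', newline='') as f:  # placeholder file name
    reader = csv.reader(f, delimiter=',')
    header = next(reader)
    for line in reader:
        d = dict(zip(header, line))
        # skip the whole record if either column holds the string "NaN"
        if any(d[field] == 'NaN' for field in ('Reviews', 'Rating')):
            continue
        for field in ('Reviews', 'Rating'):
            d[field] = int(float(d[field]))
        dataset.append(d)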

It depends on whether the NaN values are actual NaN or the string "NaN". Assuming your data is loaded into a pandas DataFrame df, you can use:

If NaN is not a string:

df = df.dropna()  # keep only the rows of your DataFrame that do not contain any NaN value

If NaN appears as the string "NaN":

df = df[(df != "NaN").all(1)]  # keep only the rows that do not have the string "NaN" in any column
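
As an illustration of both cases, here is a minimal self-contained sketch with made-up Reviews/Rating data (not taken from the original post):

import numpy as np
import pandas as pd

# Case 1: real NaN values
df = pd.DataFrame({'Reviews': [10, np.nan, 25], 'Rating': [4.5, 3.0, np.nan]})
print(df.dropna())                       # keeps only the first row, which has no NaN

# Case 2: missing values stored as the string "NaN"
df_str = pd.DataFrame({'Reviews': ['10', 'NaN', '25'], 'Rating': ['4.5', '3.0', 'NaN']})
print(df_str[(df_str != 'NaN').all(1)])  # keeps only rows with no "NaN" string in any column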
