
Error “numpy.float64 object is not iterable” for CSV file creation in Python

I have some very noisy (astronomy) data in CSV format. Its shape is (815900, 2): about 815k points, each giving the mass of a disk at a certain time. The fluctuations are pretty noticeable when you look at the data close up. For example, here is a snippet where the first column is time in seconds and the second is mass in kg:

40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028

So it looks like the 1.53E+028 data point is noise, and probably the 2.19E+028 and 2.35E+028 points as well.

To fix this, I am trying to write a Python script that reads in the CSV data and applies a restriction: if the mass is, e.g., < 2.35E+028, the whole row is removed. The script should then create a new CSV file with only the "good" data points:

40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41242600,2.40936E+028

Following the top answer by n8henrie on this old question, so far I have:

import pandas as pd
import csv

# Here are the locations of my csv file of my original data and an EMPTY csv file that will contain my good, noiseless set of data

originaldata = '/Users/myname/anaconda2/originaldata.csv'
gooddata = '/Users/myname/anaconda2/gooddata.csv'

# I use pandas to read in the original data because then I can separate the columns of time as 't' and mass as 'm'

originaldata = pd.read_csv('originaldata.csv',delimiter=',',header=None,names=['t','m'])

# Numerical values of the mass values

M = originaldata['m'].values

# Now to put a restriction in

for row in M:
    new_row = []
    for column in row:
        if column > 2.35E+028:
            new_row.append(column)

    csv.writer(open(newfile,'a')).writerow(new_row)

print('\n\n')
print('After:')
print(open(newfile).read())

However, when I run this, I get this error:

TypeError: 'numpy.float64' object is not iterable

I know the first column (time) is dtype int64 and the second column (mass) is dtype float64... but as a beginner, I'm still not quite sure what this error means or where I'm going wrong. Any help at all would be appreciated. Thank you very much in advance.

You can select rows with a boolean condition on the mass column, so no explicit loop is needed. Example:

import pandas as pd
from io import StringIO

data = StringIO('''\
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028
''')

df = pd.read_csv(data,names=['t','m'])
good = df[df.m > 2.35e+28]
out = StringIO()
good.to_csv(out,index=False,header=False)
print(out.getvalue())

Output:

40023700,2.40896e+28
40145700,2.44487e+28
40267700,2.44487e+28
40389700,2.44478e+28
40755400,2.44496e+28
40877200,2.44489e+28
40999000,2.44489e+28
41242600,2.40936e+28
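The same selection can be applied to files on disk instead of StringIO buffers. The following is only a sketch: the helper name filter_csv, the temporary directory, and the three sample rows are assumptions made for illustration, and the file names simply mirror the ones in the question.

```python
import os
import tempfile

import pandas as pd

# Threshold taken from the question; adjust it to suit the data.
THRESHOLD = 2.35e28


def filter_csv(src_path, dst_path, threshold=THRESHOLD):
    """Write the rows of src_path whose mass exceeds threshold to dst_path.

    filter_csv is a hypothetical helper, not part of pandas.
    """
    df = pd.read_csv(src_path, header=None, names=['t', 'm'])
    good = df[df['m'] > threshold]
    good.to_csv(dst_path, index=False, header=False)
    return good


# Build a small input file in a temporary directory (illustration only).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'originaldata.csv')
dst = os.path.join(workdir, 'gooddata.csv')

with open(src, 'w') as f:
    f.write('40389700,2.44478E+028\n'
            '40511600,1.535E+028\n'
            '40755400,2.44496E+028\n')

kept = filter_csv(src, dst)
print(len(kept), 'rows kept')
```

Because to_csv writes the whole filtered frame in one call, the output file does not need to exist beforehand and is never opened in append mode.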

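This returns a single column as a one-dimensional NumPy array: M = originaldata['m'].values

So when you do for row in M:, each row is a single numpy.float64 value rather than a sequence of columns. The inner loop, for column in row:, then tries to iterate over that scalar, which is what raises the TypeError.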
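A small sketch of this, using three rows from the question as plain NumPy arrays (the array names t and M and the in-memory output buffer are assumptions for illustration). It checks that an element of M is not iterable, then shows that a single loop over paired time and mass values is enough if you prefer the csv module over pandas filtering.

```python
import csv
import io

import numpy as np

# Three rows from the question: time (s) and mass (kg).
t = np.array([40389700, 40511600, 40755400])
M = np.array([2.44478e28, 1.535e28, 2.44496e28])

print(M.ndim)  # the mass column has a single dimension

# Each element of a 1-D array is a scalar, so iterating over it fails.
try:
    iter(M[0])
    scalar_is_iterable = True
except TypeError:
    scalar_is_iterable = False

# One loop over (time, mass) pairs; the writer is created once, not per row.
out = io.StringIO()
writer = csv.writer(out)
for time, mass in zip(t, M):
    if mass > 2.35e28:
        # Convert NumPy scalars to plain Python numbers before writing.
        writer.writerow([int(time), float(mass)])

print(out.getvalue())
```

Replacing the StringIO buffer with a file opened once via open(path, 'w', newline='') would write the same rows to disk.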
