I'm trying to add column headers to the following set of data. As per specifications of the project, I cannot simply modify the file to add those headers manually.
Sample of the data that I'm working with:
38.049133 0.224026 0.05398 -19.11 -20.03
38.352526 0.212491 0.05378 -18.35 -19.19
38.363598 0.210654 0.05401 -20.11 -20.89
54.936819 0.216794 0.20114 -20.94 -21.88
54.534881 0.578615 0.12887 -19.75 -20.66
54.743075 0.508774 0.18331 -20.54 -21.53
54.867240 0.562636 0.13956 -19.95 -20.85
54.856908 0.544031 0.13938 -20.14 -21.03
54.977748 0.501912 0.13923 -20.27 -21.01
54.992762 0.460376 0.12723 -20.24 -20.83
I've created an array of 5 strings to act as the headers of the columns within this DataFrame. Selecting by one of those headers (i.e. print(df['z'])) does (supposedly) print only that one column. However, all of the data in the DataFrame, which displays just fine when I do not specify columns (i.e. it shows the above sample lines exactly and detects the columns properly), suddenly becomes "NaN" when I specify column titles from the array of strings.
Sample of my code:
... imports and whatnot not shown
dataColumns = ['RA', 'DEC', 'z', 'M(g)', 'M(r)']
dataFile = pd.read_csv('file_name', delim_whitespace = True)
df = pd.DataFrame(data = dataFile, columns = dataColumns)
print(df)
Sample output of the above code (it is supposed to display exactly the sample data above but with added column headers):
RA DEC z M(g) M(r)
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
Why is it that, without specifying the 'columns' parameter for DataFrame, the data prints properly, whereas after specifying the parameter, everything displays as NaN?
Any help would be appreciated!
-- paanvaannd
To fix your problem, use this line instead:
df = pd.read_csv('file_name', sep=r'\s+', header=None, names=dataColumns)
pd.read_csv returns a DataFrame, so the above line handles the entirety of the import (i.e. calling pd.DataFrame on the result of pd.read_csv is superfluous). header=None indicates that pandas shouldn't interpret the first line of the file as headers, and names=... lets you specify the column names you'd like to use. Your sample is separated by whitespace rather than commas, so a whitespace separator is still needed: either keep delim_whitespace=True or use the equivalent sep=r'\s+'.
The NaN values come from the extra pd.DataFrame call. When the constructor is handed an existing DataFrame together with columns=..., it selects columns by those labels instead of renaming the existing ones. None of 'RA', 'DEC', 'z', 'M(g)', 'M(r)' exist in the parsed data yet, so every cell is filled with NaN.
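As a minimal sketch of the single-call approach, assuming the file is whitespace-separated like the sample in the question: io.StringIO stands in for the real file here, and the names sample and data_columns are only illustrative.

```python
import io

import pandas as pd

# First three rows of the sample from the question, standing in for the real file.
sample = io.StringIO(
    "38.049133 0.224026 0.05398 -19.11 -20.03\n"
    "38.352526 0.212491 0.05378 -18.35 -19.19\n"
    "38.363598 0.210654 0.05401 -20.11 -20.89\n"
)
data_columns = ['RA', 'DEC', 'z', 'M(g)', 'M(r)']

# header=None: the first line is data, not column names.
# names=...: attach the desired column labels while parsing.
# sep=r'\s+': split on runs of whitespace (assumed delimiter).
df = pd.read_csv(sample, sep=r'\s+', header=None, names=data_columns)
print(df)
```

With a real file, the same call should work with the path in place of the StringIO object.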
You are passing the DataFrame that .read_csv already returned to the DataFrame constructor pd.DataFrame. I am actually surprised it didn't throw an error.
Try this:
df = pd.read_csv('file_name', delim_whitespace = True, header = None)
df.columns = dataColumns
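A small sketch of the difference between the two approaches, using a few rows of the question's sample through io.StringIO instead of a real file. The variable names are illustrative, and header=None is added on the assumption that the file has no header row, so the first data row is not consumed as column names.

```python
import io

import pandas as pd

raw = (
    "38.049133 0.224026 0.05398 -19.11 -20.03\n"
    "38.352526 0.212491 0.05378 -18.35 -19.19\n"
    "38.363598 0.210654 0.05401 -20.11 -20.89\n"
)
data_columns = ['RA', 'DEC', 'z', 'M(g)', 'M(r)']

# As in the question: the first row is taken as the header,
# so the column labels are '38.049133', '0.224026', and so on.
parsed = pd.read_csv(io.StringIO(raw), sep=r'\s+')

# The constructor looks up columns by label. None of the requested
# labels exist in parsed, so the result holds only NaN.
reindexed = pd.DataFrame(data=parsed, columns=data_columns)
print(reindexed)

# Assigning to .columns relabels the existing columns instead.
renamed = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None)
renamed.columns = data_columns
print(renamed)
```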