Pandas converts all data to NaN after adding column values

Question

I'm trying to add column headers to the following set of data. As per the specifications of the project, I cannot simply modify the file to add those headers manually.

Sample of the data that I'm working with:

38.049133   0.224026 0.05398  -19.11 -20.03
38.352526   0.212491 0.05378  -18.35 -19.19
38.363598   0.210654 0.05401  -20.11 -20.89
54.936819   0.216794 0.20114  -20.94 -21.88
54.534881   0.578615 0.12887  -19.75 -20.66
54.743075   0.508774 0.18331  -20.54 -21.53
54.867240   0.562636 0.13956  -19.95 -20.85
54.856908   0.544031 0.13938  -20.14 -21.03
54.977748   0.501912 0.13923  -20.27 -21.01
54.992762   0.460376 0.12723  -20.24 -20.83

I've created an array of 5 strings to act as the headers of the columns in this DataFrame. Selecting by one of those headers does pick out only that column (i.e. print(df['z']) prints just that one column). However, all of the data in the DataFrame, which displays just fine when I do not specify columns (i.e. it shows the above sample lines exactly and detects the columns properly), suddenly becomes "NaN" when I specify column titles from the array of strings.

Sample of my code:

... imports and whatnot not shown

dataColumns = ['RA', 'DEC', 'z', 'M(g)', 'M(r)']
dataFile = pd.read_csv('file_name', delim_whitespace = True)
df = pd.DataFrame(data = dataFile, columns = dataColumns)

print(df)

Sample output of the above code (it is supposed to display exactly the sample data above, but with added column headers):

RA   DEC z  M(g) M(r)
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN
NaN   NaN NaN  NaN NaN

Why is it that the data prints properly when I don't specify the 'columns' parameter for DataFrame, whereas after specifying the parameter everything displays as NaN?

Any help would be appreciated!

-- paanvaannd

Answer 1

To fix your problem, use this line instead:

df = pd.read_csv('file_name', delim_whitespace=True, header=None, names=dataColumns)

pd.read_csv returns a DataFrame, so the above line should handle the entirety of the import (i.e. calling pd.DataFrame on the result of pd.read_csv is superfluous). header=None indicates that pandas shouldn't interpret the first line of the file as headers, and names=... lets you specify the column names you'd like to use. Keep delim_whitespace=True: 'comma' may be the 'c' in 'csv', but the sample you posted appears to be separated by whitespace, not commas. Without testing your data, I'd say the most likely culprit behind the NaN values is the extra pd.DataFrame(..., columns=dataColumns) call, which looks up columns with those labels in the DataFrame you pass in and, finding none, fills them with NaN.
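A minimal sketch of reading headerless data with header=None and names=, assuming whitespace-separated values like the sample in the question. The io.StringIO buffer and the `sample` variable are stand-ins introduced here for illustration, since the real file name is not given. sep=r'\s+' is used as the regular-expression equivalent of delim_whitespace=True, which recent pandas releases deprecate.

```python
import io

import pandas as pd

# Three rows copied from the sample in the question. The StringIO buffer
# stands in for the real file (an assumption made for this sketch).
sample = io.StringIO(
    "38.049133   0.224026 0.05398  -19.11 -20.03\n"
    "38.352526   0.212491 0.05378  -18.35 -19.19\n"
    "38.363598   0.210654 0.05401  -20.11 -20.89\n"
)

dataColumns = ['RA', 'DEC', 'z', 'M(g)', 'M(r)']

# header=None: treat the first line as data, not as column names.
# names=...:   supply the column names ourselves.
# sep=r'\s+':  split fields on runs of whitespace.
df = pd.read_csv(sample, sep=r'\s+', header=None, names=dataColumns)

print(df)
print(df['z'])
```

With a real file, the buffer would be replaced by the file path; everything else stays the same.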

Answer 2

You are passing a DataFrame that you created with .read_csv to the DataFrame constructor pd.DataFrame. I am actually surprised it didn't throw an error.

Try this:

# header=None keeps the first data row from being consumed as column names
df = pd.read_csv('file_name', delim_whitespace=True, header=None)
df.columns = dataColumns
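To illustrate why the constructor call in the question yields NaN instead of raising an error, here is a small sketch. The `parsed`, `selected`, and `renamed` names and the two-column frame are made up for this example. When pd.DataFrame receives an existing DataFrame together with columns=, it selects columns by label rather than renaming them, so labels that don't exist come back filled with NaN.

```python
import pandas as pd

# A small frame with default integer column labels 0 and 1,
# similar to what read_csv produces with header=None.
parsed = pd.DataFrame([[38.049133, 0.224026], [38.352526, 0.212491]])

# Passing an existing DataFrame plus columns= selects columns by label.
# 'RA' and 'DEC' do not exist in `parsed`, so every value is NaN.
selected = pd.DataFrame(data=parsed, columns=['RA', 'DEC'])
print(selected)

# Assigning to .columns relabels the existing columns and keeps the data.
renamed = parsed.copy()
renamed.columns = ['RA', 'DEC']
print(renamed)
```

Assigning to .columns requires the list length to match the number of existing columns, which makes it a handy sanity check on how the file was parsed.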

2 answers

Answer 1
1, accepted, 2017-02-11 04:14:28

Answer 2
0, 2017-02-11 04:13:45