Reading a CSV file with pandas.read_csv and an index creates NaN entries

Question

My .csv file is comma-separated, which is the default for read_csv.

This works:

T1 = pd.DataFrame(pd.read_csv(loggerfile, header = 2)) #header contains column "1"

But as soon as I pass anything to the DataFrame constructor besides the read_csv result, all my values suddenly become NaN. Why? How can I fix this?

datetimeIdx = pd.to_datetime( T1["1"] )                #timestamp-column
T2 = pd.DataFrame(pd.read_csv(loggerfile, header = 2), index = datetimeIdx)

Answer 1

There is no need to wrap read_csv in a DataFrame call, because read_csv already returns a DataFrame.

If you want to change the index, you can use set_index or assign the index directly:

T1 = pd.read_csv(loggerfile, header = 2)
T1.index = pd.DatetimeIndex(T1["1"])

If you want to keep the column in the dataframe as a datetime (and not a string):

T1 = pd.read_csv(loggerfile, header = 2)
T1["1"] = pd.DatetimeIndex(T1["1"])
T2 = T1.set_index("1", drop=False)

Even better, you can do this directly in read_csv (assuming the column "1" is the first column):

pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)
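As a self-contained sketch of that call, the snippet below reads from an in-memory string instead of loggerfile. The file contents (two preamble lines, a header row naming the columns "1" and "value", and two data rows) are invented for illustration and are not the asker's actual data.

```python
import io

import pandas as pd

# Hypothetical logger file: two preamble lines, then the header row
# (line index 2), then the data. Column "1" holds the timestamps.
csv_text = (
    "device,logger-01\n"
    "unit,degC\n"
    "1,value\n"
    "2014-03-26 08:00:00,1.5\n"
    "2014-03-26 08:01:00,2.5\n"
)

# header=2 takes the column names from the third line,
# index_col=0 uses the first column as the index,
# parse_dates=True asks pandas to parse that index as datetimes.
df = pd.read_csv(io.StringIO(csv_text), header=2, index_col=0, parse_dates=True)

print(type(df.index).__name__)
print(df)
```

With this input the index should come back as a DatetimeIndex, so no separate to_datetime step is needed.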

The result is all NaN because calling DataFrame() with a DataFrame as input together with an index argument reindexes the input to that index. Since none of the labels in datetimeIdx exist in the original index of T1, every row of the new dataframe is filled with NaN.
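The difference between reindexing and relabeling can be reproduced with a small sketch. The two-row frame below is a made-up stand-in for the logger data; only the column name "1" is taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the logger data; the default index is 0, 1.
df = pd.DataFrame({
    "1": ["2014-03-26 08:00:00", "2014-03-26 08:01:00"],
    "value": [1.5, 2.5],
})

datetime_idx = pd.to_datetime(df["1"])

# Passing index= together with a DataFrame aligns on labels (a reindex).
# None of the timestamps exist in the original index 0, 1, so every
# row of the result is filled with NaN.
reindexed = pd.DataFrame(df, index=datetime_idx)
print(reindexed["value"].isna().all())

# Assigning the index relabels the existing rows and keeps the data.
relabeled = df.copy()
relabeled.index = pd.DatetimeIndex(df["1"])
print(relabeled["value"].tolist())
```

The same label alignment happens with `df.reindex(datetime_idx)`, which is why the constructor call in the question loses all values.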

Answer 1
9 2014-03-26 08:46:58
