Reading a CSV file with pandas.read_csv and an index creates NaN entries
My .csv file is comma-separated, which is the default setting for read_csv.
This works:
T1 = pd.DataFrame(pd.read_csv(loggerfile, header = 2)) #header contains column "1"
But as soon as I pass anything to the DataFrame constructor besides the read_csv result, all my values are suddenly NaN. Why? How do I solve this?
datetimeIdx = pd.to_datetime( T1["1"] ) #timestamp-column
T2 = pd.DataFrame(pd.read_csv(loggerfile, header = 2), index = datetimeIdx)
It's not necessary to wrap read_csv in a DataFrame call, as it already returns a DataFrame.
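A minimal sketch to check this, using a small made-up logger export read from an in-memory string (io.StringIO stands in for loggerfile; the file contents are hypothetical):

```python
import io
import pandas as pd

# Hypothetical logger export standing in for `loggerfile`:
# two preamble lines, then the header row (row 2) with columns "1" and "temp".
csv_text = """Logger,demo
Units,degC
1,temp
2021-01-01 00:00:00,20.5
2021-01-01 00:10:00,20.75
"""

df = pd.read_csv(io.StringIO(csv_text), header=2)
print(type(df).__name__)  # DataFrame

# Wrapping it again without extra arguments only produces an equivalent frame.
wrapped = pd.DataFrame(df)
print(wrapped.equals(df))  # True
```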
If you want to change the index, you can use set_index or set the index directly:
T1 = pd.read_csv(loggerfile, header = 2)
T1.index = pd.DatetimeIndex(T1["1"])
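A sketch of both options on the same hypothetical sample data. Assigning to .index relabels the existing rows instead of looking them up, so the values stay intact; set_index("1") additionally moves the column out of the data:

```python
import io
import pandas as pd

# Hypothetical logger export standing in for `loggerfile`.
csv_text = """Logger,demo
Units,degC
1,temp
2021-01-01 00:00:00,20.5
2021-01-01 00:10:00,20.75
"""

T1 = pd.read_csv(io.StringIO(csv_text), header=2)

# Option 1: assign a new index directly. Column "1" stays in the frame (as strings).
direct = T1.copy()
direct.index = pd.DatetimeIndex(direct["1"])

# Option 2: set_index moves column "1" into the index (drop=True by default),
# then the string labels are converted to timestamps.
via_set_index = T1.set_index("1")
via_set_index.index = pd.to_datetime(via_set_index.index)

print(direct["temp"].tolist())         # [20.5, 20.75]
print(via_set_index.columns.tolist())  # ['temp']
```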
If you want to keep the column in the DataFrame as a datetime (and not a string):
T1 = pd.read_csv(loggerfile, header = 2)
T1["1"] = pd.DatetimeIndex(T1["1"])
T2 = T1.set_index("1", drop=False)
But even better, you can do this directly in read_csv (assuming column "1" is the first column):
pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)
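With the same hypothetical sample (timestamps in the first column, header on row 2), this single call should yield a DatetimeIndex named "1", leaving only the measurement column in the data:

```python
import io
import pandas as pd

# Hypothetical logger export; the timestamp column "1" is the first column.
csv_text = """Logger,demo
Units,degC
1,temp
2021-01-01 00:00:00,20.5
2021-01-01 00:10:00,20.75
"""

T1 = pd.read_csv(io.StringIO(csv_text), header=2, index_col=0, parse_dates=True)

print(type(T1.index).__name__)  # DatetimeIndex
print(T1.columns.tolist())      # ['temp']
```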
The reason it returns a DataFrame full of NaNs is that calling DataFrame() with a DataFrame as input and an index argument performs a reindex operation with the provided index. As none of the labels in datetimeIdx are in the original index of T1 (the default integer index 0, 1, 2, and so on), you get a DataFrame with all NaNs.
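A small reproduction of the problem, again on made-up sample data. DataFrame.reindex makes the lookup explicit, and set_index with the same timestamps shows the relabeling alternative that keeps the values:

```python
import io
import pandas as pd

# Hypothetical logger export standing in for `loggerfile`.
csv_text = """Logger,demo
Units,degC
1,temp
2021-01-01 00:00:00,20.5
2021-01-01 00:10:00,20.75
"""

T1 = pd.read_csv(io.StringIO(csv_text), header=2)  # index is 0, 1
datetimeIdx = pd.to_datetime(T1["1"])

# Passing an existing DataFrame plus index= looks up each timestamp label
# in T1's integer index. None of them match, so every value becomes NaN.
T2 = pd.DataFrame(T1, index=datetimeIdx)
print(T2.isna().all().all())  # True

# The same lookup written explicitly as a reindex.
T2_explicit = T1.reindex(datetimeIdx)

# Relabeling (not reindexing) keeps the data: rows are matched by position.
T3 = T1.set_index(datetimeIdx)
print(T3["temp"].tolist())  # [20.5, 20.75]
```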