
Reading a csv-file with pandas.read_csv and an index creates NaN entries

My .csv file is comma-separated, which is the default delimiter for read_csv.

This works:

T1 = pd.DataFrame(pd.read_csv(loggerfile, header = 2)) #header contains column "1"

But as soon as I pass anything to the DataFrame constructor besides the read_csv result, all my values are suddenly NaN. Why does this happen, and how can I fix it?

datetimeIdx = pd.to_datetime( T1["1"] )                #timestamp-column
T2 = pd.DataFrame(pd.read_csv(loggerfile, header = 2), index = datetimeIdx)

You don't need to wrap read_csv in a DataFrame call, because read_csv already returns a DataFrame.

If you want to change the index, you can use set_index or assign to the index attribute directly:

T1 = pd.read_csv(loggerfile, header = 2)
T1.index = pd.DatetimeIndex(T1["1"])

If you want to keep the column in the DataFrame as a datetime (rather than a string):

T1 = pd.read_csv(loggerfile, header = 2)
T1["1"] = pd.DatetimeIndex(T1["1"])
T2 = T1.set_index("1", drop=False)

But even better, you can do this directly in read_csv (assuming the column "1" is the first column):

pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)
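To try the one-step read_csv call without the original logger file, here is a minimal sketch. The CSV text and its two preamble lines are made up for illustration; only the layout (header on the third line, timestamp column "1" first) follows the question.

```python
import io

import pandas as pd

# Hypothetical logger export: two preamble lines, then the header row,
# which is why header=2 (zero-based) selects it.
csv_text = (
    "logger,export,v1\n"
    "units,V,A\n"
    "1,2,3\n"
    "2020-01-01 00:00:00,1.5,0.1\n"
    "2020-01-01 00:01:00,2.5,0.2\n"
)

# index_col=0 makes column "1" the index; parse_dates=True parses that index as dates.
df = pd.read_csv(io.StringIO(csv_text), header=2, index_col=0, parse_dates=True)

print(type(df.index).__name__)
print(df)
```

The values stay intact because the index is built from the file's own first column rather than being matched against an existing index.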

You get NaNs because calling DataFrame() with a DataFrame as input plus an index argument reindexes the input to the provided index. It does not relabel the existing rows; it looks up each new label in the original index. Since none of the labels in datetimeIdx appear in the original index of T1 (the default 0, 1, 2, ...), every lookup misses and you get a DataFrame filled with NaN.
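The difference between reindexing and relabeling can be seen on a small frame. This is a sketch with invented data standing in for the logger file; the names T1, T2 and datetimeIdx follow the question, while T3 is added here for comparison.

```python
import pandas as pd

# Hypothetical stand-in for the logger data: column "1" holds timestamps as strings.
T1 = pd.DataFrame({
    "1": ["2020-01-01 00:00:00", "2020-01-01 00:01:00"],
    "2": [1.5, 2.5],
})

datetimeIdx = pd.DatetimeIndex(T1["1"])

# Constructor with index=: reindexes T1. The timestamp labels are looked up
# in T1's index (0, 1), none are found, so every cell becomes NaN.
T2 = pd.DataFrame(T1, index=datetimeIdx)
print(T2.isna().all().all())

# Assigning the index: rows stay where they are and only get new labels.
T3 = T1.copy()
T3.index = datetimeIdx
print(T3["2"].tolist())
```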
