
Creating Time Series from Pandas DataFrame

I have a dataframe with various attributes, including one datetime column. I want to extract one of the attribute columns as a time series indexed by the datetime column. This seemed pretty straightforward, and I can construct time series with random values, as all the pandas docs show, but when I do so from a dataframe, my attribute values all convert to NaN.

Here's an analogous example.

import pandas as pd

df = pd.DataFrame({'a': [0,1], 'date':[pd.to_datetime('2017-04-01'),
                                       pd.to_datetime('2017-04-02')]})
s = pd.Series(df.a, index=df.date)

In this case, the series will have the correct time series index, but all the values will be NaN.
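A minimal sketch of what is happening here: when the `data` argument is already a Series, the constructor treats `index` as a list of labels to look up in that Series, which behaves like `Series.reindex`. The `equals` comparison at the end is only an illustration of that equivalence, not a claim about how pandas implements it internally.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1],
                   'date': [pd.to_datetime('2017-04-01'),
                            pd.to_datetime('2017-04-02')]})

s = pd.Series(df.a, index=df.date)

# df.a carries its own integer labels, 0 and 1.
print(df.a.index.tolist())

# None of the dates exist among those labels, so every lookup misses and
# the result is all NaN (upcast to float64 to hold the missing values).
print(s.isna().all(), s.dtype)

# The constructor call gives the same result as an explicit reindex.
print(s.equals(df.a.reindex(df.date)))
```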

I can do the series in two steps, as below, but I don't understand why this should be required.

s = pd.Series(df.a)
s.index = df.date
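The two-step version works because assigning to `.index` relabels the existing values by position instead of aligning them by label. A one-step sketch of the same idea, assuming a pandas version that has `Series.to_numpy()` (0.24 or later), is to pass the raw values so the constructor has no labels to align against:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1],
                   'date': [pd.to_datetime('2017-04-01'),
                            pd.to_datetime('2017-04-02')]})

# .to_numpy() drops df.a's integer labels; a plain array is paired with
# the dates purely by position, so no values are lost.
s = pd.Series(df['a'].to_numpy(), index=df['date'], name='a')

print(s)
```

The arrays must have the same length for this to work, since nothing is matched by label.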

What am I missing? I assume it has to do with series references, but don't understand at all why the values would go to NaN.

I am also able to get it to work by copying the index column.

s = pd.Series(df.a, df.date.copy())

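The problem is that when you pass an existing Series (df.a) together with an index, pd.Series() uses the values in index as labels to select values from that Series. df.a has the default integer index (0 and 1), so none of the date values are present in it, and every selected value comes back as NaN.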
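To see that this is a label lookup, here is a small sketch that asks for labels which do exist in `df.a`, plus one (the arbitrary value `5`) that does not:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1],
                   'date': [pd.to_datetime('2017-04-01'),
                            pd.to_datetime('2017-04-02')]})

# Labels 1 and 0 exist in df.a, so their values are selected (in the
# order requested); label 5 does not exist, so it becomes NaN.
picked = pd.Series(df.a, index=[1, 0, 5])
print(picked)
```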

You can set the date column as the index and then select the one data column you want. This returns a Series with the dates as the index:

import pandas as pd

df = pd.DataFrame({'a': [0,1], 'date':[pd.to_datetime('2017-04-01'),
                                       pd.to_datetime('2017-04-02')]})    
s = df.set_index('date')['a']

Examining s gives:

In [1]: s
Out[1]: 
date
2017-04-01    0
2017-04-02    1
Name: a, dtype: int64

And you can confirm that s is a Series:

In [2]: isinstance(s, pd.Series)
Out[2]: True
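Because the index built by set_index from a datetime column is a DatetimeIndex, the result also supports date-based selection. A short sketch, using only a Timestamp lookup and a month-level partial string slice:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1],
                   'date': [pd.to_datetime('2017-04-01'),
                            pd.to_datetime('2017-04-02')]})
s = df.set_index('date')['a']

# The datetime column became a DatetimeIndex.
print(isinstance(s.index, pd.DatetimeIndex))

# Exact lookup by timestamp returns the single value.
print(s.loc[pd.Timestamp('2017-04-02')])

# A month-level string selects every row in April 2017.
print(s.loc['2017-04'])
```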
