
Pandas DataFrame and DateTimeIndex

I would like to group rows by time, and I tried the following approach:

import pandas as pd

df = pd.DataFrame({'time': ["2001-01-01 10:20:30,000", 
                            "2001-01-01 10:20:31,000",
                            "2001-01-02 5:00:00,000"],
                    'val': [1, 2, 3]})

t = pd.DatetimeIndex(df.time)
df = df.groupby([t.day, t.hour, t.minute]).count()

The resulting dataframe is

                   time val
    time time time      
       1   10   20    2   2
       2    5    0    1   1

The output I expect (or something similar):

           time   count             
     1  1-10-20       2
     2    2-5-0       1

The plot I want: X-axis for minutes, Y-axis for count, with ticks by day + hour (coarser than just minutes).

Questions:

1) Why does the index consist of 3 time columns, and how can I get an index with just a single column with elements like 1-10-20 and 2-5-0?

2) What is the best practice for getting only one column with the results of count() instead of the two columns time and val?

3) How can I plot this data (grouped by days/hours/minutes) with ticks in days and hours?

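To answer your first question: the index has three levels because you are grouping by three separate series (day, hour and minute), and each one becomes its own index level. If you really want them combined, group by a single strftime string instead: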

df.time = pd.to_datetime(df.time)

df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count()

time
01-10-20    2
02-05-00    1
Name: val, dtype: int64

The above also answers your second question: instead of counting the whole DataFrame, count a single column, here val.
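If the unpadded labels from the question (1-10-20, 2-5-0) are wanted rather than the zero-padded strftime output, one option is to keep the three-level grouping and join the levels afterwards. This is only a sketch: the explicit format string is an assumption based on the sample data (comma before the milliseconds), and the names ts and counts are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2001-01-01 10:20:30,000",
             "2001-01-01 10:20:31,000",
             "2001-01-02 5:00:00,000"],
    "val": [1, 2, 3],
})

# Assumption: every timestamp uses a comma before the milliseconds,
# so the format is spelled out instead of being inferred.
ts = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S,%f")

# Grouping by three series gives a three-level MultiIndex; renaming each
# series gives the levels distinct names instead of three times "time".
counts = df.groupby([ts.dt.day.rename("day"),
                     ts.dt.hour.rename("hour"),
                     ts.dt.minute.rename("minute")])["val"].count()

# Collapse each (day, hour, minute) tuple into a single string label.
counts.index = ["-".join(map(str, key)) for key in counts.index]

print(counts)
```

The result is a single-level Series, so it can be plotted or reset to columns like the strftime version.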


Finally, to plot, you can use the built-in plot functionality of pandas. I am creating a more complex example to demonstrate the ticks you want:

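import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 100 timestamps, 5 minutes apart ('5min' replaces the deprecated '5T' alias)
r = pd.date_range(start='2001-01-01', freq='5min', periods=100)
df = pd.DataFrame({'time': r, 'val': np.random.randint(1, 10, 100)})

out = df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count().reset_index()

# Label each bar with its day-hour prefix, e.g. '01-00'
ax = out.assign(label=out.time.str[:5]).plot(x='label', y='val', kind='bar')

# Hide every tick label that repeats one already shown
seen_ticks = set()

for label in ax.xaxis.get_ticklabels():
    if label.get_text() in seen_ticks:
        label.set_visible(False)
    else:
        seen_ticks.add(label.get_text())
plt.tight_layout()
plt.show()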

This keeps one bar per minute but shows each day-hour tick label only once.


1) Use pandas.DataFrame.from_dict(data) to create a dataframe from a dictionary (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html).

2) This question isn't entirely clear, but I think what you want is

df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)

and then apply your count() aggregation.
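One way the count step could look once time is the index is sketched below. Flooring the index to the minute and calling size() are my choices for illustration, not something the answer specifies, and the explicit format string is an assumption based on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2001-01-01 10:20:30,000",
             "2001-01-01 10:20:31,000",
             "2001-01-02 5:00:00,000"],
    "val": [1, 2, 3],
})

# Assumption: the comma-separated milliseconds need an explicit format.
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S,%f")
df = df.set_index("time")

# Truncate every timestamp to its minute and count the rows per minute.
# size() returns one Series, so there is no separate count per column.
per_minute = df.groupby(df.index.floor("min")).size()

print(per_minute)
```

Because the group keys stay real timestamps rather than strings, they sort chronologically and can be reformatted later with strftime if a label like day-hour is needed.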

3) This question isn't clear to me.
