I would like to group rows by time, and I tried the following approach:
import pandas as pd
df = pd.DataFrame({'time': ["2001-01-01 10:20:30,000",
                            "2001-01-01 10:20:31,000",
                            "2001-01-02 5:00:00,000"],
                   'val': [1, 2, 3]})
t = pd.DatetimeIndex(df.time)
df = df.groupby([t.day, t.hour, t.minute]).count()
The resulting dataframe is:
                time  val
time time time
1    10   20       2    2
2    5    0        1    1
The output I expect (or something similar):
   time     count
1  1-10-20  2
2  2-5-0    1
The plot I want: X-axis for minutes, Y-axis for count, ticks by day + hour (coarser than just minutes).
Questions:
1) Why does the index consist of three time columns, and how can I get an index with just a single column with elements like 1-10-20 and 2-5-0?
2) What is the best practice to have only one column with the results of count() instead of two columns, time and val?
3) How can I plot this data (grouped by days/hours/minutes) with ticks in days and hours?
To answer your first question: the index has three levels because you're grouping by three separate series. If you really want them combined, group by a single strftime-formatted string:
df.time = pd.to_datetime(df.time)
df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count()
time
01-10-20 2
02-05-00 1
Name: val, dtype: int64
The above also answers your second question: instead of counting the whole DataFrame, count a single series, here your val series.
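As a minimal sketch of both points, using the sample data from the question: three grouping keys give a three-level index, while one formatted string key gives a single level. The explicit `format` string, the renamed keys (`day`, `hour`, `minute`), and the `count` column name are my own assumptions, chosen to match the sample timestamps and the expected output in the question.

```python
import pandas as pd

df = pd.DataFrame({'time': ["2001-01-01 10:20:30,000",
                            "2001-01-01 10:20:31,000",
                            "2001-01-02 5:00:00,000"],
                   'val': [1, 2, 3]})
# Assumed format: a comma separates the seconds from the fractional part
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S,%f')

t = df['time'].dt

# Three separate keys -> a MultiIndex with three levels
multi = df.groupby([t.day.rename('day'),
                    t.hour.rename('hour'),
                    t.minute.rename('minute')])['val'].count()

# One string key -> a single index level; reset_index(name=...) names the counts
key = t.strftime('%d-%H-%M')
single = df.groupby(key)['val'].count().reset_index(name='count')
print(single)
```

Counting only `val` is what avoids the duplicated `time` and `val` columns: a DataFrame-level `count()` reports one count per remaining column.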
Finally, to plot, you can use the built-in plot functionality of pandas. Here is a more complex example to demonstrate the ticks you want:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

r = pd.date_range(start='2001-01-01', freq='5min', periods=100)
df = pd.DataFrame({'time': r, 'val': np.random.randint(1, 10, 100)})
out = df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count().reset_index()
# Label each bar with its day-hour prefix ('dd-HH')
ax = out.assign(label=out.time.str[:5]).plot(x='label', y='val', kind='bar')
# Hide every tick label that repeats one already shown
seen_ticks = set()
for label in ax.xaxis.get_ticklabels():
    if label.get_text() in seen_ticks:
        label.set_visible(False)
    else:
        seen_ticks.add(label.get_text())
plt.tight_layout()
plt.show()
This shows only the first x-tick label for each unique day-hour combination and hides the repeats.
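The same deduplication can also be computed up front in pandas rather than by looping over tick objects. The following is only a sketch of that alternative, not part of the approach above; the names `labels` and `tick_labels` are hypothetical.

```python
import pandas as pd

r = pd.date_range(start='2001-01-01', freq='5min', periods=100)
# One 'dd-HH' label per timestamp
labels = pd.Series(r.strftime('%d-%H'))
# Keep the first label of each day-hour group and blank out the repeats
tick_labels = labels.where(~labels.duplicated(), '')
print(tick_labels.head(13).tolist())
```

Assuming the bar plot has one bar per row in the same order, the result could then be applied with `ax.set_xticklabels(tick_labels)`.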
1) Use pandas.DataFrame.from_dict(data) to create a dataframe from a dictionary (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html).
2) This question isn't entirely clear, but I think what you want is
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)
and then apply your count() aggregation.
3) This question isn't clear to me.
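Point 2 does not spell out how the count is applied once the index is set. One possible reading, sketched here under my own assumptions (the explicit `format` string and flooring the index to the minute are not from the answer), is to group by the truncated timestamps:

```python
import pandas as pd

df = pd.DataFrame({'time': ["2001-01-01 10:20:30,000",
                            "2001-01-01 10:20:31,000",
                            "2001-01-02 5:00:00,000"],
                   'val': [1, 2, 3]})
# Assumed format: a comma separates the seconds from the fractional part
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S,%f')
df = df.set_index('time')

# Truncate each timestamp to its minute, then count the rows in each minute
counts = df.groupby(df.index.floor('min'))['val'].count()
print(counts)
```

Unlike the strftime approach, this keeps the group keys as real timestamps, which may be more convenient for later date-based formatting.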