Pandas dataframe datetime conversion and min/max calculation
I have the following pandas dataframe:
NAME | 2020-12-03 12:00| 2020-12-03 15:00| 2020-12-03 18:00| 2020-12-03 21:00| etc.
London| 5 | 4 | 3.6 | 1.7 | ...
Berlin| 4 | 4.5 | 2.8 | 0.1 | ...
etc.
It is basically a long table with one row per city; each column holds the temperature in °C, and the column header is the corresponding timestamp. I now want to calculate the min and max temperature aggregated per day per city. The final table is probably going to look as follows:
NAME   | Minimum | Maximum | timestamp  |
London | 1.7     | 5       | 2020-12-03 |
Berlin | 0.1     | 4.5     | 2020-12-03 |
To make things even more complex, I want to draw graphs with matplotlib for each city, with the min and max values as bar charts per timestamp. So I am not sure whether the final table should look like the above.
I have already tried transposing the table and grouping by the timestamps (this did not work because the column headers couldn't be set to a datetime value). I can print out the values of the first table just fine with the following script, but as mentioned before, I want to get the min and max values.
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter

plt.rcParams["figure.figsize"] = (15, 15)
for i in range(fcpanda3.shape[0]):
    row = fcpanda3.iloc[i]
    # red bars for values above 0 °C, blue bars otherwise
    ax = row.plot(kind="bar", color=(row > 0).map({True: "r", False: "b"}))
    # label only every fifth tick to keep the x axis readable
    ax.set_xticklabels([t if not j % 5 else "" for j, t in enumerate(ax.get_xticklabels())])
    ax.yaxis.set_major_formatter(FormatStrFormatter("%d °C"))
    plt.tight_layout()
    plt.savefig("D:/graph/" + str(i + 1) + ".png")
    plt.close()
df.melt() serves your purpose by unpivoting the table first, so that each row holds one city, one timestamp and one temperature. A regular .groupby() aggregation can then be applied.
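To see what the unpivoting step does on its own, here is a minimal sketch. The dataframe is rebuilt by hand from the four columns shown in the question, so the name df and the float values are assumptions about how the real data is stored.

```python
import pandas as pd

# Sample data copied from the table in the question (assumed layout:
# one row per city, one string-labelled column per timestamp).
df = pd.DataFrame({
    "NAME": ["London", "Berlin"],
    "2020-12-03 12:00": [5.0, 4.0],
    "2020-12-03 15:00": [4.0, 4.5],
    "2020-12-03 18:00": [3.6, 2.8],
    "2020-12-03 21:00": [1.7, 0.1],
})

# Wide -> long: the former column headers become values of "timestamp",
# which can then be parsed with pd.to_datetime like any other column.
long_df = df.melt(id_vars="NAME", var_name="timestamp", value_name="degree")

print(long_df.shape)   # 2 cities x 4 timestamps -> (8, 3)
print(long_df.head())
```

Because the timestamps now live in an ordinary column instead of the header, the problem of setting column headers to datetime values no longer arises.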
Code
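import pandas as pd
df2 = df.melt(id_vars="NAME", var_name="timestamp", value_name="degree")
# Parse the former column headers and keep only the calendar date
df2["timestamp"] = pd.to_datetime(df2["timestamp"]).dt.date
# Daily minimum and maximum per city ("min"/"max" by name instead of the builtins)
df2 = df2.groupby(["NAME", "timestamp"])["degree"].agg(Min="min", Max="max").sort_index().reset_index()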
Output
print(df2)
     NAME   timestamp  Min  Max
0  Berlin  2020-12-03  0.1  4.5
1  London  2020-12-03  1.7  5.0
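For the plotting part of the question, one possible approach is to group the aggregated table by city and draw the Min and Max columns as side-by-side bars per day. This is only a sketch under a few assumptions: the 2020-12-04 columns are invented so that each chart has more than one day, the output folder is a temporary placeholder rather than the D:/graph/ path from the question, and the Agg backend is selected so the script runs without a display.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI needed, figures are only saved
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "NAME": ["London", "Berlin"],
    "2020-12-03 12:00": [5.0, 4.0],
    "2020-12-03 15:00": [4.0, 4.5],
    "2020-12-03 18:00": [3.6, 2.8],
    "2020-12-03 21:00": [1.7, 0.1],
    # The next two columns are invented for illustration only.
    "2020-12-04 12:00": [6.0, -1.0],
    "2020-12-04 15:00": [2.5, 0.5],
})

# Same melt + groupby steps as above, written as one chain.
daily = (
    df.melt(id_vars="NAME", var_name="timestamp", value_name="degree")
      .assign(timestamp=lambda d: pd.to_datetime(d["timestamp"]).dt.date)
      .groupby(["NAME", "timestamp"])["degree"]
      .agg(Min="min", Max="max")
      .reset_index()
)

out_dir = Path(tempfile.mkdtemp())  # placeholder output folder

# One figure per city, with a blue Min bar and a red Max bar for each day.
for city, grp in daily.groupby("NAME"):
    ax = grp.plot(x="timestamp", y=["Min", "Max"], kind="bar",
                  color=["b", "r"], title=city, figsize=(8, 4))
    ax.set_ylabel("°C")
    plt.tight_layout()
    plt.savefig(out_dir / f"{city}.png")
    plt.close(ax.figure)

print(daily)
```

Keeping the table in this long shape (one row per city and day) is what makes the per-city loop straightforward, so the layout sketched in the question should work for the charts as well.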