Find maximum value of each day from hourly data

I'm having trouble getting the maximum value of each day from hourly data. The original file contains 24 data points for each name on each day (and there are many names). As an example, here are the 24 data points for one name:

Start Time  Period  name    value
2/23/2019 0:00  60  MBTS_H2145X 100
2/23/2019 1:00  60  MBTS_H2145X 100
2/23/2019 2:00  60  MBTS_H2145X 1
2/23/2019 3:00  60  MBTS_H2145X 1
2/23/2019 4:00  60  MBTS_H2145X 1
2/23/2019 5:00  60  MBTS_H2145X 2324
2/23/2019 6:00  60  MBTS_H2145X 2323
2/23/2019 7:00  60  MBTS_H2145X 2323
2/23/2019 8:00  60  MBTS_H2145X 2323
2/23/2019 9:00  60  MBTS_H2145X 2323
2/23/2019 10:00 60  MBTS_H2145X 2323
2/23/2019 11:00 60  MBTS_H2145X 2323
2/23/2019 12:00 60  MBTS_H2145X 1
2/23/2019 13:00 60  MBTS_H2145X 21
2/23/2019 14:00 60  MBTS_H2145X 21
2/23/2019 15:00 60  MBTS_H2145X 23
2/23/2019 16:00 60  MBTS_H2145X 350
2/23/2019 17:00 60  MBTS_H2145X 323
2/23/2019 18:00 60  MBTS_H2145X 23
2/23/2019 19:00 60  MBTS_H2145X 23
2/23/2019 20:00 60  MBTS_H2145X 2323
2/23/2019 21:00 60  MBTS_H2145X 23
2/23/2019 22:00 60  MBTS_H2145X 23
2/23/2019 23:00 60  MBTS_H2145X 2

The result I get is below, which is wrong; it should be 2324:

    Start Time  name    max value
0   2/23/2019   MBTS_H2145X 350

I have the code below, but it gives the wrong result:

import dask.dataframe as dd
import numpy as np
import pandas as pd

filename='V.csv'
df = dd.read_csv(filename, dtype='str')


#_________changing date format 
df['Start Time'] = df['Start Time'].map(lambda x: pd.to_datetime(x, errors='coerce'))
#_________change to pure date without hour
df['Start Time'] = df['Start Time'].dt.date


grouped_df=(df.groupby(['Start Time','name']).agg({'value':'max'}).rename(columns={'value':'max value'}).reset_index())

grouped_df.to_csv('e1.csv')

print(grouped_df.head(12))

Keep the rest of your code exactly the same. Just change this line:

grouped_df=(df.groupby(['Start Time','name']).agg({'value':'max'}).rename(columns={'value':'max value'}).reset_index())

to:

# convert the string column to numbers so that max() compares numerically
df['value'] = dd.to_numeric(df['value'], errors='coerce')

# a grouped Series is renamed by passing the new name directly
grouped_df = (df.groupby(['Start Time','name'])['value'].max().rename('max value').reset_index())

# optional: attach the daily maximum to every hourly row
df = dd.merge(df, grouped_df, on=['Start Time','name'])

The likely cause is the column type rather than the aggregate function itself: with dtype='str', the values are compared as text, so '350' ranks above '2324'.
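As a quick illustration of how string-typed values can produce 350 instead of 2324, here is a minimal sketch in plain Python using a few of the values from the question. The variable names are illustrative only and are not part of the original code.

```python
# A few of the hourly values from the question, held as strings
# (which is what reading the CSV with dtype='str' produces).
values_as_text = ['100', '1', '2324', '2323', '21', '23', '350', '323', '2']

# String comparison is character by character: '3' > '2', so '350' wins.
text_max = max(values_as_text)

# Converting to integers first gives the numeric maximum.
numeric_max = max(int(v) for v in values_as_text)

print(text_max)     # 350
print(numeric_max)  # 2324
```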

Alternatively, if your dtype is just string, add only the to_numeric line and keep everything else the same.
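For reference, the same idea can be sketched in plain pandas on a small in-memory sample. This is an illustration under assumptions, not the asker's actual pipeline: the rows are a subset of the data shown in the question, written as comma-separated text, and `sample_csv` and `daily_max` are hypothetical names. With dask the steps should be analogous, using `dd.read_csv` and `dd.to_numeric`.

```python
import io

import pandas as pd

# Hypothetical sample: four of the hourly rows from the question.
sample_csv = io.StringIO(
    "Start Time,Period,name,value\n"
    "2/23/2019 0:00,60,MBTS_H2145X,100\n"
    "2/23/2019 5:00,60,MBTS_H2145X,2324\n"
    "2/23/2019 16:00,60,MBTS_H2145X,350\n"
    "2/23/2019 23:00,60,MBTS_H2145X,2\n"
)

# Read everything as strings, as in the question.
df = pd.read_csv(sample_csv, dtype='str')

# Convert the value column to numbers before aggregating.
df['value'] = pd.to_numeric(df['value'], errors='coerce')

# Parse the timestamps and keep only the calendar date.
df['Start Time'] = pd.to_datetime(df['Start Time'], errors='coerce').dt.date

# Daily maximum per name.
daily_max = (
    df.groupby(['Start Time', 'name'])['value']
      .max()
      .rename('max value')
      .reset_index()
)

print(daily_max)
```

On this sample the single resulting row should carry 2324 as its maximum; dropping the `pd.to_numeric` line reproduces the 350 result from the question.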
