How do I group a dataframe from CSV by time value, so that I can count the frequency for each hour?

Here is my dataset: dataset_for_this_Question

I want to group the dataset by 'Time' and 'Type' so that I can get the frequency of 'Name' on an hourly basis [per hour, how many Types there are and what their Names are]. My first requirement is to group the dataset by 'Time' on an hourly basis.

I am using Pandas in Python.

You can group by the first 13 characters of your Time column together with Type, and then use value_counts on Name, or group by all three columns and use .size().

df.groupby([df.Time.str[0:13], 'Type']).Name.value_counts()
# or
df.groupby([df.Time.str[0:13], 'Type', 'Name']).size()

Outputs:

Time           Type                      Name                               
2018-04-07 15  COMMUNICATIONS ALARM      Device Management IP is Unreachable    141
2018-04-07 16  COMMUNICATIONS ALARM      Device Management IP is Unreachable     64
2018-04-07 17  COMMUNICATIONS ALARM      Device Management IP is Unreachable      6
...
2018-04-09 14  COMMUNICATIONS ALARM      Device Management IP is Unreachable      8
2018-04-09 15  COMMUNICATIONS ALARM      Device Management IP is Unreachable     11
2018-04-09 16  COMMUNICATIONS ALARM      Device Management IP is Unreachable      5
2018-04-09 17  QUALITY_OF_SERVICE_ALARM  Temperature Absolute High               64
                                         Memory Absolute High                     1
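
As a minimal sketch of the string-slicing approach, here it is on made-up data. The rows below are hypothetical and only mimic the column layout from the question; both variants should give the same counts.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is not reproduced here.
df = pd.DataFrame({
    "Time": [
        "2018-04-07 15:01:10",
        "2018-04-07 15:47:30",
        "2018-04-07 16:05:00",
        "2018-04-09 17:20:00",
        "2018-04-09 17:45:00",
    ],
    "Type": [
        "COMMUNICATIONS ALARM",
        "COMMUNICATIONS ALARM",
        "COMMUNICATIONS ALARM",
        "QUALITY_OF_SERVICE_ALARM",
        "QUALITY_OF_SERVICE_ALARM",
    ],
    "Name": [
        "Device Management IP is Unreachable",
        "Device Management IP is Unreachable",
        "Device Management IP is Unreachable",
        "Temperature Absolute High",
        "Memory Absolute High",
    ],
})

# The first 13 characters of "YYYY-MM-DD HH:MM:SS" are "YYYY-MM-DD HH".
hour_key = df["Time"].str[:13]

# Variant 1: count Name values within each (hour, Type) group.
by_value_counts = df.groupby([hour_key, "Type"])["Name"].value_counts()

# Variant 2: group by all three keys and take the group sizes.
by_size = df.groupby([hour_key, "Type", "Name"]).size()

print(by_size)
```

Note that this only works while Time is still a string column in exactly this format; once it is converted to datetime, the .str accessor no longer applies.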

Given your data format, slicing by string characters is perfectly fine, but it is specific to this case. In general, you can convert your Time column to datetime, which gives you access to a lot of additional functionality. In this case, you can floor each timestamp to the hour.

df['Time'] = pd.to_datetime(df.Time)
# Note: recent pandas versions prefer the lowercase alias 'h' over 'H'.
df.groupby([df.Time.dt.floor('1H'), 'Type', 'Name']).size()

Will yield:

Time                 Type                      Name                               
2018-04-07 15:00:00  COMMUNICATIONS ALARM      Device Management IP is Unreachable    141
2018-04-07 16:00:00  COMMUNICATIONS ALARM      Device Management IP is Unreachable     64
2018-04-07 17:00:00  COMMUNICATIONS ALARM      Device Management IP is Unreachable      6
2018-04-07 18:00:00  COMMUNICATIONS ALARM      Device Management IP is Unreachable      7
...
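
The floor-based approach can be tried end to end with a small sketch like the one below. The sample rows are hypothetical, and the `types_per_hour` step is an added illustration of the "how many Types per hour" part of the question, not something from the answer itself.

```python
import pandas as pd

# Hypothetical sample rows that mimic the question's columns.
df = pd.DataFrame({
    "Time": ["2018-04-07 15:01:10", "2018-04-07 15:47:30",
             "2018-04-07 16:05:00", "2018-04-09 17:20:00",
             "2018-04-09 17:45:00"],
    "Type": ["COMMUNICATIONS ALARM"] * 3 + ["QUALITY_OF_SERVICE_ALARM"] * 2,
    "Name": ["Device Management IP is Unreachable"] * 3
            + ["Temperature Absolute High", "Memory Absolute High"],
})

# Parse the strings into real timestamps, then round down to the hour.
df["Time"] = pd.to_datetime(df["Time"])
hour = df["Time"].dt.floor("h")

# Frequency of each Name per hour and Type.
counts = df.groupby([hour, "Type", "Name"]).size()

# Number of distinct Types seen in each hour.
types_per_hour = df.groupby(hour)["Type"].nunique()

print(counts)
print(types_per_hour)
```

Because the group key is a real timestamp here, the result can be sliced by date or resampled further, which the string key does not allow.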

Create an hourly range of timestamps, then use it to collect the data from the main dataframe two hours at a time.

dates = pd.date_range(start='2018-04-09', end='2018-05-17', freq='H')
dates

Then you will get this output:

DatetimeIndex(['2018-04-09 00:00:00', '2018-04-09 01:00:00',
           '2018-04-09 02:00:00', '2018-04-09 03:00:00',
           '2018-04-09 04:00:00', '2018-04-09 05:00:00',
           '2018-04-09 06:00:00', '2018-04-09 07:00:00',
           '2018-04-09 08:00:00', '2018-04-09 09:00:00',
           ...
           '2018-05-16 15:00:00', '2018-05-16 16:00:00',
           '2018-05-16 17:00:00', '2018-05-16 18:00:00',
           '2018-05-16 19:00:00', '2018-05-16 20:00:00',
           '2018-05-16 21:00:00', '2018-05-16 22:00:00',
           '2018-05-16 23:00:00', '2018-05-17 00:00:00'],
          dtype='datetime64[ns]', length=913, freq='H')

df_new = pd.DataFrame()   

This dataframe is meant to collect each hour of data from the main dataframe [df]:

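# Assumes df has a sorted DatetimeIndex built from the Time column, e.g.
# df = df.set_index(pd.to_datetime(df['Time'])).sort_index()
for x in range(0, len(dates) - 2, 2):
    start_date = str(dates[x])[:13]
    end_date = str(dates[x+1])[:13]
    print(start_date, end_date)
    # Partial-string slicing includes the whole end hour, so each
    # slice holds two hours of data from the main dataframe.
    df_temp = df[start_date:end_date]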

Once the data is in a dataframe, we can do a lot of operations on it.
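
The explicit loop over two-hour slices can probably also be written with pandas' own time-based grouping. The following is only a sketch under assumptions: the sample rows are hypothetical, the Time column is used as a DatetimeIndex, and `pd.Grouper(freq="2h")` stands in for the manual start/end slicing.

```python
import pandas as pd

# Hypothetical sample data indexed by timestamp.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2018-04-09 00:10:00", "2018-04-09 01:30:00",
        "2018-04-09 02:05:00", "2018-04-09 05:59:00",
    ]),
    "Name": ["A", "A", "B", "A"],
}).set_index("Time").sort_index()

# Iterate over consecutive two-hour windows; each chunk is a dataframe
# holding the rows whose timestamps fall into that window.
window_sizes = {}
for window_start, chunk in df.groupby(pd.Grouper(freq="2h")):
    window_sizes[window_start] = len(chunk)
    print(window_start, len(chunk))

# The same row counts per window, without an explicit loop.
per_window = df.resample("2h").size()
print(per_window)
```

Each `chunk` plays the role of `df_temp` in the loop, so any per-window operation can go inside the loop body.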
