Masking and indexing a pandas dataframe

Question

I have a pandas dataframe of crime statistics, and I want to mask it and count the total number of crimes in my dataset:

min = 0
max = 24

days = df[::24].count()['Year']
print(days)
df['daily_crime'] = np.NAN

for i in range(days):
    #print(df.loc[df.index[24], 'daily_crime'])
    print(df[min:max][df['Personfarlig_krim'] == 'Yes'])
    max += 24
    min += 24

Above I have set up a min and max counter that steps through the data 24 hours (one day) at a time. I want to add an extra column to my dataframe that counts the number of Yes values in Personfarlig_krim over the last 24 hours. This count should be stored in a separate column for each day. I have tried both masking and slicing and then assigning to a given row, but I have had no luck so far.

   Unnamed: 0  District  Neighbourhood.x  Year  Month  Day  Hour  Weekday    Sun  Personfarlig_krim                Date2
0           1         1        MANHATTAN  2015      4    1     0        4  False                 No  2015-04-01 00:00:00
1           2         1        MANHATTAN  2015      4    1     1        4  False                 No  2015-04-01 01:00:00
2           3         1        MANHATTAN  2015      4    1     2        4  False                 No  2015-04-01 02:00:00
3           4         1        MANHATTAN  2015      4    1     3        4  False                 No  2015-04-01 03:00:00
4           5         1        MANHATTAN  2015      4    1     4        4  False                 No  2015-04-01 04:00:00

Above is my attempt at formatting the data. There should be another column that stores the crime count for the last 24 hours (24 rows).

Answer 1

You can use groupby and transform:

# Parse the timestamps, then count the "Yes" rows within each calendar day
df["Date2"] = pd.to_datetime(df["Date2"])
df["day_total"] = df.groupby(["Year", "Month", "Day"])["Personfarlig_krim"].transform(lambda d: d.eq("Yes").sum())
print(df)

   District Neighbourhood.x  Year  Month  Day  Hour  Weekday    Sun Personfarlig_krim               Date2  day_total
0         1       MANHATTAN  2015      4    1     0        4  False                No 2015-04-01 00:00:00          0
1         1       MANHATTAN  2015      4    1     1        4  False                No 2015-04-01 01:00:00          0
2         1       MANHATTAN  2015      4    1     2        4  False                No 2015-04-01 02:00:00          0
3         1       MANHATTAN  2015      4    1     3        4  False                No 2015-04-01 03:00:00          0
4         1       MANHATTAN  2015      4    1     4        4  False                No 2015-04-01 04:00:00          0
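
The sample above contains no Yes rows, so every total is 0. As a minimal sketch of the same groupby and transform idea on data where the totals differ, the following builds a small hypothetical frame (two full days of hourly rows, with the Yes positions chosen arbitrarily for illustration; none of these values come from the question):

```python
import pandas as pd

# Hypothetical sample data: 48 hourly timestamps covering two full days.
stamps = pd.Timestamp("2015-04-01") + pd.to_timedelta(list(range(48)), unit="h")
df = pd.DataFrame({"Date2": stamps})
df["Year"] = df["Date2"].dt.year
df["Month"] = df["Date2"].dt.month
df["Day"] = df["Date2"].dt.day
df["Hour"] = df["Date2"].dt.hour
df["Personfarlig_krim"] = "No"

# Arbitrary choice: three "Yes" hours on day 1 (rows 0-23), one on day 2 (rows 24-47).
df.loc[[2, 5, 20, 30], "Personfarlig_krim"] = "Yes"

# Count the "Yes" rows per calendar day and broadcast the count to every row of that day.
df["day_total"] = (
    df.groupby(["Year", "Month", "Day"])["Personfarlig_krim"]
    .transform(lambda d: d.eq("Yes").sum())
)

print(df[["Date2", "Personfarlig_krim", "day_total"]].iloc[[0, 23, 24, 47]])
```

Because transform returns a result aligned with the original index, every row of a day carries that day's total, which is why no manual min/max slicing is needed.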

Then reset the total to 0 for any day that does not have exactly 24 rows:

df.loc[(df.groupby(["Year","Month","Day"])["day_total"].transform("count").ne(24)),"day_total"] = 0
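
To see what this reset does, here is a small self-contained sketch on hypothetical data: one complete day of 24 hourly rows followed by a partial day of only 5 rows. The sample values and the helper names keys and rows_per_day are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: 24 rows for 2015-04-01 plus 5 rows for 2015-04-02.
stamps = pd.Timestamp("2015-04-01") + pd.to_timedelta(list(range(29)), unit="h")
df = pd.DataFrame({"Date2": stamps})
df["Year"] = df["Date2"].dt.year
df["Month"] = df["Date2"].dt.month
df["Day"] = df["Date2"].dt.day
df["Personfarlig_krim"] = "No"

# Arbitrary choice: two "Yes" rows on the full day, one on the partial day.
df.loc[[3, 10, 25], "Personfarlig_krim"] = "Yes"

keys = ["Year", "Month", "Day"]
df["day_total"] = df.groupby(keys)["Personfarlig_krim"].transform(
    lambda d: d.eq("Yes").sum()
)

# "count" gives the number of non-null rows in each day, repeated on every row of that day.
rows_per_day = df.groupby(keys)["day_total"].transform("count")

# Days without exactly 24 rows are treated as incomplete and reset to 0.
df.loc[rows_per_day.ne(24), "day_total"] = 0

print(df[["Date2", "day_total"]].iloc[[0, 23, 24, 28]])
```

Note that the condition is "not equal to 24", so a day with more than 24 rows (for example, duplicated hours) would also be reset, not only days with fewer.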

Solution 1
1 Accepted 2020-05-27 15:48:42