Masking and indexing a pandas DataFrame

I have a pandas dataframe of crime statistics, and I want to mask the data and count the total number of crime values in my dataset:

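import numpy as np

# row bounds of the current 24-hour window (named to avoid shadowing the built-ins min/max)
min_row = 0
max_row = 24

days = df[::24].count()['Year']
print(days)
df['daily_crime'] = np.nan

for i in range(days):
    # select the window first, then mask it, so the mask and the rows line up
    window = df.iloc[min_row:max_row]
    print(window[window['Personfarlig_krim'] == 'Yes'])
    max_row += 24
    min_row += 24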

Above I have placed a min and max counter to step through each block of 24 hourly rows in a day. I want to add an extra column to my dataframe that counts the number of Yes values in Personfarlig_krim for the last 24 hours. This value should be stored for every day in a separate column. I have tried both masking and slicing, and then assigning to a given row, but I have had no luck so far.

   Unnamed: 0  District  Neighbourhood.x  Year  Month  Day  Hour  Weekday    Sun  Personfarlig_krim                Date2
0           1         1        MANHATTAN  2015      4    1     0        4  False                 No  2015-04-01 00:00:00
1           2         1        MANHATTAN  2015      4    1     1        4  False                 No  2015-04-01 01:00:00
2           3         1        MANHATTAN  2015      4    1     2        4  False                 No  2015-04-01 02:00:00
3           4         1        MANHATTAN  2015      4    1     3        4  False                 No  2015-04-01 03:00:00
4           5         1        MANHATTAN  2015      4    1     4        4  False                 No  2015-04-01 04:00:00

Above is a sample of the data. It is supposed to get another column that stores the crime rate (the number of Yes values) for the last 24 hours (24 rows).

You can use groupby and transform:

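import pandas as pd

df["Date2"] = pd.to_datetime(df["Date2"])
# count the "Yes" rows per calendar day and broadcast that total to every row of the day
df["day_total"] = df.groupby(["Year", "Month", "Day"])["Personfarlig_krim"].transform(lambda d: d.eq("Yes").sum())
print(df)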

   District Neighbourhood.x  Year  Month  Day  Hour  Weekday    Sun Personfarlig_krim               Date2  day_total
0         1       MANHATTAN  2015      4    1     0        4  False                No 2015-04-01 00:00:00          0
1         1       MANHATTAN  2015      4    1     1        4  False                No 2015-04-01 01:00:00          0
2         1       MANHATTAN  2015      4    1     2        4  False                No 2015-04-01 02:00:00          0
3         1       MANHATTAN  2015      4    1     3        4  False                No 2015-04-01 03:00:00          0
4         1       MANHATTAN  2015      4    1     4        4  False                No 2015-04-01 04:00:00          0
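
Since the sample above contains only "No" rows, every day_total is 0. As a self-contained illustration of the same idea, here is a minimal sketch on a made-up two-day frame. The data and the positions of the "Yes" rows are invented for the example and are not taken from the question's dataset. Because transform returns a result aligned to the original rows, every hour of a day receives that day's total.

```python
import pandas as pd

# Hypothetical data: two full days of hourly rows, invented for illustration
hours = pd.date_range("2015-04-01", periods=48, freq="h")
df = pd.DataFrame({
    "Year": hours.year,
    "Month": hours.month,
    "Day": hours.day,
    "Hour": hours.hour,
    "Personfarlig_krim": ["No"] * 48,
})
# Mark three hours on day 1 and one hour on day 2 as "Yes"
df.loc[[2, 5, 7, 30], "Personfarlig_krim"] = "Yes"

# Boolean mask of the "Yes" rows, summed per calendar day and
# broadcast back to every row of that day
is_yes = df["Personfarlig_krim"].eq("Yes")
df["day_total"] = is_yes.groupby([df["Year"], df["Month"], df["Day"]]).transform("sum")

# First and last row of each day
print(df.loc[[0, 23, 24, 47], ["Day", "Hour", "day_total"]])
```

Summing a boolean mask counts its True values, which is why this variant needs no lambda.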

To set the value back to 0 for days that do not have exactly 24 rows (that is, incomplete days):

df.loc[(df.groupby(["Year","Month","Day"])["day_total"].transform("count").ne(24)),"day_total"] = 0
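
The reset step can be checked in isolation. The sketch below assumes a made-up frame with one full day followed by a partial day of 5 rows. All values are invented for illustration. It uses "size" instead of "count" for the per-day row count, which is a small substitution rather than the answer's exact call: "size" counts every row, while "count" skips missing values.

```python
import pandas as pd

# Hypothetical data: one full day (24 rows) followed by a partial day (5 rows)
hours = pd.date_range("2015-04-01", periods=29, freq="h")
df = pd.DataFrame({
    "Year": hours.year,
    "Month": hours.month,
    "Day": hours.day,
    "Personfarlig_krim": ["Yes"] * 29,
})
keys = [df["Year"], df["Month"], df["Day"]]

# Per-day number of "Yes" rows, aligned to the original rows
is_yes = df["Personfarlig_krim"].eq("Yes")
df["day_total"] = is_yes.groupby(keys).transform("sum")

# Number of rows in each day, aligned to the original rows
rows_per_day = df.groupby(["Year", "Month", "Day"])["day_total"].transform("size")

# Zero out the total for any day that does not have exactly 24 rows
df.loc[rows_per_day.ne(24), "day_total"] = 0

print(df.groupby(["Year", "Month", "Day"])["day_total"].first())
```

Here the full day keeps its total of 24, and the 5-row day is reset to 0.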
