Masking and indexing pandas dataframe
I have a pandas dataframe of crime statistics, and I want to mask it and count the total number of crime values in my dataset:
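import numpy as np
import pandas as pd

# df is the crime DataFrame shown below
min = 0
max = 24
days = df[::24].count()['Year']   # one sampled row per 24-row block
print(days)
df['daily_crime'] = np.nan        # np.nan is the canonical NaN constant
for i in range(days):
    #print(df.loc[df.index[24], 'daily_crime'])
    block = df.iloc[min:max]      # the current 24-row slice
    print(block[block['Personfarlig_krim'] == 'Yes'])
    max += 24
    min += 24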
Above I have placed a min and max counter for each 24-hour block of the day. I want to add an extra column to my dataframe that counts the number of Yes values in Personfarlig_krim for the last 24 hours. This count should be written to a separate column for every day. I have tried both masking and slicing, and then assigning to a given row, but I have had no luck so far.
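The min/max loop above walks the frame in fixed 24-row slices. Assuming the rows are strictly hourly, contiguous and start at midnight (so every 24 consecutive rows form one day), the same blocks can be labelled without a loop by integer-dividing the row position by 24. This is a sketch; the sample data below is hypothetical, and only the daily_crime and Personfarlig_krim names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 48 hourly rows (two full days) with three "Yes" values.
df = pd.DataFrame({"Personfarlig_krim": ["No"] * 48})
df.loc[[3, 10, 30], "Personfarlig_krim"] = "Yes"

# Label each row with its 24-row block: rows 0-23 -> 0, rows 24-47 -> 1, ...
block = np.arange(len(df)) // 24

# 1 for "Yes", 0 otherwise; sum per block and broadcast the count back to every row.
df["daily_crime"] = (
    df["Personfarlig_krim"].eq("Yes").astype(int).groupby(block).transform("sum")
)

print(df["daily_crime"].iloc[[0, 23, 24, 47]].tolist())  # [2, 2, 1, 1]
```

If some hours are missing from the data, the 24-row blocks drift away from calendar days, so grouping by a date key (for example Year, Month, Day) is safer.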
   Unnamed: 0  District  Neighbourhood.x  Year  Month  Day  Hour  Weekday    Sun  Personfarlig_krim                Date2
0           1         1        MANHATTAN  2015      4    1     0        4  False                 No  2015-04-01 00:00:00
1           2         1        MANHATTAN  2015      4    1     1        4  False                 No  2015-04-01 01:00:00
2           3         1        MANHATTAN  2015      4    1     2        4  False                 No  2015-04-01 02:00:00
3           4         1        MANHATTAN  2015      4    1     3        4  False                 No  2015-04-01 03:00:00
4           5         1        MANHATTAN  2015      4    1     4        4  False                 No  2015-04-01 04:00:00
Above I have tried to format the data. It is supposed to have another column that stores the crime count (the number of Yes values) for the last 24 hours, i.e. the last 24 rows.
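If "the last 24 hours" is meant as a sliding window ending at each row, rather than a calendar day, a rolling sum fits better than a per-day group. A sketch, assuming Date2 holds parseable timestamps sorted in ascending order; the last_24h column name and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample: 30 hourly rows with "Yes" at rows 0 and 25.
df = pd.DataFrame({
    "Date2": pd.date_range("2015-04-01", periods=30, freq="h"),
    "Personfarlig_krim": ["No"] * 30,
})
df.loc[[0, 25], "Personfarlig_krim"] = "Yes"

# 1 for a "Yes" row, 0 otherwise, indexed by timestamp so a time-based window works.
is_crime = df["Personfarlig_krim"].eq("Yes").astype(int)
is_crime.index = df["Date2"]

# "24h" window: each row sums the crimes in the interval (t - 24h, t].
df["last_24h"] = is_crime.rolling("24h").sum().to_numpy()
```

A row-based window, rolling(24, min_periods=1), gives the same numbers only when no hourly rows are missing; the time-based window does not depend on that.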
You can use groupby and transform:
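import pandas as pd

df["Date2"] = pd.to_datetime(df["Date2"])
# Count the "Yes" rows per calendar day and broadcast that total to every row of the day
df["day_total"] = df.groupby(["Year", "Month", "Day"])["Personfarlig_krim"].transform(lambda s: s.eq("Yes").sum())
print(df)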
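To check what the groupby/transform line produces, here is a sketch on hypothetical data covering one full day and the first six hours of the next. It also shows, as an alternative not taken from the answer, that grouping by the midnight-truncated Date2 (Series.dt.normalize) yields the same totals as grouping by Year, Month and Day.

```python
import pandas as pd

# Hypothetical sample: 30 hourly rows spanning 2015-04-01 and the start of 2015-04-02.
df = pd.DataFrame({"Date2": pd.date_range("2015-04-01", periods=30, freq="h")})
df["Year"] = df["Date2"].dt.year
df["Month"] = df["Date2"].dt.month
df["Day"] = df["Date2"].dt.day
df["Personfarlig_krim"] = "No"
df.loc[[2, 5, 26], "Personfarlig_krim"] = "Yes"

# Convert to 0/1 first so each group's "sum" is a plain integer count.
is_crime = df["Personfarlig_krim"].eq("Yes").astype(int)

# Same grouping as the answer (Year, Month, Day) ...
by_parts = is_crime.groupby([df["Year"], df["Month"], df["Day"]]).transform("sum")
# ... and the same days expressed through the midnight-truncated timestamp.
by_date = is_crime.groupby(df["Date2"].dt.normalize()).transform("sum")

df["day_total"] = by_parts
print(df["day_total"].iloc[[0, 23, 24, 29]].tolist())  # [2, 2, 1, 1]
```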
   District  Neighbourhood.x  Year  Month  Day  Hour  Weekday    Sun  Personfarlig_krim                Date2  day_total
0         1        MANHATTAN  2015      4    1     0        4  False                 No  2015-04-01 00:00:00          0
1         1        MANHATTAN  2015      4    1     1        4  False                 No  2015-04-01 01:00:00          0
2         1        MANHATTAN  2015      4    1     2        4  False                 No  2015-04-01 02:00:00          0
3         1        MANHATTAN  2015      4    1     3        4  False                 No  2015-04-01 03:00:00          0
4         1        MANHATTAN  2015      4    1     4        4  False                 No  2015-04-01 04:00:00          0
To set day_total back to 0 for days that do not have exactly 24 rows (incomplete days):
# Rows whose day does not contain exactly 24 rows get day_total = 0
df.loc[df.groupby(["Year", "Month", "Day"])["day_total"].transform("count").ne(24), "day_total"] = 0
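The same "zero out incomplete days" step can be written with Series.where, which keeps a value where the condition holds and substitutes 0 elsewhere. A sketch on hypothetical data: one full day plus a partial day of six rows, each with one "Yes".

```python
import pandas as pd

# Hypothetical sample: a full day (24 rows) and a partial day (6 rows).
df = pd.DataFrame({"Date2": pd.date_range("2015-04-01", periods=30, freq="h")})
df["Year"] = df["Date2"].dt.year
df["Month"] = df["Date2"].dt.month
df["Day"] = df["Date2"].dt.day
df["Personfarlig_krim"] = "No"
df.loc[[2, 26], "Personfarlig_krim"] = "Yes"

keys = ["Year", "Month", "Day"]
is_crime = df["Personfarlig_krim"].eq("Yes").astype(int)
df["day_total"] = is_crime.groupby([df[k] for k in keys]).transform("sum")

# Number of rows in each row's day, broadcast back to the rows.
rows_per_day = df.groupby(keys)["day_total"].transform("count")

# Keep the total only for complete days (24 hourly rows); otherwise use 0.
df["day_total"] = df["day_total"].where(rows_per_day.eq(24), 0)
```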