How do I count values in one dataframe based on the conditions in another dataframe
I have two dataframes. df1 shows annual rainfall over a certain area:
df1:
longitude latitude year
-13.0     8.0      1979    15.449341
                   1980    21.970507
                   1981    18.114307
                   1982    16.881737
                   1983    24.122467
                   1984    27.108953
                   1985    27.401234
                   1986    18.238272
                   1987    25.421076
                   1988    11.796293
                   1989    17.778618
                   1990    18.095036
                   1991    20.414757
and df2 shows the upper limits of each bin:
bin limits
0 16.655970
1 18.204842
2 19.526524
3 20.852657
4 22.336731
5 24.211905
6 27.143820
I'm trying to add a new column to df2 that shows the frequency of rainfall events from df1 in their corresponding bin. For example, in bin 1 I'd be looking for the values in df1 that fall between 16.65 and 18.2.
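A quick check of the bin-1 example, as a sketch: it rebuilds the df1 values as a plain Series (the name `tp1` is taken from the attempt in the question, and the year-only index is a simplification of the real MultiIndex) and uses `Series.between` with `inclusive="right"` (pandas 1.3 or later) to select the half-open interval (16.655970, 18.204842].

```python
import pandas as pd

# Annual rainfall from df1, 1979-1991 (index simplified to the year only)
rain = pd.Series(
    [15.449341, 21.970507, 18.114307, 16.881737, 24.122467, 27.108953,
     27.401234, 18.238272, 25.421076, 11.796293, 17.778618, 18.095036,
     20.414757],
    index=pd.Index(range(1979, 1992), name="year"),
    name="tp1",
)

# Bin 1 = above the bin-0 limit, up to and including its own limit
in_bin_1 = rain.between(16.655970, 18.204842, inclusive="right")
print(rain[in_bin_1])
print("count:", in_bin_1.sum())
```

Four years (1981, 1982, 1989 and 1990) fall into bin 1 with this convention.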
I've tried the following:
rain = df1['tp1']
for i in range(7):
    limit = df2.iloc[i]
    out4['count'] = rain[rain > limit].count()
However, I get the following message:
ValueError: Can only compare identically-labeled Series objects
I think this refers to the fact that I'm comparing two DataFrames of different sizes? I'm also not sure whether the loop itself is correct.
Any help is much appreciated, thanks!
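The error most likely comes from `df2.iloc[i]`, which returns a whole row of df2 as a Series rather than a single number, and pandas refuses to compare two Series whose labels differ. The loop also counts every value above a limit rather than within a bin, and overwrites the same column on each pass. A minimal sketch of a loop that addresses those three points, assuming df2 has the columns `bin` and `limits` and using sample data rebuilt from the printouts:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printouts; "tp1", "bin" and "limits"
# are assumed names for the real columns.
rain = pd.Series(
    [15.449341, 21.970507, 18.114307, 16.881737, 24.122467, 27.108953,
     27.401234, 18.238272, 25.421076, 11.796293, 17.778618, 18.095036,
     20.414757],
    index=pd.Index(range(1979, 1992), name="year"),
    name="tp1",
)
df2 = pd.DataFrame({
    "bin": range(7),
    "limits": [16.655970, 18.204842, 19.526524, 20.852657,
               22.336731, 24.211905, 27.143820],
})

# Use scalar limits, not whole rows, and count within (lower, upper]
counts = []
lower = -np.inf
for i in range(len(df2)):
    upper = df2["limits"].iloc[i]
    counts.append(int(((rain > lower) & (rain <= upper)).sum()))
    lower = upper  # this bin's upper limit is the next bin's lower limit

# Assign once, after all bins have been counted
df2["count"] = counts
print(df2)
```

The 1985 value (27.401234) lies above the last limit, so it is not counted in any bin.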
Use pd.cut to assign your rainfall values to bins:
import numpy as np
import pandas as pd

# Define the limits for your bins
# Bin 0: (-np.inf , 16.655970]
# Bin 1: (16.655970, 18.204842]
# Bin 2: (18.204842, 19.526524]
# ...
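# Note that your bins only go up to 27.14, while the maximum rainfall is 27.40 (row 6, year 1985).
# pd.cut returns NaN for values outside the limits, so that value is not counted;
# add or adjust a limit if it should be.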
limits = [-np.inf] + df2["limits"].to_list()
# Assign the rainfall to each bin ("rainfall" stands for your value column, e.g. 'tp1')
bins = pd.cut(df1["rainfall"], limits, labels=df2["bin"])
# Count how many values fall into each bin
bins.value_counts(sort=False).rename_axis("bin")
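To store the result as a new column of df2, as the question asks, here is a sketch along the same lines. The sample frames are rebuilt from the printouts, and the column names `rainfall`, `bin` and `limits` follow the answer and are assumptions about the real data.

```python
import numpy as np
import pandas as pd

# Sample df1 with the (longitude, latitude, year) MultiIndex shown in the question
index = pd.MultiIndex.from_product(
    [[-13.0], [8.0], range(1979, 1992)],
    names=["longitude", "latitude", "year"],
)
df1 = pd.DataFrame(
    {"rainfall": [15.449341, 21.970507, 18.114307, 16.881737, 24.122467,
                  27.108953, 27.401234, 18.238272, 25.421076, 11.796293,
                  17.778618, 18.095036, 20.414757]},
    index=index,
)
df2 = pd.DataFrame({
    "bin": range(7),
    "limits": [16.655970, 18.204842, 19.526524, 20.852657,
               22.336731, 24.211905, 27.143820],
})

limits = [-np.inf] + df2["limits"].to_list()
bins = pd.cut(df1["rainfall"], limits, labels=df2["bin"].to_list())

# For a categorical, value_counts(sort=False) lists every category in order,
# including empty ones, so the counts line up with the rows of df2
counts = bins.value_counts(sort=False)
df2["count"] = counts.to_numpy()

# Values above the last limit are NaN after pd.cut and are not counted
print(df2)
print("outside all bins:", int(bins.isna().sum()))
```

Relying on category order keeps the assignment short; if df2 were not sorted by `bin`, looking counts up by label instead of by position would be the safer choice.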