How to calculate values in one dataframe based on conditions in another dataframe

Question

I have two dataframes. df1 shows annual rainfall over a certain area:

df1:

longitude latitude year           
-13.0     8.0      1979  15.449341
                   1980  21.970507
                   1981  18.114307
                   1982  16.881737
                   1983  24.122467
                   1984  27.108953
                   1985  27.401234
                   1986  18.238272
                   1987  25.421076
                   1988  11.796293
                   1989  17.778618
                   1990  18.095036
                   1991  20.414757

and df2 shows the upper limits of each bin:

   bin limits
0   16.655970
1   18.204842
2   19.526524
3   20.852657
4   22.336731
5   24.211905
6   27.143820

I'm trying to add a new column to df2 that shows the frequency of rainfall events from df1 in their corresponding bin. For example, in bin 1 I'd be looking for the values in df1 that fall between 16.65 and 18.2.

I've tried the following:

rain = df1['tp1']
for i in range 7:
    limit = df2.iloc[i]
    out4['count']=rain[rain>limit].count()

However, I get the following message:

ValueError: Can only compare identically-labeled Series objects

I think this refers to the fact that I'm comparing two dataframes of different sizes? I'm also unsure whether that loop is correct.

Any help is much appreciated, thanks!

Answer 1

Use pd.cut to assign your rainfall into bins:

import numpy as np
import pandas as pd

# Define the limits for your bins
# Bin 0: (-np.inf  , 16.655970]
# Bin 1: (16.655970, 18.204842]
# Bin 2: (18.204842, 19.526524]
# ...
# Note that your bins only go up to 27.14 while the max rainfall is 27.4 (year 1985).
# You may need to add / adjust your limits.
limits = [-np.inf] + df2["limits"].to_list()

# Assign the rainfall to each bin.
# "rainfall", "limits" and "bin" are assumed column names; adjust them to your data
# (the question's code calls the rainfall column "tp1").
bins = pd.cut(df1["rainfall"], limits, labels=df2["bin"])

# Count how many values fall into each bin
bins.value_counts(sort=False).rename_axis("bin")
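
To make the approach above concrete, here is a minimal, self-contained sketch that rebuilds the data from the question and writes the counts back into df2. Several details are assumptions rather than facts from the post: the rainfall Series is named "tp1" (borrowed from the question's code), the year index is flattened for brevity, df2 is assumed to have a single column called "limits", and the bin labels are simply df2's integer index. The loop at the end is one possible corrected version of the question's attempt, comparing against scalar limits instead of a whole row of df2, and is only there as a cross-check.

```python
import numpy as np
import pandas as pd

# Rainfall values copied from the question. The name "tp1" comes from the
# question's code; the flat year index is a simplification (assumption).
rain = pd.Series(
    [15.449341, 21.970507, 18.114307, 16.881737, 24.122467, 27.108953,
     27.401234, 18.238272, 25.421076, 11.796293, 17.778618, 18.095036,
     20.414757],
    index=pd.RangeIndex(1979, 1992, name="year"),
    name="tp1",
)

# Upper bin limits copied from the question; the column name is assumed.
df2 = pd.DataFrame({"limits": [16.655970, 18.204842, 19.526524, 20.852657,
                               22.336731, 24.211905, 27.143820]})

# Bin i covers (previous limit, limit i]; bin 0 starts at -inf.
edges = [-np.inf] + df2["limits"].to_list()
binned = pd.cut(rain, bins=edges, labels=df2.index.to_list())

# Count per bin, put the counts in bin order, and store them positionally.
df2["count"] = binned.value_counts().sort_index().to_numpy()

# Values above the last limit belong to no bin and come back as NaN.
unbinned = int(binned.isna().sum())

# Cross-check: a corrected form of the question's loop, using scalar limits.
loop_counts = []
lower = -np.inf
for upper in df2["limits"]:
    loop_counts.append(int(((rain > lower) & (rain <= upper)).sum()))
    lower = upper

print(df2)
print("values above the last limit:", unbinned)
```

With the question's data this gives counts of 2, 4, 1, 1, 1, 1 and 2 for bins 0 to 6, and one value (27.401234) that lies above the last limit and is therefore left out. Appending np.inf to the edges, together with one extra label, would be one way to capture such values if they should be counted.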

Answer 1
0 2022-10-06 18:26:38