[英]Python, pandas group one dataframe by the results from another of different size
I have two dataframes, df_DD carries all my data, and df_GS carries the ranges of data that I want to break df_DD into.我有两个数据帧,df_DD 包含我所有的数据,df_GS 包含我想将 df_DD 分成的数据范围。 df_GS is much shorter than df_DD, I want to group all the df_DD by df_GS for when the ranges are equated.
df_GS 比 df_DD 短得多,我想在范围相等时按 df_GS 对所有 df_DD 进行分组。
Small range of df_GS df_GS 的小范围
From To DHID
0 69.0 88.5 CR22-200
1 88.5 90.0 CR22-200
2 90.0 99.0 CR22-200
3 99.0 100.5 CR22-200
4 100.5 112.5 CR22-200
5 112.5 114.0 CR22-200
6 114.0 165.0 CR22-200
for i in range(len(df_GS)):
df_DD['Samples'].loc[(df_DD[From] >= df_GS[From].iloc[i]) & (df_DD[To] <= df_GS[To].iloc[i]) & (df_DD[DHID]==df_GS[DHID].iloc[i])] = i+1
Here is an output of df_DD这是 df_DD 的 output
Samples From To DHID
0 1 69.0 70.5 CR22-200
1 1 70.5 72.0 CR22-200
2 1 72.0 73.5 CR22-200
3 1 73.5 75.0 CR22-200
4 1 75.0 76.5 CR22-200
5 1 76.5 78.0 CR22-200
6 1 78.0 79.5 CR22-200
7 1 79.5 81.0 CR22-200
8 1 81.0 82.5 CR22-200
9 1 82.5 84.0 CR22-200
10 1 84.0 85.5 CR22-200
11 1 85.5 87.0 CR22-200
12 1 87.0 88.5 CR22-200
13 2 88.5 90.0 CR22-200
14 3 90.0 91.5 CR22-200
15 3 91.5 93.0 CR22-200
The code above does what I want it to by creating a new column named Samples giving values a sample index, after which I can use the groupby function. But I wanted to know if there was a better way to do this cause it's quite cumbersome.上面的代码通过创建一个名为 Samples 的新列来完成我想要的,为值提供一个示例索引,之后我可以使用 groupby function。但我想知道是否有更好的方法来执行此操作,因为它非常麻烦。
With merge_asof you can merge on the closest key instead of an exact key.使用merge_asof您可以合并最近的键而不是精确的键。
df = pd.DataFrame({"from": [69.0, 88.5, 90.0], "DHID":['CR22-1', 'CR22-2','CR22-3']})
df['samples'] = df.index + 1
df_DD = pd.DataFrame({"from": [69.0, 75.1, 86.7, 88.5, 90.0]})
result = pd.merge_asof(df_DD, df, on='from')
The default direction is backward
, so it merges df
's row where from
is the closest value that is less or equal to the df_DD
's from
column.默认方向是
backward
,所以它合并df
的行,其中from
是小于或等于df_DD
的from
列的最接近值。
This code gives you the result:此代码为您提供结果:
from DHID samples
0 69.0 CR22-1 1
1 75.1 CR22-1 1
2 86.7 CR22-1 1
3 88.5 CR22-2 2
4 90.0 CR22-3 3
You can keep the original to
column or add a new one and also you can set the column order you want at the end like this:您可以保留原始
to
或添加新列,也可以在末尾设置所需的列顺序,如下所示:
result['to'] = result["from"].shift(-1)
result = result[['samples', 'from', 'to', 'DHID']]
That gives you:这给你:
samples from to DHID
0 1 69.0 75.1 CR22-1
1 1 75.1 86.7 CR22-1
2 1 86.7 88.5 CR22-1
3 2 88.5 90.0 CR22-2
4 3 90.0 NaN CR22-3
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.