I have 2 pandas dataframes:
df1:
ksat  muacres  SAND  SILT  CLAY
0        5326     0     0     0
0.1      4346     0     0     0
0.4      4146     0     0     0
0.8      3476     0     0     0
1.2      2006     0     0     0
and df2:
   PERCENTILE      ksat      b  theta
0           1  0.370684  11.55   46.8
1           2  0.558053  11.55   46.8
2           3  0.794836  10.39   46.8
3           4  0.962329  11.55   46.8
4           5  1.202368  10.39   46.8
I want to add a column 'st' to df1: for each row of df1, find the smallest ksat value in df2 that is greater than or equal to that row's ksat, and store the corresponding PERCENTILE. For this example, the result would be:
df1:
ksat  muacres  SAND  SILT  CLAY  st
0        5326     0     0     0   1
0.1      4346     0     0     0   1
0.4      4146     0     0     0   2
0.8      3476     0     0     0   4
1.2      2006     0     0     0   5
Currently I am using a nested loop, but that is very inefficient. Is there a better way in pandas?
Thanks!
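To make the rule concrete, here is a minimal sketch that rebuilds the two frames from the tables above and applies the lookup one row at a time. The helper `lookup_st_naive` is hypothetical, and returning None when no df2 ksat is large enough is an assumption the question does not cover. It still scans df2 once per df1 row, so it is only a reference for checking the faster versions against.

```python
import pandas as pd

# df1 and df2 rebuilt from the example tables in the question
df1 = pd.DataFrame({
    "ksat": [0.0, 0.1, 0.4, 0.8, 1.2],
    "muacres": [5326, 4346, 4146, 3476, 2006],
    "SAND": [0] * 5,
    "SILT": [0] * 5,
    "CLAY": [0] * 5,
})
df2 = pd.DataFrame({
    "PERCENTILE": [1, 2, 3, 4, 5],
    "ksat": [0.370684, 0.558053, 0.794836, 0.962329, 1.202368],
    "b": [11.55, 11.55, 10.39, 11.55, 10.39],
    "theta": [46.8] * 5,
})


def lookup_st_naive(value, table):
    """Hypothetical helper: PERCENTILE of the smallest table.ksat >= value."""
    candidates = table[table["ksat"] >= value]
    if candidates.empty:
        return None  # assumption: no df2 ksat is large enough
    return candidates.loc[candidates["ksat"].idxmin(), "PERCENTILE"]


df1["st"] = [lookup_st_naive(v, df2) for v in df1["ksat"]]
print(df1["st"].tolist())  # [1, 1, 2, 4, 5]
```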
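One way is to merge twice. First, outer-merge df1's ksat column with df2's ksat and PERCENTILE columns, sorted by ksat, so that every df1 value is followed by the next-larger df2 value and its PERCENTILE can be back-filled: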
In [11]: merged = df1[['ksat']].merge(df2[['ksat', 'PERCENTILE']], how='outer', sort=True)
In [12]: merged
Out[12]:
       ksat  PERCENTILE
0  0.000000         NaN
1  0.100000         NaN
2  0.370684           1
3  0.400000         NaN
4  0.558053           2
5  0.794836           3
6  0.800000         NaN
7  0.962329           4
8  1.200000         NaN
9  1.202368           5
In [13]: merged.bfill()
Out[13]:
       ksat  PERCENTILE
0  0.000000           1
1  0.100000           1
2  0.370684           1
3  0.400000           2
4  0.558053           2
5  0.794836           3
6  0.800000           4
7  0.962329           4
8  1.200000           5
9  1.202368           5
and then merge df1 with this result (the two frames share the ksat column, so that is the join key):
In [14]: df1.merge(merged.bfill())
Out[14]:
   ksat  muacres  SAND  SILT  CLAY  PERCENTILE
0   0.0     5326     0     0     0           1
1   0.1     4346     0     0     0           1
2   0.4     4146     0     0     0           2
3   0.8     3476     0     0     0           4
4   1.2     2006     0     0     0           5
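The two merges can be put together as one self-contained sketch. It follows the steps above with two assumed additions that are not in the answer: df1's ksat values are deduplicated before the first merge (otherwise repeated ksat values in df1 would multiply rows in the second merge), and PERCENTILE is renamed to st as the question asks. Trimming the frames to a few columns is only for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({"ksat": [0.0, 0.1, 0.4, 0.8, 1.2],
                    "muacres": [5326, 4346, 4146, 3476, 2006]})
df2 = pd.DataFrame({"PERCENTILE": [1, 2, 3, 4, 5],
                    "ksat": [0.370684, 0.558053, 0.794836, 0.962329, 1.202368]})

# Step 1: interleave both sets of ksat values in sorted order.
# drop_duplicates() is an added safeguard against repeated df1 keys.
merged = df1[["ksat"]].drop_duplicates().merge(
    df2[["ksat", "PERCENTILE"]], how="outer", sort=True)

# Step 2: each df1 value takes the PERCENTILE of the next df2 row after it.
merged = merged.bfill()

# Step 3: join back to df1 on ksat and use the column name from the question.
result = df1.merge(merged, on="ksat").rename(columns={"PERCENTILE": "st"})
print(result)
```

The st column comes out as float because the outer merge introduces NaN before the back-fill, and any df1 ksat larger than the largest df2 ksat is left as NaN.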
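You can try numpy.searchsorted. With side='left' it returns, for each df1 ksat, the position of the first df2 ksat that is greater than or equal to it (df2.ksat must be sorted in ascending order). Because PERCENTILE here is just that position plus one:
import numpy as np
df1['st'] = np.searchsorted(df2.ksat, df1.ksat, side='left') + 1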
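A self-contained sketch of the searchsorted approach on the example values, which also shows what side controls when a df1 ksat exactly equals a df2 ksat. Trimming the frames to the relevant columns is only for brevity.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ksat": [0.0, 0.1, 0.4, 0.8, 1.2]})
df2 = pd.DataFrame({"PERCENTILE": [1, 2, 3, 4, 5],
                    "ksat": [0.370684, 0.558053, 0.794836, 0.962329, 1.202368]})

a = df2["ksat"].to_numpy()  # must be sorted ascending for searchsorted

# side="left": for each value v, the first position i with a[i] >= v
pos = np.searchsorted(a, df1["ksat"].to_numpy(), side="left")
print(pos)  # [0 0 1 3 4]

# PERCENTILE equals position + 1 in this example, so this matches the answer
df1["st"] = pos + 1

# Ties: a value equal to an entry of a maps to that entry with side="left",
# and to the next entry with side="right" (i.e. "strictly greater than")
print(np.searchsorted(a, 0.558053, side="left"),
      np.searchsorted(a, 0.558053, side="right"))  # 1 2
```

A df1 ksat above the largest df2 ksat gets position len(df2), so with the + 1 it would silently receive a percentile that does not exist, and indexing PERCENTILE with that position would fail. Checking `(pos < len(df2)).all()` first guards against this.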
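If the PERCENTILE values are not simply the row position plus one, there is an extra step: use the positions returned by searchsorted to pick PERCENTILE positionally:
idx = np.searchsorted(df2.ksat, df1.ksat, side='left')
df1['st'] = df2['PERCENTILE'].to_numpy()[idx]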
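A related option, swapped in here and not part of either answer above, is pandas.merge_asof with direction='forward': it matches each df1 row to the first df2 row whose ksat is greater than or equal to it and carries that row's PERCENTILE across, so ordinal percentiles are not required. A minimal sketch, assuming both frames can be sorted on ksat; df1 is shuffled and given one hypothetical extra ksat of 1.5 (not in the question's data) to show the reordering and the no-match case.

```python
import pandas as pd

# df1 rows deliberately out of order; 1.5 is a hypothetical value larger
# than every ksat in df2.
df1 = pd.DataFrame({"ksat": [0.4, 0.0, 1.2, 0.8, 0.1, 1.5]})
df2 = pd.DataFrame({"PERCENTILE": [1, 2, 3, 4, 5],
                    "ksat": [0.370684, 0.558053, 0.794836, 0.962329, 1.202368]})

# merge_asof needs both frames sorted on the key; reset_index keeps the
# original row labels in an "index" column so df1's order can be restored.
left = df1.reset_index().sort_values("ksat")
right = df2[["ksat", "PERCENTILE"]].sort_values("ksat")

# direction="forward": take the first right row whose ksat >= left ksat
# (exact matches are allowed by default).
matched = pd.merge_asof(left, right, on="ksat", direction="forward")

# Align the matches back to df1's original row labels.
df1["st"] = matched.set_index("index")["PERCENTILE"]
print(df1)
```

Rows with no larger df2 ksat get NaN, which also makes st a float column.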