Get values from between two other values for each row in the dataframe
I want to extract the integer values between the From and To values (inclusive) for each Hole_ID, and save them to a new data frame with the Hole IDs as the column headers.
import pandas as pd
import numpy as np
df=pd.DataFrame(np.array([['Hole_1',110,117],['Hole_2',220,225],['Hole_3',112,114],['Hole_4',248,252],['Hole_5',116,120],['Hole_6',39,45],['Hole_7',65,72],['Hole_8',79,83]]),columns=['HOLE_ID','FROM', 'TO'])
Example starting data:
HOLE_ID FROM TO
0 Hole_1 110 117
1 Hole_2 220 225
2 Hole_3 112 114
3 Hole_4 248 252
4 Hole_5 116 120
5 Hole_6 39 45
6 Hole_7 65 72
7 Hole_8 79 83
This is what I would like:
Out[5]:
   Hole_1  Hole_2  Hole_3  Hole_4  Hole_5  Hole_6  Hole_7  Hole_8
0     110     220     112     248     116      39      65      79
1     111     221     113     249     117      40      66      80
2     112     222     114     250     118      41      67      81
3     113     223     NaN     251     119      42      68      82
4     114     224     NaN     252     120      43      69      83
5     115     225     NaN     NaN     NaN      44      70     NaN
6     116     NaN     NaN     NaN     NaN      45      71     NaN
7     117     NaN     NaN     NaN     NaN     NaN      72     NaN
I have tried using the range function, which works if I define the range manually:
df2 = pd.DataFrame()
for i in df['HOLE_ID']:
    df2[i] = range(1, 10)
gives:
Hole_1 Hole_2 Hole_3 Hole_4 Hole_5 Hole_6 Hole_7 Hole_8
0 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5
5 6 6 6 6 6 6 6 6
6 7 7 7 7 7 7 7 7
7 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9
but I can't get it to take the To and From values from df as the inputs to range:
df2 = pd.DataFrame()
for i in df['HOLE_ID']:
    df2[i] = range(df['To'], df['From'])
gives an error.
Apply a function that returns, for each row, a Series holding the range from FROM to TO (inclusive), and then transpose the result, e.g.:
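import numpy as np
# FROM and TO must be numeric; the sample frame in the question stores them as strings
df[['FROM', 'TO']] = df[['FROM', 'TO']].astype(int)
df.set_index('HOLE_ID').apply(lambda v: pd.Series(np.arange(v['FROM'], v['TO'] + 1)), axis=1).T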
Gives you:
HOLE_ID Hole_1 Hole_2 Hole_3 Hole_4 Hole_5 Hole_6 Hole_7 Hole_8
0 110.0 220.0 112.0 248.0 116.0 39.0 65.0 79.0
1 111.0 221.0 113.0 249.0 117.0 40.0 66.0 80.0
2 112.0 222.0 114.0 250.0 118.0 41.0 67.0 81.0
3 113.0 223.0 NaN 251.0 119.0 42.0 68.0 82.0
4 114.0 224.0 NaN 252.0 120.0 43.0 69.0 83.0
5 115.0 225.0 NaN NaN NaN 44.0 70.0 NaN
6 116.0 NaN NaN NaN NaN 45.0 71.0 NaN
7 117.0 NaN NaN NaN NaN NaN 72.0 NaN
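The values above come out as floats because NaN forces a float column. If integers are preferred, one option is pandas' nullable "Int64" dtype. The following is only a sketch under that assumption: it uses a shortened three-hole frame, and the names wide and wide_int are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

# Shortened version of the question's data, with FROM/TO already numeric.
df = pd.DataFrame({
    "HOLE_ID": ["Hole_1", "Hole_2", "Hole_3"],
    "FROM": [110, 220, 112],
    "TO": [117, 225, 114],
})

# Same approach as above: one Series per row, then transpose.
wide = (
    df.set_index("HOLE_ID")
      .apply(lambda v: pd.Series(np.arange(v["FROM"], v["TO"] + 1)), axis=1)
      .T
)

# Cast the float columns to the nullable integer dtype; NaN becomes <NA>.
wide_int = wide.astype("Int64")
print(wide_int)
```

Missing positions are then shown as <NA> rather than NaN.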
Let's try:
df[['FROM','TO']] = df[['FROM', 'TO']].apply(pd.to_numeric)
dfe = df.set_index('HOLE_ID').apply(lambda x: np.arange(x['FROM'], x['TO']+1), axis=1).explode().to_frame()
dfe.set_index(dfe.groupby(level=0).cumcount(), append=True)[0].unstack(0)
Output:
HOLE_ID Hole_1 Hole_2 Hole_3 Hole_4 Hole_5 Hole_6 Hole_7 Hole_8
0 110 220 112 248 116 39 65 79
1 111 221 113 249 117 40 66 80
2 112 222 114 250 118 41 67 81
3 113 223 NaN 251 119 42 68 82
4 114 224 NaN 252 120 43 69 83
5 115 225 NaN NaN NaN 44 70 NaN
6 116 NaN NaN NaN NaN 45 71 NaN
7 117 NaN NaN NaN NaN NaN 72 NaN
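The key step in this approach is groupby(level=0).cumcount(), which numbers the exploded values within each hole so that unstack can line them up row by row. A small sketch of just that step, using a hand-written Series (the names s, pos and wide are illustrative only):

```python
import pandas as pd

# What the exploded Series roughly looks like: one value per row, indexed by hole.
s = pd.Series(
    [110, 111, 112, 220, 221],
    index=["Hole_1", "Hole_1", "Hole_1", "Hole_2", "Hole_2"],
)

# Position of each value within its hole: 0, 1, 2 for Hole_1 and 0, 1 for Hole_2.
pos = s.groupby(level=0).cumcount()

# Add the position as a second index level, then move the hole level to the columns.
wide = s.to_frame("value").set_index(pos, append=True)["value"].unstack(0)
print(wide)
```

Holes with fewer values than the longest one get NaN in the remaining rows.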
Here is another way that builds a range from the two columns and constructs a DataFrame from the result:
out = (pd.DataFrame(df[['FROM', 'TO']].astype(int).agg(tuple, 1)
       .map(lambda x: list(range(x[0], x[1] + 1))).tolist(), index=df['HOLE_ID']).T)
HOLE_ID Hole_1 Hole_2 Hole_3 Hole_4 Hole_5 Hole_6 Hole_7 Hole_8
0 110.0 220.0 112.0 248.0 116.0 39.0 65.0 79.0
1 111.0 221.0 113.0 249.0 117.0 40.0 66.0 80.0
2 112.0 222.0 114.0 250.0 118.0 41.0 67.0 81.0
3 113.0 223.0 NaN 251.0 119.0 42.0 68.0 82.0
4 114.0 224.0 NaN 252.0 120.0 43.0 69.0 83.0
5 115.0 225.0 NaN NaN NaN 44.0 70.0 NaN
6 116.0 NaN NaN NaN NaN 45.0 71.0 NaN
7 117.0 NaN NaN NaN NaN NaN 72.0 NaN
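For comparison with the loop attempted in the question: range needs two plain integers, not whole columns, so the rows have to be walked one at a time. A minimal sketch of that idea follows, assuming a shortened three-hole frame; the names columns and df2 are illustrative. Building a dict of Series and passing it to pd.DataFrame lets pandas pad the shorter columns with NaN.

```python
import pandas as pd

# Shortened version of the question's data.
df = pd.DataFrame({
    "HOLE_ID": ["Hole_1", "Hole_2", "Hole_3"],
    "FROM": [110, 220, 112],
    "TO": [117, 225, 114],
})

# One Series per hole; int() also covers the case where FROM/TO are stored as strings.
columns = {
    hole: pd.Series(range(int(start), int(stop) + 1))
    for hole, start, stop in zip(df["HOLE_ID"], df["FROM"], df["TO"])
}

# Series of different lengths are aligned on their index and padded with NaN.
df2 = pd.DataFrame(columns)
print(df2)
```

This avoids apply altogether, at the cost of an explicit Python-level loop over the rows.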