How to replace column value with range in pandas dataframe
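I have a dataframe named 'df', and I want to map the values in one column that fall within a given range to a corresponding value in another column.
6 <= age < 11 then 1
11 <= age < 16 then 2
16 <= age < 21 then 3
21 <= age then 4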
age
86508 12.0
86509 6.0
86510 7.0
86511 8.0
86512 10.0
86513 15.0
86514 15.0
86515 16.0
86516 20.0
86517 23.0
86518 23.0
86519 7.0
86520 18.0
The result should be:
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 2
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
Thanks.
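For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame from the values listed above. The variable name `ages` is only for illustration, and the index is assumed to be the consecutive integers shown in the listing.

```python
import pandas as pd

# Ages copied from the question, in the order they are listed
ages = [12.0, 6.0, 7.0, 8.0, 10.0, 15.0, 15.0, 16.0, 20.0, 23.0, 23.0, 7.0, 18.0]

# Index labels 86508..86520, as shown in the question
df = pd.DataFrame({"age": ages}, index=range(86508, 86521))
print(df)
```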
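Use pd.cut() (its bins are right-closed by default, so these edges mean (0, 11], (11, 16], (16, 21], (21, 300]):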
In [37]: df['stage'] = pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4])
In [38]: df
Out[38]:
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 2
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
In [39]: df['stage'] = pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1
In [40]: df
Out[40]:
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 2
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
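Because pd.cut() closes bins on the right by default, 16.0 lands in stage 2 in the output above. That matches the expected result in the question, but not the stated rule 16 <= age < 21. If the half-open rules as written are what you want, a sketch along these lines should work. It assumes the lowest edge is 6 rather than 0, so ages below 6 come out as NaN, and it uses a small made-up set of ages chosen to hit the boundaries.

```python
import numpy as np
import pandas as pd

# Illustrative ages, including the boundary values 11, 16 and 21
df = pd.DataFrame({"age": [12.0, 6.0, 10.0, 11.0, 15.0, 16.0, 20.0, 21.0, 23.0]})

# right=False makes each bin left-closed: [6, 11), [11, 16), [16, 21), [21, inf)
df["stage"] = pd.cut(
    df["age"],
    bins=[6, 11, 16, 21, np.inf],
    labels=[1, 2, 3, 4],
    right=False,
)
print(df)
```

The resulting column is categorical; call `.astype(int)` on it if plain integers are needed.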
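Use np.searchsorted (with side='right' the bins are left-closed, so 16.0 maps to stage 3, as the stated rules require):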
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 3
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
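To see how the `side` argument decides which bin a boundary value falls into, here is a small sketch on a few hand-picked ages. The names `edges`, `right` and `left` are illustrative and not part of the answer.

```python
import numpy as np

edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])
ages = np.array([5.0, 6.0, 10.9, 11.0, 16.0, 21.0])

# side='right': a value equal to an edge is inserted after it,
# so each bin is left-closed: [edge_i, edge_{i+1})
right = edges.searchsorted(ages, side="right") - 1

# side='left': a value equal to an edge is inserted before it,
# so each bin is right-closed: (edge_i, edge_{i+1}]
left = edges.searchsorted(ages, side="left") - 1

print(right)  # [0 1 1 2 3 4]
print(left)   # [0 0 1 1 2 3]
```

Ages below 6 end up in stage 0 with these edges, because the first bin starts at -inf.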
Timing
Small data
%%timeit
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
1000 loops, best of 3: 288 µs per loop
%%timeit
df.assign(stage=pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4]))
1000 loops, best of 3: 668 µs per loop
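The timings above are for the 13-row frame only. To check how the two approaches compare on more rows, a sketch using the standard `timeit` module could look like this. The frame `big`, its size, the seed and the helper names are all assumptions for illustration, and no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 random ages in [6, 30)
rng = np.random.default_rng(0)
big = pd.DataFrame({"age": rng.uniform(6, 30, size=100_000)})

edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])


def with_searchsorted():
    # Left-closed bins via searchsorted, as in the answer
    return big.assign(stage=edges.searchsorted(big["age"], side="right") - 1)


def with_cut():
    # Same left-closed bins via pd.cut (right=False)
    return big.assign(
        stage=pd.cut(big["age"], bins=[6, 11, 16, 21, np.inf],
                     labels=[1, 2, 3, 4], right=False)
    )


for fn in (with_searchsorted, with_cut):
    runs = 20
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```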