I'm using the Titanic dataset and have created a series Famsize. I'd like to create a second series that outputs 'single' if Famsize == 1, 'small' if 1 < Famsize < 5, and 'large' if Famsize >= 5.
Famsize FamsizeDisc
1 single
2 small
5 large
I've tried using np.where but as I have three outputs I haven't been able to find a solution.
Any suggestions?
This is called binning, so use pd.cut. For example:
df['new'] = pd.cut(df['Famsize'],bins=[0,1,4,np.inf],labels=['single','small','large'])
Output:
   Famsize FamsizeDisc     new
0        1      single  single
1        2       small   small
2        5       large   large
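For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample DataFrame is made up for illustration and only stands in for the real Famsize column. pd.cut bins are right-inclusive by default, so the edges [0, 1, 4, inf] give (0, 1] for 'single', (1, 4] for 'small' and (4, inf] for 'large', which matches the requested rule as long as family sizes are whole numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Titanic Famsize column.
df = pd.DataFrame({"Famsize": [1, 2, 4, 5, 8]})

# Bins are right-inclusive by default: (0, 1], (1, 4], (4, inf].
df["FamsizeDisc"] = pd.cut(
    df["Famsize"],
    bins=[0, 1, 4, np.inf],
    labels=["single", "small", "large"],
)

print(df)
```

Note that the result is a categorical column; call .astype(str) on it if plain strings are needed.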
You could either create a function that does the mapping:
def get_sizeDisc(x):
    if x == 1:
        return 'single'
    elif x < 5:
        return 'small'
    elif x >= 5:
        return 'large'
df['FamsizeDisc'] = df.Famsize.apply(get_sizeDisc)
Or you could use .loc
df.loc[df.Famsize==1, 'FamsizeDisc'] = 'single'
# pandas >= 1.3 expects a string for inclusive; older versions used inclusive=False
df.loc[df.Famsize.between(1, 5, inclusive='neither'), 'FamsizeDisc'] = 'small'
df.loc[df.Famsize>=5, 'FamsizeDisc'] = 'large'
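Since the question mentions np.where, one more option that stays close to it is np.select, which takes a list of conditions and a matching list of outputs. This is only a sketch on made-up sample data; the 'unknown' default is an assumption for values that match none of the conditions (for example 0 or NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Titanic Famsize column.
df = pd.DataFrame({"Famsize": [1, 2, 4, 5, 8]})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    df["Famsize"] == 1,
    (df["Famsize"] > 1) & (df["Famsize"] < 5),
    df["Famsize"] >= 5,
]
choices = ["single", "small", "large"]

# 'unknown' is an assumed fallback label for rows matching no condition.
df["FamsizeDisc"] = np.select(conditions, choices, default="unknown")

print(df)
```

Compared with chaining np.where calls, this keeps each condition next to its label, so adding a fourth category only means adding one entry to each list.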