How to compare two columns in pandas to make a third column?
I have two columns, age and sex, in a pandas DataFrame:
sex = ['m', 'f' , 'm', 'f', 'f', 'f', 'f']
age = [16 , 15 , 14 , 9 , 8 , 2 , 56 ]
Now I want to derive a third column like this: if age <= 9, output 'child', and if age > 9, output the respective sex:
sex = ['m', 'f' , 'm','f' ,'f' ,'f' , 'f']
age = [16 , 15 , 14 , 9 , 8 , 2 , 56 ]
yes = ['m', 'f' ,'m' ,'child','child','child','f' ]
Please help. P.S. I am still working on it; if I get anything I will update immediately.
Use numpy.where:
df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])
The resulting output:
   age  sex   col3
0   16    m      m
1   15    f      f
2   14    m      m
3    9    f  child
4    8    f  child
5    2    f  child
6   56    f      f
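For reference, a self-contained version of the np.where approach using the question's data might look like the sketch below. The column name col3 comes from the answer; the imports and the DataFrame construction are filled in as assumptions about the setup.

```python
import numpy as np
import pandas as pd

# Data taken from the question
df = pd.DataFrame({
    'age': [16, 15, 14, 9, 8, 2, 56],
    'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
})

# Where age <= 9 take the literal 'child'; elsewhere take the value from 'sex'
df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])
print(df)
```

np.where evaluates the condition for the whole column at once, which is why it avoids a Python-level loop over rows.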
Timings
Using the following setup to get a larger sample DataFrame:
np.random.seed([3,1415])
n = 10**5
df = pd.DataFrame({'sex': np.random.choice(['m', 'f'], size=n), 'age': np.random.randint(0, 100, size=n)})
I get the following timings:
%timeit np.where(df['age'] <= 9, 'child', df['sex'])
1000 loops, best of 3: 1.26 ms per loop
%timeit df['sex'].where(df['age'] > 9, 'child')
100 loops, best of 3: 3.25 ms per loop
%timeit df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
100 loops, best of 3: 3.92 ms per loop
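Before comparing speed, it can be worth confirming that the three timed expressions produce the same values. The following is a small sketch of such a check; the smaller n, the integer seed, and the variable names are assumptions made for illustration and are not part of the original setup.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: any fixed seed is fine for this check
n = 1000           # assumption: smaller than the timing setup, to keep it quick
df = pd.DataFrame({
    'sex': np.random.choice(['m', 'f'], size=n),
    'age': np.random.randint(0, 100, size=n),
})

# The three expressions from the timings, unchanged apart from being stored
via_np_where = pd.Series(np.where(df['age'] <= 9, 'child', df['sex']), index=df.index)
via_series_where = df['sex'].where(df['age'] > 9, 'child')
via_apply = df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)

# All three should agree element by element
same = via_np_where.tolist() == via_series_where.tolist() == via_apply.tolist()
print(same)
```

Actual timings depend on the machine and on the pandas and NumPy versions, so the figures above are best read as relative rather than absolute.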
You could use pandas.DataFrame.where (or, for a single column, pandas.Series.where). For example:
df['yes'] = df['sex'].where(df['age'] > 9, 'child')
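A runnable sketch of the where approach with the question's data follows. Note that where keeps the original value wherever the condition is True and substitutes the replacement elsewhere, so the condition has to be age > 9 rather than age <= 9. Series.mask is the mirror image. The column names yes and yes_mask are illustrative.

```python
import pandas as pd

# Data taken from the question
df = pd.DataFrame({
    'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
    'age': [16, 15, 14, 9, 8, 2, 56],
})

# where: keep 'sex' where age > 9, otherwise use 'child'
df['yes'] = df['sex'].where(df['age'] > 9, 'child')

# mask: replace 'sex' with 'child' where age <= 9, keep it otherwise
df['yes_mask'] = df['sex'].mask(df['age'] <= 9, 'child')
print(df)
```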
import pandas as pd

df = pd.DataFrame({'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
                   'age': [16, 15, 14, 9, 8, 2, 56]})
# Row-wise: 'child' when age <= 9, otherwise the row's own sex value
df['yes'] = df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
Result:
   age  sex    yes
0   16    m      m
1   15    f      f
2   14    m      m
3    9    f  child
4    8    f  child
5    2    f  child
6   56    f      f