pandas - add column based on conditions
Starting from a simple dataframe df like:
C,n
AAA,1
AAA,2
BBB,1
BBB,2
CCC,1
CCC,2
DDD,1
DDD,2
I would like to add a column based on some conditions on the values in column C. The column I would like to add is:
df['H'] = df['n'] / 10
which returns:
C n H
0 AAA 1 0.1
1 AAA 2 0.2
2 BBB 1 0.1
3 BBB 2 0.2
4 CCC 1 0.1
5 CCC 2 0.2
6 DDD 1 0.1
7 DDD 2 0.2
Now I would like to add the same column, but with a different normalization factor only for the values CCC and DDD in column C, for instance:
df['H'] = df['n'] / 100
so that:
C n H
0 AAA 1 0.1
1 AAA 2 0.2
2 BBB 1 0.1
3 BBB 2 0.2
4 CCC 1 0.01
5 CCC 2 0.02
6 DDD 1 0.01
7 DDD 2 0.02
So far I tried to mask the dataframe as:
mask = df['C'] == 'CCC'
df = df[mask]
df['H'] = df['n'] / 100
and that worked on the masked sample. But since I have to apply several filters while keeping the original H column for the non-filtered values, I'm getting confused.
df.loc[df['C'] == 'CCC' , 'H'] = df['n'] / 100
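You can also use loc with isin to cover both CCC and DDD (the older df.ix indexer was removed in pandas 1.0):
df.loc[df['C'].isin(['CCC','DDD']), 'H'] = df['n'] / 100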
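To make the mask-based assignment concrete, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question by hand (the constructor call is an assumption, since the question only shows the CSV), fills H with the default factor, and then overwrites only the CCC and DDD rows through loc.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
df = pd.DataFrame({
    "C": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC", "DDD", "DDD"],
    "n": [1, 2, 1, 2, 1, 2, 1, 2],
})

# Default normalization for every row.
df["H"] = df["n"] / 10

# Overwrite H only where C is CCC or DDD; all other rows keep their value.
mask = df["C"].isin(["CCC", "DDD"])
df.loc[mask, "H"] = df.loc[mask, "n"] / 100

print(df)
```

Because the assignment goes through df.loc on the original frame, no rows are dropped, unlike reassigning df = df[mask].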
Using the examples in this answer you can use:
df.loc[mask, 'H'] = df.loc[mask, 'n'] / 100
You could also calculate the H column separately for each case ('CCC'/'DDD' or not 'CCC'/'DDD'):
import numpy as np
# True for rows whose C is CCC or DDD
mask = np.logical_or(df['C'] == 'CCC', df['C'] == 'DDD')
not_mask = np.logical_not(mask)
# Assign through .loc to avoid chained assignment on a copy
df.loc[not_mask, 'H'] = df.loc[not_mask, 'n'] / 10
df.loc[mask, 'H'] = df.loc[mask, 'n'] / 100
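When several filters each need their own factor, it may be simpler to compute the column in one pass. The sketch below shows two possible alternatives that are not part of the answers above: np.where for a two-way choice, and a lookup of per-group factors with Series.map. The factors dictionary, the default of 10, and the H_map column name are hypothetical, chosen only to match the question's numbers.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
df = pd.DataFrame({
    "C": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC", "DDD", "DDD"],
    "n": [1, 2, 1, 2, 1, 2, 1, 2],
})

# Option 1: two-way choice. Rows matching the condition use n / 100,
# every other row uses n / 10.
is_special = df["C"].isin(["CCC", "DDD"])
df["H"] = np.where(is_special, df["n"] / 100, df["n"] / 10)

# Option 2: per-group factors (hypothetical mapping). Groups missing from
# the dictionary map to NaN and fall back to the default factor of 10.
factors = {"CCC": 100, "DDD": 100}
df["H_map"] = df["n"] / df["C"].map(factors).fillna(10)

print(df)
```

The map variant scales to many groups by adding entries to the dictionary instead of writing one mask per filter.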