Pandas: creating a new variable based on two existing variables
I think the following code is inefficient. Is there a better way to do this kind of common recoding in pandas?
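```python
# Start every row at 0, then overwrite one B/C combination at a time.
# .loc is used instead of chained assignment (df['F'][mask] = ...),
# which may silently fail to write back to df.
df['F'] = 0
df.loc[(df['B'] >= 3) & (df['C'] >= 4.35), 'F'] = 1
df.loc[(df['B'] >= 3) & (df['C'] < 4.35), 'F'] = 2
df.loc[(df['B'] < 3) & (df['C'] >= 4.35), 'F'] = 3
df.loc[(df['B'] < 3) & (df['C'] < 4.35), 'F'] = 4
```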
Use `numpy.select`, and cache the boolean masks in variables for better performance:
```python
import numpy as np

m1 = df['B'] >= 3
m2 = df['C'] >= 4.35
m3 = df['C'] < 4.35
m4 = df['B'] < 3
df['F'] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)
```
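A minimal, self-contained sketch of the `numpy.select` approach, assuming a small made-up DataFrame with numeric columns `B` and `C` (the sample values are hypothetical). `np.select` checks the condition list in order and, for each row, takes the value paired with the first condition that is True, falling back to `default` when none match:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering all four B/C combinations
df = pd.DataFrame({
    "B": [5, 4, 1, 2],
    "C": [5.0, 1.0, 9.9, 0.5],
})

# Compute each boolean mask once and reuse it
m1 = df["B"] >= 3
m2 = df["C"] >= 4.35
m3 = df["C"] < 4.35
m4 = df["B"] < 3

# Each row gets the value paired with the first True condition, else 0
df["F"] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)
print(df["F"].tolist())  # [1, 2, 3, 4]
```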
In your particular case, you can take advantage of the fact that booleans are really integers (False == 0, True == 1) and use simple arithmetic:
```python
df['F'] = 1 + (df['C'] < 4.35) + 2 * (df['B'] < 3)
```
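To see why the arithmetic works, note that `(df['C'] < 4.35)` adds 1 when C is below its threshold and `2 * (df['B'] < 3)` adds 2 when B is below its threshold, so the four combinations map to 1, 2, 3 and 4. The sketch below, on hypothetical NaN-free data (including a row sitting exactly on both thresholds), checks the result against the `numpy.select` version:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with no missing values; last row is on both boundaries
df = pd.DataFrame({"B": [5, 4, 1, 2, 3], "C": [5.0, 1.0, 9.9, 0.5, 4.35]})

# Booleans add as 0/1: "C below 4.35" contributes 1, "B below 3" contributes 2
arith = 1 + (df["C"] < 4.35) + 2 * (df["B"] < 3)

# Reference result from the numpy.select version
m1, m2 = df["B"] >= 3, df["C"] >= 4.35
m3, m4 = df["C"] < 4.35, df["B"] < 3
select = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)

print(arith.tolist())                      # [1, 2, 3, 4, 1]
print((arith.to_numpy() == select).all())  # True
```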
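Note that this ignores any NaNs in columns `B` and `C`: since every comparison with NaN evaluates to False, those rows are assigned as if their values were at or above your thresholds.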
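A small sketch of the NaN caveat, assuming hypothetical rows where `B` or `C` is missing. Because every comparison against NaN is False, the arithmetic version treats a missing value as "high", whereas in the `numpy.select` version none of the four conditions match, so such rows fall through to `default=0`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the first has B missing, the second has C missing
df = pd.DataFrame({"B": [np.nan, 1.0], "C": [1.0, np.nan]})

# Arithmetic version: NaN < threshold is False, so NaN counts as "at or above"
arith = 1 + (df["C"] < 4.35) + 2 * (df["B"] < 3)

# numpy.select version: all four conditions are False for these rows
m1, m2 = df["B"] >= 3, df["C"] >= 4.35
m3, m4 = df["C"] < 4.35, df["B"] < 3
select = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)

print(arith.tolist())   # [2, 3]
print(select.tolist())  # [0, 0]
```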