Pandas: create a new variable based on two existing variables

Question

I have the following code, which I think is highly inefficient. Is there a better way to do this kind of common recoding in pandas?

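# Use .loc rather than chained indexing (df['F'][mask] = ...), which may
# assign to a temporary copy instead of df and raises SettingWithCopyWarning
df['F'] = 0
df.loc[(df['B'] >= 3) & (df['C'] >= 4.35), 'F'] = 1
df.loc[(df['B'] >= 3) & (df['C'] < 4.35), 'F'] = 2
df.loc[(df['B'] < 3) & (df['C'] >= 4.35), 'F'] = 3
df.loc[(df['B'] < 3) & (df['C'] < 4.35), 'F'] = 4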

Answer 1

Use numpy.select and cache the boolean masks in variables for better performance:

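import numpy as np

# Compute each comparison once and reuse it
m1 = df['B'] >= 3
m2 = df['C'] >= 4.35
m3 = df['C'] < 4.35
m4 = df['B'] < 3

# The first True condition in each row picks the value;
# rows matching none of them (e.g. NaN in B or C) get the default 0
df['F'] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)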
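A self-contained sketch of the np.select approach, run on a small made-up DataFrame (the values of B and C below are assumptions chosen to hit each of the four cases plus one row with a NaN). It shows that a row matching none of the conditions falls through to `default=0`, just as it would stay at 0 in the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row has a NaN in C
df = pd.DataFrame({
    "B": [5, 5, 1, 1, 5],
    "C": [5.0, 1.0, 5.0, 1.0, np.nan],
})

# Compute each comparison once and reuse it
m1 = df["B"] >= 3
m2 = df["C"] >= 4.35
m3 = df["C"] < 4.35
m4 = df["B"] < 3

# np.select takes, per row, the value of the first True condition;
# the NaN row makes both m2 and m3 False, so it gets the default
df["F"] = np.select(
    [m1 & m2, m1 & m3, m4 & m2, m4 & m3],
    [1, 2, 3, 4],
    default=0,
)
print(df["F"].tolist())  # [1, 2, 3, 4, 0]
```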

Answer 2

In your specific case, you can exploit the fact that booleans are actually integers (False == 0, True == 1) and compute F with simple arithmetic:

df['F'] = 1 + (df['C'] < 4.35) + 2 * (df['B'] < 3)
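A minimal check of the arithmetic on made-up data covering all four combinations (the sample values are assumptions). The expression starts at 1, adds 1 when C is below 4.35 and adds 2 when B is below 3, so the four cases work out as 1 + 0 + 0 = 1, 1 + 1 + 0 = 2, 1 + 0 + 2 = 3 and 1 + 1 + 2 = 4, which are exactly the codes from the question.

```python
import pandas as pd

# Hypothetical sample data: (B >= 3, C >= 4.35), (B >= 3, C < 4.35),
# (B < 3, C >= 4.35), (B < 3, C < 4.35)
df = pd.DataFrame({
    "B": [5, 5, 1, 1],
    "C": [5.0, 1.0, 5.0, 1.0],
})

# True counts as 1 and False as 0 in arithmetic
df["F"] = 1 + (df["C"] < 4.35) + 2 * (df["B"] < 3)
print(df["F"].tolist())  # [1, 2, 3, 4]
```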

Note that this ignores any NaNs in your B and C columns: because every comparison with NaN evaluates to False, NaN values are treated as being at or above your limits.
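If the NaN behaviour matters, one possible workaround (not part of the answer above; the sample data is made up) is to compute the arithmetic codes and then reset rows with a missing B or C to 0 with `Series.where`, which reproduces what the question's original code does for such rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last two rows contain NaNs
df = pd.DataFrame({
    "B": [5, 1, np.nan, 1],
    "C": [1.0, 1.0, 1.0, np.nan],
})

codes = 1 + (df["C"] < 4.35) + 2 * (df["B"] < 3)
# NaN rows are coded as if B or C were at or above the limit
print(codes.tolist())  # [2, 4, 2, 3]

# Keep the code only where both B and C are present, otherwise use 0
has_both = df[["B", "C"]].notna().all(axis=1)
df["F"] = codes.where(has_both, 0)
print(df["F"].tolist())  # [2, 4, 0, 0]
```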

2 answers

Answer 1
Score: 11, accepted, 2018-06-14 06:47:52

Answer 2
Score: 3, 2018-06-14 06:56:21