
Pandas: Conditional column creation

I am trying to create column C from the values in columns A and B, under the following conditions:

if A < 5000: C = A * B
else: C = A

The following gives a syntax error:

df['C'] = df.apply(lambda x (x['A'] * x['B)'] if x['A'] < 5000 else x = x['A']),axis=1)

How far off am I?

Use the vectorized numpy.where:

df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])
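
To make the behaviour of that line concrete, here is a minimal sketch on a made-up four-row frame. The sample values are an assumption for illustration only; the question does not show its actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame is not shown in the question.
df = pd.DataFrame({"A": [1000, 4999, 5000, 8000], "B": [2, 3, 4, 5]})

# Where A < 5000 take A * B, otherwise keep A unchanged.
df["C"] = np.where(df["A"] < 5000, df["A"] * df["B"], df["A"])
print(df)
```

The condition, the "true" branch and the "false" branch are each evaluated for the whole column at once, which is why no Python-level loop over rows is needed.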

Performance

np.random.seed(2019)

N = 1000
data = np.asarray([np.random.rand(N).tolist(), list(range(N))]).T
df = pd.DataFrame(data, columns=['A', 'B'])

In [56]: %timeit df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])
536 µs ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [57]: %timeit df['C'] = df.apply(lambda x: x.A * x.B if x.A > 0.5 else x.A, 1)
30.9 ms ± 597 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

N = 100000
data = np.asarray([np.random.rand(N).tolist(), list(range(N))]).T
df = pd.DataFrame(data, columns=['A', 'B'])

In [59]: %timeit df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])
1.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [60]: %timeit df['C'] = df.apply(lambda x: x.A * x.B if x.A > 0.5 else x.A, 1)
3.32 s ± 374 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I think you want something like this:

df['C'] = df.apply(lambda x: x.A * x.B if x.A > 0.5 else x.A, 1)

Full example:

import pandas as pd
import numpy as np

N = 10
data = np.asarray([np.random.rand(N).tolist(), list(range(N))]).T
df = pd.DataFrame(data, columns=['A', 'B'])

df['C'] = df.apply(lambda x: x.A * x.B if x.A > 0.5 else x.A, 1)

I'm sure the solutions given above are better, but I solved it a third way. The dataset is small, so this will do for now.

multiplied = df['A'] * df['B']
df['C'] = multiplied.where(df['A'] < 5000, other=df['A'])
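
As a rough check that the approaches in this thread agree, the sketch below runs all three on the same made-up frame, using the question's `A < 5000` condition throughout. The sample values are an assumption, and the third variant is read here as `Series.where`, which keeps values where the condition holds and substitutes `other` elsewhere. The `apply` line also shows one way to write the question's lambda so that it parses: a colon after `lambda x`, and no assignment inside the expression.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen to exercise both branches.
df = pd.DataFrame({"A": [1000, 4999, 5000, 8000], "B": [2, 3, 4, 5]})

# 1. Vectorized numpy.where.
via_np_where = np.where(df["A"] < 5000, df["A"] * df["B"], df["A"])

# 2. Row-wise apply with a syntactically valid lambda.
via_apply = df.apply(
    lambda x: x["A"] * x["B"] if x["A"] < 5000 else x["A"], axis=1
)

# 3. Series.where: keep A * B where A < 5000, otherwise fall back to A.
via_series_where = (df["A"] * df["B"]).where(df["A"] < 5000, other=df["A"])

# All three should produce the same column.
same = (via_np_where == via_apply).all() and (via_apply == via_series_where).all()
print(same)
```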

