Add a column to a pandas DataFrame that is a function of another column

Question

I have a data frame that has measurements in it and a second data frame with stats on those measurements. For example:

import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean)/stdev


metrics = ['Temperature', 'Length', 'Width']
values = ['mean', 'stdev']

# data holds the measurements; stats holds one row of statistics per metric
data = pd.DataFrame(columns = metrics)
stats = pd.DataFrame(index = metrics, columns = values)

stats.loc['Temperature', 'mean'] = 72.1
stats.loc['Temperature', 'stdev'] = 6.3

data.loc[0, 'Temperature'] = 68.2
data.loc[1, 'Temperature'] = 76.2
data.loc[2, 'Temperature'] = 73.6

metric = 'Temperature'

# compute the z-score one row at a time
for row in data.index:
    data.loc[row, metric + '_zscore'] = calc_zscore(data.loc[row, metric], stats.loc[metric, 'mean'], stats.loc[metric, 'stdev'])

print(data)

This works as I want it to, but I have to iterate over every row in the data frame. It's slow, and the data frame has 300k rows. I also need to calculate the z-score for each column, but to keep it simple I'm only doing the Temperature column in this example.

  Temperature Length Width  Temperature_zscore
0        68.2    NaN   NaN           -0.619048
1        76.2    NaN   NaN            0.650794
2        73.6    NaN   NaN            0.238095

Using the apply() method seems to be the path to pandas magic heaven, but I'm not sure how to pass the correct values to the calc_zscore function using apply().

PS: I'm not actually calculating the z-score; I'm just using this as an example. I know I could also use the mean() and std() methods, but it's just an example, so let's pretend they don't exist.

Answer 1

This is equivalent to your for loop:

data['Temperature_zscore'] = data['Temperature'].apply(calc_zscore, args=(stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']))
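For anyone who wants to run the one-liner on its own, here is a minimal self-contained sketch. It assumes the sample Temperature values and statistics from the question, and builds the two frames from dictionaries purely for brevity. The values in `args` are passed positionally after each element of the column.

```python
import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

# Sample values taken from the question.
data = pd.DataFrame({'Temperature': [68.2, 76.2, 73.6]})
stats = pd.DataFrame({'mean': [72.1], 'stdev': [6.3]}, index=['Temperature'])

metric = 'Temperature'

# Series.apply calls calc_zscore(value, *args) once per element.
data[metric + '_zscore'] = data[metric].apply(
    calc_zscore,
    args=(stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']),
)

print(data)
```

The resulting column should match the Temperature_zscore values shown in the question.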

Answer 2

In addition to palako's answer, which shows how you can pass arguments to the function you are applying, you can also use a lambda function in the apply:

data['Temp_zscore'] = data['Temperature'].apply(lambda x: calc_zscore(x, stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']))
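Since the question mentions needing this for every column, the lambda approach might be extended along the following lines. This is only a sketch: the Length values and statistics are invented for illustration, and the statistics are read once per column so the lambda does not repeat the lookup on every row.

```python
import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

# Temperature values are from the question; the Length values and
# statistics are made up for illustration.
data = pd.DataFrame({
    'Temperature': [68.2, 76.2, 73.6],
    'Length': [10.0, 12.0, 14.0],
})
stats = pd.DataFrame(
    {'mean': [72.1, 12.0], 'stdev': [6.3, 2.0]},
    index=['Temperature', 'Length'],
)

for metric in stats.index:
    # Look the statistics up once per column rather than once per row.
    mean = stats.loc[metric, 'mean']
    stdev = stats.loc[metric, 'stdev']
    data[metric + '_zscore'] = data[metric].apply(
        lambda x: calc_zscore(x, mean, stdev)
    )

print(data)
```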

Alternatively, consider creating a partial from functools.

from functools import partial

mean = 5.0
stdv = 2.0

def yourfunc(x, m, s):
    return (x - m) / s

partfunc = partial(yourfunc, m=mean, s=stdv)

Then apply that partial function:

data['Temp_zscore'] = data['Temperature'].apply(partfunc)
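Tying the partial back to the question's stats table, a sketch might look like this. The name `zscore_for_metric` is hypothetical. The last line relies on an assumption that holds for this example: the function uses only arithmetic operators, so it can also be called on the whole Series at once, which is usually much faster than apply() on a 300k-row frame.

```python
from functools import partial

import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

# Sample values taken from the question.
data = pd.DataFrame({'Temperature': [68.2, 76.2, 73.6]})
stats = pd.DataFrame({'mean': [72.1], 'stdev': [6.3]}, index=['Temperature'])

metric = 'Temperature'

# Bind the column's statistics as keyword arguments; the result takes only x.
zscore_for_metric = partial(
    calc_zscore,
    mean=stats.loc[metric, 'mean'],
    stdev=stats.loc[metric, 'stdev'],
)

# Element by element, as in the answer above.
data['Temp_zscore'] = data[metric].apply(zscore_for_metric)

# Whole column at once: x is the Series, so the arithmetic is vectorized.
data['Temp_zscore_vectorized'] = zscore_for_metric(data[metric])

print(data)
```

A function that branches on individual values or calls non-vectorized helpers would not work with the direct call, and apply() remains the fallback in that case.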

Answer 1
0 2016-11-21 01:12:27

Answer 2
0 2016-11-21 01:27:38
