Add a column to a pandas data frame that is a function of another column

I have a data frame that has measurements in it and a second data frame with stats on those measurements. For example:

import pandas as pd


def calc_zscore(x, mean, stdev):
    return (x - mean)/stdev


metrics = ['Temperature', 'Length', 'Width']
values = ['mean', 'stdev']

data = pd.DataFrame(columns = metrics)
stats = pd.DataFrame(index = metrics, columns = values)

stats.loc['Temperature', 'mean'] = 72.1
stats.loc['Temperature', 'stdev'] = 6.3

data.loc[0, 'Temperature'] = 68.2
data.loc[1, 'Temperature'] = 76.2
data.loc[2, 'Temperature'] = 73.6

metric = 'Temperature'

# Row-by-row loop: one Python call to calc_zscore per row
for row in data.index:
    data.loc[row, metric + '_zscore'] = calc_zscore(
        data.loc[row, metric], stats.loc[metric, 'mean'], stats.loc[metric, 'stdev'])

print(data)

This works as I want it to; however, I have to iterate over every row in the data frame. It's slow, and the data frame has 300k rows. I also need to calculate the z-score for each column, but to keep it simple I'm only doing the Temperature column in this example.

  Temperature Length Width  Temperature_zscore
0        68.2    NaN   NaN           -0.619048
1        76.2    NaN   NaN            0.650794
2        73.6    NaN   NaN            0.238095

Using the apply() method seems to be the path to pandas magic heaven; however, I'm not sure how to pass the correct values to the calc_zscore function using apply().

PS: I'm not actually calculating the z-score; I'm just using this as an example. I know I could also use the mean() and std() methods, but it's just an example, so let's pretend they don't exist.

This is equivalent to your for loop:

data['Temperature_zscore'] = data['Temperature'].apply(calc_zscore, args=(stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']))
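A minimal self-contained sketch of the args= approach, assuming a pandas version where .loc replaces the removed .ix indexer. For brevity it builds the question's Temperature data with a dict instead of cell-by-cell assignment; Series.apply calls calc_zscore(value, *args) once per element.

```python
import pandas as pd


def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev


# The question's Temperature example, built from a dict for brevity.
data = pd.DataFrame({'Temperature': [68.2, 76.2, 73.6]})
stats = pd.DataFrame({'mean': [72.1], 'stdev': [6.3]}, index=['Temperature'])

metric = 'Temperature'

# Each element x of the column is passed as calc_zscore(x, mean, stdev).
data[metric + '_zscore'] = data[metric].apply(
    calc_zscore,
    args=(stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']),
)

print(data)
```

The resulting column matches the loop output shown in the question (about -0.619048, 0.650794 and 0.238095).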

In addition to palako's answer, which shows how you can pass arguments to the function you are applying, you can also pass a lambda function to apply():

data['Temp_zscore'] = data['Temperature'].apply(lambda x: calc_zscore(x, stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']))
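A sketch of the lambda version next to a swapped-in alternative that is not part of this answer: because calc_zscore uses only arithmetic operators, the whole column (a Series) can be passed to it directly, and pandas broadcasts the scalar mean and stdev across every row instead of making one Python call per row. This assumes the real function is similarly elementwise; a function with branching on scalar values would still need apply(). The column name Temp_zscore_vec is hypothetical.

```python
import pandas as pd


def calc_zscore(x, mean, stdev):
    # Only arithmetic operators, so x may be a scalar or a whole Series.
    return (x - mean) / stdev


# The question's Temperature example, built from a dict for brevity.
data = pd.DataFrame({'Temperature': [68.2, 76.2, 73.6]})
stats = pd.DataFrame({'mean': [72.1], 'stdev': [6.3]}, index=['Temperature'])
metric = 'Temperature'
mean = stats.loc[metric, 'mean']
stdev = stats.loc[metric, 'stdev']

# The lambda answer: calc_zscore is called once per element.
data['Temp_zscore'] = data[metric].apply(lambda x: calc_zscore(x, mean, stdev))

# Vectorized alternative: one call on the whole column.
data['Temp_zscore_vec'] = calc_zscore(data[metric], mean, stdev)

print(data)
```

Both columns hold the same values; the vectorized call avoids the per-row Python overhead that makes the loop slow on large frames.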

Alternatively, consider creating a partial function with functools.partial:

from functools import partial

mean = 5.0
stdv = 2.0

def yourfunc(x, m, s):
    return (x - m) / s

# Freeze m and s; partfunc(x) now calls yourfunc(x, m=mean, s=stdv)
partfunc = partial(yourfunc, m=mean, s=stdv)

Then apply that partial function:

data['Temp_zscore'] = data['Temperature'].apply(partfunc)
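The question also needs a z-score for every column. A minimal sketch, assuming one row of stats per metric: build a fresh partial for each metric and apply it to that column. The Temperature values come from the question; the Length numbers are made up purely for illustration.

```python
from functools import partial

import pandas as pd


def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev


# Temperature values are from the question; Length values are hypothetical.
data = pd.DataFrame({
    'Temperature': [68.2, 76.2, 73.6],
    'Length': [10.0, 12.0, 14.0],
})
stats = pd.DataFrame(
    {'mean': [72.1, 12.0], 'stdev': [6.3, 2.0]},
    index=['Temperature', 'Length'],
)

for metric in stats.index:
    # Freeze this metric's mean and stdev, leaving a one-argument function.
    zfunc = partial(calc_zscore,
                    mean=stats.loc[metric, 'mean'],
                    stdev=stats.loc[metric, 'stdev'])
    data[metric + '_zscore'] = data[metric].apply(zfunc)

print(data)
```

Binding the statistics by keyword keeps the data value as the only positional argument, which is what Series.apply supplies.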

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.