Add a column to a pandas data frame that is a function of another column
I have a data frame that has measurements in it and a second data frame with stats on those measurements. For example:
import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

metrics = ['Temperature', 'Length', 'Width']
values = ['mean', 'stdev']
data = pd.DataFrame(columns=metrics)
stats = pd.DataFrame(index=metrics, columns=values)
# .loc replaces the older .ix indexer, which was removed in pandas 1.0
stats.loc['Temperature', 'mean'] = 72.1
stats.loc['Temperature', 'stdev'] = 6.3
data.loc[0, 'Temperature'] = 68.2
data.loc[1, 'Temperature'] = 76.2
data.loc[2, 'Temperature'] = 73.6
metric = 'Temperature'
# Row-by-row loop: this is the slow part
for row in data.index:
    data.loc[row, metric + '_zscore'] = calc_zscore(
        data.loc[row, metric],
        stats.loc[metric, 'mean'],
        stats.loc[metric, 'stdev'])
print(data)
This works as I want it to, but I have to iterate over every row in the data frame. It's slow, and the data frame has 300k rows. I also need to calculate the z-score for each column, but to keep it simple I'm only doing the Temperature column in this example.
   Temperature Length Width  Temperature_zscore
0         68.2    NaN   NaN           -0.619048
1         76.2    NaN   NaN            0.650794
2         73.6    NaN   NaN            0.238095
Using the apply() method seems to be the path to pandas magic heaven, but I'm not sure how to pass the correct values to the calc_zscore function using apply().
PS: I'm not actually calculating the z-score, I'm just using this as an example. I know I could also use the mean() and std() methods, but it's just an example, so let's pretend they don't exist.
This is equivalent to your for loop:
data['Temperature_zscore'] = data['Temperature'].apply(calc_zscore, args=(stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']))
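Since the question also asks for the same calculation on every column, here is a minimal self-contained sketch of the `args=` approach looped over several metrics. The Temperature numbers come from the question; the Length measurements and Length stats are hypothetical values added only so the loop has more than one column to process.

```python
import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

# Temperature values are from the question; every Length value below is
# a hypothetical number used for illustration only.
stats = pd.DataFrame(
    {"mean": [72.1, 10.0], "stdev": [6.3, 2.0]},
    index=["Temperature", "Length"],
)
data = pd.DataFrame(
    {"Temperature": [68.2, 76.2, 73.6], "Length": [8.0, 10.0, 13.0]}
)

# One apply() call per metric; args supplies the extra positional
# arguments that follow the element x in calc_zscore.
for metric in stats.index:
    data[metric + "_zscore"] = data[metric].apply(
        calc_zscore,
        args=(stats.loc[metric, "mean"], stats.loc[metric, "stdev"]),
    )

print(data)
```

The stats lookups happen once per metric here, because the `args` tuple is built before `apply()` starts iterating.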
In addition to palako's answer, which shows how you can pass arguments to the function you are applying, you can also use a lambda function in the apply:
data['Temp_zscore'] = data['Temperature'].apply(lambda x: calc_zscore(x, stats.loc[metric, 'mean'], stats.loc[metric, 'stdev']))
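One thing to watch with the lambda form: the two stats lookups sit inside the lambda body, so they are evaluated again for every row. With 300k rows that is likely to add up. A minimal sketch that does the lookups once beforehand (the local names `mean` and `stdev` are introduced here for illustration):

```python
import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

data = pd.DataFrame({"Temperature": [68.2, 76.2, 73.6]})
stats = pd.DataFrame({"mean": [72.1], "stdev": [6.3]}, index=["Temperature"])

metric = "Temperature"
# Look the statistics up once, outside the lambda, so the per-row call
# only does the arithmetic.
mean = stats.loc[metric, "mean"]
stdev = stats.loc[metric, "stdev"]

data["Temp_zscore"] = data["Temperature"].apply(
    lambda x: calc_zscore(x, mean, stdev)
)
```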
Alternatively, consider creating a partial from functools.
from functools import partial
mean = 5.0
stdv = 2.0
def yourfunc(x, m, s):
    return (x - m) / s
partfunc = partial(yourfunc, m=mean, s=stdv)
Then apply that partial function:
data['Temp_zscore'] = data['Temperature'].apply(partfunc)
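All of the apply() variants above still call the Python function once per row. If the real function, like calc_zscore, uses only arithmetic that pandas can perform on a whole Series, passing the column itself may avoid the per-row calls entirely. This is a sketch under that assumption; the question says the real calculation is not actually a z-score, so it may not carry over.

```python
import pandas as pd

def calc_zscore(x, mean, stdev):
    return (x - mean) / stdev

data = pd.DataFrame({"Temperature": [68.2, 76.2, 73.6]})
stats = pd.DataFrame({"mean": [72.1], "stdev": [6.3]}, index=["Temperature"])

metric = "Temperature"
# Pass the whole column: x is a Series, so the subtraction and division
# are applied to all rows at once instead of one function call per row.
data[metric + "_zscore"] = calc_zscore(
    data[metric], stats.loc[metric, "mean"], stats.loc[metric, "stdev"]
)
```

If the function contains branching or other logic that only works on scalars, this will not work as written and one of the apply() forms is the safer choice.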