pandas DataFrame create new column

I have this snippet of code working with a pandas DataFrame. I am trying to use the apply function to create a new column called STDEV_TV, but I keep running into the error below. All the columns I am working with are of type float.

TypeError: ("'float' object is not iterable", 'occurred at index 0')

Can someone help me understand why I keep getting this error?

def sigma(df):
    val = df.volume2Sum / df.volumeSum - df.vwap * df.vwap
    return math.sqrt(max(val))


df['STDEV_TV'] = df.apply(sigma, axis=1)

Try:

import pandas as pd
import numpy as np
import math

df = pd.DataFrame(np.random.randint(1, 10, (5, 3)),
                  columns=['volume2Sum', 'volumeSum', 'vwap'])

def sigma(df):
    val = df.volume2Sum / df.volumeSum - df.vwap * df.vwap
    # Take the root only for non-negative val; a negative val is returned unchanged.
    return math.sqrt(val) if val >= 0 else val

df['STDEV_TV'] = df.apply(sigma, axis=1)

Output:

>>> df
   volume2Sum  volumeSum  vwap   STDEV_TV
0           4          5     8 -63.200000
1           2          8     4 -15.750000
2           3          3     3  -8.000000
3           8          3     4 -13.333333
4           4          2     3  -7.000000

Change

return math.sqrt(max(val)) 

to

return math.sqrt(val)

max() iterates over an iterable and finds its maximum value. The problem here is that, because sigma is applied to each row, the local variable val is a single float, not a list, so what you have is equivalent to max(1.3).
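A minimal sketch of that failure outside pandas, assuming val is a plain Python float such as 1.3 (the value is only illustrative):

```python
import math

val = 1.3  # what sigma sees for one row: a scalar, not a sequence

try:
    max(val)  # max() needs an iterable (or several arguments)
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

# With a scalar there is nothing to iterate over, so take the root directly.
root = math.sqrt(val)
```

Here max(val) raises TypeError, while math.sqrt(val) works because it expects a single number.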

You need to apply sigma to each set of values, not to the whole DataFrame. I would use a lambda function, e.g.:

def sigma(volume2Sum, volumeSum, vwap):
    val = volume2Sum / volumeSum - vwap * vwap
    return math.sqrt(val)


df['STDEV_TV'] = df.apply(lambda x: sigma(x.volume2Sum, x.volumeSum, x.vwap), axis=1)

That should put the square root of val into the STDEV_TV column, and you can find the max value separately. Take care not to take the square root of a negative number.
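One possible way to guard against negative values is to compute the column without apply and mask the negatives first. This is only a sketch: the sample numbers are made up, and turning negative values into NaN is an assumed policy, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "volume2Sum": [50.0, 2.0, 90.0],
    "volumeSum": [2.0, 8.0, 10.0],
    "vwap": [3.0, 4.0, 3.0],
})

# Column-wise arithmetic: val is a Series with one value per row.
val = df["volume2Sum"] / df["volumeSum"] - df["vwap"] ** 2

# Assumed policy: negative values become NaN instead of raising an error.
df["STDEV_TV"] = np.sqrt(val.where(val >= 0))
```

With this data the middle row has a negative val, so its STDEV_TV is NaN, while the other rows get an ordinary square root.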

Your function sigma gives you a single number as a result, because its first step is to find the maximum:

max(val)

and that is only one number. After that, you try to use your function on a data series. You should use this last line in your code instead:

df['STDEV_TV'] = sigma(df)

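It will run (provided the maximum is non-negative), but note that every row then receives the same value: the square root of the maximum of val.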
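A small sketch of what calling the original sigma on the whole DataFrame produces, using made-up numbers chosen so that the maximum is non-negative:

```python
import math
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "volume2Sum": [50.0, 90.0, 40.0],
    "volumeSum": [2.0, 10.0, 4.0],
    "vwap": [3.0, 3.0, 3.0],
})

def sigma(frame):
    # frame is the whole DataFrame here, so val is a Series and max() can iterate it.
    val = frame.volume2Sum / frame.volumeSum - frame.vwap * frame.vwap
    return math.sqrt(max(val))

# sigma returns one scalar; pandas broadcasts it to every row.
df["STDEV_TV"] = sigma(df)
```

The call succeeds because max() receives a Series rather than a float, but the new column holds one repeated value rather than a per-row result.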
