
Getting a Mean/Median/Mode/Quartile/Quantile in each column using a function

I'm new to Jupyter Notebook and am wondering how to get a quantile of a column inside a function:

DataFrame:

num_likes | num_post | ... | 
464.0     | 142.0    | ... |
364.0     | 125.0    | ... |
487.0     | 106.0    | ... |
258.0     | 123.0    | ... |
125.0     | 103.0    | ... |

myFunction:

def myFunction(x):
    q22 = dataframe["num_likes"].quantile(0.22)
    q45 = dataframe["num_likes"].quantile(0.45)
    qc = q45 - q22
    k = 3

    if x >= q45 + k * qc:
        return q45 + k * qc
    elif x <= q22 - k * qc:
        return q22 - k * qc

Right now, since I don't know how to do this, I end up running the function separately for each column that I have. I also tried running it, and it does not seem to work:

data["num_likes"].apply(lambda x : myFunction(x))[:5]

The result also seems to be wrong, as I don't see any returned values:

    num_likes | num_post | ... | 
    NaN       | None     | ... |
    NaN       | None     | ... |
    NaN       | None     | ... |
    NaN       | None     | ... |
    NaN       | None     | ... |

The reason you're getting None is that neither condition in your if/elif block is true for these values, so myFunction reaches the end without hitting a return statement and implicitly returns None. Did you mean if/else?
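To make the fall-through behaviour concrete, here is a minimal sketch. The helper name clamp_extremes and its low/high parameters are hypothetical and only mirror the shape of the if/elif in the question; they are not part of the original code.

```python
def clamp_extremes(x, low, high):
    # Hypothetical helper with the same if/elif shape as the question's function.
    if x >= high:
        return high
    elif x <= low:
        return low
    # No else branch: when neither condition holds, execution falls off
    # the end of the function and Python returns None implicitly.

result_inside = clamp_extremes(5, low=0, high=10)   # neither branch matches
result_above = clamp_extremes(50, low=0, high=10)   # first branch matches

print(result_inside)  # None
print(result_above)   # 10
```

When such a function is applied to a column, every value that lands in the "no branch" case shows up as None/NaN in the result.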

Besides that, to clean up what you have, I would do things a little differently. First, q22, q45, and qc only need to be calculated once (based on the logic above), so they can be passed into the function instead of being recalculated on every call. Second, you do not need a lambda in this situation: apply (docs) takes a Python callable (your function), and additional arguments can be passed as shown below.

import pandas as pd

df = pd.DataFrame({
    'num_likes': [464.0, 364.0, 487.0, 258.0, 125.0],
    'num_post': [142.0, 125.0, 106.0, 123.0, 103.0]
})

def myFunction(x, q22, q45, qc):
    k = 3

    if x >= q45 + k * qc:
        return q45 + k * qc
    elif x <= q22 - k * qc:
        return q22 - k * qc
    else:
        return -1

q22 = df["num_likes"].quantile(0.22)
q45 = df["num_likes"].quantile(0.45)
qc = q45 - q22

# pass additional arguments in a tuple; they will be forwarded to myFunction
df.num_likes.apply(myFunction, args=(q22, q45, qc))

# this will return a series which can be assigned to new column
# 0   -1
# 1   -1
# 2   -1
# 3   -1
# 4   -1
# Name: num_likes, dtype: int64
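
If the goal is to avoid repeating this for each column by hand, one possible approach is to let DataFrame.apply pass each column to a function that computes its own quantiles. The sketch below is not from the original answer: the name clip_column is hypothetical, and it assumes that values inside the bounds should be kept unchanged (rather than replaced with -1), which is what Series.clip does.

```python
import pandas as pd

df = pd.DataFrame({
    'num_likes': [464.0, 364.0, 487.0, 258.0, 125.0],
    'num_post': [142.0, 125.0, 106.0, 123.0, 103.0]
})

def clip_column(col, k=3):
    # col is a single column (a pandas Series), so the quantiles are
    # computed once per column instead of once per value.
    q22 = col.quantile(0.22)
    q45 = col.quantile(0.45)
    qc = q45 - q22
    # Assumption: in-range values are kept as they are; only values
    # beyond the bounds are replaced by the nearest bound.
    return col.clip(lower=q22 - k * qc, upper=q45 + k * qc)

# DataFrame.apply calls clip_column once for every column.
clipped = df.apply(clip_column)
print(clipped)
```

With k = 3, none of the sample values fall outside the bounds, so clipped matches the input here; a smaller k, e.g. df.apply(clip_column, k=0), narrows the bounds and makes the clamping visible.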
