Function to calculate the percentage of each value in a pandas column

Question

I am taking part in the Titanic tutorials over at Kaggle to learn pandas and machine learning.

Here is my kernel: https://www.kaggle.com/trenzalore888/titanic/titanic-learning

I want to create a function that takes two arguments, a DataFrame and a column name. I want this function to calculate the percentage of each class (assuming the column is binary, i.e. 0 or 1).

I can do this hard-coded, i.e. specifically for the Titanic dataset, but I want to write a function so I can reuse it in the future.

Here is my failed attempt:

traintotal=(len(train.index))
testtotal=(len(test.index))

def Is_data_imbalanced (df,objectivecolumn) :
    objectivecount= df.objectivecolumn[df.objectivecolumn > 0].sum()
    objectivecountpercentage=(objectivecount/traintotal)*100
    objectivecountrounded= np.ceil(objectivecountpercentage)
    return objectivecountrounded

Is_data_imbalanced(train,"Survived")

Unfortunately I get an AttributeError:

AttributeError: 'DataFrame' object has no attribute 'objectivecolumn'

Below is the hard-coded version that works:

import numpy as np

traintotal=(len(train.index))
print("there are", traintotal,"rows in the train data")

testtotal=(len(test.index))
print("there are {} rows in the test data".format(testtotal))

Survialcount= train.Survived[train.Survived > 0].sum()
Survialcountpercentage=(Survialcount/traintotal)*100
print(Survialcountpercentage)

survivalcountrounded= np.ceil(Survialcountpercentage)

print(" ",survivalcountrounded,"percent survived")
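A minimal sketch of the hard-coded version turned into a function, assuming `train` is a pandas DataFrame with a 0/1 `Survived` column. The name `positive_percentage` and the toy data are hypothetical. Note that it divides by the row count of the DataFrame passed in, rather than the global `traintotal`, and keeps the `np.ceil` rounding.

```python
import numpy as np
import pandas as pd

def positive_percentage(df, column):
    # Hypothetical helper: count rows where the column is > 0
    # (for a 0/1 column this equals summing the positive values).
    count = (df[column] > 0).sum()
    # Divide by this DataFrame's own row count, not a global like traintotal.
    percentage = count / len(df) * 100
    # Round up, as the hard-coded version does with np.ceil.
    return np.ceil(percentage)

# Toy data standing in for the Kaggle train set.
train = pd.DataFrame({"Survived": [0, 1, 1, 0, 0, 0]})
print(positive_percentage(train, "Survived"))
```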

Does anyone know how I can get this to work? The DataFrame argument (train) seems to be accepted fine, but the second argument, the column name that should replace .Survived, does not work.
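A minimal sketch reproducing the cause on a toy DataFrame (the data and the variable name `col` are hypothetical): attribute access such as `df.col` looks for a column literally named `col`, while item access `df[col]` uses the string stored in the variable.

```python
import pandas as pd

df = pd.DataFrame({"Survived": [0, 1, 1, 0]})
col = "Survived"

# Item access uses the *value* of `col`, i.e. the column "Survived".
print(df[col].sum())

# Attribute access uses the literal name "col", which is not a column,
# so pandas raises AttributeError, just like df.objectivecolumn does.
try:
    df.col
except AttributeError as err:
    print("AttributeError:", err)
```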

Answer 1

Assuming it really is binary, all you need is:

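def Is_data_imbalanced(df, objectivecolumn):
    # df[objectivecolumn] selects the column named by the string in the variable;
    # df.objectivecolumn looks for a column literally called "objectivecolumn".
    # For a 0/1 column, the mean is the fraction of 1s; int() truncates the percentage.
    return int(df[objectivecolumn].mean() * 100)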
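If the column is not binary, or you want the percentage of every value as the title asks, a sketch using `Series.value_counts(normalize=True)` may be closer to what you need. The helper name `value_percentages` and the toy data are hypothetical.

```python
import pandas as pd

def value_percentages(df, column):
    # Hypothetical helper: share of each distinct value in `column`, in percent.
    # normalize=True makes value_counts return fractions that sum to 1.
    return df[column].value_counts(normalize=True) * 100

# Toy data standing in for the Kaggle train set.
train = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})
print(value_percentages(train, "Survived"))
```

The result is a Series indexed by the distinct values, so it works unchanged for columns with more than two classes.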

Answer 1 (score 1, accepted), 2017-04-05 10:09:50