Function to calculate the percentage each value has in a pandas column

I am taking part in the Titanic tutorials over at Kaggle to learn pandas and machine learning.

Here is my kernel: https://www.kaggle.com/trenzalore888/titanic/titanic-learning

I want to create a function that takes two arguments, a DataFrame and a column name. The function should calculate the percentage each class makes up (assuming the column is binary, i.e. 0 or 1).

I can do this hard-coded, i.e. written specifically for the Titanic set, but I want a function so I can reuse it in the future.

Here is my failed attempt:

traintotal=(len(train.index))
testtotal=(len(test.index))

def Is_data_imbalanced (df,objectivecolumn) :
    objectivecount= df.objectivecolumn[df.objectivecolumn > 0].sum()
    objectivecountpercentage=(objectivecount/traintotal)*100
    objectivecountrounded= np.ceil(objectivecountpercentage)
    return objectivecountrounded

Is_data_imbalanced(train,"Survived")

Unfortunately, I get an attribute error:

AttributeError: 'DataFrame' object has no attribute 'objectivecolumn'

Below is the hard-coded version that works:

traintotal=(len(train.index))
print("there are", traintotal,"rows in the train data")

testtotal=(len(test.index))
print("there are {} rows in the test data".format(testtotal))

Survialcount= train.Survived[train.Survived > 0].sum()
Survialcountpercentage=(Survialcount/traintotal)*100
print(Survialcountpercentage)

survivalcountrounded= np.ceil(Survialcountpercentage)

print(" ",survivalcountrounded,"percent survived")

Does anyone know how I can get this to work? It seems the function accepts train as df without trouble, but passing the column name as the second argument in place of .Survived does not work.

Assuming the column really is binary (only 0 and 1), its mean is the fraction of 1s, so all you need is:

def Is_data_imbalanced(df, objectivecolumn):
    return int(df[objectivecolumn].mean() * 100)
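
The key change is `df[objectivecolumn]`: bracket indexing looks the column up by the string stored in the variable, whereas `df.objectivecolumn` looks for a column literally named "objectivecolumn", which is what raised the AttributeError. If the column might not be strictly 0/1, a rough sketch along the same lines could use `value_counts(normalize=True)` instead of the mean. The function name `class_percentages` and the toy frame below are made up for illustration and are not part of the original kernel.

```python
import pandas as pd


def class_percentages(df, column):
    """Return the percentage share of each distinct value in df[column]."""
    # Bracket indexing resolves the column from the string held in `column`.
    # normalize=True makes value_counts return fractions instead of counts.
    return df[column].value_counts(normalize=True) * 100


# Tiny made-up frame standing in for the Titanic training data.
toy = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})

shares = class_percentages(toy, "Survived")
print(shares)
```

The result is a Series indexed by class value, so `shares.loc[1]` gives the share of rows where `Survived` is 1. Note that `int(...)` in the answer above truncates toward zero, while the `np.ceil` used in the question rounds up, so the two can differ by one percentage point.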
