
Function to calculate the percentage each value has in a pandas column

I am taking part in the Titanic Tutorials over at Kaggle to learn pandas/machine learning.

Here is my kernel: https://www.kaggle.com/trenzalore888/titanic/titanic-learning

I want to create a function that takes two arguments, a dataframe and a column name, and calculates the percentage of rows that fall in each class (assuming the column is binary, i.e. 0 or 1).

I can do this hard-coded, i.e. specifically for the Titanic set, but I want a function I can reuse in the future.

Here is my failed attempt:

traintotal=(len(train.index))
testtotal=(len(test.index))

def Is_data_imbalanced (df,objectivecolumn) :
    objectivecount= df.objectivecolumn[df.objectivecolumn > 0].sum()
    objectivecountpercentage=(objectivecount/traintotal)*100
    objectivecountrounded= np.ceil(objectivecountpercentage)
    return objectivecountrounded

Is_data_imbalanced(train,"Survived")

Unfortunately I get an attribute error:

AttributeError: 'DataFrame' object has no attribute 'objectivecolumn'

Below is the hardcoded version that works:

traintotal=(len(train.index))
print("there are", traintotal,"rows in the train data")

testtotal=(len(test.index))
print("there are {} rows in the test data".format(testtotal))

Survialcount= train.Survived[train.Survived > 0].sum()
Survialcountpercentage=(Survialcount/traintotal)*100
print(Survialcountpercentage)

survivalcountrounded= np.ceil(Survialcountpercentage)

print(" ",survivalcountrounded,"percent survived")

Does anyone know how I can get this to work? Passing train as df seems to work fine, but passing the column name as the second argument in place of .Survived does not.

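Attribute access (df.objectivecolumn) looks for a column literally named "objectivecolumn", which is why you get the AttributeError. Use bracket indexing, df[objectivecolumn], to look the column up by the string stored in the variable. Assuming the column really is binary (0 or 1), all you need is: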

def Is_data_imbalanced(df, objectivecolumn):
    return int(df[objectivecolumn].mean() * 100)
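Note that int() truncates, whereas the hard-coded version in the question rounds up with np.ceil. For comparison, here is a minimal sketch of two variations, assuming only pandas and numpy. The function names positive_percentage and value_percentages and the toy dataframe are hypothetical, made up for illustration; they are not from the question or the Titanic data. The first keeps the rounding-up behaviour and divides by the row count of the dataframe that was passed in rather than the global traintotal. The second uses value_counts(normalize=True), so it also covers columns with more than two classes.

```python
import numpy as np
import pandas as pd


def positive_percentage(df, column):
    """Percentage of rows where `column` is greater than 0, rounded up."""
    # Bracket indexing looks the column up by the string held in `column`.
    positive_count = (df[column] > 0).sum()
    # Divide by this frame's own row count, not a global such as traintotal.
    return np.ceil(positive_count / len(df) * 100)


def value_percentages(df, column):
    """Percentage of rows taken by each distinct value in `column`."""
    # normalize=True returns fractions that sum to 1.
    return df[column].value_counts(normalize=True) * 100


# Toy data standing in for the training set (illustrative only).
toy = pd.DataFrame({"Survived": [0, 1, 0]})

rounded_up = positive_percentage(toy, "Survived")
per_value = value_percentages(toy, "Survived")

print(rounded_up)
print(per_value)
```

With a binary 0/1 column, counting the positive rows gives the same number as summing them, which is what the original attempt did.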

