简体   繁体   中英

How to count occurrences based on multiple criteria in a DataFrame

I'm trying to figure out how to count a number of occurrences in the DataFrame using multiple criteria. In this particular example, I'd like to know the number of female passengers in Pclass 3.

    PassengerId Pclass  Sex Age SibSp   Parch   Ticket  Fare    Cabin   Embarked
0       892 3   male    34.5    0   0   330911  7.8292  NaN Q
1       893 3   female  47.0    1   0   363272  7.0000  NaN S
2       894 2   male    62.0    0   0   240276  9.6875  NaN Q
3       895 3   male    27.0    0   0   315154  8.6625  NaN S
4       896 3   female  22.0    1   1   3101298 12.2875 NaN S

Here's my few failed attempts:

    len(test[test["Sex"] == "female", test["Pclass"] == 3])
    sum(test.Pclass == 3 & test.Sex == "female")
    test.[test["Sex"] == "female", test["Pclass"] == 3].count()

None of them seem to be working. At the end I've created my own function, but there must be a simpler way to calculate that.

def countif(sex, pclass):
    x = 0
    for i in range(0,len(test)):
        s = test.iloc[i]['Sex']
        c = test.iloc[i]['Pclass']
        if s == sex and c == pclass:
                x = x + 1
    return x

Thank you in advance

There are a few ways to do this:

test = pd.DataFrame({'PassengerId': {0: 892, 1: 893, 2: 894, 3: 895, 4: 896}, 
      'Pclass': {0: 3, 1: 3, 2: 2, 3: 3, 4: 3}, 
      'Sex': {0: 'male', 1: 'female', 2: 'male', 3: 'male', 4: 'female'}, 
      'Age': {0: 34.5, 1: 47.0, 2: 62.0, 3: 27.0, 4: 22.0}, 
      'SibSp': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1}, 
      'Parch': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}, 
      'Ticket': {0: 330911, 1: 363272, 2: 240276, 3: 315154, 4: 3101298}, 
      'Fare': {0: 7.8292, 1: 7.0, 2: 9.6875, 3: 8.6625, 4: 12.2875}, 
      'Cabin': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan}, 
      'Embarked': {0: 'Q', 1: 'S', 2: 'Q', 3: 'S', 4: 'S'}})

You need to put the boolean in round brackets and join with an &

sum((test.Pclass == 3) & (test.Sex == "female"))
len(test[(test.Pclass == 3) & (test.Sex == "female")])
test[(test["Sex"] == "female") & (test["Pclass"] == 3)].shape[0]

Or you can do:

tab = pd.crosstab(df.Pclass,df.Sex)

Sex female  male
Pclass      
2   0   1
3   2   2

tab.iloc[tab.index==3]['female']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM