
How to calculate conditional probabilities of values in a pandas DataFrame (Python)?

I want to calculate the probabilities of the ratings ('A', 'B', 'C') in the rating column, and the conditional probabilities of each type given the rating.

    company     model    rating   type
0   ford       mustang     A      coupe
1   chevy      camaro      B      coupe
2   ford       fiesta      C      sedan
3   ford       focus       A      sedan
4   ford       taurus      B      sedan
5   toyota     camry       B      sedan

Output:

Prob(rating=A) = 0.333333 
Prob(rating=B) = 0.500000 
Prob(rating=C) = 0.166667 

Prob(type=coupe|rating=A) = 0.500000 
Prob(type=sedan|rating=A) = 0.500000 
Prob(type=coupe|rating=B) = 0.333333 
Prob(type=sedan|rating=B) = 0.666667 
Prob(type=coupe|rating=C) = 0.000000 
Prob(type=sedan|rating=C) = 1.000000 

Any help is appreciated. Thanks!
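For reference, every number in the expected output follows from the definition of conditional probability. Writing n(r) for the number of rows with rating r, n(t, r) for the number of rows with type t and rating r, and N for the total number of rows (notation introduced here only for illustration), the 1/N factors cancel, so each value is just a ratio of row counts:

```latex
P(\text{type}=t \mid \text{rating}=r)
  = \frac{P(\text{type}=t,\ \text{rating}=r)}{P(\text{rating}=r)}
  = \frac{n(t, r)/N}{n(r)/N}
  = \frac{n(t, r)}{n(r)}

% Worked examples from the table (N = 6 rows):
P(\text{rating}=A) = \frac{2}{6} \approx 0.333333, \qquad
P(\text{type}=\text{coupe} \mid \text{rating}=A) = \frac{1}{2} = 0.5, \qquad
P(\text{type}=\text{sedan} \mid \text{rating}=B) = \frac{2}{3} \approx 0.666667
```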

You can use .groupby() and the built-in .div():

rating_probs = df.groupby('rating').size().div(len(df))

rating
A    0.333333
B    0.500000
C    0.166667
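As a side note, the same marginal probabilities can be obtained with `value_counts(normalize=True)`. A minimal sketch, rebuilding the question's data so it runs on its own:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# normalize=True divides each count by the total, giving P(rating=r);
# sort_index() restores A, B, C order (value_counts sorts by frequency).
rating_probs = df["rating"].value_counts(normalize=True).sort_index()
print(rating_probs)
```

One difference worth knowing: with the default dropna=True, value_counts normalizes over the non-missing ratings only, whereas dividing by len(df) counts every row, so the two results can differ if the rating column contains NaN.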

And the conditional probabilities:

df.groupby(['type', 'rating']).size().div(len(df)).div(rating_probs, axis=0, level='rating')

coupe  A         0.500000
       B         0.333333
sedan  A         0.500000
       B         0.666667
       C         1.000000
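Since both P(type, rating) and P(rating) are divided by len(df), that factor cancels, so the raw counts can also be divided directly. A sketch using the same `level='rating'` alignment as the answer above:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# Joint counts n(type, rating) and marginal counts n(rating)
joint = df.groupby(["type", "rating"]).size()
marginal = df.groupby("rating").size()

# level="rating" broadcasts the single-level marginal over the
# "rating" level of the joint MultiIndex.
cond = joint.div(marginal, level="rating")
print(cond)
```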

You can use groupby:

In [2]: df = pd.DataFrame({'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
                     'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
                     'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
                     'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan']})

In [3]: df.groupby('rating').count()['model'] / len(df)
Out[3]:
rating
A    0.333333
B    0.500000
C    0.166667
Name: model, dtype: float64

In [4]: (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
Out[4]:
rating  type
A       coupe    0.500000
        sedan    0.500000
B       coupe    0.333333
        sedan    0.666667
C       sedan    1.000000
Name: model, dtype: float64
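A variant of the same idea, in case the model column could contain missing values: `count()` skips NaN in each column, while `size()` counts rows. This sketch (an alternative, not the answer's exact code) normalizes the pair counts by their per-rating totals with `transform('sum')`:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# Number of rows for each observed (rating, type) pair
counts = df.groupby(["rating", "type"]).size()

# transform("sum") returns each pair's rating-group total, aligned with
# `counts`, so a plain division yields P(type | rating).
cond = counts / counts.groupby(level="rating").transform("sum")
print(cond)
```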

You need to add a reindex to get 0 values for the missing pairs:

mux = pd.MultiIndex.from_product([df['rating'].unique(), df['type'].unique()])
s = (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
s = s.reindex(mux, fill_value=0)
print(s)
A  coupe    0.500000
   sedan    0.500000
B  coupe    0.333333
   sedan    0.666667
C  coupe    0.000000
   sedan    1.000000
Name: model, dtype: float64

Another solution (thanks, Zero):

s.unstack(fill_value=0).stack()
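If you want the result printed in exactly the format of the question, zero-filled pairs included, something like the following should work. It is a sketch that computes the probabilities with `size()` instead of `count()` and walks the unstacked table row by row; the f-string layout is an assumption modeled on the question's expected output:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# P(type | rating) for the observed pairs
counts = df.groupby(["rating", "type"]).size()
s = counts / counts.groupby(level="rating").transform("sum")

# Rows = rating, columns = type; the unobserved (C, coupe) cell becomes 0
table = s.unstack(fill_value=0)

lines = [
    f"Prob(type={t}|rating={r}) = {p:.6f}"
    for r, row in table.iterrows()
    for t, p in row.items()
]
print("\n".join(lines))
```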

pd.crosstab(df.type, df.rating, margins=True, normalize="index")

rating         A    B         C
type
coupe   0.500000  0.5  0.000000
sedan   0.250000  0.5  0.250000
All     0.333333  0.5  0.166667
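To pull the marginal probabilities out of that table as a Series, you can select the All row by label. A small sketch:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# With normalize="index" and margins=True, the "All" row holds the overall
# share of each rating, i.e. P(rating=r).
table = pd.crosstab(df["type"], df["rating"], margins=True, normalize="index")
rating_probs = table.loc["All"]
print(rating_probs)
```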

Here the All row gives you the probabilities of A, B, and C. Now for the conditional probabilities:

pd.crosstab(df.type, df.rating, margins=True, normalize="columns")

rating    A         B    C       All
type
coupe   0.5  0.333333  0.0  0.333333
sedan   0.5  0.666667  1.0  0.666667

Here each cell holds the conditional probability of a type given a rating. For example, the probability that a car is a coupe given that it has an A rating is the value in row coupe, column A: Prob(type=coupe|rating=A) = 0.5
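Alternatively, putting rating (the conditioning variable) on the rows and normalizing by index gives one row per rating, each summing to 1, which matches the layout of the question's output. A sketch without margins:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# Each row r holds P(type=t | rating=r); crosstab fills unseen pairs with 0.
cpt = pd.crosstab(df["rating"], df["type"], normalize="index")
print(cpt)
```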

First, convert the data into a pandas DataFrame, so that you can take advantage of pandas' groupby methods.

import pandas as pd

collection = {"company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
              "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
              "rating": ["A", "B", "C", "A", "B", "B"],
              "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"]}

df = pd.DataFrame(collection)

Then, group by the conditioning event (i.e., rating):

df_s = df.groupby('rating')['type'].value_counts() / df.groupby('rating')['type'].count()
df_f = df_s.reset_index(name='cpt')
df_f.head()  # your conditional probability table
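The division above can also be written in one step with `value_counts(normalize=True)` on the grouped column. A sketch; note that it only lists (rating, type) pairs that actually occur, so Prob(type=coupe|rating=C) is absent rather than 0:

```python
import pandas as pd

# Same data as the table in the question
df = pd.DataFrame({
    "company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
    "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
    "rating": ["A", "B", "C", "A", "B", "B"],
    "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"],
})

# Share of each type within each rating group, i.e. P(type | rating)
cpt = df.groupby("rating")["type"].value_counts(normalize=True)

# Flatten into a table with columns rating, type, cpt
df_f = cpt.reset_index(name="cpt")
print(df_f)
```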
