How to calculate conditional probability of values in dataframe pandas-python?

Question

I want to calculate the conditional probabilities of the ratings ('A', 'B', 'C') in the rating column.

    company     model    rating   type
0   ford       mustang     A      coupe
1   chevy      camaro      B      coupe
2   ford       fiesta      C      sedan
3   ford       focus       A      sedan
4   ford       taurus      B      sedan
5   toyota     camry       B      sedan

Output:

Prob(rating=A) = 0.333333 
Prob(rating=B) = 0.500000 
Prob(rating=C) = 0.166667 

Prob(type=coupe|rating=A) = 0.500000 
Prob(type=sedan|rating=A) = 0.500000 
Prob(type=coupe|rating=B) = 0.333333 
Prob(type=sedan|rating=B) = 0.666667 
Prob(type=coupe|rating=C) = 0.000000 
Prob(type=sedan|rating=C) = 1.000000

Any help is appreciated. Thanks!

Answer 1

You can use .groupby() and the built-in .div():

rating_probs = df.groupby('rating').size().div(len(df))

rating
A    0.333333
B    0.500000
C    0.166667

and the conditional probabilities:

df.groupby(['type', 'rating']).size().div(len(df)).div(rating_probs, axis=0, level='rating')

coupe  A         0.500000
       B         0.333333
sedan  A         0.500000
       B         0.666667
       C         1.000000
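
For reference, here is a self-contained sketch of the two steps above, with the question's table rebuilt by hand. The names joint_probs and cond_probs are illustrative and not part of the original answer. It spells out that the conditional probability is the joint probability P(type, rating) divided by the marginal P(rating).

```python
import pandas as pd

# The question's table, rebuilt by hand.
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Marginal P(rating): group sizes divided by the number of rows.
rating_probs = df.groupby('rating').size().div(len(df))

# Joint P(type, rating): size of each (type, rating) group over all rows.
joint_probs = df.groupby(['type', 'rating']).size().div(len(df))

# Conditional P(type | rating) = P(type, rating) / P(rating),
# broadcasting the marginal across the 'rating' index level.
cond_probs = joint_probs.div(rating_probs, level='rating')

print(rating_probs)
print(cond_probs)
```

Pairs that never occur in the data, such as (coupe, C), do not appear in the result at all rather than showing up as 0.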

Answer 2

You can use groupby:

In [2]: df = pd.DataFrame({'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
                     'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
                     'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
                     'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan']})

In [3]: df.groupby('rating').count()['model'] / len(df)
Out[3]:
rating
A    0.333333
B    0.500000
C    0.166667
Name: model, dtype: float64

In [4]: (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
Out[4]:
rating  type
A       coupe    0.500000
        sedan    0.500000
B       coupe    0.333333
        sedan    0.666667
C       sedan    1.000000
Name: model, dtype: float64
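
A possible variation on the same idea, not part of the original answer, is to let value_counts(normalize=True) do the division. This sketch assumes the same six-row table and avoids picking an arbitrary column such as 'model' after count().

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# P(rating): relative frequency of each rating, sorted by label.
rating_probs = df['rating'].value_counts(normalize=True).sort_index()

# P(type | rating): relative frequency of each type within a rating group.
cond_probs = df.groupby('rating')['type'].value_counts(normalize=True)

print(rating_probs)
print(cond_probs)
```

As with the count-based version, only (rating, type) pairs that actually occur are listed.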

Answer 3

You need to add a reindex to fill in 0 for the missing pairs:

mux = pd.MultiIndex.from_product([df['rating'].unique(), df['type'].unique()])
s = (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
s = s.reindex(mux, fill_value=0)
print(s)
A  coupe    0.500000
   sedan    0.500000
B  coupe    0.333333
   sedan    0.666667
C  coupe    0.000000
   sedan    1.000000
Name: model, dtype: float64

Another solution, thanks to Zero:

s.unstack(fill_value=0).stack()
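
If a wide lookup table is acceptable, the unstack step alone may be enough. The sketch below is an illustration under that assumption: it rebuilds the conditional probabilities with an explicit level-wise division (a substitute for the DataFrame division above) and stops after unstack, so ratings are rows and types are columns.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Count rows per (rating, type) pair and per rating.
counts = df.groupby(['rating', 'type']).size()
totals = df.groupby('rating').size()

# P(type | rating), only for pairs that occur in the data.
s = counts.div(totals, level='rating')

# Pivot 'type' into columns; pairs that never occur become 0.
table = s.unstack(fill_value=0)

print(table)
print(table.loc['C', 'coupe'])  # probability for the missing (C, coupe) pair
```

Each row of the table sums to 1, which is a quick sanity check that it holds P(type | rating).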

Answer 4

pd.crosstab(df.type, df.rating, margins=True, normalize="index")

rating         A    B         C
type
coupe   0.500000  0.5  0.000000
sedan   0.250000  0.5  0.250000
All     0.333333  0.5  0.166667

Here the All row gives you the probabilities for A, B, and C. Now for the conditional probabilities:

pd.crosstab(df.type, df.rating, margins=True, normalize="columns")

rating    A         B    C       All
type
coupe   0.5  0.333333  0.0  0.333333
sedan   0.5  0.666667  1.0  0.666667

Here the table holds the conditional probabilities. For example, the probability that the type is coupe given an A rating is 0.5, found in row coupe and column A: Prob(type=coupe|rating=A) = 0.5
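
As a small check of that reading, the sketch below (an illustration, assuming the same six-row table and leaving out margins) looks up one entry and confirms that every rating column sums to 1, as a conditional distribution over type should.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Normalizing over columns makes each rating column a distribution over type,
# i.e. P(type | rating).
cond = pd.crosstab(df['type'], df['rating'], normalize='columns')

# Row 'coupe', column 'A' is Prob(type=coupe|rating=A).
p_coupe_given_a = cond.loc['coupe', 'A']

# Every column should sum to 1.
col_sums = cond.sum(axis=0)

print(p_coupe_given_a)
print(col_sums)
```

Bracket access (df['type']) is used here instead of attribute access so the example does not depend on column names being valid attributes.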

Answer 5

First, convert the data into a pandas DataFrame. By doing so, you can take advantage of pandas' groupby methods.

import pandas as pd

collection = {"company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
              "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
              "rating": ["A", "B", "C", "A", "B", "B"],
              "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"]}

df = pd.DataFrame(collection)

Then, group by the conditioning event (i.e. rating).

df_s = df.groupby('rating')['type'].value_counts() / df.groupby('rating')['type'].count()
df_f = df_s.reset_index(name='cpt')
df_f.head()  # your conditional probability table
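
To print the result in the question's own "Prob(...) = ..." format, including pairs that never occur, one option is a dictionary lookup with a default of 0. This is a sketch, not part of the original answer; the names lookup and lines are illustrative, and the division is written with an explicit level argument.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# P(rating), sorted by rating label.
rating_probs = df['rating'].value_counts(normalize=True).sort_index()

# P(type | rating) for the pairs that occur, keyed by (rating, type).
counts = df.groupby(['rating', 'type']).size()
totals = df.groupby('rating').size()
lookup = counts.div(totals, level='rating').to_dict()

lines = [f"Prob(rating={r}) = {p:.6f}" for r, p in rating_probs.items()]
for r in sorted(df['rating'].unique()):
    for t in sorted(df['type'].unique()):
        # Pairs absent from the data get probability 0.
        p = lookup.get((r, t), 0.0)
        lines.append(f"Prob(type={t}|rating={r}) = {p:.6f}")

print("\n".join(lines))
```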

solution1
15 2016-06-14 17:19:39

solution2
4 2016-06-14 17:16:48

solution3
3 2017-10-01 18:03:56

solution4
0 2020-12-30 20:28:27

solution5
-1 2019-04-15 01:56:02