Could you please help me with this calculation?
I have the following table:
What I need to do is to calculate the expected frequency as (row total * col total) / grand total
I assume that I need to iterate through rows and columns. I have tried to do it with:
for i, row in df_dropped.iterrows():
    for j, column in row.iteritems():
        data[row][column] = df_dropped.iloc[i, 3] * df_dropped.iloc[2, j]
The error appears: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
What am I doing wrong?
Use numpy.outer to compute the outer product of the last column and the last row, then divide by the grand total, a scalar selected with loc. The result is a NumPy array:
import numpy as np
import pandas as pd
t = df.loc['col_sum', 'row_sum']
arr = np.outer(df['row_sum'], df.loc['col_sum']) / t
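As a self-contained sketch of this step: the table from the question is not shown, so the frame below is an assumption, reconstructed from the counts printed later in this answer, with the margins named row_sum and col_sum as in the code above. For the first cell the formula gives 55 * 170 / 380, roughly 24.61.

```python
import numpy as np
import pandas as pd

# Assumed input: observed counts taken from the printed output in this answer.
observed = pd.DataFrame(
    {"satisfied": [30, 140], "neutral": [17, 127], "dissatisfied": [8, 58]}
)

# Append the margins so the frame matches the layout the answer expects.
df = observed.copy()
df["row_sum"] = df.sum(axis=1)
df.loc["col_sum"] = df.sum(axis=0)

t = df.loc["col_sum", "row_sum"]  # grand total
arr = np.outer(df["row_sum"], df.loc["col_sum"]) / t

# arr still includes the margin row and column; the inner block holds the
# expected frequencies for the observed cells.
print(arr[:-1, :-1].round(6))
```

The outer product multiplies every row total by every column total in one call, which is why no explicit loop is needed.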
Then create a DataFrame with the constructor, slicing to remove the last column and row:
df1 = pd.DataFrame(arr[:-1, :-1],
                   columns=df.columns[:-1],
                   index=df.index[:-1]).add_prefix('exp_')
print (df1)
   exp_satisfied  exp_neutral  exp_dissatisfied
0      24.605263    20.842105          9.552632
1     145.394737   123.157895         56.447368
Build the new column names, pairing each observed column with its expected counterpart:
cols = [item for x in df.columns[:-1] for item in (x, 'exp_' + x)]
print (cols)
['satisfied', 'exp_satisfied', 'neutral', 'exp_neutral', 'dissatisfied', 'exp_dissatisfied']
Join the two frames with concat, then use reindex to put the columns in the expected order:
df = pd.concat([df.iloc[:-1, :-1], df1], axis=1).reindex(columns=cols)
print (df)
   satisfied  exp_satisfied  neutral  exp_neutral  dissatisfied  \
0         30      24.605263       17    20.842105             8
1        140     145.394737      127   123.157895            58

   exp_dissatisfied
0          9.552632
1         56.447368
Jezrael gave a great answer that calculates the expected frequencies using numpy and pandas. You can also use the Python statistics library statsmodels to calculate this kind of statistic.
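For example, to calculate a table of expected frequencies, pass the observed counts only (without the row_sum column and col_sum row, since statsmodels computes the margins itself):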
import statsmodels.api as sm
expected_values = sm.stats.Table(df).fittedvalues
More info on: statsmodels contingency tables