
Iterate through rows and columns, python

Could you please help me work out this calculation?

I have the following table:

[Image: input table (not available)]

What I need to do is calculate the expected frequency of each cell as (row total * column total) / grand total.
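In symbols, with R_i the total of row i, C_j the total of column j and N the grand total, the expected frequency of cell (i, j) under independence is given below. The worked value uses totals computed from the observed counts shown in the answer's output further down (30 + 17 + 8 = 55, 30 + 140 = 170, 55 + 325 = 380):

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
E_{0,\text{satisfied}} = \frac{55 \cdot 170}{380} \approx 24.605
```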

The expected result: [Image: expected result table (not available)]

I assume that I need to iterate through rows and columns. I have tried to do it with:

for i, row in df_dropped.iterrows():
    for j, column in row.iteritems():
        data[row][column] = df_dropped.iloc[i, 3] * df_dropped.iloc[2, j]

This raises the error: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

What am I doing wrong?
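A likely cause: `row.iteritems()` yields column *names* (such as 'satisfied'), so `j` is a string and `.iloc[2, j]` rejects it, because `.iloc` only accepts integer positions. (`Series.iteritems` was also removed in pandas 2.0 in favour of `Series.items`.) Below is a minimal sketch of the same loop using integer positions, assuming, as the question's `iloc[i, 3]` and `iloc[2, j]` suggest, that the totals sit in the last column and the last row. The table itself is a hypothetical reconstruction from the numbers in the answer's output:

```python
import pandas as pd

# Hypothetical reconstruction of the question's table: observed counts plus a
# 'row_sum' column and a 'col_sum' row (values inferred from the answer's output).
df_dropped = pd.DataFrame(
    {'satisfied': [30, 140, 170],
     'neutral': [17, 127, 144],
     'dissatisfied': [8, 58, 66],
     'row_sum': [55, 325, 380]},
    index=[0, 1, 'col_sum'],
)

n_rows, n_cols = df_dropped.shape
grand_total = df_dropped.iloc[-1, -1]

# Empty float table for the expected counts (totals row/column excluded)
expected = pd.DataFrame(index=df_dropped.index[:-1],
                        columns=df_dropped.columns[:-1],
                        dtype=float)

# Loop over integer *positions* (not labels), which is what .iloc expects
for i in range(n_rows - 1):
    for j in range(n_cols - 1):
        row_total = df_dropped.iloc[i, -1]   # last column holds the row totals
        col_total = df_dropped.iloc[-1, j]   # last row holds the column totals
        expected.iloc[i, j] = row_total * col_total / grand_total

print(expected)
```

The loop is only meant to pinpoint the indexing mistake; the vectorised approaches in the answers avoid the Python-level loop entirely.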

Use numpy.outer to take the outer product of the last column (row totals) and the last row (column totals), then divide the resulting NumPy array by the grand total, selected as a scalar with loc:

import numpy as np
import pandas as pd
t = df.loc['col_sum', 'row_sum']  # grand total
arr = np.outer(df['row_sum'], df.loc['col_sum']) / t  # R_i * C_j / N for every cell
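To see what `np.outer` produces here, a small sketch on plain arrays. The totals are computed from the counts in the output below and are an assumption about the original table. Because both inputs still contain the grand total in their last position, the last row and column of the result simply reproduce the totals (for example 55 * 380 / 380 = 55), which is why they are sliced off in the next step:

```python
import numpy as np

# Row totals (last column of the table, grand total in the last position)
row_sum = np.array([55, 325, 380])
# Column totals (last row of the table, grand total in the last position)
col_sum = np.array([170, 144, 66, 380])

# np.outer(a, b)[i, j] == a[i] * b[j], so dividing by N gives R_i * C_j / N
arr = np.outer(row_sum, col_sum) / 380
print(arr.shape)  # (3, 4)

# Only the top-left block holds expected counts; the margins echo the totals
print(arr[:-1, :-1])
```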

Then create a DataFrame with the constructor, slicing off the last row and column (the totals):

df1 = pd.DataFrame(arr[:-1, :-1], 
                   columns=df.columns[:-1],
                   index=df.index[:-1]).add_prefix('exp_')
print (df1)
   exp_satisfied  exp_neutral  exp_dissatisfied
0      24.605263    20.842105          9.552632
1     145.394737   123.157895         56.447368

Build the new column names, pairing each original column with its expected counterpart:

cols = [item for x in df.columns[:-1] for item in (x, 'exp_' + x)]
print (cols)
['satisfied', 'exp_satisfied', 'neutral', 'exp_neutral', 'dissatisfied', 'exp_dissatisfied']

Join both together with concat, and use reindex to get the expected column order:

df = pd.concat([df.iloc[:-1, :-1], df1], axis=1).reindex(columns=cols)
print (df)
   satisfied  exp_satisfied  neutral  exp_neutral  dissatisfied  \
0         30      24.605263       17    20.842105             8   
1        140     145.394737      127   123.157895            58   

   exp_dissatisfied  
0          9.552632  
1         56.447368  
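Since the input table is only available as an image, here is the same pipeline as one self-contained sketch. The input DataFrame is a hypothetical reconstruction from the numbers in the output above (observed counts plus a 'row_sum' column and a 'col_sum' row), and the result is stored in `out` rather than overwriting `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's table (values inferred from
# the output above): observed counts plus a 'row_sum' column and a 'col_sum' row.
df = pd.DataFrame(
    {'satisfied': [30, 140, 170],
     'neutral': [17, 127, 144],
     'dissatisfied': [8, 58, 66],
     'row_sum': [55, 325, 380]},
    index=[0, 1, 'col_sum'],
)

# Expected counts: outer product of the totals divided by the grand total
t = df.loc['col_sum', 'row_sum']
arr = np.outer(df['row_sum'], df.loc['col_sum']) / t
df1 = pd.DataFrame(arr[:-1, :-1],
                   columns=df.columns[:-1],
                   index=df.index[:-1]).add_prefix('exp_')

# Interleave each observed column with its expected counterpart
cols = [item for x in df.columns[:-1] for item in (x, 'exp_' + x)]
out = pd.concat([df.iloc[:-1, :-1], df1], axis=1).reindex(columns=cols)
print(out)
```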

Jezrael gave a great answer that calculates the expected frequencies with numpy and pandas. You can also use the Python statistics library statsmodels to calculate these kinds of statistics.

For example, to calculate a table of expected frequencies, you could do:

import statsmodels.api as sm
expected_values = sm.stats.Table(df).fittedvalues
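A self-contained sketch of the statsmodels route. It imports `Table` directly from `statsmodels.stats.contingency_tables` (the class that `sm.stats.Table` refers to) and, as an assumption about what `df` should contain, passes only the observed counts, since `Table` computes the margins itself. The counts are inferred from the first answer's output:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.contingency_tables import Table

# Observed counts only (no totals row or column); values inferred from
# the first answer's output, so treat them as illustrative.
observed = pd.DataFrame(
    {'satisfied': [30, 140],
     'neutral': [17, 127],
     'dissatisfied': [8, 58]},
)

# fittedvalues: expected cell counts assuming rows and columns are independent
expected_values = Table(observed).fittedvalues
print(np.asarray(expected_values))
```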

More info: the statsmodels documentation on contingency tables.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 