简体   繁体   中英

Divide columns in pandas with different size dataframes

I'm facing a little challenge with pandas which i'm having a hard to time figure out.

I have created two dataframes with the following code

df5 = dataFrame[['PdDistrict' , 'Category']]
df5 = df5[pd.notnull(df5['PdDistrict'])]
df5 = df5.groupby(['Category', 'PdDistrict']).size()
df5 = df5.reset_index()
df5 = df5.sort_values(['PdDistrict',0], ascending=False)

df6 = df5.groupby('PdDistrict')[0].sum()
df6 = df6.reset_index()

This gives me two dataframes. df5 contains how many times a specific category occurs in a given district. eg

'Category'   'PdDistrict'  'count'
   Drugs       Bayview       200
   Theft       Bayview       200
   Gambling    Bayview       200
   Drugs       CENTRAL       300
   Theft       CENTRAL       300
   Gambling    CENTRAL       300

the df6 frame contains the total number of categories for a given PdDistrict.

This gives df6 the following look

'PdDistrict' 'total count'
  Bayview        600
  CENTRAL        900

Now what I want is that df5 gets to look like this eg:

'Category'   'PdDistrict'  'count'      'Average'
   Drugs       Bayview       200           0.33
   Theft       Bayview       200           0.33
   Gambling    Bayview       200           0.33
   Drugs       CENTRAL       200           0.22
   Theft       CENTRAL       200           0.22
   Gambling    CENTRAL       200           0.22

So it's basically taking count from df5 and dividing it by totalcount from df6, but for the same district. How can I do this?

res = df5.set_index('PdDistrict', append = False) / df6.set_index('PdDistrict', append = False)

The above gives me NaN in Category.

You could add the total count col to your first df and then you can perform the calculation:

In [45]:
df['total count'] = df['PdDistrict'].map(df1.set_index('PdDistrict')['total count'])
df

Out[45]:
   Category PdDistrict  count  total count
0     Drugs    Bayview    200          600
1     Theft    Bayview    200          600
2  Gambling    Bayview    200          600
3     Drugs    CENTRAL    300          900
4     Theft    CENTRAL    300          900
5  Gambling    CENTRAL    300          900

In [46]:
df['Average'] = df['count']/df['total count']
df

Out[46]:
   Category PdDistrict  count  total count   Average
0     Drugs    Bayview    200          600  0.333333
1     Theft    Bayview    200          600  0.333333
2  Gambling    Bayview    200          600  0.333333
3     Drugs    CENTRAL    300          900  0.333333
4     Theft    CENTRAL    300          900  0.333333
5  Gambling    CENTRAL    300          900  0.333333

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM