简体   繁体   中英

Pandas division of two columns with groupby

This is obviously simple, but as a pandas newbe I'm getting stuck.

I have a CSV file that contains 3 columns, the State, bene_1_count, and bene_2_count.

I want to calculate the ratio of 'bene_1_count' and 'bene_2_count' in a given state.

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
           'bene_1_count': [np.random.randint(10000, 99999)
                     for _ in range(12)],
           'bene_2_count': [np.random.randint(10000, 99999)
                     for _ in range(12)]})

I am trying the following, but it is giving me an error: 'No objects to concatenate'

df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])

I am not able to figure out how to "reach up" to the state level of the groupby to take the ratio of columns.

I want the ratio of columns wrt a state, like I want my output as follows:

    State       ratio

    CA  
    WA  
    CO  
    AZ  

Alternatively, stated: You can create custom functions that accept a dataframe. The groupby will return sub-dataframes. You can then use the apply function to apply your custom function to each sub-dataframe.

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
           'bene_1_count': [np.random.randint(10000, 99999)
                     for _ in range(12)],
           'bene_2_count': [np.random.randint(10000, 99999)
                     for _ in range(12)]})

def divide_two_cols(df_sub):
    return df_sub['bene_1_count'].sum() / float(df_sub['bene_2_count'].sum())

df.groupby('state').apply(divide_two_cols)

Now say you want each row to be divided by the sum of each group (eg, the total sum of AZ) and also retain all the original columns. Just adjust the above function (change the calculation and return the whole sub dataframe):

def divide_two_cols(df_sub):
    df_sub['divs'] = df_sub['bene_1_count'] / float(df_sub['bene_2_count'].sum())
    return df_sub

df.groupby('state').apply(divide_two_cols)

I believe what you first need to do is sum the counts by state before finding the ratio. You can use apply to access the other columns in the df, and then store them in a dictionary to map to the corresponding state in the original dataframe.

import pandas as pd
import numpy as np
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
            'bene_1_count': [np.random.randint(10000, 99999)
                      for _ in range(12)],
            'bene_2_count': [np.random.randint(10000, 99999)
                      for _ in range(12)]})

ratios = df.groupby('state').apply(lambda x: x['bene_1_count'].sum() /
                                   x['bene_2_count'].sum().astype(float)).to_dict()

df['ratio'] = df['state'].map(ratios)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM