Check one column with strings and get sum of values from second column (pythonic way)

Given this data frame:

import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}

ex_df = pd.DataFrame(data=d)


To get the SIZE sum only for rows where SITE == 'AB', one can slice out the AB rows with AB_df = ex_df[ex_df.SITE == 'AB'] and then call AB_df.SIZE.sum(), which gives 8.

However, I have a similar data frame with 10,000+ rows, 12 columns, and over 40 unique SITE strings.

Q1: How can you get the SIZE sum for each SITE without writing 40 copies of the code above (changing only the SITE name each time)?

Q2: How can you add more conditions, for example requiring a match on two columns (SITE and MARK), and then get the SIZE sum, again without writing 40 lines of repetitive code?

I'd like to save the result in a list of the sums, a dictionary mapping each site to its sum ({'AB': 8, 'ON': 5, ...}), or even a new data frame with that information.

I've tried iterating through the data frame column using a list of the 40 unique sites, but without success because of length differences, etc.
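The per-site iteration described above can work if you loop over the unique SITE values and slice the frame once per site, instead of comparing the column against a list of a different length. A minimal sketch using the example frame from the question; the name site_sums is only illustrative:

```python
import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}
ex_df = pd.DataFrame(data=d)

# One boolean slice per unique site, so the sites never have to be listed by hand;
# int() converts the numpy scalar to a plain Python int
site_sums = {site: int(ex_df.loc[ex_df['SITE'] == site, 'SIZE'].sum())
             for site in ex_df['SITE'].unique()}
print(site_sums)  # {'AB': 8, 'ON': 5, 'YO': 5}
```

This scans the whole frame once per site, so with 40+ sites a single groupby is usually the faster and shorter option.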

Ideally, I'm looking for a pythonic solution. Thanks!

Q1 can be accomplished with a groupby in pandas:

grouped_df = ex_df.groupby('SITE').agg({'SIZE': 'sum'})
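A grouped sum can be turned into any of the three forms the question asks for (list, dict, or data frame). A sketch on the question's example frame; selecting ['SIZE'] after the groupby gives a Series indexed by SITE, and to_dict, tolist and reset_index are standard pandas methods:

```python
import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}
ex_df = pd.DataFrame(data=d)

# Series indexed by SITE (groupby sorts the keys by default)
site_sums = ex_df.groupby('SITE')['SIZE'].sum()

as_dict = site_sums.to_dict()       # {'AB': 8, 'ON': 5, 'YO': 5}
as_list = site_sums.tolist()        # [8, 5, 5], in sorted SITE order
as_frame = site_sums.reset_index()  # data frame with columns SITE and SIZE
```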

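To accomplish Q2, you can write a custom filter function that builds a boolean mask over the whole frame, keep only the matching rows, and then group them. (Passing such a function to .agg does not work as intended, because .agg applies a custom function to each column separately, so df['SITE'] is not available inside it.) Something like:

import pandas as pd

def my_filter(df: pd.DataFrame) -> pd.Series:
    # Filters can be modified as needed.
    # Parentheses are required because & binds more tightly than ==.
    return df['SITE'].str.startswith('A') & (df['MARK'] == 'tt')

# Keep only the matching rows, then sum SIZE per SITE
grouped_df = ex_df[my_filter(ex_df)].groupby('SITE')['SIZE'].sum()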
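One thing to watch with a conditional sum: if non-matching rows are removed before grouping, sites with no matching rows disappear from the result. A possible variation, assuming you want every SITE listed (with 0 when nothing matches), is to zero out the non-matching SIZE values with Series.where and group afterwards. The startswith('A') and 'tt' conditions are just the example filters from this answer:

```python
import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}
ex_df = pd.DataFrame(data=d)

# Example conditions on two columns; each comparison is parenthesised
# because & binds more tightly than ==
mask = ex_df['SITE'].str.startswith('A') & (ex_df['MARK'] == 'tt')

# Non-matching rows contribute 0 instead of being dropped,
# so every SITE still appears in the result
cond_sums = ex_df['SIZE'].where(mask, 0).groupby(ex_df['SITE']).sum()
print(cond_sums.to_dict())  # {'AB': 4, 'ON': 0, 'YO': 0}
```

More conditions can be combined into the mask with & (and) or | (or) without changing the grouping step.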

However, if your goal for Q2 is simply to group the rows by both SITE and MARK, you can do:

grouped_df = ex_df.groupby(['SITE', 'MARK']).agg({'SIZE': 'sum'})

Then you don't have to worry about writing a custom filtering function.
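Grouping by two columns gives a result indexed by (SITE, MARK) pairs. A sketch of two ways to reshape it on the question's example frame: a dict keyed by tuples, or a wide table via unstack with one row per SITE and one column per MARK (fill_value=0 fills pairs that never occur):

```python
import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}
ex_df = pd.DataFrame(data=d)

# Series with a two-level index: (SITE, MARK)
pair_sums = ex_df.groupby(['SITE', 'MARK'])['SIZE'].sum()

# Dict keyed by (site, mark) tuples
pair_dict = pair_sums.to_dict()
# {('AB', 'ss'): 4, ('AB', 'tt'): 4, ('ON', 'ss'): 5, ('YO', 'ss'): 3, ('YO', 'tt'): 2}

# Wide table: rows are sites, columns are marks, missing pairs become 0
table = pair_sums.unstack(fill_value=0)
```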

If I understand correctly, this should give you, on every row, the SIZE sum of that row's SITE:

ex_df['SIZE_SUM'] = ex_df.groupby('SITE')['SIZE'].transform('sum')
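As a quick check of what transform does on the question's example frame: unlike agg or sum, it returns one value per original row, so each row gets the total for its own SITE. The column name SIZE_SUM is only an illustrative choice, and passing the string 'sum' rather than the built-in sum is the commonly recommended spelling that avoids a deprecation warning in recent pandas versions:

```python
import pandas as pd

d = {'SITE': ['AB', 'ON', 'YO', 'YO', 'AB'],
     'MARK': ['ss', 'ss', 'tt', 'ss', 'tt'],
     'SIZE': [4, 5, 2, 3, 4]}
ex_df = pd.DataFrame(data=d)

# transform keeps the original row count; each row receives its group's total
ex_df['SIZE_SUM'] = ex_df.groupby('SITE')['SIZE'].transform('sum')
print(ex_df['SIZE_SUM'].tolist())  # [8, 5, 5, 5, 8]
```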

If not, please clarify your question so I can help further.
