简体   繁体   中英

Conditional Sum of a column python pandas

I'm very new to python and pandas and was looking for some help. I'm working off of a CSV, and trying to use pandas to calculate totals for each name based on the value of the middle column. I want the sum of 'count' for each name for 'GEN' and 'NPR' added together. This is my dataset as a CSV:

StartingCSV.csv:

Name, Specialty, Count
Smith, GEN, 1
Smith, INT, 2
Smith, NPR, 5
Smith, PSC, 4
Zane, GEN, 3
Zane, PSC, 4
Zane, NPR, 4
Charles, NPR, 4
Charles, AUD, 4

Desired output:

Smith: 6
Zane: 7
Charles: 4

This is what I have so far:

import csv
import pandas as pd
df = pd.read_csv("StartingCSV.csv")
newdf = df.groupby("Name")
newdf.apply(lambda x: x[x['Specialty'] == 'NPR']['Count'].sum())

This is the output I get:

Smith: 5
Zane: 4
Charles: 4

This returns the number of NPR for each name, but I can't figure out a way to ADD the NPR values to the GEN values for each name, to create the desired output as listed above. Trying to add an "or" after 'NPR' in the final line returns an error like this:

newdf.apply(lambda x: x[x['Specialty'] == 'NPR' or 'GEN']['Count'].sum())

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help is appreciated! (sorry if I'm way off base or if this is unclear)

You can do it this way

df[df['Specialty'].isin(['NPR','GEN'])].groupby('Name').sum().reset_index()

with df[df['Specialty'].isin(['NPR','GEN'])] we are getting only those rows of the dataframe which have the value 'NPR' or 'GEN' in column 'Specialty'. After that it is the usual groupby and sum

Output

Name    Count
Charles   4
Smith     6
Zane      7

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM