
How to sum a column where row equals a certain value

I have a dataframe with several columns; the relevant ones are 'Search term' and 'Clicks'. A 'Search term' might be something like "widget maker" or "widget app". Since there will be multiple entries for each, I want to sum the Clicks for all rows whose term starts with "widget". I've been tinkering with it for hours, and this is the closest I've come, but it still doesn't work:

import pandas as pd

df = pd.read_csv('/Users/me/widgets.csv', encoding='utf8')

column_names = df.columns.values

df.loc[df['Search term'] == df['Search term'].str.extract(r'^(widget.*)'), ['Clicks']].sum()

It runs VERY slowly and returns:

Users/me/opt/anaconda3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)

[insert insanely long Traceback]

TypeError: 'int' object is not iterable

Any ideas what I'm doing wrong here?
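A likely source of the trouble: with its default expand=True, Series.str.extract returns a DataFrame (one column per capture group), not a Series, and rows that don't match come back as NaN rather than False. Comparing the 'Search term' Series against that DataFrame with == is therefore not the row-by-row True/False test that .loc needs. The sketch below only inspects what str.extract returns, on a small made-up frame (the sample values are illustrative, not the asker's CSV).

```python
import pandas as pd

# Illustrative sample data (not the asker's CSV)
df = pd.DataFrame({
    'Search term': ['widget maker', 'widget app', 'gadget app'],
    'Clicks': [5, 3, 7],
})

# Default expand=True: the result is a DataFrame with one column per capture group
extracted = df['Search term'].str.extract(r'^(widget.*)')
print(type(extracted).__name__)  # DataFrame

# Rows that don't match the pattern are NaN, not False
print(extracted[0].isna().tolist())  # [False, False, True]

# With expand=False and a single capture group, a Series is returned instead
extracted_series = df['Search term'].str.extract(r'^(widget.*)', expand=False)
print(type(extracted_series).__name__)  # Series
```

Either way, extract yields the matched text, not a boolean per row; a test such as .str.startswith('widget') produces the True/False values directly.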

If I understand correctly (IIUC) and you just want a single scalar total, you can build a boolean mask and pass it to .loc.

import pandas as pd

data = {
    'Clicks':[2,3,4,5,6,7,8,9,10,2,3,4,5],
    'Search term':['t1','t2','t3','t4','widget','widget','t7','t8','t9','t10','widget','t12','t13']
}

df = pd.DataFrame(data)

print(df)

   Clicks Search term
0       2          t1
1       3          t2
2       4          t3
3       5          t4
4       6      widget
5       7      widget
6       8          t7
7       9          t8
8      10          t9
9       2         t10
10      3      widget
11      4         t12
12      5         t13

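# Boolean Series: True wherever 'widget' appears in the search term
mask = df['Search term'].astype(str).str.contains('widget')

# Sum Clicks over the matching rows only
the_sum = df.loc[mask, 'Clicks'].sum()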

print(the_sum)

16
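Note that str.contains('widget') matches the word anywhere in the term, so a row like "best widget" would also be counted. Since the question asks for terms that start with "widget", str.startswith is the closer fit, and for the exact-equality case in the title a plain == comparison works. A minimal sketch comparing the three masks, using made-up sample terms:

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    'Search term': ['widget maker', 'widget app', 'best widget', 'widget', 'gadget'],
    'Clicks': [5, 3, 4, 2, 7],
})

terms = df['Search term'].astype(str)

# Prefix match: only terms that begin with "widget"
prefix_sum = df.loc[terms.str.startswith('widget'), 'Clicks'].sum()

# Substring match: "widget" anywhere in the term (literal match, no regex)
contains_sum = df.loc[terms.str.contains('widget', regex=False), 'Clicks'].sum()

# Exact match: the term must equal "widget"
exact_sum = df.loc[terms == 'widget', 'Clicks'].sum()

print(prefix_sum, contains_sum, exact_sum)  # 10 14 2
```

Each mask is evaluated in one vectorized pass over the column, which avoids the slow DataFrame-to-Series comparison in the original attempt.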


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM