How to sum a column where row equals a certain value

Question

I have a dataframe with several columns. The relevant ones are 'Search term' and 'Clicks'. 'Search term' might be something like "widget maker" or "widget app". Because there will undoubtedly be multiple entries for each, I want to sum the Clicks for all rows that start with "widget". I've been tinkering with it for hours and this is as close as I've come but it still doesn't work:

import pandas as pd

df = pd.read_csv('/Users/me/widgets.csv', encoding='utf8')

column_names = df.columns.values

df.loc[df['Search term'] == df['Search term'].str.extract(r'^(widget.*)'), ['Clicks']].sum()

It runs very slowly and returns:

Users/me/opt/anaconda3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)

[insert insanely long Traceback]

TypeError: 'int' object is not iterable

Any ideas what I'm doing wrong here?

Answer 1

If I understand correctly, you just want a single scalar total, so you can use a boolean mask.

import pandas as pd

data = {
    'Clicks':[2,3,4,5,6,7,8,9,10,2,3,4,5],
    'Search term':['t1','t2','t3','t4','widget','widget','t7','t8','t9','t10','widget','t12','t13']
}

df = pd.DataFrame(data)

print(df)

    Clicks Search term
0        2          t1
1        3          t2
2        4          t3
3        5          t4
4        6      widget
5        7      widget
6        8          t7
7        9          t8
8       10          t9
9        2         t10
10       3      widget
11       4         t12
12       5         t13

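# Boolean mask: True for rows whose search term contains 'widget'
mask = df['Search term'].astype(str).str.contains('widget')

# Sum Clicks over the matching rows only
the_sum = df.loc[mask, 'Clicks'].sum()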

print(the_sum)

16
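
Note that str.contains matches 'widget' anywhere in the term, while the question asks for terms that start with "widget". A minimal sketch of that variant, assuming made-up sample rows shaped like the ones the question describes (the values and the missing entry are hypothetical, not taken from the real CSV):

```python
import pandas as pd

# Hypothetical sample data modeled on the question's description
df = pd.DataFrame({
    'Search term': ['widget maker', 'widget app', 'best widget', 'gadget', None],
    'Clicks': [5, 7, 11, 2, 1],
})

# True only where the term begins with 'widget';
# na=False makes missing search terms count as non-matches
starts_with_widget = df['Search term'].str.startswith('widget', na=False)

# Sum Clicks over the matching rows only
widget_clicks = df.loc[starts_with_widget, 'Clicks'].sum()
print(widget_clicks)
```

Here 'best widget' is excluded because it only contains the word, so the total covers just the first two rows. Passing na=False is one way to handle missing values without the astype(str) cast.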

solution1
0 2020-05-05 18:09:15