
Python - Finding the average of a column given matched strings in another column

I'm trying to count the number of products in a dataframe that contain words from a wordlist, and then find the average price of those products. The attempt below

for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe[dframe['Product'].str.contains(word)]['Price']
    print(dframe[dframe['Product'].str.contains(word)]['Price'])
average_price = total_price / total_count

returns the average_price as Series([], Name: Price, dtype: float64) and not a float value as expected.

What am I doing wrong?

Thanks!

You need to sum the Price column for each condition so that a scalar value, not a Series, is added to the running total:

total_count, total_price = 0, 0
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe.loc[dframe['Product'].str.contains(word), 'Price'].sum()
average_price = total_price / total_count
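To see why the original attempt ends up with a Series, here is a minimal sketch. The sample data and the words 'apple' and 'kiwi' are made up for illustration and are not from the question. Adding a filtered column to a number keeps the result a Series, and a word that matches nothing gives an empty Series like the one in the question, while calling sum() first gives a single number.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the behaviour.
dframe = pd.DataFrame({
    'Product': ['red apple', 'green apple', 'banana'],
    'Price': [1.0, 3.0, 5.0],
})

mask = dframe['Product'].str.contains('apple', case=False)

# Adding a filtered column to a number keeps the result a Series.
as_series = 0 + dframe.loc[mask, 'Price']

# A word with no matches produces an empty Series, not 0.
no_match = 0 + dframe.loc[dframe['Product'].str.contains('kiwi'), 'Price']

# Reducing with sum() first gives a single number.
as_scalar = 0 + dframe.loc[mask, 'Price'].sum()

print(type(as_series).__name__)  # Series
print(no_match.empty)            # True
print(as_scalar)                 # 4.0
```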

Or cache the mask in a variable for better readability and performance:

total_count, total_price = 0, 0
for word in wordlist:
    mask = dframe.Product.str.contains(word, case=False)
    total_count += mask.sum()
    total_price += dframe.loc[mask, 'Price'].sum()

average_price = total_price / total_count

The solution can be simplified with the regex word1|word2|word3, where | means or:

mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
total_count = mask.sum()
total_price = dframe.loc[mask, 'Price'].sum()

average_price = total_price / total_count
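One thing to keep in mind is that the loop and the combined regex are not always equivalent. The loop counts a row once for every word it contains, while the combined pattern counts it once if it contains any of the words. A small sketch with made-up data (the product names and prices are assumptions for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical data: 'bc1' contains both words from the list.
dframe = pd.DataFrame({
    'Product': ['bc1', 'b2', 'c3', 'd4'],
    'Price': [10, 2, 4, 8],
})
wordlist = ['b', 'c']

# Loop: a row is counted once for every word it contains.
loop_count, loop_total = 0, 0
for word in wordlist:
    word_mask = dframe['Product'].str.contains(word, case=False)
    loop_count += word_mask.sum()
    loop_total += dframe.loc[word_mask, 'Price'].sum()

# Combined pattern: a row is counted once if it contains any word.
mask = dframe['Product'].str.contains('|'.join(wordlist), case=False)
regex_count = mask.sum()
regex_total = dframe.loc[mask, 'Price'].sum()

print(loop_count, loop_total)    # 4 26
print(regex_count, regex_total)  # 3 16
```

If each product should contribute to the average only once, the combined pattern is probably the version you want.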

Or compute the mean of the matched prices directly:

mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
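Because the joined string is treated as a regular expression, words containing characters such as + or ( can break the pattern or match something unintended. A possible safeguard, sketched here with made-up data (the product names and wordlist are assumptions for illustration), is to pass each word through re.escape before joining:

```python
import re

import pandas as pd

# Hypothetical data with regex metacharacters in the search words.
dframe = pd.DataFrame({
    'Product': ['C++ primer', 'C# guide', 'Go tour'],
    'Price': [30.0, 20.0, 10.0],
})
wordlist = ['c++', 'c#']

# re.escape makes every word match literally, so '+' is not read
# as a regex quantifier.
pattern = '|'.join(re.escape(word) for word in wordlist)

mask = dframe['Product'].str.contains(pattern, case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print(average_price)  # 25.0
```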

Sample:

import pandas as pd

dframe = pd.DataFrame({
    'Product': ['a1','a2','a3','c1','c1','b','b2','c3','d2'],
    'Price': [1,3,5,6,3,2,3,5,2]
})
print (dframe)
   Price Product
0      1      a1
1      3      a2
2      5      a3
3      6      c1
4      3      c1
5      2       b
6      3      b2
7      5      c3
8      2      d2

wordlist = ['b','c']
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print (average_price)
3.8

You can use the values attribute to work on the underlying NumPy array instead of a Series, then sum it to get a scalar:

total_count += dframe.Product.str.contains(word, case=False).values.sum()

total_price += dframe[dframe['Product'].str.contains(word, case=False)]['Price'].values.sum()
