I'm trying to count the number of products in a dataframe that contain words from a wordlist, and then find the average price of those products. The attempt below:
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe[dframe['Product'].str.contains(word)]['Price']
    print(dframe[dframe['Product'].str.contains(word)]['Price'])
average_price = total_price / total_count
returns average_price as Series([], Name: Price, dtype: float64) and not a float value as expected.
What am I doing wrong?
Thanks!
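Your code adds a whole Series of matching prices to total_price on each iteration. You need the sum of the Price column per condition to get a scalar value: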
total_count, total_price = 0, 0
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe.loc[dframe['Product'].str.contains(word), 'Price'].sum()
average_price = total_price / total_count
Or cache the mask in a variable for better readability and performance:
total_count, total_price = 0, 0
for word in wordlist:
    mask = dframe.Product.str.contains(word, case=False)
    total_count += mask.sum()
    total_price += dframe.loc[mask, 'Price'].sum()
average_price = total_price / total_count
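As a self-contained sketch of the loop above, the following uses made-up products and words (they are not from the question) to show that filtering alone gives a Series, while calling sum() on it gives the single number the accumulator needs:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
dframe = pd.DataFrame({
    'Product': ['Apple pie', 'apple juice', 'Banana bread', 'Cherry cake'],
    'Price': [4.0, 2.0, 3.0, 5.0],
})
wordlist = ['apple', 'banana']

total_count, total_price = 0, 0
for word in wordlist:
    mask = dframe['Product'].str.contains(word, case=False)
    # Filtering without .sum() yields a Series, one entry per matching row.
    matched_prices = dframe.loc[mask, 'Price']
    # .sum() collapses the mask and the prices to single numbers.
    total_count += mask.sum()
    total_price += matched_prices.sum()

average_price = total_price / total_count
print(average_price)  # 3.0
```

Here 'apple' matches two rows (4.0 + 2.0) and 'banana' matches one (3.0), so the result is 9.0 / 3.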
The solution can be simplified with the regex word1|word2|word3, where | means or:
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
total_count = mask.sum()
total_price = dframe.loc[mask, 'Price'].sum()
average_price = total_price / total_count
Or take the mean of the filtered prices directly:
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
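One caveat with joining words into a regex: characters such as . or + in a word are treated as regex syntax. A minimal sketch, assuming the words should be matched literally, escapes each word with re.escape before joining. The data and wordlist here are hypothetical and chosen only to make the difference visible:

```python
import re
import pandas as pd

# Hypothetical data: the '.' in '1.5' would match any character if left unescaped.
dframe = pd.DataFrame({
    'Product': ['1.5L bottle', '125 pack', 'Bottle cap'],
    'Price': [2.0, 8.0, 1.0],
})
wordlist = ['1.5', 'cap']

# Unescaped pattern: '1.5' also matches '125'.
unescaped_mask = dframe['Product'].str.contains('|'.join(wordlist), case=False)
unescaped_count = unescaped_mask.sum()

# Escape each word so it is matched literally, then join with | (regex "or").
pattern = '|'.join(re.escape(word) for word in wordlist)
mask = dframe['Product'].str.contains(pattern, case=False)
total_count = mask.sum()
average_price = dframe.loc[mask, 'Price'].mean()

print(unescaped_count, total_count, average_price)  # 3 2 1.5
```

If no product matches at all, mean() on the empty selection returns NaN rather than raising, so it may be worth checking mask.any() first.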
Sample:
import pandas as pd

dframe = pd.DataFrame({
    'Product': ['a1','a2','a3','c1','c1','b','b2','c3','d2'],
    'Price': [1,3,5,6,3,2,3,5,2]
})
print(dframe)
   Price Product
0      1      a1
1      3      a2
2      5      a3
3      6      c1
4      3      c1
5      2       b
6      3      b2
7      5      c3
8      2      d2
wordlist = ['b','c']
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print (average_price)
3.8
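Note that the loop and the single regex are not always equivalent: the loop counts a product once for every word it contains, while the combined regex counts each product once. With the wordlist above no product contains both words, so both give 3.8. The sketch below uses a different, hypothetical wordlist in which 'c1' matches both words, to show the two approaches diverging:

```python
import pandas as pd

dframe = pd.DataFrame({
    'Product': ['a1', 'a2', 'a3', 'c1', 'c1', 'b', 'b2', 'c3', 'd2'],
    'Price': [1, 3, 5, 6, 3, 2, 3, 5, 2],
})
wordlist = ['c', '1']  # hypothetical list chosen so that 'c1' matches both words

# Loop version: a product is counted once per matching word.
loop_count, loop_price = 0, 0
for word in wordlist:
    word_mask = dframe['Product'].str.contains(word, case=False)
    loop_count += word_mask.sum()
    loop_price += dframe.loc[word_mask, 'Price'].sum()
loop_average = loop_price / loop_count

# Regex version: a product is counted once, however many words it contains.
mask = dframe['Product'].str.contains('|'.join(wordlist), case=False)
regex_count = mask.sum()
regex_average = dframe.loc[mask, 'Price'].mean()

print(loop_count, loop_average)    # 6 4.0
print(regex_count, regex_average)  # 4 3.75
```

Which of the two is right depends on whether a product matching several words should weigh more in the average.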
You can use the values attribute to work with a NumPy array instead of a Series, then call sum() to get a scalar:
total_count += dframe.Product.str.contains(word, case=False).values.sum()
total_price += dframe[dframe['Product'].str.contains(word, case=False)]['Price'].values.sum()