How to count the number of true values in a pandas DF column based on another column?
I have a df similar to this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'district':['A','B','C','D','B','B','A'], 'price': [10.0,20.0,50.0,27.0,21.0,19.0,12.0]})
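I am trying to write a function that shows which district offers the most accommodation opportunities within a given budget range.
If I call the following function: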
def district_recommender(df, min_budget = 17.0, max_budget = 30.0)
then, once called, the output should be:
"For your specified budget {} - {}, the B District has 3 opportunities within your criteria".format(min_budget, max_budget)
I have made many different attempts without much success. This is what I came up with:
def district_recommender(df, min_budget, max_budget):
    columns = df[['district', 'price']].copy()
    for i in columns.iloc[:, 0]:
        if columns.iloc[:, 1] >= min_budget and columns.iloc[:, 1] <= max_budget:
            return columns.iloc[:, 0].count().sum()
I think my problem is in counting the boolean values from the if condition, because the error message I get is:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
All suggestions are welcome! Thanks.
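On the error itself: `columns.iloc[:, 1] >= min_budget` compares the whole column at once and yields a boolean Series, and Python's `and` then tries to reduce that Series to a single True/False, which pandas refuses to do. A minimal sketch of the element-wise alternative, assuming the sample df from the question (the names `in_budget` and `n_matches` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'district': ['A', 'B', 'C', 'D', 'B', 'B', 'A'],
                   'price': [10.0, 20.0, 50.0, 27.0, 21.0, 19.0, 12.0]})

min_budget, max_budget = 17.0, 30.0

# `&` combines the two boolean Series row by row; `and` would ask each
# Series for one overall truth value and raise the ValueError shown above.
in_budget = (df['price'] >= min_budget) & (df['price'] <= max_budget)

# True counts as 1, so summing the mask counts the rows inside the budget.
n_matches = int(in_budget.sum())
print(n_matches)
```

The parentheses around each comparison matter, because `&` binds more tightly than `>=` and `<=`.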
One way to solve this is to filter on price and then take the value_counts of the resulting subset, which gives you the count for each district.
In [350]: df[df['price'].between(17, 30)]['district'].value_counts()
Out[350]:
B 3
D 1
Name: district, dtype: int64
You can then use the value counts to build your narrative output:
In [354]: vc = df[df['price'].between(17, 30)]['district'].value_counts()
In [355]: print(f"District {vc.idxmax()} has {vc.max()} options in your range.")
District B has 3 options in your range.
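Putting the two steps together in the shape the question asks for might look like the sketch below. It assumes the `district` and `price` column names from the sample df; the message returned when nothing matches is an assumption of this sketch, not something the question specifies.

```python
import pandas as pd

def district_recommender(df, min_budget=17.0, max_budget=30.0):
    # Keep only listings whose price is inside the budget (bounds inclusive).
    in_budget = df[df['price'].between(min_budget, max_budget)]

    # Assumed behaviour for the empty case; the question does not define it.
    if in_budget.empty:
        return (f"For your specified budget {min_budget} - {max_budget}, "
                f"no district has opportunities within your criteria.")

    # Count the remaining listings per district and report the largest count.
    vc = in_budget['district'].value_counts()
    return (f"For your specified budget {min_budget} - {max_budget}, "
            f"the {vc.idxmax()} District has {vc.max()} opportunities "
            f"within your criteria.")

df = pd.DataFrame({'district': ['A', 'B', 'C', 'D', 'B', 'B', 'A'],
                   'price': [10.0, 20.0, 50.0, 27.0, 21.0, 19.0, 12.0]})
print(district_recommender(df))
```

If two districts tie for the highest count, `idxmax()` reports only one of them, so a caller who cares about ties would need to inspect `vc` directly.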
You can iterate over the distinct district values and use Series.between to check, for each one, whether all of its prices are within the range:
def district_recommender(df, min_budget=17.0, max_budget=30.0):
    for district in df['district'].unique():
        values = df[df['district'] == district]['price']
        if values.between(min_budget, max_budget).all():
            return f"For your specified budget [{min_budget};{max_budget}], the {district} " \
                   f"District has {len(values)} opportunities"
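Note that the loop returns the first district whose prices all fall inside the range, which happens to be B for the sample data. If what is wanted is the number of in-range listings per district, even when some of a district's prices fall outside the budget, one possible variant is to sum the boolean mask per group. This is only a sketch; the helper name `count_in_budget_by_district` is hypothetical.

```python
import pandas as pd

def count_in_budget_by_district(df, min_budget=17.0, max_budget=30.0):
    # Boolean mask: True where the price is inside the budget (inclusive).
    in_budget = df['price'].between(min_budget, max_budget)
    # Cast to int so each True adds 1, then total the mask per district.
    return in_budget.astype(int).groupby(df['district']).sum()

df = pd.DataFrame({'district': ['A', 'B', 'C', 'D', 'B', 'B', 'A'],
                   'price': [10.0, 20.0, 50.0, 27.0, 21.0, 19.0, 12.0]})

counts = count_in_budget_by_district(df)
print(counts)
print(counts.idxmax(), counts.max())
```

Unlike the filter-then-`value_counts` approach, this keeps districts with zero matches in the result, which can be useful when reporting on every district.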