Count the frequency of float64 or int64 values with not equal (!=)
I know there are many posts on this, but they do not solve my problem.
My DataFrame looks like this:
import numpy as np
import pandas as pd
df1 = [{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator" : "k","Money" : 100},
{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator": "k","Money" : 200},
{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator" : "D", "Money" : 0}]
df1 = pd.DataFrame(df1)
df1
Account Name  Customer Number            Debit/Credit Indicator  Money
Sunarto       AFIMBN01000BCA17030001177  k                       100
Sunarto       AFIMBN01000BCA17030001177  k                       200
Sunarto       AFIMBN01000BCA17030001177  D                       0
Account Name object
Customer Number object
Debit/Credit Indicator object
Money int64 (or let's say float64)
I want to count the frequency based on "Money".
If Money is 0, the row should not be counted.
I have already tried df1["Money"].value_counts()
but it does not work:
df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].value_counts()
df1
Account Name  Customer Number            Debit/Credit Indicator  Money  Per item
Sunarto       AFIMBN01000BCA17030001177  k                       100    1
Sunarto       AFIMBN01000BCA17030001177  k                       200    NaN
Sunarto       AFIMBN01000BCA17030001177  D                       0      NaN
But what I expect is:
Account Name  Customer Number            Debit/Credit Indicator  Money  Per item
Sunarto       AFIMBN01000BCA17030001177  k                       100    1
Sunarto       AFIMBN01000BCA17030001177  k                       200    1
Sunarto       AFIMBN01000BCA17030001177  D                       0      0
That way, when I apply a pivot table, I get the number of items that have a non-zero "Money" value.
My expected result:
gdf = pd.pivot_table(df1, index = ["Account Name","Customer Number"],values = ["Money", "Per item"],aggfunc = np.sum)
gdf.head()
Money Per item
Account Name Customer Number
Sunarto AFIMBN01000BCA17030001177 300 2.0
You need to assign 1 to each row that meets the condition:
df1.loc[df1["Money"] != 0, "Per item"] = 1
Or convert the boolean mask to integers:
df1["Per item"] = (df1["Money"] != 0).astype(int)
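As a rough end-to-end sketch of the mask approach, the snippet below rebuilds a reduced version of the question's data (the "Debit/Credit Indicator" column is left out for brevity, and the variable name `df` is only for illustration), flags the non-zero rows, and sums the flags in a pivot table. It passes `aggfunc="sum"` as a string rather than `np.sum`, which is an assumption about preferred style, not something the original answer used.

```python
import pandas as pd

# Reduced copy of the question's data (illustrative only).
df = pd.DataFrame({
    "Account Name": ["Sunarto"] * 3,
    "Customer Number": ["AFIMBN01000BCA17030001177"] * 3,
    "Money": [100, 200, 0],
})

# Flag each row: 1 when Money is non-zero, 0 otherwise.
df["Per item"] = (df["Money"] != 0).astype(int)

# Summing the flags per customer counts the non-zero rows.
gdf = pd.pivot_table(
    df,
    index=["Account Name", "Customer Number"],
    values=["Money", "Per item"],
    aggfunc="sum",
)
print(gdf)
```

Because the flag is an integer column with no missing values, the summed "Per item" stays an integer (2) instead of the float 2.0 shown in the question.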
Another solution, without pivot_table:
gdf = (df1.groupby(["Account Name","Customer Number"])['Money']
.agg([('Money','sum'), ('Per item', lambda x: x.ne(0).sum())]))
print (gdf)
Money Per item
Account Name Customer Number
Sunarto AFIMBN01000BCA17030001177 300 2
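The same groupby result can probably also be written with named aggregation, assuming pandas 0.25 or later. This is a sketch on a reduced copy of the data, not the form used in the answer above; the dict is unpacked only because "Per item" contains a space and cannot be written as a keyword argument.

```python
import pandas as pd

# Reduced copy of the question's data (illustrative only).
df = pd.DataFrame({
    "Account Name": ["Sunarto"] * 3,
    "Customer Number": ["AFIMBN01000BCA17030001177"] * 3,
    "Money": [100, 200, 0],
})

# Named aggregation: output column name -> aggregation function.
gdf = (
    df.groupby(["Account Name", "Customer Number"])["Money"]
      .agg(**{"Money": "sum", "Per item": lambda s: s.ne(0).sum()})
)
print(gdf)
```

No helper column is added to the original frame here; the non-zero count is computed inside the aggregation.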
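EDIT:
May I know why my code does not work?
The problem is that Series.value_counts returns a Series of counts whose index is built from the values of the original Series, here 100, 200 (and 0). That index does not match the row labels of df1, so the assignment produces missing values. A possible fix is Series.map: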
df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].map(df1["Money"].value_counts())
print (df1)
Account Name Customer Number Debit/Credit Indicator Money \
0 Sunarto AFIMBN01000BCA17030001177 k 100
1 Sunarto AFIMBN01000BCA17030001177 k 200
2 Sunarto AFIMBN01000BCA17030001177 D 0
Per item
0 1.0
1 1.0
2 NaN
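The index mismatch described above can be made visible with a small sketch. It uses a bare Series named `money` instead of the full DataFrame, and `reindex` is used here only to imitate the label alignment that happens during assignment; both are illustrative choices, not part of the original answer.

```python
import pandas as pd

money = pd.Series([100, 200, 0], name="Money")
counts = money.value_counts()

# The counts are indexed by the distinct Money values, not by row position.
print(sorted(counts.index.tolist()))  # [0, 100, 200]

# Aligning counts to the row labels 0, 1, 2: only label 0 also occurs in
# counts.index (as the Money value 0), so rows 1 and 2 become NaN.
aligned = counts.reindex(money.index)
print(aligned)

# map looks each Money value up in counts, so every row gets its own count.
mapped = money.map(counts)
print(mapped.tolist())  # [1, 1, 1]
```

This would also explain why the first row in the question received 1: the row label 0 happened to match the count stored for the Money value 0, not the count for 100.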
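However, if some values are duplicated, map is still a problem: instead of 1, each row is assigned the count of its value, which gives the wrong output. Here, duplicating the 200 value wrongly returns 4 instead of 2: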
df1 = [{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator" : "k","Money" : 200},
{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator": "k","Money" : 200},
{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator" : "D", "Money" : 0}]
df1 = pd.DataFrame(df1)
df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].map(df1["Money"].value_counts())
print (df1)
Account Name Customer Number Debit/Credit Indicator Money \
0 Sunarto AFIMBN01000BCA17030001177 k 200
1 Sunarto AFIMBN01000BCA17030001177 k 200
2 Sunarto AFIMBN01000BCA17030001177 D 0
Per item
0 2.0
1 2.0
2 NaN
gdf = pd.pivot_table(df1, index = ["Account Name","Customer Number"],values = ["Money", "Per item"],aggfunc = np.sum)
print (gdf)
Money Per item
Account Name Customer Number
Sunarto AFIMBN01000BCA17030001177 400 4.0
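For comparison, a short sketch of the boolean-mask approach on the same duplicated data (again with a reduced, illustrative frame named `df`) gives the intended count of 2, since each non-zero row contributes exactly 1 regardless of how often its value repeats.

```python
import pandas as pd

# Reduced copy of the duplicated-value data (illustrative only).
df = pd.DataFrame({
    "Account Name": ["Sunarto"] * 3,
    "Customer Number": ["AFIMBN01000BCA17030001177"] * 3,
    "Money": [200, 200, 0],
})

# 1 for every non-zero row, independent of how many times the value occurs.
df["Per item"] = df["Money"].ne(0).astype(int)

gdf = df.groupby(["Account Name", "Customer Number"])[["Money", "Per item"]].sum()
print(gdf)
```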