How to sum up values in a dataframe and add them to another one?
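I have two dataframes: one holds individual transactions and the other holds the chart of accounts.
I am trying to sum all transactions for each CompanyKey over the last month (March, in this case). I then want to add each result as a new column to the chart of accounts dataframe, with the CompanyKey in the column header.
Here is a small sample of the transaction data (in reality there are thousands of transactions):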
import pandas as pd
df = pd.DataFrame({
'CompanyKey': ["1","1","1","1","1","1","1","2","2","2"],
'DateOccurred': ["31/12/2021","25/02/2022","15/03/2022","31/03/2022","31/12/2021","22/02/2022","16/03/2022","31/12/2021","25/02/2022","31/03/2022"],
'Account.Name': ["Cash at Bank","Cash at Bank","Cash at Bank","Cash at Bank","GST Paid","GST Paid","GST Paid","Cash at Bank","Cash at Bank","Cash at Bank"],
'Amount': [150,112200,234065,19167.08,-39080.03,-10200,-27.5,15000,-234567,340697]})
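Because DateOccurred holds day-first strings, comparing it against other strings is a text comparison rather than a date comparison. A minimal sketch, using the sample above, of parsing the column first and then selecting the last month. It assumes "last month" means the most recent calendar month present in the data; the names text_mask, months, last_month and last_month_rows are illustrative:

```python
import pandas as pd

# Transaction sample from the question.
df = pd.DataFrame({
    "CompanyKey": ["1", "1", "1", "1", "1", "1", "1", "2", "2", "2"],
    "DateOccurred": ["31/12/2021", "25/02/2022", "15/03/2022", "31/03/2022", "31/12/2021",
                     "22/02/2022", "16/03/2022", "31/12/2021", "25/02/2022", "31/03/2022"],
    "Account.Name": ["Cash at Bank"] * 4 + ["GST Paid"] * 3 + ["Cash at Bank"] * 3,
    "Amount": [150, 112200, 234065, 19167.08, -39080.03, -10200, -27.5, 15000, -234567, 340697],
})

# On raw strings this is a text comparison: "25/02/2022" sorts between
# "01/03/2022" and "31/03/2022" character by character, so February rows slip in.
text_mask = (df["DateOccurred"] >= "01/03/2022") & (df["DateOccurred"] <= "31/03/2022")

# Parse the day-first strings into real dates before filtering.
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")

# Assumption: "last month" = the most recent calendar month present in the data.
months = df["DateOccurred"].dt.to_period("M")
last_month = months.max()
last_month_rows = df[months == last_month]
print(last_month, len(last_month_rows))
```

If "last month" should be relative to the current date instead, pd.Timestamp.today().to_period("M") - 1 gives the previous calendar month as a Period that can replace last_month.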
Here is the corresponding chart of accounts:
df1 = pd.DataFrame({
'ConsolidatedAccountName': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"],
'Level 1': ["Fund Statement","Fund Statement", "Cash Flow Statement", "Cash Flow Statement"],
'Level 2': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"]})
This is the result I want. I only want the sums filled in for rows where df1['Level 1'] == "Fund Statement".
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| ConsolidatedAccountName | Level 1 | Level 2 | Company 1 Sum | Company 2 Sum |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| Cash at Bank | Fund Statement | Cash at Bank | 253,232.08 | 340,697 |
| GST Paid | Fund Statement | GST Paid | -27.50 | 0 |
| Cash at Bank | Cash Flow Statement | Cash at Bank | NaN | NaN |
| GST Paid | Cash Flow Statement | GST Paid | NaN | NaN |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
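The desired table can also be produced without a row-wise apply: sum the March rows per account and company, pivot the companies into columns with unstack, and left-merge the result onto the chart of accounts. This is a swapped-in vectorized technique, sketched under the assumption that March 2022 is the target month; the names march, sums, sum_cols and result are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "CompanyKey": ["1", "1", "1", "1", "1", "1", "1", "2", "2", "2"],
    "DateOccurred": ["31/12/2021", "25/02/2022", "15/03/2022", "31/03/2022", "31/12/2021",
                     "22/02/2022", "16/03/2022", "31/12/2021", "25/02/2022", "31/03/2022"],
    "Account.Name": ["Cash at Bank"] * 4 + ["GST Paid"] * 3 + ["Cash at Bank"] * 3,
    "Amount": [150, 112200, 234065, 19167.08, -39080.03, -10200, -27.5, 15000, -234567, 340697],
})
df1 = pd.DataFrame({
    "ConsolidatedAccountName": ["Cash at Bank", "GST Paid", "Cash at Bank", "GST Paid"],
    "Level 1": ["Fund Statement", "Fund Statement", "Cash Flow Statement", "Cash Flow Statement"],
    "Level 2": ["Cash at Bank", "GST Paid", "Cash at Bank", "GST Paid"],
})

# Parse dates, then keep only March 2022 (target month hard-coded for the sketch).
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")
march = df[df["DateOccurred"].dt.to_period("M") == pd.Period("2022-03", freq="M")]

# One row per account, one column per company; missing pairs become 0.
sums = (
    march.groupby(["Account.Name", "CompanyKey"])["Amount"]
    .sum()
    .unstack("CompanyKey", fill_value=0)
)
sums.columns = [f"Company {key} Sum" for key in sums.columns]
sum_cols = list(sums.columns)

# Left-join the per-company sums onto the chart of accounts by account name.
result = df1.merge(
    sums.reset_index(), how="left", left_on="ConsolidatedAccountName", right_on="Account.Name"
).drop(columns="Account.Name")

# Only "Fund Statement" rows keep their sums; every other row gets NaN.
result.loc[result["Level 1"] != "Fund Statement", sum_cols] = np.nan
print(result)
```

fill_value=0 covers accounts that have March transactions for some companies but not others (GST Paid for company 2 here); an account with no March transactions at all would come out of the left merge as NaN on its Fund Statement row.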
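This is what I tried before I ran into the problem:
import numpy as np
company_keys = ["1", "2"]  # CompanyKey holds strings in the sample data
for company in company_keys:
    # Intended: the March 2022 total for this company, only on "Fund Statement" rows
    df1[f'Company {company} Sum'] = np.where((df['CompanyKey'] == company) &
                                             (df['DateOccurred'] >= '01/03/2022') &
                                             (df['DateOccurred'] <= '31/03/2022') &
                                             (df1['Level 1'] == 'Fund Statement'),
                                             df['Amount'].sum(),
                                             0)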
This is the error I get.
ValueError: Length of values (10) does not match length of index (4)
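The error comes from combining masks taken from two different frames. pandas aligns df (10 rows) and df1 (4 rows) on their index labels before applying &, so the combined mask has 10 entries, np.where returns 10 values, and assigning them to the 4-row df1 fails. A minimal reproduction with hypothetical stand-in frames (transactions and accounts are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 10 rows like df, 4 rows like df1.
transactions = pd.DataFrame({"Amount": range(10)})
accounts = pd.DataFrame({"Level 1": ["Fund Statement"] * 4})

# & aligns the two boolean Series on their index labels (0..9 vs 0..3),
# so the combined mask has 10 entries; labels missing on one side become False.
mask = (transactions["Amount"] >= 0) & (accounts["Level 1"] == "Fund Statement")
values = np.where(mask, transactions["Amount"].sum(), 0)

error_message = ""
try:
    accounts["Company 1 Sum"] = values  # 10 values for a 4-row frame
except ValueError as exc:
    error_message = str(exc)
print(len(mask), error_message)
```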
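One way to do it, continuing from df and df1 above:
# Setup: parse the day-first date strings into real dates
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")
# Sum March transactions per company and account
# (note: .dt.month == 3 matches March of any year)
df_sum = (
    df.loc[df["DateOccurred"].dt.month == 3, :]
    .groupby(["CompanyKey", "Account.Name"])
    .agg({"Amount": "sum"})
)
# Add one new column per company; rows outside "Fund Statement", and accounts
# with no March transactions for that company, get None for now
for idx in df["CompanyKey"].unique():
    df1[f"Company {idx} Sum"] = df1.apply(
        lambda x: df_sum.loc[(idx, x["ConsolidatedAccountName"]), "Amount"]
        if (x["ConsolidatedAccountName"] in df_sum.loc[(idx), :].index.unique())
        and (x["Level 1"] == "Fund Statement")
        else None,
        axis=1,
    )
# Cleanup: on "Fund Statement" rows, replace the missing sums with 0
df1.loc[df1["Level 1"] == "Fund Statement"] = df1.loc[
    df1["Level 1"] == "Fund Statement"
].fillna(0)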
Then:
print(df1)
# Output
   ConsolidatedAccountName              Level 1       Level 2  Company 1 Sum  Company 2 Sum
0             Cash at Bank       Fund Statement  Cash at Bank      253232.08       340697.0
1                 GST Paid       Fund Statement      GST Paid         -27.50            0.0
2             Cash at Bank  Cash Flow Statement  Cash at Bank            NaN            NaN
3                 GST Paid  Cash Flow Statement      GST Paid            NaN            NaN
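The apply loop evaluates df_sum.loc[(idx), :] for every company in df, so it raises a KeyError if a company has transactions but none in March. A minimal variation that looks each (company, account) pair up with Series.get and a default of 0 instead, sketched with a hypothetical company "3" whose only transaction is in February (the extra row and the name amounts are illustrative):

```python
import numpy as np
import pandas as pd

# Transactions from the question plus a hypothetical February-only company "3".
df = pd.DataFrame({
    "CompanyKey": ["1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "3"],
    "DateOccurred": ["31/12/2021", "25/02/2022", "15/03/2022", "31/03/2022", "31/12/2021",
                     "22/02/2022", "16/03/2022", "31/12/2021", "25/02/2022", "31/03/2022",
                     "10/02/2022"],
    "Account.Name": ["Cash at Bank"] * 4 + ["GST Paid"] * 3 + ["Cash at Bank"] * 4,
    "Amount": [150, 112200, 234065, 19167.08, -39080.03, -10200, -27.5,
               15000, -234567, 340697, 500],
})
df1 = pd.DataFrame({
    "ConsolidatedAccountName": ["Cash at Bank", "GST Paid", "Cash at Bank", "GST Paid"],
    "Level 1": ["Fund Statement", "Fund Statement", "Cash Flow Statement", "Cash Flow Statement"],
    "Level 2": ["Cash at Bank", "GST Paid", "Cash at Bank", "GST Paid"],
})

# Same March sums as in the answer above.
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")
df_sum = (
    df.loc[df["DateOccurred"].dt.month == 3, :]
    .groupby(["CompanyKey", "Account.Name"])
    .agg({"Amount": "sum"})
)

# Series indexed by (CompanyKey, Account.Name); .get returns the default
# instead of raising KeyError when a pair has no March transactions.
amounts = df_sum["Amount"]
for key in df["CompanyKey"].unique():
    df1[f"Company {key} Sum"] = [
        amounts.get((key, name), 0) if level == "Fund Statement" else np.nan
        for name, level in zip(df1["ConsolidatedAccountName"], df1["Level 1"])
    ]
print(df1)
```

This also folds the fillna(0) cleanup into the lookup: Fund Statement rows default to 0 and all other rows get NaN directly.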