Creating summary from stock transactions table - current code execution SLOW
I have a table of stock transactions that looks like this.
An account number may appear many times, and the same account may also have ordered the same product multiple times.
+------------+------------+------------+--------+---------+--------------+
| SA_ACCOUNT | SA_TRDATE | SA_TRVALUE | SA_QTY | SA_COST | SA_PRODUCT |
+------------+------------+------------+--------+---------+--------------+
| CSU1 | 23/03/2017 | 21.01 | 1 | 30 | W100/18 |
| AAA1 | 12/07/2018 | 38.04 | 6 | 19.8 | GPLR03REC800 |
| BWR1 | 01/11/2018 | 0 | -1 | 0 | W562/20 |
| CNT1 | 01/11/2018 | -2.22 | -1 | -1.23 | RX613S/12 |
| GBH1 | 15/09/2017 | 0 | 1 | 0 | COR2 |
+------------+------------+------------+--------+---------+--------------+
I want to output a table that has one row per account and a pair of columns for every product: the total sales value and the total pieces for that customer.
Expected output (there would be a lot more columns than in the example below):
+---------+----------+------------+---------------+-----------------+--------------+----------------+-----------+
| Account | MISC_PCS | MISC_VALUE | RX613S/12_PCS | RX613S/12_VALUE | R623S/12_PCS | R623S/12_VALUE | SP377_PCS |
+---------+----------+------------+---------------+-----------------+--------------+----------------+-----------+
| AGT1 | 25 | 32.65 | 2 | 5.26 | 0 | 0 | 0 |
| AHB1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AHB2 | 0 | 0 | 0 | 0 | 2 | 1.25 | 0 |
| AJB1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AJE2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AJT4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AJW1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AK11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AKS1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---------+----------+------------+---------------+-----------------+--------------+----------------+-----------+
I've written the code below, but it's terribly slow. Although it works, it's unusable (I've got 300,000 rows to iterate through).
Can anyone offer a better solution?
My code:
import pandas as pd

acc = ""
index_test = -1
test_df = pd.DataFrame()
# For every row in the dataframe, add its quantity and value to the product columns of the current account
for index, row in stock_tran_df.iterrows():
    if acc != row["SA_DACCNT"]:
        acc = row["SA_DACCNT"]
        print(acc)
        index_test += 1
        test_df.loc[index_test, "Account"] = acc
    try:
        test_df.loc[index_test, row["SA_PRODUCT"] + "_PCS"] = test_df.loc[index_test, row["SA_PRODUCT"] + "_PCS"] + row["SA_QTY"]
        test_df.loc[index_test, row["SA_PRODUCT"] + "_VALUE"] = test_df.loc[index_test, row["SA_PRODUCT"] + "_VALUE"] + row["SA_TRVALUE"]
    except KeyError:
        # First time this product is seen: create its columns
        test_df.loc[index_test, row["SA_PRODUCT"] + "_PCS"] = row["SA_QTY"]
        test_df.loc[index_test, row["SA_PRODUCT"] + "_VALUE"] = row["SA_TRVALUE"]
test_df.fillna(0, inplace=True)
Looks like what you're looking for is the pandas.pivot_table function with the parameter aggfunc=np.sum.
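A minimal sketch of how that suggestion might look, assuming the account column is named SA_ACCOUNT as in the table above (the posted loop uses SA_DACCNT, so adjust to whichever your data has). The sample rows are made up for illustration. The string "sum" is used in place of np.sum; it performs the same aggregation, and recent pandas releases may warn when np.sum is passed directly.

```python
import pandas as pd

# Illustrative rows shaped like the question's table (values are made up).
stock_tran_df = pd.DataFrame({
    "SA_ACCOUNT": ["AGT1", "AGT1", "AGT1", "AHB2", "AHB2"],
    "SA_PRODUCT": ["MISC", "MISC", "RX613S/12", "R623S/12", "R623S/12"],
    "SA_QTY": [20, 5, 2, 1, 1],
    "SA_TRVALUE": [30.00, 2.65, 5.26, 1.00, 0.25],
})

# One row per account, one (measure, product) column pair per product.
summary = pd.pivot_table(
    stock_tran_df,
    index="SA_ACCOUNT",
    columns="SA_PRODUCT",
    values=["SA_QTY", "SA_TRVALUE"],
    aggfunc="sum",
    fill_value=0,  # account/product combinations with no transactions
)

# Flatten the two-level column index into names like "MISC_PCS" / "MISC_VALUE".
suffix = {"SA_QTY": "PCS", "SA_TRVALUE": "VALUE"}
summary.columns = [
    f"{product}_{suffix[measure]}" for measure, product in summary.columns
]

# Sort so each product's PCS and VALUE columns sit next to each other,
# then turn the account index into an ordinary "Account" column.
summary = summary.sort_index(axis=1).rename_axis("Account").reset_index()

print(summary)
```

Because the grouping and summing happen inside pandas rather than in a Python-level loop, this should scale to 300,000 rows far better than iterrows, and it does not depend on the rows being sorted by account.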