How to merge two rows of a pandas dataframe depending on a condition in Python?

I have a dataframe:

           order_creationdate orderid productid  quantity             prod_name  price  Amount
0  2021-01-18 22:27:03.341260       1     SnyTV       3.0           Sony LED TV  412.0  1236.0
1  2021-01-18 17:28:03.343089       1     AMDR5       1.0           AMD Ryzen 5  313.0   313.0
2  2021-01-18 13:19:03.343842       1     INTI0       8.0             Intel I10  146.0  1168.0
3  2021-01-18 10:24:03.344399       1     INTI0       5.0             Intel I10  146.0   730.0
4  2021-01-18 12:29:03.344880       1     CMCFN       4.0  coolermaster CPU FAN  675.0  2700.0

The rows at index 2 and 3 have the same product ID and belong to the same order, so I am trying to combine them into a single row, to get:

INTI0        13.0       146.0       1898.0

The final df would be:

           order_creationdate orderid productid  quantity             prod_name  price  Amount
0  2021-01-18 22:27:03.341260       1     SnyTV       3.0           Sony LED TV  412.0  1236.0
1  2021-01-18 17:28:03.343089       1     AMDR5       1.0           AMD Ryzen 5  313.0   313.0
2  2021-01-18 13:19:03.343842       1     INTI0      13.0             Intel I10  146.0  1898.0
3  2021-01-18 12:29:03.344880       1     CMCFN       4.0  coolermaster CPU FAN  675.0  2700.0

I tried using the df.groupby function:

df2['productid'] =df2['productid'].astype('str')

arr = np.sort(df2[['productid','quantity']], axis=1)

df2 = (df2.groupby([arr[:, 0],arr[:, 1]])
       .agg({'price':'sum', 'Amount':'sum'})
       .rename_axis(('X','Y'))
       .reset_index())
print(df2)

But it raises a datatype error:

File "/home/anti/Documents/db/create_rec.py", line 65, in <module>
    arr = np.sort(df2[['productid','quantity']], axis=1)
  File "<__array_function__ internals>", line 5, in sort
  File "/home/anti/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 991, in sort
    a.sort(axis=axis, kind=kind, order=order)
TypeError: '<' not supported between instances of 'float' and 'str'
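
The traceback points at np.sort rather than at groupby. With axis=1, np.sort orders the values inside each row, so the string in productid ends up being compared with the float in quantity. The sketch below reproduces that in isolation; the two-row frame is a hypothetical stand-in for df2, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: one str column and one float column.
df = pd.DataFrame({"productid": ["INTI0", "INTI0"], "quantity": [8.0, 5.0]})

# Selecting both columns yields an object array holding str and float values.
values = df[["productid", "quantity"]].to_numpy()

try:
    # axis=1 sorts within each row, so 'INTI0' is compared with 8.0.
    np.sort(values, axis=1)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

Sorting across columns is mainly useful when the order of a pair of same-typed values should not matter; for merging duplicate products, grouping on the key columns directly avoids the comparison altogether.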

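Try:

df2 = df2.groupby('productid').agg({'quantity':'sum','Amount':'sum'}).reset_index()

Alternatively, starting from the original dataframe, group on both keys and aggregate every remaining column so that none of them is dropped:

df2.groupby(['productid', 'orderid'], as_index=False).agg(
    {'quantity': 'sum', 'Amount': 'sum', 'order_creationdate': 'min', 'prod_name': 'min', 'price': 'min'}
)

The output is: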

  productid  orderid  quantity  Amount         order_creationdate             prod_name  price
0     AMDR5        1       1.0   313.0 2021-01-18 17:28:03.343089           AMD Ryzen 5  313.0
1     CMCFN        1       4.0  2700.0 2021-01-18 12:29:03.344880  coolermaster CPU FAN  675.0
2     INTI0        1      13.0  1898.0 2021-01-18 10:24:03.344399             Intel I10  146.0
3     SnyTV        1       3.0  1236.0 2021-01-18 22:27:03.341260           Sony LED TV  412.0
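
The output above is sorted by productid and keeps the earliest timestamp, while the desired result in the question keeps the original row order and the timestamp of the first matching row. A minimal sketch of one way to get that layout, assuming the first row's order_creationdate, prod_name and price are the ones to keep (the frame is rebuilt here from the values shown in the question):

```python
import pandas as pd

# Data copied from the question so that the example is self-contained.
df2 = pd.DataFrame({
    "order_creationdate": pd.to_datetime([
        "2021-01-18 22:27:03.341260",
        "2021-01-18 17:28:03.343089",
        "2021-01-18 13:19:03.343842",
        "2021-01-18 10:24:03.344399",
        "2021-01-18 12:29:03.344880",
    ]),
    "orderid": [1, 1, 1, 1, 1],
    "productid": ["SnyTV", "AMDR5", "INTI0", "INTI0", "CMCFN"],
    "quantity": [3.0, 1.0, 8.0, 5.0, 4.0],
    "prod_name": ["Sony LED TV", "AMD Ryzen 5", "Intel I10", "Intel I10",
                  "coolermaster CPU FAN"],
    "price": [412.0, 313.0, 146.0, 146.0, 675.0],
    "Amount": [1236.0, 313.0, 1168.0, 730.0, 2700.0],
})

# sort=False keeps groups in order of first appearance instead of sorting keys.
merged = df2.groupby(["orderid", "productid"], sort=False, as_index=False).agg({
    "order_creationdate": "first",  # assumption: keep the first row's timestamp
    "quantity": "sum",
    "prod_name": "first",
    "price": "first",               # the unit price is the same within a product
    "Amount": "sum",
})

# Restore the original column order.
merged = merged[df2.columns.tolist()]
print(merged)
```

Whether "first" or "min" is the right choice for the timestamp depends on what the merged row is meant to represent, so that part is a judgment call rather than a fixed rule.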
