python pandas new column categorization based on conditions in other columns
Using the following Python pandas DataFrame df:
import pandas as pd

df = pd.DataFrame({'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
                   'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
                   'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X']})
transaction_id | product_id | product_category
A123           | 255472     | X
A123           | 251235     | X
B345           | 253764     | Y
B345           | 257344     | Y
C567           | 221577     | X
C567           | 209809     | Y
D678           | 223551     | Y
D678           | 290678     | X
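I need to add another column, "transaction_category", that looks at each transaction_id and at which product categories occur within that transaction. This is the output I'm looking for: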
transaction_id | product_id | product_category | transaction_category
A123           | 255472     | X                | X only
A123           | 251235     | X                | X only
B345           | 253764     | Y                | Y only
B345           | 257344     | Y                | Y only
C567           | 221577     | X                | X & Y
C567           | 209809     | Y                | X & Y
D678           | 223551     | Y                | X & Y
D678           | 290678     | X                | X & Y
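The rule behind the expected output can be stated per transaction: if every product in the transaction has the same category, the label is "<category> only"; otherwise it is the distinct categories joined with " & ". A minimal sketch of that rule as a plain function, independent of pandas. The name label_categories is hypothetical, and sorting the categories is an assumption made so the label does not depend on row order:

```python
def label_categories(categories):
    """Hypothetical helper: label one transaction from its product categories.

    One distinct category -> '<category> only';
    several -> the sorted distinct categories joined with ' & '.
    """
    distinct = sorted(set(categories))
    if len(distinct) == 1:
        return f"{distinct[0]} only"
    return " & ".join(distinct)


print(label_categories(["X", "X"]))  # X only
print(label_categories(["Y", "X"]))  # X & Y
```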
Note that my dataframe has other columns that I'm not using, so I think I need to start with a groupby? Something like:
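# aggregate first (here: row count per transaction/category pair), then reset_index
df2 = df.groupby(['transaction_id', 'product_category']).size().reset_index(name='count')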
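Following the idea of starting with a groupby, one possible sketch aggregates once per transaction and then maps the label back onto every row. The helper name label is hypothetical, sorting the categories is an assumption so the label does not depend on row order, and any other columns in df are simply ignored by the groupby:

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})

def label(categories):
    # Hypothetical helper: one distinct category -> "<cat> only",
    # otherwise the sorted distinct categories joined by " & "
    distinct = sorted(set(categories))
    return f"{distinct[0]} only" if len(distinct) == 1 else " & ".join(distinct)

# Step 1: one label per transaction_id (a Series indexed by transaction_id)
per_transaction = df.groupby('transaction_id')['product_category'].agg(label)

# Step 2: broadcast each transaction's label back onto all of its rows
df['transaction_category'] = df['transaction_id'].map(per_transaction)
print(df)
```

Compared with transform, this keeps an explicit one-row-per-transaction summary (per_transaction), which can be useful if you also want to inspect it on its own.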
IIUC, by using transform and join:
df.groupby('transaction_id').product_category.transform(lambda x: '&'.join(sorted(set(x))))
Out[468]:
0 X
1 X
2 Y
3 Y
4 X&Y
5 X&Y
6 X&Y
7 X&Y
Name: product_category, dtype: object
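Python's set iteration order for strings can vary between interpreter runs because string hashing is randomized, so joining an unsorted set may yield "Y&X" rather than "X&Y". A minimal sketch that sorts the distinct categories before joining, so the label is stable; the transaction E999 and category Z are hypothetical data added only to show more than two categories:

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'C567', 'C567', 'E999', 'E999', 'E999'],
    'product_category': ['X', 'X', 'X', 'Y', 'Z', 'X', 'Y'],
})

# sorted() fixes the order of the distinct categories, so the joined string
# no longer depends on set iteration order
joined = (df.groupby('transaction_id')['product_category']
            .transform(lambda x: '&'.join(sorted(set(x)))))
print(joined.tolist())
```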
From Scott, to match your expected output:
df['transaction_category'] = df.groupby('transaction_id')['product_category'].transform(lambda x: x + ' only' if len(set(x)) < 2 else ' & '.join(sorted(set(x))))
df
Out[479]:
product_category product_id transaction_id transaction_category
0 X 255472 A123 X only
1 X 251235 A123 X only
2 Y 253764 B345 Y only
3 Y 257344 B345 Y only
4 X 221577 C567 X & Y
5 Y 209809 C567 X & Y
6 Y 223551 D678 X & Y
7 X 290678 D678 X & Y
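The lambda above returns a Series for single-category transactions and a scalar string otherwise; this works because transform broadcasts both back to the group's rows, but the two cases are easy to misread. A sketch that separates them explicitly, assuming the built-in 'nunique' transform and numpy.where (an alternative formulation, not the answer's own code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})

grouped = df.groupby('transaction_id')['product_category']

# Number of distinct categories in each row's transaction
n_distinct = grouped.transform('nunique')
# Sorted distinct categories of each row's transaction, joined with ' & '
all_cats = grouped.transform(lambda x: ' & '.join(sorted(x.unique())))

df['transaction_category'] = np.where(
    n_distinct == 1,
    df['product_category'] + ' only',  # single category: this row's category is the only one
    all_cats,                          # mixed: list every category in the transaction
)
print(df)
```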
The transform method of a groupby object lets you add a full-length column back to the dataframe via assign:
import pandas

def squeezer(x):
    # distinct categories within one transaction
    _x = list(set(x.values))
    if len(_x) == 1:
        return '{} only'.format(_x[0])
    else:
        return ' & '.join(sorted(_x))

df = pandas.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X']
}).assign(
    # transform broadcasts each group's label back to every row of that group
    products=lambda df:
        df.groupby(by=['transaction_id'])['product_category']
          .transform(squeezer)
)
We get:
product_category product_id transaction_id products
0 X 255472 A123 X only
1 X 251235 A123 X only
2 Y 253764 B345 Y only
3 Y 257344 B345 Y only
4 X 221577 C567 X & Y
5 Y 209809 C567 X & Y
6 Y 223551 D678 X & Y
7 X 290678 D678 X & Y
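The property this answer relies on is that transform returns a result aligned to the original rows, whereas an aggregation returns one row per group. A minimal sketch contrasting the two on the same data; the 'store' column is hypothetical, standing in for the unused columns mentioned in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
    'store': ['s1', 's1', 's2', 's2', 's1', 's1', 's3', 's3'],  # hypothetical unused column
})

grouped = df.groupby('transaction_id')['product_category']

per_group = grouped.agg('nunique')      # one value per transaction_id: 4 rows
per_row = grouped.transform('nunique')  # one value per original row: 8 rows, same index as df

print(len(per_group), len(per_row))     # 4 8

# Because the transform result shares df's index, assign can attach it directly;
# the unused 'store' column is carried along untouched
out = df.assign(
    n_categories=lambda d: d.groupby('transaction_id')['product_category'].transform('nunique')
)
print(out)
```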