Python pandas: new column categorization based on conditions in other columns
I am working with the following Python pandas DataFrame df:
import pandas as pd

df = pd.DataFrame({'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678'],
                   'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X']})
transaction_id | product_id | product_category
A123           | 255472     | X
A123           | 251235     | X
B345           | 253764     | Y
B345           | 257344     | Y
C567           | 221577     | X
C567           | 209809     | Y
D678           | 223551     | Y
D678           | 290678     | X
I need to add another column, "transaction_category", that looks at each transaction_id and records which product categories appear in that transaction. This is the output I am looking for:
transaction_id | product_id | product_category | transaction_category
A123           | 255472     | X                | X only
A123           | 251235     | X                | X only
B345           | 253764     | Y                | Y only
B345           | 257344     | Y                | Y only
C567           | 221577     | X                | X & Y
C567           | 209809     | Y                | X & Y
D678           | 223551     | Y                | X & Y
D678           | 290678     | X                | X & Y
Please note that my DataFrame has other columns that I am not using here, so I guess I need to start with a groupby?
df2 = df.groupby(['transaction_id','product_category']).reset_index()  # my attempt: raises AttributeError, since a GroupBy object has no reset_index()
If I understand correctly (IIUC), you can do this with transform and join:
df.groupby('transaction_id').product_category.transform(lambda x: '&'.join(sorted(set(x))))  # sorted() keeps the order stable, e.g. always 'X&Y'
Out[468]:
0 X
1 X
2 Y
3 Y
4 X&Y
5 X&Y
6 X&Y
7 X&Y
Name: product_category, dtype: object
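A groupby is indeed the right place to start. As an alternative sketch (not taken from the answers here), you can aggregate once per transaction and then map each per-transaction label back onto its rows; label is a hypothetical helper name. Only transaction_id and product_category are read, so any other columns in the DataFrame are left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})


def label(categories):
    """Hypothetical helper: build the label for one transaction's categories."""
    unique = sorted(set(categories))
    if len(unique) == 1:
        return unique[0] + ' only'
    return ' & '.join(unique)


# Step 1: one label per transaction (a Series indexed by transaction_id)
per_transaction = df.groupby('transaction_id')['product_category'].agg(label)

# Step 2: look up each row's transaction_id in that Series to broadcast it back
df['transaction_category'] = df['transaction_id'].map(per_transaction)
print(df)
```

The map step does explicitly what transform does implicitly; splitting it in two also leaves the per-transaction table available on its own for inspection.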
Building on Scott's suggestion, this matches your expected output:
# One category -> '<category> only'; several -> sorted categories joined by ' & '
df['transaction_category'] = df.groupby('transaction_id')['product_category'].transform(
    lambda x: x + ' only' if len(set(x)) < 2 else ' & '.join(sorted(set(x))))
df
Out[479]:
product_category product_id transaction_id transaction_category
0 X 255472 A123 X only
1 X 251235 A123 X only
2 Y 253764 B345 Y only
3 Y 257344 B345 Y only
4 X 221577 C567 X & Y
5 Y 209809 C567 X & Y
6 Y 223551 D678 X & Y
7 X 290678 D678 X & Y
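The two branches of that lambda can also be written as a separate condition and label, which reads closer to "categorize based on conditions". A minimal sketch, assuming NumPy is available: transform('nunique') counts the distinct categories in each row's transaction, and numpy.where picks the label row by row. The names grouped, n_categories and joined are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})

grouped = df.groupby('transaction_id')['product_category']

# Condition: how many distinct categories each row's transaction contains
n_categories = grouped.transform('nunique')

# Label for mixed transactions: sorted distinct categories, e.g. 'X & Y'
joined = grouped.transform(lambda s: ' & '.join(sorted(s.unique())))

# Single-category rows get '<category> only'; mixed rows get the joined label
df['transaction_category'] = np.where(n_categories == 1,
                                      df['product_category'] + ' only',
                                      joined)
print(df)
```

Keeping the count as its own Series makes it easy to add further conditions later (for example, a different label for three or more categories) without rewriting the lambda.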
The transform method of the groupby object lets you add full-length columns back to your dataframe via assign:
import pandas

def squeezer(x):
    # Distinct categories within one transaction
    _x = list(set(x.values))
    if len(_x) == 1:
        # Only one category: label it '<category> only'
        return '{} only'.format(_x[0])
    else:
        # Several categories: sort so the label is stable ('X & Y')
        return ' & '.join(sorted(_x))

df = pandas.DataFrame({
    'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X','X','Y','Y','X','Y','Y','X']
}).assign(
    # transform broadcasts each transaction's label back to all of its rows
    products=lambda df:
        df.groupby(by=['transaction_id'])['product_category']
          .transform(squeezer)
)
And we get:
product_category product_id transaction_id products
0 X 255472 A123 X only
1 X 251235 A123 X only
2 Y 253764 B345 Y only
3 Y 257344 B345 Y only
4 X 221577 C567 X & Y
5 Y 209809 C567 X & Y
6 Y 223551 D678 X & Y
7 X 290678 D678 X & Y
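The key point in that answer is that transform returns one value per original row, aligned with the DataFrame's index, whereas an aggregation returns one value per group. A small sketch on a trimmed-down copy of the question's data (n_categories is just an illustrative column name):

```python
import pandas as pd

# Trimmed-down version of the question's data
df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'C567', 'C567'],
    'product_category': ['X', 'X', 'X', 'Y'],
})
grouped = df.groupby('transaction_id')['product_category']

# Aggregation: one value per group, indexed by transaction_id
per_group = grouped.agg('nunique')

# Transform: one value per row, sharing df's index
per_row = grouped.transform('nunique')

print(len(per_group), len(per_row))  # 2 4

# Because per_row is aligned with df, it can be attached directly with assign
df = df.assign(n_categories=per_row)
print(df)
```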
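Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.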