I have one dataframe called products that looks like this:
   order_number  sku  units  revenue
1          5000  754      1     20.0
2          5000  900      4     30.0
3          5001  754      2     40.0
4          5002  754     10    200.0
.           ...  ...     ..      ...
and another called orders that looks like this:
    date  order_number  units  revenue  country  new_customer  ...
1  1-jan          5000      5     50.0   russia           yes
2  1-jan          5001      2     40.0    china           yes
3  2-jan          5002     10    200.0   france            no
4  2-jan          5003      1     70.0   brazil           yes
.   ....           ...     ..      ...      ...
I would like to create a single dataframe that has the rows from the products dataframe plus the columns from the orders dataframe, matched where the order number in orders equals the order number in products.
I've tried to find a way to express this via both pandas.concat and pandas.merge, but I can't get around the problem that the key I'm joining on (order_number) is unique in the orders dataframe but not in the products dataframe.
How do I do a many-to-one join like this in pandas?
I think you are looking for join (you have to provide a suffix since you have a duplicate column revenue):
>>> import pandas as pd
>>> products = pd.DataFrame({'order_number': [5000, 5000, 5001, 5002, 5004],
... 'sku': [ 754, 900, 754, 754, 900],
... 'revenue': [20.0, 30.0, 40.0,200.0, 90.0]})
>>> orders = pd.DataFrame({'order_number': [5000, 5001, 5002, 5003],
... 'units': [ 5, 2, 10, 1],
... 'revenue': [50.0, 40.0,200.0, 70.0]})
>>> products.join(orders.set_index('order_number'), 'order_number', rsuffix='_o')
   order_number  revenue  sku  revenue_o  units
0          5000       20  754         50      5
1          5000       30  900         50      5
2          5001       40  754         40      2
3          5002      200  754        200     10
4          5004       90  900        NaN    NaN
Edit: the same result can be achieved with products.merge(orders, how='left', on='order_number', suffixes=('', '_o')). Passing how and on by keyword makes it explicit that this is a left join on order_number.
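If you also want pandas to check that the join really is many-to-one, merge accepts a validate argument, and indicator=True marks the rows that found no match. The following is only a sketch using the same sample data as above, assuming a pandas version recent enough to support both arguments. The names merged and unmatched are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
products = pd.DataFrame({
    "order_number": [5000, 5000, 5001, 5002, 5004],
    "sku": [754, 900, 754, 754, 900],
    "revenue": [20.0, 30.0, 40.0, 200.0, 90.0],
})
orders = pd.DataFrame({
    "order_number": [5000, 5001, 5002, 5003],
    "units": [5, 2, 10, 1],
    "revenue": [50.0, 40.0, 200.0, 70.0],
})

# Left join: keep every products row, attach the matching orders columns.
# validate="many_to_one" raises pandas.errors.MergeError if order_number
# is not unique in orders (the right-hand frame).
# indicator=True adds a "_merge" column showing where each row came from.
merged = products.merge(
    orders,
    how="left",
    on="order_number",
    suffixes=("", "_o"),
    validate="many_to_one",
    indicator=True,
)

# Products rows whose order_number has no counterpart in orders.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched["order_number"].tolist())
```

With this data the merged frame keeps one row per products row, and order 5004 shows up as unmatched because it does not appear in orders.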