I am following this Datafish tutorial because I have been tasked with updating a price list. One dataframe (Target) has over 5000 rows and the other (Source) has 900. I am stuck on how to take the difference produced by comparing the two dataframes (in the context of the tutorial) and add it to the second dataframe so that the second dataframe is updated. Could someone point me in the right direction: which method to use, or a snippet showing how to do it?
The snippet from the tutorial below creates a price difference column (second line). I want to take that result and add it to the Price2 column or, if there is a way, simply use the True/False logic created in the first line to copy Price1 to Price2.
df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')
df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2'])
Sample DataFrame
import numpy as np
import pandas as pd

firstProductSet = {'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
                   'Price1': [1200, 800, 200, 350]}
df1 = pd.DataFrame(firstProductSet, columns=['Product1', 'Price1'])
secondProductSet = {'Product2': ['Computer', 'Phone', 'Printer', 'Desk'],
                    'Price2': [900, 800, 300, 350]}
df2 = pd.DataFrame(secondProductSet, columns=['Product2', 'Price2'])
If I understand correctly, I would merge on the product columns and then calculate the difference:
import pandas as pd

# Sample data
firstProductSet = {'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
                   'Price1': [1200, 800, 200, 350]}
df1 = pd.DataFrame(firstProductSet, columns=['Product1', 'Price1'])
secondProductSet = {'Product2': ['Computer', 'Phone', 'Printer', 'Desk'],
                    'Price2': [900, 800, 300, 350]}
df2 = pd.DataFrame(secondProductSet, columns=['Product2', 'Price2'])
# merge your frames together on products
df_m = df1.merge(df2, left_on='Product1', right_on='Product2')
# .diff(axis=1) subtracts the previous column from the current one,
# so selecting 'Price1' gives Price1 - Price2
df_m['diff'] = df_m[['Price2', 'Price1']].diff(axis=1)['Price1']
   Product1  Price1  Product2  Price2   diff
0  Computer    1200  Computer     900  300.0
1     Phone     800     Phone     800    0.0
2   Printer     200   Printer     300 -100.0
3      Desk     350      Desk     350    0.0
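To actually overwrite Price2 with the matching Price1, which is what the question asks for, one option is to look each product up by name instead of by row position. The following is a minimal sketch, not the tutorial's method. It assumes product names are unique in df1, and the extra 'Chair' row is hypothetical data added only to show what happens to a product that has no new price.

```python
import pandas as pd

# Source frame with the new prices
df1 = pd.DataFrame({'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
                    'Price1': [1200, 800, 200, 350]})
# Target frame to update; 'Chair' is a hypothetical row with no match in df1
df2 = pd.DataFrame({'Product2': ['Phone', 'Printer', 'Desk', 'Computer', 'Chair'],
                    'Price2': [800, 300, 350, 900, 120]})

# Look-up table: product name -> new price (assumes unique names in df1)
new_prices = df1.set_index('Product1')['Price1']

# Map every product in df2 to its new price; unmatched products become NaN
mapped = df2['Product2'].map(new_prices)

# Use the new price where one exists, otherwise keep the old Price2
df2['Price2'] = mapped.fillna(df2['Price2']).astype(df2['Price2'].dtype)

print(df2)
```

Because the look-up is keyed on the product name, it does not matter that the two frames have different lengths or row orders.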
Also, the reason for using merge is that the comparison inside np.where (df1['Price1'] == df2['Price2']) matches rows by index, so if the same product does not sit at the same index in both frames you will not get the expected result. For example, suppose we move Computer in df2 from index 0 to index 3:
firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
'Price1': [1200,800,200,350]}
df1 = pd.DataFrame(firstProductSet,columns= ['Product1', 'Price1'])
secondProductSet = {'Product2': ['Phone','Printer','Desk', 'Computer'],
'Price2': [800,300,350,900]}
df2 = pd.DataFrame(secondProductSet,columns= ['Product2', 'Price2'])
Then np.where(df1['Price1'] == df2['Price2'], 'True', 'False') returns 'False' for every row, even though Phone and Desk have the same price in both frames.
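The effect can be checked side by side. This sketch reuses the reordered sample frames and compares the prices once by row index and once after merging on the product name; the variable names by_index and by_product are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
                    'Price1': [1200, 800, 200, 350]})
# Same products as df1, but Computer has moved from index 0 to index 3
df2 = pd.DataFrame({'Product2': ['Phone', 'Printer', 'Desk', 'Computer'],
                    'Price2': [800, 300, 350, 900]})

# Row-by-row comparison: index 0 of df1 is compared with index 0 of df2, etc.
by_index = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')

# Merge on the product name first, then compare within the merged frame
df_m = df1.merge(df2, left_on='Product1', right_on='Product2')
by_product = np.where(df_m['Price1'] == df_m['Price2'], 'True', 'False')

print(by_index)    # every entry is 'False'
print(by_product)  # Phone and Desk are 'True'
```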