
Merge two DataFrames by comparing values rather than column names

DataFrame 1 - Price of fruits by date (index is a date)

import pandas as pd

fruits_price = {'Apple': [9,5,14],
                'Orange': [10,12,10],
                'Kiwi': [5,4,20],
                'Watermelon': [4.4,5.4,6.4]}
df1 = pd.DataFrame(fruits_price,
                  columns = ['Apple','Orange','Kiwi','Watermelon'],
                  index=['2020-01-01','2020-01-02','2020-01-10'])

    date        Apple    Orange   Kiwi   Watermelon ... Fruit_100
    2020-01-01   9        10       5     4.4
    2020-01-02   5        12       4     5.4
    ...
    2020-12-10   14       10       20    6.4

DataFrame 2 - Top fruits by rank (index is a date)

top_fruits = {'Fruit_1': ['Apple','Apple','Apple'],
              'Fruit_2': ['Kiwi','Orange','Kiwi'],
              'Fruit_3': ['Orange','Watermelon','Watermelon'],
              'Fruit_4': ['Watermelon','Kiwi','Orange']}
    
df2 = pd.DataFrame(top_fruits, 
                   columns = ['Fruit_1','Fruit_2','Fruit_3','Fruit_4'],
                   index=['2020-01-01','2020-01-02','2020-01-10'])

   date        Fruit_1  Fruit_2   Fruit_3        Fruit_4         ... Fruit_100
   2020-01-01   Apple   Kiwi      Orange         Watermelon      Pineapple
   2020-01-02   Apple   Orange    Watermelon     Kiwi            Pineapple
   ...
   2020-12-10   Apple   Kiwi      Watermelon     Orange          Pineapple

I want DataFrame 3 (price of the top fruits by date), which gives the price of each ranked fruit on the given date:

    date        Price_1    Price_2   Price_3     Price_4 ..... Price_100
    2020-01-01   9        5          10           4.4
    2020-01-02   5        12         5.4          4
    ...
    2020-12-10   14       20         6.4          10

I spent almost a whole night on this. I tried iterating over DataFrame 2 with an inner loop over DataFrame 1 and adding the values to DataFrame 3. I tried six or seven different approaches using iterrows and iteritems, then storing the output directly into df3 via iloc. None of them worked.

Is there an easier way to do this? I will later multiply the result by the fruit sales, which are stored in the same DataFrame format.

Use apply with axis=1. This processes df2 row by row; each row is a Series whose name is the date, so you can replace each fruit name with the price from the row of df1 that has the same date.

df2.apply(lambda x: x.replace(df1.to_dict('index')[x.name]), axis=1)
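The one-liner above rebuilds the dictionary for every row. A minimal runnable sketch of the same idea, building the lookup once, could look like the following. Note that it swaps replace for Series.map, which is an assumption on my part rather than what the answer uses; the names prices_by_date and df3 are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Apple": [9, 5, 14], "Orange": [10, 12, 10],
     "Kiwi": [5, 4, 20], "Watermelon": [4.4, 5.4, 6.4]},
    index=["2020-01-01", "2020-01-02", "2020-01-10"],
)
df2 = pd.DataFrame(
    {"Fruit_1": ["Apple", "Apple", "Apple"],
     "Fruit_2": ["Kiwi", "Orange", "Kiwi"],
     "Fruit_3": ["Orange", "Watermelon", "Watermelon"],
     "Fruit_4": ["Watermelon", "Kiwi", "Orange"]},
    index=["2020-01-01", "2020-01-02", "2020-01-10"],
)

# Build the {date: {fruit: price}} lookup once instead of once per row.
prices_by_date = df1.to_dict("index")

# For each row of df2, map every fruit name to its price on that row's date.
# (Illustrative variant: map is used here in place of replace.)
df3 = df2.apply(lambda row: row.map(prices_by_date[row.name]), axis=1)
df3.columns = [f"Price_{i}" for i in range(1, df3.shape[1] + 1)]
print(df3)
```

With map, a fruit name that is missing from df1 becomes NaN instead of being left as a string, which may or may not be what you want.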

Build a dict from df1, then use replace on df2:

import pandas as pd

fruits_price = {'Apple': [9,5,14],
            'Orange': [10,12,10],
            'Kiwi': [5,4,20],
            'Watermelon': [4.4,5.4,6.4]}
df1 = pd.DataFrame(fruits_price,
              columns = ['Apple','Orange','Kiwi','Watermelon'],
              index=['2020-01-01','2020-01-02','2020-01-10'])

top_fruits = {'Fruit_1': ['Apple','Apple','Apple'],
          'Fruit_2': ['Kiwi','Orange','Kiwi'],
          'Fruit_3': ['Orange','Watermelon','Watermelon'],
          'Fruit_4': ['Watermelon','Kiwi','Orange']}

df2 = pd.DataFrame(top_fruits, 
               columns = ['Fruit_1','Fruit_2','Fruit_3','Fruit_4'],
               index=['2020-01-01','2020-01-02','2020-01-10'])

result = df2.T.replace(df1.T.to_dict()).T
result.columns = [f"Price_{i}" for i in range(1, len(result.columns)+1)]
result

output:

            Price_1 Price_2 Price_3 Price_4
2020-01-01  9.0     5.0     10.0    4.4
2020-01-02  5.0     12.0    5.4     4.0
2020-01-10  14.0    20.0    6.4     10.0
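For the full-size case (about 100 columns and many dates), a NumPy lookup may be faster than replace. This is only a sketch under assumptions: every fruit name in df2 exists as a column of df1, the column names of df1 are unique, and every date in df2 is present in df1. The names col_idx, row_idx and df3 are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"Apple": [9, 5, 14], "Orange": [10, 12, 10],
     "Kiwi": [5, 4, 20], "Watermelon": [4.4, 5.4, 6.4]},
    index=["2020-01-01", "2020-01-02", "2020-01-10"],
)
df2 = pd.DataFrame(
    {"Fruit_1": ["Apple", "Apple", "Apple"],
     "Fruit_2": ["Kiwi", "Orange", "Kiwi"],
     "Fruit_3": ["Orange", "Watermelon", "Watermelon"],
     "Fruit_4": ["Watermelon", "Kiwi", "Orange"]},
    index=["2020-01-01", "2020-01-02", "2020-01-10"],
)

# Column position in df1 of every fruit name in df2 (-1 means "not found").
col_idx = df1.columns.get_indexer(df2.to_numpy().ravel()).reshape(df2.shape)
if (col_idx == -1).any():
    raise KeyError("df2 contains a fruit name that is not a column of df1")

# Align df1's rows to df2's dates, then pick one price per (row, rank) cell.
prices = df1.reindex(df2.index).to_numpy()
row_idx = np.arange(len(df2))[:, None]

df3 = pd.DataFrame(
    prices[row_idx, col_idx],
    index=df2.index,
    columns=[f"Price_{i}" for i in range(1, df2.shape[1] + 1)],
)
print(df3)
```

If the sales figures live in a frame with the same shape and row order (a hypothetical sales DataFrame, not shown in the question), multiplying the underlying arrays, for example df3.to_numpy() * sales.to_numpy(), sidesteps the differing column labels.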

