Replace column values in Pandas DataFrame using dict with tuple key

Question

I have two Pandas DataFrames, one contains data I want to update and the other provides a lookup based on a MultiIndex key to set a value.

As an example, I have two csv's:

fruit.csv

Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.

rating.csv

Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2

Which I've read into DataFrames:

static_data_dir = Path(__file__).resolve().parent
fruit = pd.read_csv(static_data_dir.joinpath("fruit.csv"), index_col=["Fruit","Color"])
rating = pd.read_csv(static_data_dir.joinpath("rating.csv"), index_col=["Fruit","State"])

                  State  more  data
Fruit Color
Apple Red         Good  etc.  etc.
      Green     Mouldy  etc.  etc.
      Green  Excellent  etc.  etc.
Pear  Red    Excellent  etc.  etc.
      Green       Good  etc.  etc.
Lime  Green        Bad  etc.  etc.

                 Rating
Fruit State
Apple Excellent      11
      Good            8
      Bad             4
      Mouldy          0
Pear  Excellent       9
      Good            5
      Bad             2
      Mouldy          1
Lime  Excellent      10
      Good            7
      Bad             5
      Mouldy          2

And now would like to replace the State value in the fruit DataFrame with the Rating value from the rating DataFrame, ending up with the below.

                  State  more  data
Fruit Color
Apple Red            8  etc.  etc.
      Green          0  etc.  etc.
      Green         11  etc.  etc.
Pear  Red            9  etc.  etc.
      Green          5  etc.  etc.
Lime  Green          5  etc.  etc.

Effectively I want to use pandas.Series.replace but pass in a dict with a tuple key, but that doesn't appear to be supported.

{'Rating': {('Apple', 'Bad'): 4,
            ('Apple', 'Excellent'): 11,
            ('Apple', 'Good'): 8,
            ('Apple', 'Mouldy'): 0,
            ('Lime', 'Bad'): 5,
            ('Lime', 'Excellent'): 10,
            ('Lime', 'Good'): 7,
            ('Lime', 'Mouldy'): 2,
            ('Pear', 'Bad'): 2,
            ('Pear', 'Excellent'): 9,
            ('Pear', 'Good'): 5,
            ('Pear', 'Mouldy'): 1}}

How best would I go about achieving this?

Answer 1

Read two csv as normal dataframe, then merge on Fruit and State columns using keys from fruit dataframe by setting how="left" . At last set Fruit and Color column as index.

import pandas as pd

fruit = pd.read_csv("fruit.csv")
rating = pd.read_csv("rating.csv")

fruit['State'] = fruit.merge(rating, on=["Fruit", "State"], how="left")["Rating"]

fruit.set_index(["Fruit","Color"], inplace=True)

print(fruit)

             State  more  data
Fruit Color                   
Apple Red        8  etc.  etc.
      Green      0  etc.  etc.
      Green     11  etc.  etc.
Pear  Red        9  etc.  etc.
      Green      5  etc.  etc.
Lime  Green      5  etc.  etc.

Replace column values in Pandas DataFrame using dict with tuple key

Question

1 answers

solution1
2 ACCPTED 2021-04-22 13:36:47

Replace column values in Pandas DataFrame using dict with tuple key

Question

1 answers

solution1 2 ACCPTED 2021-04-22 13:36:47

solution1
2 ACCPTED 2021-04-22 13:36:47