Populate one dataframe based on information in another dataframe

Question

I have two large dataframes, but am only showing a small subset of them for convenience. One is in the following form (Table 1):

| Country | Date     | flag | M   | notes      | V  |
|---------|----------|------|-----|------------|----|
| UK      | 20210319 | 1    | 3.0 | No Change  | C1 |
| UK      | 20210320 | 0    | 2.0 | Extension  | C2 |
| USA     | 20210405 | 0    | 4.0 | New Policy | C1 |
| CAN     | 20210405 | 0    | 1.0 | Update     | C3 |

and the other is in the form (Table 2):

| Country | Date     | C1  | C1_flag | C1_notes |
|---------|----------|-----|---------|----------|
| UK      | 20210319 | NaN | NaN     | NaN      |
| USA     | 20210405 | NaN | NaN     | NaN      |
| AUS     | 20210505 | NaN | NaN     | NaN      |
| NZ      | 20210506 | NaN | NaN     | NaN      |

After the 'C1_notes' column there are additional columns (C2, C2_flag, C2_notes, and so on). The same three-column structure exists for every code that appears in the 'V' column of Table 1 (C1, C2, C3, C4, C5, E1, E2, H1, H2, H3), but for brevity I am only showing C1 in Table 2.

I would like Table 2 to be populated using the information from Table 1, so that it ultimately has the form:

| Country | Date     | C1  | C1_flag | C1_notes   |
|---------|----------|-----|---------|------------|
| UK      | 20210319 | 3.0 | 1       | No Change  |
| USA     | 20210405 | 4.0 | 0       | New Policy |
| AUS     | 20210505 | NaN | NaN     | NaN        |
| NZ      | 20210506 | NaN | NaN     | NaN        |

where the 'C1' column in Table 2 takes its values from the 'M' column of Table 1, and similarly the 'C1_flag' column uses the 'flag' column and the 'C1_notes' column uses the 'notes' column. Table 2 should be populated wherever its 'Country' and 'Date' values match their counterparts in Table 1.

I was hoping that a 'merge' type operation could deal with this, but the catch is that the values (C1...H3) are column values in Table 1, but column names in Table 2. I also thought that this was a Pivot/Stack/Melt type operation in Pandas but it turns out not to be so, and a more general mapping may be required. How would I perform this correspondence (both for C1, and also generalizing for the other C, E, and H codes)? Thanks.

Answer 1

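Reshape the first dataframe with pivot so that each code in 'V' becomes its own set of columns, rename those columns to match the naming used in Table 2 (C1, C1_flag, C1_notes, ...), and then update the second dataframe with the reshaped one. Because the pivot covers every code present in 'V', the same steps apply to the other C, E, and H codes.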

k = df1.pivot(index=['Country', 'Date'], columns=['V'], values=['flag', 'M', 'notes'])
k.columns = ['_'.join(col[::-1]) if 'M' not in col else col[-1] for col in k.columns]
k = k[sorted(k.columns)]
df2 = df2.set_index(['Country', 'Date'])
df2.update(k)

OUTPUT:

                   C1 C1_flag   C1_notes
Country Date                            
UK      20210319  3.0       1   NoChange
USA     20210405  4.0       0  NewPolicy
AUS     20210505  NaN     NaN        NaN
NZ      20210506  NaN     NaN        NaN
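
To make the renaming step easier to follow, here is a small sketch that looks at the column labels pivot produces before they are flattened. It assumes the same Table 1 data as the question; the names raw_labels and renamed are only illustrative, and the renaming is written with explicit (field, code) unpacking rather than the answer's join expression, which should give the same labels for this data.

```python
import pandas as pd

# Table 1 from the question (notes written without spaces, as in the answer)
df1 = pd.DataFrame({
    "Country": ["UK", "UK", "USA", "CAN"],
    "Date": [20210319, 20210320, 20210405, 20210405],
    "flag": [1, 0, 0, 0],
    "M": [3.0, 2.0, 4.0, 1.0],
    "notes": ["NoChange", "Extension", "NewPolicy", "Update"],
    "V": ["C1", "C2", "C1", "C3"],
})

k = df1.pivot(index=["Country", "Date"], columns="V",
              values=["flag", "M", "notes"])

# Each column label is a (field, code) tuple such as ("flag", "C1") or ("M", "C1")
raw_labels = list(k.columns)
print(raw_labels)

# ("M", "C1") becomes "C1"; any other field becomes "<code>_<field>"
renamed = [code if field == "M" else f"{code}_{field}"
           for field, code in raw_labels]
k.columns = renamed
print(sorted(renamed))
```

After this step the pivoted frame uses the same column names as Table 2, which is what lets update line the two frames up by label.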

Complete code:

import pandas as pd
from numpy import nan

d1 = {'Country': {0: 'UK', 1: 'UK', 2: 'USA', 3: 'CAN'}, 'Date': {0: 20210319, 1: 20210320, 2: 20210405, 3: 20210405}, 'flag': {0: 1, 1: 0, 2: 0, 3: 0},
      'M': {0: 3.0, 1: 2.0, 2: 4.0, 3: 1.0}, 'notes': {0: 'NoChange', 1: 'Extension', 2: 'NewPolicy', 3: 'Update'}, 'V': {0: 'C1', 1: 'C2', 2: 'C1', 3: 'C3'}}
d2 = {'Country': {0: 'UK', 1: 'USA', 2: 'AUS', 3: 'NZ'}, 'Date': {0: 20210319, 1: 20210405, 2: 20210505, 3: 20210506}, 'C1': {
    0: nan, 1: nan, 2: nan, 3: nan}, 'C1_flag': {0: nan, 1: nan, 2: nan, 3: nan}, 'C1_notes': {0: nan, 1: nan, 2: nan, 3: nan}}
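df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# Reshape Table 1: one row per (Country, Date), one column per (field, code) pair
k = df1.pivot(index=['Country', 'Date'], columns=['V'],
              values=['flag', 'M', 'notes'])
# Flatten the column labels: ('M', 'C1') -> 'C1', ('flag', 'C1') -> 'C1_flag'
k.columns = ['_'.join(col[::-1]) if 'M' not in col else col[-1]
             for col in k.columns]
k = k[sorted(k.columns)]
# Index Table 2 by the same keys, then fill the matching cells in place
df2 = df2.set_index(['Country', 'Date'])
df2.update(k)
print(df2)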
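
To check how this generalizes beyond C1, the sketch below applies the same pivot and update steps to a hypothetical Table 2 that also has C2 columns and a row for (UK, 20210320). The list codes, the row keys, and the choice to create Table 2 with object dtype (so numbers and strings can be written without any dtype conversion) are assumptions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Table 1 from the question
df1 = pd.DataFrame({
    "Country": ["UK", "UK", "USA", "CAN"],
    "Date": [20210319, 20210320, 20210405, 20210405],
    "flag": [1, 0, 0, 0],
    "M": [3.0, 2.0, 4.0, 1.0],
    "notes": ["No Change", "Extension", "New Policy", "Update"],
    "V": ["C1", "C2", "C1", "C3"],
})

# Hypothetical empty Table 2 covering two codes (assumed layout)
codes = ["C1", "C2"]
columns = [f"{code}{suffix}" for code in codes
           for suffix in ("", "_flag", "_notes")]
index = pd.MultiIndex.from_tuples(
    [("UK", 20210319), ("UK", 20210320), ("USA", 20210405), ("AUS", 20210505)],
    names=["Country", "Date"],
)
df2 = pd.DataFrame(np.nan, index=index, columns=columns, dtype=object)

# Same reshape and rename as in the answer
k = df1.pivot(index=["Country", "Date"], columns="V",
              values=["flag", "M", "notes"])
k.columns = [code if field == "M" else f"{code}_{field}"
             for field, code in k.columns]

# update only writes into rows and columns that already exist in df2,
# so the C3 columns and the CAN row from Table 1 are ignored
df2.update(k)
print(df2)
```

Rows of Table 2 with no match in Table 1 (AUS here) stay NaN, and a row only receives values for the code recorded against it in Table 1.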

solution1
1 ACCEPTED 2021-05-26 15:56:18