I have two large dataframes, but am only showing a small subset of them for convenience. One is in the following form (Table 1):
| Country | Date | flag | M | notes | V |
|--------------------------------------------------|
| UK | 20210319 | 1 | 3.0 | No Change | C1 |
| UK | 20210320 | 0 | 2.0 | Extension | C2 |
| USA | 20210405 | 0 | 4.0 | New Policy| C1 |
| CAN | 20210405 | 0 | 1.0 | Update | C3 |
and the other is in the form (Table 2):
| Country | Date | C1 | C1_flag | C1_notes |
|-----------------------------------------------|
| UK | 20210319 | NaN | NaN | NaN |
| USA | 20210405 | NaN | NaN | NaN |
| AUS | 20210505 | NaN | NaN | NaN |
| NZ | 20210506 | NaN | NaN | NaN |
where after the 'C1_notes' column there are additional columns (C2, C2_flag, and C2_notes). In fact this column structure is present for the following codes in the 'V' column of the first table (C1, C2, C3, C4, C5, E1, E2, H1, H2, H3), but for brevity I am only showing C1 in Table 2.
I would like Table 2 to be populated using the information from the Table 1, such that it ultimately ends up having the form:
| Country | Date | C1 | C1_flag | C1_notes |
|------------------------------------------------|
| UK | 20210319 | 3.0 | 1 | No Change |
| USA | 20210405 | 4.0 | 0 | New Policy|
| AUS | 20210505 | NaN | NaN | NaN |
| NZ | 20210506 | NaN | NaN | NaN |
where the 'C1' columns in Table 2 use the 'M' column of Table 1 as their values, and similarly 'C1_flag' column uses the 'flag' column. The populating of Table 2 is on the basis that the 'Country' and 'Date' columns in Table 2 match with their counterparts in Table 1.
I was hoping that a 'merge' type operation could deal with this, but the catch is that the values (C1...H3) are column values in Table 1, but column names in Table 2. I also thought that this was a Pivot/Stack/Melt type operation in Pandas but it turns out not to be so, and a more general mapping may be required. How would I perform this correspondence (both for C1, and also generalizing for the other C, E, and H codes)? Thanks.
Restructure
the 1st dataframe / manipulate columns
and then update
the other dataframe with this restructured dataframe.
k = df1.pivot(index=['Country','Date'] , columns= ['V'] , values= ['flag','M','notes'])
k.columns = ['_'.join(col[::-1]) if 'M' not in col else col[-1] for col in k.columns]
k = k[sorted(k.columns)]
df2 = df2.set_index(['Country', 'Date'])
df2.update(k)
OUTPUT:
C1 C1_flag C1_notes
Country Date
UK 20210319 3.0 1 NoChange
USA 20210405 4.0 0 NewPolicy
AUS 20210505 NaN NaN NaN
NZ 20210506 NaN NaN NaN
Complete code:
from numpy import nan
d1 = {'Country': {0: 'UK', 1: 'UK', 2: 'USA', 3: 'CAN'}, 'Date': {0: 20210319, 1: 20210320, 2: 20210405, 3: 20210405}, 'flag': {0: 1, 1: 0, 2: 0, 3: 0},
'M': {0: 3.0, 1: 2.0, 2: 4.0, 3: 1.0}, 'notes': {0: 'NoChange', 1: 'Extension', 2: 'NewPolicy', 3: 'Update'}, 'V': {0: 'C1', 1: 'C2', 2: 'C1', 3: 'C3'}}
d2 = {'Country': {0: 'UK', 1: 'USA', 2: 'AUS', 3: 'NZ'}, 'Date': {0: 20210319, 1: 20210405, 2: 20210505, 3: 20210506}, 'C1': {
0: nan, 1: nan, 2: nan, 3: nan}, 'C1_flag': {0: nan, 1: nan, 2: nan, 3: nan}, 'C1_notes': {0: nan, 1: nan, 2: nan, 3: nan}}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
k = df1.pivot(index=['Country', 'Date'], columns=[
'V'], values=['flag', 'M', 'notes'])
k.columns = ['_'.join(col[::-1]) if 'M' not in col else col[-1]
for col in k.columns]
k = k[sorted(k.columns)]
df2 = df2.set_index(['Country', 'Date'])
df2.update(k)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.