简体   繁体   中英

Efficiently Merging two Dataframes Column and Row wise in Pandas

I have a detailed Transaction data Dataframe as below

 df_col = pd.DataFrame({'SQ':[1,1,2],
                 'City':['A','A','B'],
                 'Date':['7-1-2020','7-2-2020','7-1-2020'],
                 'Loc 1':[40,21,27],
                 'Loc 2':[37,40,14],
                 'Loc 3':[49,38,36],
                 'Loc 4':[20,14,18],
                 'Loc 5':[48,27,36]})


+----+------+----------+-------+-------+-------+-------+-------+
| SQ | City |   Date   | Loc 1 | Loc 2 | Loc 3 | Loc 4 | Loc 5 |
+----+------+----------+-------+-------+-------+-------+-------+
|  1 |   A  | 7-1-2020 |   40  |   37  |   49  |   20  |   48  |
+----+------+----------+-------+-------+-------+-------+-------+
|  1 |   A  | 7-2-2020 |   21  |   40  |   38  |   14  |   27  |
+----+------+----------+-------+-------+-------+-------+-------+
|  2 |   B  | 7-1-2020 |   27  |   14  |   36  |   18  |   36  |
+----+------+----------+-------+-------+-------+-------+-------+

Also I've a separate Location data DataFrame as below

+------------+--------------+
| LocationNo | LocationType |
+------------+--------------+
|    Loc 1   |    Class A   |
+------------+--------------+
|    Loc 2   |    Class A   |
+------------+--------------+
|    Loc 3   |    Class B   |
+------------+--------------+
|    Loc 4   |    Class C   |
+------------+--------------+
|    Loc 5   |    Class C   |
+------------+--------------+

df_row = pd.DataFrame({'LocationNo':['Loc 1','Loc 2','Loc 3','Loc 4','Loc 5'],
             'LocationType':['Class A', 'Class A', 'Class B', 'Class C', 'Class C']
             })

Now my task is to merge columns of df_col to rows in df_row and sum the values. merging Vertical to Horizontal ie Columns to Rows .

my desired output is as below

+----+------+----------+---------+---------+---------+
| SQ | City |   Date   | Class A | Class B | Class C |
+----+------+----------+---------+---------+---------+
|  1 |   A  | 7-1-2020 |    77   |    49   |    68   |
+----+------+----------+---------+---------+---------+
|  1 |   A  | 7-2-2020 |    61   |    38   |    41   |
+----+------+----------+---------+---------+---------+
|  2 |   B  | 7-1-2020 |    41   |    36   |    54   |
+----+------+----------+---------+---------+---------+

I wrote the below code

# setting the index
df_col.set_index(['SQ','City','Date'], inplace=True)
df_row.set_index('LocationNo', inplace=True)


 # I tried to merge vertically columns to columns. Hence, transpose the df_col.T
df_final = df_col.T.merge(df_row, left_index=True, right_index=True, how='left').groupby('LocationType').agg('sum').T

What result outputted from the above code is as below

+------------------+-----------+-----------+-----------+
|   LocationType   | Class   A | Class   B | Class   C |
+------------------+-----------+-----------+-----------+
| (1, A, 7-1-2020) |     77    |     49    |     68    |
+------------------+-----------+-----------+-----------+
| (1, A, 7-1-2020) |     61    |     38    |     41    |
+------------------+-----------+-----------+-----------+
| (2, B, 7-2-2020) |     41    |     36    |     54    |
+------------------+-----------+-----------+-----------+

The answer is correct. But, the First 3 columns are merged into one column. I need to separate this into individual columns similar to my desired output mentioned above

how should I solve this issue and What is the Efficient way to do this?

Let's filter the loc like columns from df_col , then map these columns to LocationType based on LocationNo from df_row , finally groupby the dataframe d on these mapped columns along axis=1 and agg using sum :

d = df_col.filter(like='Loc')
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])
out = df_col[['SQ','City','Date']].join(d.groupby(g, axis=1).sum())

   SQ City      Date  Class A  Class B  Class C
0   1    A  7-1-2020       77       49       68
1   1    A  7-1-2020       61       38       41
2   2    B  7-2-2020       41       36       54

One way using melt , merge and groupby :

print (df_col.melt(id_vars=["SQ", "City", "Date"], var_name="LocationNo")
             .merge(df_row, how="left", on="LocationNo")
             .groupby(["SQ", "City","LocationType", "Date"])["value"].sum()
             .unstack("LocationType"))

LocationType      Class A  Class B  Class C
SQ City Date                               
1  A    7-1-2020       77       49       68
        7-2-2020       61       38       41
2  B    7-2-2020       41       36       54

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM