简体   繁体   中英

Need help to build a specific Dataframe (PYTHON PANDAS)

I recently asked about how to merge Dataframes, the thing is i need this issue in adition to what i previously asked. (Previous question: Python pandas, build a dataframe from 2 dataframes with these properties )

i'm in need to solve this issue.

I need to build a whole dataframe from two dataframes, but include only certain info from a second Dataframe if required.

EXAMPLE:

DF1:

        MATERIAL_N°    Description    DATE DUE
    0   123123300      Lightbulb X    01/05/2018
    1   220466         Lightbulb Y    04/04/2018
    2   220000         Lightbulb Z    07/07/2018
    3   1241241        Lightbulb A    02/01/2019
    4   7775447        Lightbulb B    02/01/2019

DF2:

            BG GROUP    MATERIAL N° TRANSIT TIME   QUANTITY
        0   9001        123123300   45D            40
        1   9002        220466      30D            50
        2   9004        220000      30D            120
        3   9003        44124       20D            110
        4   9000        2512222     15D            170
        5   9002        1241241     40D            10
        6   9001        123123300   45D            60

EXPECTED RESULT DF3:

    MATERIAL N° Description     DATE DUE    BG GROUP TRANSIT TIME  INCOMING
0   123123300   Lightbulb X     01/05/2018  9001.0        45D      100
1   220466      Lightbulb Y     04/04/2018  9002.0        30D      50
2   220000      Lightbulb Z     07/07/2018  9004.0        30D      120
3   1241241     Lightbulb A     02/01/2019  9002.0        40D      10
4   7775447     Lightbulb B     02/01/2019  NaN           NaN      NaN

The result i expect to get is the sum of incoming loads of products acording to DF2 after the merge of these two dataframes (Bear in mind that i am adding quantities for each material number)

I've tried via functions outside the DF or pandas, but im sure there is a much more simple way to deal with this that Pandas module covers.

PD: This community is awesome, i really appreciate its feedback, it really helps with my personal projects.

Kudos

You can do the left merge then re-create the quantity columns using transform and then drop duplicates

df3 = df1.merge(df2, how = 'left')
df3['QUANTITY'] = df3.groupby('MATERIAL N°').QUANTITY.transform('sum')
df3.drop_duplicates(subset = 'MATERIAL N°', keep = 'last', inplace = True)

You get

    MATERIAL N° Description DATE DUE BG GROUP TRANSIT TIME  QUANTITY
1   123123300   Lightbulb X 01/05/2018  9001.0    45D       100.0
2   220466      Lightbulb Y 04/04/2018  9002.0    30D       50.0
3   220000      Lightbulb Z 07/07/2018  9004.0    30D       120.0
4   1241241     Lightbulb A 02/01/2019  9002.0    40D       10.0
5   7775447     Lightbulb B 02/01/2019  NaN       NaN       NaN

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM