Adding rows in dataframe based on values of another dataframe

I have the following two dataframes. Please note that 'amt' is grouped by 'id' in both dataframes.

  df1

   id  code  amt
0   A    1    5
1   A    2    5
2   B    3    10
3   C    4    6
4   D    5    8
5   E    6    11

  df2 

    id  code amt
0   B   1    9
1   C   12   10

I want to add a row to df2 for every id in df1 that is not contained in df2. For example, since ids A, D and E are not contained in df2, I want to add a row for each of them. Each appended row should contain the missing id, a null value for the attribute code, and the value stored in df1 for the attribute amt.

The result should be something like this:

   id  code name
0   B    1    9
1   C    12   10
2   A    nan  5
3   D    nan  8
4   E    nan  11

I would highly appreciate any guidance on this.

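Using pd.concat:

import pandas as pd

# One row per id; drop 'code' so it becomes NaN for the added rows
df = df1.drop(columns='code').drop_duplicates()
# Rows of df whose id is missing from df2
missing = df[~df.id.isin(df2.id)]
pd.concat([df2, missing], axis=0).rename(columns={'amt': 'name'}).reset_index(drop=True)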
Out[481]: 
   name  code id
0     9   1.0  B
1    10  12.0  C
2     5   NaN  A
3     8   NaN  D
4    11   NaN  E
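
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the same isin + pd.concat idea. It keeps the column name amt instead of renaming it, and the variable names missing and result are only illustrative.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": ["A", "A", "B", "C", "D", "E"],
    "code": [1, 2, 3, 4, 5, 6],
    "amt": [5, 5, 10, 6, 8, 11],
})
df2 = pd.DataFrame({"id": ["B", "C"], "code": [1, 12], "amt": [9, 10]})

# One (id, amt) row per id, restricted to the ids that df2 lacks
missing = df1.drop(columns="code").drop_duplicates()
missing = missing[~missing["id"].isin(df2["id"])]

# concat aligns on column names; 'code' is absent from `missing`,
# so it is filled with NaN for the appended rows
result = pd.concat([df2, missing], ignore_index=True)
print(result)
```

Because the appended rows have no code, the whole code column ends up as float, which is why the existing values display as 1.0 and 12.0.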

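Drop duplicate ids from df1, append df2, drop every id that now appears more than once (keep=False), then append what is left to df2.

import numpy as np
import pandas as pd

# After keep=False, only ids present in df1 but not in df2 remain
new_rows = (pd.concat([df1.drop_duplicates('id'), df2])
              .drop_duplicates('id', keep=False).assign(code=np.nan))
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
pd.concat([df2, new_rows], ignore_index=True)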

  id  code  amt
0  B   1.0    9
1  C  12.0   10
2  A   NaN    5
3  D   NaN    8
4  E   NaN   11
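
The keep=False step is the part that is easy to miss, so here is the same idea split into separate steps on the sample data. This is only a sketch: the names stacked, only_once and result are made up for illustration, and pd.concat is used throughout so it also runs on pandas 2.x.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": ["A", "A", "B", "C", "D", "E"],
    "code": [1, 2, 3, 4, 5, 6],
    "amt": [5, 5, 10, 6, 8, 11],
})
df2 = pd.DataFrame({"id": ["B", "C"], "code": [1, 12], "amt": [9, 10]})

# Step 1: one row per id from df1, stacked on top of df2.
# The id column is now A, B, C, D, E, B, C.
stacked = pd.concat([df1.drop_duplicates("id"), df2])

# Step 2: keep=False removes every id that occurs more than once,
# so B and C disappear entirely and A, D, E remain.
only_once = stacked.drop_duplicates("id", keep=False)

# Step 3: blank out code and stack the remaining rows under df2.
result = pd.concat([df2, only_once.assign(code=np.nan)], ignore_index=True)
print(result)
```

One caveat: this relies on every id in df2 also appearing in df1, as it does in the question. An id that exists only in df2 would also survive step 2 and would then be appended a second time.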

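A slight variation using boolean masks:

# True where the df1 id does not occur in df2
m = ~np.isin(df1.id.values, df2.id.values)
# True for the first occurrence of each id in df1
d = ~df1.duplicated('id').values

pd.concat([df2, df1[m & d].assign(code=np.nan)], ignore_index=True)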

  id  code  amt
0  B   1.0    9
1  C  12.0   10
2  A   NaN    5
3  D   NaN    8
4  E   NaN   11
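
To see what the two masks select, the sketch below evaluates them on the sample frames from the question. It uses np.isin and pd.concat so that it runs on current NumPy and pandas; the name picked is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": ["A", "A", "B", "C", "D", "E"],
    "code": [1, 2, 3, 4, 5, 6],
    "amt": [5, 5, 10, 6, 8, 11],
})
df2 = pd.DataFrame({"id": ["B", "C"], "code": [1, 12], "amt": [9, 10]})

# m: the id of this df1 row is not found in df2
m = ~np.isin(df1["id"].to_numpy(), df2["id"].to_numpy())
# d: this row is the first occurrence of its id in df1
d = ~df1.duplicated("id").to_numpy()

# Both conditions together pick df1 rows 0, 4 and 5 (ids A, D, E)
picked = df1[m & d]

result = pd.concat([df2, picked.assign(code=np.nan)], ignore_index=True)
print(result)
```

Unlike the keep=False version, the mask m only looks at which df1 ids are absent from df2, so an id that exists only in df2 is left alone.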
