
Concatenate and group-wise filling NaN values

I have this dataframe:

df:
companycode    name    address    A     B     C     ...
1234           asd     qwe,56     Tyh   123   923
1234           asd     qwe,56     Zfhs  4828  01992
6472           yui     iop,56     Retgh 8484  8484
...

I have another one that looks like this:

df2:
companycode    A     B     C       ...
1234           Jid   4123  141
6472           Low   1312  3234
...

name and address are always the same for a single companycode.

I want to concatenate, join, merge or append them so that the end result looks like this:

companycode    name    address    A     B     C     ...
1234           asd     qwe,56     Tyh   123   923
1234           asd     qwe,56     Zfhs  4828  01992
6472           yui     iop,56     Retgh 8484  8484
1234           asd     qwe,56     Jid   4123  141
6472           yui     iop,56     Low   1312  3234
...

Since name and address are always the same for a single companycode, I basically want to concatenate df2 to df along axis=0 and fill in the name and address of the new rows from the matching companycode in the original df. It is confusing to describe in words, but I think the example above makes it clearer.

Any ideas on how I could do that?

pd.concat followed by a groupby operation should do it (df1 here is the question's df).

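import pandas as pd
# Stack both frames; rows from df2 get NaN in name and address.
df = pd.concat([df1, df2], axis=0, ignore_index=True)
# Forward-fill name and address within each companycode group.
df[['name', 'address']] = df.groupby('companycode')[['name', 'address']].ffill()
df

   companycode name address      A     B     C
0         1234  asd  qwe,56    Tyh   123   923
1         1234  asd  qwe,56   Zfhs  4828  1992
2         6472  yui  iop,56  Retgh  8484  8484
3         1234  asd  qwe,56    Jid  4123   141
4         6472  yui  iop,56    Low  1312  3234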

  • ignore_index=True is set to create a new index upon concatenation.
  • Concatenation leaves NaN values in the columns that df2 does not have (name and address).
  • A groupby on companycode followed by ffill fills those NaNs with the right values from the same group.
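
The steps listed above can be checked end to end with a small, self-contained sketch. The sample frames are rebuilt by hand from the question, and the integer dtype of C is an assumption (the question shows 01992, which suggests the real column might hold strings). The names stacked, filled and n_missing are illustrative only.

```python
import pandas as pd

# Sample frames rebuilt from the question (dtypes are an assumption).
df1 = pd.DataFrame({
    "companycode": [1234, 1234, 6472],
    "name": ["asd", "asd", "yui"],
    "address": ["qwe,56", "qwe,56", "iop,56"],
    "A": ["Tyh", "Zfhs", "Retgh"],
    "B": [123, 4828, 8484],
    "C": [923, 1992, 8484],
})
df2 = pd.DataFrame({
    "companycode": [1234, 6472],
    "A": ["Jid", "Low"],
    "B": [4123, 1312],
    "C": [141, 3234],
})

# Step 1: stack the frames. df2 has no name/address columns,
# so the rows that come from it hold NaN there.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
n_missing = int(stacked["name"].isna().sum())

# Step 2: forward-fill only the two lookup columns, group by group.
cols = ["name", "address"]
filled = stacked.copy()
filled[cols] = stacked.groupby("companycode")[cols].ffill()

print(n_missing)
print(filled)
```

Forward-filling works here because the rows of df1 come first in the concatenation, so every group already has a known name and address above the rows that need filling.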

For those with a SQL mindset, consider a merge combined with concat (i.e., a JOIN with a UNION):

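# JOIN: attach name and address to each df2 row, then drop duplicate rows.
mdf = (df1[['companycode', 'name', 'address']]
       .merge(df2, on='companycode')
       .drop_duplicates())
# UNION ALL: append the enriched rows to df1.
finaldf = pd.concat([df1, mdf]).reset_index(drop=True)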

print(finaldf)
#    companycode name address      A     B     C
# 0         1234  asd  qwe,56    Tyh   123   923
# 1         1234  asd  qwe,56   Zfhs  4828  1992
# 2         6472  yui  iop,56  Retgh  8484  8484
# 3         1234  asd  qwe,56    Jid  4123   141
# 4         6472  yui  iop,56    Low  1312  3234
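
One thing to keep in mind with the merge approach: the default inner join drops any row of df2 whose companycode does not occur in df1. A possible variant, sketched below, builds a one-row-per-company lookup first and then left-joins df2 against it, so unknown codes survive with NaN in name and address. The companycode 9999 and the name lookup are hypothetical and added only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "companycode": [1234, 1234, 6472],
    "name": ["asd", "asd", "yui"],
    "address": ["qwe,56", "qwe,56", "iop,56"],
    "A": ["Tyh", "Zfhs", "Retgh"],
    "B": [123, 4828, 8484],
    "C": [923, 1992, 8484],
})
# df2 with an extra, hypothetical companycode (9999) that df1 does not contain.
df2 = pd.DataFrame({
    "companycode": [1234, 6472, 9999],
    "A": ["Jid", "Low", "Xyz"],
    "B": [4123, 1312, 1],
    "C": [141, 3234, 2],
})

# One row per companycode, so the join cannot multiply rows of df2.
lookup = df1[["companycode", "name", "address"]].drop_duplicates("companycode")

# how="left" keeps every df2 row; validate raises if lookup keys are not unique.
mdf = df2.merge(lookup, on="companycode", how="left", validate="many_to_one")

# concat aligns columns by name, so the result follows df1's column order.
finaldf = pd.concat([df1, mdf], ignore_index=True)
print(finaldf)
```

Deduplicating before the join also means that genuinely repeated rows in df2 are kept, whereas calling drop_duplicates after the merge would collapse them.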
