Concatenate and group-wise filling NaN values
I have this dataframe:
df:
companycode name address A B C ...
1234 asd qwe,56 Tyh 123 923
1234 asd qwe,56 Zfhs 4828 01992
6472 yui iop,56 Retgh 8484 8484
...
I have another one that looks like this:
df2:
companycode A B C ...
1234 Jid 4123 141
6472 Low 1312 3234
...
name and address are always the same for a single companycode.
I want to concatenate, join, merge, or append them so that the result looks like this:
companycode name address A B C ...
1234 asd qwe,56 Tyh 123 923
1234 asd qwe,56 Zfhs 4828 01992
6472 yui iop,56 Retgh 8484 8484
1234 asd qwe,56 Jid 4123 141
6472 yui iop,56 Low 1312 3234
...
Since name and address are always the same for a single companycode, I basically want to concatenate df2 onto df along axis=0 and pull the name and address for the new rows from the matching companycode in the original df. It is quite confusing to put into words, but I think the example above shows it more clearly.
Any ideas on how I could do that?
pd.concat followed by a groupby operation should do it.
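import pandas as pd

# df1 is the first dataframe (called df in the question)
df = pd.concat([df1, df2], axis=0, ignore_index=True)
# recent pandas versions leave the grouping column out of groupby(...).ffill(),
# so fill only the columns that df2 lacks and assign them back
df[['name', 'address']] = df.groupby('companycode')[['name', 'address']].ffill()
df
companycode name address A B C
0 1234 asd qwe,56 Tyh 123 923
1 1234 asd qwe,56 Zfhs 4828 1992
2 6472 yui iop,56 Retgh 8484 8484
3 1234 asd qwe,56 Jid 4123 141
4 6472 yui iop,56 Low 1312 3234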
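For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The sample values are copied from the question; the names stacked, cols, missing_before and missing_after are only illustrative, and df1 stands for the frame the question calls df. It assumes a recent pandas version, so the fill is applied to the selected columns and assigned back.

```python
import pandas as pd

# Sample frames rebuilt from the question; df1 is the frame the question calls `df`.
df1 = pd.DataFrame({
    "companycode": [1234, 1234, 6472],
    "name": ["asd", "asd", "yui"],
    "address": ["qwe,56", "qwe,56", "iop,56"],
    "A": ["Tyh", "Zfhs", "Retgh"],
    "B": [123, 4828, 8484],
    "C": [923, 1992, 8484],
})
df2 = pd.DataFrame({
    "companycode": [1234, 6472],
    "A": ["Jid", "Low"],
    "B": [4123, 1312],
    "C": [141, 3234],
})

# Stack the rows; name/address are NaN for the rows that came from df2.
stacked = pd.concat([df1, df2], ignore_index=True)
missing_before = int(stacked["name"].isna().sum())

# Forward-fill only the two descriptive columns within each companycode group.
cols = ["name", "address"]
stacked[cols] = stacked.groupby("companycode")[cols].ffill()
missing_after = int(stacked["name"].isna().sum())

print(stacked)
```

Note that ffill only helps here because the rows from df1 come before the rows from df2 within each group. A companycode that appears only in df2 would keep NaN in name and address.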
ignore_index=True is set to create a new index upon concatenation. The concatenation leaves NaN values in the name and address columns for the rows that come from df2, since df2 does not have those columns. A groupby operation on companycode followed by ffill fills those NaNs with the right values from the same group.
For those with a SQL mindset, consider a merge combined with concat (i.e., a JOIN with a UNION):
mdf = df1[['companycode', 'name', 'address']]\
.merge(df2, on='companycode').drop_duplicates()
finaldf = pd.concat([df1, mdf]).reset_index(drop=True)
print(finaldf)
# companycode name address A B C
# 0 1234 asd qwe,56 Tyh 123 923
# 1 1234 asd qwe,56 Zfhs 4828 1992
# 2 6472 yui iop,56 Retgh 8484 8484
# 3 1234 asd qwe,56 Jid 4123 141
# 4 6472 yui iop,56 Low 1312 3234
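A variation on the same idea, offered as a sketch rather than as part of the original answer: deduplicate the name/address lookup before merging, so that genuinely repeated rows in df2 are not dropped, and use a left join so every df2 row survives even without a match. The names lookup and enriched are illustrative, and the sample data is copied from the question.

```python
import pandas as pd

# Sample frames rebuilt from the question; df1 is the frame the question calls `df`.
df1 = pd.DataFrame({
    "companycode": [1234, 1234, 6472],
    "name": ["asd", "asd", "yui"],
    "address": ["qwe,56", "qwe,56", "iop,56"],
    "A": ["Tyh", "Zfhs", "Retgh"],
    "B": [123, 4828, 8484],
    "C": [923, 1992, 8484],
})
df2 = pd.DataFrame({
    "companycode": [1234, 6472],
    "A": ["Jid", "Low"],
    "B": [4123, 1312],
    "C": [141, 3234],
})

# One row per companycode holding the descriptive columns.
lookup = df1[["companycode", "name", "address"]].drop_duplicates()

# Left join keeps every df2 row; validate raises a MergeError if a
# companycode maps to more than one name/address pair in the lookup.
enriched = df2.merge(lookup, on="companycode", how="left", validate="many_to_one")

# Append the enriched rows; the column order follows df1.
finaldf = pd.concat([df1, enriched], ignore_index=True)
print(finaldf)
```

The validate argument doubles as a check on the question's premise that name and address are always the same for a single companycode.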