How to fill empty values in a dataframe based on columns in another dataframe?
I have a dataframe called df1:
ID   Value  Name     Score
-1   10     A        -1
-1   5      B        -1
NaN  0.2    Track C  100
NaN  0.5    Track C  200
1    0      D        100
5    0      D        200
I want to fill the NaN values in column ID using dataframe df2, matching on Score. Each Score in df2 maps to several IDs, so each NaN row should expand into one row per matching ID.
df2:
Score ID
100 1
100 2
100 3
100 4
200 5
200 6
200 7
Ultimately, my final dataframe df3 should look like this:
ID   Value  Name     Score
-1   10     A        -1
-1   5      B        -1
1    0.2    Track C  100
2    0.2    Track C  100
3    0.2    Track C  100
4    0.2    Track C  100
5    0.5    Track C  200
6    0.5    Track C  200
7    0.5    Track C  200
1    0      D        100
5    0      D        200
How could I accomplish this?
I have a solution, but it is not elegant. I would appreciate it if experienced users could take a look.
To make it easier for others, here is the code to set up the test case:
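import re

import numpy as np
import pandas as pd

# Columns are separated by two or more spaces, so "Track C" stays a single field.
# Note that every value is read as a string here.
df1 = pd.DataFrame(
    columns='ID Value Name Score'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
        """
-1   10   A        -1
-1   5    B        -1
NaN  0.2  Track C  100
NaN  0.5  Track C  200
1    0    D        100
5    0    D        200
""".strip().split('\n')
    ],
)
# Turn the literal string 'NaN' into a real missing value
df1 = df1.replace({'NaN': np.nan})
df2 = pd.DataFrame(
    columns='Score ID'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
        """
100  1
100  2
100  3
100  4
200  5
200  6
200  7
""".strip().split('\n')
    ],
)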
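The setup above leaves every column as strings. If numeric dtypes are preferred, one possible alternative (a sketch, not part of the original post) is to parse the same text with pandas.read_csv, using a regex separator of two or more spaces. The variable names df1_text and df2_text are made up for this example.

```python
import io

import pandas as pd

# Hypothetical names: df1_text / df2_text just hold the tables from the question.
df1_text = """\
ID   Value  Name     Score
-1   10     A        -1
-1   5      B        -1
NaN  0.2    Track C  100
NaN  0.5    Track C  200
1    0      D        100
5    0      D        200
"""

df2_text = """\
Score  ID
100    1
100    2
100    3
100    4
200    5
200    6
200    7
"""

# A regex separator requires the python parsing engine.
# "NaN" is recognised as a missing value by default.
df1 = pd.read_csv(io.StringIO(df1_text), sep=r"\s{2,}", engine="python")
df2 = pd.read_csv(io.StringIO(df2_text), sep=r"\s{2,}", engine="python")
```

With this variant ID and Value in df1 come out as floats and Score as integers, so the merge key has the same numeric dtype in both frames.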
And my solution is:
"""
the general first reaction is to pd.merge().
however the hurdle is, how to deal with the fillna of the column "ID".
mine works, but it is too hard coded.
"""
df = pd.merge(left=df1, right=df2, on='Score', how='left')
df['ID'] = df['ID_x'].fillna(df['ID_y'])
finalresult = df.drop(columns=['ID_x', 'ID_y']).drop_duplicates(subset=['ID','Name'])
Output:
Value Name Score ID
0 10 A -1 -1
1 5 B -1 -1
2 0.2 Track C 100 1
3 0.2 Track C 100 2
4 0.2 Track C 100 3
5 0.2 Track C 100 4
6 0.5 Track C 200 5
7 0.5 Track C 200 6
8 0.5 Track C 200 7
9 0 D 100 1
13 0 D 200 5
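One way to make the merge-then-fillna idea above less hard coded is to wrap it in a small function and to drop the extra copies by original row position instead of by ['ID', 'Name']. This is only a sketch under assumptions: the function name fill_ids and its parameters are invented here, the data is rebuilt with numeric dtypes, and df1 is assumed to have a default unnamed index and no column called "index".

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})


def fill_ids(left, right, key="Score", target="ID"):
    """Hypothetical helper: fill missing `target` values in `left` from `right`."""
    new_col = target + "_new"
    # reset_index() stores the original row position in a column named "index"
    merged = left.reset_index().merge(
        right, on=key, how="left", suffixes=("", "_new")
    )
    was_missing = merged[target].isna()
    merged[target] = merged[target].fillna(merged[new_col])
    # Rows that already had an ID were expanded by the merge as well;
    # keep only the first copy of each of those.
    keep = was_missing | ~merged.duplicated(subset="index")
    return (
        merged.loc[keep]
        .drop(columns=["index", new_col])
        .reset_index(drop=True)
    )


result = fill_ids(df1, df2)
print(result)
```

Because duplicates are detected by the original row position, two different rows that happen to share the same ID and Name are not collapsed into one.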
You can first use pandas.merge, then use pandas.concat to concatenate both dataframes along axis=0:
# df1 is the question's original dataframe
s = (
    pd.merge(df2, df1, on='Score', how='left', suffixes=['', '_2'])
    .drop('ID_2', axis=1)
    .drop_duplicates('ID')
)
df3 = pd.concat([df1.dropna(), s], ignore_index=True)
Output:
print(df3)
ID Name Score Value
0 -1.0 A -1 10.0
1 -1.0 B -1 5.0
2 1.0 D 100 0.0
3 5.0 D 200 0.0
4 1.0 Track C 100 0.2
5 2.0 Track C 100 0.2
6 3.0 Track C 100 0.2
7 4.0 Track C 100 0.2
8 5.0 Track C 200 0.5
9 6.0 Track C 200 0.5
10 7.0 Track C 200 0.5
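Note that drop_duplicates('ID') keeps the first df1 match for each df2 row, which is the Track C row here only because it comes before D in df1. A variant that does not depend on row order could filter df1 down to the rows with a missing ID before merging. The following is a sketch with the data rebuilt using numeric dtypes; the names missing and filled are made up for the example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

# Only the rows whose ID is missing take part in the merge,
# so there is nothing to de-duplicate afterwards.
missing = df1[df1["ID"].isna()].drop(columns="ID")
filled = df2.merge(missing, on="Score", how="inner")

# Rows that already had an ID come first, followed by the filled rows.
df3 = pd.concat([df1.dropna(subset=["ID"]), filled], ignore_index=True)
print(df3)
```

As in the answer above, the filled rows end up after the untouched ones rather than in their original position.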
Split your df into the rows with a missing ID and the rest, then use merge and concat to put them back together:
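# Rows with a missing ID and rows that already have one
df1_1 = df1.loc[df1.ID.isnull()].copy()
df1_2 = df1.loc[df1.ID.notnull()].copy()
# Keep the original row label through the merge so the rows can be sorted back into place
df1_1 = df1_1.reset_index().drop(columns='ID').merge(df2, on='Score', how='left').set_index('index')
yourdf = pd.concat([df1_1, df1_2], sort=False).sort_index()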
yourdf
Out[645]:
Value Name Score ID
0 10.0 A -1 -1.0
1 5.0 B -1 -1.0
2 0.2 TrackC 100 1.0
2 0.2 TrackC 100 2.0
2 0.2 TrackC 100 3.0
2 0.2 TrackC 100 4.0
3 0.5 TrackC 200 5.0
3 0.5 TrackC 200 6.0
3 0.5 TrackC 200 7.0
4 0.0 D 100 1.0
5 0.0 D 200 5.0
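The result above keeps repeated index labels (2 and 3) and puts ID last. If a clean 0..n-1 index and the original column order are wanted, a sketch along the same lines could look like this. It assumes numeric data rebuilt here for the example, and it uses a stable sort so that the ID order inside each expanded row is preserved; the names has_id and expanded are made up.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

has_id = df1["ID"].notna()

# Expand the rows without an ID, remembering their original row label.
expanded = (
    df1.loc[~has_id]
    .drop(columns="ID")
    .reset_index()                       # original label goes into column "index"
    .merge(df2, on="Score", how="left")
    .set_index("index")
)

result = (
    pd.concat([expanded, df1.loc[has_id]], sort=False)
    .sort_index(kind="mergesort")        # stable: ties keep their current order
    .reset_index(drop=True)              # replace repeated labels with 0..n-1
    .reindex(columns=df1.columns)        # restore ID, Value, Name, Score order
)
print(result)
```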