How to fill empty values in a dataframe based on columns in another dataframe?

I have a dataframe called df1:

ID     Value       Name      Score
-1      10           A         -1
-1       5           B         -1
NaN     0.2       Track C     100
NaN     0.5       Track C     200
1        0           D        100
5        0           D        200

I want to fill the NaN values in column ID using dataframe df2, which holds several IDs per Score, so that each NaN row expands into one row per matching ID.

df2:

Score    ID
100      1
100      2
100      3
100      4
200      5
200      6
200      7

Ultimately, my final dataframe df3 should look like this:

ID     Value       Name      Score
-1      10           A         -1
-1       5           B         -1
1       0.2       Track C     100
2       0.2       Track C     100
3       0.2       Track C     100
4       0.2       Track C     100
5       0.5       Track C     200
6       0.5       Track C     200
7       0.5       Track C     200
1        0           D        100
5        0           D        200

How could I accomplish this?

I have a solution, but it is not elegant; I would appreciate experienced users taking a look at it.

To make it easier for others, here is the code to set up the test case:

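import re

import numpy as np
import pandas as pd

# Build df1 from whitespace-aligned text. Columns are separated by 2+ spaces,
# so "Track C" (single space) stays in one cell. All values are read as strings.
df1 = pd.DataFrame(
    columns='ID     Value       Name      Score'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
"""
-1      10           A         -1
-1       5           B         -1
NaN     0.2       Track C     100
NaN     0.5       Track C     200
1        0           D        100
5        0           D        200
""".strip().split('\n')
    ],
)

# Turn the literal string 'NaN' into a real missing value.
df1 = df1.replace({'NaN': np.nan})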

# Build df2 the same way (values are strings here as well).
df2 = pd.DataFrame(
    columns='Score    ID'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
"""
100      1
100      2
100      3
100      4
200      5
200      6
200      7
""".strip().split('\n')
    ],
)
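Note that the setup above leaves every column as strings (object dtype), while the outputs shown in the answers are numeric. A minimal sketch of building the same two frames with numeric dtypes directly, assuming numeric columns are what is wanted (the dict-based construction is not from the original post):

```python
import numpy as np
import pandas as pd

# Same data as the text-based setup, but with numeric dtypes from the start.
df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})

df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})
```

With this setup the merge key Score is an integer column in both frames, so the merges match on numbers rather than on strings.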

And my solution is:

# The obvious first step is pd.merge().
# The hurdle is how to fill the NaN values in column "ID" afterwards.
# My version works, but it is too hard-coded.

df = pd.merge(left=df1, right=df2, on='Score', how='left')

df['ID'] = df['ID_x'].fillna(df['ID_y'])

finalresult = df.drop(columns=['ID_x', 'ID_y']).drop_duplicates(subset=['ID','Name'])

Output:

   Value     Name Score  ID
0     10        A    -1  -1
1      5        B    -1  -1
2    0.2  Track C   100   1
3    0.2  Track C   100   2
4    0.2  Track C   100   3
5    0.2  Track C   100   4
6    0.5  Track C   200   5
7    0.5  Track C   200   6
8    0.5  Track C   200   7
9      0        D   100   1
13     0        D   200   5

You can first use pandas.merge, then use pandas.concat to concatenate both dataframes along axis=0:

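# df1 is the question's first dataframe (the answer originally called it df).
# Merge from df2 so every ID in df2 yields one row, drop df1's own ID column,
# and keep the first match per ID.
s = pd.merge(df2, df1, on='Score', how='left', suffixes=('', '_2'))\
      .drop('ID_2', axis=1)\
      .drop_duplicates('ID')

# Rows of df1 that already had an ID, followed by the expanded rows.
df3 = pd.concat([df1.dropna(), s], ignore_index=True)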

Output:

print(df3)
     ID     Name  Score  Value
0  -1.0        A     -1   10.0
1  -1.0        B     -1    5.0
2   1.0        D    100    0.0
3   5.0        D    200    0.0
4   1.0  Track C    100    0.2
5   2.0  Track C    100    0.2
6   3.0  Track C    100    0.2
7   4.0  Track C    100    0.2
8   5.0  Track C    200    0.5
9   6.0  Track C    200    0.5
10  7.0  Track C    200    0.5

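Split your dataframe, merge the part with missing IDs against df2, then concat the pieces back together:

# Rows with a missing ID and rows that already have one.
df1_1=df1.loc[df1.ID.isnull()].copy()
df1_2=df1.loc[df1.ID.notnull()].copy()
# Drop the empty ID column, pull every matching ID from df2, and restore
# the original row labels so the result can be sorted back into place.
df1_1=df1_1.reset_index().drop(columns='ID').merge(df2,on='Score',how='left').set_index('index')

yourdf=pd.concat([df1_1,df1_2],sort=False).sort_index()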
yourdf
Out[645]: 
   Value    Name  Score   ID
0   10.0       A     -1 -1.0
1    5.0       B     -1 -1.0
2    0.2  TrackC    100  1.0
2    0.2  TrackC    100  2.0
2    0.2  TrackC    100  3.0
2    0.2  TrackC    100  4.0
3    0.5  TrackC    200  5.0
3    0.5  TrackC    200  6.0
3    0.5  TrackC    200  7.0
4    0.0       D    100  1.0
5    0.0       D    200  5.0
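The split, merge and concat steps above can be wrapped into a reusable helper. This is only a sketch under stated assumptions: the function name fill_ids and its parameters key and target are hypothetical, and the frames are built with numeric dtypes rather than with the question's string-based setup.

```python
import numpy as np
import pandas as pd


def fill_ids(df1, df2, key="Score", target="ID"):
    """Expand rows of df1 whose `target` is NaN into one row per match in df2.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    missing = df1[target].isna()
    # Rows to fill: drop the empty column, then pull every matching value from df2.
    filled = (
        df1.loc[missing]
        .reset_index()              # keep the original row label in column "index"
        .drop(columns=target)
        .merge(df2, on=key, how="left")
        .set_index("index")
    )
    # Put the expanded rows back at their original positions (stable sort keeps
    # the df2 order within each expanded group), then restore the column order.
    out = pd.concat([filled, df1.loc[~missing]], sort=False).sort_index(kind="mergesort")
    return out[df1.columns].reset_index(drop=True)


df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

result = fill_ids(df1, df2)
print(result)
```

Because only the NaN rows are merged, rows that already carry an ID (such as the two D rows) are never duplicated, so no drop_duplicates step is needed. A NaN row whose Score has no match in df2 stays in the result with ID still NaN.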
