Fill values in a dataframe based on data from another one
I have a dataframe like this:
ID 2018-01 2018-02 2018-03 2018-04
A1 8500 8500 8500 8500
A2 NA 1900 1900 1900
A3 NA NA NA 3000
A4 NA NA NA 0
I also have another dataframe that I want to use to fill the NA values:
ID Date Due
A1 2018-01 8500
A2 2018-01 9000
A3 2018-02 4000
A4 2018-01 1000
For each ID, starting at the month given in Date and continuing up to the next non-NA value in the first dataframe, I want to fill the cells with the value from the Due column. So the result is:
ID 2018-01 2018-02 2018-03 2018-04
A1 8500 8500 8500 8500
A2 9000 1900 1900 1900
A3 NA 4000 4000 3000
A4 1000 1000 1000 0
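To try the answers, the sample frames from the question can be built as below. This is a sketch that assumes the month headers in the first dataframe and the Date values in the second are plain strings such as '2018-01'; if one side were parsed as datetimes, the label alignment that the answers rely on would not match.

```python
import numpy as np
import pandas as pd

# First dataframe: one row per ID, one column per month (string labels)
df1 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4"],
        "2018-01": [8500, np.nan, np.nan, np.nan],
        "2018-02": [8500, 1900, np.nan, np.nan],
        "2018-03": [8500, 1900, np.nan, np.nan],
        "2018-04": [8500, 1900, 3000, 0],
    }
)

# Second dataframe: the month each Due value starts from
df2 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4"],
        "Date": ["2018-01", "2018-01", "2018-02", "2018-01"],
        "Due": [8500, 9000, 4000, 1000],
    }
)

# Label alignment between the frames needs every Date to be a df1 column label
print(set(df2["Date"]) <= set(df1.columns))
```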
How could I do that?
EDIT: There is also a case where a row has no pre-populated values at all:
ID 2018-01 2018-02 2018-03 2018-04
A1 8500 8500 8500 8500
A2 NA 1900 1900 1900
A3 NA NA NA 3000
A4 NA NA NA 0
A5 NA NA NA NA
ID Date Due
A1 2018-01 8500
A2 2018-01 9000
A3 2018-02 4000
A4 2018-01 1000
A5 2018-03 1500
In such a case, is it possible to put the corresponding value only in the one column matching the date, without filling it all the way?
So the expected result would be:
ID 2018-01 2018-02 2018-03 2018-04
A1 8500 8500 8500 8500
A2 9000 1900 1900 1900
A3 NA 4000 4000 3000
A4 1000 1000 1000 0
A5 NA NA 1500 NA
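The fill rule, including the EDIT exception, can be written as a plain row-by-row loop, which is handy for checking the expected table. This is a sketch for clarity rather than speed; fill_from_due is a hypothetical helper name, and it assumes every Date value is one of the first dataframe's column labels (otherwise list.index raises ValueError).

```python
import numpy as np
import pandas as pd

months = ["2018-01", "2018-02", "2018-03", "2018-04"]
df1 = pd.DataFrame(
    [
        [8500, 8500, 8500, 8500],
        [np.nan, 1900, 1900, 1900],
        [np.nan, np.nan, np.nan, 3000],
        [np.nan, np.nan, np.nan, 0],
        [np.nan, np.nan, np.nan, np.nan],
    ],
    index=pd.Index(["A1", "A2", "A3", "A4", "A5"], name="ID"),
    columns=months,
)
df2 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4", "A5"],
        "Date": ["2018-01", "2018-01", "2018-02", "2018-01", "2018-03"],
        "Due": [8500, 9000, 4000, 1000, 1500],
    }
)


def fill_from_due(df1, df2):
    """Hypothetical helper: apply the question's fill rule row by row."""
    out = df1.copy()
    cols = list(out.columns)
    for row_id, date, due in df2[["ID", "Date", "Due"]].itertuples(index=False):
        row = df1.loc[row_id]
        if row.isna().all():
            # EDIT case: nothing pre-populated, so only the Due month is set
            out.loc[row_id, date] = due
            continue
        # Otherwise fill NaN cells from the Due month up to the first non-NA value
        for col in cols[cols.index(date):]:
            if pd.notna(row[col]):
                break
            out.loc[row_id, col] = due
    return out


print(fill_from_due(df1, df2))
```

Rows that already have values stop at the first non-NA cell, so A3 keeps its 3000 in 2018-04 and A1 is left untouched.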
If ID is a column in df1, use DataFrame.pivot, then forward-fill the missing values, and finally replace the missing values in df1 with DataFrame.fillna or DataFrame.combine_first:
# pandas 2.0+ requires keyword arguments for DataFrame.pivot
df = df1.set_index('ID').fillna(df2.pivot(index='ID', columns='Date', values='Due').ffill(axis=1))
print(df)
2018-01 2018-02 2018-03 2018-04
ID
A1 8500.0 8500.0 8500.0 8500.0
A2 9000.0 1900.0 1900.0 1900.0
A3 NaN 4000.0 4000.0 3000.0
A4 1000.0 1000.0 1000.0 0.0
A5 NaN NaN 1500.0 NaN
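An end-to-end run of the pivot approach on the EDIT data, also showing the combine_first alternative mentioned above. This is a sketch assuming string month labels; the two results should agree here because both take values from the pivoted frame only where df1 is NaN. A5 stays NaN in 2018-04 only because no Due month reaches that column in df2, so the pivoted frame has no 2018-04 column at all.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4", "A5"],
        "2018-01": [8500, np.nan, np.nan, np.nan, np.nan],
        "2018-02": [8500, 1900, np.nan, np.nan, np.nan],
        "2018-03": [8500, 1900, np.nan, np.nan, np.nan],
        "2018-04": [8500, 1900, 3000, 0, np.nan],
    }
)
df2 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4", "A5"],
        "Date": ["2018-01", "2018-01", "2018-02", "2018-01", "2018-03"],
        "Due": [8500, 9000, 4000, 1000, 1500],
    }
)

# One row per ID, one column per Due month, then carry each Due value rightwards
filler = df2.pivot(index="ID", columns="Date", values="Due").ffill(axis=1)

base = df1.set_index("ID")
via_fillna = base.fillna(filler)          # fills only the NaN cells of base
via_combine = base.combine_first(filler)  # same idea; result uses the union of labels

print(via_fillna)
```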
Using pd.crosstab and DataFrame.update:
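Since you want to fill NaN values in one dataframe from another, we can use DataFrame.update, but first we set the right index, because update aligns on the index and columns. Note that update copies every non-NaN value from the other dataframe by default (overwrite=True); pass overwrite=False to change only the cells that are NaN in df1. The output below is for the original data without A5: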
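import pandas as pd
df1 = df1.set_index('ID')
# update() aligns on index and columns and skips NaN cells of the crosstab
df1.update(pd.crosstab(df2['ID'], df2['Date'], df2['Due'], aggfunc='sum'))
df1 = df1.ffill(axis=1)
print(df1)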
2018-01 2018-02 2018-03 2018-04
ID
A1 8500.0 8500.0 8500.0 8500.0
A2 9000.0 1900.0 1900.0 1900.0
A3 NaN 4000.0 4000.0 3000.0
A4 1000.0 1000.0 1000.0 0.0
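On the EDIT data the update-plus-ffill approach differs for A5: the forward fill carries 1500 into 2018-04, which the EDIT asks to avoid. One possible adjustment, assuming "no pre-populated values" means the row is entirely NaN before the update, is to skip the forward fill for those rows. The names empty_rows, plain and adjusted are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4", "A5"],
        "2018-01": [8500, np.nan, np.nan, np.nan, np.nan],
        "2018-02": [8500, 1900, np.nan, np.nan, np.nan],
        "2018-03": [8500, 1900, np.nan, np.nan, np.nan],
        "2018-04": [8500, 1900, 3000, 0, np.nan],
    }
)
df2 = pd.DataFrame(
    {
        "ID": ["A1", "A2", "A3", "A4", "A5"],
        "Date": ["2018-01", "2018-01", "2018-02", "2018-01", "2018-03"],
        "Due": [8500, 9000, 4000, 1000, 1500],
    }
)

df1 = df1.set_index("ID")
# Remember which rows had no pre-populated values at all (the EDIT case)
empty_rows = df1.isna().all(axis=1)

df1.update(pd.crosstab(df2["ID"], df2["Date"], df2["Due"], aggfunc="sum"))
plain = df1.ffill(axis=1)
print(plain.loc["A5"].tolist())  # A5 gets 1500 in both 2018-03 and 2018-04

# Assumed adjustment: keep the un-filled values for rows that started out empty.
# plain and df1 share the same row and column order, so raw values line up.
adjusted = plain.copy()
adjusted.loc[empty_rows, :] = df1.loc[empty_rows, :].to_numpy()
print(adjusted)
```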