How do I fill in missing values from another DataFrame by matching on date, when the two DataFrames have different shapes?
I am trying to fill missing numeric values (scores) by conditionally matching the 'date' objects in two datasets. The challenge I am facing is that the datasets are not the same shape: the df I am filling has 17k rows, but the df I am filling from has only 1,500, and I keep receiving the error "ValueError: Can only compare identically-labeled Series objects".
Obviously I am using the comparison operator '==', and according to the error message that's where I am going wrong, but I can't figure out how to work around it. Does anyone have any suggestions?
df1['score'] = df1.where(df1.Date == df2.Date, df1['score'].fillna(df2['score'], inplace=True) )
df1 looks like:

|Index|Attrib1|Attrib2|Date|
|-----|-------|-------|----|
|0|123|98|2022-01-31T00:00:00.000Z|
|1|456|56|2022-01-30T00:00:00.000Z|
|2|8901|456|2022-01-29T00:00:00.000Z|
|3|566|456|2022-01-28T00:00:00.000Z|
|4|12|987|2022-01-30T00:00:00.000Z|
|5|354|00|2022-01-29T00:00:00.000Z|
|6|25|915|2022-01-28T00:00:00.000Z|
df2 looks like:

|Index|score|Date|
|-----|-----|----|
|0|50|2022-01-31T00:00:00.000Z|
|1|12|2022-01-30T00:00:00.000Z|
|2|78|2022-01-29T00:00:00.000Z|
|3|25|2022-01-28T00:00:00.000Z|
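The error can be reproduced in isolation. In pandas, `==` between two Series first checks that both have exactly the same index labels, so frames of different lengths fail before any values are compared. The sketch below uses small hypothetical frames (not the asker's real data) to show this, and also shows that even equal-length Series are paired row by row (by label), not by matching date values, which is why `==` is the wrong tool for this lookup.

```python
import pandas as pd

# Hypothetical miniature frames: df1 has 3 rows, df2 has 2 (different shapes)
df1 = pd.DataFrame({"Date": ["2022-01-31T00:00:00.000Z",
                             "2022-01-30T00:00:00.000Z",
                             "2022-01-29T00:00:00.000Z"]})
df2 = pd.DataFrame({"Date": ["2022-01-31T00:00:00.000Z",
                             "2022-01-30T00:00:00.000Z"]})

error_message = None
try:
    # Index labels are 0..2 versus 0..1, so pandas refuses to compare
    df1.Date == df2.Date
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Even with equal lengths, == pairs rows by index label, not by date value
a = pd.Series(["2022-01-31", "2022-01-30"])
b = pd.Series(["2022-01-30", "2022-01-31"])
same_rows = (a == b).tolist()  # both Series hold the same dates, yet no row matches
print(same_rows)
```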
You first need to make the Date column the index of each DataFrame; then fillna should work, because it matches the two score columns by index label (the date) rather than by row position:
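df1 = df1.set_index('Date')
df2 = df2.set_index('Date')
# Assign the result back: fillna(inplace=True) on df1['score'] is chained assignment and may not update df1 in newer pandas
df1['score'] = df1['score'].fillna(df2['score'])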
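A runnable sketch of the index-alignment approach on the sample data from the question, under a few assumptions: the posted df1 table has no `score` column, so one is added here as all-NaN (hypothetical); the dates are kept as identical strings in both frames (if one side were parsed with `pd.to_datetime` and the other not, no keys would match); and df2's dates are unique, which both lookups below rely on. Duplicate dates in df1 are fine. The second variant uses `Series.map` as an alternative that keeps df1's original row index instead of moving Date into the index.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the all-NaN "score" column is a hypothetical addition
df1 = pd.DataFrame({
    "Attrib1": [123, 456, 8901, 566, 12, 354, 25],
    "Attrib2": [98, 56, 456, 456, 987, 0, 915],
    "Date": ["2022-01-31T00:00:00.000Z", "2022-01-30T00:00:00.000Z",
             "2022-01-29T00:00:00.000Z", "2022-01-28T00:00:00.000Z",
             "2022-01-30T00:00:00.000Z", "2022-01-29T00:00:00.000Z",
             "2022-01-28T00:00:00.000Z"],
    "score": [np.nan] * 7,
})
df2 = pd.DataFrame({
    "score": [50, 12, 78, 25],
    "Date": ["2022-01-31T00:00:00.000Z", "2022-01-30T00:00:00.000Z",
             "2022-01-29T00:00:00.000Z", "2022-01-28T00:00:00.000Z"],
})

# Approach from the answer: with Date as the index, fillna aligns by date label
filled = df1.set_index("Date")
filled["score"] = filled["score"].fillna(df2.set_index("Date")["score"])
filled = filled.reset_index()  # move Date back into a regular column

# Alternative: keep df1's RangeIndex and look up each row's Date in df2
lookup = df2.set_index("Date")["score"]
df1["score"] = df1["score"].fillna(df1["Date"].map(lookup))

print(filled["score"].tolist())
print(df1["score"].tolist())
```

Both variants only touch rows whose score is missing; rows that already have a score keep it, and dates with no match in df2 stay NaN.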
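Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.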