
How do I fill in missing values using another dataframe based on a date condition when the two dataframes have different shapes?

I am trying to fill missing numeric values (scores) by conditionally matching 'date' objects in two datasets. The challenge I am facing is that the datasets are not the same shape. I have 17k rows in the df I am attempting to fill to, but only 1500 in the df I am attempting to fill from, and I keep receiving the error "ValueError: Can only compare identically-labeled Series objects".

Obviously I am using the comparison operator '==', and according to the error message that's where I am going wrong, but I can't figure out how to work around that. Does anyone have any suggestions?

df1['score'] = df1.where(df1.Date == df2.Date, df1['score'].fillna(df2['score'], inplace=True) )

df1 looks like:

|Index|Attrib1|Attrib2|Date|
|-----|-------|-------|----|
|0 | 123 | 98 |2022-01-31T00:00:00.000Z|
|1 | 456 | 56 |2022-01-30T00:00:00.000Z|
|2 | 8901 |456 |2022-01-29T00:00:00.000Z|
|3 | 566 |456 |2022-01-28T00:00:00.000Z|
|4 | 12 |987 |2022-01-30T00:00:00.000Z|
|5 | 354 |00 |2022-01-29T00:00:00.000Z|
|6 | 25 |915 |2022-01-28T00:00:00.000Z|

df2 looks like:

|Index|score|Date|
|-----|-----|----|
|0 | 50 |2022-01-31T00:00:00.000Z|
|1 | 12 |2022-01-30T00:00:00.000Z|
|2 | 78 |2022-01-29T00:00:00.000Z|
|3 | 25 |2022-01-28T00:00:00.000Z|

You first need to make the date column the index of each DataFrame. The fillna should then work, because fillna with a Series aligns the values on their index labels rather than on row position:

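# Use Date as the index so the two frames can be aligned by date
df1 = df1.set_index('Date')
df2 = df2.set_index('Date')

# Fill missing scores from df2; assign the result back instead of
# calling fillna(..., inplace=True) on a column selection
df1['score'] = df1['score'].fillna(df2['score'])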
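The approach above replaces df1's row index with Date. If the original index should be kept, one possible alternative is to look up each df1 date in df2 with `Series.map` and fill from the result. The sketch below is only an illustration: the sample values are made up, `lookup` is a name introduced here, and it assumes each date appears at most once in df2 and that both Date columns use the same format.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: df1 has missing scores and repeated dates,
# df2 holds exactly one score per date.
df1 = pd.DataFrame({
    "Date": ["2022-01-31", "2022-01-30", "2022-01-29", "2022-01-30"],
    "score": [np.nan, 99.0, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "Date": ["2022-01-31", "2022-01-30", "2022-01-29"],
    "score": [50.0, 12.0, 78.0],
})

# Build a Date -> score lookup from df2 (assumes unique dates in df2).
lookup = df2.set_index("Date")["score"]

# Map every df1 date to its df2 score, then use those values only where
# df1's own score is missing. Existing scores are left untouched.
df1["score"] = df1["score"].fillna(df1["Date"].map(lookup))

print(df1)
```

Because the mapped Series shares df1's index, the difference in row counts between the two frames no longer matters. Dates in df1 that have no match in df2 simply stay missing.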
