Replace missing values in Pandas with previous value if not NaN
I need your help with the following code. I have df1 with exchange rate and date columns that I'm trying to merge with df2. df1 has missing values for the exchange rates (on weekends and holidays). For those dates I want to use the last available value (for example, if the exchange rate for 2019-05-01 is NaN, I want it to use the 2019-04-01 exchange rate value). I've tried two options without success.
Here are both dataframes (if you copy and paste them you get an error that the name Timestamp is not recognized; I couldn't get plain date values to paste here, since I always got the dates as Timestamp objects). I hope you can help me solve it both ways, since I'm sure it will be useful to know.
df1={'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
1: Timestamp('2019-01-02 00:00:00'),
2: Timestamp('2019-01-03 00:00:00'),
3: Timestamp('2019-01-04 00:00:00'),
4: Timestamp('2019-01-05 00:00:00'),
5: Timestamp('2019-01-06 00:00:00'),
6: Timestamp('2019-01-07 00:00:00'),
7: Timestamp('2019-01-08 00:00:00'),
8: Timestamp('2019-01-09 00:00:00'),
9: Timestamp('2019-01-10 00:00:00')},
'ER': {0: nan,
1: 19.1098,
2: 19.2978,
3: 19.2169,
4: nan,
5: nan,
6: 19.076,
7: 19.1627,
8: nan,
9: 19.7792}}
df2={'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
1: Timestamp('2019-01-02 00:00:00'),
2: Timestamp('2019-01-03 00:00:00'),
3: Timestamp('2019-01-04 00:00:00'),
4: Timestamp('2019-01-05 00:00:00'),
5: Timestamp('2019-01-06 00:00:00'),
6: Timestamp('2019-01-07 00:00:00'),
7: Timestamp('2019-01-08 00:00:00'),
8: Timestamp('2019-01-09 00:00:00'),
9: Timestamp('2019-01-10 00:00:00')},
'letters': {0: "a",
1: "b",
2: "c",
3: "d",
4: "e",
5: "f",
6: "g",
7: "h",
8: "i",
9: "j"}}
Thanks a lot!
I don't think you need a lambda (as you mentioned in the comments). What you're trying to achieve can be done with the .ffill method:
>>> df1["ER"].ffill()
0 NaN
1 19.1098
2 19.2978
3 19.2169
4 19.2169
5 19.2169
6 19.0760
7 19.1627
8 19.1627
9 19.7792
Name: ER, dtype: float64
To merge the two dataframes, use pd.merge:
>>> df1["ER"] = df1["ER"].ffill()  # assign back; inplace=True on a selected column is unreliable in recent pandas
>>> pd.merge(df1, df2, on="Fecha")
Fecha ER letters
0 2019-01-01 NaN a
1 2019-01-02 19.1098 b
2 2019-01-03 19.2978 c
3 2019-01-04 19.2169 d
4 2019-01-05 19.2169 e
5 2019-01-06 19.2169 f
6 2019-01-07 19.0760 g
7 2019-01-08 19.1627 h
8 2019-01-09 19.1627 i
9 2019-01-10 19.7792 j
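For anyone who wants to run the two steps end to end, here is a minimal sketch. It assumes df1 and df2 are DataFrames holding the same values as the dictionaries in the question; building the dates with pd.date_range is just a shortcut for this example, not something the question does.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (assumption: same values, dates generated with date_range)
dates = pd.date_range("2019-01-01", periods=10, freq="D")
df1 = pd.DataFrame({
    "Fecha": dates,
    "ER": [np.nan, 19.1098, 19.2978, 19.2169, np.nan,
           np.nan, 19.076, 19.1627, np.nan, 19.7792],
})
df2 = pd.DataFrame({"Fecha": dates, "letters": list("abcdefghij")})

# Forward-fill: each NaN takes the last non-missing value above it
df1["ER"] = df1["ER"].ffill()

# Join on the shared date column
merged = pd.merge(df1, df2, on="Fecha")
print(merged)
```

Note that the first row stays NaN, because there is no earlier value to carry forward. If that matters, you would need to decide separately how to fill it.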
Just for general knowledge: your example data raises an error because the names 'Timestamp' and 'nan' are not recognized. To make this dataset usable, you just have to add the pandas or pd prefix before Timestamp:
pd.Timestamp('2019-01-06 00:00:00')
And to indicate null values, you could use:
# First option - pandas system
import pandas as pd
{0: pd.NA}
# Second option - numpy system
import numpy as np
{0: np.nan}
# Third option - Pure python
{0: None}
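An alternative, if you would rather paste the dictionary unchanged, is to import the two names directly so that Timestamp and nan resolve as written. This is only a sketch using the first three rows of the question's data; the variable names raw and df are made up for the example.

```python
import pandas as pd
from numpy import nan          # makes the bare name `nan` available
from pandas import Timestamp   # makes the bare name `Timestamp` available

# First three rows of the question's df1, pasted as-is
raw = {'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
                 1: Timestamp('2019-01-02 00:00:00'),
                 2: Timestamp('2019-01-03 00:00:00')},
       'ER': {0: nan,
              1: 19.1098,
              2: 19.2978}}

df = pd.DataFrame(raw)
print(df)
```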
I found a way to achieve this using the pd.merge_asof() function. If it doesn't find the key value to merge on, it gives you the previous one. Sorting is crucial, though: both dataframes must be sorted by the key.
It works just like Excel's LOOKUP (not VLOOKUP, but LOOKUP, without the V or the H).
Thanks, everyone!
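Here is a rough sketch of how that could look with the question's data. The names rates, letters and known are made up for the example, and dropping the NaN rows before the merge is my assumption about how the lookup table would be prepared.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-01-01", periods=10, freq="D")
rates = pd.DataFrame({
    "Fecha": dates,
    "ER": [np.nan, 19.1098, 19.2978, 19.2169, np.nan,
           np.nan, 19.076, 19.1627, np.nan, 19.7792],
})
letters = pd.DataFrame({"Fecha": dates, "letters": list("abcdefghij")})

# Keep only the dates that actually have a rate, sorted by the key
known = rates.dropna(subset=["ER"]).sort_values("Fecha")

# For each date on the left, take the rate from the latest date <= it on the right
# (direction="backward" is the default)
result = pd.merge_asof(letters.sort_values("Fecha"), known, on="Fecha")
print(result)
```

As with ffill, 2019-01-01 ends up NaN because there is no earlier rate to fall back on.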