Replace missing values in Pandas with previous value if not NAN

I need help with the following code. I have df1 with Exchange Rate and Date columns that I'm trying to merge with df2. df1 has missing values for the exchange rates (on weekends and holidays). For those dates I want to use the last available value (for example, if the exchange rate for 2019-05-01 is NaN, I want it to use the 2019-04-01 exchange rate value). I've tried two options without success:

  1. Eliminate the NaN values from df1 and somehow tell merge to use the last available value when it doesn't find the date (because we eliminated it).
  2. Fill the df1 NaN values with the last available value.

Here are both dataframes (if you copy and paste them you get an error that the name Timestamp is not recognized; I couldn't paste the date values in any other form, since they always came out as Timestamp objects). I hope you can help me solve it both ways, since I'm sure it will be useful to know.

df1={'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
  1: Timestamp('2019-01-02 00:00:00'),
  2: Timestamp('2019-01-03 00:00:00'),
  3: Timestamp('2019-01-04 00:00:00'),
  4: Timestamp('2019-01-05 00:00:00'),
  5: Timestamp('2019-01-06 00:00:00'),
  6: Timestamp('2019-01-07 00:00:00'),
  7: Timestamp('2019-01-08 00:00:00'),
  8: Timestamp('2019-01-09 00:00:00'),
  9: Timestamp('2019-01-10 00:00:00')},
 'ER': {0: nan,
  1: 19.1098,
  2: 19.2978,
  3: 19.2169,
  4: nan,
  5: nan,
  6: 19.076,
  7: 19.1627,
  8: nan,
  9: 19.7792}}



df2={'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
  1: Timestamp('2019-01-02 00:00:00'),
  2: Timestamp('2019-01-03 00:00:00'),
  3: Timestamp('2019-01-04 00:00:00'),
  4: Timestamp('2019-01-05 00:00:00'),
  5: Timestamp('2019-01-06 00:00:00'),
  6: Timestamp('2019-01-07 00:00:00'),
  7: Timestamp('2019-01-08 00:00:00'),
  8: Timestamp('2019-01-09 00:00:00'),
  9: Timestamp('2019-01-10 00:00:00')},
 'letters': {0: "a",
  1: "b",
  2: "c",
  3: "d",
  4: "e",
  5: "f",
  6: "g",
  7: "h",
  8: "i",
  9: "j"}}

Thanks a lot!

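I don't think you need a lambda (as you mentioned in the comments). What you're trying to achieve can be done with the .ffill method, which fills each NaN with the last preceding non-NaN value. The first row stays NaN because nothing precedes it: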

>>> df1["ER"].ffill()
0        NaN
1    19.1098
2    19.2978
3    19.2169
4    19.2169
5    19.2169
6    19.0760
7    19.1627
8    19.1627
9    19.7792
Name: ER, dtype: float64

To merge the two dataframes, use pd.merge:

>>> df1["ER"] = df1["ER"].ffill()  # assign back; inplace=True on a selected column is unreliable in recent pandas
>>> pd.merge(df1, df2, on="Fecha")
       Fecha       ER letters
0 2019-01-01      NaN       a
1 2019-01-02  19.1098       b
2 2019-01-03  19.2978       c
3 2019-01-04  19.2169       d
4 2019-01-05  19.2169       e
5 2019-01-06  19.2169       f
6 2019-01-07  19.0760       g
7 2019-01-08  19.1627       h
8 2019-01-09  19.1627       i
9 2019-01-10  19.7792       j
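
For anyone who wants to run the steps above end to end, here is a minimal sketch. It rebuilds the question's data with pd.date_range and np.nan (my own shortcut, not how the asker created the frames) and assumes both frames share the "Fecha" key:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; date_range is a shortcut chosen for this sketch.
dates = pd.date_range("2019-01-01", periods=10, freq="D")
df1 = pd.DataFrame({
    "Fecha": dates,
    "ER": [np.nan, 19.1098, 19.2978, 19.2169, np.nan,
           np.nan, 19.076, 19.1627, np.nan, 19.7792],
})
df2 = pd.DataFrame({"Fecha": dates, "letters": list("abcdefghij")})

# Forward fill depends on row order, so sort by date first.
df1 = df1.sort_values("Fecha")
# Assign the filled column back instead of filling in place.
df1["ER"] = df1["ER"].ffill()

# Default inner join on the shared date column.
merged = pd.merge(df1, df2, on="Fecha")
print(merged)
```

The first row keeps NaN because there is no earlier rate to carry forward.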

Just for general knowledge: your example data raises an error because 'Timestamp' and 'nan' are not recognized names. To make the dataset usable, prefix Timestamp with pandas or pd:

pd.Timestamp('2019-01-06 00:00:00')

And to indicate null values, you could use:

# First option - pandas system
import pandas as pd
{0: pd.NA}

# Second option - numpy system
import numpy as np
{0: np.nan}

# Third option - pure Python
{0: None}
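
As a rough illustration of the fixes above, the sketch below builds a small frame from a dict shaped like the question's, using pd.Timestamp for the dates and mixing np.nan and None for the gaps. The three-row sample and the variable names are my own, chosen only to keep it short:

```python
import numpy as np
import pandas as pd

# Same nested-dict shape as the question, with the names qualified.
raw = {
    "Fecha": {
        0: pd.Timestamp("2019-01-01 00:00:00"),
        1: pd.Timestamp("2019-01-02 00:00:00"),
        2: pd.Timestamp("2019-01-03 00:00:00"),
    },
    # np.nan and None are both treated as missing by pandas.
    "ER": {0: np.nan, 1: 19.1098, 2: None},
}
df = pd.DataFrame(raw)

# isna() flags both kinds of missing marker.
missing = df["ER"].isna()
print(df)
print(missing.tolist())
```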

I found a way to achieve this using the pd.merge_asof() function. If it doesn't find the key value to merge on, it gives you the previous one. Sorting is crucial, though: both dataframes must be sorted by the key.

It works just like Excel's LOOKUP (not VLOOKUP, but LOOKUP, without the V or the H).
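
A minimal sketch of that approach, assuming the question's data (rebuilt here with pd.date_range and np.nan as a shortcut) and that the NaN rows are dropped from df1 first, as in option 1 of the question:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-01-01", periods=10, freq="D")
df1 = pd.DataFrame({
    "Fecha": dates,
    "ER": [np.nan, 19.1098, 19.2978, 19.2169, np.nan,
           np.nan, 19.076, 19.1627, np.nan, 19.7792],
})
df2 = pd.DataFrame({"Fecha": dates, "letters": list("abcdefghij")})

# Option 1 from the question: drop the dates that have no rate.
rates = df1.dropna(subset=["ER"]).sort_values("Fecha")

# merge_asof requires both frames to be sorted by the key.
# direction="backward" (the default) takes the last rate at or before each date.
result = pd.merge_asof(
    df2.sort_values("Fecha"), rates, on="Fecha", direction="backward"
)
print(result)
```

As with ffill, 2019-01-01 ends up with NaN because no earlier rate exists to match.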

Thanks, everyone!


 