
Replace missing values in Pandas with previous value if not NaN

I need your help with the following code. I have df1, with Exchange Rate and Date columns, that I'm trying to merge with df2. df1 has missing values for the exchange rates (on weekends and holidays). For those dates I want to use the last available value (for example, if the exchange rate for 2019-05-01 is NaN, I want it to use the 2019-04-01 exchange rate value). I've tried two options, unsuccessfully:

  1. Eliminate the NaN values from df1 and somehow tell the merge to take the last available value when it doesn't find the date (because we eliminated it).
  2. Fill the NaN values in df1 with the last available value.

Here are both dataframes (if you copy and paste them, you get an error that the name Timestamp is not recognized; I couldn't get the date values to paste here in another form, since I always got them as Timestamp objects). I hope you can help me solve it both ways, since I'm sure it will be useful to know.

df1={'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
  1: Timestamp('2019-01-02 00:00:00'),
  2: Timestamp('2019-01-03 00:00:00'),
  3: Timestamp('2019-01-04 00:00:00'),
  4: Timestamp('2019-01-05 00:00:00'),
  5: Timestamp('2019-01-06 00:00:00'),
  6: Timestamp('2019-01-07 00:00:00'),
  7: Timestamp('2019-01-08 00:00:00'),
  8: Timestamp('2019-01-09 00:00:00'),
  9: Timestamp('2019-01-10 00:00:00')},
 'ER': {0: nan,
  1: 19.1098,
  2: 19.2978,
  3: 19.2169,
  4: nan,
  5: nan,
  6: 19.076,
  7: 19.1627,
  8: nan,
  9: 19.7792}}



df2={'Fecha': {0: Timestamp('2019-01-01 00:00:00'),
  1: Timestamp('2019-01-02 00:00:00'),
  2: Timestamp('2019-01-03 00:00:00'),
  3: Timestamp('2019-01-04 00:00:00'),
  4: Timestamp('2019-01-05 00:00:00'),
  5: Timestamp('2019-01-06 00:00:00'),
  6: Timestamp('2019-01-07 00:00:00'),
  7: Timestamp('2019-01-08 00:00:00'),
  8: Timestamp('2019-01-09 00:00:00'),
  9: Timestamp('2019-01-10 00:00:00')},
 'letters': {0: "a",
  1: "b",
  2: "c",
  3: "d",
  4: "e",
  5: "f",
  6: "g",
  7: "h",
  8: "i",
  9: "j"}}

Thanks a lot!

I don't think you need a lambda (as you mentioned in the comments). What you're trying to achieve can be done with the .ffill method, which fills each missing value with the last valid value before it:

>>> df1["ER"].ffill()
0        NaN
1    19.1098
2    19.2978
3    19.2169
4    19.2169
5    19.2169
6    19.0760
7    19.1627
8    19.1627
9    19.7792
Name: ER, dtype: float64

To merge the two dataframes, use pd.merge:

>>> df1["ER"] = df1["ER"].ffill()  # assign back; inplace=True on a selected column may not update df1 in recent pandas
>>> pd.merge(df1, df2, on="Fecha")
       Fecha       ER letters
0 2019-01-01      NaN       a
1 2019-01-02  19.1098       b
2 2019-01-03  19.2978       c
3 2019-01-04  19.2169       d
4 2019-01-05  19.2169       e
5 2019-01-06  19.2169       f
6 2019-01-07  19.0760       g
7 2019-01-08  19.1627       h
8 2019-01-09  19.1627       i
9 2019-01-10  19.7792       j
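For a copy-paste version of the two steps above, here is a minimal sketch. It assumes df1 and df2 are DataFrames holding the question's data; the pd.date_range construction is just shorthand for the ten consecutive dates and is not the asker's code. Note that the first row stays NaN, because there is no earlier value to carry forward.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (date_range is shorthand for the ten dates).
dates = pd.date_range("2019-01-01", periods=10, freq="D")
df1 = pd.DataFrame({
    "Fecha": dates,
    "ER": [np.nan, 19.1098, 19.2978, 19.2169, np.nan,
           np.nan, 19.076, 19.1627, np.nan, 19.7792],
})
df2 = pd.DataFrame({"Fecha": dates, "letters": list("abcdefghij")})

# Forward-fill: each NaN takes the last valid value above it.
df1["ER"] = df1["ER"].ffill()

# Plain merge on the shared date column.
merged = pd.merge(df1, df2, on="Fecha")
print(merged)
```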

Just for general knowledge: the data in your example raises an error because 'Timestamp' and 'nan' are not recognized names. To make this dataset usable, you just have to add pandas or pd before Timestamp:

pd.Timestamp('2019-01-06 00:00:00')

And to indicate null values, you could use:

# First option - pandas system
import pandas as pd
{0: pd.NA}

# Second option - numpy system
import numpy as np
{0: np.nan}

# Third option - Pure python
{0: None}
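Putting those two fixes together, a shortened version of the question's df1 can be built as below. This is only a sketch: it keeps the first three rows, and the name df_sample is made up for the illustration.

```python
import numpy as np
import pandas as pd

# Alternative: importing the bare names lets the pasted dict run unchanged.
# from pandas import Timestamp
# from numpy import nan

data = {
    "Fecha": {0: pd.Timestamp("2019-01-01 00:00:00"),
              1: pd.Timestamp("2019-01-02 00:00:00"),
              2: pd.Timestamp("2019-01-03 00:00:00")},
    "ER": {0: np.nan,
           1: 19.1098,
           2: 19.2978},
}

# A dict of {column: {index: value}} converts directly to a DataFrame.
df_sample = pd.DataFrame(data)
print(df_sample.dtypes)
```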

I found a way to achieve this using the pd.merge_asof() function. If it doesn't find the key value to merge on, it gives you the previous one. Sorting is crucial, though: both dataframes must be sorted by the merge key.

It works just like Excel's LOOKUP function (not VLOOKUP, but LOOKUP, without the V or the H).
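A minimal sketch of that approach, assuming the question's data (rebuilt here with pd.date_range as shorthand) and made-up variable names rates and merged_asof. The NaN rows are dropped from df1 first, so merge_asof has to fall back to the most recent earlier date; its default direction is "backward".

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-01-01", periods=10, freq="D")
df1 = pd.DataFrame({
    "Fecha": dates,
    "ER": [np.nan, 19.1098, 19.2978, 19.2169, np.nan,
           np.nan, 19.076, 19.1627, np.nan, 19.7792],
})
df2 = pd.DataFrame({"Fecha": dates, "letters": list("abcdefghij")})

# Keep only the dates that actually have a rate, sorted by the key.
rates = df1.dropna(subset=["ER"]).sort_values("Fecha")

# For each date in df2, take the rate from the last date <= that date.
merged_asof = pd.merge_asof(df2.sort_values("Fecha"), rates, on="Fecha")
print(merged_asof)
```

The first row still has no rate, since no earlier date exists in rates.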

Thanks everyone!

