
Fill NaN from another dataframe based on a column

I have one dataframe with shape (23251, 8) and another dataframe with shape (3652, 14). The DATE column in the first dataframe contains dates from 1955-01-01 to 2020-12-31, and the DATE column in the second contains dates from 2010-01-01 to 2019-12-31. In the first dataframe, most columns contain missing values, some only a few and some many. I want to fill the missing values in the first dataframe, wherever possible, from the second dataframe, matching rows on DATE (that is, where the DATE in the second dataframe equals the DATE in the first).

The first dataframe:

[image]

The second dataframe:

[image]

To make it clear: for the rows of the first dataframe dated from 2010-01-01 to 2019-12-31, I want any NaN in the columns PRCP, TAVG, TMAX and TMIN to be filled with the value from the second dataframe, using the criterion that the DATE of the row matches between the two dataframes.

Without sample data as plain text, it's difficult to help you. Maybe this will work:

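COLS = ['TMIN', 'TMAX']
# Look up df2's values for each DATE in df1, then reset to a 0..n-1 index
# so that fillna lines the rows up with df1 (assumes df1 has a default RangeIndex).
lookup = df2.set_index('DATE').reindex(df1['DATE'])[COLS].reset_index(drop=True)
df1 = df1.fillna(lookup)
print(df1)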

# Output
         DATE NAME  TMIN  TMAX
0  1955-01-01    L  28.0  40.0
1  1955-01-02    L  27.0  41.0
2  1955-01-03    L   NaN   NaN
3  1955-01-01    M  28.0  40.0
4  1955-01-02    M  27.0  41.0
5  1955-01-03    M   NaN   NaN
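One caveat about the one-liner above: `reset_index(drop=True)` lines the lookup up with `df1` only when `df1` has the default 0..n-1 index. The following is a minimal sketch of a variant for a `df1` whose index is something else (for example after filtering). The toy frames and the index labels 10, 20, 30 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the index of df1 is deliberately not 0..n-1.
df1 = pd.DataFrame(
    {"DATE": ["1955-01-01", "1955-01-02", "1955-01-03"],
     "TMIN": [np.nan, 30.0, np.nan]},
    index=[10, 20, 30],
)
df2 = pd.DataFrame({"DATE": ["1955-01-01", "1955-01-02"], "TMIN": [28, 27]})

COLS = ["TMIN"]
# One lookup row per row of df1, in df1's row order.
lookup = df2.set_index("DATE").reindex(df1["DATE"])[COLS]
# Reuse df1's own index so fillna aligns row-for-row.
lookup.index = df1.index

filled = df1.fillna(lookup)
print(filled)
```

Only cells that are NaN in `df1` are touched: the existing 30.0 is kept, and the row with no matching DATE in `df2` stays NaN.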

Setup:

import pandas as pd
import numpy as np

d1 = {'DATE': ['1955-01-01', '1955-01-02', '1955-01-03',
               '1955-01-01', '1955-01-02', '1955-01-03'],
      'NAME': ['L', 'L', 'L', 'M', 'M', 'M'],
      'TMIN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
      'TMAX': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df1 = pd.DataFrame(d1)

d2 = {'DATE': ['1955-01-01', '1955-01-02'], 'TMIN': [28, 27], 'TMAX': [40, 41]}
df2 = pd.DataFrame(d2)

print(df1)
print(df2)

# Output
         DATE NAME  TMIN  TMAX
0  1955-01-01    L   NaN   NaN
1  1955-01-02    L   NaN   NaN
2  1955-01-03    L   NaN   NaN
3  1955-01-01    M   NaN   NaN
4  1955-01-02    M   NaN   NaN
5  1955-01-03    M   NaN   NaN


         DATE  TMIN  TMAX
0  1955-01-01    28    40
1  1955-01-02    27    41
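An alternative that may be easier to read is to fill column by column with `Series.map`. This is only a sketch under some assumptions: the toy data below is invented (the real frames were posted as images), each DATE appears at most once in `df2`, and both DATE columns can be parsed by `pd.to_datetime`. It does not depend on how `df1` is indexed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the two dataframes in the question.
df1 = pd.DataFrame({
    "DATE": ["2009-12-31", "2010-01-01", "2010-01-02", "2010-01-02"],
    "NAME": ["L", "L", "L", "M"],
    "PRCP": [np.nan, np.nan, 0.5, np.nan],
    "TMIN": [np.nan, np.nan, np.nan, 25.0],
})
df2 = pd.DataFrame({
    "DATE": ["2010-01-01", "2010-01-02"],
    "PRCP": [0.0, 1.2],
    "TMIN": [28, 27],
})

# Give both DATE columns the same dtype before matching on them.
df1["DATE"] = pd.to_datetime(df1["DATE"])
df2["DATE"] = pd.to_datetime(df2["DATE"])

lookup = df2.set_index("DATE")  # assumes DATE is unique in df2
for col in ["PRCP", "TMIN"]:
    # map() yields df2's value for each row's DATE (NaN when there is no match);
    # fillna() only changes cells that are already missing in df1.
    df1[col] = df1[col].fillna(df1["DATE"].map(lookup[col]))

print(df1)
```

For the data in the question, the column list would be `['PRCP', 'TAVG', 'TMAX', 'TMIN']`. Rows dated outside the range covered by `df2` simply keep their NaN.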
