Filling NaN values from another dataframe based on a condition
I need to populate NaN values in some columns of one dataframe based on a condition between two dataframes.
DF1 has SOL (start of line) and EOL (end of line) columns, and DF2 has a UTC_TIME for each entry.
For every point in DF2 whose UTC_TIME is >= the SOL and <= the EOL of a record in DF1, that row in DF2 must be assigned that record's LINE, DEVICE and TAPE_FILE.
So every point will be assigned a LINE, DEVICE and TAPE_FILE according to the DF1 SOL/EOL window that its UTC_TIME falls within.
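The condition above can be written directly in pandas as a brute-force interval join. This is a minimal sketch, not the asker's code: it assumes DF2 has a unique ID column (as in the answer's output further down), the toy values are made up, and the helper name `assign_by_interval` is hypothetical. It pairs every point with every line using `merge(how="cross")` (available since pandas 1.2) and keeps only the pairs where SOL <= UTC_TIME <= EOL.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
df1 = pd.DataFrame({
    "LINE": [1, 2],
    "DEVICE": ["Huntec", "Teledyne"],
    "TAPE_FILE": [10, 11],
    "SOL": pd.to_datetime(["2022-04-25 06:00", "2022-04-25 07:00"]),
    "EOL": pd.to_datetime(["2022-04-25 06:55", "2022-04-25 07:30"]),
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    # ID 3 falls in the gap between line 1's EOL and line 2's SOL.
    "UTC_TIME": pd.to_datetime(["2022-04-25 06:50", "2022-04-25 07:15", "2022-04-25 06:58"]),
})


def assign_by_interval(points, lines):
    """Attach LINE/DEVICE/TAPE_FILE to each point whose UTC_TIME lies in [SOL, EOL]."""
    # Pair every point with every line (n * m rows).
    pairs = points.merge(lines, how="cross")
    # Keep pairs satisfying SOL <= UTC_TIME <= EOL (between is inclusive by default).
    inside = pairs["UTC_TIME"].between(pairs["SOL"], pairs["EOL"])
    matched = pairs.loc[inside, ["ID", "LINE", "DEVICE", "TAPE_FILE"]]
    # Left-join back on ID so points outside every window keep NaN.
    return points.merge(matched, on="ID", how="left")


result = assign_by_interval(df2, df1)
print(result)
```

Because it materialises all n × m pairs, this is only practical for small frames, and if windows overlap a point can match several lines and appear more than once.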
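I'm trying to use the numpy where function for each column, like this:
# compares df2 and df1 row by row, so it requires identically indexed frames
df2['DEVICE'] = np.where((df2['UTC_TIME'] >= df1['SOL']) & (df2['UTC_TIME'] <= df1['EOL']), df1['DEVICE'], np.nan)
Or using a for loop to iterate through each row:
for i, point in df2.iterrows():
    for _, line in df1.iterrows():
        # assign the DEVICE of the line whose SOL/EOL window contains this point
        if line['SOL'] <= point['UTC_TIME'] <= line['EOL']:
            df2.loc[i, 'DEVICE'] = line['DEVICE']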
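The `np.where` idea can work if the comparison is broadcast across every (point, line) pair instead of aligning the two frames row by row. A minimal sketch under stated assumptions: the toy data is made up, the helper name `assign_with_broadcasting` is hypothetical, and when windows overlap it simply takes the first matching line.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
df1 = pd.DataFrame({
    "LINE": [1, 2],
    "DEVICE": ["Huntec", "Teledyne"],
    "TAPE_FILE": [10, 11],
    "SOL": pd.to_datetime(["2022-04-25 06:00", "2022-04-25 07:00"]),
    "EOL": pd.to_datetime(["2022-04-25 06:55", "2022-04-25 07:30"]),
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "UTC_TIME": pd.to_datetime(["2022-04-25 06:50", "2022-04-25 07:15", "2022-04-25 06:58"]),
})


def assign_with_broadcasting(points, lines, cols=("LINE", "DEVICE", "TAPE_FILE")):
    t = points["UTC_TIME"].to_numpy()
    sol = lines["SOL"].to_numpy()
    eol = lines["EOL"].to_numpy()
    # Boolean matrix of shape (n_points, n_lines): True where point i is inside line j.
    inside = (t[:, None] >= sol[None, :]) & (t[:, None] <= eol[None, :])
    has_match = inside.any(axis=1)
    # argmax returns the first True per row (0 when there is none,
    # which is why has_match is used below to blank those rows).
    first = inside.argmax(axis=1)
    out = points.copy()
    for col in cols:
        values = lines[col].to_numpy()[first]
        out[col] = np.where(has_match, values, np.nan)
    return out


result = assign_with_broadcasting(df2, df1)
print(result)
```

The intermediate matrix holds n × m booleans, so memory grows with the product of the two frame sizes.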
Try with merge_asof:
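import pandas as pd
# convert to datetime if needed
df1["SOL"] = pd.to_datetime(df1["SOL"])
df1["EOL"] = pd.to_datetime(df1["EOL"])
df2["UTC_TIME"] = pd.to_datetime(df2["UTC_TIME"])
# merge_asof needs both frames sorted by their keys; for each UTC_TIME it takes
# the last DF1 row whose SOL is <= UTC_TIME (EOL is not checked)
output = pd.merge_asof(df2[["ID", "UTC_TIME"]].sort_values("UTC_TIME"), df1.sort_values("SOL"),
                       left_on="UTC_TIME", right_on="SOL").drop(["SOL", "EOL"], axis=1)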
>>> output
   ID             UTC_TIME  LINE    DEVICE  TAPE_FILE
0   1  2022-04-25 06:50:00     1    Huntec         10
1   2  2022-04-25 07:15:00     2  Teledyne         11
2   3  2022-04-25 10:20:00     3    Huntec         12
3   4  2022-04-25 10:30:00     3    Huntec         12
4   5  2022-04-25 10:50:00     3    Huntec         12
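`merge_asof` only looks at SOL, so a point that falls after one line's EOL but before the next line's SOL would still be given the earlier line. A hedged sketch of one way to add the EOL check, assuming the same column names, made-up toy data and a hypothetical helper `assign_with_merge_asof`:

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
df1 = pd.DataFrame({
    "LINE": [1, 2],
    "DEVICE": ["Huntec", "Teledyne"],
    "TAPE_FILE": [10, 11],
    "SOL": pd.to_datetime(["2022-04-25 06:00", "2022-04-25 07:00"]),
    "EOL": pd.to_datetime(["2022-04-25 06:55", "2022-04-25 07:30"]),
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    # ID 3 lands in the gap between line 1's EOL and line 2's SOL.
    "UTC_TIME": pd.to_datetime(["2022-04-25 06:50", "2022-04-25 07:15", "2022-04-25 06:58"]),
})


def assign_with_merge_asof(points, lines, cols=("LINE", "DEVICE", "TAPE_FILE")):
    # merge_asof requires both frames to be sorted by their join keys.
    merged = pd.merge_asof(
        points.sort_values("UTC_TIME"),
        lines.sort_values("SOL"),
        left_on="UTC_TIME",
        right_on="SOL",
        direction="backward",  # last SOL <= UTC_TIME (the default)
    )
    # Blank out points that lie after the matched line's EOL.
    # Comparisons against NaT (no match at all) are False, so those stay NaN anyway.
    after_eol = merged["UTC_TIME"] > merged["EOL"]
    for col in cols:
        merged[col] = merged[col].mask(after_eol)
    return merged.drop(columns=["SOL", "EOL"])


result = assign_with_merge_asof(df2, df1)
print(result)
```

The result follows sorted UTC_TIME order rather than DF2's original row order, and LINE/TAPE_FILE become floats once they can hold NaN.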