How to extract specific values by iterating over two dataframes and append them dynamically?

Question

I am trying to use values in an existing dataframe to create subset dataframes (or series), compare those values against the existing dataframe, and ultimately create one new column in the original dataframe.

The step-by-step process is:

Create a dataframe (df2) that isolates the minimum date within each month, based on the dates in the original dataframe, df1. This is already being created successfully.
The second step is to find where actual_date (df1) matches min_date (df2), then take the extract_value (df1) from that row and store it in df1 for all instances where actual_date matches min_date.

Some attempts I've made that produce errors:

Comparing date values between dfs:

df1.loc[df1['actual_date']==df2[df2['min_date']]
#Produces unexpected EOF while parsing

df['actual_date']==df2['min_date']
#Produces ValueError: Can only compare identically-labeled Series objects

Iterating through conditions:

for each in range(len(df1):
    if df1[df1['actual_date']]==df2[df2['min_date']]:
        df1['exctract_value_new']=df2['extract_value']

#Produces: KeyError: "None of [DatetimeIndex....are in the [columns]"

I've tried searching for both the ValueError and the KeyError and am having trouble understanding the threads about indexing. Specifically, I am unsure how df1 and/or df2 need to be reformatted so that I can compare date values in this manner and then extract a separate column based on where the criteria match in both dataframes.

Here is the sample data I am working with:

df1 (base)

actual_date  extract_value
2021-01-22   22
2021-01-23   24
2021-01-24   15
2021-02-22   16
2021-02-05   34
2021-02-04   18

df2

month       min_date
2021-01-01  2021-01-22
2021-02-01  2021-02-04
2021-03-01  2021-03-01

End goal for df1

actual_date  min_date    extract_value_new  extract_value_original
2021-01-22   2021-01-22  22                 22
2021-01-23   2021-01-22  22                 24
2021-02-04   2021-02-04  18                 18
2021-02-05   2021-02-04  18                 34

Appreciate any help!

Answer 1

You might want to check out pd.merge_asof. The solution below gives me the desired table!

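merge_asof essentially lets us join on the "closest" values in two columns. By default it matches each left row to the last right row whose key is less than or equal to the left key, and tolerance limits how far apart the two keys may be.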
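To make the "closest" matching concrete, here is a minimal sketch on two tiny frames. The names left, right and matched are illustrative and not from the question; the dates are borrowed from the sample data. It relies on the default direction='backward', so each actual_date picks up the most recent min_date that is not later than it.

```python
import pandas as pd

# Illustrative frames (names are assumptions, dates taken from the question's sample data).
left = pd.DataFrame(
    {"actual_date": pd.to_datetime(["2021-01-22", "2021-01-23", "2021-02-05"])}
)
right = pd.DataFrame(
    {"min_date": pd.to_datetime(["2021-01-22", "2021-02-04"])}
)

# Both key columns must already be sorted in ascending order.
# With the default direction="backward", each left row is paired with the
# latest right row whose min_date is <= actual_date.
matched = pd.merge_asof(left, right, left_on="actual_date", right_on="min_date")

print(matched)
# 2021-01-22 -> 2021-01-22, 2021-01-23 -> 2021-01-22, 2021-02-05 -> 2021-02-04
```

Passing direction='forward' or direction='nearest' would change which neighbour is chosen; the backward default fits here because min_date is never later than the dates it should be attached to.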

import pandas as pd
from datetime import timedelta


df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')

datetime_format = '%Y-%m-%d'  # format of the date strings (merge_asof needs real datetimes)
tolerance = timedelta(days=1)  # maximum gap allowed between actual_date and min_date

# convert our dates
df1['actual_date'] = pd.to_datetime(df1['actual_date'], format=datetime_format)
df2['min_date'] = pd.to_datetime(df2['min_date'], format=datetime_format)

# merge_asof requires both key columns to be sorted
df1.sort_values(by='actual_date', inplace=True)
df2.sort_values(by='min_date', inplace=True)

# rows with no min_date within the tolerance come back as NaN/NaT and are dropped
out = pd.merge_asof(df1, df2, left_on='actual_date', right_on='min_date', tolerance=tolerance).dropna()

Gives me:

  actual_date  extract_value       month   min_date
0  2021-01-22             22  2021-01-01 2021-01-22
1  2021-01-23             24  2021-01-01 2021-01-22
3  2021-02-04             18  2021-02-01 2021-02-04
4  2021-02-05             34  2021-02-01 2021-02-04
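This output does not yet contain the extract_value_new column from the question's end goal. One possible way to add it is sketched below, under two assumptions that are mine and not from the thread: each actual_date occurs only once in df1, and the sample frames are rebuilt inline instead of being read from CSV. The helper name value_at_date is hypothetical.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: mirrors the tables in the question).
df1 = pd.DataFrame({
    "actual_date": pd.to_datetime([
        "2021-01-22", "2021-01-23", "2021-01-24",
        "2021-02-22", "2021-02-05", "2021-02-04",
    ]),
    "extract_value": [22, 24, 15, 16, 34, 18],
})
df2 = pd.DataFrame({
    "month": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    "min_date": pd.to_datetime(["2021-01-22", "2021-02-04", "2021-03-01"]),
})

# Same as-of join as in the answer; unmatched rows get NaT in min_date and are dropped.
out = pd.merge_asof(
    df1.sort_values("actual_date"),
    df2.sort_values("min_date"),
    left_on="actual_date",
    right_on="min_date",
    tolerance=pd.Timedelta(days=1),
).dropna(subset=["min_date"])

# Lookup table: extract_value keyed by its date (assumes actual_date is unique in df1).
value_at_date = df1.set_index("actual_date")["extract_value"]

# For every row, fetch the extract_value recorded on that row's min_date.
out["extract_value_new"] = out["min_date"].map(value_at_date)

# Rename and reorder to match the end-goal layout from the question.
out = out.rename(columns={"extract_value": "extract_value_original"})
out = out[["actual_date", "min_date", "extract_value_new", "extract_value_original"]]

print(out)
```

If df1 can contain the same date more than once, the lookup would need an explicit rule (for example, keeping the first value per date) before calling map.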
